Skip to content
This repository has been archived by the owner on Nov 15, 2021. It is now read-only.

Block processing issues #302

Closed
brianlenz opened this issue Mar 8, 2018 · 10 comments
Closed

Block processing issues #302

brianlenz opened this issue Mar 8, 2018 · 10 comments

Comments

@brianlenz
Copy link
Contributor

We've noticed some block processing issues today across neo-python and neo-cli nodes.

It looks like blocks 2000190 and 2000880 are at least two of the problematic blocks.

We have 4 neo-python nodes for JSON-RPC & REST. Interestingly, three of them are stuck on block 2000190. The other one is stuck on 2000880.

Here is the data for 2000190 on one of the nodes stuck on it:

neo> header 2000190
{
    "hash": "0x1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516",
    "version": 0,
    "previousblockhash": "0x361f566c40f43e46678d378f2787f1cb148e399f712126b1dcea95278e424a51",
    "merkleroot": "0xfcb100c1fd41d64d8fc1fc21bee4e4e01b817100c8054db3a1cd47c021378a9c",
    "time": 1520521102,
    "index": 2000190,
    "next_consensus": "0x55bfa4cc95efe9bb65c104bf27385d2b655de759",
    "consensus data": 17474561437052617493,
    "script": {
        "invocation": "402d620acd5e10d4991b3ba644ccbd7095ecfcc9a42c691b217d20b1868545a5b51f69ce5e33e1ac4b63b7924b8f12cfb19cb86cba81352b6b241d098ce8dc49bf406a514e789aeac464470b0dffd0c5134765eb7ad236cf87fd05cb4ec5c8b92888374a88225494b52c013992c2267db52b47549b8a37def16a8c2d349c87916c69401bf83a6b48618d6a1e372468e1908df7f65fa3ec52d7ea2f61e0c83fdf38d0f52403d26e164eeeb61a31a2c61f0a44e3c88dbda0ce62447b69a5575a175feb8040f41ca77cc9e4cb43de23512c823e114c38f874d10680d4608469637eb849cb010ae508cc0b1c3027afa97d4df5bf2814cfbe26145a3c20473b5dd9c7c1bafd7e40a88f04f2a16d125dfc53f26360adbf8299da9d16139cf95cc0e9b3d88134fe1dc0f40e97cef1b26444182239bc4ad7b6036ef53727a499287645e164378ab097",
        "verification": "552102486fd15702c4490a26703112a5cc1d0923fd697a33406bd5a1c00e0013b09a7021024c7b7fb6c310fccf1ba33b082519d82964ea93868d676662d4a59ad548df0e7d2102aaec38470f6aad0042c6e877cfd8087d2676b0f516fddd362801b9bd3936399e2103b209fd4f53a7170ea4444e0cb0a6bb6a53c2bd016926989cf85f9b0fba17a70c2103b8d9d5771d8f513aa0869b9cc8d50986403b78c6da36890638c3d46a5adce04a2102ca0e27697b9c248f6f16e085fd0061e26f44da85b58ee835c110caa5ec3ba5542102df48f60e8f3e01c48ff40b9b7f1310d7a8b2a193188befe1c2e3df740e89509357ae"
    }
}

The block info results in a traceback, however:

neo> block 0x1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516
Could not execute command: list index out of range
  File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/_threads/_threadworker.py", line 46, in work
    task()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/_threads/_team.py", line 190, in doWork
    task()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "prompt.py", line 932, in run
    traceback.print_stack()
Traceback (most recent call last):
  File "prompt.py", line 895, in run
    self.show_block(arguments)
  File "prompt.py", line 575, in show_block
    bjson = json.dumps(block.ToJson(), indent=4)
  File "/app/neo-python/neo/Core/Block.py", line 257, in ToJson
    if self.Transactions[0] and isinstance(self.Transactions[0], str):
IndexError: list index out of range

One the node stuck at 2000880, it doesn't know about a block with that hash:

neo> block 0x1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516
Could not locate block 0x1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516

Here's the hash of that block:

neo> header 2000190
{
    "hash": "0x8cb9fee28a48a45468e3c0a229fd4473288cdd9794c10cac7b8f4681ca404342",
    "version": 0,
    "previousblockhash": "0x361f566c40f43e46678d378f2787f1cb148e399f712126b1dcea95278e424a51",
    "merkleroot": "0x55487fec7d46696f66ae4ab9946f9447d4fde7f79e52bce2e9914dd5efd2f036",
    "time": 1520521121,
    "index": 2000190,
    "next_consensus": "0x55bfa4cc95efe9bb65c104bf27385d2b655de759",
    "consensus data": 7357964124241302497,
    "script": {
        "invocation": "405d72435752d32ef7197cec4a4433cf6ffc88ea02583be03d12851cf7a794345ad78d3db7503bebdb6ecd3c3f15133e89bdcfe09191f98f10ad2ad019f8eb730c403c7d056eb02363479f4a5aed73080f1b53b2911e78d0077f1f74e1e9525bfe6856eb31b0ef78144b44b57fd6303688a1eba266212c5db2bdfa7dfebcc474688240fd655218d6f70379d4544969e1f76921764b228502561c15b7fc6e22bc99f866e74bcbf8d6d8ac7f26cd27079498bd5d5fc4bf746693357b2e40f20088449c6c4068eedbb215d14152ab6bab031039771c71847d690d1076d847c83169ab9cbfe849ab80d4bca2923d1a83edc0c345727e7837c8719b1771d629960ef9f675820b40304d0adb6435cb38eac9e932ec5ddd672d1b62e040098ea4ed482ccedc7885ed4a29ccede198cbcc38063c7e7e95febde326ed49634ad893af362567eb804bda",
        "verification": "552102486fd15702c4490a26703112a5cc1d0923fd697a33406bd5a1c00e0013b09a7021024c7b7fb6c310fccf1ba33b082519d82964ea93868d676662d4a59ad548df0e7d2102aaec38470f6aad0042c6e877cfd8087d2676b0f516fddd362801b9bd3936399e2103b209fd4f53a7170ea4444e0cb0a6bb6a53c2bd016926989cf85f9b0fba17a70c2103b8d9d5771d8f513aa0869b9cc8d50986403b78c6da36890638c3d46a5adce04a2102ca0e27697b9c248f6f16e085fd0061e26f44da85b58ee835c110caa5ec3ba5542102df48f60e8f3e01c48ff40b9b7f1310d7a8b2a193188befe1c2e3df740e89509357ae"
    }
}

And here's the actual block data:

neo> block 0x8cb9fee28a48a45468e3c0a229fd4473288cdd9794c10cac7b8f4681ca404342
{
    "hash": "0x8cb9fee28a48a45468e3c0a229fd4473288cdd9794c10cac7b8f4681ca404342",
    "version": 0,
    "previousblockhash": "0x361f566c40f43e46678d378f2787f1cb148e399f712126b1dcea95278e424a51",
    "merkleroot": "0x55487fec7d46696f66ae4ab9946f9447d4fde7f79e52bce2e9914dd5efd2f036",
    "time": 1520521121,
    "index": 2000190,
    "next_consensus": "0x55bfa4cc95efe9bb65c104bf27385d2b655de759",
    "consensus data": 7357964124241302497,
    "script": {
        "invocation": "405d72435752d32ef7197cec4a4433cf6ffc88ea02583be03d12851cf7a794345ad78d3db7503bebdb6ecd3c3f15133e89bdcfe09191f98f10ad2ad019f8eb730c403c7d056eb02363479f4a5aed73080f1b53b2911e78d0077f1f74e1e9525bfe6856eb31b0ef78144b44b57fd6303688a1eba266212c5db2bdfa7dfebcc474688240fd655218d6f70379d4544969e1f76921764b228502561c15b7fc6e22bc99f866e74bcbf8d6d8ac7f26cd27079498bd5d5fc4bf746693357b2e40f20088449c6c4068eedbb215d14152ab6bab031039771c71847d690d1076d847c83169ab9cbfe849ab80d4bca2923d1a83edc0c345727e7837c8719b1771d629960ef9f675820b40304d0adb6435cb38eac9e932ec5ddd672d1b62e040098ea4ed482ccedc7885ed4a29ccede198cbcc38063c7e7e95febde326ed49634ad893af362567eb804bda",
        "verification": "552102486fd15702c4490a26703112a5cc1d0923fd697a33406bd5a1c00e0013b09a7021024c7b7fb6c310fccf1ba33b082519d82964ea93868d676662d4a59ad548df0e7d2102aaec38470f6aad0042c6e877cfd8087d2676b0f516fddd362801b9bd3936399e2103b209fd4f53a7170ea4444e0cb0a6bb6a53c2bd016926989cf85f9b0fba17a70c2103b8d9d5771d8f513aa0869b9cc8d50986403b78c6da36890638c3d46a5adce04a2102ca0e27697b9c248f6f16e085fd0061e26f44da85b58ee835c110caa5ec3ba5542102df48f60e8f3e01c48ff40b9b7f1310d7a8b2a193188befe1c2e3df740e89509357ae"
    },
    "tx": [
        "0x983da7ed889a09a6acc30c7716f644319e8c49c601d92ccf492d22a94f32edfb",
        "0x4f151888d43f2da4812196c03d7bde572f225cbbb53408173e42c751a87ca40c",
        "0x4069937f07ce45bd498013d456c6c5ecfe2b1c2f464b60f79ae912eae6e4434b",
        "0x3d0215de0af1f7d71b12253d16d439fe6c75498d40bce8716f313b096530f5f3",
        "0x4d1c7c1d966b8f95e2673766835a62715c9cae44145d49ff30deaeb15b7798b5",
        "0x8033b5095bfc71e79b4fb2ec00836f6025a5c3ac5524983cfd4674cc98759cd1",
        "0xbd88a9e3eb9cd981591a8a5d5b6fa45ed4374f3a87ff350af9a3092c0ca6a83b",
        "0x4d1bdfcaf514db1063c354df05112423e01ef394fcb05123dd92d30985913620",
        "0x0a740a978bd6b97ece5262dea538d3ed485539221f3890ef3586aae61410f219",
        "0x66300a20ee20499e2020a93350ce0cad7f5d39b4c135cb2bf118a1e41d0dfeea",
        "0x820a9b20bbd21e670c4810b8f91ea7f388d77aef05c1c1a639fb254c50c8b43e",
        "0x3655364cbe3eb4b61af84c64b743899b64aeba5398a9c628bdbc6084325cf87b",
        "0xd16bb1dbe8dda7665fe89439d25ee7c96c8bbe3af8e950606126e6be942dffc4",
        "0xb994374a76f92ed691d0c99fa511d3558e709b6a01c99e248d6acb379a7c5396",
        "0x5659cc0a1214fcae5720a907fe7f1a73d4fc9f528f3b004988ae2b970dad922e",
        "0x9f0eba9f7377ce47a904da134335810d175bc00529ad7aab0e1288a9333ecb2d",
        "0x0dd6028e1f2b5e9b017f25ae50f95e7b5696b98cc27f2e3308bbaddbf3dca5e3",
        "0x48a1997052dcc61c5299fc8e9f1ca674ae8ee51931623d862b14d51bc29137ac",
        "0xf267b36899abc3236d47d208906901a36e3102afc7332c63c3afb456a69abf5f",
        "0xf53ac6a0a77e75bc02d99369c3c549b6b6d31678645cc5153bc5f5d7521a7cc7",
        "0x67a46ec11631126134962be5101536f93955a9d2ba13d340e0882d4875aeecaa",
        "0x21277031faaa383dfee7fbe13f514023a801a0a12f99f68fcf36ea26bd43c9cb",
        "0x3ed23024ff806ada8fa814b16092f59be63151fe7eb0da17d2a459ccac574d28",
        "0xd30109ba9d7fdbd18f8fa6b704dd93be9e48026f89acca25e53290dc9b39843f",
        "0x797ee669c2643c4c38a2885420fb20eac46550f19abe5e3fa254f096d68d971d",
        "0xbbad1a32af18153c24062434268f042f3e3b862fc1b99f703aef2e5098cdcd55",
        "0x59c77a6bad8ed13b0a1919c0276bc531a6118d51f2f0f704cb4afb4a7bae85f3",
        "0xe753da7b736ead04bb631557cb1bfc9bded9164f16a7e0ddec363c07c62bfe13",
        "0x7a21f0914e84234b25ef159a4eff32047185a77cba3133f1f48684aaf3c824fb",
        "0x1696d6d695af000ed9d9bba967ec2e59ae1e7be15742bc5b7d342c99c3cc6161"
    ],
    "sys_fee": 210761
}

Here's stuck block 2000880:

neo> header 2000880
{
    "hash": "0x9df8583dbbfe2f07fecb0a454aaa9cce526f58150db064a53d4e56a204fa3455",
    "version": 0,
    "previousblockhash": "0xe10dc86372fbfa4244e62623f935f4c88e8463c70e3c302ee8cc89b4a6dd59e2",
    "merkleroot": "0xf1dcf0505a76c6cffc9d088a32f00bf33027fa25335be31505b287959826ded4",
    "time": 1520539399,
    "index": 2000880,
    "next_consensus": "0x55bfa4cc95efe9bb65c104bf27385d2b655de759",
    "consensus data": 11661830401268941375,
    "script": {
        "invocation": "4089448935a81ea54c79a08427206951f93baec4da55279c8142190339225a13b35adcf9d543ab303365a27704cd291dab32e74f2ea75d3482d7056ea8b111595e40138fb36476924eefd27033077e984a01935c305f2ea2c2e2008aa881228b5794a31f0cd4401f724927e719c49e1f79c3147dc285e58f31ae5521044ba8f2e24640333079239e79c42e8ef4556771ab92ba1f7b8c35fe74f06e1d5ccab1b21e29ddb463eac6aea63f6977d18dde7c149b3c931c298f72fc5e471afcddb081f0f04840645a8abf3fbb8dd2a50b9d4d450e5b0a0dcc9c5c63ef80a447ed6629ee7b47115a704c170a64a7698cfe2357b456c6a89592dd91ee70e72eca9a80d447ded751403975d9925a7461ec0a1ec5c2075a8fcd45fe7695da94567151ae8c143cdf18f0840a3ce933f3ad045b192bcc354233a07ad10ee41275ac1693576c4d4e10e94e",
        "verification": "552102486fd15702c4490a26703112a5cc1d0923fd697a33406bd5a1c00e0013b09a7021024c7b7fb6c310fccf1ba33b082519d82964ea93868d676662d4a59ad548df0e7d2102aaec38470f6aad0042c6e877cfd8087d2676b0f516fddd362801b9bd3936399e2103b209fd4f53a7170ea4444e0cb0a6bb6a53c2bd016926989cf85f9b0fba17a70c2103b8d9d5771d8f513aa0869b9cc8d50986403b78c6da36890638c3d46a5adce04a2102ca0e27697b9c248f6f16e085fd0061e26f44da85b58ee835c110caa5ec3ba5542102df48f60e8f3e01c48ff40b9b7f1310d7a8b2a193188befe1c2e3df740e89509357ae"
    }
}

And it yields the same traceback as before when trying to view the block data:

neo> block 0x9df8583dbbfe2f07fecb0a454aaa9cce526f58150db064a53d4e56a204fa3455
Could not execute command: list index out of range
  File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/_threads/_threadworker.py", line 46, in work
    task()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/_threads/_team.py", line 190, in doWork
    task()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "prompt.py", line 922, in run
    traceback.print_stack()
Traceback (most recent call last):
  File "prompt.py", line 885, in run
    self.show_block(arguments)
  File "prompt.py", line 575, in show_block
    bjson = json.dumps(block.ToJson(), indent=4)
  File "/app/neo-python/neo/Core/Block.py", line 257, in ToJson
    if self.Transactions[0] and isinstance(self.Transactions[0], str):
IndexError: list index out of range

Note: these nodes are currently running a slightly older version of neo-python, so we are going to update them to the latest development branch now and see if that makes a difference.

@localhuman
Copy link
Collaborator

the errors are in the ToJson method which assumes that there would be at least 1 transaction in the block. If there isn't one TX is the block then something other than the ToJson method is wrong

@brianlenz
Copy link
Contributor Author

I don't know if this will be helpful or not, but here's the Chains dir from the one neo-python node that somehow got past block 2000190:

https://www.dropbox.com/s/c3pk8mjq6radn3d/Chains.tgz?dl=0

@brianlenz
Copy link
Contributor Author

It's interesting to note that neoscan is generating an error for the bad block hash:

https://neoscan.io/block/1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516

But it does show what appears to be the correct hash for 2000190 (confirmed via other JSON-RPC nodes, as well):

https://neoscan.io/block/8cb9fee28a48a45468e3c0a229fd4473288cdd9794c10cac7b8f4681ca404342

Interestingly, here is block 2000189:

https://neoscan.io/block/361f566c40f43e46678d378f2787f1cb148e399f712126b1dcea95278e424a51

And the "Next block" link takes you to the first (errant) block hash I linked, even though the second link appears to be correct.

So it seems to have had the same issue as neo-python.

Blocks 2000191 and 2000192 both had 500 transactions each, for whatever that's worth. Same thing with block 2000880; the next six+ blocks have had 500 transactions each.

@AshRolls
Copy link
Contributor

AshRolls commented Mar 9, 2018

There is the same issue with block 1860095 which some nodes (reports from neo-gui, neo-cli, and neo-python) get stuck on. The nodes that get stuck report 1860095 with a hash of b57c1d56dc1763c5a16a0d019f13073df472c36b6633582a22ec8c0ca38f8fd6

However if I go to neo-scan and select block 1860095 I get a different hash 35e84031dbe8981d6fc838748b5db0525c2624c7bd54a2082625266c160d125a

If at neo-scan I select the previous block 1860094 and hit next block it tries to take me to the same hash that stuck nodes report with a server error

https://neoscan.io/block/b57c1d56dc1763c5a16a0d019f13073df472c36b6633582a22ec8c0ca38f8fd6

@localhuman
Copy link
Collaborator

This is not a neo-python issue but more of a network issue.

Anyways, anyone stuck on this can re-bootstrap mainnet to fast forward past the troublesome blocks with: python bootstrap.py -m

@brianlenz
Copy link
Contributor Author

I do think ultimately neo-python needs to have a way to recover from this situation more gracefully. The current behavior is that you get a stuck node and you basically have to start over syncing. Sure, you can bootstrap and fast-forward (once somebody provides a bootstrap file to do so with), but that's not always a realistic solution. For example, if your node is processing events (as ours is), you can't simply skip a large number of blocks or else you get incomplete data. In that case, you have to be able to recover (somehow) and continue processing from where you left off. Ideally, there would be some way to roll back the previous transaction or two so you can start re-syncing. Perhaps snapshots of the most recent two blocks would be a way to provide some kind of rollback/recovery mechanism so it's not a complete dead end, as currently?

In our case, I had to restore to an older bootstrap file and update our event handler to gracefully handle re-processing events that had already been processed. It's not ideal for every node consumer to be required to do this, so I think the tooling should provide means to work through these types of issues.

Perhaps this is an issue that needs to be addressed with every NEO node implementation, too?

@localhuman
Copy link
Collaborator

It seems a system issue rather than a neo-python issue.

There's no real way to 'rollback' a block unless each SC invocation has an undo method. The only other way I can think of is keeping an entire copy of the db for each block and then deleting the one before that, but that is not something I would want us to be doing in a production setting

@brianlenz
Copy link
Contributor Author

Fair enough! It definitely could be addressed by preventing this situation from occurring in the first place. It shouldn't be possible for a bad block to be distributed like this.

What's a best next step for this issue? Should I record an issue in neo core? I think it's something that needs to be actively investigated (if it's not already) since it obviously had widespread impact. I can't find any issues that seem to be related...

@brianlenz
Copy link
Contributor Author

And now the issue has happened again :(

http://monitor.cityofzion.io/

Almost every neo-python node and many of the neo-cli nodes are stuck, presumably due to the same incorrect block issue.

@brianlenz
Copy link
Contributor Author

Created this issue: neo-project/neo#193

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants