Block processing issues #302

brianlenz · 2018-03-08T20:44:47Z

We've noticed some block processing issues today across neo-python and neo-cli nodes.

It looks like blocks 2000190 and 2000880 are at least two of the problematic blocks.

We have 4 neo-python nodes for JSON-RPC & REST. Interestingly, three of them are stuck on block 2000190. The other one is stuck on 2000880.

Here is the data for 2000190 on one of the nodes stuck on it:

neo> header 2000190
{
    "hash": "0x1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516",
    "version": 0,
    "previousblockhash": "0x361f566c40f43e46678d378f2787f1cb148e399f712126b1dcea95278e424a51",
    "merkleroot": "0xfcb100c1fd41d64d8fc1fc21bee4e4e01b817100c8054db3a1cd47c021378a9c",
    "time": 1520521102,
    "index": 2000190,
    "next_consensus": "0x55bfa4cc95efe9bb65c104bf27385d2b655de759",
    "consensus data": 17474561437052617493,
    "script": {
        "invocation": "402d620acd5e10d4991b3ba644ccbd7095ecfcc9a42c691b217d20b1868545a5b51f69ce5e33e1ac4b63b7924b8f12cfb19cb86cba81352b6b241d098ce8dc49bf406a514e789aeac464470b0dffd0c5134765eb7ad236cf87fd05cb4ec5c8b92888374a88225494b52c013992c2267db52b47549b8a37def16a8c2d349c87916c69401bf83a6b48618d6a1e372468e1908df7f65fa3ec52d7ea2f61e0c83fdf38d0f52403d26e164eeeb61a31a2c61f0a44e3c88dbda0ce62447b69a5575a175feb8040f41ca77cc9e4cb43de23512c823e114c38f874d10680d4608469637eb849cb010ae508cc0b1c3027afa97d4df5bf2814cfbe26145a3c20473b5dd9c7c1bafd7e40a88f04f2a16d125dfc53f26360adbf8299da9d16139cf95cc0e9b3d88134fe1dc0f40e97cef1b26444182239bc4ad7b6036ef53727a499287645e164378ab097",
        "verification": "552102486fd15702c4490a26703112a5cc1d0923fd697a33406bd5a1c00e0013b09a7021024c7b7fb6c310fccf1ba33b082519d82964ea93868d676662d4a59ad548df0e7d2102aaec38470f6aad0042c6e877cfd8087d2676b0f516fddd362801b9bd3936399e2103b209fd4f53a7170ea4444e0cb0a6bb6a53c2bd016926989cf85f9b0fba17a70c2103b8d9d5771d8f513aa0869b9cc8d50986403b78c6da36890638c3d46a5adce04a2102ca0e27697b9c248f6f16e085fd0061e26f44da85b58ee835c110caa5ec3ba5542102df48f60e8f3e01c48ff40b9b7f1310d7a8b2a193188befe1c2e3df740e89509357ae"
    }
}

The block info results in a traceback, however:

neo> block 0x1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516
Could not execute command: list index out of range
  File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/_threads/_threadworker.py", line 46, in work
    task()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/_threads/_team.py", line 190, in doWork
    task()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "prompt.py", line 932, in run
    traceback.print_stack()
Traceback (most recent call last):
  File "prompt.py", line 895, in run
    self.show_block(arguments)
  File "prompt.py", line 575, in show_block
    bjson = json.dumps(block.ToJson(), indent=4)
  File "/app/neo-python/neo/Core/Block.py", line 257, in ToJson
    if self.Transactions[0] and isinstance(self.Transactions[0], str):
IndexError: list index out of range

One the node stuck at 2000880, it doesn't know about a block with that hash:

neo> block 0x1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516
Could not locate block 0x1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516

Here's the hash of that block:

neo> header 2000190
{
    "hash": "0x8cb9fee28a48a45468e3c0a229fd4473288cdd9794c10cac7b8f4681ca404342",
    "version": 0,
    "previousblockhash": "0x361f566c40f43e46678d378f2787f1cb148e399f712126b1dcea95278e424a51",
    "merkleroot": "0x55487fec7d46696f66ae4ab9946f9447d4fde7f79e52bce2e9914dd5efd2f036",
    "time": 1520521121,
    "index": 2000190,
    "next_consensus": "0x55bfa4cc95efe9bb65c104bf27385d2b655de759",
    "consensus data": 7357964124241302497,
    "script": {
        "invocation": "405d72435752d32ef7197cec4a4433cf6ffc88ea02583be03d12851cf7a794345ad78d3db7503bebdb6ecd3c3f15133e89bdcfe09191f98f10ad2ad019f8eb730c403c7d056eb02363479f4a5aed73080f1b53b2911e78d0077f1f74e1e9525bfe6856eb31b0ef78144b44b57fd6303688a1eba266212c5db2bdfa7dfebcc474688240fd655218d6f70379d4544969e1f76921764b228502561c15b7fc6e22bc99f866e74bcbf8d6d8ac7f26cd27079498bd5d5fc4bf746693357b2e40f20088449c6c4068eedbb215d14152ab6bab031039771c71847d690d1076d847c83169ab9cbfe849ab80d4bca2923d1a83edc0c345727e7837c8719b1771d629960ef9f675820b40304d0adb6435cb38eac9e932ec5ddd672d1b62e040098ea4ed482ccedc7885ed4a29ccede198cbcc38063c7e7e95febde326ed49634ad893af362567eb804bda",
        "verification": "552102486fd15702c4490a26703112a5cc1d0923fd697a33406bd5a1c00e0013b09a7021024c7b7fb6c310fccf1ba33b082519d82964ea93868d676662d4a59ad548df0e7d2102aaec38470f6aad0042c6e877cfd8087d2676b0f516fddd362801b9bd3936399e2103b209fd4f53a7170ea4444e0cb0a6bb6a53c2bd016926989cf85f9b0fba17a70c2103b8d9d5771d8f513aa0869b9cc8d50986403b78c6da36890638c3d46a5adce04a2102ca0e27697b9c248f6f16e085fd0061e26f44da85b58ee835c110caa5ec3ba5542102df48f60e8f3e01c48ff40b9b7f1310d7a8b2a193188befe1c2e3df740e89509357ae"
    }
}

And here's the actual block data:

neo> block 0x8cb9fee28a48a45468e3c0a229fd4473288cdd9794c10cac7b8f4681ca404342
{
    "hash": "0x8cb9fee28a48a45468e3c0a229fd4473288cdd9794c10cac7b8f4681ca404342",
    "version": 0,
    "previousblockhash": "0x361f566c40f43e46678d378f2787f1cb148e399f712126b1dcea95278e424a51",
    "merkleroot": "0x55487fec7d46696f66ae4ab9946f9447d4fde7f79e52bce2e9914dd5efd2f036",
    "time": 1520521121,
    "index": 2000190,
    "next_consensus": "0x55bfa4cc95efe9bb65c104bf27385d2b655de759",
    "consensus data": 7357964124241302497,
    "script": {
        "invocation": "405d72435752d32ef7197cec4a4433cf6ffc88ea02583be03d12851cf7a794345ad78d3db7503bebdb6ecd3c3f15133e89bdcfe09191f98f10ad2ad019f8eb730c403c7d056eb02363479f4a5aed73080f1b53b2911e78d0077f1f74e1e9525bfe6856eb31b0ef78144b44b57fd6303688a1eba266212c5db2bdfa7dfebcc474688240fd655218d6f70379d4544969e1f76921764b228502561c15b7fc6e22bc99f866e74bcbf8d6d8ac7f26cd27079498bd5d5fc4bf746693357b2e40f20088449c6c4068eedbb215d14152ab6bab031039771c71847d690d1076d847c83169ab9cbfe849ab80d4bca2923d1a83edc0c345727e7837c8719b1771d629960ef9f675820b40304d0adb6435cb38eac9e932ec5ddd672d1b62e040098ea4ed482ccedc7885ed4a29ccede198cbcc38063c7e7e95febde326ed49634ad893af362567eb804bda",
        "verification": "552102486fd15702c4490a26703112a5cc1d0923fd697a33406bd5a1c00e0013b09a7021024c7b7fb6c310fccf1ba33b082519d82964ea93868d676662d4a59ad548df0e7d2102aaec38470f6aad0042c6e877cfd8087d2676b0f516fddd362801b9bd3936399e2103b209fd4f53a7170ea4444e0cb0a6bb6a53c2bd016926989cf85f9b0fba17a70c2103b8d9d5771d8f513aa0869b9cc8d50986403b78c6da36890638c3d46a5adce04a2102ca0e27697b9c248f6f16e085fd0061e26f44da85b58ee835c110caa5ec3ba5542102df48f60e8f3e01c48ff40b9b7f1310d7a8b2a193188befe1c2e3df740e89509357ae"
    },
    "tx": [
        "0x983da7ed889a09a6acc30c7716f644319e8c49c601d92ccf492d22a94f32edfb",
        "0x4f151888d43f2da4812196c03d7bde572f225cbbb53408173e42c751a87ca40c",
        "0x4069937f07ce45bd498013d456c6c5ecfe2b1c2f464b60f79ae912eae6e4434b",
        "0x3d0215de0af1f7d71b12253d16d439fe6c75498d40bce8716f313b096530f5f3",
        "0x4d1c7c1d966b8f95e2673766835a62715c9cae44145d49ff30deaeb15b7798b5",
        "0x8033b5095bfc71e79b4fb2ec00836f6025a5c3ac5524983cfd4674cc98759cd1",
        "0xbd88a9e3eb9cd981591a8a5d5b6fa45ed4374f3a87ff350af9a3092c0ca6a83b",
        "0x4d1bdfcaf514db1063c354df05112423e01ef394fcb05123dd92d30985913620",
        "0x0a740a978bd6b97ece5262dea538d3ed485539221f3890ef3586aae61410f219",
        "0x66300a20ee20499e2020a93350ce0cad7f5d39b4c135cb2bf118a1e41d0dfeea",
        "0x820a9b20bbd21e670c4810b8f91ea7f388d77aef05c1c1a639fb254c50c8b43e",
        "0x3655364cbe3eb4b61af84c64b743899b64aeba5398a9c628bdbc6084325cf87b",
        "0xd16bb1dbe8dda7665fe89439d25ee7c96c8bbe3af8e950606126e6be942dffc4",
        "0xb994374a76f92ed691d0c99fa511d3558e709b6a01c99e248d6acb379a7c5396",
        "0x5659cc0a1214fcae5720a907fe7f1a73d4fc9f528f3b004988ae2b970dad922e",
        "0x9f0eba9f7377ce47a904da134335810d175bc00529ad7aab0e1288a9333ecb2d",
        "0x0dd6028e1f2b5e9b017f25ae50f95e7b5696b98cc27f2e3308bbaddbf3dca5e3",
        "0x48a1997052dcc61c5299fc8e9f1ca674ae8ee51931623d862b14d51bc29137ac",
        "0xf267b36899abc3236d47d208906901a36e3102afc7332c63c3afb456a69abf5f",
        "0xf53ac6a0a77e75bc02d99369c3c549b6b6d31678645cc5153bc5f5d7521a7cc7",
        "0x67a46ec11631126134962be5101536f93955a9d2ba13d340e0882d4875aeecaa",
        "0x21277031faaa383dfee7fbe13f514023a801a0a12f99f68fcf36ea26bd43c9cb",
        "0x3ed23024ff806ada8fa814b16092f59be63151fe7eb0da17d2a459ccac574d28",
        "0xd30109ba9d7fdbd18f8fa6b704dd93be9e48026f89acca25e53290dc9b39843f",
        "0x797ee669c2643c4c38a2885420fb20eac46550f19abe5e3fa254f096d68d971d",
        "0xbbad1a32af18153c24062434268f042f3e3b862fc1b99f703aef2e5098cdcd55",
        "0x59c77a6bad8ed13b0a1919c0276bc531a6118d51f2f0f704cb4afb4a7bae85f3",
        "0xe753da7b736ead04bb631557cb1bfc9bded9164f16a7e0ddec363c07c62bfe13",
        "0x7a21f0914e84234b25ef159a4eff32047185a77cba3133f1f48684aaf3c824fb",
        "0x1696d6d695af000ed9d9bba967ec2e59ae1e7be15742bc5b7d342c99c3cc6161"
    ],
    "sys_fee": 210761
}

Here's stuck block 2000880:

neo> header 2000880
{
    "hash": "0x9df8583dbbfe2f07fecb0a454aaa9cce526f58150db064a53d4e56a204fa3455",
    "version": 0,
    "previousblockhash": "0xe10dc86372fbfa4244e62623f935f4c88e8463c70e3c302ee8cc89b4a6dd59e2",
    "merkleroot": "0xf1dcf0505a76c6cffc9d088a32f00bf33027fa25335be31505b287959826ded4",
    "time": 1520539399,
    "index": 2000880,
    "next_consensus": "0x55bfa4cc95efe9bb65c104bf27385d2b655de759",
    "consensus data": 11661830401268941375,
    "script": {
        "invocation": "4089448935a81ea54c79a08427206951f93baec4da55279c8142190339225a13b35adcf9d543ab303365a27704cd291dab32e74f2ea75d3482d7056ea8b111595e40138fb36476924eefd27033077e984a01935c305f2ea2c2e2008aa881228b5794a31f0cd4401f724927e719c49e1f79c3147dc285e58f31ae5521044ba8f2e24640333079239e79c42e8ef4556771ab92ba1f7b8c35fe74f06e1d5ccab1b21e29ddb463eac6aea63f6977d18dde7c149b3c931c298f72fc5e471afcddb081f0f04840645a8abf3fbb8dd2a50b9d4d450e5b0a0dcc9c5c63ef80a447ed6629ee7b47115a704c170a64a7698cfe2357b456c6a89592dd91ee70e72eca9a80d447ded751403975d9925a7461ec0a1ec5c2075a8fcd45fe7695da94567151ae8c143cdf18f0840a3ce933f3ad045b192bcc354233a07ad10ee41275ac1693576c4d4e10e94e",
        "verification": "552102486fd15702c4490a26703112a5cc1d0923fd697a33406bd5a1c00e0013b09a7021024c7b7fb6c310fccf1ba33b082519d82964ea93868d676662d4a59ad548df0e7d2102aaec38470f6aad0042c6e877cfd8087d2676b0f516fddd362801b9bd3936399e2103b209fd4f53a7170ea4444e0cb0a6bb6a53c2bd016926989cf85f9b0fba17a70c2103b8d9d5771d8f513aa0869b9cc8d50986403b78c6da36890638c3d46a5adce04a2102ca0e27697b9c248f6f16e085fd0061e26f44da85b58ee835c110caa5ec3ba5542102df48f60e8f3e01c48ff40b9b7f1310d7a8b2a193188befe1c2e3df740e89509357ae"
    }
}

And it yields the same traceback as before when trying to view the block data:

neo> block 0x9df8583dbbfe2f07fecb0a454aaa9cce526f58150db064a53d4e56a204fa3455
Could not execute command: list index out of range
  File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/_threads/_threadworker.py", line 46, in work
    task()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/_threads/_team.py", line 190, in doWork
    task()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/app/neo-python/venv/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "prompt.py", line 922, in run
    traceback.print_stack()
Traceback (most recent call last):
  File "prompt.py", line 885, in run
    self.show_block(arguments)
  File "prompt.py", line 575, in show_block
    bjson = json.dumps(block.ToJson(), indent=4)
  File "/app/neo-python/neo/Core/Block.py", line 257, in ToJson
    if self.Transactions[0] and isinstance(self.Transactions[0], str):
IndexError: list index out of range

Note: these nodes are currently running a slightly older version of neo-python, so we are going to update them to the latest development branch now and see if that makes a difference.

The text was updated successfully, but these errors were encountered:

localhuman · 2018-03-08T21:23:32Z

the errors are in the ToJson method which assumes that there would be at least 1 transaction in the block. If there isn't one TX is the block then something other than the ToJson method is wrong

brianlenz · 2018-03-08T21:28:25Z

I don't know if this will be helpful or not, but here's the Chains dir from the one neo-python node that somehow got past block 2000190:

https://www.dropbox.com/s/c3pk8mjq6radn3d/Chains.tgz?dl=0

brianlenz · 2018-03-08T22:30:14Z

It's interesting to note that neoscan is generating an error for the bad block hash:

https://neoscan.io/block/1b9a78b3c1358af990039a4ba7cbc5c581855aed06955c3cd9aa718acc3a8516

But it does show what appears to be the correct hash for 2000190 (confirmed via other JSON-RPC nodes, as well):

https://neoscan.io/block/8cb9fee28a48a45468e3c0a229fd4473288cdd9794c10cac7b8f4681ca404342

Interestingly, here is block 2000189:

https://neoscan.io/block/361f566c40f43e46678d378f2787f1cb148e399f712126b1dcea95278e424a51

And the "Next block" link takes you to the first (errant) block hash I linked, even though the second link appears to be correct.

So it seems to have had the same issue as neo-python.

Blocks 2000191 and 2000192 both had 500 transactions each, for whatever that's worth. Same thing with block 2000880; the next six+ blocks have had 500 transactions each.

AshRolls · 2018-03-09T10:10:21Z

There is the same issue with block 1860095 which some nodes (reports from neo-gui, neo-cli, and neo-python) get stuck on. The nodes that get stuck report 1860095 with a hash of b57c1d56dc1763c5a16a0d019f13073df472c36b6633582a22ec8c0ca38f8fd6

However if I go to neo-scan and select block 1860095 I get a different hash 35e84031dbe8981d6fc838748b5db0525c2624c7bd54a2082625266c160d125a

If at neo-scan I select the previous block 1860094 and hit next block it tries to take me to the same hash that stuck nodes report with a server error

https://neoscan.io/block/b57c1d56dc1763c5a16a0d019f13073df472c36b6633582a22ec8c0ca38f8fd6

localhuman · 2018-03-14T15:49:07Z

This is not a neo-python issue but more of a network issue.

Anyways, anyone stuck on this can re-bootstrap mainnet to fast forward past the troublesome blocks with: python bootstrap.py -m

brianlenz · 2018-03-14T18:18:20Z

I do think ultimately neo-python needs to have a way to recover from this situation more gracefully. The current behavior is that you get a stuck node and you basically have to start over syncing. Sure, you can bootstrap and fast-forward (once somebody provides a bootstrap file to do so with), but that's not always a realistic solution. For example, if your node is processing events (as ours is), you can't simply skip a large number of blocks or else you get incomplete data. In that case, you have to be able to recover (somehow) and continue processing from where you left off. Ideally, there would be some way to roll back the previous transaction or two so you can start re-syncing. Perhaps snapshots of the most recent two blocks would be a way to provide some kind of rollback/recovery mechanism so it's not a complete dead end, as currently?

In our case, I had to restore to an older bootstrap file and update our event handler to gracefully handle re-processing events that had already been processed. It's not ideal for every node consumer to be required to do this, so I think the tooling should provide means to work through these types of issues.

Perhaps this is an issue that needs to be addressed with every NEO node implementation, too?

localhuman · 2018-03-14T18:25:56Z

It seems a system issue rather than a neo-python issue.

There's no real way to 'rollback' a block unless each SC invocation has an undo method. The only other way I can think of is keeping an entire copy of the db for each block and then deleting the one before that, but that is not something I would want us to be doing in a production setting

brianlenz · 2018-03-14T18:33:46Z

Fair enough! It definitely could be addressed by preventing this situation from occurring in the first place. It shouldn't be possible for a bad block to be distributed like this.

What's a best next step for this issue? Should I record an issue in neo core? I think it's something that needs to be actively investigated (if it's not already) since it obviously had widespread impact. I can't find any issues that seem to be related...

brianlenz · 2018-03-14T21:34:14Z

And now the issue has happened again :(

http://monitor.cityofzion.io/

Almost every neo-python node and many of the neo-cli nodes are stuck, presumably due to the same incorrect block issue.

brianlenz · 2018-03-14T21:55:36Z

Created this issue: neo-project/neo#193

…ues #302 and #360

localhuman closed this as completed Mar 14, 2018

brianlenz mentioned this issue Mar 14, 2018

Invalid block relays neo-project/neo#193

Closed

brianlenz mentioned this issue Mar 27, 2018

error in json serialization of certain block #360

Closed

localhuman added a commit that referenced this issue Apr 30, 2018

add check for tx length of block before adding to chain. resolves iss…

e651b79

…ues #302 and #360

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Block processing issues #302

Block processing issues #302

brianlenz commented Mar 8, 2018

localhuman commented Mar 8, 2018

brianlenz commented Mar 8, 2018

brianlenz commented Mar 8, 2018

AshRolls commented Mar 9, 2018

localhuman commented Mar 14, 2018

brianlenz commented Mar 14, 2018

localhuman commented Mar 14, 2018

brianlenz commented Mar 14, 2018

brianlenz commented Mar 14, 2018

brianlenz commented Mar 14, 2018

Block processing issues #302

Block processing issues #302

Comments

brianlenz commented Mar 8, 2018

localhuman commented Mar 8, 2018

brianlenz commented Mar 8, 2018

brianlenz commented Mar 8, 2018

AshRolls commented Mar 9, 2018

localhuman commented Mar 14, 2018

brianlenz commented Mar 14, 2018

localhuman commented Mar 14, 2018

brianlenz commented Mar 14, 2018

brianlenz commented Mar 14, 2018

brianlenz commented Mar 14, 2018