Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client: "Error: invalid transaction trie" leading to corrupted state DB #1066

Closed
holgerd77 opened this issue Jan 21, 2021 · 7 comments
Closed

Comments

@holgerd77
Copy link
Member

Here is another one, this might be the last (occurring during the first 200.000 or so mainnet blocks, not related to some specific blocks though):

INFO [01-21|18:26:09] Imported blocks count=700 number=62001 hash=3c1e0ffd... peers=8
INFO [01-21|18:26:09] Executed blocks count=50 first=57643 hash=3c9230be... last=57692 hash=308a3a08... with txs=8
ERROR [01-21|18:26:11] Error: invalid transaction trie
    at Block.validateData (/EthereumJS/ethereumjs-vm/packages/block/dist/block.js:179:19)
    at async Block.validate (/EthereumJS/ethereumjs-vm/packages/block/dist/block.js:162:9)
    at async VM.applyBlock (/EthereumJS/ethereumjs-vm/packages/vm/dist/runBlock.js:132:13)
    at async VM.runBlock (/EthereumJS/ethereumjs-vm/packages/vm/dist/runBlock.js:70:18)
    at async /EthereumJS/ethereumjs-vm/packages/vm/dist/runBlockchain.js:27:13
    at async /EthereumJS/ethereumjs-vm/packages/blockchain/dist/index.js:597:21
    at async Blockchain.runWithLock (/EthereumJS/ethereumjs-vm/packages/blockchain/dist/index.js:188:27)
    at async Blockchain.initAndLock (/EthereumJS/ethereumjs-vm/packages/blockchain/dist/index.js:176:16)
    at async Blockchain._iterator (/EthereumJS/ethereumjs-vm/packages/blockchain/dist/index.js:576:9)
    at async VM.runBlockchain (/EthereumJS/ethereumjs-vm/packages/vm/dist/runBlockchain.js:10:5)
INFO [01-21|18:26:11] Imported blocks count=100 number=62701 hash=9f1d35d9... peers=9
INFO [01-21|18:26:11] Imported blocks count=50 number=62801 hash=f144c086... peers=9

This one occurs during some - I would say - random PiT during block execution. Execution then stops and also doesn't resume on restarting the client, after merging of #1065 we will have some better output here. This is not related to a specific block, I re-ran the execution with a state DB backup and on the second run it passed the blocks where it broke the first time. So this is likely rather some race condition.

@holgerd77
Copy link
Member Author

holgerd77 commented Jan 21, 2021

My analysis so far: so this happens - as stated in the error - due to the transaction trie evaluated as being invalid. I had to look into the code, the transaction trie is simply the transactions from the block put into a trie by some keys and then compare the root hash with the hash stored in the block. So this is a static thing and a check should - normally - not be a problem, also a hint that this is rather a race condition, this will likely/eventually be somewhat hard to debug.

What then happens is the following: in VM.runBlockchain() the catch clause gets then activated and the block is deleted by the blockchain class. This block is from that moment on missing from the DB, it seems that this code part in VM.runBlockchain() - eventually in conjunction with the client Chain code - is not working yet as intended, since what I would expect is that the call of the delBlock() function would set a new head and the sync should restart (at least on a client restart) from that new head?

Anyhow: these can actually be treated as two different problems, fixing the delBlock() case for cases where the block actually should be deleted - we need to be prepared for this too.

I our case here though it is rather the case - since this is only a race condition - that we might want to just re-run the block (until we find the root cause of the race condition, so as a first-round fix) without deleting.

There are two options here since at the moment we have no control over the delete code in VM.runBlockchain():

a) We add a new option deleteBlockOnError to VM.runBlockchain() which defaults to true (for backwards compatibility). In the client we use false here and then handle the delete ourselves. If there is an Invalid trie error though we do not delete but just re-run the block execution for the block, together with a logger warning

b) We take over the whole runBlockchain() code into the client since we might generally want to have better control over that code part (this was in discussion already)

I am actually not sure here. 🤔 Maybe for now we stick with a) (I am not sure in general if it is a good idea to "abandon" the runBlockchain code from the VM which would naturally happen) and - if this doesn't suffice - over time we can still switch over to b).

What do you guys think?

@jochem-brouwer
Copy link
Member

Very weird that it seems we have a race condition on the transactionsTrie. Since the runBlock is essentially synchronous, I don't see how this would happen.

You are right that if we delete a block, it should reset the head to a block in the past which worked as intended (up to the genesis block). However, I am pretty sure that when I refactored the blockchain package, I explicitly checked that this deletion stuff was correct - I think there are also tests for it.

If you manually reset the head to the previous block, does the "bad block" then pass? (The problem here would be that the actual block which was executed is deleted from the DB).

I think the best option would be (b), but I don't think we should remove runBlockchain from VM. Just for EVM purposes it makes sense to call this method. I am ok with moving the logic to the client, it makes a lot of sense to me. If we error a block, we should raise an error message with (1) the error and (2) the block hash. This ensures that we can retrieve which specific block was problematic, and we can re-run it to see if we have a bug in EVM or that it was actually a bad block (which is very unlikely).

@holgerd77
Copy link
Member Author

At least I've now a local copy of an "invalid transaction trie" db. 😅

@jochem-brouwer
Copy link
Member

Can you dump the block body and the block header? That should be sufficient data to figure out what's going wrong (and possibly find the source of this error).

@holgerd77
Copy link
Member Author

holgerd77 commented Jan 29, 2021

Yeah, got it. 😄

This was actually not that easy since the block got deleted due to the error thrown. But I now added some additional debugging output messages and after 200.000 blocks or so the error got retriggered, this is the output:

image

This is actually returning false on the zero txs check part of Block._validateTransactionsTrie(). I very much guess that this is a case where another client is not providing us with the transactions - the affected block 208759 actually does have one transaction included (I would assume the fact that it's only a single one here is irrelevant) and therefore the comparison with the transactionsTrie field from the block header fails.

Without further digging in it just occurred to me that this error should actually not be triggered along the VM run but already on block addition since this is a relatively common case that we receive malformed data for some reason, and - tada - on a closer look it just is the case currently that we just don't have activated block validation on blockchain yet in the client, lol. 😋

I will activate along a PR with other fixes I am working on right now. Not sure yet how well the BlockFetcher is prepared to handle such cases, but this should already bring us in a better situation by not having the state DB compromised, the block fetcher is at least throwing in such cases and the malformed block should not be added to the DB, and - even if a client run is stopped - not sure - it should at least be able to pick up by restarting the client. Will have a closer look at such a scenario and eventually also add some code improving how these cases are handled.

@holgerd77
Copy link
Member Author

Side note: fixed various typos on the longer post above, please read on site.

@holgerd77
Copy link
Member Author

Fixed by #1075

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants