This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Digest item must match that calculated - Probably cache issue #9697

Closed
crystalin opened this issue Sep 5, 2021 · 10 comments · Fixed by #11407
Labels
I3-bug The node fails to follow expected behavior.

Comments

@crystalin
Contributor

Happening on Parachain, probably related to cumulus.
Observed in Moonbeam networks (I haven't checked other parachains), currently based on polkadot 9.8, but this has happened at least as far back as polkadot 9.0.

Summary:
This bug first appeared last year. Collators were panicking because of "Digest item must match that calculated".
The original bug was in the state cache, reported in #7964 and fixed in #7990.

However, we have found that "Digest item must match that calculated" still occurs, very infrequently, on our community collators.
We have run our company nodes with --state-cache-size 4 for at least 6 months and haven't observed this message, so I still suspect there are rare cases where the state cache is broken.

I won't have the time to monitor the database hoping for it to happen again (which seems to be not even once a month per node), so I don't expect this to be fixed soon.

I'll keep this ticket to add more information if I find some.

@h4x3rotab
Contributor

h4x3rotab commented Sep 7, 2021

The same happens on Phala's testnet. All the full nodes stop at the same block, and a simple restart works around the problem. However, the collators are not affected. It usually happens once every week or two.

@h4x3rotab
Contributor

Maybe relevant: paritytech/cumulus#573

@crystalin
Contributor Author

It happens frequently on Moonriver. Mostly when syncing, but also when importing blocks during normal operations (fully synced).
I'll verify again whether --state-cache-size 4 has any impact.

@crystalin
Contributor Author

Other collators have reported that using --state-cache-size=4 does not prevent this bug from happening.
It mostly happens when syncing the node. I'm going to see if I can easily reproduce it.

@btwiuse

btwiuse commented Sep 15, 2021

Maybe caused by the same problem: Error: Storage root must match that calculated airalab/robonomics#184

@crystalin
Contributor Author

crystalin commented Sep 19, 2021

I grabbed 200+ GB of state data when triggering this error, but I deleted it by mistake while trying to manipulate it :(
However I can confirm:

  1. It happens on a polkadot v0.9.9 based parachain
  2. It happens when --state-cache-size 1 is not set (I can't guarantee that it never happens when the flag is set, but it doesn't seem to)

@grenade

grenade commented Nov 30, 2021

we see this on calamari parachain on kusama. collators run manta version 3.1.0-1 which uses polkadot-v0.9.12 dependencies.

i would like to understand what --state-cache-size 1 does before applying it to running collators. if anyone can shed some light on what causes this error and what the state cache size parameter does to work around it, it would help us to make a decision about how to handle this error moving forward. currently we have to restart any collator that exhibits this error in order to get it working again.

2021-11-29 20:13:33 [Parachain] panicked at 'Transaction will be valid in the future', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:393:17
2021-11-29 20:13:33 [Parachain] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x25fabd - <unknown>!rust_begin_unwind\n    1: 0x1fee - <unknown>!core::panicking::panic_fmt::hf69c8b08bc9d2ee5\n    2: 0xae533 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hd256b0687bba04ef\n    3: 0xad252 - <unknown>!Core_execute_block\n"))))    
2021-11-29 20:13:33 [Parachain] 💔 Error importing block 0xf53d338405cb4d3d0b93ff0f53485c5df74b6164c0f37ec0153e90fa0c299809: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x25fabd - <unknown>!rust_begin_unwind\\n    1: 0x1fee - <unknown>!core::panicking::panic_fmt::hf69c8b08bc9d2ee5\\n    2: 0xae533 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hd256b0687bba04ef\\n    3: 0xad252 - <unknown>!Core_execute_block\\n\")")))    
2021-11-29 20:13:35 [Relaychain] 🔍 Discovered new external address for our node: /ip4/100.105.51.64/tcp/30334/ws/p2p/12D3KooWFDKx5Y1zpS3iEjgboqBPR9e9XZncG4tdm6vb7fYvd7gQ    
2021-11-29 20:13:35 [Parachain] panicked at 'Transaction will be valid in the future', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:393:17    
2021-11-29 20:13:35 [Parachain] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x25fabd - <unknown>!rust_begin_unwind\n    1: 0x1fee - <unknown>!core::panicking::panic_fmt::hf69c8b08bc9d2ee5\n    2: 0xae533 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hd256b0687bba04ef\n    3: 0xad252 - <unknown>!Core_execute_block\n"))))    
2021-11-29 20:13:35 [Parachain] 💔 Error importing block 0xf53d338405cb4d3d0b93ff0f53485c5df74b6164c0f37ec0153e90fa0c299809: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x25fabd - <unknown>!rust_begin_unwind\\n    1: 0x1fee - <unknown>!core::panicking::panic_fmt::hf69c8b08bc9d2ee5\\n    2: 0xae533 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hd256b0687bba04ef\\n    3: 0xad252 - <unknown>!Core_execute_block\\n\")")))

@bkchr
Member

bkchr commented Nov 30, 2021

As the name suggests, it sets the size of the state cache. There are bugs in it. Just run with --state-cache-size 0 for now. Nodes will just be a little bit slower on import.
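
To make the trade-off concrete, here is a toy read-through cache in Rust. It is only an illustration under stated assumptions: `CachedState`, its fields, and `write_backend_only` are hypothetical names, capacity is counted in entries, and none of this is Substrate's state cache or a diagnosis of the actual bug, which this thread does not pin down. It shows why a cache of size 0 cannot serve a stale value, at the cost of one backend read per lookup.

```rust
use std::collections::HashMap;

/// Toy read-through cache in front of a slower backing store.
/// Hypothetical model, not Substrate's implementation.
struct CachedState {
    backend: HashMap<String, u64>, // stands in for the database
    cache: HashMap<String, u64>,   // stands in for the state cache
    capacity: usize,               // max cached entries (toy assumption)
    backend_reads: usize,          // how often the slow path was taken
}

impl CachedState {
    fn new(backend: HashMap<String, u64>, capacity: usize) -> Self {
        CachedState { backend, cache: HashMap::new(), capacity, backend_reads: 0 }
    }

    /// Serve from the cache when possible, otherwise read the backend
    /// and remember the value if there is room.
    fn get(&mut self, key: &str) -> Option<u64> {
        if let Some(v) = self.cache.get(key) {
            return Some(*v);
        }
        self.backend_reads += 1;
        let v = self.backend.get(key).copied()?;
        if self.cache.len() < self.capacity {
            self.cache.insert(key.to_string(), v);
        }
        Some(v)
    }

    /// Deliberately buggy write: updates the backend but forgets the cache,
    /// which is one way a cache can end up serving stale data.
    fn write_backend_only(&mut self, key: &str, value: u64) {
        self.backend.insert(key.to_string(), value);
    }
}
```

With a non-zero capacity, a read after `write_backend_only` returns the old value; with capacity 0 every read goes to the backend, so the value is always current but each lookup is slower.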

@ggwpez
Member

ggwpez commented Mar 4, 2022

I found a way to consistently reproduce this:

  1. Start a partial client setup with new_chain_ops
  2. Load the last block with client.block(BlockId::Number(client.info().best_number))
  3. Re-execute that block with client.runtime_api().execute_block

Unfortunately this coincides with what I'm trying to do and state-cache-size does not affect this.

@bkchr
Member

bkchr commented Mar 4, 2022

@ggwpez your problem is not related to this issue here. Your problem is that you pass the block to execute_block with the seal digest of the consensus engine. You first need to remove the seal digest before executing the block. This is normally done by the consensus engine (Babe for example) before we execute a block.
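
A minimal sketch of that point, using a simplified stand-in model rather than the real Substrate types: `DigestItem`, `Header`, `execute_block`, and `strip_seal` below are all hypothetical toy definitions, and the toy runtime is assumed to recompute only the pre-runtime items. The idea is that the runtime never produces the seal itself, so a header that still carries it cannot match the recomputed digest.

```rust
/// Toy digest items: one kind the runtime knows about, and the seal
/// that the consensus engine appends after the block is built.
#[derive(Clone, Debug, PartialEq)]
enum DigestItem {
    PreRuntime(Vec<u8>),
    Seal(Vec<u8>),
}

/// Toy header that carries only a digest.
#[derive(Clone, Debug, PartialEq)]
struct Header {
    digest: Vec<DigestItem>,
}

/// Toy stand-in for the runtime's final check: recompute the digest the
/// runtime would produce (here: only pre-runtime items) and compare it
/// with the digest found in the header.
fn execute_block(header: &Header) -> Result<(), String> {
    let calculated: Vec<DigestItem> = header
        .digest
        .iter()
        .filter(|item| matches!(item, DigestItem::PreRuntime(_)))
        .cloned()
        .collect();
    if calculated != header.digest {
        return Err("Digest item must match that calculated".to_string());
    }
    Ok(())
}

/// Remove the trailing seal, if there is one, and hand it back separately.
fn strip_seal(mut header: Header) -> (Header, Option<DigestItem>) {
    let seal = if matches!(header.digest.last(), Some(DigestItem::Seal(_))) {
        header.digest.pop()
    } else {
        None
    };
    (header, seal)
}
```

In this model, calling `execute_block` on a sealed header fails, while calling it on the header returned by `strip_seal` succeeds.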
