fix: prune uses the latest block height as the target by default #46

welkin22 · 2024-01-16T11:00:54Z

Description

After running the pruning command geth snapshot prune-state --datadir {the data dir of your node} --triesInMemory=32 on the current op-geth, the node may get stuck when it is restarted: the block height stops increasing.

When pruning, geth by default selects the block at the bottom diff layer of the snapshot structure as the target block. Because we have configured triesInMemory=32, there are 32 diff layers in total, so the target block height is the latest block height minus 31. After pruning, state data is kept only for the target block; it is cleared for every other block height.
When geth restarts, the latest block has lost its state data, so the code automatically rolls the chain back until it finds a block that still has state data. The unsafe block height therefore rolls back by 31 blocks. Note that although the unsafe block height is rolled back, the headers, bodies, receipts and other data of the rolled-back blocks are not deleted and still exist in the database.
op-node starts at the same time. It obtains the new unsafe block height from op-geth and produces blocks on top of it to advance the chain head.
If the node is a sequencer, one of two situations follows. The block numbers below assume, for illustration, a latest block height of 1032 before pruning and therefore a pruning target of 1001.

1. The new unsafe blocks that op-node produces for heights 1002 through 1032 have hashes different from those stored in the op-geth database. The newPayload interface therefore processes every block without skipping any, and the state data for the blocks in this range is rebuilt. Block 1033 is then inserted normally and the block height keeps increasing. The biggest problem in this situation is that the transactions in the 31 blocks that users had already put on the chain are discarded: the block hashes at those heights change, and so do the transactions the blocks contain.
2. The new unsafe block that op-node obtains at height 1002 has the same hash as the one in the op-geth database, so the newPayload interface skips processing it. Because block 1002 already belongs to the canonical chain, the subsequent forkchoiceUpdated call does not trigger the SetCanonical method, and the state data is not rebuilt. The block at height 1003, however, has a different hash, so the newPayload interface does not skip it. Since block 1002 has no state data, processing block 1003 fails its checks, and the entire chain gets stuck at block 1003. This situation can also occur on non-sequencer nodes.

Rationale

To solve the problem above, I changed the default target block of prune-state to the latest block. The latest block then keeps its state data, so no rollback happens on restart and neither situation described above can occur. In opBNB we rarely encounter reorgs, so there is no need to roll back the chain.

I have an alternative solution in PR #47.
Either solution fixes the problem; we can discuss adopting one or both.

Example

After this PR takes effect, the recommended pruning process is as follows:

1. Stop the nodes that need to be pruned and confirm, through the logs, the latest block height at the time of shutdown. Record its number and hash.
2. Back up the datadir (not mandatory).
3. Wait until the latest block height at the time of shutdown has been finalized, and make sure its hash has not changed once it is finalized.
4. Execute the pruning command: geth snapshot prune-state --datadir {the data dir of your node} --triesInMemory=32.
5. If the pruning is successful, you will see the log: "State pruning successful".
6. Confirm the target block height for pruning through the log "Selecting user-specified state as the pruning target", and make sure that this target block height has been finalized.
7. Start the node and observe the logs.

Note: If you prune while the target block height has not yet been finalized and a chain reorganization then moves the head of the chain to a height less than or equal to your target block height, your node will not be able to recover after pruning. Therefore, make sure that the latest block height at the time of shutdown has been finalized before executing the pruning command. If you find that a chain reorganization has occurred in the meantime, restart the node and choose another appropriate time to prune.

Changes

Notable changes:

Update the default target block for pruning to the latest block of the chain.

core/state/pruner/pruner.go

bendanzhentan

ACK and I have followed the above instructions to test the code.

fix: prune uses the latest block height as the target by default

8a2bcd7

github-actions bot requested review from bendanzhentan and owen-reorg January 16, 2024 11:01

welkin22 mentioned this pull request Jan 16, 2024

fix: if already known beacon payload hasn't state after prune, fix it #47

Closed

welkin22 added 2 commits January 23, 2024 10:23

Merge branch 'develop' into feature/prune_issue_fix_2

ea59954

Merge branch 'develop' into feature/prune_issue_fix_2

5e48bba

bendanzhentan reviewed Jan 30, 2024

bendanzhentan previously approved these changes Jan 30, 2024

fix comments

cc47830

welkin22 dismissed bendanzhentan’s stale review via cc47830 January 30, 2024 07:21

welkin22 requested a review from bendanzhentan January 30, 2024 07:21

bendanzhentan approved these changes Jan 30, 2024

owen-reorg approved these changes Jan 30, 2024

owen-reorg merged commit 9114a97 into develop Jan 30, 2024
1 check passed

andyzhang2023 pushed a commit to andyzhang2023/op-geth that referenced this pull request Mar 4, 2024

fix: prune uses the latest block height as the target by default (bnb…

6cf7c94

…-chain#46)

sysvm deleted the feature/prune_issue_fix_2 branch July 29, 2024 09:07
