fix: prune uses the latest block height as the target by default #46

Merged
merged 4 commits into from
Jan 30, 2024


@welkin22 welkin22 commented Jan 16, 2024

Description

After running the pruning command geth snapshot prune-state --datadir {the data dir of your node} --triesInMemory=32 on the current op-geth, the node's block height may get stuck and stop increasing once the node is restarted.
[Figure: prune1 drawio]
By default, geth prunes to the block height of the bottom diffLayer in the snapshot structure. Since we configured triesInMemory=32, there are 32 diffLayers in total, so the target block height is the latest block height minus 31. After pruning, state data is kept only for the target block height and is cleared for every other block height.
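The arithmetic above can be sketched as follows. This is a minimal illustration of the description in this PR, not op-geth code; the function name default_prune_target is hypothetical, and it assumes one diffLayer per block.

```python
def default_prune_target(latest_height: int, tries_in_memory: int) -> int:
    """Height of the bottom diffLayer, assuming one diffLayer per block.

    Hypothetical helper that mirrors the arithmetic in the description:
    with N layers in memory, the bottom layer sits N - 1 blocks below the head.
    """
    if tries_in_memory < 1:
        raise ValueError("tries_in_memory must be at least 1")
    # Clamp at 0 so short chains do not produce a negative height.
    return max(latest_height - (tries_in_memory - 1), 0)


# With the latest block at 1032 and triesInMemory=32 the old default target is 1001.
print(default_prune_target(1032, 32))
```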
When geth restarts, the latest block has lost its state data, so the code automatically rolls the chain back until it finds a block height that still has state data. The unsafe block height therefore rolls back by 31 blocks. Note that although the unsafe block height is rolled back, the headers, bodies, receipts, and other data are not deleted and remain in the database.
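A toy model of this rollback, assuming state availability can be represented as a set of heights. The name head_after_restart is hypothetical and the loop only mimics the behaviour described here; it is not the actual op-geth startup code.

```python
def head_after_restart(latest_height: int, heights_with_state: set) -> int:
    """Walk back from the latest block until a height with state data is found.

    Toy model only: headers, bodies and receipts are assumed to stay in the
    database, so nothing is deleted here; only the head pointer moves.
    """
    height = latest_height
    while height > 0 and height not in heights_with_state:
        height -= 1
    return height


# After pruning with the old default, only the target (1001) keeps its state.
new_head = head_after_restart(1032, {1001})
print(new_head, 1032 - new_head)  # head height and number of blocks rolled back
```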
op-node then starts as well. It obtains the new unsafe block height from op-geth and produces blocks on top of it to advance the chain head.
If the node is a sequencer, one of two situations follows (the block numbers below assume the latest block before pruning was 1032, so the pruning target was 1001):

  1. The hashes of the new unsafe blocks 1002 through 1032 that op-node obtains differ from those in the op-geth database. The newPayload interface therefore processes every block without skipping any, and the state data for the blocks in this range is rebuilt. Block 1033 can then be inserted normally and the block height keeps increasing. The biggest problem in this situation is that the transactions in the 31 blocks that users previously put on the chain are discarded: the block hash at each of these heights changes, and so do the transactions each block contains.
  2. The hash of the new unsafe block 1002 that op-node obtains is the same as the one in the op-geth database, so the newPayload interface skips processing this block. Because block 1002 already belongs to the canonical chain, the subsequent forkchoiceUpdated call does not trigger the SetCanonical method, and the state data is not rebuilt. The hash of block 1003, however, is different, so the newPayload interface does process it. Since block 1002 has no state data, the processing of block 1003 fails its checks, and the entire chain gets stuck at block 1003. This situation can also occur on non-sequencer nodes.
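The two situations can be reproduced with a small toy model. Everything here is an assumption made for illustration: ToyNode, new_payload, replay and the "old-"/"new-" hash strings are hypothetical and only imitate the skip-if-known and parent-state checks described above, not the real engine API.

```python
from dataclasses import dataclass


@dataclass
class ToyNode:
    stored_hashes: dict   # height -> block hash still present in the database
    state_heights: set    # heights whose state data survived pruning

    def new_payload(self, height: int, block_hash: str) -> str:
        # A block whose hash is already stored is skipped, so its state is NOT rebuilt.
        if self.stored_hashes.get(height) == block_hash:
            return "skipped"
        # Executing a block needs the parent's state.
        if height - 1 not in self.state_heights:
            return "stuck"
        self.stored_hashes[height] = block_hash
        self.state_heights.add(height)
        return "executed"


def replay(node: ToyNode, payloads: list) -> list:
    """Feed payloads in order and stop at the first block that cannot be processed."""
    results = []
    for height, block_hash in payloads:
        result = node.new_payload(height, block_hash)
        results.append(result)
        if result == "stuck":
            break
    return results


def pruned_node() -> ToyNode:
    # Blocks 1001..1032 remain in the database, but only 1001 still has state.
    return ToyNode({h: f"old-{h}" for h in range(1001, 1033)}, {1001})


# Situation 1: every new block hash differs, so 1002..1033 are all re-executed.
situation_1 = replay(pruned_node(), [(h, f"new-{h}") for h in range(1002, 1034)])
# Situation 2: block 1002 matches and is skipped; block 1003 then lacks parent state.
situation_2 = replay(pruned_node(), [(1002, "old-1002"), (1003, "new-1003")])
print(set(situation_1), situation_2)
```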

Rationale

To solve the problem above, I changed the default target block of prune-state to the latest block, so no rollback occurs on restart and the problems described above do not arise. In opBNB we rarely encounter reorgs, so there is no need to roll back the chain.

I have another solution in PR #47.
Either solution fixes the problem; we can discuss choosing one or both.

Example

After this PR takes effect, the recommended pruning process is as follows:

  1. Stop the node that needs to be pruned and confirm from the logs the latest block height at the time of shutdown. Record its number and hash.
  2. Back up the datadir (optional).
  3. Wait until the latest block at the time of shutdown has been finalized, and make sure that its hash has not changed after finalization.
  4. Execute the pruning command: geth snapshot prune-state --datadir {the data dir of your node} --triesInMemory=32.
  5. If the pruning is successful, you will see the log: "State pruning successful".
  6. Confirm the target block height for pruning from the log line "Selecting user-specified state as the pruning target", and make sure that this target block height has been finalized.
  7. Start the node and observe the logs.
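Steps 5 and 6 can be partly automated by scanning the pruning output for the two messages quoted above. This is only a sketch: check_prune_log is a hypothetical helper, and the sample log lines are made up for the example (the real lines carry additional fields that are not reproduced here).

```python
SUCCESS_MARKER = "State pruning successful"
TARGET_MARKER = "Selecting user-specified state as the pruning target"


def check_prune_log(log_text: str) -> dict:
    """Report whether pruning succeeded and return the line naming the target."""
    lines = log_text.splitlines()
    return {
        "pruning_succeeded": any(SUCCESS_MARKER in line for line in lines),
        # The operator still has to read the target height from this line
        # and confirm that it has been finalized.
        "target_line": next((line for line in lines if TARGET_MARKER in line), None),
    }


# Made-up sample containing only the two messages quoted in the steps above.
sample_log = (
    "INFO Selecting user-specified state as the pruning target\n"
    "INFO State pruning successful\n"
)
report = check_prune_log(sample_log)
print(report["pruning_succeeded"], report["target_line"])
```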

Note: If you prune while the target block height has not yet been finalized, and a chain reorganization then moves the head of the chain to a height less than or equal to your target block height, your node will not be able to recover after pruning. Therefore, make sure that the latest block at the time of shutdown has been finalized before executing the pruning command. If you find that a chain reorganization occurred during this period, restart the node and choose another appropriate time to prune.
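The precondition in this note can be written as a small check. This is a sketch under assumptions: BlockRef and safe_to_prune are hypothetical names, and the caller is assumed to supply the finalized block and the current canonical hash at the recorded height from some trusted source; how those values are fetched is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRef:
    number: int
    hash: str


def safe_to_prune(recorded_head: BlockRef,
                  finalized: BlockRef,
                  canonical_hash_at_recorded: str) -> bool:
    """True only if the head recorded at shutdown is finalized and unchanged.

    recorded_head: number and hash noted from the logs at shutdown (step 1).
    finalized: the chain's current finalized block (assumed input).
    canonical_hash_at_recorded: canonical hash now found at recorded_head.number.
    """
    is_finalized = recorded_head.number <= finalized.number
    hash_unchanged = canonical_hash_at_recorded == recorded_head.hash
    return is_finalized and hash_unchanged


head = BlockRef(1032, "0xaa")
print(safe_to_prune(head, BlockRef(1040, "0xbb"), "0xaa"))  # finalized, same hash
print(safe_to_prune(head, BlockRef(1020, "0xcc"), "0xaa"))  # not finalized yet
print(safe_to_prune(head, BlockRef(1040, "0xbb"), "0xdd"))  # hash changed by a reorg
```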

Changes

Notable changes:

  • Update the default target block for pruning to the latest block of the chain.

bendanzhentan
bendanzhentan previously approved these changes Jan 30, 2024

@bendanzhentan bendanzhentan left a comment

ACK. I followed the above instructions to test the code.

@owen-reorg owen-reorg merged commit 9114a97 into develop Jan 30, 2024
1 check passed
andyzhang2023 pushed a commit to andyzhang2023/op-geth that referenced this pull request Mar 4, 2024
@sysvm sysvm deleted the feature/prune_issue_fix_2 branch July 29, 2024 09:07