[Util/Admin] Creating protocol snapshot from checkpoint file #5604

Merged (8 commits) on Apr 4, 2024


zhangchiqing (Member) commented Mar 28, 2024

Closes #5580

This PR adds an admin tool and a util command that generate a protocol snapshot from the latest checkpoint file.
The generated protocol snapshot file can be used together with the checkpoint file to dynamically bootstrap an execution node.

codecov-commenter commented Mar 28, 2024

Codecov Report

Attention: Patch coverage is 6.66667%, with 70 lines in your changes missing coverage. Please review.

Project coverage is 55.61%. Comparing base (5922cda) to head (846967e).
Report is 101 commits behind head on master.

| File | Patch % | Lines |
|---|---|---|
| admin/commands/storage/read_protocol_snapshot.go | 0.00% | 61 missing ⚠️ |
| cmd/execution_builder.go | 0.00% | 9 missing ⚠️ |

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #5604      +/-   ##
==========================================
- Coverage   55.65%   55.61%   -0.04%     
==========================================
  Files        1037     1042       +5     
  Lines      101377   101957     +580     
==========================================
+ Hits        56424    56708     +284     
- Misses      40613    40897     +284     
- Partials     4340     4352      +12     
```
| Flag | Coverage Δ |
|---|---|
| unittests | 55.61% <6.66%> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown.


zhangchiqing force-pushed the leo/5580-checkpoint-protocol-snapshot branch 7 times, most recently from 07164e1 to 6a0c386 (March 28, 2024 18:48)
zhangchiqing force-pushed the leo/5580-checkpoint-protocol-snapshot branch from 6a0c386 to 1e77fdc (March 28, 2024 20:34)
zhangchiqing marked this pull request as ready for review March 28, 2024 20:34
Comment on lines 60 to 62
```go
	if !ok {
		return nil, fmt.Errorf("could not parse blocks-to-skip: %v", data)
	}
```
Contributor

I suppose the validator should do this validation.
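A minimal sketch of what moving this parsing into the validator could look like. The `CommandRequest` type and `validateBlocksToSkip` function are hypothetical simplifications, not flow-go's actual admin API; the sketch assumes the command input arrives as a JSON-decoded map, so numbers show up as `float64`.

```go
package main

import (
	"fmt"
	"math"
)

// CommandRequest is a hypothetical, simplified stand-in for an admin command
// request: Data holds the JSON-decoded input and ValidatorData carries values
// parsed during validation over to the handler.
type CommandRequest struct {
	Data          interface{}
	ValidatorData interface{}
}

// validateBlocksToSkip parses the optional "blocks-to-skip" field up front, so
// the handler receives an already-validated uint and has no parsing errors left.
func validateBlocksToSkip(req *CommandRequest) error {
	req.ValidatorData = uint(0) // default when the field is absent
	if req.Data == nil {
		return nil
	}
	input, ok := req.Data.(map[string]interface{})
	if !ok {
		return fmt.Errorf("expected a JSON object, got %T", req.Data)
	}
	raw, present := input["blocks-to-skip"]
	if !present {
		return nil
	}
	// encoding/json decodes JSON numbers into float64 when the target is interface{}.
	f, ok := raw.(float64)
	if !ok || f < 0 || f != math.Trunc(f) {
		return fmt.Errorf("could not parse blocks-to-skip: %v", raw)
	}
	req.ValidatorData = uint(f)
	return nil
}
```

With this split, the handler can type-assert `ValidatorData` to `uint` without a second error path.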

```go
		return nil, 0, flow.DummyStateCommitment, fmt.Errorf("could not find finalized height for sealed height %v: %w", sealedHeight, err)
	}

	return state.AtHeight(finalizedHeight), sealedHeight, commit, nil
```
Contributor

This doesn't check that the full sealing segment is included. Is that required for ENs?

```go
validSnapshot, err := b.getValidSnapshot(snapshot, 0, false)
```

Member Author

It looks like getValidSnapshot performs some necessary validation on the generated protocol snapshot, such as ensuring that the finalized block and the sealed block belong to the same epoch.

However, these checks live in the access backend (engine/access/rpc/backend_network.go). I think it makes more sense to place them in a package inside the protocol state, so that other packages, such as util and admin, can import them.

Any idea why we didn't do that in the first place? @jordanschalm

Member

Most likely the reason is that the protocol state needs to be able to return a snapshot for every incorporated block, which includes snapshots that getValidSnapshot considers invalid for dynamic bootstrapping.

I'm happy to move the checks into the protocol package and agree that makes sense. It just needs to be differentiated from the existing snapshot API, something like func GetDynamicBootstrapSnapshot(state, referenceSnapshot).

```go
// Expected error returns during normal operations:
// * ErrSnapshotPhaseMismatch - snapshot does not contain a valid sealing segment
// All other errors should be treated as exceptions.
func (b *backendNetwork) getValidSnapshot(snapshot protocol.Snapshot, blocksVisited int, findNextValidSnapshot bool) (protocol.Snapshot, error) {
```
Member Author

Moved to snapshots.getValidSnapshot.

jordanschalm (Member) left a comment

Nice work. I mainly added documentation suggestions.

My main feedback concerns how we find the snapshot corresponding to a particular state commitment; details are in this comment: #5604 (comment).

cmd/util/common/checkpoint.go: 2 resolved threads
Comment on lines 230 to 231
```go
// the finalized height that seals the given sealed height must be above the sealed height
// so if we iterate through each height, we should eventually find the finalized height
```
Member

> if we iterate through each height, we should eventually find the finalized height

This is not true in the general case. For example, suppose we have the following structure:

`A <- B <- C <- D[Seal_A,Seal_B]`

and suppose we are trying to find the block sealing A.

Block D seals A; however, the latest sealed block as of D is not A, it's B.

In this kind of scenario, we will never successfully exit this loop. In particular, because of the way sealing works, no snapshot exists for which Seal_A.StateCommitment is the sealed commitment.
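The failure mode can be reproduced with a small, hypothetical model: blocks record only the heights they seal, and the search looks for a block whose latest sealed height equals the target. This is a simplified stand-in for the loop under review, not its actual code; it is bounded by the chain length so that it returns an error instead of looping forever.

```go
package main

import "fmt"

// Block is a hypothetical, minimal finalized block: its height and the heights
// of the blocks whose seals are included in its payload.
type Block struct {
	Height uint64
	Seals  []uint64
}

// latestSealedAsOf returns the highest sealed height in the fork ending at
// chain[idx], i.e. the "latest sealed block as of" that block.
func latestSealedAsOf(chain []Block, idx int) (uint64, bool) {
	var latest uint64
	found := false
	for i := 0; i <= idx; i++ {
		for _, s := range chain[i].Seals {
			if !found || s > latest {
				latest, found = s, true
			}
		}
	}
	return latest, found
}

// findFinalizedHeightBySealedHeight scans upward for a block whose latest
// sealed height equals sealedHeight; only the chain length stops the scan.
func findFinalizedHeightBySealedHeight(chain []Block, sealedHeight uint64) (uint64, error) {
	for i := range chain {
		if chain[i].Height <= sealedHeight {
			continue
		}
		latest, ok := latestSealedAsOf(chain, i)
		if ok && latest == sealedHeight {
			return chain[i].Height, nil
		}
	}
	return 0, fmt.Errorf("no block has height %d as its latest sealed height", sealedHeight)
}
```

On `A <- B <- C <- D[Seal_A,Seal_B]` (heights 0 to 3), the search for B returns D, while the search for A finds nothing, because D jumps straight past A to B.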

We could solve this algorithmically by changing the scanning logic. Rather than scanning blocks and exiting when we find a block B such that Seal_B.StateCommitment is in our search set, we could exit when we find a block B such that seals.HighestInFork(B.ID()) (the latest seal as of block B) is in our search set. Then we could skip the step of translating a sealed height into a finalized height (findFinalizedHeightBySealedHeight).
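A sketch of the suggested alternative, again on a hypothetical in-memory chain: `highestInFork` stands in for seals.HighestInFork, and the scan stops at the first block whose latest seal commits to a state in the search set. The newest-to-oldest scan order and all type and function names are assumptions for illustration, not the code that was merged.

```go
package main

import "fmt"

// Seal is a hypothetical, minimal seal: the sealed block's height and the
// state commitment after executing that block.
type Seal struct {
	BlockHeight uint64
	FinalState  string
}

// Block is a hypothetical finalized block with the seals in its payload.
type Block struct {
	Height uint64
	Seals  []Seal
}

// highestInFork returns the seal for the highest sealed block in the fork
// ending at chain[idx]; a stand-in for seals.HighestInFork(blockID).
func highestInFork(chain []Block, idx int) (Seal, bool) {
	var best Seal
	found := false
	for i := 0; i <= idx; i++ {
		for _, s := range chain[i].Seals {
			if !found || s.BlockHeight > best.BlockHeight {
				best, found = s, true
			}
		}
	}
	return best, found
}

// findSnapshotHeight scans finalized blocks from newest to oldest and returns
// the height of the first block whose latest seal commits to a state in the
// search set, together with that commitment.
func findSnapshotHeight(chain []Block, searchSet map[string]bool) (uint64, string, error) {
	for i := len(chain) - 1; i >= 0; i-- {
		seal, ok := highestInFork(chain, i)
		if ok && searchSet[seal.FinalState] {
			return chain[i].Height, seal.FinalState, nil
		}
	}
	return 0, "", fmt.Errorf("no block's latest seal matches the search set")
}
```

Because the match is made on the latest seal as of each block, the returned height can serve directly as the snapshot reference, with no separate sealed-to-finalized height translation.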

Member Author

Good catch and good idea!

Member Author

With this approach, I don't think we need to implement this function here, but it could be a useful utility for finding which block finalizes the seal for a given block. We don't have an index for this type of lookup, and no such search exists yet.

For example: I submitted transaction A, which was included in block 10, and I would like to know which block seals my transaction (i.e., which block seals block 10).

Are there any other scenarios that would need this function?
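For the transaction example above, a lookup without a dedicated index reduces to a linear scan of finalized blocks above the target, checking each payload for a seal of the target block. A minimal sketch, assuming a hypothetical height-indexed map of blocks that records the IDs of the blocks they seal; `findSealingBlock` is an illustrative name, not an existing flow-go function.

```go
package main

import "fmt"

// Block is a hypothetical finalized block: an ID, a height, and the IDs of
// the blocks whose seals are included in its payload.
type Block struct {
	ID        string
	Height    uint64
	SealedIDs []string
}

// findSealingBlock scans finalized blocks above the target's height and
// returns the first block whose payload includes a seal for targetID.
func findSealingBlock(byHeight map[uint64]Block, targetID string, targetHeight, latestHeight uint64) (Block, error) {
	for h := targetHeight + 1; h <= latestHeight; h++ {
		b, ok := byHeight[h]
		if !ok {
			return Block{}, fmt.Errorf("missing finalized block at height %d", h)
		}
		for _, id := range b.SealedIDs {
			if id == targetID {
				return b, nil
			}
		}
	}
	return Block{}, fmt.Errorf("block %s is not sealed as of height %d", targetID, latestHeight)
}
```

The cost grows with the distance between the target and its seal, which is the argument for a dedicated index if this lookup becomes common.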

state/protocol/snapshots/dynamic_bootstrap.go: 4 resolved threads (2 outdated)
zhangchiqing and others added 2 commits April 3, 2024 15:19
Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>
```
@@ -20,6 +20,7 @@ import (
// ErrEOFNotReached for indicating end of file not reached error
var ErrEOFNotReached = errors.New("expect to reach EOF, but actually didn't")

// TODO: validate the header file and the sub file that contains the root hashes
```
zhangchiqing (Member Author), Apr 3, 2024

Will implement this in a separate PR: #5623

zhangchiqing enabled auto-merge April 4, 2024 16:01
zhangchiqing added this pull request to the merge queue Apr 4, 2024
Merged via the queue into master with commit ca8a169 Apr 4, 2024
55 checks passed
zhangchiqing deleted the leo/5580-checkpoint-protocol-snapshot branch April 4, 2024 16:36
Development

Successfully merging this pull request may close these issues.

[Util] Generate protocol snapshot file from latest checkpoint file
5 participants