Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Implement checkpoint sync #2244

Closed
wants to merge 49 commits into from

Conversation

michaelsproul
Copy link
Member

@michaelsproul michaelsproul commented Mar 5, 2021

Issue Addressed

Closes #1891
Closes #1784

Proposed Changes

Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint.

Additional Info

  • Return unavailable status for out-of-range blocks requested by peers (Network Update for Weak Subjectivity Sync #2561)
  • Implement sync daemon for fetching historical blocks (Network Update for Weak Subjectivity Sync #2561)
  • Verify chain hashes (either in historical_blocks.rs or the calling module)
  • Consistency check for initial block + state
  • Fetch the initial state and block from a beacon node HTTP endpoint
  • Don't crash fetching beacon states by slot from the API
  • Background service for state reconstruction, triggered by CLI flag or API call.

Considered out of scope for this PR:

  • Drop the requirement to provide the --checkpoint-block (this would require some pretty heavy refactoring of block verification)

@michaelsproul michaelsproul added the work-in-progress PR is a work-in-progress label Mar 5, 2021
@michaelsproul michaelsproul self-assigned this Mar 5, 2021
@paulhauner paulhauner added the v1.5.1 To be included in the v1.5.1 relase label Aug 2, 2021
@paulhauner
Copy link
Member

I've tagged this for v1.5.1. That might be optimistic, but I just wanted to keep this on the radar. Feel free to downgrade :)

@michaelsproul
Copy link
Member Author

I've made some tweaks to help with debugging:

  • Update lcli pretty-ssz to print JSON, and support SignedBeaconBlock
  • Add two new /lighthouse/database endpoints:
    • /info: info about the split slot, anchor point, and schema version
    • /historical_blocks: POST endpoint for pushing historical blocks to a BN manually

My debugging flow (using scripts from eth2-scripts) has been:

$ ssz_block.sh finalized > block.ssz
$ ssz_state.sh finalized > state.ssz
$ lighthouse bn --initial-state state.ssz --initial-block block.ssz --http
$ historical_sync.py

Now to fix the actual problems listed in the PR description 😅

@michaelsproul
Copy link
Member Author

There's an issue with the Altair beacon chain tests. Probably something benign, but I haven't had a chance to look into it yet.

@michaelsproul
Copy link
Member Author

Also realised there are probably issues at skipped slots. If the initial block's slot doesn't match the state's I think we will run into problems. I'll look into the best way to handle this next week.

@michaelsproul michaelsproul changed the title Implement weak subjectivity sync Implement checkpoint sync Aug 16, 2021
* Close to first draft

* Further progress to first draft

* Further progress, before rebase

* First draft

* Cleanup and fixes

* Notifier updates and bug fixes

* Fix off-by-one errors

* Remove todo

* Increase backfill buffer

* Gracefully handle requests during backfill

* Improve comments

* fmt

* Update error handling

* Reviewers suggestions

* Take historic blocks by ref, avoid clone

* Clear batch on pause

* Further reviewers suggestions

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed blocked labels Sep 14, 2021
@michaelsproul
Copy link
Member Author

This is ready for review! Batteries now included 🎉

* Give error when WS sync with existing DB

* Give wrong network hint for bad SSZ

* Use checkpoint instead of check-point
* Log about backfill less often

* Log when backfill complete

* Update docs
Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such an awesome feature, I had so much fun using it 🎉

I've left a few rather minor suggestions and comments. I also have a few that don't relate to changes so I can't associate to a specific line:

  • The --shutdown-after-sync flag will now shutdown after we reach head, but not before we finish back fill. I don't really know what the behaviour should be here, but at the very least we could add a message to the CLI help string flag indicating that this is the case.
  • We also have the --wss-checkpoint flag on the BN. I think it should probably conflict with all the checkpoint-sync flags.

I also noticed we don't have the client-side of the new /lighthouse API endpoints. It would be fairly painless to add these and some smoke tests. I might pick this up when I get a chance and make a PR to this PR.

beacon_node/beacon_chain/src/schema_change.rs Show resolved Hide resolved
beacon_node/network/src/sync/backfill_sync/mod.rs Outdated Show resolved Hide resolved
beacon_node/store/src/reconstruct.rs Show resolved Hide resolved
book/src/checkpoint-sync.md Outdated Show resolved Hide resolved
beacon_node/network/src/sync/manager.rs Outdated Show resolved Hide resolved
@paulhauner paulhauner added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Sep 17, 2021
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Sep 20, 2021
@michaelsproul
Copy link
Member Author

Thanks @paulhauner for the review, and thanks @AgeManning for fixing up that sync TODO. All comments have now been addressed, so I think we must be almost ready to merge 🎉

@michaelsproul
Copy link
Member Author

michaelsproul commented Sep 20, 2021

Ah, I didn't address the flag-related suggestions from your main comment. Will do that now.

@michaelsproul
Copy link
Member Author

Done.

Points to note:

  • Deleted the /lighthouse/backfill API because its functionality was covered by lighthouse/syncing. Age was just using it during testing.
  • The lighthouse/database/historical_blocks API is untested because it would be a pain to do a non-trivial test (requires a WSS node, and some blocks to POST). Seeing as it's for developers only and the docs state as much, I think this is an acceptable tradeoff.
  • I've left --wss-checkpoint and --checkpoint-* as non-conflicting, because I think there are cases in which it could be useful to have both, e.g. if you start a checkpoint sync from several epochs back but want to verify that you reach some more recent checkpoint. This can be useful when you want quick access to some recent history, without having to backfill + reconstruct. I've updated the help text to point to --checkpoint-sync-url.

Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I can't wait to see this in the hands of users!

You have my approval, merge at will capt'n 🎉

@paulhauner paulhauner added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Sep 22, 2021
@michaelsproul
Copy link
Member Author

michaelsproul commented Sep 22, 2021

😮 😮 😮

bors r+

bors bot pushed a commit that referenced this pull request Sep 22, 2021
## Issue Addressed

Closes #1891
Closes #1784

## Proposed Changes

Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint.

## Additional Info

- [x] Return unavailable status for out-of-range blocks requested by peers (#2561)
- [x] Implement sync daemon for fetching historical blocks (#2561)
- [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module)
- [x] Consistency check for initial block + state
- [x] Fetch the initial state and block from a beacon node HTTP endpoint
- [x] Don't crash fetching beacon states by slot from the API
- [x] Background service for state reconstruction, triggered by CLI flag or API call.

Considered out of scope for this PR:

- Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification)


Co-authored-by: Diva M <divma@protonmail.com>
@bors
Copy link

bors bot commented Sep 22, 2021

Pull request successfully merged into unstable.

Build succeeded:

@bors bors bot changed the title Implement checkpoint sync [Merged by Bors] - Implement checkpoint sync Sep 22, 2021
@bors bors bot closed this Sep 22, 2021
@michaelsproul michaelsproul deleted the weak-subj-sync branch September 22, 2021 03:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v2.0.0 Altair on mainnet release (v2.0.0)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants