fix: don't GC manifests #4959

Merged
merged 28 commits into from
Nov 27, 2024

Conversation

ruseinov
Contributor

@ruseinov ruseinov commented Oct 31, 2024

Summary of changes

Changes introduced in this pull request:

  • blessed column for persistent storage
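A minimal sketch of the idea behind a "blessed" column, assuming a toy in-memory database rather than the project's real storage layer. The names `Column`, `ToyDb` and `sweep` are hypothetical and only illustrate that the sweep phase skips anything written to the persistent column:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical column identifiers, not the project's actual column names.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Column {
    /// Ordinary chain data, eligible for garbage collection.
    Collectable,
    /// "Blessed" data such as manifests, which GC must never delete.
    Persistent,
}

/// Toy in-memory database keyed by (column, key).
#[derive(Default)]
pub struct ToyDb {
    records: HashMap<(Column, Vec<u8>), Vec<u8>>,
}

impl ToyDb {
    pub fn put(&mut self, column: Column, key: &[u8], value: &[u8]) {
        self.records.insert((column, key.to_vec()), value.to_vec());
    }

    /// Looks in the persistent column first, then in the collectable one.
    pub fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.records
            .get(&(Column::Persistent, key.to_vec()))
            .or_else(|| self.records.get(&(Column::Collectable, key.to_vec())))
    }

    /// Deletes every collectable record whose key is not in `reachable`
    /// and returns the number of deleted records.
    pub fn sweep(&mut self, reachable: &HashSet<Vec<u8>>) -> usize {
        let before = self.records.len();
        self.records
            .retain(|(column, key), _| *column == Column::Persistent || reachable.contains(key));
        before - self.records.len()
    }
}
```

In this sketch a manifest written with `Column::Persistent` survives every sweep even when it is unreachable from the chain head, which is the property this fix is meant to provide.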

Reference issue to close (if applicable)

Closes #4926

Other information and links

The actors are still found after several GC rounds, which confirms the fix. The migration also works as intended now.

Manual testing:

Several GC runs prior to state migration:

2024-11-12T14:19:05.975826Z  INFO forest_filecoin::db::gc: filter keys for GC
2024-11-12T14:19:06.544906Z  INFO forest_filecoin::db::gc: GC sweep
2024-11-12T14:19:06.793863Z  INFO forest_filecoin::db::gc: GC finished sweep: 4189 deleted records

There are, of course, other errors caused by modifying the state migration epoch, but no bundle retrieval issues. I also tested this before the persistent-storage fix, and it failed miserably trying to retrieve the bundles from the DB.

2024-11-12T15:18:29.939559Z  INFO compute_tipset_state_blocking: forest_filecoin::state_migration: Running TukTuk migration at epoch 2136520
2024-11-12T15:18:29.985712Z  INFO forest_filecoin::chain_sync::tipset_syncer: Validating tipset: EPOCH = 2136611, N blocks = 1
2024-11-12T15:18:29.985734Z  WARN forest_filecoin::chain_sync::tipset_syncer: Got block from the future, but within clock drift threshold, 1731424710 > 1731424709
2024-11-12T15:18:29.988647Z  WARN forest_filecoin::chain_sync::tipset_syncer: Validating block [CID = bafy2bzacea3gbxtureg5u7msimtsnns2j7edqmsetgghs7uy4cxlqdnejgzgo] in EPOCH = 2136611 failed: Validation error: Processing error: Error calculating weight: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzaceaxzciqzroykafdbt2fcizvzphu2hex4dceseo3lzrbl6qtuy3d7u, state_root_version=unknown, Processing error: Failed to calculate state: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzaceaxzciqzroykafdbt2fcizvzphu2hex4dceseo3lzrbl6qtuy3d7u, state_root_version=unknown, Processing error: Could not update state: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzaceaxzciqzroykafdbt2fcizvzphu2hex4dceseo3lzrbl6qtuy3d7u, state_root_version=unknown, Validation error: Consensus error: Power actor not found, Consensus error: StateManager error: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzaceaxzciqzroykafdbt2fcizvzphu2hex4dceseo3lzrbl6qtuy3d7u, state_root_version=unknown
2024-11-12T15:18:29.988693Z  WARN forest_filecoin::chain_sync::tipset_syncer: Sync messages check state failed for single tipset
2024-11-12T15:18:29.988701Z ERROR forest_filecoin::chain_sync::tipset_syncer: Syncing tipset range [2136521, 2136611] failed: Validation error: Processing error: Error calculating weight: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzaceaxzciqzroykafdbt2fcizvzphu2hex4dceseo3lzrbl6qtuy3d7u, state_root_version=unknown, Processing error: Failed to calculate state: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzaceaxzciqzroykafdbt2fcizvzphu2hex4dceseo3lzrbl6qtuy3d7u, state_root_version=unknown, Processing error: Could not update state: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzaceaxzciqzroykafdbt2fcizvzphu2hex4dceseo3lzrbl6qtuy3d7u, state_root_version=unknown, Validation error: Consensus error: Power actor not found, Consensus error: StateManager error: Can't create a valid state tree from the given root. This error may indicate unsupported version. state_root_cid=bafy2bzaceaxzciqzroykafdbt2fcizvzphu2hex4dceseo3lzrbl6qtuy3d7u, state_root_version=unknown
thread 'state migration thread: 0' panicked at /Users/romanuseinov/projects/chainsafe/forest/src/state_migration/common/state_migration.rs:134:29:
failed executing job for address: t04, Reason: state migration failed for bafk2bzaceclefusmffhuuvtggrmadr3cwpwsgphtlj2wb222ztwwv5mssu5ea actor, addr t04:Serialization error for Cbor protocol: RequireLength { name: "tuple", expect: 15, value: 17 }
2024-11-12T15:18:30.090031Z  INFO forest_filecoin::state_migration::common::state_migration: Processed 100000 actors

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

@ruseinov ruseinov marked this pull request as ready for review November 4, 2024 20:32
@ruseinov ruseinov requested a review from a team as a code owner November 4, 2024 20:32
@ruseinov ruseinov requested review from hanabi1224 and LesnyRumcajs and removed request for a team November 4, 2024 20:32
@ruseinov
Contributor Author

ruseinov commented Nov 6, 2024

Follow-up issue: create common helpers to use for migrations and for DB operations.
We currently have to duplicate the key construction logic in several places, e.g. for EthMappings and the Graph. It would be nice to share the same code that the application itself uses, to avoid human error.

@LesnyRumcajs
Member

Let's hold off on merging and releasing until the network upgrade on mainnet (Nov 20th) to reduce the chance of issues.

@LesnyRumcajs LesnyRumcajs requested review from elmattic and removed request for LesnyRumcajs November 15, 2024 06:57
@@ -124,6 +125,19 @@ where
}
}

impl<ReaderT> PersistentStore for AnyCar<ReaderT>
Contributor

@hanabi1224 hanabi1224 Nov 26, 2024

I feel it would be less error-prone to make the Blockstore trait persistent by default and have a GarbageCollectableStore trait for storing blockchain data. Currently, people have to interpret Blockstore as GarbageCollectableStore to avoid mistakes, which is counter-intuitive. What do you think?

Contributor Author

We can create an issue for that and address it as prioritized. It’s out of scope for this PR. I have already created an issue to generalize the different stores a bit more, so that we can rely better on tests across database types.

Contributor Author

I’m also not sure that making Blockstore persistent by default is a good idea, as the use cases for persistent storage are pretty limited at the moment.
Our default use case seems to be garbage-collected storage, so having persistent storage as an add-on makes more sense. I do agree that in the general case this would be counter-intuitive.
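For reference, the two trait layouts discussed in this thread could be sketched roughly as follows. This is an illustration only: `Key`, `MemoryStore`, `store_manifest` and all method names are assumptions, and only the trait names `Blockstore`, `PersistentStore` and `GarbageCollectableStore` come from the discussion itself.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Stand-in for a content identifier; real code would use a CID type.
pub type Key = Vec<u8>;

/// Layout A: garbage-collected by default, persistence as an opt-in add-on.
pub mod gc_by_default {
    use super::Key;

    pub trait Blockstore {
        /// Writes a block that GC may later remove.
        fn put_keyed(&self, key: &Key, block: &[u8]);
        fn get(&self, key: &Key) -> Option<Vec<u8>>;
    }

    pub trait PersistentStore: Blockstore {
        /// Writes a block that GC must never remove (hypothetical method name).
        fn put_keyed_persistent(&self, key: &Key, block: &[u8]);
    }
}

/// Layout B: persistent by default, collectable writes as the explicit opt-in.
pub mod persistent_by_default {
    use super::Key;

    pub trait Blockstore {
        fn put_keyed(&self, key: &Key, block: &[u8]);
        fn get(&self, key: &Key) -> Option<Vec<u8>>;
    }

    pub trait GarbageCollectableStore: Blockstore {
        fn put_keyed_collectable(&self, key: &Key, block: &[u8]);
    }
}

/// Toy store implementing layout A.
#[derive(Default)]
pub struct MemoryStore {
    collectable: RefCell<HashMap<Key, Vec<u8>>>,
    persistent: RefCell<HashMap<Key, Vec<u8>>>,
}

impl gc_by_default::Blockstore for MemoryStore {
    fn put_keyed(&self, key: &Key, block: &[u8]) {
        self.collectable.borrow_mut().insert(key.clone(), block.to_vec());
    }

    fn get(&self, key: &Key) -> Option<Vec<u8>> {
        if let Some(block) = self.persistent.borrow().get(key) {
            return Some(block.clone());
        }
        self.collectable.borrow().get(key).cloned()
    }
}

impl gc_by_default::PersistentStore for MemoryStore {
    fn put_keyed_persistent(&self, key: &Key, block: &[u8]) {
        self.persistent.borrow_mut().insert(key.clone(), block.to_vec());
    }
}

/// A caller that needs GC-proof storage says so in its trait bound.
pub fn store_manifest<S: gc_by_default::PersistentStore>(store: &S, key: &Key, bytes: &[u8]) {
    store.put_keyed_persistent(key, bytes);
}
```

With layout A, only the few call sites that need GC-proof writes carry the `PersistentStore` bound. With layout B, every chain-data call site would have to opt in to collectable writes instead.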

let depth = 5 as ChainEpochDelta;
let current_epoch = 0 as ChainEpochDelta;

let tester = GCTester::new();
Contributor

Shall we test both MemoryDB and ParityDB?

Contributor Author

ParityDB is tricky to test; there is an issue for that. It has therefore been tested manually.

Contributor Author

The main issue there is that deletes are not propagating immediately, so unit testing it reliably is not currently possible.
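To illustrate why this makes unit tests unreliable, here is a toy model of a store with deferred deletes. It is not ParityDB's API or its actual commit mechanism; `DeferredDeleteDb` and `flush` are hypothetical names used only to show the failure mode:

```rust
use std::collections::HashMap;

/// Toy store in which deletes are queued and only applied on `flush`.
#[derive(Default)]
pub struct DeferredDeleteDb {
    committed: HashMap<Vec<u8>, Vec<u8>>,
    pending_deletes: Vec<Vec<u8>>,
}

impl DeferredDeleteDb {
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.committed.insert(key.to_vec(), value.to_vec());
    }

    /// Queues the delete; readers still see the old value until `flush`.
    pub fn delete(&mut self, key: &[u8]) {
        self.pending_deletes.push(key.to_vec());
    }

    pub fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.committed.get(key)
    }

    /// Applies all queued deletes.
    pub fn flush(&mut self) {
        for key in self.pending_deletes.drain(..) {
            self.committed.remove(&key);
        }
    }
}
```

In this model a test that asserts a key is gone right after `delete` fails unless it can force a `flush` first. When the database applies deletes in the background and offers no reliable way to wait for them, such an assertion becomes timing-dependent.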

Contributor Author

Actually, there is no issue for it yet. Creating one.

// garbage collected. Until this is fixed, the GC has to be disabled.
// Tracking issue: https://github.com/ChainSafe/forest/issues/4926
// if !opts.no_gc {
if false {
Contributor

Would love to see GC being manually triggered in at least one of the CI tests. Could you create a tracking issue if it cannot land in this PR?

Contributor Author

There is an issue for manual GC.

@ruseinov
Contributor Author

@hanabi1224 I've created some issues and linked others to address the comments to this PR:

  1. Redesigning the DB trait structure is open to discussion. There is an issue for it: Share more logic between different DB implementations. #5005; feel free to add to it. I'm also looking into introducing a wrapper for MemoryDB/ParityDB in rust-f3 to get a unified interface for all kinds of storage; perhaps we could adopt that approach in forest later. As for making persistent storage the default, I have doubts about the viability of that approach just to support what is currently one niche use case (manifest storage).
  2. The manual GC trigger issue is here: [GC] Add manual trigger. #4461. It's already prioritised as low because there isn't much use for it beyond CI integration testing. It will require a bit of a redesign to support channels and override the automatic GC.
  3. The ParityDB unit testing issue is here: Figure out a way to manually trigger parityDB sync. #5008. It's not trivial: I remember spending time investigating a way to manually sync all the pending transactions, and it was not reliably possible at the time. It would be great to revisit this.
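As a rough idea of what channel support for item 2 might look like, here is a sketch using only the standard library. It is not the design tracked in #4461; `GcCause`, `gc_channel` and `spawn_gc_loop` are hypothetical names. The loop runs a GC round whenever the interval elapses, or immediately when a manual trigger arrives:

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Why a GC round was started.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GcCause {
    Scheduled,
    Manual,
}

/// Creates the channel used to request a manual GC round.
pub fn gc_channel() -> (Sender<()>, Receiver<()>) {
    mpsc::channel()
}

/// Spawns a loop that calls `collect` every `interval`, or right away when
/// a message arrives on `trigger`. The loop exits once every sender is dropped.
pub fn spawn_gc_loop<F>(interval: Duration, trigger: Receiver<()>, mut collect: F) -> JoinHandle<()>
where
    F: FnMut(GcCause) + Send + 'static,
{
    thread::spawn(move || loop {
        match trigger.recv_timeout(interval) {
            Ok(()) => collect(GcCause::Manual),
            Err(RecvTimeoutError::Timeout) => collect(GcCause::Scheduled),
            Err(RecvTimeoutError::Disconnected) => break,
        }
    })
}
```

A CI test could keep the `Sender`, send `()` to force a round, and then assert on the database contents without waiting for the schedule.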

Contributor

@hanabi1224 hanabi1224 left a comment

LGTM. It'd be great to have @lemmih making a final review as well.

@hanabi1224 hanabi1224 requested a review from lemmih November 27, 2024 01:57
@ruseinov
Contributor Author

LGTM. It'd be great to have @lemmih making a final review as well.

We can do this retroactively, I guess. If anything pops up, it can be fixed in the next iteration.
I’d love to tackle some of the issues linked from this PR.

@ruseinov ruseinov added this pull request to the merge queue Nov 27, 2024
Merged via the queue into main with commit 1cf6eec Nov 27, 2024
35 checks passed
@ruseinov ruseinov deleted the ru/fix/gc-manifest branch November 27, 2024 07:58
Development

Successfully merging this pull request may close these issues.

Garbage collector should not collect manifests
4 participants