
feat: state: Fast migration for v15 #7901

Closed
wants to merge 2 commits into from

Conversation

arajasek
Contributor

@arajasek arajasek commented Jan 7, 2022

Related Issues

Fixes #7870

Proposed Changes

Additional Info

Checklist

Before you mark the PR ready for review, please make sure that:

  • All commits have a clear commit message.
  • The PR title is in the form of <PR type>: <#issue number> <area>: <change being made>
    • example: fix: #1234 mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, INTERFACE BREAKING CHANGE, CONSENSUS BREAKING, build, chore, ci, docs, misc, perf, refactor, revert, style, test
    • area: api, chain, state, vm, data transfer, market, mempool, message, block production, multisig, networking, paychan, proving, sealing, wallet
  • This PR has tests for new functionality or change in behaviour
  • If new user-facing features are introduced, clear usage guidelines and/or documentation updates should be included in https://lotus.filecoin.io or Discussion Tutorials.
  • CI is green

@arajasek arajasek requested a review from a team as a code owner January 7, 2022 23:19
@arajasek
Contributor Author

arajasek commented Jan 7, 2022

Some more experimenting and tweaking is needed, but this gets the job done well. Will share latest numbers soon (roughly 35 mins for premigration, 3 mins for migration on my box).

@@ -1245,8 +1247,15 @@ func PreUpgradeActorsV7(ctx context.Context, sm *stmgr.StateManager, cache stmgr
workerCount /= 2
}

config := nv15.Config{MaxWorkers: uint(workerCount)}
_, err := upgradeActorsV7Common(ctx, sm, cache, root, epoch, ts, config)
lbts, lbRoot, err := stmgr.GetLookbackTipSetForRound(ctx, sm, ts, epoch)
Contributor Author

This migration requires us to be reorg-insensitive, so the premigration has to be based on finality.
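Concretely, "based on finality" means anchoring the premigration on a tipset at least Filecoin's finality delay (900 epochs) behind the upgrade epoch, so a reorg cannot invalidate the precomputed state. A minimal sketch of that clamped lookback calculation, with illustrative names (the real helper is stmgr.GetLookbackTipSetForRound, and the real constant lives in the build params):

```go
package main

import "fmt"

// ChainFinality is assumed here to be Filecoin's 900-epoch finality delay.
const ChainFinality = 900

// lookbackRound returns the epoch the premigration should be anchored on:
// finality behind the upgrade epoch, clamped to genesis (epoch 0).
func lookbackRound(epoch int64) int64 {
	lb := epoch - ChainFinality
	if lb < 0 {
		return 0
	}
	return lb
}

func main() {
	fmt.Println(lookbackRound(1000)) // 100
	fmt.Println(lookbackRound(500))  // 0 (clamped to genesis)
}
```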

@@ -1255,7 +1264,12 @@ func upgradeActorsV7Common(
root cid.Cid, epoch abi.ChainEpoch, ts *types.TipSet,
config nv15.Config,
) (cid.Cid, error) {
buf := blockstore.NewTieredBstore(sm.ChainStore().StateBlockstore(), blockstore.NewMemorySync())
writeStore, err := autobatch.NewBlockstore(sm.ChainStore().StateBlockstore(), blockstore.NewMemorySync(), 100_000, 100, true)
Contributor Author

100_000 here wasn't chosen very scientifically...

Member

This blockstore:

  1. Doesn't guarantee a flush order.
  2. Doesn't actually wait in Flush for all the data to be written.
  3. Definitely wasn't written for this use-case.

3 isn't really an issue, but 1 and 2 are. Ideally the node won't be borked if you crash during a pre-migration.

@arajasek
Contributor Author

Update to the tagged actors RC before landing this.

@codecov

codecov bot commented Jan 10, 2022

Codecov Report

Merging #7901 (ebeb7b0) into master (3fb71cd) will decrease coverage by 0.05%.
The diff coverage is 21.17%.

❗ Current head ebeb7b0 differs from pull request most recent head a52ca11. Consider uploading reports for the commit a52ca11 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7901      +/-   ##
==========================================
- Coverage   39.38%   39.33%   -0.06%     
==========================================
  Files         655      656       +1     
  Lines       70254    70331      +77     
==========================================
- Hits        27673    27662      -11     
- Misses      37818    37903      +85     
- Partials     4763     4766       +3     
Impacted Files Coverage Δ
chain/actors/builtin/paych/message5.go 0.00% <0.00%> (ø)
chain/actors/builtin/paych/message6.go 0.00% <0.00%> (ø)
cmd/lotus-shed/main.go 0.00% <0.00%> (ø)
cmd/lotus-shed/migrations.go 0.00% <0.00%> (ø)
chain/consensus/filcns/upgrades.go 34.06% <20.00%> (-0.18%) ⬇️
chain/actors/builtin/paych/message7.go 86.66% <100.00%> (ø)
chain/actors/builtin/paych/v7.go 72.41% <100.00%> (+7.96%) ⬆️
chain/stmgr/forks.go 46.84% <100.00%> (ø)
chain/stmgr/stmgr.go 64.94% <100.00%> (ø)
markets/loggers/loggers.go 89.28% <0.00%> (-10.72%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3fb71cd...a52ca11. Read the comment docs.

@arajasek arajasek added this to the v1.14.0 milestone Jan 10, 2022
Comment on lines +42 to +46
{{if (ge .v 7)}}
Sv: toV{{.v}}SignedVoucher(*sv),
{{else}}
Sv: *sv,
{{end}}
Member

nit: whitespace?

Suggested change
{{if (ge .v 7)}}
Sv: toV{{.v}}SignedVoucher(*sv),
{{else}}
Sv: *sv,
{{end}}
{{if (ge .v 7) -}}
Sv: toV{{.v}}SignedVoucher(*sv),
{{- else -}}
Sv: *sv,
{{- end}}
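For reference, the `-` trim markers in Go's text/template strip all whitespace (including newlines) adjacent to the delimiter, which is what keeps the `{{if}}`/`{{else}}` action lines from leaking blank lines into the generated code. A standalone illustration of the difference:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render executes tmpl with the boolean v7 as its data and returns the output.
func render(tmpl string, v7 bool) string {
	t := template.Must(template.New("t").Parse(tmpl))
	var buf bytes.Buffer
	if err := t.Execute(&buf, v7); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	// Without trim markers the action lines leave their newlines behind.
	plain := "{{if .}}\nSv: a\n{{else}}\nSv: b\n{{end}}"
	// "-" inside a delimiter trims adjacent whitespace, newlines included.
	trimmed := "{{if . -}}\nSv: a\n{{- else -}}\nSv: b\n{{- end}}"

	fmt.Printf("%q\n", render(plain, true))   // "\nSv: a\n"
	fmt.Printf("%q\n", render(trimmed, true)) // "Sv: a"
}
```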


@Stebalien
Member

3 isn't really an issue, but 1 and 2 are. Ideally the node won't be borked if you crash during a pre-migration.

I'd just implement a much simpler version of this. Basically, you need something with 3 stages:

  1. Buffered.
  2. Flushing.
  3. Backing.

Writes go into buffered. When it fills, Put signals a background goroutine (via a length-one channel) to flush. That goroutine moves everything from buffered to flushing, then starts writing it to the backing blockstore.

Gets check buffered, then flushing, then backing.

The background goroutine ensures that data is always flushed in-order.
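A minimal sketch of that three-stage design, with maps standing in for the real blockstores and illustrative names (this is not the lotus blockstore API):

```go
package main

import (
	"fmt"
	"sync"
)

// tieredStore moves data buffered -> flushing -> backing in whole batches,
// so the backing store only ever receives complete, in-order batches.
type tieredStore struct {
	mu       sync.Mutex
	buffered map[string][]byte // stage 1: recent Puts
	flushing map[string][]byte // stage 2: the batch currently being written
	backing  map[string][]byte // stage 3: a map standing in for the durable store
	limit    int
	kick     chan struct{} // length-one channel used to signal the flusher
	done     chan struct{}
}

func newTieredStore(limit int) *tieredStore {
	s := &tieredStore{
		buffered: map[string][]byte{},
		flushing: map[string][]byte{},
		backing:  map[string][]byte{},
		limit:    limit,
		kick:     make(chan struct{}, 1),
		done:     make(chan struct{}),
	}
	go s.flusher()
	return s
}

// Put writes into the buffered stage and, when it fills, signals the
// background flusher without blocking (the channel has capacity one).
func (s *tieredStore) Put(k string, v []byte) {
	s.mu.Lock()
	s.buffered[k] = v
	full := len(s.buffered) >= s.limit
	s.mu.Unlock()
	if full {
		select {
		case s.kick <- struct{}{}:
		default: // a flush is already pending
		}
	}
}

// Get checks the stages newest-first: buffered, then flushing, then backing.
func (s *tieredStore) Get(k string) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.buffered[k]; ok {
		return v, true
	}
	if v, ok := s.flushing[k]; ok {
		return v, true
	}
	v, ok := s.backing[k]
	return v, ok
}

// flusher runs all flushes on one goroutine, which keeps batches ordered.
func (s *tieredStore) flusher() {
	for range s.kick {
		s.flushOnce()
	}
	close(s.done)
}

func (s *tieredStore) flushOnce() {
	s.mu.Lock()
	s.buffered, s.flushing = map[string][]byte{}, s.buffered
	s.mu.Unlock()
	s.mu.Lock()
	for k, v := range s.flushing { // real code would write to disk here
		s.backing[k] = v
	}
	s.flushing = map[string][]byte{}
	s.mu.Unlock()
}

// Close requests a final flush and waits until it fully completes,
// addressing point 2 above (the flush actually waits).
func (s *tieredStore) Close() {
	s.kick <- struct{}{} // blocks until the flusher can take the signal
	close(s.kick)
	<-s.done
}

func main() {
	s := newTieredStore(2)
	s.Put("a", []byte("1"))
	s.Put("b", []byte("2"))
	s.Put("c", []byte("3"))
	s.Close()
	v, ok := s.Get("a")
	fmt.Println(ok, string(v)) // true 1
}
```

Because only the single flusher goroutine touches the flushing and backing stages, batches cannot interleave, and a crash mid-flush leaves at worst a partially written batch whose earlier batches are already complete.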

@arajasek arajasek closed this Jan 11, 2022
@arajasek
Contributor Author

Superseded by #7933

Development

Successfully merging this pull request may close these issues.

v7 migration integration
2 participants