
FIP-0045: Implement migrations #85

Merged: 13 commits merged into master from asr/verifreg-migration on Oct 7, 2022
Conversation

@arajasek (Contributor) commented Oct 2, 2022

Closes #69

The sequence of steps is as follows (see the sketch after this list):

  • Step 0: Add the DataCap actor to the v8 state tree with a fake Head
  • Step 1A: Migrate all actors except the StorageMarketActor and VerifiedRegistryActor as part of the "regular" migration workflow (more details on the migrations below), set the results in the output state tree
  • Step 1B: In a separate, parallel goroutine migrate the VerifiedRegistryActor, and then the StorageMarketActor
  • Block on 1A and 1B
  • Step 2: Set the VerifiedRegistryActor and StorageMarketActor in the output state tree
  • Step 3: Flush and return
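
A minimal sketch of the Step 1A / 1B structure above, in Go; migrateRegularActors, migrateVerifregThenMarket, and setDeferredActorsAndFlush are hypothetical stand-ins (the real top-level migration drives Step 1A through its own job/result channel machinery):

package migration

import (
	"context"

	"github.com/ipfs/go-cid"
	"golang.org/x/sync/errgroup"
)

func runSteps(
	ctx context.Context,
	migrateRegularActors func(context.Context) error, // Step 1A
	migrateVerifregThenMarket func(context.Context) error, // Step 1B
	setDeferredActorsAndFlush func() (cid.Cid, error), // Steps 2 and 3
) (cid.Cid, error) {
	grp, grpCtx := errgroup.WithContext(ctx)

	// Step 1A: every actor except the StorageMarketActor and VerifiedRegistryActor.
	grp.Go(func() error { return migrateRegularActors(grpCtx) })

	// Step 1B: verifreg first, then market, in one goroutine, since the market
	// migration consumes the verifreg results.
	grp.Go(func() error { return migrateVerifregThenMarket(grpCtx) })

	// Block on 1A and 1B.
	if err := grp.Wait(); err != nil {
		return cid.Undef, err
	}

	// Step 2: set the two deferred actors in the output tree; Step 3: flush.
	return setDeferredActorsAndFlush()
}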

Miner Migration

The minerMigrator performs the following migrations (the FIP-0045 SimpleQAPower rule is sketched after this list):

  • FIP-0029: Sets the Beneficiary to the Owner, and sets empty values for BeneficiaryTerm and PendingBeneficiaryTerm
  • FIP-0034: For each SectorPreCommitOnChainInfo in PreCommittedSectors, calculates the unsealed CID (assuming there are deals)
  • FIP-0045: For each SectorOnChainInfo in Sectors, set SimpleQAPower = (DealWeight == 0 && VerifiedDealWeight == 0)
  • FIP-0045: For each Deadline in Deadlines: for each SectorOnChainInfo in SectorsSnapshot, set SimpleQAPower = (DealWeight == 0 && VerifiedDealWeight == 0)
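
A hedged sketch of the FIP-0045 flag computation above; simpleQAPower is a hypothetical helper, and copying the remaining fields from the v8 SectorOnChainInfo into its v9 counterpart is elided:

package migration

import (
	"github.com/filecoin-project/go-state-types/big"
	miner8 "github.com/filecoin-project/go-state-types/builtin/v8/miner"
)

// simpleQAPower is true exactly when the sector was onboarded with no deals,
// i.e. both its deal weight and its verified deal weight are zero.
func simpleQAPower(in *miner8.SectorOnChainInfo) bool {
	return in.DealWeight.Equals(big.Zero()) && in.VerifiedDealWeight.Equals(big.Zero())
}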

The migration has the following layers of caching (see the sketch after this list):

  • at the ActorHead level -- if a matching (address, Head) tuple is found in the cache, use the cached value
  • at the SectorsAmt level -- if a matching Sectors CID is found in the cache, use the cached value
    • This is used for migrating both the Sectors AMT, as well as the SectorsSnapshot AMTs in each deadline
  • at the "previous SectorsAmt" level -- if we previously recorded the input and output of migrating this particular miner's SectorsAmt (at premigration time), load them and transform the previous output into the current output based on the diff between the previous input and the current input
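
An illustrative use of the second layer, with the cache's Load signature taken from the snippets quoted later in this review; the sectorsCache interface and migrateSectorsAmt callback are placeholders for the real types:

package migration

import (
	"github.com/ipfs/go-cid"
)

// sectorsCache captures only the Load behaviour used in this sketch: the
// closure runs on a cache miss, and the cached CID is returned on a hit.
type sectorsCache interface {
	Load(key string, loadFunc func() (cid.Cid, error)) (cid.Cid, error)
}

// migrateSectorsCached rewrites a Sectors AMT at most once per distinct input
// CID: a deadline whose SectorsSnapshot CID matches an already-migrated
// Sectors AMT reuses the cached output instead of walking the AMT again.
func migrateSectorsCached(cache sectorsCache, key string, sectorsIn cid.Cid,
	migrateSectorsAmt func(cid.Cid) (cid.Cid, error)) (cid.Cid, error) {
	return cache.Load(key, func() (cid.Cid, error) {
		return migrateSectorsAmt(sectorsIn)
	})
}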

Further caching is possible at the following levels:

  • at the PreCommits level -- if a matching PreCommittedSectors CID is found in the cache, use the cached value
  • at the individual PreCommit level -- if a matching SectorPreCommitOnChainInfo CID is found in the cache, use the cached value

DataCap "migration"

The DataCap State is set based on the Verified Registry's VerifiedClients map. For each client (see the sketch after this list):

  • its balance is converted to tokens that are added to the TokenState
  • an entry is created in the TokenState's allowances map, with infinite allowance and the Market actor as operator
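
A deliberately simplified sketch of this construction, using plain Go maps in place of the token state's HAMTs; buildDataCapBalances, marketID, and infiniteAllowance are illustrative names, and any unit scaling the real migration applies to the datacap amounts is omitted:

package migration

import (
	"github.com/filecoin-project/go-state-types/abi"
	"github.com/filecoin-project/go-state-types/big"
)

// buildDataCapBalances maps every verified client to a token balance equal to
// its remaining datacap, and creates one allowance entry per client with the
// market actor as operator and an effectively unlimited allowance.
func buildDataCapBalances(
	verifiedClients map[abi.ActorID]abi.StoragePower, // remaining datacap per client
	marketID abi.ActorID,
	infiniteAllowance big.Int, // placeholder for the "infinite" allowance constant
) (map[abi.ActorID]abi.TokenAmount, map[abi.ActorID]map[abi.ActorID]big.Int) {
	balances := make(map[abi.ActorID]abi.TokenAmount)
	allowances := make(map[abi.ActorID]map[abi.ActorID]big.Int)
	for client, datacap := range verifiedClients {
		balances[client] = big.Int(datacap) // balance converted to tokens
		allowances[client] = map[abi.ActorID]big.Int{marketID: infiniteAllowance}
	}
	return balances, allowances
}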

VerifiedRegistry & Market migration

As in #69

@arajasek force-pushed the asr/verifreg-migration branch from 9586bb6 to 9189cb8 on October 3, 2022 at 17:59
builtin/v8/util/adt/set.go (resolved)
builtin/v9/migration/top.go (outdated, resolved)

return nil
})
newPrecommits, err := m.migratePrecommits(ctx, wrappedStore, inState.PreCommittedSectors)
Contributor

Potentially worth caching

Contributor Author

Agreed. I'm gonna start an issue capturing possible further caching / speedups. I do want to keep the cache focused on highest impact stuff, because it can get big...

builtin/v9/migration/miner.go (resolved)
for _, dealID := range info.Info.DealIDs {
deal, err := m.proposals.GetDealProposal(dealID)
if err != nil {
// Possible for the proposal to be missing if it's expired (but the deal is still in a precommit that's yet to be cleaned up)
Contributor

Nice let's make sure we test this

builtin/v9/migration/miner.go (outdated, resolved)
builtin/v9/migration/miner.go (outdated, resolved)
builtin/v9/migration/miner.go (resolved)
builtin/v9/migration/miner.go (outdated, resolved)
return cid.Undef, err
}

outSectorsSnapshotCid, err := cache.Load(SectorsAmtKey(inDeadline.SectorsSnapshot), func() (cid.Cid, error) {
Contributor

If we are doing diffing in the SectorInfo context, it probably makes sense to do diffing here too. The function inside Load should then be the extracted function used above.

Contributor

And to be clearer, I'm suggesting diffing against MinerPrevSectorsInKey(minerAddr). We're never going to be more than a day plus the wait time between premigrations, so the diffs stay small.

If things are still taking a long time and you're feeling experimental, you could introduce deadline-index/address keys to diff against previously cached deadline sectors snapshots, but I suspect that won't be a good complexity/performance tradeoff.

Contributor Author

It's a little bit more complicated than that. The diffing for the Sectors AMT is based on snapshotting (for each miner) the input and output tree at the time of the premigration.

It's unclear what the equivalent would be here. We could snapshot each deadline for each miner, but that feels like a lot. We could also just use the input/output of the migration of the "real" Sectors AMT (which we perform right before we start migrating deadlines) as the basis of the diff for the sectors snapshots in each of the deadlines. Realistically, that's as good a baseline as any for the snapshotted AMTs.

My thought was that all but 3-4 of the deadlines per miner will be perfectly cached by the premigration anyway (since their SectorsSnapshots can't change in the 2 hours between the premigration and the migration). Further, if they do change, there's a good chance the snapshot winds up being the same as the "real" Sectors AMT, and so is found in the cache from the previous step.

All that to say: I agree that some caching is possible here, but there are open questions about the best way to do it. Part of me is wary of introducing more caching than necessary, lest doing so introduce bugs.

Contributor

> We could also just use the input/output of the migration of the "real" Sectors AMT (which we perform right before we start migrating deadlines) as the basis of the diff for the sectors snapshots in each of the deadlines. Realistically, that's as good a baseline as any for the snapshotted AMTs.

Yeah, I think this is what I was trying to suggest with "diffing against MinerPrevSectorsInKey(minerAddr)", though I'm missing some details. Good point that "all but 3-4 of the deadlines per miner will be perfectly cached by the premigration anyway", so doing more caching is definitely not worth it.

Contributor

Unless the migration is already coming in under a blocktime, we need to do diffing here. The cost savings are on the order of the entire miner migration times the fraction of miners that add or compact one or more sectors during the premigration period. It's ok to leave this for a follow-up if you want to measure first.

Contributor Author

Yeah, I'd prefer to leave that to a followup if that works for you -- see #69.

builtin/v9/migration/miner.go (resolved)
Comment on lines 30 to 46
func GetBuiltinActorsKeys() []string {
keys := []string{
AccountKey,
CronKey,
DataCapKey,
InitKey,
MarketKey,
MinerKey,
MultisigKey,
PaychKey,
PowerKey,
RewardKey,
SystemKey,
VerifregKey,
}
return keys
}
@geoff-vball (Contributor) commented Oct 3, 2022

We have a version of this in chain/actors/manifest.go. I think this is a better location for it, but it should still take the actors version as an input.
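
A sketch of the suggested version-aware variant (the version type and comparison are placeholders, not the repo's actual API); the idea is that the key set depends on the actors version, since the DataCap actor only exists from v9 onwards:

// GetBuiltinActorsKeys, hypothetically parameterised by actors version.
func GetBuiltinActorsKeys(actorsVersion uint) []string {
	keys := []string{
		AccountKey,
		CronKey,
		InitKey,
		MarketKey,
		MinerKey,
		MultisigKey,
		PaychKey,
		PowerKey,
		RewardKey,
		SystemKey,
		VerifregKey,
	}
	if actorsVersion >= 9 {
		keys = append(keys, DataCapKey)
	}
	return keys
}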

@ZenGround0 (Contributor) left a comment

A couple things before looking at top level

builtin/v9/migration/miner.go (resolved)
Data: proposal.PieceCID,
Size: proposal.PieceSize,
TermMin: proposal.Duration(),
TermMax: market9.DealMaxDuration,
Contributor

For consistency with market policy:

 min(
        alloc_term_min + policy.market_default_allocation_term_buffer,
        policy.maximum_verified_allocation_term,
    )

Contributor Author

Hmm, I implemented this based on the FIP https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0045.md#migration. Not sure what to do.

Contributor

Offline I said it's ok to keep this as-is. One small change: I think you should set TermMax to DealMaxDuration + MarketDefaultAllocationTermBuffer. This way, deals with the maximal deal lifetime get the same TermMax as they would from the market code.

TermMin: proposal.Duration(),
TermMax: market9.DealMaxDuration,
// TODO: priorEpoch + 1???
Expiration: verifreg9.MaximumVerifiedAllocationExpiration + priorEpoch,
Contributor

For consistency with market policy:

min(deal.proposal.start_epoch, prior_epoch + policy.maximum_verified_allocation_expiration)

prior_epoch should be fine because the deal was made at the latest during the epoch before the migration.


clientAllocationMap, ok := allocationsMapMap[clientIDAddress]
if !ok {
clientAllocationMap, err = adt9.AsMap(adtStore, emptyMapCid, builtin.DefaultHamtBitwidth)
Contributor

I don't think this adt.Map ever gets added to the allocationsMapMap golang map (https://go.dev/play/p/ouAvN7Ea813), causing all but the last iterated verified deal from each client to be dropped.

Contributor Author

GOOD catch, thank you.
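
A sketch of the fix implied by this exchange (the merged code may differ in detail): the newly created map must be stored back into the Go map so that later deals from the same client find and reuse it.

clientAllocationMap, ok := allocationsMapMap[clientIDAddress]
if !ok {
	clientAllocationMap, err = adt9.AsMap(adtStore, emptyMapCid, builtin.DefaultHamtBitwidth)
	if err != nil {
		return err // propagate as the surrounding function does
	}
	// Remember the freshly created map so the next deal from this client
	// reuses it instead of starting again from an empty HAMT.
	allocationsMapMap[clientIDAddress] = clientAllocationMap
}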

* Add accessors for allocation HAMT in verifreg actor
@ZenGround0 (Contributor) left a comment

I'm ok with this merging once correctness is addressed, and we can defer performance work to later. However, I would prefer getting at least the deferred migrations running in parallel with the rest before merge, since this looks like a great opportunity and it needs testing asap.

log.Log(rt.INFO, "All %d done after %v (%.0f/s). Flushing state tree root.", doneCount, elapsed, rate)
log.Log(rt.INFO, "All %d done after %v (%.0f/s). Starting deferred migrations.", doneCount, elapsed, rate)

// Fetch actor states needed for deferred migrations
Contributor

If we can get away with this, it is nice and easy. However, if we find that this is a time bottleneck, we will want to refactor it to run in parallel with the miner migrations. If I remember correctly, most migration workloads spend the bulk of their time blocking on blockstore I/O; if that's the case, then running these jobs alongside the others could basically remove the entire wait time of the deferred migrations.

I think the simplest way to do this is to add a one-off goroutine, running in parallel, that skips the job and result channel machinery and has this code inside it, and then block on both grp.Wait() and this goroutine.

I guess this is the most useful place to focus on speeding up the migration.

Contributor

For thread safety, without hacking into the main wait-group logic, you can read inputs from actorsIn before spawning the goroutine and write to actorsOut after all other migrations have finished.
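
A sketch of this suggestion, with hypothetical loadInputs / runDeferred / writeOutputs callbacks standing in for the reads from actorsIn and writes to actorsOut; the point is the ordering, not the exact types:

package migration

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// deferredResults is a placeholder for the migrated verifreg and market heads.
type deferredResults struct{}

func runWithDeferred(
	ctx context.Context,
	grp *errgroup.Group, // the existing worker pool for the regular migrations
	loadInputs func() (any, error), // reads from actorsIn, done before spawning
	runDeferred func(context.Context, any) (deferredResults, error), // verifreg, then market
	writeOutputs func(deferredResults) error, // writes to actorsOut, done last
) error {
	inputs, err := loadInputs()
	if err != nil {
		return err
	}

	deferredErr := make(chan error, 1)
	var out deferredResults
	go func() {
		var gerr error
		out, gerr = runDeferred(ctx, inputs)
		deferredErr <- gerr
	}()

	// Block on the worker pool (1A) and on the one-off goroutine (1B).
	if err := grp.Wait(); err != nil {
		return err
	}
	if err := <-deferredErr; err != nil {
		return err
	}

	// All other migrations have finished, so writing to actorsOut is now safe.
	return writeOutputs(out)
}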

builtin/v9/migration/top.go (outdated, resolved)
builtin/v9/verifreg/verifreg_types.go (outdated, resolved)
geoff-vball and others added 3 commits October 6, 2022 11:25
* Add token actor default bitwitdth

* methods to return map of all allocation or claims for an actor
@anorth (Member) left a comment

I reviewed primarily for correctness of the state migrations themselves. I'm pretty far from the mechanics of how they are executed now and did not dive deep there.

builtin/v9/datacap/datacap_state.go (resolved)
builtin/shared.go (resolved)
builtin/v9/verifreg/verified_registry_state.go (outdated, resolved)
@@ -165,3 +179,265 @@ func (m minerMigrator) migrateState(ctx context.Context, store cbor.IpldStore, i
newHead: newHead,
}, err
}

func (m minerMigrator) migratePrecommits(ctx context.Context, wrappedStore adt8.Store, inRoot cid.Cid) (cid.Cid, error) {
Member

FYI @Kubuxu it's worth you casting an eye over this method

builtin/v9/migration/verifreg.go (resolved)
builtin/v9/migration/verifreg.go (outdated, resolved)
builtin/v9/migration/verifreg.go (outdated, resolved)
Comment on lines +106 to +108
if uint64(allocation.Client) != clientId {
return nil, false, xerrors.Errorf("clientId: %d did not match client in allocation: %d", clientId, allocation.Client)
}
Member

If the state somehow ended up like this, I don't think this is the right place to catch it. This does not mirror the Rust code.

Contributor Author

I'd prefer to leave it -- I think it's reasonable for the state layer to know that if it's asked to find a claim for a specific provider, then the returned claim should match that provider.

No harm done in the sanity check, I think.

Comment on lines +140 to +142
if uint64(claim.Provider) != providerId {
return nil, false, xerrors.Errorf("providerId: %d did not match provider in claim: %d", providerId, claim.Provider)
}
Member

Ditto. Less smarts in the state layer.

builtin/v9/verifreg/verified_registry_state.go (outdated, resolved)
if err = clientAllocationMap.Put(nextAllocationID, &verifreg9.Allocation{
Client: abi.ActorID(clientIDu64),
Provider: abi.ActorID(providerIDu64),
Data: proposal.PieceCID,
Size: proposal.PieceSize,
TermMin: proposal.Duration(),
TermMax: market9.DealMaxDuration,
Expiration: verifreg9.MaximumVerifiedAllocationExpiration + priorEpoch,
TermMax: market9.DealMaxDuration + market9.MarketDefaultAllocationTermBuffer,
Member

I think we want proposal.Duration() + market9.MarketDefaultAllocationTermBuffer

@anorth anorth self-requested a review October 6, 2022 23:16
@arajasek arajasek mentioned this pull request Oct 7, 2022
@arajasek arajasek merged commit f69e651 into master Oct 7, 2022
Successfully merging this pull request may close these issues: FIP-0045: Migration