
FIP-0045: Implement migrations #85

Merged: 13 commits merged into master from asr/verifreg-migration on Oct 7, 2022
Conversation

@arajasek (Contributor) commented Oct 2, 2022

Closes #69

The sequence of steps is as follows (see the sketch after this list):

  • Step 0: Add the DataCap actor to the v8 state tree with a fake Head
  • Step 1A: Migrate all actors except the StorageMarketActor and VerifiedRegistryActor as part of the "regular" migration workflow (more details on the migrations below), set the results in the output state tree
  • Step 1B: In a separate, parallel goroutine migrate the VerifiedRegistryActor, and then the StorageMarketActor
  • Block on 1A and 1B
  • Step 2: Set the VerifiedRegistryActor and StorageMarketActor in the output state tree
  • Step 3: Flush and return
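
A minimal sketch of the Step 1A / 1B structure above, in Go; migrateRegularActors, migrateVerifregThenMarket, and setDeferredActorsAndFlush are hypothetical stand-ins (the real top-level migration drives Step 1A through its own job/result channel machinery):

package migration

import (
	"context"

	"github.com/ipfs/go-cid"
	"golang.org/x/sync/errgroup"
)

func runSteps(
	ctx context.Context,
	migrateRegularActors func(context.Context) error, // Step 1A
	migrateVerifregThenMarket func(context.Context) error, // Step 1B
	setDeferredActorsAndFlush func() (cid.Cid, error), // Steps 2 and 3
) (cid.Cid, error) {
	grp, grpCtx := errgroup.WithContext(ctx)

	// Step 1A: every actor except the StorageMarketActor and VerifiedRegistryActor.
	grp.Go(func() error { return migrateRegularActors(grpCtx) })

	// Step 1B: verifreg first, then market, in one goroutine, since the market
	// migration consumes the verifreg results.
	grp.Go(func() error { return migrateVerifregThenMarket(grpCtx) })

	// Block on 1A and 1B.
	if err := grp.Wait(); err != nil {
		return cid.Undef, err
	}

	// Step 2: set the two deferred actors in the output tree; Step 3: flush.
	return setDeferredActorsAndFlush()
}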

Miner Migration

The minerMigrator performs the following migrations (the FIP-0045 SimpleQAPower rule is sketched after this list):

  • FIP-0029: Sets the Beneficiary to the Owner, and sets empty values for BeneficiaryTerm and PendingBeneficiaryTerm
  • FIP-0034: For each SectorPreCommitOnChainInfo in PreCommittedSectors, calculates the unsealed CID (assuming there are deals)
  • FIP-0045: For each SectorOnChainInfo in Sectors, set SimpleQAPower = (DealWeight == 0 && VerifiedDealWeight == 0)
  • FIP-0045: For each Deadline in Deadlines: for each SectorOnChainInfo in SectorsSnapshot, set SimpleQAPower = (DealWeight == 0 && VerifiedDealWeight == 0)
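
A hedged sketch of the FIP-0045 flag computation above; simpleQAPower is a hypothetical helper, and copying the remaining fields from the v8 SectorOnChainInfo into its v9 counterpart is elided:

package migration

import (
	"github.com/filecoin-project/go-state-types/big"
	miner8 "github.com/filecoin-project/go-state-types/builtin/v8/miner"
)

// simpleQAPower is true exactly when the sector was onboarded with no deals,
// i.e. both its deal weight and its verified deal weight are zero.
func simpleQAPower(in *miner8.SectorOnChainInfo) bool {
	return in.DealWeight.Equals(big.Zero()) && in.VerifiedDealWeight.Equals(big.Zero())
}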

The migration has the following layers of caching (see the sketch after this list):

  • at the ActorHead level -- if a matching (address, Head) tuple is found in the cache, use the cached value
  • at the SectorsAmt level -- if a matching Sectors CID is found in the cache, use the cached value
    • This is used for migrating both the Sectors AMT, as well as the SectorsSnapshot AMTs in each deadline
  • at the "previous SectorsAmt" level -- if we previously recorded the input and output of migrating this particular miner's SectorsAmt (at premigration time), load them and transform the previous output into the current output based on the diff between the previous input and the current input
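
An illustrative use of the second layer, with the cache's Load signature taken from the snippets quoted later in this review; the sectorsCache interface and migrateSectorsAmt callback are placeholders for the real types:

package migration

import (
	"github.com/ipfs/go-cid"
)

// sectorsCache captures only the Load behaviour used in this sketch: the
// closure runs on a cache miss, and the cached CID is returned on a hit.
type sectorsCache interface {
	Load(key string, loadFunc func() (cid.Cid, error)) (cid.Cid, error)
}

// migrateSectorsCached rewrites a Sectors AMT at most once per distinct input
// CID: a deadline whose SectorsSnapshot CID matches an already-migrated
// Sectors AMT reuses the cached output instead of walking the AMT again.
func migrateSectorsCached(cache sectorsCache, key string, sectorsIn cid.Cid,
	migrateSectorsAmt func(cid.Cid) (cid.Cid, error)) (cid.Cid, error) {
	return cache.Load(key, func() (cid.Cid, error) {
		return migrateSectorsAmt(sectorsIn)
	})
}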

Further caching is possible at the following levels:

  • at the PreCommits level -- if a matching PreCommittedSectors CID is found in the cache, use the cached value
  • at the individual PreCommit level -- if a matching SectorPreCommitOnChainInfo CID is found in the cache, use the cached value

DataCap "migration"

The DataCap State is set based on the Verified Registry's VerifiedClients map. For each client (see the sketch after this list):

  • its balance is converted to tokens that are added to the TokenState
  • an entry is created in the TokenState's allowances map, with infinite allowance and the Market actor as operator
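
A deliberately simplified sketch of this construction, using plain Go maps in place of the token state's HAMTs; buildDataCapBalances, marketID, and infiniteAllowance are illustrative names, and any unit scaling the real migration applies to the datacap amounts is omitted:

package migration

import (
	"github.com/filecoin-project/go-state-types/abi"
	"github.com/filecoin-project/go-state-types/big"
)

// buildDataCapBalances maps every verified client to a token balance equal to
// its remaining datacap, and creates one allowance entry per client with the
// market actor as operator and an effectively unlimited allowance.
func buildDataCapBalances(
	verifiedClients map[abi.ActorID]abi.StoragePower, // remaining datacap per client
	marketID abi.ActorID,
	infiniteAllowance big.Int, // placeholder for the "infinite" allowance constant
) (map[abi.ActorID]abi.TokenAmount, map[abi.ActorID]map[abi.ActorID]big.Int) {
	balances := make(map[abi.ActorID]abi.TokenAmount)
	allowances := make(map[abi.ActorID]map[abi.ActorID]big.Int)
	for client, datacap := range verifiedClients {
		balances[client] = big.Int(datacap) // balance converted to tokens
		allowances[client] = map[abi.ActorID]big.Int{marketID: infiniteAllowance}
	}
	return balances, allowances
}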

VerifiedRegistry & Market migration

As in #69

@arajasek force-pushed the asr/verifreg-migration branch from 9586bb6 to 9189cb8 on October 3, 2022 at 17:59
builtin/v8/util/adt/set.go (resolved)
builtin/v9/migration/top.go (outdated, resolved)

return nil
})
newPrecommits, err := m.migratePrecommits(ctx, wrappedStore, inState.PreCommittedSectors)
Contributor

Potentially worth caching

Contributor Author

Agreed. I'm gonna start an issue capturing possible further caching / speedups. I do want to keep the cache focused on highest impact stuff, because it can get big...

builtin/v9/migration/miner.go (resolved)
for _, dealID := range info.Info.DealIDs {
deal, err := m.proposals.GetDealProposal(dealID)
if err != nil {
// Possible for the proposal to be missing if it's expired (but the deal is still in a precommit that's yet to be cleaned up)
Contributor

Nice let's make sure we test this

builtin/v9/migration/miner.go (outdated, resolved)
builtin/v9/migration/miner.go (outdated, resolved)
builtin/v9/migration/miner.go (resolved)
builtin/v9/migration/miner.go (outdated, resolved)
return cid.Undef, err
}

outSectorsSnapshotCid, err := cache.Load(SectorsAmtKey(inDeadline.SectorsSnapshot), func() (cid.Cid, error) {
Contributor

If we are doing diffing in the SectorInfo context, it probably makes sense to do diffing here too. The function inside Load should then be the extracted function used above.

Contributor

And to be clearer, I'm suggesting diffing against MinerPrevSectorsInKey(minerAddr). We're never going to be more than a day plus the wait time between premigrations, so the diffs stay small.

If things are still taking a long time and you're feeling experimental, you could introduce deadline-index/address keys to diff against previously cached deadline sectors snapshots, but I suspect that won't be a good complexity/performance tradeoff.

Contributor Author

It's a little bit more complicated than that. The diffing for the Sectors AMT is based on snapshotting (for each miner) the input and output tree at the time of the premigration.

It's unclear what the equivalent would be here. We could snapshot each deadline for each miner, but that feels like a lot. We could also just use the input/output of the migration of the "real" Sectors AMT (which we perform right before we start migrating deadlines) as the basis of the diff for the sectors snapshots in each of the deadlines. Realistically, that's as good a baseline as any for the snapshotted AMTs.

My thought was that all but 3-4 of the deadlines per miner will be perfectly cached by the premigration anyway (since their SectorsSnapshots can't change in the 2 hours between the premigration and the migration). Further, if they do change, there's a good chance the snapshot winds up being the same as the "real" Sectors AMT, and so is found in the cache from the previous step.

All that to say: I agree that some caching is possible here, but there are open questions about the best way to do it. Part of me is wary of introducing more caching than necessary, lest doing so introduce bugs.

Contributor

> We could also just use the input/output of the migration of the "real" Sectors AMT (which we perform right before we start migrating deadlines) as the basis of the diff for the sectors snapshots in each of the deadlines. Realistically, that's as good a baseline as any for the snapshotted AMTs.

Yeah, I think this is what I was trying to suggest with "diffing against MinerPrevSectorsInKey(minerAddr)", though I'm missing some details. Good point that "all but 3-4 of the deadlines per miner will be perfectly cached by the premigration anyway", so doing more caching is definitely not worth it.

Contributor

Unless the migration is already coming in under a blocktime, we need to do diffing here. The cost savings are on the order of the entire miner migration times the fraction of miners that add or compact one or more sectors during the premigration period. It's ok to leave this for a follow-up if you want to measure first.

Contributor Author

Yeah, I'd prefer to leave that to a followup if that works for you -- see #69.

builtin/v9/migration/miner.go (resolved)
Comment on lines 30 to 46
func GetBuiltinActorsKeys() []string {
keys := []string{
AccountKey,
CronKey,
DataCapKey,
InitKey,
MarketKey,
MinerKey,
MultisigKey,
PaychKey,
PowerKey,
RewardKey,
SystemKey,
VerifregKey,
}
return keys
}
@geoff-vball (Contributor) commented Oct 3, 2022

We have a version of this in chain/actors/manifest.go. I think this is a better location for it, but it should still take the actors version as an input.
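
A sketch of the suggested version-aware variant (the version type and comparison are placeholders, not the repo's actual API); the idea is that the key set depends on the actors version, since the DataCap actor only exists from v9 onwards:

// GetBuiltinActorsKeys, hypothetically parameterised by actors version.
func GetBuiltinActorsKeys(actorsVersion uint) []string {
	keys := []string{
		AccountKey,
		CronKey,
		InitKey,
		MarketKey,
		MinerKey,
		MultisigKey,
		PaychKey,
		PowerKey,
		RewardKey,
		SystemKey,
		VerifregKey,
	}
	if actorsVersion >= 9 {
		keys = append(keys, DataCapKey)
	}
	return keys
}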

@ZenGround0 (Contributor) left a comment

A couple things before looking at top level

builtin/v9/migration/miner.go (resolved)
Data: proposal.PieceCID,
Size: proposal.PieceSize,
TermMin: proposal.Duration(),
TermMax: market9.DealMaxDuration,
Contributor

For consistency with market policy:

 min(
        alloc_term_min + policy.market_default_allocation_term_buffer,
        policy.maximum_verified_allocation_term,
    )

Contributor Author

Hmm, I implemented this based on the FIP https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0045.md#migration. Not sure what to do.

Contributor

Offline I said it's ok to keep this as-is. One small change: I think you should set TermMax to DealMaxDuration + MarketDefaultAllocationTermBuffer. This way, deals with the maximal deal lifetime get the same TermMax as they would from the market code.

TermMin: proposal.Duration(),
TermMax: market9.DealMaxDuration,
// TODO: priorEpoch + 1???
Expiration: verifreg9.MaximumVerifiedAllocationExpiration + priorEpoch,
Contributor

For consistency with market policy:

min(deal.proposal.start_epoch, prior_epoch + policy.maximum_verified_allocation_expiration)

prior_epoch should be fine because the deal was made at the latest during the epoch before the migration.


clientAllocationMap, ok := allocationsMapMap[clientIDAddress]
if !ok {
clientAllocationMap, err = adt9.AsMap(adtStore, emptyMapCid, builtin.DefaultHamtBitwidth)
Contributor

I don't think this adt.Map ever gets added to the allocationsMapMap golang map (https://go.dev/play/p/ouAvN7Ea813), causing all but the last iterated verified deal from each client to be dropped.

Contributor Author

GOOD catch, thank you.
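
A sketch of the fix implied by this exchange (the merged code may differ in detail): the newly created map must be stored back into the Go map so that later deals from the same client find and reuse it.

clientAllocationMap, ok := allocationsMapMap[clientIDAddress]
if !ok {
	clientAllocationMap, err = adt9.AsMap(adtStore, emptyMapCid, builtin.DefaultHamtBitwidth)
	if err != nil {
		return err // propagate as the surrounding function does
	}
	// Remember the freshly created map so the next deal from this client
	// reuses it instead of starting again from an empty HAMT.
	allocationsMapMap[clientIDAddress] = clientAllocationMap
}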

* Add accessors for allocation HAMT in verifreg actor
@ZenGround0 (Contributor) left a comment

I'm ok with this merging once correctness is addressed, and we can defer performance work to later. However, I would prefer getting at least the deferred migrations running in parallel with the rest before merge, since this looks like a great opportunity and it needs testing asap.

log.Log(rt.INFO, "All %d done after %v (%.0f/s). Flushing state tree root.", doneCount, elapsed, rate)
log.Log(rt.INFO, "All %d done after %v (%.0f/s). Starting deferred migrations.", doneCount, elapsed, rate)

// Fetch actor states needed for deferred migrations
Contributor

If we can get away with this, it is nice and easy. However, if we find that this is a time bottleneck, we will want to refactor it to run in parallel with the miner migrations. If I remember correctly, most migration workloads spend the bulk of their time blocking on blockstore I/O; if that's the case, then running these jobs alongside the others could basically remove the entire wait time of the deferred migrations.

I think the simplest way to do this is to add a one-off goroutine, running in parallel, that skips the job and result channel machinery and has this code inside it, and then block on both grp.Wait() and this goroutine.

I guess this is the most useful place to focus on speeding up the migration.

Contributor

For thread safety, without hacking into the main wait-group logic, you can read inputs from actorsIn before spawning the goroutine and write to actorsOut after all other migrations have finished.
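
A sketch of this suggestion, with hypothetical loadInputs / runDeferred / writeOutputs callbacks standing in for the reads from actorsIn and writes to actorsOut; the point is the ordering, not the exact types:

package migration

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// deferredResults is a placeholder for the migrated verifreg and market heads.
type deferredResults struct{}

func runWithDeferred(
	ctx context.Context,
	grp *errgroup.Group, // the existing worker pool for the regular migrations
	loadInputs func() (any, error), // reads from actorsIn, done before spawning
	runDeferred func(context.Context, any) (deferredResults, error), // verifreg, then market
	writeOutputs func(deferredResults) error, // writes to actorsOut, done last
) error {
	inputs, err := loadInputs()
	if err != nil {
		return err
	}

	deferredErr := make(chan error, 1)
	var out deferredResults
	go func() {
		var gerr error
		out, gerr = runDeferred(ctx, inputs)
		deferredErr <- gerr
	}()

	// Block on the worker pool (1A) and on the one-off goroutine (1B).
	if err := grp.Wait(); err != nil {
		return err
	}
	if err := <-deferredErr; err != nil {
		return err
	}

	// All other migrations have finished, so writing to actorsOut is now safe.
	return writeOutputs(out)
}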

builtin/v9/migration/top.go (outdated, resolved)
builtin/v9/verifreg/verifreg_types.go (outdated, resolved)
geoff-vball and others added 3 commits October 6, 2022 11:25
* Add token actor default bitwitdth

* methods to return map of all allocation or claims for an actor
@anorth (Member) left a comment

I reviewed primarily for correctness of the state migrations themselves. I'm pretty far from the mechanics of how they are executed now and did not dive deep there.

builtin/v9/datacap/datacap_state.go (resolved)
builtin/shared.go (resolved)
builtin/v9/verifreg/verified_registry_state.go (outdated, resolved)
@@ -165,3 +179,265 @@ func (m minerMigrator) migrateState(ctx context.Context, store cbor.IpldStore, i
newHead: newHead,
}, err
}

func (m minerMigrator) migratePrecommits(ctx context.Context, wrappedStore adt8.Store, inRoot cid.Cid) (cid.Cid, error) {
Member

FYI @Kubuxu it's worth you casting an eye over this method

builtin/v9/migration/verifreg.go (resolved)
builtin/v9/migration/verifreg.go (outdated, resolved)
builtin/v9/migration/verifreg.go (outdated, resolved)
Comment on lines +106 to +108
if uint64(allocation.Client) != clientId {
return nil, false, xerrors.Errorf("clientId: %d did not match client in allocation: %d", clientId, allocation.Client)
}
Member

If the state somehow ended up like this, I don't think this is the right place to catch it. This does not mirror the Rust code.

Contributor Author

I'd prefer to leave it -- I think it's reasonable for the state layer to know that if it's asked to find a claim for a specific provider, then the returned claim should match that provider.

No harm done in the sanity check, I think.

Comment on lines +140 to +142
if uint64(claim.Provider) != providerId {
return nil, false, xerrors.Errorf("providerId: %d did not match provider in claim: %d", providerId, claim.Provider)
}
Member

Ditto. Less smarts in the state layer.

builtin/v9/verifreg/verified_registry_state.go (outdated, resolved)
if err = clientAllocationMap.Put(nextAllocationID, &verifreg9.Allocation{
Client: abi.ActorID(clientIDu64),
Provider: abi.ActorID(providerIDu64),
Data: proposal.PieceCID,
Size: proposal.PieceSize,
TermMin: proposal.Duration(),
TermMax: market9.DealMaxDuration,
Expiration: verifreg9.MaximumVerifiedAllocationExpiration + priorEpoch,
TermMax: market9.DealMaxDuration + market9.MarketDefaultAllocationTermBuffer,
Member

I think we want proposal.Duration() + market9.MarketDefaultAllocationTermBuffer

@anorth anorth self-requested a review October 6, 2022 23:16
@arajasek arajasek mentioned this pull request Oct 7, 2022
@arajasek arajasek merged commit f69e651 into master Oct 7, 2022
Successfully merging this pull request may close these issues: FIP-0045: Migration