FIP-0045: Implement migrations #85
Conversation
    return nil
})
newPrecommits, err := m.migratePrecommits(ctx, wrappedStore, inState.PreCommittedSectors)
Potentially worth caching
Agreed. I'm gonna start an issue capturing possible further caching / speedups. I do want to keep the cache focused on highest impact stuff, because it can get big...
for _, dealID := range info.Info.DealIDs {
    deal, err := m.proposals.GetDealProposal(dealID)
    if err != nil {
        // Possible for the proposal to be missing if it's expired (but the deal is still in a precommit that's yet to be cleaned up)
Nice let's make sure we test this
    return cid.Undef, err
}

outSectorsSnapshotCid, err := cache.Load(SectorsAmtKey(inDeadline.SectorsSnapshot), func() (cid.Cid, error) {
If we are doing diffing in the SectorInfo context, it probably makes sense to do diffing here too. The function inside Load should then be the extracted function used above.
And to be more clear, I'm suggesting diffing against MinerPrevSectorsInKey(minerAddr). We're never going to have more than one day plus the premigration wait time's worth of diffs.
If things are still taking a long time and you are feeling experimental, you could introduce deadline index / address keys to do diffing against previously cached deadline sectors snapshots, but I guess that this won't be a good complexity/performance tradeoff.
It's a little bit more complicated than that. The diffing for the Sectors AMT is based on snapshotting (for each miner) the input and output tree at the time of the premigration.
It's unclear what the equivalent to do here would be. We could snapshot each deadline for each miner, but that feels like a lot. We could also just use the input/output of the migration of the "real" Sectors AMT (which we perform right before we start migrating deadlines) as the basis of the diff for the sectors snapshots in each of the deadlines. Realistically, that's as good a baseline as any for the snapshotted AMTs.
My thought was that all but 3-4 of the deadlines per miner will be perfectly cached by the premigration anyway (since their SectorsSnapshots can't change in the 2 hours between the premigration and the migration). Further, if they do change, there's a good chance the snapshot winds up being the same as the "real" Sectors AMT, and so is found in the cache from the previous step.
All that to say, I agree that some caching is possible here, but there are some open questions about the best way to do it. Some part of me is wary of introducing more caching than necessary lest doing so introduce bugs.
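A rough sketch of that approach, if it is picked up later; loadPrevSectorsPair and migrateSectorsWithDiff are hypothetical names for the premigration baseline lookup and the diff-based migrator, not the actual API:

```go
// Sketch only: reuse the input/output pair recorded when this miner's "real"
// Sectors AMT was migrated as the diff baseline for the deadline's snapshot.
outSnapshot, err := cache.Load(SectorsAmtKey(inDeadline.SectorsSnapshot), func() (cid.Cid, error) {
    prevIn, prevOut, err := loadPrevSectorsPair(cache, minerAddr) // hypothetical helper
    if err != nil {
        return cid.Undef, err
    }
    // Diff the snapshot against prevIn and patch only the changed sectors into prevOut.
    return m.migrateSectorsWithDiff(ctx, wrappedStore, inDeadline.SectorsSnapshot, prevIn, prevOut)
})
```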
> We could also just use the input/output of the migration of the "real" Sectors AMT (which we perform right before we start migrating deadlines) as the basis of the diff for the sectors snapshots in each of the deadlines. Realistically, that's as good a baseline for the snapshotted AMTs.
Yeah I think this is what I was trying to suggest with "diffing against MinerPrevSectorsInKey(minerAddr)", though I'm missing some details. Good point that "all but 3-4 of the deadlines per miner will be perfectly cached by the premigration anyway", so doing more caching is definitely not worth it.
Unless the migration is already coming in under a blocktime, we need to do diffing here. Cost savings are on the order of the entire miner migration * the fraction of miners that add or compact one or more sectors during the premigration period. It's ok to leave for a follow up if you want to measure first.
Yeah, I'd prefer to leave that to a followup if that works for you -- see #69.
manifest/manifest.go (Outdated)
func GetBuiltinActorsKeys() []string {
    keys := []string{
        AccountKey,
        CronKey,
        DataCapKey,
        InitKey,
        MarketKey,
        MinerKey,
        MultisigKey,
        PaychKey,
        PowerKey,
        RewardKey,
        SystemKey,
        VerifregKey,
    }
    return keys
}
We have a version of this in chain/actors/manifest.go. I think this is a better location for it, but it still should take the actors version as an input.
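A hedged sketch of the version-aware variant being suggested; the actorstypes.Version parameter and the v9 gating of DataCapKey are assumptions, not the final API:

```go
// Sketch: parameterize the key list by actors version (type/constant names assumed).
func GetBuiltinActorsKeys(av actorstypes.Version) []string {
    keys := []string{
        AccountKey, CronKey, InitKey, MarketKey, MinerKey, MultisigKey,
        PaychKey, PowerKey, RewardKey, SystemKey, VerifregKey,
    }
    // Assumption for this sketch: the datacap actor only exists from actors v9 onwards.
    if av >= actorstypes.Version9 {
        keys = append(keys, DataCapKey)
    }
    return keys
}
```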
A couple things before looking at top level
builtin/v9/migration/verifreg.go (Outdated)
Data:    proposal.PieceCID,
Size:    proposal.PieceSize,
TermMin: proposal.Duration(),
TermMax: market9.DealMaxDuration,
For consistency with market policy:
min(
alloc_term_min + policy.market_default_allocation_term_buffer,
policy.maximum_verified_allocation_term,
)
Hmm, I implemented this based on the FIP: https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0045.md#migration. Not sure what to do.
Offline I said it's ok to keep as is. One small change: I think you should set TermMax to DealMaxDuration + MarketDefaultAllocationTermBuffer. This way, deals with the maximal deal lifetime get the same TermMax as they would from the market code.
builtin/v9/migration/verifreg.go (Outdated)
TermMin:    proposal.Duration(),
TermMax:    market9.DealMaxDuration,
// TODO: priorEpoch + 1???
Expiration: verifreg9.MaximumVerifiedAllocationExpiration + priorEpoch,
For consistency with market policy:
min(deal.proposal.start_epoch, prior_epoch + policy.maximum_verified_allocation_expiration)
prior_epoch should be fine because the deal was made, at the latest, during the epoch before the migration.
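Putting the TermMax and Expiration suggestions together, a sketch of the allocation the migration might build (the min is written out explicitly; whether TermMax should be based on DealMaxDuration or proposal.Duration() is revisited further down):

```go
// Sketch, per the review suggestions above:
//   TermMax    = DealMaxDuration + MarketDefaultAllocationTermBuffer
//   Expiration = min(proposal.StartEpoch, priorEpoch + MaximumVerifiedAllocationExpiration)
termMax := market9.DealMaxDuration + market9.MarketDefaultAllocationTermBuffer

expiration := priorEpoch + verifreg9.MaximumVerifiedAllocationExpiration
if proposal.StartEpoch < expiration {
    expiration = proposal.StartEpoch
}

alloc := verifreg9.Allocation{
    Client:     abi.ActorID(clientIDu64),
    Provider:   abi.ActorID(providerIDu64),
    Data:       proposal.PieceCID,
    Size:       proposal.PieceSize,
    TermMin:    proposal.Duration(),
    TermMax:    termMax,
    Expiration: expiration,
}
```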
clientAllocationMap, ok := allocationsMapMap[clientIDAddress]
if !ok {
    clientAllocationMap, err = adt9.AsMap(adtStore, emptyMapCid, builtin.DefaultHamtBitwidth)
I don't think this adt.Map ever gets added to the allocationsMapMap Go map (https://go.dev/play/p/ouAvN7Ea813), so only the last iterated verified deal from a client ends up being added.
GOOD catch, thank you.
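A minimal sketch of the fix: write the freshly created map back into the Go map so later deals from the same client reuse it (the error handling shown is illustrative):

```go
clientAllocationMap, ok := allocationsMapMap[clientIDAddress]
if !ok {
    clientAllocationMap, err = adt9.AsMap(adtStore, emptyMapCid, builtin.DefaultHamtBitwidth)
    if err != nil {
        return cid.Undef, err
    }
    // Without this write-back, each deal from the client gets a fresh, empty
    // adt.Map and earlier allocations are lost.
    allocationsMapMap[clientIDAddress] = clientAllocationMap
}
```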
* Add accessors for allocation HAMT in verifreg actor
I'm ok with this merging once correctness is addressed, deferring performance to later. However, I would prefer getting at least the deferred migrations running in parallel with the rest before merge, since this looks like a great opportunity and is in need of testing asap.
builtin/v9/migration/top.go (Outdated)
log.Log(rt.INFO, "All %d done after %v (%.0f/s). Flushing state tree root.", doneCount, elapsed, rate)
log.Log(rt.INFO, "All %d done after %v (%.0f/s). Starting deferred migrations.", doneCount, elapsed, rate)

// Fetch actor states needed for deferred migrations
If we can get away with this, it is nice and easy. However, if we find that this is a time bottleneck, we will want to refactor this to run in parallel with the miner migrations. If I remember correctly, most migration workloads spend the bulk of their time blocking on blockstore IO. If that's the case, then doing these jobs alongside the others could basically remove the entire wait time of the deferred migrations.
I think the simplest way to do this is to add a one-off goroutine in parallel that skips the job and result channel machinery, with this code inside it, and then block on grp.Wait() plus waiting on this goroutine.
I guess that this is the most useful place to focus on speeding up the migration
For thread safety without hacking into the main wait group logic, you can read inputs from actorsIn before spawning the goroutine and write to actorsOut after all other migrations have finished.
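A hedged sketch of that shape; readDeferredInputs and runDeferredMigrations are placeholder names, and the real job/result plumbing in top.go is more involved:

```go
// Read everything the deferred migrations need from actorsIn up front, run them
// alongside the per-actor jobs, and only touch actorsOut once everything has finished.
deferredInputs, err := readDeferredInputs(ctx, actorsIn) // placeholder
if err != nil {
    return cid.Undef, err
}

deferredDone := make(chan error, 1)
go func() {
    deferredDone <- runDeferredMigrations(ctx, store, deferredInputs) // placeholder
}()

if err := grp.Wait(); err != nil { // existing worker-pool wait
    return cid.Undef, err
}
if err := <-deferredDone; err != nil {
    return cid.Undef, err
}
// Safe to write the deferred results into actorsOut now.
```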
* Add token actor default bitwidth
* Add methods to return a map of all allocations or claims for an actor
I reviewed primarily for correctness of the state migrations themselves. I'm pretty far from the mechanics of how they are executed now and did not dive deep there.
@@ -165,3 +179,265 @@ func (m minerMigrator) migrateState(ctx context.Context, store cbor.IpldStore, i
        newHead: newHead,
    }, err
}

func (m minerMigrator) migratePrecommits(ctx context.Context, wrappedStore adt8.Store, inRoot cid.Cid) (cid.Cid, error) {
FYI @Kubuxu it's worth you casting an eye over this method
if uint64(allocation.Client) != clientId {
    return nil, false, xerrors.Errorf("clientId: %d did not match client in allocation: %d", clientId, allocation.Client)
}
If the state somehow ended up like this, I don't think this is the right place to catch it. This does not mirror the Rust code.
I'd prefer to leave it -- I think it's reasonable for the state layer to know that if it's asked to find a claim for a specific provider, then the returned claim should match that provider.
No harm done in the sanity check, I think.
if uint64(claim.Provider) != providerId {
    return nil, false, xerrors.Errorf("providerId: %d did not match provider in claim: %d", providerId, claim.Provider)
}
Ditto. Less smarts in the state layer.
if err = clientAllocationMap.Put(nextAllocationID, &verifreg9.Allocation{
    Client:     abi.ActorID(clientIDu64),
    Provider:   abi.ActorID(providerIDu64),
    Data:       proposal.PieceCID,
    Size:       proposal.PieceSize,
    TermMin:    proposal.Duration(),
    TermMax:    market9.DealMaxDuration,
    Expiration: verifreg9.MaximumVerifiedAllocationExpiration + priorEpoch,
    TermMax:    market9.DealMaxDuration + market9.MarketDefaultAllocationTermBuffer,
I think we want proposal.Duration() + market9.MarketDefaultAllocationTermBuffer
Closes #69
The sequence of steps is:
Miner Migration

The minerMigrator performs the following migrations:

- Sets the `Beneficiary` to the `Owner`, and sets empty values for `BeneficiaryTerm` and `PendingBeneficiaryTerm`
- For each `SectorPreCommitOnChainInfo` in `PreCommittedSectors`, calculates the unsealed CID (assuming there are deals)
- For each `SectorOnChainInfo` in `Sectors`, sets `SimpleQAPower = (DealWeight == 0 && VerifiedDealWeight == 0)` (see the sketch below)
- For each `SectorOnChainInfo` in each deadline's `SectorsSnapshot`, sets `SimpleQAPower = (DealWeight == 0 && VerifiedDealWeight == 0)`
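A minimal sketch of the per-sector mapping behind the `SimpleQAPower` bullets above; the field list is abbreviated and the miner8/miner9 package aliases are assumptions:

```go
// Sketch: copy a v8 sector record and derive the new SimpleQAPower flag.
func migrateSectorInfo(in miner8.SectorOnChainInfo) miner9.SectorOnChainInfo {
    return miner9.SectorOnChainInfo{
        SectorNumber:       in.SectorNumber,
        SealProof:          in.SealProof,
        SealedCID:          in.SealedCID,
        DealIDs:            in.DealIDs,
        Activation:         in.Activation,
        Expiration:         in.Expiration,
        DealWeight:         in.DealWeight,
        VerifiedDealWeight: in.VerifiedDealWeight,
        // ... remaining fields copied unchanged ...

        // New in v9: true iff the sector carries no deal weight of either kind.
        SimpleQAPower: in.DealWeight.IsZero() && in.VerifiedDealWeight.IsZero(),
    }
}
```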
The migration has the following layers of caching:

- `ActorHead` level -- if a matching `(address, Head)` tuple is found in the cache, use the cached value
- `SectorsAmt` level -- if a matching `Sectors` CID is found in the cache, use the cached value; this is used for both the `Sectors` AMT and the `SectorsSnapshot` AMTs in each deadline (see the sketch below)
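Roughly how those two layers compose, as a sketch; ActorHeadKey and SectorsAmtKey follow the naming used in the PR, while migrateSectorsAmt and migrateRemainingState are placeholders:

```go
// Outer layer: skip the whole miner if this (address, head) pair was already
// migrated, e.g. during the premigration.
newHead, err := cache.Load(ActorHeadKey(addr, actorIn.Head), func() (cid.Cid, error) {
    // Inner layer: reuse previously migrated Sectors AMTs. The same key scheme
    // covers the top-level Sectors AMT and each deadline's SectorsSnapshot.
    newSectors, err := cache.Load(SectorsAmtKey(inState.Sectors), func() (cid.Cid, error) {
        return m.migrateSectorsAmt(ctx, store, inState.Sectors) // placeholder
    })
    if err != nil {
        return cid.Undef, err
    }
    return m.migrateRemainingState(ctx, store, inState, newSectors) // placeholder
})
// newHead then becomes the migrated actor's Head in the output state tree.
```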
Further caching is possible at the following levels:

- `PreCommits` level -- if a matching `PreCommittedSectors` CID is found in the cache, use the cached value
- `PreCommit` level -- if a matching `SectorPreCommitOnChainInfo` CID is found in the cache, use the cached value

DataCap "migration"
The `DataCap` state is set based on the Verified Registry's `VerifiedClients` map. For each client (roughly sketched below):

- the client's balance is set in the `TokenState`
- an entry is added to the `allowances` map, with infinite allowance and the Market actor as operator
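A rough sketch of that per-client loop; setBalance, setAllowance, and infiniteAllowance are placeholders for the actual token-state accessors:

```go
// Sketch: seed the datacap token state from the v8 verifreg VerifiedClients HAMT.
var dcap abi.StoragePower
err = verifiedClients.ForEach(&dcap, func(clientKey string) error {
    client, err := address.NewFromBytes([]byte(clientKey))
    if err != nil {
        return err
    }
    // The client's token balance mirrors its remaining DataCap.
    if err := setBalance(tokenState, client, dcap); err != nil {
        return err
    }
    // Infinite allowance, with the market actor as the only operator.
    return setAllowance(tokenState, client, builtin.StorageMarketActorAddr, infiniteAllowance)
})
```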
VerifiedRegistry & Market migration

As in #69