Apply block updates to split shards #4847
Conversation
I read through the PR and realized that there is nothing that I can efficiently review here without spending too much time on it. I defer my review vote to other reviewers who understand the problem space and the code base around those places much better than I do. Ping me if my vote blocks you from landing this PR.
Thanks @frol!
Generally looks good! Though I still could not fully follow all the logic, and it would be better if there were a sanity test :)
```rust
/// CatchingUp is for when apply_chunks is called through catchup_blocks, this is to catch up the
/// shard states for the next epoch
```
Maybe make this more explicit in the name of the variant?
Do you have any suggestions? I am not sure which part you want to be more explicit about.
Something like `StateCatchingUp`? I am also fine with it as is.
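For context, a minimal sketch of how the renamed variant might read. The three variant names come from this PR's description; the doc comments and derives here are illustrative, not nearcore's actual definition:

```rust
/// Sketch of the mode enum under discussion; only the variant names are from the PR.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ApplyChunksMode {
    /// All tracked shard states are up to date; apply updates normally.
    CaughtUp,
    /// States for the next epoch are not ready; state changes are stored for later.
    NotCaughtUp,
    /// Called through catchup_blocks to catch up shard states for the next epoch.
    /// The suggested, more explicit name would be `StateCatchingUp`.
    CatchingUp,
}

fn main() {
    let mode = ApplyChunksMode::CatchingUp;
    // The Debug name makes the mode visible in logs and assertions.
    assert_eq!(format!("{:?}", mode), "CatchingUp");
}
```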
chain/chain/src/chain.rs (outdated)
```rust
let chunk_extra = ChunkExtra::new(&state_root, CryptoHash::default(), vec![], 0, 0, 0);
chain_update.chain_store_update.save_chunk_extra(&prev_hash, &shard_uid, chunk_extra);
```
Could we add a comment explaining why `chunk_extra` here is initialized the way it is? We should also probably fix such code in the future :)
Yeah, right now I'm just using `ChunkExtra` to store the `state_root` and not using any other field in it.
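A self-contained sketch of the comment the reviewer asked for. The `ChunkExtra` and `CryptoHash` types below are simplified stand-ins for nearcore's real ones, with the same argument order as the snippet above:

```rust
// Stand-in types; nearcore's real ChunkExtra and CryptoHash carry more logic.
#[derive(Default, Clone, Debug, PartialEq)]
struct CryptoHash([u8; 32]);

#[derive(Debug)]
struct ChunkExtra {
    state_root: CryptoHash,
    outcome_root: CryptoHash,
    validator_proposals: Vec<u8>,
    gas_used: u64,
    gas_limit: u64,
    balance_burnt: u128,
}

impl ChunkExtra {
    fn new(
        state_root: &CryptoHash,
        outcome_root: CryptoHash,
        validator_proposals: Vec<u8>,
        gas_used: u64,
        gas_limit: u64,
        balance_burnt: u128,
    ) -> Self {
        Self {
            state_root: state_root.clone(),
            outcome_root,
            validator_proposals,
            gas_used,
            gas_limit,
            balance_burnt,
        }
    }
}

fn main() {
    let state_root = CryptoHash([1; 32]);
    // This ChunkExtra is only a carrier for `state_root` during resharding;
    // every other field is a placeholder and must not be read back.
    let chunk_extra = ChunkExtra::new(&state_root, CryptoHash::default(), vec![], 0, 0, 0);
    assert_eq!(chunk_extra.state_root, state_root);
}
```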
nearcore/src/runtime/mod.rs (outdated)
```rust
let next_shard_layout = {
    let next_block_epoch_id = if self.is_next_block_epoch_start(prev_block_hash)? {
        self.get_next_epoch_id_from_prev_block(prev_block_hash)?
    } else {
        self.get_epoch_id_from_prev_block(prev_block_hash)?
    };
    self.get_shard_layout(&next_block_epoch_id)?
};
let shard_uid = self.get_shard_uid_from_prev_hash(shard_id, prev_block_hash)?;
let (consolidated_state_changes, split_state_apply_results) =
    if self.will_shard_layout_change(prev_block_hash)? {
        let consolidated_state_changes =
            ConsolidatedStateChanges::from_raw_state_changes(&apply_result.state_changes);
        // split states are ready, apply update to them now
        if let Some(state_roots) = split_state_roots {
            let split_state_results = Some(self.apply_update_to_split_states(
                block_hash,
                shard_uid.clone(),
                state_root,
                state_roots,
                &next_shard_layout,
                consolidated_state_changes,
            )?);
            (None, split_state_results)
        } else {
            // split states are not ready yet, store state changes in consolidated_state_changes
            (Some(consolidated_state_changes), None)
        }
    } else {
        (None, None)
    };
```
It might make sense to refactor this out into its own function
I'm debating whether to move this part out of runtime and into chain. I'll leave it as is for now and do the refactoring at the end.
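A hedged sketch of what the extracted helper might look like. All types here are simplified stand-ins, and `plan_split_state_update` is an illustrative name, not a function in nearcore; only the three-way decision structure is taken from the snippet above:

```rust
// Simplified stand-ins for the real nearcore types.
#[derive(Debug, PartialEq)]
struct ConsolidatedStateChanges;
#[derive(Debug, PartialEq)]
struct SplitStateApplyResults;

// Illustrative extraction of the decision logic quoted above:
// - layout unchanged                -> (None, None)
// - layout changes, roots ready     -> apply to split states now, return results
// - layout changes, roots not ready -> keep consolidated changes for later catchup
fn plan_split_state_update(
    will_shard_layout_change: bool,
    split_state_roots_ready: bool,
) -> (Option<ConsolidatedStateChanges>, Option<SplitStateApplyResults>) {
    if !will_shard_layout_change {
        return (None, None);
    }
    let changes = ConsolidatedStateChanges;
    if split_state_roots_ready {
        // Split states are ready: apply the changes to them now.
        (None, Some(SplitStateApplyResults))
    } else {
        // Split states are not ready yet: store the changes for catchup.
        (Some(changes), None)
    }
}

fn main() {
    assert_eq!(plan_split_state_update(false, true), (None, None));
    assert_eq!(plan_split_state_update(true, false), (Some(ConsolidatedStateChanges), None));
    assert_eq!(plan_split_state_update(true, true), (None, Some(SplitStateApplyResults)));
}
```

Exactly one of the two `Option`s is `Some` whenever the layout changes, which makes the invariant easier to see than in the inline version.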
```rust
// We want to guarantee that transactions are only applied once for each shard, even
// though apply_chunks may be called twice, once with ApplyChunksMode::NotCaughtUp
// once with ApplyChunksMode::CatchingUp
// Note that this variable does not guard whether we split states or not, see the comments
// before `need_to_split_state`
```
This comment is very informative!
```diff
@@ -128,6 +136,13 @@ pub fn account_id_to_shard_id(account_id: &AccountId, shard_layout: &ShardLayout
     }
 }

+pub fn account_id_to_shard_uid(account_id: &AccountId, shard_layout: &ShardLayout) -> ShardUId {
```
I wonder why this is not a method on `ShardLayout`; it seems natural.
That's a very good point. I think I just extended from the original `account_id_to_shard_id` function and didn't think too much about it.
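To illustrate the reviewer's suggestion, a sketch of the free function recast as a method. `ShardLayout`, `ShardUId`, and the bucketing logic are toy stand-ins, not nearcore's real implementation:

```rust
// Simplified stand-ins; nearcore's real ShardLayout/ShardUId/AccountId differ.
type AccountId = String;

#[derive(Debug, PartialEq, Clone, Copy)]
struct ShardUId {
    version: u32,
    shard_id: u32,
}

struct ShardLayout {
    version: u32,
    num_shards: u32,
}

impl ShardLayout {
    // Toy assignment by first byte, for illustration only.
    fn account_id_to_shard_id(&self, account_id: &AccountId) -> u32 {
        account_id.as_bytes().first().map_or(0, |b| u32::from(*b)) % self.num_shards
    }

    // The suggestion: as a method, the shard layout version automatically
    // travels with the lookup instead of being threaded in by the caller.
    fn account_id_to_shard_uid(&self, account_id: &AccountId) -> ShardUId {
        ShardUId { version: self.version, shard_id: self.account_id_to_shard_id(account_id) }
    }
}

fn main() {
    let layout = ShardLayout { version: 1, num_shards: 4 };
    let uid = layout.account_id_to_shard_uid(&"alice.near".to_string());
    assert_eq!(uid.version, 1);
    assert!(uid.shard_id < 4);
}
```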
LGTM
My comments are mostly nits, but please make sure there is no problem with the db, as per @bowenwang1996's comment.
Great work
(adding ✔️ review to unblock: Rust code looks great to me, but I can't vouch for correctness :)
Thanks @EgorKulikov!
Thanks @matklad!
Also some nits
core/store/src/trie/split_state.rs (outdated)
```rust
    }
    Ok(trie_changes_map)
}

/// Apply `changes` to build states for new shards
/// `state_roots` contains state roots for the new shards
/// The caller must guarantee that `state_roots` contains all shard_ids
/// that `key_to_shard_id` may return
/// Ignore changes on DelayedReceipts or DelayedReceiptsIndices
/// Update `store_update` and return new state_roots
/// used for building states for new shards in resharding
```
This comment is helpful, but I feel that it is not up to date:

> /// Apply `changes` to build states for new shards

Does it refer to `apply_state_changes_to_split_states` above?

> /// Update `store_update` and return new state_roots
> /// used for building states for new shards in resharding

Should it be "Return `store_update` and new `state_roots`"?
Good catch, I have updated the comments. Thanks for pointing that out! I think it is confusing that `apply_state_changes_to_split_states` and `add_values_to_split_states` are kind of similar. I have added comments for `apply_state_changes_to_split_states` as well. Hopefully that makes it clearer.
```rust
let new_shard_uid: ShardUId = account_id_to_shard_id(&receipt.receiver_id);
if !trie_updates.contains_key(&new_shard_uid) {
    let err = format!(
        "Account {} is in new shard {:?} but state_roots only contains {:?}",
```
```diff
- "Account {} is in new shard {:?} but state_roots only contains {:?}",
+ "Account {:?} is in new shard {:?} but state_roots only contains {:?}",
```
It's interesting that the package compiles successfully but CLion claims that `AccountId` doesn't implement `Display`. The same applies to line 223 above.
hmm I'm not sure why, it works in my CLion
Oh, I had to update my version of CLion and then it worked.
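For readers following the `{}` vs `{:?}` distinction at issue here, a small self-contained example. The `AccountId` below is a stand-in; whether the real type implements `Display` depends on the crate version, which is what confused the IDE:

```rust
use std::fmt;

// Stand-in for nearcore's AccountId.
struct AccountId(String);

impl fmt::Display for AccountId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl fmt::Debug for AccountId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "AccountId(\"{}\")", self.0)
    }
}

fn main() {
    let id = AccountId("alice.near".to_string());
    // `{}` requires Display, `{:?}` requires Debug; the format! string in the
    // snippet above mixes both, so it only compiles if AccountId has Display.
    assert_eq!(format!("{}", id), "alice.near");
    assert_eq!(format!("{:?}", id), "AccountId(\"alice.near\")");
}
```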
Epic! I particularly like that there are ample comments where the logic is complex, which makes reviewing the code much easier. Great work 🚀
chain/chain/src/store.rs (outdated)
```rust
// We should not remove state changes for the same chunk twice
assert!(self.remove_state_changes_for_split_states.insert((block_hash, shard_id)));
```
The style here is not consistent with the function above :) I personally prefer assertions not to have side effects.
Updated
+1. While `assert!` is guaranteed to always be on in Rust, and it's OK to, e.g., rely on that in unsafe code, splitting the side-effectful part into a separate statement makes the code more obviously clear.
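To illustrate the style being asked for, a minimal self-contained sketch, with `u64` tuples standing in for the real block hash and shard id types:

```rust
use std::collections::HashSet;

fn main() {
    let mut seen: HashSet<(u64, u64)> = HashSet::new();
    let (block_hash, shard_id) = (1u64, 0u64);

    // Side-effectful form (the original snippet): the mutation is hidden
    // inside the assertion.
    //     assert!(seen.insert((block_hash, shard_id)));

    // Preferred form: do the mutation, then assert on its result.
    let newly_inserted = seen.insert((block_hash, shard_id));
    assert!(newly_inserted, "state changes for the same chunk removed twice");
    assert!(seen.contains(&(block_hash, shard_id)));
}
```

The behavior is identical; the second form just keeps the insert visible even to a reader skimming past the assertion.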
This PR makes block updates and catchups also update split states for the next epoch. The algorithm works as follows. There are two possibilities:

1) States for the next epoch are not ready when a block is processed. In this case, `apply_chunks_preprocessing` will be called first with `ApplyChunksMode::NotCaughtUp` and then with `ApplyChunksMode::CatchingUp`. With `NotCaughtUp`, if a shard will be split into multiple shards in the next epoch and the validator cares about one of the split shards, the validator stores the state changes in the database through a new column, `ConsolidatedStateChanges`. Later, when catching up blocks, `apply_chunks_preprocessing` with `CatchingUp` will read the stored state changes and process them. Note: we cannot use the existing stored `state_changes` or `trie_changes` for updating split states. `trie_changes` are updates on trie nodes, and the trie structures of the old and new states are different. The existing `state_changes` do not include updates on internal state such as postponed receipts, delayed receipts, etc.

2) States for the next epoch are ready. In this case, `apply_chunks_preprocessing` is only called once, with `ApplyChunksMode::CaughtUp`, and `apply_transactions` can update the states for the shard in this epoch and the split shards in the next epoch together.
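The two paths above can be sketched as a toy state machine. Everything here is a heavily simplified stand-in (a `Vec<String>` for the `ConsolidatedStateChanges` column, strings for state changes); only the three-mode control flow mirrors the description:

```rust
#[derive(Debug)]
enum ApplyChunksMode { CaughtUp, NotCaughtUp, CatchingUp }

#[derive(Default)]
struct Db {
    // Stands in for the new ConsolidatedStateChanges column.
    consolidated_state_changes: Vec<String>,
    split_state_applied: Vec<String>,
}

fn apply_chunks_preprocessing(db: &mut Db, mode: ApplyChunksMode, changes: &str) {
    match mode {
        // Split states ready: update this epoch's shard and the split shards together.
        ApplyChunksMode::CaughtUp => db.split_state_applied.push(changes.to_string()),
        // Split states not ready: persist the changes for later catchup.
        ApplyChunksMode::NotCaughtUp => db.consolidated_state_changes.push(changes.to_string()),
        // Catchup pass: read back the stored changes and apply them to split states.
        ApplyChunksMode::CatchingUp => {
            let drained: Vec<String> = db.consolidated_state_changes.drain(..).collect();
            db.split_state_applied.extend(drained);
        }
    }
}

fn main() {
    let mut db = Db::default();
    // Path 1: next-epoch states not ready at block time, caught up later.
    apply_chunks_preprocessing(&mut db, ApplyChunksMode::NotCaughtUp, "block A changes");
    assert_eq!(db.split_state_applied.len(), 0);
    apply_chunks_preprocessing(&mut db, ApplyChunksMode::CatchingUp, "");
    assert_eq!(db.split_state_applied, vec!["block A changes".to_string()]);
    // Path 2: states already ready, a single CaughtUp call does everything.
    apply_chunks_preprocessing(&mut db, ApplyChunksMode::CaughtUp, "block B changes");
    assert_eq!(db.split_state_applied.len(), 2);
}
```

Either way, each shard's changes reach the split states exactly once, which is the invariant the "applied once" comment in the diff is guarding.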