
Apply block updates to split shards #4847

Merged
merged 13 commits into from
Sep 22, 2021

Conversation

mzhangmzz
Contributor

@mzhangmzz mzhangmzz commented Sep 20, 2021

This PR makes block updates and catchups also update split states for the next epoch.

The algorithm works as follows.
There are two possibilities:

  1. States for the next epoch are not ready when a block is processed. In this case, `apply_chunks_preprocessing` will be called first with `ApplyChunksMode::NotCaughtUp` and then with `ApplyChunksMode::CatchingUp`. With `NotCaughtUp`, if a shard will be split
    into multiple shards in the next epoch and the validator cares about one of the split shards, the validator stores the state changes in the database through a new column, `ConsolidatedStateChanges`. Later, when catching up blocks, `apply_chunks_preprocessing` with `CatchingUp` reads the stored state changes and processes them.
    Note: we cannot use the existing stored `state_changes` or `trie_changes` for updating split states. `trie_changes` are updates on trie nodes, and the trie structures of the old and new states are different. The existing `state_changes` do not include updates on internal states such as postponed receipts, delayed receipts, etc.
  2. States for the next epoch are ready. In this case, `apply_chunks_preprocessing` is only called once, with `ApplyChunksMode::CaughtUp`, and `apply_transactions` can update the states for the shard in this epoch and the split shards in the next epoch together.
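The two paths above can be reduced to a small sketch. This is illustrative only: the enum name and variants mirror the PR, but the helper `split_shard_actions` and its boolean return shape are invented here for clarity.

```rust
// Sketch only: which actions run for a to-be-split shard under each mode.
// `split_shard_actions` is a hypothetical helper, not nearcore code.

#[derive(Clone, Copy)]
enum ApplyChunksMode {
    CaughtUp,    // split states ready: update them together with the shard
    NotCaughtUp, // split states not ready: stash changes in ConsolidatedStateChanges
    CatchingUp,  // catchup pass: replay the stashed changes onto split states
}

/// Returns (apply_transactions, update_split_states_now, store_changes_for_later).
fn split_shard_actions(mode: ApplyChunksMode) -> (bool, bool, bool) {
    match mode {
        // Path 2: one pass does everything.
        ApplyChunksMode::CaughtUp => (true, true, false),
        // Path 1, first pass: apply the chunk, persist the state changes.
        ApplyChunksMode::NotCaughtUp => (true, false, true),
        // Path 1, second pass: transactions were already applied once,
        // so only the stored changes are replayed onto the split states.
        ApplyChunksMode::CatchingUp => (false, true, false),
    }
}

fn main() {
    assert_eq!(split_shard_actions(ApplyChunksMode::CaughtUp), (true, true, false));
    assert_eq!(split_shard_actions(ApplyChunksMode::NotCaughtUp), (true, false, true));
    assert_eq!(split_shard_actions(ApplyChunksMode::CatchingUp), (false, true, false));
    println!("ok");
}
```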

@mzhangmzz mzhangmzz marked this pull request as draft September 20, 2021 17:11
@mzhangmzz mzhangmzz marked this pull request as ready for review September 20, 2021 19:51
Collaborator

@frol frol left a comment
I read through the PR and realized that there is nothing that I can efficiently review here without spending too much time on it. I defer my review vote to other reviewers who understand the problem space and the code base around those places much better than I do. Ping me if my vote blocks you from landing this PR.

@mzhangmzz
Contributor Author

> I read through the PR and realized that there is nothing that I can efficiently review here without spending too much time on it. I defer my review vote to other reviewers who understand the problem space and the code base around those places much better than I do. Ping me if my vote blocks you from landing this PR.

Thanks @frol!

Collaborator

@bowenwang1996 bowenwang1996 left a comment

Generally looks good! Though I still could not fully follow all the logic, and it would be better if there were some sanity tests :)

Comment on lines +96 to +97
/// CatchingUp is for when apply_chunks is called through catchup_blocks, this is to catch up the
/// shard states for the next epoch
Collaborator

Maybe make this more explicit in the name of the variant?

Contributor Author

Do you have any suggestions? I am not sure which part you want to make more explicit.

Collaborator

Something like StateCatchingUp? I am also fine as is

Comment on lines 1879 to 1880
let chunk_extra = ChunkExtra::new(&state_root, CryptoHash::default(), vec![], 0, 0, 0);
chain_update.chain_store_update.save_chunk_extra(&prev_hash, &shard_uid, chunk_extra);
Collaborator

Could we add a comment explaining why chunk_extra here is initialized the way it is? We also probably should fix such code in the future :)

Contributor Author
@mzhangmzz mzhangmzz Sep 21, 2021

Yeah, right now I'm just using ChunkExtra to store the state_root and am not using any other field in chunk_extra.

Comment on lines 537 to 567
let next_shard_layout = {
let next_block_epoch_id = if self.is_next_block_epoch_start(prev_block_hash)? {
self.get_next_epoch_id_from_prev_block(prev_block_hash)?
} else {
self.get_epoch_id_from_prev_block(prev_block_hash)?
};
self.get_shard_layout(&next_block_epoch_id)?
};
let shard_uid = self.get_shard_uid_from_prev_hash(shard_id, prev_block_hash)?;
let (consolidated_state_changes, split_state_apply_results) =
if self.will_shard_layout_change(prev_block_hash)? {
let consolidated_state_changes =
ConsolidatedStateChanges::from_raw_state_changes(&apply_result.state_changes);
// split states are ready, apply update to them now
if let Some(state_roots) = split_state_roots {
let split_state_results = Some(self.apply_update_to_split_states(
block_hash,
shard_uid.clone(),
state_root,
state_roots,
&next_shard_layout,
consolidated_state_changes,
)?);
(None, split_state_results)
} else {
// split states are not ready yet, store state changes in consolidated_state_changes
(Some(consolidated_state_changes), None)
}
} else {
(None, None)
};
Collaborator

It might make sense to refactor this out into its own function

Contributor Author

I'm debating whether to move this part out of the runtime and into chain. I'll leave it as it is right now and do the refactoring at the end.
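For illustration, the extraction the reviewer suggests could look roughly like the following sketch; `handle_split_states` is a hypothetical name and the two types are unit-struct stand-ins for the real nearcore ones.

```rust
// Sketch of pulling the split-state branching into its own helper.
// ConsolidatedStateChanges / ApplyResultForSplitStates are stand-in types.
struct ConsolidatedStateChanges;
struct ApplyResultForSplitStates;

/// If the shard layout will change, either apply the changes to the ready
/// split states now, or hand them back for storage and later catch-up.
fn handle_split_states(
    will_shard_layout_change: bool,
    split_states_ready: bool,
    changes: ConsolidatedStateChanges,
) -> (Option<ConsolidatedStateChanges>, Option<ApplyResultForSplitStates>) {
    if !will_shard_layout_change {
        return (None, None);
    }
    if split_states_ready {
        // the real code would call apply_update_to_split_states(...) here
        (None, Some(ApplyResultForSplitStates))
    } else {
        // split states not ready yet: return the changes so they get stored
        (Some(changes), None)
    }
}

fn main() {
    let (stored, applied) = handle_split_states(true, false, ConsolidatedStateChanges);
    assert!(stored.is_some() && applied.is_none());
    let (stored, applied) = handle_split_states(true, true, ConsolidatedStateChanges);
    assert!(stored.is_none() && applied.is_some());
    println!("ok");
}
```

A helper with this shape would let `apply_transactions` stay focused on chunk application while the resharding branch is testable in isolation.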


Comment on lines +2951 to +2955
// We want to guarantee that transactions are only applied once for each shard, even
// though apply_chunks may be called twice, once with ApplyChunksMode::NotCaughtUp
// once with ApplyChunksMode::CatchingUp
// Note that this variable does not guard whether we split states or not, see the comments
// before `need_to_split_state`
Collaborator

This comment is very informative!

@@ -128,6 +136,13 @@ pub fn account_id_to_shard_id(account_id: &AccountId, shard_layout: &ShardLayout
}
}

pub fn account_id_to_shard_uid(account_id: &AccountId, shard_layout: &ShardLayout) -> ShardUId {
Contributor

I wonder why this is not a method on ShardLayout, seems natural

Contributor Author

That's a very good point. I think I just extended the original account_id_to_shard_id function and didn't think about it too much.
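As a sketch of the suggested change — all types below are simplified stand-ins and the shard assignment is a toy rule, not nearcore's actual logic:

```rust
// Moving account_id_to_shard_uid onto ShardLayout as a method (sketch only).
type AccountId = String;

#[derive(Debug, PartialEq)]
struct ShardUId { version: u32, shard_id: u32 }

struct ShardLayout { version: u32, num_shards: u32 }

impl ShardLayout {
    // Toy bucketing by the account's first byte, purely for illustration.
    fn account_id_to_shard_id(&self, account_id: &AccountId) -> u32 {
        account_id.as_bytes().first().map_or(0, |b| u32::from(*b) % self.num_shards)
    }

    // The suggested method: derive the full ShardUId from the layout itself.
    fn account_id_to_shard_uid(&self, account_id: &AccountId) -> ShardUId {
        ShardUId {
            version: self.version,
            shard_id: self.account_id_to_shard_id(account_id),
        }
    }
}

fn main() {
    let layout = ShardLayout { version: 1, num_shards: 4 };
    let uid = layout.account_id_to_shard_uid(&"alice.near".to_string());
    assert_eq!(uid, ShardUId { version: 1, shard_id: 1 }); // b'a' = 97, 97 % 4 = 1
    println!("{:?}", uid);
}
```

Keeping the conversion as a method avoids passing the layout as a loose argument and keeps shard-assignment logic in one place, which is presumably the reviewer's point.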

Contributor

@EgorKulikov EgorKulikov left a comment

LGTM
My comments are mostly nits, but please make sure there is no problem with db as per @bowenwang1996 comment
Great work

Contributor

@matklad matklad left a comment

(adding ✔️ review to unblock: Rust code looks great to me, but I can't vouch for correctness :)

@mzhangmzz
Contributor Author

mzhangmzz commented Sep 22, 2021

> LGTM
> My comments are mostly nits, but please make sure there is no problem with db as per @bowenwang1996 comment
> Great work

Thanks @EgorKulikov!

@mzhangmzz
Contributor Author

mzhangmzz commented Sep 22, 2021

> (adding ✔️ review to unblock: Rust code looks great to me, but I can't vouch for correctness :)

Thanks @matklad !

Member

@Longarithm Longarithm left a comment

Also some nits

}
Ok(trie_changes_map)
}

/// Apply `changes` to build states for new shards
/// `state_roots` contains state roots for the new shards
/// The caller must guarantee that `state_roots` contains all shard_ids
/// that `key_to_shard_id` may return
/// Ignore changes on DelayedReceipts or DelayedReceiptsIndices
/// Update `store_update` and return new state_roots
/// used for building states for new shards in resharding
Member

This comment is helpful, but I feel that it is not up to date:

/// Apply `changes` to build states for new shards

Does it refer to apply_state_changes_to_split_states above?

/// Update `store_update` and return new state_roots
/// used for building states for new shards in resharding

Should it be "Return store_update and new state_roots"?

Contributor Author

Good catch, I have updated the comments.

Thanks for pointing that out! I think it is confusing that apply_state_changes_to_split_states and add_values_to_split_states are somewhat similar. I have added comments for apply_state_changes_to_split_states as well. Hopefully that makes it clearer.

let new_shard_uid: ShardUId = account_id_to_shard_id(&receipt.receiver_id);
if !trie_updates.contains_key(&new_shard_uid) {
let err = format!(
"Account {} is in new shard {:?} but state_roots only contains {:?}",
Member
@Longarithm Longarithm Sep 22, 2021

Suggested change
"Account {} is in new shard {:?} but state_roots only contains {:?}",
"Account {:?} is in new shard {:?} but state_roots only contains {:?}",

It's interesting that the package compiles successfully, but CLion claims that AccountId doesn't implement Display. The same applies to line 223 above.

Contributor Author

Hmm, I'm not sure why; it works in my CLion.

Member

Oh, I had to update my version of CLion and then it worked.
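For context on the distinction behind the suggested change: `{}` requires `std::fmt::Display` while `{:?}` requires `Debug`. A minimal stand-in type (not the real nearcore `AccountId`, which evidently does satisfy the compiler here) shows the difference:

```rust
// AccountId stand-in that derives Debug but implements no Display.
#[derive(Debug)]
struct AccountId(String);

fn main() {
    let account = AccountId("alice.near".to_string());
    // Compiles: `{:?}` only needs the derived Debug impl.
    let msg = format!("Account {:?} is in new shard", account);
    assert!(msg.contains("alice.near"));
    // Would NOT compile without a Display impl:
    // let msg = format!("Account {} is in new shard", account);
    println!("{}", msg);
}
```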

Collaborator

@bowenwang1996 bowenwang1996 left a comment

Epic! I particularly like that there are ample comments where the logic is complex and this makes reviewing the code much easier. Great work 🚀

Comment on lines 1900 to 1901
// We should not remove state changes for the same chunk twice
assert!(self.remove_state_changes_for_split_states.insert((block_hash, shard_id)));
Collaborator

The style here is not consistent with the function above :) I personally prefer assertions not to have side effects.

Contributor Author

Updated

Contributor

+1. While `assert!` is guaranteed to always be on in Rust (and it's OK to, e.g., rely on that in unsafe code), splitting the side-effectful part into a separate statement makes the code more obviously clear.
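A minimal illustration of the style point under discussion (the set and key below stand in for `remove_state_changes_for_split_states` and `(block_hash, shard_id)`):

```rust
use std::collections::HashSet;

fn main() {
    let mut removed: HashSet<(u64, u64)> = HashSet::new();
    let key = (1u64, 2u64);

    // Side-effectful form under discussion:
    //     assert!(removed.insert(key));
    // works because assert! is always compiled in, but hides the mutation.

    // Preferred: mutate first, then assert on the result.
    let newly_inserted = removed.insert(key);
    assert!(newly_inserted, "state changes for {:?} removed twice", key);
    println!("ok");
}
```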

@near-bulldozer near-bulldozer bot merged commit 8f90b92 into master Sep 22, 2021
@near-bulldozer near-bulldozer bot deleted the shard branch September 22, 2021 21:12
nikurt pushed a commit that referenced this pull request Sep 30, 2021