
Apply block updates to split shards #4847

Merged
merged 13 commits into from
Sep 22, 2021

Conversation

mzhangmzz
Contributor

@mzhangmzz mzhangmzz commented Sep 20, 2021

This PR makes block updates and catchups also update split states for the next epoch.

The algorithm works as follows.
There are two possibilities:

  1. States for the next epoch are not ready when a block is processed. In this case, `apply_chunks_preprocessing` will be called first with `ApplyChunksMode::NotCaughtUp` and then with `ApplyChunksMode::CatchingUp`. With `NotCaughtUp`, if a shard will be split
    into multiple shards in the next epoch and the validator cares about one of the split shards, the validator stores the state changes in the database through a new column, `ConsolidatedStateChanges`. Later, when catching up blocks, `apply_chunks_preprocessing` with `CatchingUp` reads the stored state changes and processes them.
    Note: we cannot use the existing stored `state_changes` or `trie_changes` for updating split states. `trie_changes` are updates on trie nodes, and the trie structures of the old and new states are different. The existing `state_changes` do not include updates on internal states such as postponed receipts, delayed receipts, etc.
  2. States for the next epoch are ready. In this case, `apply_chunks_preprocessing` is only called once, with `ApplyChunksMode::CaughtUp`, and `apply_transactions` can update the states for the shard in this epoch and the split shards in the next epoch together.
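The two paths above can be reduced to a small sketch. This is illustrative only: the enum name and variants mirror the PR, but the helper `split_shard_actions` and its boolean return shape are invented here for clarity.

```rust
// Sketch only: which actions run for a to-be-split shard under each mode.
// `split_shard_actions` is a hypothetical helper, not nearcore code.

#[derive(Clone, Copy)]
enum ApplyChunksMode {
    CaughtUp,    // split states ready: update them together with the shard
    NotCaughtUp, // split states not ready: stash changes in ConsolidatedStateChanges
    CatchingUp,  // catchup pass: replay the stashed changes onto split states
}

/// Returns (apply_transactions, update_split_states_now, store_changes_for_later).
fn split_shard_actions(mode: ApplyChunksMode) -> (bool, bool, bool) {
    match mode {
        // Path 2: one pass does everything.
        ApplyChunksMode::CaughtUp => (true, true, false),
        // Path 1, first pass: apply the chunk, persist the state changes.
        ApplyChunksMode::NotCaughtUp => (true, false, true),
        // Path 1, second pass: transactions were already applied once,
        // so only the stored changes are replayed onto the split states.
        ApplyChunksMode::CatchingUp => (false, true, false),
    }
}

fn main() {
    assert_eq!(split_shard_actions(ApplyChunksMode::CaughtUp), (true, true, false));
    assert_eq!(split_shard_actions(ApplyChunksMode::NotCaughtUp), (true, false, true));
    assert_eq!(split_shard_actions(ApplyChunksMode::CatchingUp), (false, true, false));
    println!("ok");
}
```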

@mzhangmzz mzhangmzz marked this pull request as draft September 20, 2021 17:11
@mzhangmzz mzhangmzz marked this pull request as ready for review September 20, 2021 19:51
Collaborator

@frol frol left a comment
I read through the PR and realized that there is nothing that I can efficiently review here without spending too much time on it. I defer my review vote to other reviewers who understand the problem space and the code base around those places much better than I do. Ping me if my vote blocks you from landing this PR.

@mzhangmzz
Contributor Author

> I read through the PR and realized that there is nothing that I can efficiently review here without spending too much time on it. I defer my review vote to other reviewers who understand the problem space and the code base around those places much better than I do. Ping me if my vote blocks you from landing this PR.

Thanks @frol!

Collaborator

@bowenwang1996 bowenwang1996 left a comment

Generally looks good! Though I still could not fully follow all the logic, and it would be better if there were some sanity tests :)

Comment on lines +96 to +97
/// CatchingUp is for when apply_chunks is called through catchup_blocks, this is to catch up the
/// shard states for the next epoch
Collaborator

Maybe make this more explicit in the name of the variant?

Contributor Author

Do you have any suggestions? I am not sure which part you want to make more explicit.

Collaborator

Something like StateCatchingUp? I am also fine as is

Comment on lines 1879 to 1880
let chunk_extra = ChunkExtra::new(&state_root, CryptoHash::default(), vec![], 0, 0, 0);
chain_update.chain_store_update.save_chunk_extra(&prev_hash, &shard_uid, chunk_extra);
Collaborator

Could we add a comment explaining why chunk_extra here is initialized the way it is? We also probably should fix such code in the future :)

Contributor Author
@mzhangmzz mzhangmzz Sep 21, 2021

Yeah, right now I'm just using ChunkExtra to store the state_root and am not using any other field in chunk_extra.

Comment on lines 537 to 567
let next_shard_layout = {
let next_block_epoch_id = if self.is_next_block_epoch_start(prev_block_hash)? {
self.get_next_epoch_id_from_prev_block(prev_block_hash)?
} else {
self.get_epoch_id_from_prev_block(prev_block_hash)?
};
self.get_shard_layout(&next_block_epoch_id)?
};
let shard_uid = self.get_shard_uid_from_prev_hash(shard_id, prev_block_hash)?;
let (consolidated_state_changes, split_state_apply_results) =
if self.will_shard_layout_change(prev_block_hash)? {
let consolidated_state_changes =
ConsolidatedStateChanges::from_raw_state_changes(&apply_result.state_changes);
// split states are ready, apply update to them now
if let Some(state_roots) = split_state_roots {
let split_state_results = Some(self.apply_update_to_split_states(
block_hash,
shard_uid.clone(),
state_root,
state_roots,
&next_shard_layout,
consolidated_state_changes,
)?);
(None, split_state_results)
} else {
// split states are not ready yet, store state changes in consolidated_state_changes
(Some(consolidated_state_changes), None)
}
} else {
(None, None)
};
Collaborator

It might make sense to refactor this out into its own function

Contributor Author

I'm debating whether to move this part out of the runtime and into chain. I'll leave it as it is right now and do the refactoring at the end.
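For illustration, the extraction the reviewer suggests could look roughly like the following sketch; `handle_split_states` is a hypothetical name and the two types are unit-struct stand-ins for the real nearcore ones.

```rust
// Sketch of pulling the split-state branching into its own helper.
// ConsolidatedStateChanges / ApplyResultForSplitStates are stand-in types.
struct ConsolidatedStateChanges;
struct ApplyResultForSplitStates;

/// If the shard layout will change, either apply the changes to the ready
/// split states now, or hand them back for storage and later catch-up.
fn handle_split_states(
    will_shard_layout_change: bool,
    split_states_ready: bool,
    changes: ConsolidatedStateChanges,
) -> (Option<ConsolidatedStateChanges>, Option<ApplyResultForSplitStates>) {
    if !will_shard_layout_change {
        return (None, None);
    }
    if split_states_ready {
        // the real code would call apply_update_to_split_states(...) here
        (None, Some(ApplyResultForSplitStates))
    } else {
        // split states not ready yet: return the changes so they get stored
        (Some(changes), None)
    }
}

fn main() {
    let (stored, applied) = handle_split_states(true, false, ConsolidatedStateChanges);
    assert!(stored.is_some() && applied.is_none());
    let (stored, applied) = handle_split_states(true, true, ConsolidatedStateChanges);
    assert!(stored.is_none() && applied.is_some());
    println!("ok");
}
```

A helper with this shape would let `apply_transactions` stay focused on chunk application while the resharding branch is testable in isolation.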


Comment on lines +2951 to +2955
// We want to guarantee that transactions are only applied once for each shard, even
// though apply_chunks may be called twice, once with ApplyChunksMode::NotCaughtUp
// once with ApplyChunksMode::CatchingUp
// Note that this variable does not guard whether we split states or not, see the comments
// before `need_to_split_state`
Collaborator

This comment is very informative!

@@ -128,6 +136,13 @@ pub fn account_id_to_shard_id(account_id: &AccountId, shard_layout: &ShardLayout
}
}

pub fn account_id_to_shard_uid(account_id: &AccountId, shard_layout: &ShardLayout) -> ShardUId {
Contributor

I wonder why this is not a method on ShardLayout, seems natural

Contributor Author

That's a very good point. I think I just extended the original account_id_to_shard_id function and didn't think about it too much.
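As a sketch of the suggested change — all types below are simplified stand-ins and the shard assignment is a toy rule, not nearcore's actual logic:

```rust
// Moving account_id_to_shard_uid onto ShardLayout as a method (sketch only).
type AccountId = String;

#[derive(Debug, PartialEq)]
struct ShardUId { version: u32, shard_id: u32 }

struct ShardLayout { version: u32, num_shards: u32 }

impl ShardLayout {
    // Toy bucketing by the account's first byte, purely for illustration.
    fn account_id_to_shard_id(&self, account_id: &AccountId) -> u32 {
        account_id.as_bytes().first().map_or(0, |b| u32::from(*b) % self.num_shards)
    }

    // The suggested method: derive the full ShardUId from the layout itself.
    fn account_id_to_shard_uid(&self, account_id: &AccountId) -> ShardUId {
        ShardUId {
            version: self.version,
            shard_id: self.account_id_to_shard_id(account_id),
        }
    }
}

fn main() {
    let layout = ShardLayout { version: 1, num_shards: 4 };
    let uid = layout.account_id_to_shard_uid(&"alice.near".to_string());
    assert_eq!(uid, ShardUId { version: 1, shard_id: 1 }); // b'a' = 97, 97 % 4 = 1
    println!("{:?}", uid);
}
```

Keeping the conversion as a method avoids passing the layout as a loose argument and keeps shard-assignment logic in one place, which is presumably the reviewer's point.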

Contributor

@EgorKulikov EgorKulikov left a comment

LGTM
My comments are mostly nits, but please make sure there is no problem with db as per @bowenwang1996 comment
Great work

Contributor

@matklad matklad left a comment

(adding ✔️ review to unblock: Rust code looks great to me, but I can't vouch for correctness :)

@mzhangmzz
Contributor Author

mzhangmzz commented Sep 22, 2021

> LGTM
> My comments are mostly nits, but please make sure there is no problem with db as per @bowenwang1996 comment
> Great work

Thanks @EgorKulikov!

@mzhangmzz
Contributor Author

mzhangmzz commented Sep 22, 2021

> (adding ✔️ review to unblock: Rust code looks great to me, but I can't vouch for correctness :)

Thanks @matklad !

Member

@Longarithm Longarithm left a comment

Also some nits

}
Ok(trie_changes_map)
}

/// Apply `changes` to build states for new shards
/// `state_roots` contains state roots for the new shards
/// The caller must guarantee that `state_roots` contains all shard_ids
/// that `key_to_shard_id` may return
/// Ignore changes on DelayedReceipts or DelayedReceiptsIndices
/// Update `store_update` and return new state_roots
/// used for building states for new shards in resharding
Member

This comment is helpful, but I feel that it is not up to date:

/// Apply `changes` to build states for new shards

Does it refer to apply_state_changes_to_split_states above?

/// Update `store_update` and return new state_roots
/// used for building states for new shards in resharding

Should it be "Return store_update and new state_roots"?

Contributor Author

Good catch, I have updated the comments.

Thanks for pointing that out! I think it is confusing that apply_state_changes_to_split_states and add_values_to_split_states are somewhat similar. I have added comments for apply_state_changes_to_split_states as well. Hopefully that makes it clearer.

let new_shard_uid: ShardUId = account_id_to_shard_id(&receipt.receiver_id);
if !trie_updates.contains_key(&new_shard_uid) {
let err = format!(
"Account {} is in new shard {:?} but state_roots only contains {:?}",
Member
@Longarithm Longarithm Sep 22, 2021

Suggested change
"Account {} is in new shard {:?} but state_roots only contains {:?}",
"Account {:?} is in new shard {:?} but state_roots only contains {:?}",

It's interesting that the package compiles successfully, but CLion claims that AccountId doesn't implement Display. The same applies to line 223 above.

Contributor Author

Hmm, I'm not sure why; it works in my CLion.

Member

Oh, I had to update my version of CLion and then it worked.
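For context on the distinction behind the suggested change: `{}` requires `std::fmt::Display` while `{:?}` requires `Debug`. A minimal stand-in type (not the real nearcore `AccountId`, which evidently does satisfy the compiler here) shows the difference:

```rust
// AccountId stand-in that derives Debug but implements no Display.
#[derive(Debug)]
struct AccountId(String);

fn main() {
    let account = AccountId("alice.near".to_string());
    // Compiles: `{:?}` only needs the derived Debug impl.
    let msg = format!("Account {:?} is in new shard", account);
    assert!(msg.contains("alice.near"));
    // Would NOT compile without a Display impl:
    // let msg = format!("Account {} is in new shard", account);
    println!("{}", msg);
}
```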

Collaborator

@bowenwang1996 bowenwang1996 left a comment

Epic! I particularly like that there are ample comments where the logic is complex and this makes reviewing the code much easier. Great work 🚀

Comment on lines 1900 to 1901
// We should not remove state changes for the same chunk twice
assert!(self.remove_state_changes_for_split_states.insert((block_hash, shard_id)));
Collaborator

The style here is not consistent with the function above :) I personally prefer assertions not to have side effects.

Contributor Author

Updated

Contributor

+1. While `assert!` is guaranteed to always be on in Rust (and it's OK to, e.g., rely on that in unsafe code), splitting the side-effectful part into a separate statement makes the code more obviously clear.
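A minimal illustration of the style point under discussion (the set and key below stand in for `remove_state_changes_for_split_states` and `(block_hash, shard_id)`):

```rust
use std::collections::HashSet;

fn main() {
    let mut removed: HashSet<(u64, u64)> = HashSet::new();
    let key = (1u64, 2u64);

    // Side-effectful form under discussion:
    //     assert!(removed.insert(key));
    // works because assert! is always compiled in, but hides the mutation.

    // Preferred: mutate first, then assert on the result.
    let newly_inserted = removed.insert(key);
    assert!(newly_inserted, "state changes for {:?} removed twice", key);
    println!("ok");
}
```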

@near-bulldozer near-bulldozer bot merged commit 8f90b92 into master Sep 22, 2021
@near-bulldozer near-bulldozer bot deleted the shard branch September 22, 2021 21:12
nikurt pushed a commit that referenced this pull request Sep 30, 2021