feat(state-sync): DB Snapshots #9090

Merged: 168 commits into master, Jun 20, 2023
Conversation

@nikurt (Contributor) commented May 22, 2023

FlatStorage provides a fast way to generate state parts, but it needs a consistent view of the state at the beginning of the epoch.
This PR makes a snapshot of the whole DB and deletes the unused columns, giving the node a read-only database with exactly the view of Flat State that is needed for State Sync.
The snapshot is made in a separate actix Actor to avoid blocking ClientActor.
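For illustration, a minimal sketch of the separate-actor approach using actix's `SyncArbiter`; the actor and message names here are placeholders, not the actual nearcore types:

```rust
use actix::prelude::*;

// Hypothetical message asking the actor to snapshot at a given block.
#[derive(Message)]
#[rtype(result = "()")]
struct MakeSnapshot {
    prev_block_hash: Vec<u8>,
}

struct StateSnapshotActor;

impl Actor for StateSnapshotActor {
    // SyncContext runs the actor on a dedicated thread, so the
    // long-running snapshot work does not block other actors.
    type Context = SyncContext<Self>;
}

impl Handler<MakeSnapshot> for StateSnapshotActor {
    type Result = ();

    fn handle(&mut self, msg: MakeSnapshot, _ctx: &mut Self::Context) {
        // Checkpoint the DB, drop unused columns, compact, etc.
        println!("taking state snapshot at {:?}", msg.prev_block_hash);
    }
}

// Started from client code (inside a running actix System); snapshot
// requests are then fire-and-forget from ClientActor's point of view.
fn start_snapshot_actor() -> Addr<StateSnapshotActor> {
    SyncArbiter::start(1, || StateSnapshotActor)
}
```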

To speed up iteration during development, this PR also adds a testing mechanism: a config option that triggers a state snapshot every N blocks. That makes no sense for state sync itself, but it is very useful for observing enough snapshot events.

Tested that a node can process blocks while snapshotting is in progress.
Without compaction, a snapshot can require ~100GB of extra disk space; enabling compaction reduces that overhead to ~10GB.

@nikurt force-pushed the nikurt-seize-the-state branch from c234bf3 to b21eab5 on May 22, 2023 19:19
@nikurt requested a review from Longarithm on May 24, 2023 12:32
@nikurt marked this pull request as ready for review on May 24, 2023 12:32
@nikurt requested a review from a team as a code owner on May 24, 2023 12:32
near-bulldozer bot pushed a commit that referenced this pull request May 25, 2023
Generate the inner part of a state part using flat storage, based on the idea presented in #8984.

In short, if the flat storage head corresponds to the state root for which we sync state, it is enough to read only the boundary nodes, and the inner trie part can be reconstructed from a range of KV pairs in flat storage. The main logic for this is contained in `Trie::get_trie_nodes_for_part_with_flat_storage`.

It requires a couple of minor changes:
* we now allow creating "view" `Trie`s with flat storage as well; as before, we want to avoid creating non-view `Trie`s, because `TrieCache` accesses may block chunk processing
* `get_head_hash` and `shard_uid` methods for `FlatStorage`, allowing correct range queries to flat storage
* `FlatStateValue` moved to `primitives` to allow more general access
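A conceptual sketch of the composition described above, with stand-in types (a `BTreeMap` plays the role of flat storage; none of these names are nearcore's API):

```rust
use std::collections::BTreeMap;

type Key = Vec<u8>;
type Value = Vec<u8>;

/// Stand-in for flat storage: a sorted KV view at the flat head.
struct FlatState {
    kvs: BTreeMap<Key, Value>,
}

impl FlatState {
    /// Range query over flat state, the kind of access enabled by the
    /// `get_head_hash`/`shard_uid` additions.
    fn range(&self, from: &[u8], to: &[u8]) -> Vec<(Key, Value)> {
        self.kvs
            .range(from.to_vec()..to.to_vec())
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// A state part is then the boundary trie nodes (read from the trie
/// itself) plus the inner KV pairs (read cheaply from flat storage).
fn build_state_part(
    flat: &FlatState,
    boundary_nodes: Vec<Vec<u8>>,
    part_from: &[u8],
    part_to: &[u8],
) -> (Vec<Vec<u8>>, Vec<(Key, Value)>) {
    (boundary_nodes, flat.range(part_from, part_to))
}
```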


## TODO
* prometheus metrics
* integration test checking that flat storage is used during normal block processing on client (or wait for #9090)
 
## Testing

https://nayduck.near.org/#/run/3023

A big sanity test for `get_trie_nodes_for_part_with_flat_storage`, covering all the scenarios I could think of:
* results with/without flat storage must match
* result with incorrect flat storage must be an error
* result with flat storage and missing intermediate node should be still okay
@Longarithm (Member) commented:
I like the code, it's very clear. But I have a general concern.

We shouldn't put state_snapshot fields in NightshadeRuntime. It is designed to be stateless and hold only the logic for transaction processing. My rule of thumb is: the fewer methods we have in RuntimeAdapter, the better, as it is already bloated, and there was a huge effort to separate EpochManager from it. The logic for making snapshots and getting state parts doesn't really fit in Runtime, because it is not related to transaction processing. I think that having obtain_state_part inside RuntimeAdapter was a mistake, and it is causing this unnatural dependency of the state parts logic on Runtime.

I don't have an opinion yet on where this logic should live - maybe it's a separate struct like StatePartManager in Client, maybe part of StateSync. But it is more or less clear that it doesn't need to know about Runtime. As a short-term hack we could pass StateSnapshot to obtain_state_part to make the code work. Happy to discuss it here or offline.

@Longarithm (Member) commented:
Maybe this logic fits well into ShardTries, because that struct owns state store updates and provides state views. It looks like it already knows all the needed context.

nikurt pushed a commit that referenced this pull request May 31, 2023
@nikurt (Contributor, Author) commented Jun 1, 2023

@Longarithm Please take another look.
Moved the data and logic to ShardTries. Maybe this logic needs to be in a separate class?
Also, I butchered some logic in shard_tries.rs, please advise.

Many tests are failing because they use an epoch length that is too short. It takes a couple of seconds to:

* delete an old snapshot
* checkpoint
* delete columns
* compact

And if snapshots are requested too often, it's not happy. Handled that by ignoring consecutive requests if I can't lock flat storage.
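A rough sketch of the steps listed above using the `rocksdb` crate directly (nearcore wraps its storage differently; the paths and column-family names below are placeholders):

```rust
use rocksdb::{checkpoint::Checkpoint, Options, DB};

fn make_snapshot(db: &DB, snapshot_path: &str) -> Result<(), rocksdb::Error> {
    // 1. Delete an old snapshot, if any.
    let _ = std::fs::remove_dir_all(snapshot_path);

    // 2. Checkpoint: SST files are hard-linked, so this is cheap at
    //    first and only diverges as the live DB compacts.
    Checkpoint::new(db)?.create_checkpoint(snapshot_path)?;

    // 3. Reopen the checkpoint (all existing column families must be
    //    listed) and drop the ones state sync does not need.
    let opts = Options::default();
    let mut snap = DB::open_cf(&opts, snapshot_path, ["default", "col_unused"])?;
    snap.drop_cf("col_unused")?;

    // 4. Compact to actually reclaim disk space (~100GB -> ~10GB here).
    snap.compact_range(None::<&[u8]>, None::<&[u8]>);
    Ok(())
}
```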

@Longarithm (Member) commented:
Looking.

> Many tests are failing because they use an epoch length that is too short.

I see... We can disable removing columns and compaction for tests, because the state there is very small. Could that simplify the changes?

@nikurt (Contributor, Author) commented Jun 2, 2023

> I see... We can disable removing columns and compaction for tests, because the state there is very small. Could that simplify the changes?

It would help, but there are still limits to it, and I don't want to introduce more configuration flags.

@nikurt (Contributor, Author) commented Jun 5, 2023

@Longarithm Please take a look

Comment on lines 101 to 105
assert_ne!(
    flat_storage_manager.set_flat_state_updates_mode(false),
    Some(false),
    "Failed to lock flat state updates"
);
Reviewer (Member):
The assertion looks fine (would it even be possible to propagate an error from this actor to the main actor?).
But let's change the `set_flat_state_updates_mode` return type to `Result<bool, StorageError>`.

Author reply (Contributor):
Changed this to check whether `set_flat_state_updates_mode()` succeeded. If it didn't, the snapshotting request is ignored.
I don't return an error, because the caller doesn't care about one; the blocks need to be processed regardless. I prefer the fail-open (aka fail-safe) approach, because snapshots are not critical to any individual node, though they are critical to the network as a whole.
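A self-contained sketch of this fail-open handling, combined with the `Result` return type suggested above (the `StorageError` type and method names are stand-ins, not the exact nearcore API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug)]
pub struct StorageError(pub String);

pub struct FlatStorageManager {
    available: bool,
    updates_enabled: AtomicBool,
}

impl FlatStorageManager {
    /// Per the review suggestion: return Result instead of Option so
    /// callers can tell "mode set (previous value)" from "unavailable".
    pub fn set_flat_state_updates_mode(
        &self,
        enabled: bool,
    ) -> Result<bool, StorageError> {
        if !self.available {
            return Err(StorageError("flat storage not ready".into()));
        }
        Ok(self.updates_enabled.swap(enabled, Ordering::SeqCst))
    }
}

/// Fail-open caller: if we cannot lock flat state updates, skip this
/// snapshot request; block processing must continue regardless.
pub fn maybe_snapshot(mgr: &FlatStorageManager) {
    match mgr.set_flat_state_updates_mode(false) {
        Ok(_prev) => { /* proceed with the snapshot */ }
        Err(err) => eprintln!("skipping state snapshot: {:?}", err),
    }
}
```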

Comment on lines 169 to 171
Some(v) => assert_eq!(
    prev_shard_value, v,
    "All FlatStorage are expected to have the same value of `move_head_enabled`"
Reviewer (Member):
Similar to the comment below `get_make_snapshot_callback` - let's make it an error and propagate it further. It looks like we can't return None from here anyway.

Author reply (Contributor):
Please see an updated version of this function.

@@ -2547,11 +2568,40 @@ fn test_catchup_gas_price_change() {
    genesis.config.min_gas_price = min_gas_price;
    genesis.config.gas_limit = 1000000000000;
    let chain_genesis = ChainGenesis::new(&genesis);

    let tmp_dir1 = tempfile::tempdir().unwrap();
Reviewer (Member):
Why can't we reuse real_epoch_managers and nightshade_runtimes as before?

Author reply (Contributor):
nightshade_runtimes() initializes the runtime with home_dir ../../../../, where I can't create a new directory. Therefore, I create a runtime manually.
real_epoch_managers() by itself is fine, but if I use it, I don't have an epoch manager to pass to the runtime I'm initializing manually.

@nikurt (Contributor, Author) commented Jun 19, 2023

Nayduck tests are fine, but a few tests fail occasionally due to reasons unrelated to this PR.

@near-bulldozer bot merged commit a5ede1d into master on Jun 20, 2023
@near-bulldozer bot deleted the nikurt-seize-the-state branch on June 20, 2023 14:07
ppca added a commit that referenced this pull request Sep 19, 2023
marcelo-gonzalez added a commit to marcelo-gonzalez/nearcore that referenced this pull request Mar 15, 2024
This was added in near#9090 to provide a way to reduce the size of snapshots, as the commit message said. But that's not needed anymore now that we just drop the unneeded column families and have a smaller snapshot to begin with.
github-merge-queue bot pushed a commit that referenced this pull request Dec 15, 2024
…pshots (#12589)

`test_resharding_v3_shard_shuffling_slower_post_processing_tasks`
exposes a bug that can be triggered if child flat storages are not split
after a resharding by the time we want to take a state snapshot. Then
the state snapshot code will fail because the flat storage is not ready,
but will not retry. To fix it, we add a `want_snapshot` field that will
be set when we decide to take a state snapshot. We also add a
`split_in_progress` field to the `FlatStorageManager` that will be set
to `true` when a resharding is started, and back to false when it's
finished and the catchup code has progressed to a height close to the
desired snapshot height. The state snapshot code will wait until
`split_in_progress` is false to proceed, and the flat storage catchup
code will wait until `want_snapshot` is cleared if it has already
advanced to the desired snapshot hash, so that we don't advance past the
point that was wanted by the state snapshot. The first one is the one
actually causing the test failure, but the second one is also required.

We implement this waiting by rescheduling the message sends in the
future. A Condvar would be a very natural choice, but it unfortunately
doesn't seem to work in testloop, since actors that are normally running
on different threads are put on the same thread, and a blocker on a
Condvar won't be woken up.

Here we are making a change to the behavior of the old
`set_flat_state_updates_mode()`, which used to refuse to proceed if the
update mode was already set to the same value. This seems to be an
artifact of the fact that when state snapshots were implemented in
#9090, this extra logic was added
because there was another user of this function
(`inline_flat_state_values()` added in
#9037), but that function has since
been deleted, so the state snapshot code is now the only user of
`set_flat_state_updates_mode()`.
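A simplified sketch of the two coordination flags described above (field and method names are illustrative, not the exact nearcore API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Default)]
pub struct SnapshotCoordination {
    /// Set when a state snapshot has been requested.
    want_snapshot: AtomicBool,
    /// Set while a resharding split (and its catchup) is in progress.
    split_in_progress: AtomicBool,
}

impl SnapshotCoordination {
    /// The snapshot code polls this and reschedules itself (e.g. by
    /// re-sending its message in the future) until the split is done.
    pub fn ready_to_snapshot(&self) -> bool {
        !self.split_in_progress.load(Ordering::SeqCst)
    }

    /// The flat storage catchup code checks this before advancing past
    /// the desired snapshot height.
    pub fn should_pause_catchup(&self) -> bool {
        self.want_snapshot.load(Ordering::SeqCst)
    }

    pub fn request_snapshot(&self) {
        self.want_snapshot.store(true, Ordering::SeqCst);
    }

    pub fn snapshot_done(&self) {
        self.want_snapshot.store(false, Ordering::SeqCst);
    }
}
```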