feat(state-sync): DB Snapshots #9090
Conversation
Force-pushed from c234bf3 to b21eab5.
Generate the inner part of a state part using flat storage, based on the idea presented in #8984. In short, if the flat storage head corresponds to the state root for which we sync state, it is enough to read only the boundary nodes; the inner trie part can be reconstructed from a range of KV pairs in flat state (a simplified sketch of this idea follows below). The main logic is contained in `Trie::get_trie_nodes_for_part_with_flat_storage`. It requires a couple of minor changes:

* We now allow creating "view" `Trie`s with flat storage as well. As before, we want to avoid creating non-view `Trie`s because `TrieCache` accesses may block chunk processing.
* `get_head_hash` and `shard_uid` methods for `FlatStorage`, allowing us to make correct range queries to flat storage.
* `FlatStateValue` moved to `primitives` to allow more general access.

## TODO

* Prometheus metrics
* Integration test checking that flat storage is used during normal block processing on the client (or wait for #9090)

## Testing

https://nayduck.near.org/#/run/3023

Big sanity test for `get_trie_nodes_for_part_with_flat_storage` covering all scenarios I could think of:

* Results with and without flat storage must match.
* A result with incorrect flat storage must be an error.
* A result with flat storage and a missing intermediate node should still be okay.
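A minimal sketch of the idea described above, not nearcore's actual `Trie::get_trie_nodes_for_part_with_flat_storage` implementation; all types and helpers here are hypothetical simplifications:

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical stand-in for flat storage: a flat key-value view of the state
/// as of the flat storage head.
struct FlatStorageView {
    head_hash: [u8; 32],
    kvs: BTreeMap<Vec<u8>, Vec<u8>>,
}

/// Hypothetical result shape: boundary nodes are read from the trie on disk,
/// while the inner part is regenerated from a flat key-value range.
struct StatePartSketch {
    boundary_nodes: Vec<Vec<u8>>,
    inner_kvs: Vec<(Vec<u8>, Vec<u8>)>,
}

fn build_state_part_sketch(
    state_root: [u8; 32],
    flat: &FlatStorageView,
    boundary_nodes: Vec<Vec<u8>>,
    key_range: Range<Vec<u8>>,
) -> Result<StatePartSketch, String> {
    // The optimization is only sound when the flat storage head corresponds to the
    // state root being synced; otherwise the KV range describes a different state.
    if flat.head_hash != state_root {
        return Err("flat storage head does not match the sync state root".to_string());
    }
    // A single range read over flat storage replaces walking every inner trie node.
    let inner_kvs = flat
        .kvs
        .range(key_range)
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    Ok(StatePartSketch { boundary_nodes, inner_kvs })
}
```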
I like the code, it's very clear. But I have a general concern. We shouldn't put … I don't have an opinion yet on where this logic should live - maybe it's a separate struct like …
Maybe this logic fits well into …
@Longarithm Please take another look. Many tests are failing because they use an epoch length that is too short. It takes a couple of seconds to: …
Looking.
I see... We can disable removing columns and compaction for tests, because the state is very small there. Could that simplify the changes?
It'll help, but there are still limits to it, and I don't want to introduce more configuration flags.
@Longarithm Please take a look
assert_ne!(
    flat_storage_manager.set_flat_state_updates_mode(false),
    Some(false),
    "Failed to lock flat state updates"
);
The assertion looks fine (would it even be possible to propagate an error from this actor to the main actor?). But let's change the `set_flat_state_updates_mode` return type to `Result<bool, StorageError>`.
Changed this to check whether `set_flat_state_updates_mode()` succeeded. If not, the snapshotting request is ignored.
I don't return an error, because the caller doesn't care about it; the blocks need to be processed regardless. I prefer the fail-open (aka fail-safe) approach here, because snapshots are not critical to any individual node, even though they are critical to the network as a whole.
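A minimal sketch of this fail-open handling, with simplified stand-in types rather than nearcore's actual API:

```rust
#[derive(Debug)]
struct StorageError(String);

struct FlatStorageManagerSketch;

impl FlatStorageManagerSketch {
    /// Hypothetical version returning Result, as suggested in the review above.
    fn set_flat_state_updates_mode(&self, _enabled: bool) -> Result<bool, StorageError> {
        Ok(true)
    }
}

/// Handle a snapshot request: if we fail to lock flat state updates, log and skip
/// the snapshot instead of propagating an error, so block processing continues.
fn handle_snapshot_request(manager: &FlatStorageManagerSketch) {
    match manager.set_flat_state_updates_mode(false) {
        Ok(_) => {
            // Flat state updates are locked; safe to proceed with the snapshot.
        }
        Err(err) => {
            // Fail open: snapshots are not critical for an individual node.
            eprintln!("Skipping state snapshot: failed to lock flat state updates: {:?}", err);
        }
    }
}
```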
core/store/src/flat/manager.rs
Some(v) => assert_eq!(
    prev_shard_value, v,
    "All FlatStorage are expected to have the same value of `move_head_enabled`"
Similarly to the comment below about `get_make_snapshot_callback`: let's make this an error and propagate it further. It looks like we can't return `None` from here anyway.
Please see an updated version of this function.
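A minimal sketch of returning an error instead of asserting when shards disagree on `move_head_enabled`; the function and error type are hypothetical simplifications of the updated code:

```rust
#[derive(Debug)]
struct StorageError(String);

/// Returns the shared `move_head_enabled` value, or an error if shards disagree.
fn check_consistent_move_head_enabled(
    per_shard_values: &[bool],
) -> Result<Option<bool>, StorageError> {
    let mut prev: Option<bool> = None;
    for &v in per_shard_values {
        match prev {
            None => prev = Some(v),
            Some(p) if p == v => {}
            Some(p) => {
                return Err(StorageError(format!(
                    "inconsistent move_head_enabled across shards: {} vs {}",
                    p, v
                )))
            }
        }
    }
    Ok(prev)
}
```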
@@ -2547,11 +2568,40 @@ fn test_catchup_gas_price_change() {
    genesis.config.min_gas_price = min_gas_price;
    genesis.config.gas_limit = 1000000000000;
    let chain_genesis = ChainGenesis::new(&genesis);

    let tmp_dir1 = tempfile::tempdir().unwrap();
Why can't we reuse `real_epoch_managers` and `nightshade_runtimes` as before?
`nightshade_runtimes()` initializes the runtime with the home_dir `../../../../`, where I can't create a new directory. Therefore, I create the runtime manually.
`real_epoch_managers()` by itself is fine, but if I use it, I don't have an epoch manager to pass to the runtime I'm initializing manually.
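A minimal sketch of the test-setup pattern being discussed: each client gets its own temporary home directory and the runtime is built manually. `init_runtime_with_home_dir` is a hypothetical placeholder, not the actual constructor used in the test; only `tempfile::tempdir()` is a real API.

```rust
use std::path::Path;

/// Hypothetical placeholder: in the real test this would construct the runtime
/// with an explicitly provided epoch manager and the given home directory.
fn init_runtime_with_home_dir(home_dir: &Path) {
    let _ = home_dir;
}

fn setup_two_clients_for_test() {
    // Per-test temporary home directories instead of the shared `../../../../` path.
    let tmp_dir1 = tempfile::tempdir().unwrap();
    let tmp_dir2 = tempfile::tempdir().unwrap();
    init_runtime_with_home_dir(tmp_dir1.path());
    init_runtime_with_home_dir(tmp_dir2.path());
    // The TempDir guards must stay alive for the duration of the test; the
    // directories are deleted when they go out of scope.
}
```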
Nayduck tests are fine, but a few tests fail occasionally due to reasons unrelated to this PR.
This was added in near#9090 to provide a way to reduce the size of snapshots, as the commit message said. But that's not needed anymore now that we drop the unneeded column families and have a smaller snapshot to begin with.
…pshots (#12589)

`test_resharding_v3_shard_shuffling_slower_post_processing_tasks` exposes a bug that can be triggered if child flat storages are not split after a resharding by the time we want to take a state snapshot. Then the state snapshot code will fail because the flat storage is not ready, but will not retry.

To fix it, we add a `want_snapshot` field that will be set when we decide to take a state snapshot. We also add a `split_in_progress` field to the `FlatStorageManager` that will be set to `true` when a resharding is started, and back to `false` when it's finished and the catchup code has progressed to a height close to the desired snapshot height. The state snapshot code will wait until `split_in_progress` is `false` to proceed, and the flat storage catchup code will wait until `want_snapshot` is cleared if it has already advanced to the desired snapshot hash, so that we don't advance past the point that was wanted by the state snapshot. The first one is the one actually causing the test failure, but the second one is also required.

We implement this waiting by rescheduling the message sends in the future. A Condvar would be a very natural choice, but it unfortunately doesn't seem to work in testloop, since actors that normally run on different threads are put on the same thread, and a blocker on a Condvar won't be woken up.

Here we are making a change to the behavior of the old `set_flat_state_updates_mode()`, which used to refuse to proceed if the update mode was already set to the same value. This seems to be an artifact of the fact that when state snapshots were implemented in #9090, this extra logic was added because there was another user of this function (`inline_flat_state_values()` added in #9037), but that function has since been deleted, so the state snapshot code is now the only user of `set_flat_state_updates_mode()`.
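A minimal sketch of this flag-based coordination under simplified, hypothetical names (the real fields live on `FlatStorageManager`, and "retry later" is implemented by rescheduling actor messages rather than blocking):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Default)]
struct SnapshotCoordination {
    /// Set when a state snapshot has been requested at some height.
    want_snapshot: AtomicBool,
    /// Set while a resharding split of child flat storages is still in progress.
    split_in_progress: AtomicBool,
}

enum Attempt {
    Proceed,
    /// The caller should reschedule the same message to itself and try again later.
    RetryLater,
}

impl SnapshotCoordination {
    /// State snapshot side: wait (by rescheduling) until the split has finished.
    fn try_take_snapshot(&self) -> Attempt {
        if self.split_in_progress.load(Ordering::SeqCst) {
            Attempt::RetryLater
        } else {
            Attempt::Proceed
        }
    }

    /// Flat storage catchup side: once it has reached the snapshot height, it must
    /// not advance past it while a snapshot is still wanted.
    fn try_advance_past_snapshot_height(&self) -> Attempt {
        if self.want_snapshot.load(Ordering::SeqCst) {
            Attempt::RetryLater
        } else {
            Attempt::Proceed
        }
    }
}
```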
FlatStorage provides a fast way to generate state parts, but it needs a consistent view of state at the beginning of the epoch.
This PR makes a snapshot of the whole DB and deletes unused columns. This gives the node a read-only database with exactly the view of Flat State that is needed for State Sync.
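A minimal sketch of the checkpoint-then-drop-columns approach using the `rocksdb` crate directly; nearcore goes through its own store abstraction, so this is an illustration rather than the actual implementation, and the column-family lists are placeholders:

```rust
use rocksdb::{checkpoint::Checkpoint, Options, DB};
use std::path::Path;

fn snapshot_db_for_state_sync(
    db: &DB,
    snapshot_path: &Path,
    all_cfs: &[&str],
    cfs_to_keep: &[&str],
) -> Result<(), rocksdb::Error> {
    // 1. Checkpoint the entire database into the snapshot directory
    //    (RocksDB checkpoints use hard links, so this is cheap to create).
    Checkpoint::new(db)?.create_checkpoint(snapshot_path)?;

    // 2. Reopen the snapshot and drop the column families that state sync does
    //    not need, which is what keeps the extra disk usage small.
    let mut snapshot_db = DB::open_cf(&Options::default(), snapshot_path, all_cfs)?;
    for &cf in all_cfs {
        if !cfs_to_keep.contains(&cf) {
            snapshot_db.drop_cf(cf)?;
        }
    }
    Ok(())
}
```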
The snapshot is made in a separate actix Actor to avoid blocking ClientActor.
To improve iteration speed during development, a testing mechanism was added: a config option can trigger state snapshots every N blocks. This makes no sense for state sync itself, but it is very useful for observing enough snapshot events.
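A minimal sketch of how such an every-N-blocks trigger can be checked; the config field name is hypothetical, not the actual nearcore option:

```rust
struct StateSnapshotTestConfig {
    /// If set, take a snapshot whenever the block height is a multiple of N.
    every_n_blocks: Option<u64>,
}

fn should_take_test_snapshot(config: &StateSnapshotTestConfig, block_height: u64) -> bool {
    match config.every_n_blocks {
        Some(n) if n > 0 => block_height % n == 0,
        _ => false,
    }
}
```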
Tested that a node can process blocks while snapshotting is in progress.
Without compaction, snapshots can require ~100GB of extra disk space.
Enabling compaction reduces this extra disk overhead from ~100GB to ~10GB.