feat: flat state value inlining migration #9037

Merged (1 commit) on May 19, 2023

@pugachAG pugachAG commented May 10, 2023

Part of #8243.

This PR implements the migration process for inlining `FlatState` values. For more details, see the second approach in this comment.
The migration is not yet executed on a running node; that will be implemented in a separate PR. Instead, a `migrate-value-inlining` sub-command is added to the `flat-storage` command.

It can be executed via `cargo run --release -p neard -- --verbose store flat-storage migrate-value-inlining`. Example progress log:

2023-05-15T14:22:54.280387Z  INFO store: Starting FlatState value inlining migration read_state_threads=16 batch_size=50000
...
2023-05-15T16:00:24.210821Z DEBUG store: Processed flat state value inlining batch batch_index=1580 inlined_batch_count=50000 inlined_total_count=35943298 batch_duration=67.303985ms
2023-05-15T16:00:25.388086Z DEBUG store: Processed flat state value inlining batch batch_index=1581 inlined_batch_count=50000 inlined_total_count=35993298 batch_duration=89.054046ms
...
2023-05-15T17:02:14.707594Z  INFO store: Finished FlatState value inlining migration inlined_total_count=128780116 migration_elapsed=4388.085856057s
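
As a rough illustration of what a batched inlining pass like the one logged above might look like, here is a minimal in-memory sketch. It is not the code from this PR: the `FlatValue` enum, the `migrate_inlining` function, the `BTreeMap` stand-ins for the database, and the 4000-byte threshold are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Assumed for illustration: values at or below this size are stored inline.
const INLINE_THRESHOLD_BYTES: usize = 4000;

/// Hypothetical flat state entry: either a reference to a value or the value itself.
#[derive(Clone, Debug, PartialEq)]
enum FlatValue {
    /// Reference to a value stored elsewhere, with its length in bytes.
    Ref { hash: u64, len: usize },
    /// Value bytes stored directly in the flat state entry.
    Inlined(Vec<u8>),
}

/// Rewrites small `Ref` entries as `Inlined`, one batch at a time.
/// Returns the total number of inlined values.
fn migrate_inlining(
    flat_state: &mut BTreeMap<Vec<u8>, FlatValue>,
    values_by_hash: &BTreeMap<u64, Vec<u8>>,
    batch_size: usize,
) -> usize {
    // Collect the keys of all references small enough to be inlined.
    let candidates: Vec<Vec<u8>> = flat_state
        .iter()
        .filter_map(|(key, value)| match value {
            FlatValue::Ref { len, .. } if *len <= INLINE_THRESHOLD_BYTES => Some(key.clone()),
            _ => None,
        })
        .collect();

    let mut inlined_total = 0;
    for (batch_index, batch) in candidates.chunks(batch_size).enumerate() {
        let mut inlined_batch = 0;
        for key in batch {
            if let Some(FlatValue::Ref { hash, .. }) = flat_state.get(key).cloned() {
                // Skip entries whose value cannot be found; they stay as references.
                if let Some(bytes) = values_by_hash.get(&hash) {
                    flat_state.insert(key.clone(), FlatValue::Inlined(bytes.clone()));
                    inlined_batch += 1;
                }
            }
        }
        inlined_total += inlined_batch;
        println!(
            "batch_index={batch_index} inlined_batch_count={inlined_batch} inlined_total_count={inlined_total}"
        );
    }
    inlined_total
}
```

Calling `migrate_inlining(&mut flat_state, &values_by_hash, 50_000)` would mirror the `batch_size=50000` from the log. Processing in batches keeps each write small and makes progress easy to report.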

@pugachAG pugachAG force-pushed the flat-state-inline-migration branch 12 times, most recently from df80d27 to 68ff47f Compare May 15, 2023 15:22
@pugachAG pugachAG requested review from Longarithm and jbajic May 15, 2023 16:07
@pugachAG pugachAG added the A-storage Area: storage and databases label May 15, 2023
@pugachAG pugachAG marked this pull request as ready for review May 15, 2023 16:08
@pugachAG pugachAG requested a review from a team as a code owner May 15, 2023 16:08
@pugachAG pugachAG force-pushed the flat-state-inline-migration branch from 68ff47f to 5f23a97 Compare May 15, 2023 17:07
@pugachAG pugachAG force-pushed the flat-state-inline-migration branch from 5f23a97 to 452bb05 Compare May 19, 2023 08:23
@pugachAG pugachAG force-pushed the flat-state-inline-migration branch from 452bb05 to f6fc151 Compare May 19, 2023 08:59
@near-bulldozer near-bulldozer bot merged commit 5b78611 into near:master May 19, 2023
near-bulldozer bot pushed a commit that referenced this pull request May 25, 2023
…9093)

Part of #8243.

This PR enables the migration added in #9037 to be executed in the background on the running node.
It supports a graceful stop when the node is shut down. The implementation is heavily inspired by the state sync background dumping to S3.
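
One common way to get a graceful stop for a background task is a shared stop flag that the worker checks between batches. The sketch below assumes that pattern; the `BackgroundMigration` type and its methods are hypothetical names for illustration, not the API introduced by this commit.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical handle to a background task that can be stopped between batches.
struct BackgroundMigration {
    keep_running: Arc<AtomicBool>,
    handle: Option<JoinHandle<usize>>,
}

impl BackgroundMigration {
    /// Spawns a worker that processes `total_batches` batches unless stopped earlier.
    fn start(total_batches: usize) -> Self {
        let keep_running = Arc::new(AtomicBool::new(true));
        let flag = Arc::clone(&keep_running);
        let handle = thread::spawn(move || {
            let mut processed = 0;
            // The flag is checked only between batches, so a batch is never cut short.
            while processed < total_batches && flag.load(Ordering::Relaxed) {
                // Placeholder for the real work of one batch.
                processed += 1;
            }
            processed
        });
        Self { keep_running, handle: Some(handle) }
    }

    /// Asks the worker to stop and waits for it to exit.
    /// Returns the number of batches completed, or 0 if already stopped.
    fn stop(&mut self) -> usize {
        self.keep_running.store(false, Ordering::Relaxed);
        self.handle
            .take()
            .map(|h| h.join().expect("worker panicked"))
            .unwrap_or(0)
    }
}
```

On shutdown the node would call `stop()` once; the worker finishes its current batch and exits, so no batch is left half-written.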

This PR also introduces a new column, `DBCol::Misc`. For now it stores only the status of the migration, but it can hold any small pieces of data, similar to `DBCol::BlockMisc`.
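
To show why persisting the status is useful, here is a small sketch of a migration that records its status in a key-value column and skips the work on later runs. The `MiscColumn` stand-in, the `MigrationStatus` variants, the key name, and the one-byte encoding are all assumptions for the example and do not reflect the real `DBCol::Misc` layout.

```rust
use std::collections::HashMap;

/// Hypothetical key under which the migration status is stored.
const STATUS_KEY: &str = "migration_status";

/// Hypothetical status values for the migration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MigrationStatus {
    NotStarted,
    InProgress,
    Finished,
}

impl MigrationStatus {
    fn to_byte(self) -> u8 {
        match self {
            MigrationStatus::NotStarted => 0,
            MigrationStatus::InProgress => 1,
            MigrationStatus::Finished => 2,
        }
    }

    fn from_byte(byte: u8) -> Option<Self> {
        match byte {
            0 => Some(MigrationStatus::NotStarted),
            1 => Some(MigrationStatus::InProgress),
            2 => Some(MigrationStatus::Finished),
            _ => None,
        }
    }
}

/// In-memory stand-in for a small key-value column.
#[derive(Default)]
struct MiscColumn {
    data: HashMap<String, Vec<u8>>,
}

impl MiscColumn {
    /// Reads the stored status; a missing or unreadable entry counts as `NotStarted`.
    fn get_status(&self) -> MigrationStatus {
        self.data
            .get(STATUS_KEY)
            .and_then(|bytes| bytes.first().copied())
            .and_then(MigrationStatus::from_byte)
            .unwrap_or(MigrationStatus::NotStarted)
    }

    fn set_status(&mut self, status: MigrationStatus) {
        self.data.insert(STATUS_KEY.to_string(), vec![status.to_byte()]);
    }
}

/// Runs `migrate` unless the column already records the migration as finished.
/// Returns `true` if the migration was executed.
fn run_if_needed(column: &mut MiscColumn, migrate: impl FnOnce()) -> bool {
    if column.get_status() == MigrationStatus::Finished {
        return false;
    }
    column.set_status(MigrationStatus::InProgress);
    migrate();
    column.set_status(MigrationStatus::Finished);
    true
}
```

With the status stored on disk, a node that restarts after the migration has finished can skip it, and a node that was stopped mid-migration can tell that work remains.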

`FlatStorageManager` is exposed as part of `RuntimeAdapter` in this PR. This is the first step toward removing all other flat-storage-related methods from `RuntimeAdapter`, as the manager can be used directly instead.

Tested by manually running a node and checking metrics and log messages. After that, flat storage was checked with the `flat-storage verify` command.
nikurt pushed a commit that referenced this pull request May 31, 2023
…9093)

nikurt pushed a commit that referenced this pull request Jun 13, 2023
…9093)

github-merge-queue bot pushed a commit that referenced this pull request Dec 15, 2024
…pshots (#12589)

`test_resharding_v3_shard_shuffling_slower_post_processing_tasks`
exposes a bug that can be triggered if child flat storages have not been
split after a resharding by the time we want to take a state snapshot.
The state snapshot code then fails because the flat storage is not
ready, but it does not retry. To fix this, we add a `want_snapshot`
field that is set when we decide to take a state snapshot. We also add a
`split_in_progress` field to the `FlatStorageManager` that is set to
`true` when a resharding starts, and back to `false` when it has
finished and the catchup code has progressed to a height close to the
desired snapshot height. The state snapshot code waits until
`split_in_progress` is `false` before proceeding, and the flat storage
catchup code waits until `want_snapshot` is cleared if it has already
advanced to the desired snapshot hash, so that we don't advance past the
point wanted by the state snapshot. The missing first wait is what
actually causes the test failure, but the second wait is also required.

We implement this waiting by rescheduling the message sends for a later
time. A Condvar would be a very natural choice, but unfortunately it
doesn't seem to work in testloop: actors that normally run on different
threads are put on the same thread, so a thread blocked on a Condvar
won't be woken up.
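
The rescheduling idea can be sketched with a single-threaded message loop in
which a message that cannot proceed is put back on the queue instead of
blocking. This is a simplified illustration only: the `Message`, `Flags`, and
`run_loop` names are hypothetical, and re-enqueueing at the back of the queue
stands in for the delayed resend described above.

```rust
use std::collections::VecDeque;

/// Hypothetical messages handled by the loop.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Message {
    /// Take a state snapshot; must wait while a split is in progress.
    TakeSnapshot,
    /// The resharding split has finished.
    SplitDone,
}

/// Flags loosely modeled on the fields described in the text.
struct Flags {
    split_in_progress: bool,
    snapshot_taken: bool,
}

/// Processes up to `max_steps` messages on one thread.
/// Returns how many times a message had to be rescheduled.
fn run_loop(queue: &mut VecDeque<Message>, flags: &mut Flags, max_steps: usize) -> usize {
    let mut reschedules = 0;
    for _ in 0..max_steps {
        let Some(message) = queue.pop_front() else { break };
        match message {
            // Not ready yet: put the message back rather than blocking the only thread.
            Message::TakeSnapshot if flags.split_in_progress => {
                queue.push_back(message);
                reschedules += 1;
            }
            Message::TakeSnapshot => flags.snapshot_taken = true,
            Message::SplitDone => flags.split_in_progress = false,
        }
    }
    reschedules
}
```

Blocking on a Condvar inside `TakeSnapshot` would stall this loop forever,
because `SplitDone` is handled by the same thread. Rescheduling lets the other
message run first.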

Here we change the behavior of the old `set_flat_state_updates_mode()`,
which used to refuse to proceed if the update mode was already set to
the same value. This seems to be an artifact of how state snapshots were
implemented in #9090: the extra logic was added because there was
another user of this function (`inline_flat_state_values()`, added in
#9037). That function has since been deleted, so the state snapshot code
is now the only user of `set_flat_state_updates_mode()`.