
feat: add ETL to Hashing Stages #7030

Merged — 12 commits merged into main on Mar 26, 2024
Conversation

@joshieDo (Collaborator) commented Mar 7, 2024

ref #6909

When taking the code path of hashing every account/storage (aka clean hashing), it now uses ETL without committing in between.
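The clean-hashing path follows the classic ETL shape: collect pre-hashed entries into sorted runs, then merge the runs for a single sequential write at the end, instead of committing intermediate batches. A minimal sketch of that collect/sort/merge pattern in Rust (names here are illustrative, not reth's actual `reth-etl` API; the real collector spills sorted runs to temporary files rather than keeping them in memory):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Illustrative ETL collector: buffers (key, value) pairs, sorts full
/// buffers into runs, and merges all runs into one key-ordered stream.
struct EtlCollector {
    buffer: Vec<(Vec<u8>, Vec<u8>)>,
    runs: Vec<Vec<(Vec<u8>, Vec<u8>)>>, // the real thing writes these to disk
    buffer_capacity: usize,
}

impl EtlCollector {
    fn new(buffer_capacity: usize) -> Self {
        Self { buffer: Vec::new(), runs: Vec::new(), buffer_capacity }
    }

    fn insert(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.buffer.push((key, value));
        if self.buffer.len() >= self.buffer_capacity {
            self.flush_run();
        }
    }

    fn flush_run(&mut self) {
        let mut run = std::mem::take(&mut self.buffer);
        run.sort();
        self.runs.push(run);
    }

    /// K-way merge of the sorted runs, yielding entries in key order so the
    /// final table can be written in one sequential pass (no interim commits).
    fn into_sorted(mut self) -> Vec<(Vec<u8>, Vec<u8>)> {
        if !self.buffer.is_empty() {
            self.flush_run();
        }
        let mut iters: Vec<_> = self.runs.into_iter().map(|r| r.into_iter()).collect();
        let mut heap = BinaryHeap::new();
        for (i, it) in iters.iter_mut().enumerate() {
            if let Some(entry) = it.next() {
                heap.push(Reverse((entry, i)));
            }
        }
        let mut out = Vec::new();
        while let Some(Reverse((entry, i))) = heap.pop() {
            if let Some(next) = iters[i].next() {
                heap.push(Reverse((next, i)));
            }
            out.push(entry);
        }
        out
    }
}
```

Because the merged output arrives in key order, the database write becomes append-only, which is where the time and space savings below come from.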

Mainnet

AccountsHashing

  • 34min -> 2min24s
  • 17.9GiB -> 12.2GiB

StorageHashing

  • 1h23min -> 53min
  • 82.9GiB -> 61.1GiB

Holesky @ 1M

AccountsHashing

  • 142 seconds -> 9 seconds
  • 1.4GiB -> 981 MiB

StorageHashing

  • 108 seconds -> 34.9 seconds
  • 1.3GiB -> 991 MiB

@@ -356,91 +310,6 @@ mod tests {
assert!(runner.validate_execution(input, result.ok()).is_ok(), "execution validation");
}

#[tokio::test]
async fn execute_clean_account_hashing_with_commit_threshold() {
Collaborator (Author):

clean hashing execution no longer relies on commit_threshold

@@ -310,156 +263,21 @@ mod tests {
}
}

#[tokio::test]
async fn execute_clean_storage_hashing_with_commit_threshold() {
Collaborator (Author):

clean hashing execution no longer relies on commit_threshold

@joshieDo joshieDo marked this pull request as ready for review March 14, 2024 16:27
@gakonst (Member) left a comment:

LGTM -- would like @onbjerg & @mattsse to quickly inspect, and if good let's move on to the indexing stages as well.

@mattsse (Collaborator) left a comment:

less code good

we could also limit the rayon global pool instead, but we should definitely restrict how many cores we're using for this. This could be integrated into the stages builder setup, where the pool is then shared with the relevant stages.

Comment on lines +176 to +177
// Spawn the hashing task onto the global rayon pool
rayon::spawn(move || {
Collaborator:

this has the same issue as the sender recovery stage.

what we could do instead is add a pool that has < available_parallelism() threads.

this would also solve the sender recovery issue, I believe.

ref

std::thread::available_parallelism()
.map_or(25, |cpus| max(cpus.get().saturating_sub(RESERVED), RESERVED))
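The sizing rule referenced above can be packaged as a small helper. This is a sketch, not reth's code: `RESERVED` is a hypothetical constant for cores kept free for other work, and 25 is the fallback used when core detection fails, mirroring the snippet above:

```rust
use std::cmp::max;
use std::thread;

/// Hypothetical number of cores to keep free for other work.
const RESERVED: usize = 2;

/// Worker count for a dedicated hashing pool: all cores minus RESERVED,
/// but never fewer than RESERVED; falls back to 25 if detection fails.
fn hashing_worker_count() -> usize {
    thread::available_parallelism()
        .map_or(25, |cpus| max(cpus.get().saturating_sub(RESERVED), RESERVED))
}
```

A dedicated rayon pool could then be built with `rayon::ThreadPoolBuilder::new().num_threads(hashing_worker_count()).build()` and shared between the stages that need it, rather than saturating the global pool.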

Member:

agree @joshieDo

Collaborator (Author):

#7267 should take care of it.

@joshieDo joshieDo requested a review from mattsse March 21, 2024 13:51
@mattsse (Collaborator) left a comment:

that makes sense to me

pending @shekhirin

Comment on lines 170 to 177
let mut channels = Vec::with_capacity(10_000);

// itertools chunks doesn't support enumerate, so we use a counter to know when to flush
// hashes from channels to disk.
let mut flush_counter = 1;

// channels used to return result of account hashing
for chunk in &accounts_cursor.walk(start)?.chunks(100) {
Collaborator:

where are these numbers coming from?

Collaborator (Author):

Made a few changes, but they're arbitrary. I tried this in some benchmarks and didn't see apparent performance degradation. We might want to tweak them in the future though.

Before, we were just pushing all channels into a vec, which would grow the memory usage immensely. So batching and flushing seems like a more reasonable approach.
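The batching-and-flushing approach described here can be sketched with plain std threads and channels (the per-chunk hash is a stub, and the chunk size and flush interval stand in for the arbitrary constants discussed in the diff; reth uses rayon rather than raw threads):

```rust
use std::sync::mpsc;
use std::thread;

/// Stub worker: "hashes" a chunk of account ids (illustrative only).
fn hash_chunk(chunk: Vec<u64>) -> Vec<(u64, u64)> {
    chunk.into_iter().map(|acc| (acc.wrapping_mul(0x9e3779b97f4a7c15), acc)).collect()
}

/// Spawn one worker per chunk, but drain pending result channels every
/// `flush_every` chunks so results don't pile up unboundedly in memory.
fn hash_in_batches(accounts: Vec<u64>, chunk_size: usize, flush_every: usize) -> Vec<(u64, u64)> {
    let mut channels = Vec::new();
    let mut out = Vec::new();
    for chunk in accounts.chunks(chunk_size) {
        let chunk = chunk.to_vec();
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            let _ = tx.send(hash_chunk(chunk));
        });
        channels.push(rx);
        // Flush: collect results from the in-flight chunks before spawning more.
        if channels.len() >= flush_every {
            for rx in channels.drain(..) {
                out.extend(rx.recv().expect("worker panicked"));
            }
        }
    }
    // Drain whatever is still pending after the last chunk.
    for rx in channels.drain(..) {
        out.extend(rx.recv().expect("worker panicked"));
    }
    out
}
```

Draining in spawn order keeps the output deterministic while still bounding how many result buffers are alive at once, which is the memory-growth fix being described.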

@shekhirin (Collaborator) left a comment:

LGTM, logging nit

@joshieDo joshieDo added this pull request to the merge queue Mar 26, 2024
Merged via the queue into main with commit 96e39d2 Mar 26, 2024
27 checks passed
@joshieDo joshieDo deleted the joshie/hashing-stages-etl branch March 26, 2024 16:58
Ruteri pushed a commit to Ruteri/reth that referenced this pull request Apr 17, 2024
4 participants