feat: add ETL to Hashing Stages #7030
@@ -356,91 +310,6 @@ mod tests {
        assert!(runner.validate_execution(input, result.ok()).is_ok(), "execution validation");
    }

    #[tokio::test]
    async fn execute_clean_account_hashing_with_commit_threshold() {
clean hashing execution no longer relies on commit_threshold
@@ -310,156 +263,21 @@ mod tests {
        }
    }

    #[tokio::test]
    async fn execute_clean_storage_hashing_with_commit_threshold() {
clean hashing execution no longer relies on commit_threshold
less code good
We could also limit the rayon global pool instead, but we should definitely restrict how many cores we're using for this. This could be integrated into the stages builder setup, where the pool is then shared with the relevant stages.
// Spawn the hashing task onto the global rayon pool
rayon::spawn(move || {
This has the same issue as the sender recovery stage.
What we could do instead is add a pool that has < available_parallelism() threads.
I believe this would also solve the sender recovery issue.
ref reth/crates/rpc/rpc-builder/src/constants.rs, lines 30 to 31 in 273f3c7:
    std::thread::available_parallelism()
        .map_or(25, |cpus| max(cpus.get().saturating_sub(RESERVED), RESERVED))
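A minimal sketch of how the thread count for such a restricted pool might be derived, using only the standard library. `RESERVED_CORES`, `hashing_threads_for` and `hashing_threads` are hypothetical names for illustration, and the clamping rule (always keep at least one worker) is an assumption that differs slightly from the referenced snippet.

```rust
use std::thread::available_parallelism;

/// Hypothetical: number of cores left free for the rest of the node.
const RESERVED_CORES: usize = 2;

/// Pure helper so the clamping logic can be checked without touching the OS.
/// Always returns at least one worker, so on very small machines the result
/// may not be strictly below the core count.
fn hashing_threads_for(cpus: usize, reserved: usize) -> usize {
    cpus.saturating_sub(reserved).max(1)
}

/// Thread count for a dedicated hashing pool on the current machine.
/// Falls back to a single worker if the core count cannot be queried.
fn hashing_threads() -> usize {
    let cpus = available_parallelism().map_or(1, |n| n.get());
    hashing_threads_for(cpus, RESERVED_CORES)
}
```

The resulting value could then be handed to a dedicated pool, for example through rayon's `ThreadPoolBuilder::num_threads`, and that pool shared between the hashing and sender recovery stages.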
agree @joshieDo
should take care of it
that makes sense to me
pending @shekhirin
let mut channels = Vec::with_capacity(10_000);

// itertools chunks doesn't support enumerate, so we use a counter to know when to flush
// hashes from channels to disk.
let mut flush_counter = 1;

// channels used to return result of account hashing
for chunk in &accounts_cursor.walk(start)?.chunks(100) {
where are these numbers coming from?
Made a few changes, but the numbers are arbitrary. I tried this in some benchmarks and saw no apparent performance degradation. We might want to tweak them in the future, though.
Before, we were just pushing all channels into a vec, which would grow memory usage immensely. So batching and flushing seems a more reasonable approach.
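To illustrate the batching and flushing described above, here is a rough standalone sketch. It is not the PR's code: it uses `std::thread` and `std::sync::mpsc` instead of rayon and the database cursor, `fake_hash` is a placeholder for keccak256, and `hash_in_batches`, `flush`, `chunk_size` and `max_channels` are hypothetical names.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Placeholder for keccak256; illustrative only.
fn fake_hash(key: u64) -> u64 {
    key.wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(17)
}

/// Drains every pending receiver into `sink` (a stand-in for flushing to disk).
fn flush(channels: &mut Vec<Receiver<Vec<(u64, u64)>>>, sink: &mut Vec<(u64, u64)>) {
    for rx in channels.drain(..) {
        // Each worker sends exactly one batch and then drops its sender.
        if let Ok(batch) = rx.recv() {
            sink.extend(batch);
        }
    }
}

/// Hashes `keys` in chunks, keeping at most `max_channels` receivers alive
/// before flushing. `chunk_size` must be non-zero.
fn hash_in_batches(keys: &[u64], chunk_size: usize, max_channels: usize) -> Vec<(u64, u64)> {
    let mut sink = Vec::with_capacity(keys.len());
    let mut channels = Vec::with_capacity(max_channels);

    for chunk in keys.chunks(chunk_size) {
        let (tx, rx) = mpsc::channel();
        channels.push(rx);

        let chunk = chunk.to_vec();
        thread::spawn(move || {
            let hashed: Vec<(u64, u64)> = chunk.into_iter().map(|k| (fake_hash(k), k)).collect();
            let _ = tx.send(hashed);
        });

        // Bound memory: never hold more than `max_channels` pending results.
        if channels.len() >= max_channels {
            flush(&mut channels, &mut sink);
        }
    }

    // Flush whatever is left after the last chunk.
    flush(&mut channels, &mut sink);
    sink
}
```

The point of the bound is that the number of in-flight result buffers stays constant regardless of how many accounts are walked, at the cost of occasionally waiting for the current batch to finish.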
LGTM, logging nit
ref #6909
When taking the code path of hashing every account/storage (aka clean hashing), it now uses ETL without committing in between.
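For readers unfamiliar with the ETL pattern, the idea can be sketched roughly as follows: collect entries in a bounded buffer, sort each full buffer into a run, then merge the runs so the final load happens in key order. This is an in-memory illustration only; `SketchCollector` and its methods are hypothetical names, and the sorted runs here are plain vectors standing in for whatever spill storage a real collector would use.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical in-memory stand-in for an ETL collector.
struct SketchCollector {
    capacity: usize,
    buffer: Vec<(u64, u64)>,
    runs: Vec<Vec<(u64, u64)>>,
}

impl SketchCollector {
    fn new(capacity: usize) -> Self {
        Self { capacity, buffer: Vec::new(), runs: Vec::new() }
    }

    /// Extract/transform side: buffer an entry, spilling a sorted run when full.
    fn insert(&mut self, key: u64, value: u64) {
        self.buffer.push((key, value));
        if self.buffer.len() >= self.capacity {
            self.flush();
        }
    }

    /// Sorts the current buffer and stores it as one run.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        self.buffer.sort_unstable();
        self.runs.push(std::mem::take(&mut self.buffer));
    }

    /// Load side: k-way merge of all runs, yielding entries in key order.
    fn into_sorted(mut self) -> Vec<(u64, u64)> {
        self.flush();
        let mut iters: Vec<_> = self.runs.into_iter().map(|r| r.into_iter()).collect();
        let mut heap = BinaryHeap::new();
        for (i, it) in iters.iter_mut().enumerate() {
            if let Some((k, v)) = it.next() {
                heap.push(Reverse((k, v, i)));
            }
        }
        let mut out = Vec::new();
        while let Some(Reverse((k, v, i))) = heap.pop() {
            out.push((k, v));
            // Refill the heap from the run the popped entry came from.
            if let Some((nk, nv)) = iters[i].next() {
                heap.push(Reverse((nk, nv, i)));
            }
        }
        out
    }
}
```

Because hashed keys arrive in effectively random order, sorting before the load step lets the destination table be written sequentially instead of through scattered inserts.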
Mainnet
AccountsHashing
StorageHashing
Holesky @ 1M
AccountsHashing
StorageHashing
349 seconds