This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Cleans up stale accounts hash cache files #34933

Merged
brooksprumo merged 2 commits into solana-labs:master from iah/fix-cache-hash-PR on Jan 24, 2024

Conversation

brooksprumo
Contributor

@brooksprumo brooksprumo commented Jan 24, 2024

Problem

Once the Incremental Accounts Hash feature was enabled on mnb (mainnet-beta), validators started noticing that their accounts hash cache directory was growing significantly. Some reports were over 200 GB!

The accounts hash cache files are intentionally not cleaned up when performing an incremental accounts hash. This turns out to be quite an issue on mnb, but it was not an issue (or was not noticed) on other clusters or during testing.

Summary of Changes

Clean up stale accounts hash cache files when calculating an incremental accounts hash.
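
As a rough illustration of this change (not the PR's actual code), the sketch below picks which unused cache files to delete. It assumes, hypothetically, that each cache file name begins with its starting slot followed by a dot; the helpers `parse_start_slot` and `select_stale_files` are invented for this example. `None` is taken to mean that every unused file is stale, and `Some(slot)` to mean that only unused files starting on or after `slot` are stale, which mirrors the `All` / `OnAndAfter(Slot)` split suggested in the review.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

// Stand-in for the validator's slot type (assumption for this sketch).
type Slot = u64;

/// Hypothetical helper: reads the starting slot from a file name such as
/// "1000.2000.cache". The naming scheme is an assumption, not the real format.
fn parse_start_slot(path: &Path) -> Option<Slot> {
    let file_name = path.file_name()?.to_str()?;
    file_name.split('.').next()?.parse().ok()
}

/// Returns the subset of `unused_files` that should be deleted.
/// `None` selects every unused file; `Some(slot)` selects only the files
/// whose starting slot is on or after `slot`.
fn select_stale_files(
    unused_files: &HashSet<PathBuf>,
    storages_start_slot: Option<Slot>,
) -> HashSet<PathBuf> {
    unused_files
        .iter()
        .filter(|path| match storages_start_slot {
            None => true,
            Some(start_slot) => match parse_start_slot(path.as_path()) {
                Some(file_start) => file_start >= start_slot,
                // Assumption: a file whose name cannot be parsed is treated as stale.
                None => true,
            },
        })
        .cloned()
        .collect()
}
```

A caller would pass the set of pre-existing files that were not reused during the calculation, then remove each returned path (for example with `std::fs::remove_file`).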

Additional Testing

I've been running this code on a node against mnb and have observed there are no longer stale accounts hash cache files left on disk. The accounts hash cache directory stays around 30 GB, which is the expected size. A "pre-release" version of this code was also running overnight, so it went through a few complete full snapshot intervals.

@brooksprumo brooksprumo self-assigned this Jan 24, 2024
@brooksprumo brooksprumo force-pushed the iah/fix-cache-hash-PR branch 2 times, most recently from ae3ef96 to 5186cb9 Compare January 24, 2024 17:31
@brooksprumo brooksprumo force-pushed the iah/fix-cache-hash-PR branch from 5186cb9 to 393f6b2 Compare January 24, 2024 17:32
Contributor

@HaoranYi HaoranYi left a comment

lgtm!

@brooksprumo brooksprumo marked this pull request as ready for review January 24, 2024 18:14
@brooksprumo brooksprumo added the v1.17 (PRs that should be backported to v1.17) and v1.18 (PRs that should be backported to v1.18) labels Jan 24, 2024
Contributor

mergify bot commented Jan 24, 2024

Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.

Contributor

mergify bot commented Jan 24, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.


codecov bot commented Jan 24, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (1810fea) 81.6% compared to head (daffbed) 81.6%.
Report is 6 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #34933   +/-   ##
=======================================
  Coverage    81.6%    81.6%           
=======================================
  Files         827      827           
  Lines      223874   223911   +37     
=======================================
+ Hits       182846   182877   +31     
- Misses      41028    41034    +6     

@@ -192,7 +193,8 @@ impl CacheHashDataFile {
pub(crate) struct CacheHashData {
    cache_dir: PathBuf,
    pre_existing_cache_files: Arc<Mutex<HashSet<PathBuf>>>,
    should_delete_old_cache_files_on_drop: bool,
    /// Decides which old cache files to delete. See `delete_old_cache_files()` for more info.
    storages_start_slot: Option<Slot>,
Contributor

@HaoranYi HaoranYi Jan 24, 2024

Maybe use a more descriptive enum instead of a generic `Option` to make the deletion semantics more explicit.

enum DeletePolicy {
  All,
  OnAndAfter(Slot),
}

This doesn't have to be done in this PR; it can wait for a follow-up PR.

Contributor Author

You're speaking my language! I like this idea a lot. I'll probably do it in a subsequent PR unless others prefer I address that here.
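
The `DeletePolicy` idea from this thread could look roughly like the sketch below. It is only an illustration of the suggestion, not code from this PR or a follow-up: the `from_start_slot` and `should_delete` methods are hypothetical names, `Slot` is aliased to `u64` so the example stands alone, and the mapping of `None` to `All` is an assumption based on the field's doc comment in the diff.

```rust
// Stand-in for the validator's slot type (assumption for this sketch).
type Slot = u64;

/// Which unused cache files may be deleted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeletePolicy {
    /// Every unused cache file may be deleted.
    All,
    /// Only unused cache files that start on or after this slot may be deleted.
    OnAndAfter(Slot),
}

impl DeletePolicy {
    /// Hypothetical conversion from the `Option<Slot>` field shown in the diff.
    fn from_start_slot(storages_start_slot: Option<Slot>) -> Self {
        match storages_start_slot {
            None => DeletePolicy::All,
            Some(slot) => DeletePolicy::OnAndAfter(slot),
        }
    }

    /// Returns true if an unused cache file starting at `file_start_slot`
    /// falls under this policy.
    fn should_delete(self, file_start_slot: Slot) -> bool {
        match self {
            DeletePolicy::All => true,
            DeletePolicy::OnAndAfter(slot) => file_start_slot >= slot,
        }
    }
}
```

Compared with a bare `Option<Slot>`, the enum names both cases at the call site, so a reader does not have to remember what `None` means.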

Contributor

@jeffwashington jeffwashington left a comment

lgtm

@brooksprumo brooksprumo merged commit 5898b9a into solana-labs:master Jan 24, 2024
37 checks passed
@brooksprumo brooksprumo deleted the iah/fix-cache-hash-PR branch January 24, 2024 20:31
mergify bot pushed a commit that referenced this pull request Jan 24, 2024
(cherry picked from commit 5898b9a)

# Conflicts:
#	accounts-db/src/cache_hash_data.rs
mergify bot pushed a commit that referenced this pull request Jan 24, 2024
brooksprumo added a commit that referenced this pull request Jan 24, 2024
…#34937)

* Cleans up stale accounts hash cache files (#34933)

(cherry picked from commit 5898b9a)

# Conflicts:
#	accounts-db/src/cache_hash_data.rs

* resolves merge conflicts

---------

Co-authored-by: Brooks <brooks@solana.com>
brooksprumo added a commit that referenced this pull request Jan 25, 2024
…#34938)

Cleans up stale accounts hash cache files (#34933)

(cherry picked from commit 5898b9a)

Co-authored-by: Brooks <brooks@solana.com>

3 participants