-
Notifications
You must be signed in to change notification settings - Fork 4.6k
Cleans up stale accounts hash cache files #34933
Cleans up stale accounts hash cache files #34933
Conversation
ae3ef96
to
5186cb9
Compare
5186cb9
to
393f6b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. |
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #34933 +/- ##
=======================================
Coverage 81.6% 81.6%
=======================================
Files 827 827
Lines 223874 223911 +37
=======================================
+ Hits 182846 182877 +31
- Misses 41028 41034 +6 |
@@ -192,7 +193,8 @@ impl CacheHashDataFile { | |||
pub(crate) struct CacheHashData { | |||
cache_dir: PathBuf, | |||
pre_existing_cache_files: Arc<Mutex<HashSet<PathBuf>>>, | |||
should_delete_old_cache_files_on_drop: bool, | |||
/// Decides which old cache files to delete. See `delete_old_cache_files()` for more info. | |||
storages_start_slot: Option<Slot>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use a more descriptive Enum instead of generic option to make the meaning for deletion more explicitly.
enum DeletePolicy {
All,
OnAndAfter(Slot),
}
Doesn't have to do in this PR. Can wailt for a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're speaking my language! I like this idea a lot. I'll probably do it in a subsequent PR unless others prefer I address that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
(cherry picked from commit 5898b9a) # Conflicts: # accounts-db/src/cache_hash_data.rs
(cherry picked from commit 5898b9a)
Problem
Once the Incremental Accounts Hash feature was enabled on mnb, validators started noticing their accounts hash cache directory was significantly growing in size. Some reports were over 200 GB!
The accounts hash cache files are intentionally not cleaned up when performing an incremental accounts hash. This turns out to be quite an issue on mnb, but was not an issue/not noticed on other clusters/during testing.
Summary of Changes
Clean up stale accounts hash cache files when calculating an incremental accounts hash.
Additional Testing
I've been running this code on a node against mnb and have observed there are no longer stale accounts hash cache files left on disk. The accounts hash cache directory stays around 30 GB, which is the expected size. A "pre-release" version of this code was also running overnight, so it went through a few complete full snapshot intervals.