Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds active stat items for clean subtasks #2245

Merged
merged 1 commit into from
Jan 2, 2025

Conversation

brooksprumo
Copy link

Problem

At startup there are OOM reports that we believe are caused by clean. But when there's an OOM, we don't get the metrics from clean to know which part was the culprit.

Summary of Changes

Add "active stat" items for the clean subtasks.

@brooksprumo brooksprumo self-assigned this Jul 22, 2024
@brooksprumo brooksprumo marked this pull request as ready for review July 22, 2024 21:57
@@ -17,6 +27,16 @@ pub struct ActiveStats {
#[derive(Debug, Copy, Clone)]
pub enum ActiveStatItem {
Clean,
CleanCollectCandidates,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow. this is a lot of granularity. my gut says we need half of these. otoh, what is the downside of all these? some noise? not sure. certainly helpful as we debug these oom situations. not sure how useful it is usually. since they are metrics it also seems that we don't need log files to find 'info' logs. I'm thinking we don't really need the 1/0 results. maybe if we just logged the 1 each time each phase started and we could look at the code to see there are no gaps between stages. that seems kind of hacky and just trying to save half the logs. each of these logs will be once per clean, so once per 100 slots. This is a rambling comment. In short, I'm reluctant to put in this much granularity. But, maybe it is hte right idea and I shouldn't be so conservative?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a lot. I agree likely too much. I figured I'd start with "everything" and then we can reduce it. I plan to run this on the dev box to see where the oom happens. I figure for that use-case, more is better. Maybe it'll inform what parts can be reduced for this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've revived this PR and reduced the number of new items a bit. How's the new set look?

Copy link

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems a good set of stat to have.

I believe that these stats are reported every ~100 slots. right?

@brooksprumo
Copy link
Author

I believe that these stats are reported every ~100 slots. right?

Yeah, whenever clean runs we'll see these new stats. On a normal validator that's ~100 slots.

HaoranYi
HaoranYi previously approved these changes Dec 20, 2024
jeffwashington
jeffwashington previously approved these changes Jan 2, 2025
Copy link

@jeffwashington jeffwashington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@brooksprumo
Copy link
Author

I had to resolve merge conflicts (removal of uncleaned_roots), so I rebased and force-pushed. No code was changed.

Copy link

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@brooksprumo brooksprumo merged commit 6269bfe into anza-xyz:master Jan 2, 2025
40 checks passed
@brooksprumo brooksprumo deleted the accounts-db-active branch January 2, 2025 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants