metrics: Improve monitoring BPF maps and userspace caches #1950
WIP!
Moved to draft since it seems to be WIP.
Yes, thank you @kkourt!
Branch updated from 3fa3d38 to dce5732.
Ready to review.
@sadath-12 I see your PR has conflicts with the main branch, can you rebase?
Done, @lambdanis.
@sadath-12 It seems the commits are still a bit off. You can check this in the checkpatch action run (here). There should be no merge commits, and ideally the different tasks you implemented in this PR should be separate commits. They're quite independent of each other, so separate commits would make the git history easier to review. I think it would be good to rewrite the git history here:
✅ Deploy Preview for tetragon ready!
Branch updated from cee5c96 to c691609.
Please fix the merge commit so it has a proper description. Also …
Thanks for the updates! I left a few comments requesting changes.
Could you also make the commit messages more precise, describing exactly which metric is modified? Messages like "removed eventcache" are confusing, as the eventcache itself stays.
Branch updated from 981ac45 to 51688bb.
@sadath-12 I see the CI is failing because a commit message is too long. Could you rephrase it to fit within the character limit?
Yeah, reworded that. I kept the commit title short (not very detailed, but enough) and put the explanation in the commit description.
@lambdanis @willfindlay could you take a look and merge if it's ready to go? Thanks. The CI failure there is unrelated; I see it from time to time.
One request to fix a metric description; otherwise looks good.
Branch updated from 4c8ca78 to e520fbd.
Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>
This metric was counting process cache evictions, so it effectively duplicated tetragon_errors_total{type="process_cache_evicted"}. Additionally, having it in the mapmetrics package was confusing, as it monitors a userspace process cache, not BPF maps. Let's remove it.

Co-authored-by: sadath-12 <sadathsadu2002@gmail.com>
Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>
Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
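To illustrate why a second eviction counter was redundant, here is a minimal stdlib-only sketch, not Tetragon's code: a hypothetical LRU process cache whose eviction path increments one shared errors counter under the type "process_cache_evicted". The `processCache` type, the `errorsTotal` map (standing in for a labelled Prometheus counter), and the capacity are all assumptions made for illustration.

```go
package main

import (
	"container/list"
	"fmt"
)

// errorsTotal is a hypothetical stand-in for a counter labelled by error
// type, in the spirit of tetragon_errors_total{type="..."}.
var errorsTotal = map[string]int{}

// processCache is a hypothetical fixed-capacity LRU cache of process keys.
type processCache struct {
	capacity int
	order    *list.List               // front = most recently used
	entries  map[string]*list.Element // key -> list element holding the key
}

func newProcessCache(capacity int) *processCache {
	return &processCache{
		capacity: capacity,
		order:    list.New(),
		entries:  make(map[string]*list.Element),
	}
}

// Add inserts or refreshes key. When the cache is full, the least recently
// used entry is evicted and the eviction is counted exactly once, in the
// shared errors counter, rather than in a separate map-specific metric.
func (c *processCache) Add(key string) {
	if el, ok := c.entries[key]; ok {
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(string))
		errorsTotal["process_cache_evicted"]++
	}
	c.entries[key] = c.order.PushFront(key)
}

// Len reports how many processes are currently cached.
func (c *processCache) Len() int { return c.order.Len() }

func main() {
	c := newProcessCache(2)
	for _, id := range []string{"a", "b", "c", "a", "d"} {
		c.Add(id)
	}
	fmt.Println(c.Len(), errorsTotal["process_cache_evicted"])
}
```

With a single counter, dashboards and alerts read evictions from one place, and the mapmetrics package is left to describe BPF maps only.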
Using the map size metric for reporting the event cache size was confusing, as event cache is not a BPF map. Let's remove it. A metric reporting the event cache size will be added in a follow-up commit.

Co-authored-by: sadath-12 <sadathsadu2002@gmail.com>
Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>
Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Some of them were incorrect.

Co-authored-by: sadath-12 <sadathsadu2002@gmail.com>
Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>
Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Replace tetragon_map_in_use_gauge{map="processLru"} with tetragon_process_cache_size. It's reporting the process cache size. Using the map size metric for this purpose was confusing, as process cache is not a BPF map.

Co-authored-by: sadath-12 <sadathsadu2002@gmail.com>
Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>
Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Thanks @sadath-12 for the updates. The code changes look good now. I rewrote the commit messages to describe exactly which metrics changed, and split some commits so that unrelated changes aren't mixed together. I dropped the …
The CI failure is a known issue: #2010. I'm merging this PR.
tetragon_map_drops_total
tetragon_map_in_use_gauge{map="eventcache"}
tetragon_map_in_use_gauge{map="processLru"} with tetragon_process_cache_size
Ref #1774