-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Optionally support regular expressions when applying ignore rules for package names/package upstream names #2445
Open
adammcclenaghan
wants to merge
3
commits into
anchore:main
Choose a base branch
from
adammcclenaghan:ignore-rules-regex-optional
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my hope that we would move these ignore rules to be data driven at some point, but by hardcoding them in the functions here, this seems to move in the opposite direction. I'm having a hard time understanding where the time/allocation is: should we just check to see if
strings.Contains(name, ".*")
here to avoid making patterns and return a different function which just does a string==
comparison?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this seems like a good way to do things, however the fact will still remain that some of the ignore rules (as far as I can tell just one currently) need to be treated as regex while the vast majority just need to be string equals comparisons.
In this data driven approach we could move towards a more 'robust' solution similar to the one I allude to in the PR description where the data itself lets callers know whether to treat the rule as a regular expression or not.
I could have explained that better and just realised my PR description screenshots maybe don't make it totally clear.
This pprof diagram might be a bit more helpful
You can see that
ApplyExplicitIgnoreRules
perform a lot of allocations.If we look at
top10
we can see this as wellAnd we can jump to the problem with:
The majority of allocations are happening here:
To step through the code that gets us here...
filter.IgnoreMatch(match)...
that many times...filter.IgnoreMatch
it calls heregetIgnoreConditionsForRule(r)
getIgnoreConditionsForRule
is where the allocations are happening. This is because every time it gets called it's compiling a pattern for the rule. So we compile regex patterns (Num Packages * Matches * Rules) times.We can confirm through the heap profile:
Here you see the largest offender is the package name rules being re-compiled:
And just to see it all out through to the end:
And the compile calls from
packageNameRegex
:Wrt time the problem is that the amount of short lived allocations causes a lot of work for the GC as expected:
data:image/s3,"s3://crabby-images/3aa23/3aa23d65b112387139f1112fbb1935791b89d509" alt="Screenshot 2025-02-13 at 16 05 31"
I may be misunderstanding, are you suggesting that we use a
strings.Contains(name, ".*")
check to determine if we should perform pattern matching, otherwise perform the==
comparison?The problem is if we do this to try to support regular expressions more broadly, it does not go far enough. There are a whole host of other symbols we'd have to check for to determine if the rule is a valid regular expression, in fact the best way to check would be to compile the regex in the first place (which we are trying to avoid)
I think that while it looks a little ugly to have the hard coded definitions here, it may make the intent clearer to readers for now.. Its also why I suggested that we perhaps use a shared const to make it even clearer that the linux kernel headers are the exception to the rule (for now)
As mentioned, a more proper solution is to have rules surface how they should be treated and how they should be matched against. However I felt doing that is beyond the scope of this fix for now.