fix: Optionally support regular expressions when applying ignore rules for package names/package upstream names #2445

adammcclenaghan · 2025-02-13T12:30:18Z

Description

Hello folks 👋

In #1809 both ifPackageNameApplies and ifUpstreamPackageNameApplies were updated to use regular expressions for all rule names/upstream names. This was recently updated in #2320 too.

So far as I can see from the codebase only the linux kernel headers ignore rules require regular expression matching.

In my testing I've found that FindMatches is significantly slowed down by the amount of regular expression compilation being performed.

For one of my test scans with about ~60k packages, using this branch results in a 2x speedup of FindMatches with ~8GB less cumulative allocations

Before

After

For now, I've modified ifPackageNameApplies and ifUpstreamPackageNameApplies to handle the linux-headers as special cases.

In future if we wish to support regular expressions more broadly perhaps there should be a nicer interface passed into the likes of ifPackageNameApplies and ifUpstreamPackageNameApplies rather than a string, and this interface could define whether or not to treat the match type as a regex etc...

I've had to make some changes to the unit tests.

I've removed two tests which seem to be testing behaviour that Grype doesn't have configuration for:

there are no existing ignore rules which define kernel-headers.* so I've removed the test ignore on name regex
there are no existing ignore rules which define regular expresions that start and end with anchor tags, so I've removed the test ignore on name regex, line termination test match

I've also modified some of the tests to match the actual patterns set out in Grype, namely:
-- linux-.*-headers-.* is now declared linux(-.*)?-headers-.*
-- linux-.* is now defined as linux.* .. as a result of fixing this I've also updated the upstream name from linux to notalinux for the test case. The ignore rules defined today would actually ignore linux, the only reason the test was passing before is because it defined the regex as linux-.*

In lieue of a larger refactor to have a nicer interface for ifPackageNameApplies/ifUpstreamPackageNameApplies which I've previously mentioned, it would at least perhaps be nice to declare the linux(-.*)?-headers-.* and linux.* as a const in some shared place.

If the maintainers can suggest somewhere that would be a good fit, I'd be happy to create this.

It looks like only the linux kernel headers ignore rules make use of pattern matching. Create and use regex patterns for these cases only to reduce memory allocations from pattern.compile and improve FindMatches performance. In future if we wish to support regular expressions more broadly perhaps then perhaps ignore rules should have some way to define whether they should be treated as a regular expression or not. Signed-off-by: Adam McClenaghan <adam@mcclenaghan.co.uk>

Signed-off-by: Adam McClenaghan <adam@mcclenaghan.co.uk>

kzantow · 2025-02-13T15:32:15Z

grype/match/ignore.go

-	pattern, err := packageNameRegex(name)
-	if err != nil {
-		return func(Match) bool { return false }
+	if name == "linux(-.*)?-headers-.*" {


It was my hope that we would move these ignore rules to be data driven at some point, but by hardcoding them in the functions here, this seems to move in the opposite direction. I'm having a hard time understanding where the time/allocation is: should we just check to see if strings.Contains(name, ".*") here to avoid making patterns and return a different function which just does a string == comparison?

It was my hope that we would move these ignore rules to be data driven at some point, but by hardcoding them in the functions here, this seems to move in the opposite direction

Yes this seems like a good way to do things, however the fact will still remain that some of the ignore rules (as far as I can tell just one currently) need to be treated as regex while the vast majority just need to be string equals comparisons.

In this data driven approach we could move towards a more 'robust' solution similar to the one I allude to in the PR description where the data itself lets callers know whether to treat the rule as a regular expression or not.

I'm having a hard time understanding where the time/allocation is

I could have explained that better and just realised my PR description screenshots maybe don't make it totally clear.

This pprof diagram might be a bit more helpful

You can see that ApplyExplicitIgnoreRules perform a lot of allocations.

If we look at top10 we can see this as well

(pprof) top10 Showing nodes accounting for 10164.59MB, 62.40% of 16288.23MB total Dropped 376 nodes (cum <= 81.44MB) Showing top 10 nodes out of 140 flat flat% sum% cum cum% 3203.41MB 19.67% 19.67% 3203.41MB 19.67% regexp/syntax.(*compiler).inst (inline) 1913.64MB 11.75% 31.42% 1913.64MB 11.75% regexp.onePassCopy 1023.75MB 6.29% 37.70% 1023.75MB 6.29% encoding/json.(*Decoder).refill 882.94MB 5.42% 43.12% 9653.96MB 59.27% github.com/anchore/grype/grype/match.ApplyExplicitIgnoreRules

And we can jump to the problem with:

(pprof) list github.com/anchore/grype/grype/match.ApplyExplicitIgnoreRules Total: 15.91GB ROUTINE ======================== github.com/anchore/grype/grype/match.ApplyExplicitIgnoreRules in /Users/adammcclenaghan/go/pkg/mod/github.com/anchore/grype@v0.87.0/grype/match/explicit_ignores.go 882.94MB 9.43GB (flat, cum) 59.27% of Total 2MB 2MB 72:func ApplyExplicitIgnoreRules(provider ExclusionProvider, matches Matches) (Matches, []IgnoredMatch) { . . 73: var ignoreRules []IgnoreRule 880.94MB 880.94MB 74: ignoreRules = append(ignoreRules, explicitIgnoreRules...) . . 75: . 30.60MB 76: for _, m := range matches.Sorted() { . 101.06MB 77: r, err := provider.GetRules(m.Vulnerability.ID) . . 78: . . 79: if err != nil { . . 80: log.Warnf("unable to get ignore rules for vuln id=%s", m.Vulnerability.ID) . . 81: continue . . 82: } . . 83: . . 84: ignoreRules = append(ignoreRules, r...) . . 85: } . . 86: . 8.44GB 87: return ApplyIgnoreRules(matches, ignoreRules) . . 88:}

The majority of allocations are happening here:

. 8.44GB 87: return ApplyIgnoreRules(matches, ignoreRules)

To step through the code that gets us here...

For each package here and matcher we apply the defined ignore rules . When there's a lot of packages, we'll call this a lot.

All of the defined ignore rules then get applied here. Keep in mind no patterns have been compiled yet.

Calls here

If you follow through, we get to here. (Recap on context, this line gets called (numPackage * numMatches * rules) times and importantly it calls filter.IgnoreMatch(match)... that many times...

If we follow into filter.IgnoreMatch it calls here getIgnoreConditionsForRule(r)

And getIgnoreConditionsForRule is where the allocations are happening. This is because every time it gets called it's compiling a pattern for the rule. So we compile regex patterns (Num Packages * Matches * Rules) times.

We can confirm through the heap profile:

(pprof) list getIgnoreConditionsForRule Total: 15.91GB ROUTINE ======================== github.com/anchore/grype/grype/match.getIgnoreConditionsForRule in /Users/adammcclenaghan/go/pkg/mod/github.com/anchore/grype@v0.87.0/grype/match/ignore.go 106MB 8.35GB (flat, cum) 52.48% of Total . . 111:func getIgnoreConditionsForRule(rule IgnoreRule) []ignoreCondition { . . 112: var ignoreConditions []ignoreCondition . . 113: . . 114: if v := rule.Vulnerability; v != "" { 14MB 54.50MB 115: ignoreConditions = append(ignoreConditions, ifVulnerabilityApplies(v)) . . 116: } . . 117: . . 118: if ns := rule.Namespace; ns != "" { . . 119: ignoreConditions = append(ignoreConditions, ifNamespaceApplies(ns)) . . 120: } . . 121: . . 122: if n := rule.Package.Name; n != "" { 32MB 8.19GB 123: ignoreConditions = append(ignoreConditions, ifPackageNameApplies(n)) . . 124: } . . 125: . . 126: if v := rule.Package.Version; v != "" { . . 127: ignoreConditions = append(ignoreConditions, ifPackageVersionApplies(v)) . . 128: } . . 129: . . 130: if l := rule.Package.Language; l != "" { . . 131: ignoreConditions = append(ignoreConditions, ifPackageLanguageApplies(l)) . . 132: } . . 133: . . 134: if t := rule.Package.Type; t != "" { 60MB 102.50MB 135: ignoreConditions = append(ignoreConditions, ifPackageTypeApplies(t)) . . 136: } . . 137: . . 138: if l := rule.Package.Location; l != "" { . . 139: ignoreConditions = append(ignoreConditions, ifPackageLocationApplies(l)) . . 140: }

Here you see the largest offender is the package name rules being re-compiled:

. . 122: if n := rule.Package.Name; n != "" { 32MB 8.19GB 123: ignoreConditions = append(ignoreConditions, ifPackageNameApplies(n)) . . 124: }

And just to see it all out through to the end:

(pprof) list ifPackageNameApplies Total: 15.91GB ROUTINE ======================== github.com/anchore/grype/grype/match.ifPackageNameApplies in /Users/adammcclenaghan/go/pkg/mod/github.com/anchore/grype@v0.87.0/grype/match/ignore.go 22.50MB 8.16GB (flat, cum) 51.32% of Total . . 182:func ifPackageNameApplies(name string) ignoreCondition { . 8.14GB 183: pattern, err := packageNameRegex(name) . . 184: if err != nil { . . 185: return func(Match) bool { return false } . . 186: } . . 187: 22.50MB 22.50MB 188: return func(match Match) bool { . . 189: return pattern.MatchString(match.Package.Name) . . 190: } . . 191:} . . 192: . . 193:func ifPackageVersionApplies(version string) ignoreCondition {

And the compile calls from packageNameRegex:

(pprof) list packageNameRegex Total: 15.91GB ROUTINE ======================== github.com/anchore/grype/grype/match.packageNameRegex in /Users/adammcclenaghan/go/pkg/mod/github.com/anchore/grype@v0.87.0/grype/match/ignore.go 33.50MB 8.14GB (flat, cum) 51.18% of Total . . 174:func packageNameRegex(packageName string) (*regexp.Regexp, error) { . . 175: pattern := packageName . . 176: if packageName[0] != '$' || packageName[len(packageName)-1] != '^' { 33.50MB 33.50MB 177: pattern = "^" + packageName + "$" . . 178: } . 8.11GB 179: return regexp.Compile(pattern) . . 180:} . . 181: . . 182:func ifPackageNameApplies(name string) ignoreCondition { . . 183: pattern, err := packageNameRegex(name) . . 184: if err != nil {

Wrt time the problem is that the amount of short lived allocations causes a lot of work for the GC as expected:

should we just check to see if strings.Contains(name, ".*") here to avoid making patterns and return a different function which just does a string == comparison

I may be misunderstanding, are you suggesting that we use a strings.Contains(name, ".*") check to determine if we should perform pattern matching, otherwise perform the == comparison?

The problem is if we do this to try to support regular expressions more broadly, it does not go far enough. There are a whole host of other symbols we'd have to check for to determine if the rule is a valid regular expression, in fact the best way to check would be to compile the regex in the first place (which we are trying to avoid)

I think that while it looks a little ugly to have the hard coded definitions here, it may make the intent clearer to readers for now.. Its also why I suggested that we perhaps use a shared const to make it even clearer that the linux kernel headers are the exception to the rule (for now)

As mentioned, a more proper solution is to have rules surface how they should be treated and how they should be matched against. However I felt doing that is beyond the scope of this fix for now.

adammcclenaghan added 3 commits February 13, 2025 10:51

fix: remove commented code block

d7655dd

Signed-off-by: Adam McClenaghan <adam@mcclenaghan.co.uk>

fix: Simplify and fix tests

d3115b9

Signed-off-by: Adam McClenaghan <adam@mcclenaghan.co.uk>

adammcclenaghan mentioned this pull request Feb 13, 2025

fix: create ignore rule patterns optionally adammcclenaghan/grype#1

Closed

kzantow reviewed Feb 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Optionally support regular expressions when applying ignore rules for package names/package upstream names #2445

fix: Optionally support regular expressions when applying ignore rules for package names/package upstream names #2445

adammcclenaghan commented Feb 13, 2025

kzantow Feb 13, 2025

adammcclenaghan Feb 13, 2025

fix: Optionally support regular expressions when applying ignore rules for package names/package upstream names #2445

Are you sure you want to change the base?

fix: Optionally support regular expressions when applying ignore rules for package names/package upstream names #2445

Conversation

adammcclenaghan commented Feb 13, 2025

Description

Before

After

kzantow Feb 13, 2025

Choose a reason for hiding this comment

adammcclenaghan Feb 13, 2025

Choose a reason for hiding this comment