fix: Optionally support regular expressions when applying ignore rules for package names/package upstream names #2445

Open: wants to merge 3 commits into base: main
33 changes: 22 additions & 11 deletions grype/match/ignore.go
@@ -2,10 +2,9 @@ package match

 import (
 	"regexp"
+	"strings"

 	"github.com/bmatcuk/doublestar/v2"
-
-	"github.com/anchore/grype/internal/log"
 )

 // IgnoreFilter implementations are used to filter matches, returning all applicable IgnoreRule(s) that applied,
Expand Down Expand Up @@ -206,13 +205,19 @@ func packageNameRegex(packageName string) (*regexp.Regexp, error) {
}

func ifPackageNameApplies(name string) ignoreCondition {
pattern, err := packageNameRegex(name)
if err != nil {
return func(Match) bool { return false }
if name == "linux(-.*)?-headers-.*" {
Contributor:

It was my hope that we would move these ignore rules to be data driven at some point, but by hardcoding them in the functions here, this seems to move in the opposite direction. I'm having a hard time understanding where the time/allocation is: should we just check to see if strings.Contains(name, ".*") here to avoid making patterns and return a different function which just does a string == comparison?

Author:

> It was my hope that we would move these ignore rules to be data driven at some point, but by hardcoding them in the functions here, this seems to move in the opposite direction

Yes, this seems like a good way to do things. However, the fact remains that some of the ignore rules (as far as I can tell, just one currently) need to be treated as regular expressions, while the vast majority only need a string equality comparison.

With a data-driven approach we could move towards a more robust solution, similar to the one I allude to in the PR description, where the data itself tells callers whether to treat the rule as a regular expression or not.
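As a rough illustration of that idea, here is a minimal sketch in which each rule carries its own match mode. The names `matchMode`, `nameRule` and `nameMatcher` are hypothetical and are not part of grype; the only assumption is that a rule could declare whether its name is a literal or a pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchMode is a hypothetical field that says how a rule's name is compared.
type matchMode int

const (
	matchExact matchMode = iota // plain string equality
	matchRegex                  // anchored regular expression
)

// nameRule is a hypothetical, data-driven form of a package-name ignore rule.
type nameRule struct {
	Name string
	Mode matchMode
}

// nameMatcher builds the comparison function once per rule. Only rules that
// declare themselves as regular expressions pay for a compilation.
func nameMatcher(r nameRule) (func(string) bool, error) {
	if r.Mode == matchExact {
		return func(s string) bool { return s == r.Name }, nil
	}
	re, err := regexp.Compile("^(?:" + r.Name + ")$")
	if err != nil {
		return nil, err
	}
	return re.MatchString, nil
}

func main() {
	rules := []nameRule{
		{Name: "kernel-headers", Mode: matchExact},
		{Name: "linux(-.*)?-headers-.*", Mode: matchRegex},
	}
	for _, r := range rules {
		match, err := nameMatcher(r)
		if err != nil {
			fmt.Println("bad rule:", err)
			continue
		}
		fmt.Println(r.Name, match("linux-headers-5.2.1"))
	}
}
```

With this shape the caller never has to guess from the rule text whether a compilation is needed.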

I'm having a hard time understanding where the time/allocation is

I could have explained that better, and I just realised that the screenshots in my PR description maybe don't make it totally clear.

This pprof diagram might be a bit more helpful:

[Image: Screenshot 2025-02-13 at 15 55 47]

You can see that ApplyExplicitIgnoreRules performs a lot of allocations.

The top10 output shows the same thing:

(pprof) top10
Showing nodes accounting for 10164.59MB, 62.40% of 16288.23MB total
Dropped 376 nodes (cum <= 81.44MB)
Showing top 10 nodes out of 140
      flat  flat%   sum%        cum   cum%
 3203.41MB 19.67% 19.67%  3203.41MB 19.67%  regexp/syntax.(*compiler).inst (inline)
 1913.64MB 11.75% 31.42%  1913.64MB 11.75%  regexp.onePassCopy
 1023.75MB  6.29% 37.70%  1023.75MB  6.29%  encoding/json.(*Decoder).refill
  882.94MB  5.42% 43.12%  9653.96MB 59.27%  github.com/anchore/grype/grype/match.ApplyExplicitIgnoreRules

And we can jump to the problem with:

(pprof) list github.com/anchore/grype/grype/match.ApplyExplicitIgnoreRules
Total: 15.91GB
ROUTINE ======================== github.com/anchore/grype/grype/match.ApplyExplicitIgnoreRules in /Users/adammcclenaghan/go/pkg/mod/github.com/anchore/grype@v0.87.0/grype/match/explicit_ignores.go
  882.94MB     9.43GB (flat, cum) 59.27% of Total
       2MB        2MB     72:func ApplyExplicitIgnoreRules(provider ExclusionProvider, matches Matches) (Matches, []IgnoredMatch) {
         .          .     73:	var ignoreRules []IgnoreRule
  880.94MB   880.94MB     74:	ignoreRules = append(ignoreRules, explicitIgnoreRules...)
         .          .     75:
         .    30.60MB     76:	for _, m := range matches.Sorted() {
         .   101.06MB     77:		r, err := provider.GetRules(m.Vulnerability.ID)
         .          .     78:
         .          .     79:		if err != nil {
         .          .     80:			log.Warnf("unable to get ignore rules for vuln id=%s", m.Vulnerability.ID)
         .          .     81:			continue
         .          .     82:		}
         .          .     83:
         .          .     84:		ignoreRules = append(ignoreRules, r...)
         .          .     85:	}
         .          .     86:
         .     8.44GB     87:	return ApplyIgnoreRules(matches, ignoreRules)
         .          .     88:}

The majority of allocations are happening here:

         .     8.44GB     87:	return ApplyIgnoreRules(matches, ignoreRules)

To step through the code that gets us here...

  1. For each package and matcher, we apply the defined ignore rules here. When there are a lot of packages, this gets called a lot.
  2. All of the defined ignore rules then get applied here. Keep in mind that no patterns have been compiled yet.
  3. That calls here.
  4. If you follow through, we get to here. To recap the context: this line gets called (numPackages * numMatches * numRules) times and, importantly, it calls filter.IgnoreMatch(match) that many times.
  5. If we follow into filter.IgnoreMatch, it calls getIgnoreConditionsForRule(r) here.
  6. getIgnoreConditionsForRule is where the allocations happen, because every time it is called it compiles a pattern for the rule. So we compile regex patterns (numPackages * numMatches * numRules) times.
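The shape of the problem in the steps above can be reproduced in a few lines. This is a simplified sketch and not grype's code: `countPerCall` and `countPrecompiled` are hypothetical helpers that use plain strings in place of `Match` and `IgnoreRule`, and they count compilations so the difference is visible.

```go
package main

import (
	"fmt"
	"regexp"
)

// countPerCall mirrors the costly shape: the pattern is compiled again for
// every (package name, rule) pair. It returns the number of ignored names
// and the number of compilations performed.
func countPerCall(names, rules []string) (ignored, compiles int) {
	for _, name := range names {
		for _, rule := range rules {
			compiles++
			// MustCompile panics on a bad pattern; acceptable in a sketch.
			re := regexp.MustCompile("^" + rule + "$")
			if re.MatchString(name) {
				ignored++
				break
			}
		}
	}
	return ignored, compiles
}

// countPrecompiled compiles each rule once, outside the hot loop.
func countPrecompiled(names, rules []string) (ignored, compiles int) {
	res := make([]*regexp.Regexp, 0, len(rules))
	for _, rule := range rules {
		compiles++
		res = append(res, regexp.MustCompile("^"+rule+"$"))
	}
	for _, name := range names {
		for _, re := range res {
			if re.MatchString(name) {
				ignored++
				break
			}
		}
	}
	return ignored, compiles
}

func main() {
	names := []string{"linux-headers-5.2.1", "bash", "kernel-headers"}
	rules := []string{"linux(-.*)?-headers-.*", "kernel-headers"}
	fmt.Println(countPerCall(names, rules))
	fmt.Println(countPrecompiled(names, rules))
}
```

Both functions ignore the same names; only the number of compilations differs, and in the first form it grows with the number of packages.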

We can confirm through the heap profile:

(pprof) list getIgnoreConditionsForRule
Total: 15.91GB
ROUTINE ======================== github.com/anchore/grype/grype/match.getIgnoreConditionsForRule in /Users/adammcclenaghan/go/pkg/mod/github.com/anchore/grype@v0.87.0/grype/match/ignore.go
     106MB     8.35GB (flat, cum) 52.48% of Total
         .          .    111:func getIgnoreConditionsForRule(rule IgnoreRule) []ignoreCondition {
         .          .    112:	var ignoreConditions []ignoreCondition
         .          .    113:
         .          .    114:	if v := rule.Vulnerability; v != "" {
      14MB    54.50MB    115:		ignoreConditions = append(ignoreConditions, ifVulnerabilityApplies(v))
         .          .    116:	}
         .          .    117:
         .          .    118:	if ns := rule.Namespace; ns != "" {
         .          .    119:		ignoreConditions = append(ignoreConditions, ifNamespaceApplies(ns))
         .          .    120:	}
         .          .    121:
         .          .    122:	if n := rule.Package.Name; n != "" {
      32MB     8.19GB    123:		ignoreConditions = append(ignoreConditions, ifPackageNameApplies(n))
         .          .    124:	}
         .          .    125:
         .          .    126:	if v := rule.Package.Version; v != "" {
         .          .    127:		ignoreConditions = append(ignoreConditions, ifPackageVersionApplies(v))
         .          .    128:	}
         .          .    129:
         .          .    130:	if l := rule.Package.Language; l != "" {
         .          .    131:		ignoreConditions = append(ignoreConditions, ifPackageLanguageApplies(l))
         .          .    132:	}
         .          .    133:
         .          .    134:	if t := rule.Package.Type; t != "" {
      60MB   102.50MB    135:		ignoreConditions = append(ignoreConditions, ifPackageTypeApplies(t))
         .          .    136:	}
         .          .    137:
         .          .    138:	if l := rule.Package.Location; l != "" {
         .          .    139:		ignoreConditions = append(ignoreConditions, ifPackageLocationApplies(l))
         .          .    140:	}

Here you can see that the largest offender is the package name rule being recompiled on every call:

         .          .    122:	if n := rule.Package.Name; n != "" {
      32MB     8.19GB    123:		ignoreConditions = append(ignoreConditions, ifPackageNameApplies(n))
         .          .    124:	}

And to follow it all the way through to the end:

(pprof) list ifPackageNameApplies
Total: 15.91GB
ROUTINE ======================== github.com/anchore/grype/grype/match.ifPackageNameApplies in /Users/adammcclenaghan/go/pkg/mod/github.com/anchore/grype@v0.87.0/grype/match/ignore.go
   22.50MB     8.16GB (flat, cum) 51.32% of Total
         .          .    182:func ifPackageNameApplies(name string) ignoreCondition {
         .     8.14GB    183:	pattern, err := packageNameRegex(name)
         .          .    184:	if err != nil {
         .          .    185:		return func(Match) bool { return false }
         .          .    186:	}
         .          .    187:
   22.50MB    22.50MB    188:	return func(match Match) bool {
         .          .    189:		return pattern.MatchString(match.Package.Name)
         .          .    190:	}
         .          .    191:}
         .          .    192:
         .          .    193:func ifPackageVersionApplies(version string) ignoreCondition {

And the compile calls from packageNameRegex:

(pprof) list packageNameRegex
Total: 15.91GB
ROUTINE ======================== github.com/anchore/grype/grype/match.packageNameRegex in /Users/adammcclenaghan/go/pkg/mod/github.com/anchore/grype@v0.87.0/grype/match/ignore.go
   33.50MB     8.14GB (flat, cum) 51.18% of Total
         .          .    174:func packageNameRegex(packageName string) (*regexp.Regexp, error) {
         .          .    175:	pattern := packageName
         .          .    176:	if packageName[0] != '$' || packageName[len(packageName)-1] != '^' {
   33.50MB    33.50MB    177:		pattern = "^" + packageName + "$"
         .          .    178:	}
         .     8.11GB    179:	return regexp.Compile(pattern)
         .          .    180:}
         .          .    181:
         .          .    182:func ifPackageNameApplies(name string) ignoreCondition {
         .          .    183:	pattern, err := packageNameRegex(name)
         .          .    184:	if err != nil {
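As an aside, the anchoring check in the packageNameRegex listing above compares the first byte against '$' and the last byte against '^', which looks reversed, so in practice the pattern is always wrapped. A minimal sketch of what the check may have been intended to do follows; `anchorPattern` is a hypothetical helper, not a grype function, and it assumes the intent was to add each anchor only when it is missing.

```go
package main

import (
	"fmt"
	"strings"
)

// anchorPattern adds "^" and "$" only where they are missing.
// It is a sketch: it does not treat a trailing escaped "\$" specially,
// and it is safe on an empty string, unlike indexing packageName[0].
func anchorPattern(p string) string {
	if !strings.HasPrefix(p, "^") {
		p = "^" + p
	}
	if !strings.HasSuffix(p, "$") {
		p += "$"
	}
	return p
}

func main() {
	fmt.Println(anchorPattern("kernel-headers.*"))
	fmt.Println(anchorPattern("^kernel-headers$"))
}
```

Doubling the anchors is harmless for matching, so this does not change the allocation picture; it only avoids rebuilding the string when the rule is already anchored.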

With respect to time, the problem is that the sheer number of short-lived allocations creates a lot of work for the GC, as expected:
[Image: Screenshot 2025-02-13 at 16 05 31]
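For anyone who wants to see the allocation difference without a full profile, a small sketch using `testing.AllocsPerRun` from the standard library is enough. The helper names `allocsCompileEachCall` and `allocsEquality` are hypothetical, and the exact allocation count will vary by Go version.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// sink keeps results alive so the compiler cannot drop the work.
var sink bool

// allocsCompileEachCall reports the average number of heap allocations when
// the pattern is compiled on every comparison.
func allocsCompileEachCall(rule, name string) float64 {
	return testing.AllocsPerRun(100, func() {
		re, err := regexp.Compile("^" + rule + "$")
		if err != nil {
			return
		}
		sink = re.MatchString(name)
	})
}

// allocsEquality reports the average number of heap allocations for a plain
// string comparison.
func allocsEquality(rule, name string) float64 {
	return testing.AllocsPerRun(100, func() {
		sink = rule == name
	})
}

func main() {
	rule, name := "linux(-.*)?-headers-.*", "linux-headers-5.2.1"
	fmt.Println("compile on each call:", allocsCompileEachCall(rule, name))
	fmt.Println("string equality:     ", allocsEquality(rule, name))
}
```

Every one of those per-call allocations is short-lived garbage, which is where the GC time in the profile comes from.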


> should we just check to see if strings.Contains(name, ".*") here to avoid making patterns and return a different function which just does a string == comparison

I may be misunderstanding. Are you suggesting that we use a strings.Contains(name, ".*") check to decide whether to perform pattern matching, and otherwise perform the == comparison?

The problem is that if we do this to support regular expressions more broadly, it does not go far enough. There is a whole host of other symbols we would have to check for to determine whether the rule is a regular expression. In fact, the best way to check would be to compile the regex in the first place, which is exactly what we are trying to avoid.
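To make that concrete, here is a small sketch comparing the proposed `.*` check with a broader metacharacter check built on `regexp.QuoteMeta` from the standard library. `hasDotStar` and `hasMeta` are hypothetical helpers, and "libstdc++" is just an illustrative package name chosen to show the false positive.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hasDotStar is the check proposed in the review comment.
func hasDotStar(name string) bool {
	return strings.Contains(name, ".*")
}

// hasMeta reports whether name contains any regexp metacharacter.
// QuoteMeta escapes every metacharacter, so a changed string means at
// least one was present.
func hasMeta(name string) bool {
	return regexp.QuoteMeta(name) != name
}

func main() {
	names := []string{"linux.*", "^kernel-headers$", "libstdc++", "kernel-headers"}
	for _, n := range names {
		fmt.Printf("%q dotStar=%t meta=%t\n", n, hasDotStar(n), hasMeta(n))
	}
}
```

The `.*` check misses anchored rules such as "^kernel-headers$", while the broader check flags literal names that merely contain "+" or ".", so neither heuristic can reliably tell a pattern from a plain name.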

I think that while it looks a little ugly to have the hard-coded definitions here, it may make the intent clearer to readers for now. It's also why I suggested that we perhaps use a shared const, to make it even clearer that the linux kernel headers are the exception to the rule (for now).

As mentioned, a more complete solution is to have rules declare how they should be treated and how they should be matched. However, I felt that doing so is beyond the scope of this fix for now.
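The shared-const idea could look roughly like the sketch below. The constant and function names are hypothetical, the conditions take plain strings instead of `Match` to keep the example self-contained, and compiling the one special-case pattern once at package level is an extra assumption that goes beyond what this PR does.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical shared constants naming the only rules treated as patterns.
const (
	kernelHeadersNamePattern     = "linux(-.*)?-headers-.*"
	kernelHeadersUpstreamPattern = "linux.*"
)

// Compiled once at start-up instead of on every call (an assumption of this sketch).
var kernelHeadersNameRegex = regexp.MustCompile("^" + kernelHeadersNamePattern + "$")

// nameApplies returns a condition for a package-name rule.
func nameApplies(ruleName string) func(pkgName string) bool {
	if ruleName == kernelHeadersNamePattern {
		return kernelHeadersNameRegex.MatchString
	}
	return func(pkgName string) bool { return pkgName == ruleName }
}

// upstreamApplies returns a condition for an upstream-name rule.
func upstreamApplies(ruleName string) func(upstream string) bool {
	if ruleName == kernelHeadersUpstreamPattern {
		return func(upstream string) bool { return strings.HasPrefix(upstream, "linux") }
	}
	return func(upstream string) bool { return upstream == ruleName }
}

func main() {
	fmt.Println(nameApplies(kernelHeadersNamePattern)("linux-aws-headers-5.2.1"))
	fmt.Println(upstreamApplies(kernelHeadersUpstreamPattern)("notalinux"))
}
```

Any other rule name, including one that looks like a pattern, falls through to literal equality, which is the behaviour change the removed "ignore on name regex" tests reflect.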

+		pattern, err := packageNameRegex(name)
+		if err != nil {
+			return func(Match) bool { return false }
+		}
+
+		return func(match Match) bool {
+			return pattern.MatchString(match.Package.Name)
+		}
 	}

 	return func(match Match) bool {
-		return pattern.MatchString(match.Package.Name)
+		return name == match.Package.Name
 	}
 }

@@ -241,14 +246,20 @@ func ifPackageLocationApplies(location string) ignoreCondition {
 }

 func ifUpstreamPackageNameApplies(name string) ignoreCondition {
-	pattern, err := packageNameRegex(name)
-	if err != nil {
-		log.WithFields("name", name, "error", err).Debug("unable to parse name expression")
-		return func(Match) bool { return false }
+	if name == "linux.*" {
+		return func(match Match) bool {
+			for _, upstream := range match.Package.Upstreams {
+				if strings.HasPrefix(upstream.Name, "linux") {
+					return true
+				}
+			}
+			return false
+		}
 	}

 	return func(match Match) bool {
 		for _, upstream := range match.Package.Upstreams {
-			if pattern.MatchString(upstream.Name) {
+			if name == upstream.Name {
 				return true
 			}
 		}
64 changes: 5 additions & 59 deletions grype/match/ignore_test.go
@@ -216,7 +216,7 @@ var (
 				Version: "5.2.1",
 				Type: syftPkg.DebPkg,
 				Upstreams: []pkg.UpstreamPackage{
-					{Name: "linux"},
+					{Name: "notalinux"},
 				},
 			},
 			Details: []Detail{
@@ -568,7 +568,7 @@ func TestApplyIgnoreRules(t *testing.T) {
 				},
 				{
 					Package: IgnoreRulePackage{
-						UpstreamName: "linux-.*",
+						UpstreamName: "linux.*",
 					},
 				},
 			},
@@ -591,7 +591,7 @@ func TestApplyIgnoreRules(t *testing.T) {
 					AppliedIgnoreRules: []IgnoreRule{
 						{
 							Package: IgnoreRulePackage{
-								UpstreamName: "linux-.*",
+								UpstreamName: "linux.*",
 							},
 						},
 					},
@@ -638,7 +638,7 @@ func TestApplyIgnoreRules(t *testing.T) {
 				},
 				{
 					Package: IgnoreRulePackage{
-						Name: "linux-.*-headers-.*",
+						Name: "linux(-.*)?-headers-.*",
 						UpstreamName: "linux.*",
 						Type: string(syftPkg.DebPkg),
 					},
@@ -667,7 +667,7 @@ func TestApplyIgnoreRules(t *testing.T) {
 					AppliedIgnoreRules: []IgnoreRule{
 						{
 							Package: IgnoreRulePackage{
-								Name: "linux-.*-headers-.*",
+								Name: "linux(-.*)?-headers-.*",
 								UpstreamName: "linux.*",
 								Type: string(syftPkg.DebPkg),
 							},
@@ -677,33 +677,6 @@ func TestApplyIgnoreRules(t *testing.T) {
 				},
 			},
 		},
-		{
-			name: "ignore on name regex",
-			allMatches: kernelHeadersMatches,
-			ignoreRules: []IgnoreRule{
-				{
-					Package: IgnoreRulePackage{
-						Name: "kernel-headers.*",
-					},
-				},
-			},
-			expectedRemainingMatches: []Match{
-				kernelHeadersMatches[1],
-				kernelHeadersMatches[2],
-			},
-			expectedIgnoredMatches: []IgnoredMatch{
-				{
-					Match: kernelHeadersMatches[0],
-					AppliedIgnoreRules: []IgnoreRule{
-						{
-							Package: IgnoreRulePackage{
-								Name: "kernel-headers.*",
-							},
-						},
-					},
-				},
-			},
-		},
 		{
 			name: "ignore on name regex, no matches",
 			allMatches: kernelHeadersMatches,
@@ -730,33 +703,6 @@ func TestApplyIgnoreRules(t *testing.T) {
 			expectedRemainingMatches: kernelHeadersMatches,
 			expectedIgnoredMatches: nil,
 		},
-		{
-			name: "ignore on name regex, line termination test match",
-			allMatches: kernelHeadersMatches,
-			ignoreRules: []IgnoreRule{
-				{
-					Package: IgnoreRulePackage{
-						Name: "^kernel-headers$",
-					},
-				},
-			},
-			expectedRemainingMatches: []Match{
-				kernelHeadersMatches[1],
-				kernelHeadersMatches[2],
-			},
-			expectedIgnoredMatches: []IgnoredMatch{
-				{
-					Match: kernelHeadersMatches[0],
-					AppliedIgnoreRules: []IgnoreRule{
-						{
-							Package: IgnoreRulePackage{
-								Name: "^kernel-headers$",
-							},
-						},
-					},
-				},
-			},
-		},
 	}

 	for _, testCase := range cases {