-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON output that groups aliases #29
Comments
Maybe we can output multiple IDs per item in the "vulnerabilities" output? These IDs would all be linked together by their aliases (i.e. grouped). i.e. {
"results": [
{
"filePath": "sbom:file/path/test.spdx.json",
"packages": [
{
"name": "mercurial",
"version": "4.8.2",
"ecosystem": "pypi",
"vulnerabilities": [
{
"ids": ["PYSEC-2019-188", "GHSA-foo"],
"aliases": [ "CVE-blah" ]
},
{
"ids": ["PYSEC-2019-189", "GHSA-bar"],
"aliases": [ "CVE-blah2" ]
}
]
}]
}
]
} The distinction between the "ids" and "aliases" in the output would be that anything in "ids" would exist in the OSV.dev DB, while anything in "aliases" doesn't exist. But maybe we also don't need this distinction. CC @G-Rath for thoughts too. |
@oliverchang my thinking is that But it sounds like you're just wanting to deduplicate, which something we're already doing in the detector by checking against the I guess what I'm saying is my thoughts are the point of |
one optimization that does come to mind though would be if the API included the |
We are also already grouping by aliases as well in the human readable output -- just not in the JSON. Re deduplication -- I'm not sure if we want to make the decision on which source to present, so we'd like to give the option to the user and present every option. The {
"osvs": {
"PYSEC-2019-188": { full OSV },
"PYSEC-2019-189": { full OSV },
"GHSA-foo": { full OSV },
"GHSA-bar": { full OSV },
},
"matches": [
{
"filePath": "sbom:file/path/test.spdx.json",
"packages": [
{
"name": "mercurial",
"version": "4.8.2",
"ecosystem": "pypi",
"vulnerabilities": [
{
"ids": ["PYSEC-2019-188", "GHSA-foo"],
"aliases": [ "CVE-blah" ]
},
{
"ids": ["PYSEC-2019-189", "GHSA-bar"],
"aliases": [ "CVE-blah2" ]
}
]
}]
}
]
}
That sounds like something that's quite easily doable. |
Right now vulnerabilities only contains
but as Oliver says, it's definitely a good idea to output the full OSV, since we are already querying for it anyway.
We might run into the problem presented here: google/osv.dev#888 where two different vulnerabilities are grouped together and presented as just one because they both alias a common vulnerability.
If we are presenting the full OSV in the output anyway, this won't be needed anymore since it doesn't save any additional queries. |
Would this be a real problem in practice? The vulns considered for grouping are the ones that match a particular package/version for the same lockfile right? |
Also another alternative approach to the output: {
"results": [
{
"filePath": "sbom:file/path/test.spdx.json",
"packages": [
{
"name": "mercurial",
"version": "4.8.2",
"ecosystem": "pypi",
"vulnerabilities": [
{
full OSV
},
{
full OSV
},
...
],
"groups": [
{"ids": ["PYSEC-2019-188", "GHSA-foo"]},
{"ids": ["PYSEC-2019-189", "GHSA-bar"]},
]
}
]
} This might be a little better because it doesn't overload "aliases", and "vulnerabilities" contain the exact OSV entries which seems more logical. Users wanting an accurate count of vulnerabilities can just do |
I think I disagree because that feels like you're forcing more work onto the user when the acknowledged goal is for unity/consistency (aka the same information should be on all OSVs that are for the same vulnerability, in the long-run) - and likewise with some of the other suggestions being presented, it feels like it's putting more work on the user that could easily be done by the tool. This is probably more about what is the point of the tool: is it just to be a local client to interface with the API? (in which case yes including all OSVs and having the user sort out which they use is the right thing), or is it to be a tool for using the API? (in which case imo it should be more opinionated, to reduce the amount of work users need to do). I think also keeping in mind the API exists right? so users should already have a very easy way to get the OSV data for |
Indeed we want to reduce the work required by the user of this too. However, I don't think OSV should be the one making a judgement on which source is better, particularly when there may be small discrepancies. There's a few steps we can take towards making this workflow easier:
|
ooh I really like this because it can be used in other places i.e. preferring to link to GHSA advisories, because all of the dev's repos are on GH and that way they can easily see what other projects are affected. |
Improve JSON output so that it encodes alias information so that the number of vulnerabilities are not artificially increased by the number of different advisories publishing the same vulnerability.
The text was updated successfully, but these errors were encountered: