scanner: Create and store file listings for each resolved provenance #6970
01325dd to d98f08b
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #6970 +/- ##
============================================
- Coverage 64.34% 64.33% -0.02%
- Complexity 1960 1966 +6
============================================
Files 327 329 +2
Lines 16518 16568 +50
Branches 2361 2367 +6
============================================
+ Hits 10629 10659 +30
- Misses 4870 4887 +17
- Partials 1019 1022 +3
4371878 to 90bb219
90bb219 to 8835cea
model/src/main/kotlin/config/FileListingStorageConfiguration.kt
1ba6246 to 21f4cbb
21f4cbb to 3943d25
Looking at the above discussion, I have a hunch it may be more efficient to come to a conclusion in a short call. @sschuberth @mnonnenmacher would you be in for that?
model/src/main/kotlin/config/FileListingStorageConfiguration.kt
3943d25 to 809b8cd
@mnonnenmacher @sschuberth I've incorporated the changes we agreed upon in our call yesterday.
@mnonnenmacher the removal of the explicit
Then I would argue for removing the default value; the issue severity is something I would like to see on the call side.
This is supposed to be used for storing the list of file paths along with SHA1 sums for each resolved provenance in a `ProvenanceFileStorage`. The first use case this targets is producing SBOMs which include file paths and SHA1 sums for the file findings and later potentially for all files. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
This class is supposed to be used for creating and obtaining file listings by provenance. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The corresponding storage is supposed to be used to store a list of file paths and SHA1 sums for each resolved provenance. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The file listing storage configuration has been added in a preceding change. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Store the files as a compressed blob using `xz`, as this leads to better results compared to `gzip` or `zip` while still being reasonably fast. The compressed sizes of `JSON` and `YAML` files are similar. So, choose `YAML` for better readability. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
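The trade-off described in the last commit message can be explored with a rough, language-neutral sketch. This is not ORT's Kotlin implementation: the sample data is synthetic, the function names are hypothetical, and JSON stands in for YAML because the Python standard library has no YAML writer. It only shows how one might compare `xz` against `gzip` for a serialized file list and round-trip the compressed blob.

```python
import gzip
import hashlib
import json
import lzma


def make_sample_file_list(count: int) -> list[dict[str, str]]:
    """Build a synthetic list of path/sha1 entries (assumption: real data differs)."""
    return [
        {
            "path": f"src/module_{i % 20}/file_{i}.js",
            "sha1": hashlib.sha1(str(i).encode("utf-8")).hexdigest(),
        }
        for i in range(count)
    ]


def to_blob(entries: list[dict[str, str]]) -> bytes:
    """Serialize the entries and compress them into a single xz blob."""
    payload = json.dumps(entries, indent=2).encode("utf-8")
    return lzma.compress(payload, format=lzma.FORMAT_XZ)


def from_blob(blob: bytes) -> list[dict[str, str]]:
    """Reverse of to_blob: decompress and deserialize."""
    return json.loads(lzma.decompress(blob).decode("utf-8"))


def compressed_sizes(payload: bytes) -> dict[str, int]:
    """Return the raw size and the size after gzip and xz compression."""
    return {
        "raw": len(payload),
        "gzip": len(gzip.compress(payload)),
        "xz": len(lzma.compress(payload, format=lzma.FORMAT_XZ)),
    }


if __name__ == "__main__":
    sample = make_sample_file_list(500)
    serialized = json.dumps(sample, indent=2).encode("utf-8")
    for name, size in compressed_sizes(serialized).items():
        print(f"{name}: {size} bytes")
```

The actual ratios depend heavily on the input, so any numbers from this sketch should not be read as a reproduction of the measurements behind the commit.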
Are you fine with de-scoping this from this PR then (leave this PR as-is and address the topic separately)? I don't like changing this back and forth. In my view this is now a different topic. Agreed?
809b8cd to 2a080de
In [1] a list of path-sha1-tuples has been introduced and "file listing" has been chosen as the name. In the code review for [1] it's been decided that just "file list" would be the better name. So, rename all the occurrences of file listings introduced in [1] accordingly. [1] #6970 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Create and store the file listings `(path, sha1)` for each resolved provenance. These can be consumed in future iterations (see #6945) for improving the generated SBOMs.
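To make the `(path, sha1)` shape concrete, here is a minimal sketch of how such a listing could be produced for a directory tree. It is an illustration in Python, not the Kotlin code from this PR; `sha1_of_file` and `create_file_list` are hypothetical names, and the choice to use sorted, root-relative POSIX paths is an assumption.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha1()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_file_list(root: Path) -> list[tuple[str, str]]:
    """Return sorted (relative path, sha1) tuples for all files below root."""
    entries = []
    for directory, _, file_names in os.walk(root):
        for name in file_names:
            file_path = Path(directory) / name
            relative = file_path.relative_to(root).as_posix()
            entries.append((relative, sha1_of_file(file_path)))
    return sorted(entries)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "lib").mkdir()
        (root / "README.md").write_bytes(b"hello")
        (root / "lib" / "index.js").write_bytes(b"")
        for path, sha1 in create_file_list(root):
            print(path, sha1)
```

Sorting the entries keeps the listing deterministic, which makes stored listings easier to compare across scans.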
Resolves: #6943.
I've scanned the HEAD of the main branch of npm mime-types, which resolved to 335 provenances. The total size of all file listings in the Postgres database is 5605372 bytes, so about 16.7 kB (16.3 KiB) per file listing on average.
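The per-listing average follows directly from the two figures quoted above. A quick check, with `average_size` as a hypothetical helper name:

```python
def average_size(total_bytes: int, count: int) -> float:
    """Average size in bytes per file listing."""
    return total_bytes / count


if __name__ == "__main__":
    average = average_size(5_605_372, 335)
    # Prints: 16732 bytes, 16.7 kB, 16.3 KiB
    print(f"{average:.0f} bytes, {average / 1000:.1f} kB, {average / 1024:.1f} KiB")
```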
BREAKING CHANGE: One does not have to, but should, configure a storage backend in `~/.ort/config/config.yml`. If one doesn't do so, ORT falls back to storing the file listings under `~/.ort`, which may not be desired. See the modified `reference.yml` for what the new configuration feature looks like.