Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scanner: Create and store file listings for each resolved provenance #6970

Merged
merged 6 commits into from
May 12, 2023

Conversation

fviernau
Copy link
Member

@fviernau fviernau commented May 5, 2023

Create and store the file listings (path, sha1) for each resolved provenance.
These can be consumed in future iterations (see #6945) for improving the generated SBOMs.

Resolves: #6943.

I've scanned the HEAD of the main branch of npm mime-types which resolved to 335 provenances. The total size of all file listings in the Postgres database is 5605372 bytes. So, about 16.4kb per file listing in average.

BREAKING CHANGE: One does not have to, but should configure a storage backend in ~/.ort/config/config.yml.
If one doesn't do so, ORT falls back to storing the file listings under ~/.ort which may not be
desired. See the modified reference.yml for how the new configuration feature looks like.

@fviernau fviernau requested a review from a team as a code owner May 5, 2023 10:09
@fviernau fviernau added release notes scanner About the scanner tool labels May 5, 2023
@fviernau fviernau marked this pull request as draft May 5, 2023 10:10
@fviernau fviernau force-pushed the scanner-store-file-listings branch from 01325dd to d98f08b Compare May 5, 2023 10:13
@codecov
Copy link

codecov bot commented May 5, 2023

Codecov Report

Patch coverage: 60.00% and project coverage change: -0.02 ⚠️

Comparison is base (70e1780) 64.34% compared to head (2a080de) 64.33%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6970      +/-   ##
============================================
- Coverage     64.34%   64.33%   -0.02%     
- Complexity     1960     1966       +6     
============================================
  Files           327      329       +2     
  Lines         16518    16568      +50     
  Branches       2361     2367       +6     
============================================
+ Hits          10629    10659      +30     
- Misses         4870     4887      +17     
- Partials       1019     1022       +3     
Flag Coverage Δ
funTest-non-docker 30.37% <42.00%> (+0.12%) ⬆️
test 39.80% <60.00%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...src/main/kotlin/provenance/ProvenanceDownloader.kt 38.88% <ø> (ø)
...n/kotlin/config/FileListingStorageConfiguration.kt 31.81% <31.81%> (ø)
...anner/src/main/kotlin/utils/FileListingResolver.kt 81.48% <81.48%> (ø)
...del/src/main/kotlin/config/ScannerConfiguration.kt 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@fviernau fviernau force-pushed the scanner-store-file-listings branch 2 times, most recently from 4371878 to 90bb219 Compare May 5, 2023 12:14
@fviernau fviernau marked this pull request as ready for review May 5, 2023 16:39
@fviernau fviernau changed the title scanner: Create and store file listing for each scanned provenance scanner: Create and store file listings for each scanned provenance May 6, 2023
@fviernau fviernau force-pushed the scanner-store-file-listings branch from 90bb219 to 8835cea Compare May 8, 2023 13:58
scanner/src/main/kotlin/FileListing.kt Show resolved Hide resolved
scanner/src/main/kotlin/utils/FileListingResolver.kt Outdated Show resolved Hide resolved
scanner/src/main/kotlin/Scanner.kt Outdated Show resolved Hide resolved
scanner/src/main/kotlin/Scanner.kt Outdated Show resolved Hide resolved
scanner/src/main/kotlin/Scanner.kt Outdated Show resolved Hide resolved
scanner/src/main/kotlin/Scanner.kt Outdated Show resolved Hide resolved
@fviernau fviernau force-pushed the scanner-store-file-listings branch 2 times, most recently from 1ba6246 to 21f4cbb Compare May 8, 2023 18:51
@fviernau fviernau requested a review from sschuberth May 8, 2023 18:51
@fviernau fviernau force-pushed the scanner-store-file-listings branch from 21f4cbb to 3943d25 Compare May 9, 2023 08:47
@fviernau
Copy link
Member Author

fviernau commented May 9, 2023

Looking at above discussion I have a hunch it may be more efficient to come to a conclusion in a short call. @sschuberth @mnonnenmacher would you be in for that?

@fviernau fviernau changed the title scanner: Create and store file listings for each scanned provenance scanner: Create and store file listings for each resolved provenance May 10, 2023
@fviernau fviernau force-pushed the scanner-store-file-listings branch from 3943d25 to 809b8cd Compare May 10, 2023 17:10
@fviernau fviernau requested a review from mnonnenmacher May 10, 2023 17:10
@sschuberth sschuberth dismissed their stale review May 10, 2023 17:22

I'm currently on sick leave.

@fviernau
Copy link
Member Author

@mnonnenmacher @sschuberth I've incorporated the changes we've agreed upon in our call yesterday.

@sschuberth
Copy link
Member

What is the rationale for this? The default might change, but this would still be an error.

@mnonnenmacher the removal of the explicit Severity.ERROR happened on my request to align with the code base. If we cannot rely on the value of default arguments, it makes little sense to use default values at all...

@mnonnenmacher
Copy link
Member

What is the rationale for this? The default might change, but this would still be an error.

@mnonnenmacher the removal of the explicit Severity.ERROR happened on my request to align with the code base. If we cannot rely on the value of default arguments, it makes little sense to use default values at all...

Then I would argue for removing the default value, the issue severity is something I would like to see on the call side.

fviernau added 6 commits May 11, 2023 13:48
This is supposed to be used for storing the list of file paths along
with SHA1 sums for each resolved provenance in a
`ProvenanceFileStorage`. The first use case this targets is producing
SBOMs which include file paths and SHA1 sums for the file findings and
later potentially for all files.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
This class is supposed to be used for creating and obtaining file
listings by provenance.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The corresponding storage is supposed to be used to store a list of file
paths and SHA1 sums for each resolved provenance.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The file listing storage configuration has been added in a preceeding
change.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Store the files as a compressed blob using `xz`, as this leads to better
results compared to `gzip` or `zip` while still being reasonably fast.
The compressed sizes of `JSON` and `YAML` files are similar. So, choose
`YAML` for better readability.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
@fviernau
Copy link
Member Author

fviernau commented May 11, 2023

Then I would argue for removing the default value, the issue severity is something I would like to see on the call side.

Are you fine with de-scoping this from this PR then (leave this PR as-is and address the topic separately). I don't like changing this back and forth. In my view this is now a different topic. Agreed?

@fviernau fviernau force-pushed the scanner-store-file-listings branch from 809b8cd to 2a080de Compare May 11, 2023 12:07
@fviernau fviernau requested a review from mnonnenmacher May 11, 2023 12:07
@fviernau fviernau merged commit 2ed8faf into main May 12, 2023
@fviernau fviernau deleted the scanner-store-file-listings branch May 12, 2023 05:38
fviernau added a commit that referenced this pull request May 12, 2023
In [1] a list of path-sha1-tuples has been introduced and
"file listing" has been choosen as name. In the code review for [1]
it's been decided that just "file list" would be the better name. So,
rename all the occurences of file listings introduced in [1]
accordingly.

[1] #6970

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit that referenced this pull request May 12, 2023
In [1] a list of path-sha1-tuples has been introduced and
"file listing" has been choosen as name. In the code review for [1]
it's been decided that just "file list" would be the better name. So,
rename all the occurences of file listings introduced in [1]
accordingly.

[1] #6970

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit that referenced this pull request May 15, 2023
In [1] a list of path-sha1-tuples has been introduced and
"file listing" has been choosen as name. In the code review for [1]
it's been decided that just "file list" would be the better name. So,
rename all the occurences of file listings introduced in [1]
accordingly.

[1] #6970

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit that referenced this pull request May 15, 2023
In [1] a list of path-sha1-tuples has been introduced and
"file listing" has been choosen as name. In the code review for [1]
it's been decided that just "file list" would be the better name. So,
rename all the occurences of file listings introduced in [1]
accordingly.

[1] #6970

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scanner About the scanner tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scanner: Store file lists in a dedicated storage
3 participants