Consolidate configuration of scan storages #8721
Comments
For reference, this also relates to the pending draft at #5516 which tries to address the storage configuration complexity by centralizing and reusing it.
That distinction makes sense to me. And maybe the interfaces should be named accordingly: For binary data that's already the case, but maybe […]
In the context of this issue and #6603 I would also like to further simplify the scan storage implementation: Currently, an arbitrary number of storages can be configured, and each can act as reader, writer, or both. I would like to change that so that only a single implementation can be configured, which always acts as both reader and writer. For the separate reader/writer configuration I am aware of only two use cases:
The following implementations would be affected by removing support for package-based storages:
@oss-review-toolkit/core-devs I would be very interested in your feedback.
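To make the "single implementation that is always reader and writer" idea above more concrete, here is a minimal Kotlin sketch. All names below (the data classes, ScanResultReader, ScanResultWriter, and the in-memory backend) are illustrative assumptions for this sketch, not ORT's actual API.

```kotlin
// Hypothetical sketch only; none of these names reflect the actual ORT classes.

// Simplified stand-ins for ORT's provenance and scan result model.
data class Provenance(val sourceArtifactUrl: String)
data class ScanResult(val provenance: Provenance, val findings: List<String>)

// Conceptually today: reading and writing can be wired to different backends.
interface ScanResultReader {
    fun read(provenance: Provenance): List<ScanResult>
}

interface ScanResultWriter {
    fun write(result: ScanResult)
}

// Proposed: a single storage abstraction that is always both reader and writer,
// so only one backend needs to be configured.
interface ScanResultStorage : ScanResultReader, ScanResultWriter

// Example backend implementing the consolidated interface.
class InMemoryScanResultStorage : ScanResultStorage {
    private val results = mutableMapOf<Provenance, MutableList<ScanResult>>()

    override fun read(provenance: Provenance): List<ScanResult> = results[provenance].orEmpty()

    override fun write(result: ScanResult) {
        results.getOrPut(result.provenance) { mutableListOf() } += result
    }
}
```

With a single combined interface like this, the reader/writer distinction would disappear from the configuration entirely.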
I'm fine with removing support for package-based storages. Just as a reminder: I believe we do still have plans to add file-based storages / query provenance-based storages on a file level. This should be loosely kept in mind for relevant design choices.
Sounds reasonable to me. I foresee more such "read-only" scan result storages coming up in the context of https://github.com/aboutcode-org, and such results would probably be queried by purl (so, also not by provenance; that is, unless you encode the […]).
Are you, @heliocastro?
Just thinking out loud here: I still wish for an IPFS scan storage. Maybe such a storage would be slow, so maybe you'd want to first use a faster storage and only use IPFS as the fallback. Such a thing would not be possible with the new implementation. Also I'm wondering: Does the current complexity mainly come from allowing readers and writers to be different, or from allowing multiple readers and writers? Would it make sense to only remove the more complex of those two features and keep the other one?
Yes, that's another argument for simplifying the current implementation.
So the faster storage would act as a cache for the IPFS storage? Maybe it's better to solve this in the IPFS implementation itself instead of on a higher level.
In the scanner code it's mainly the support for both package- and provenance-based storages. In the configuration it's mainly the support for multiple kinds of storages at the same time, separate readers and writers, and the fact that the configuration is scattered. What I would like is to separate ORT's internal storage logic from external services, because they do not use the same concepts, and then find other ways to integrate the external services.
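On the earlier point about a fast storage acting as a cache in front of a slow IPFS backend: one option is to put the caching inside (or around) the IPFS implementation itself, so the scanner still only sees a single configured storage. A minimal sketch, using an invented ScanStorage stand-in interface rather than the real ORT one:

```kotlin
// Hypothetical sketch; this ScanStorage is a stand-in, not the real ORT interface.
interface ScanStorage {
    fun read(key: String): String?
    fun write(key: String, value: String)
}

// A slow backend (e.g. IPFS) combined with a fast local cache inside one storage
// implementation, so only a single storage is configured.
class CachedSlowStorage(
    private val slow: ScanStorage,
    private val cache: ScanStorage
) : ScanStorage {
    override fun read(key: String): String? =
        cache.read(key) ?: slow.read(key)?.also { cache.write(key, it) }

    override fun write(key: String, value: String) {
        cache.write(key, value)
        slow.write(key, value)
    }
}
```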
@mnonnenmacher @sschuberth We are already overhauling the entire SW360 backend, and from the next release (20.x) it will be very different, so the current storage model will only work for versions up to 18.x. I honestly think that this one should go for good.
@heliocastro So if I understand you correctly, you are not using the Sw360Storage […]
Sorry, I do not understand what you mean, could you elaborate?
Yep, not using Sw360Storage at all, correct. For the second point, the "upload-result-to-sw360" utility today creates projects and components based on the information coming from the ORT results. Basically, we are eliminating the Sw360Storage but adding functionality to upload-result to add a reference to whatever storage is defined in the ORT config.
That sounds good, it's in line with my goal to separate the integration of external services from ORT's internal storage implementation.
I think currently there is no easy way to get this. Could you give an example of what this reference should look like, for example, if results are stored in a PostgreSQL database?
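Purely as a strawman for the question above: if results live in a PostgreSQL-backed storage, such a reference might carry the connection details plus a key identifying the scanned artifact. Every field name below is invented for illustration; nothing like this exists in ORT today.

```kotlin
// Strawman only: a possible shape for a "pointer" from SW360 back to the ORT scan
// storage that produced the results. All field names are invented for illustration.
data class ScanStorageReference(
    val storageType: String,   // e.g. "PostgreSQL"
    val connectionUrl: String, // e.g. a JDBC URL of the database holding the results
    val table: String,         // table holding the provenance-based scan results
    val provenanceKey: String  // key identifying the scanned source artifact
)

fun main() {
    val reference = ScanStorageReference(
        storageType = "PostgreSQL",
        connectionUrl = "jdbc:postgresql://db.example.com:5432/ort",
        table = "scan_results",
        provenanceKey = "https://example.com/downloads/package-1.0.0.tar.gz#sha1=abc123"
    )

    println(reference)
}
```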
The scanner supports storing multiple types of data in storages for reuse in subsequent runs or by other tools:

- Scan results
- Provenance resolution results
- File archives
- File lists
Currently the storage backends can be configured separately for each of those four data types. While this is very flexible, in practice it provides little value. For example, if scan results are stored in a Postgres database, there is little reason to store provenance results in a different place. Or if file archives are stored in S3, there is little reason to store file lists in a different place.
This flexibility makes the configuration complex: The default settings store all data in a local directory, which is usually not desired in a production setup, so storing the data remotely requires four storage backend configurations. This often confuses users and can also cause performance issues for users who do not know how the scanner works internally, for example by forgetting to configure a provenance storage, which leads to unnecessary repetition of the provenance resolution.
To simplify this, the proposal is to consolidate the configuration to just two types of data:

- Scan data: scan results and provenance resolution results
- Binary data: file archives and file lists
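Sketched as configuration data classes, the consolidation could look roughly like this. The class and property names are placeholders chosen for illustration, not ORT's actual configuration keys.

```kotlin
// Illustrative only; these classes and property names are not ORT's real configuration.

// Today: four independently configured backends, each falling back to a local default
// when not set.
data class CurrentStorageConfig(
    val scanResultStorage: String?,
    val provenanceStorage: String?,
    val fileArchiveStorage: String?,
    val fileListStorage: String?
)

// Proposed: two backends, one for scan data (scan results plus provenance resolution
// results) and one for binary data (file archives plus file lists).
data class ConsolidatedStorageConfig(
    val scanStorage: String,
    val binaryStorage: String
)

fun main() {
    // A remote setup now needs two entries instead of four.
    val config = ConsolidatedStorageConfig(scanStorage = "postgres", binaryStorage = "s3")
    println(config)
}
```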
The implementation proposal is:
- Rename ScanStorage and all related classes to ScanResultStorage: ScanStorage was chosen because ScanResultStorage was already taken, but this is not the case anymore.
- ScanStorage […] BinaryStorage […]
- PostgresScanStorage uses ProvenanceBasedPostgresStorage, PostgresPackageProvenanceStorage and PostgresNestedProvenanceStorage.
- MariaDbScanStorage should not only provide a way to store scan results, but also to store provenance resolution results.
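One way to read the last items is as a facade per backend: a single storage class that bundles the scan result storage and the provenance resolution storages, so users configure one storage instead of several. A loose sketch follows; all interfaces and method names in it are invented and do not correspond to the real Postgres storage classes mentioned above.

```kotlin
// Loose sketch only; these interfaces and methods are invented and do not correspond
// to the real Postgres storage classes mentioned in the proposal.
interface ScanResultStore { fun readResults(provenanceKey: String): List<String> }
interface PackageProvenanceStore { fun resolvePackage(purl: String): String? }
interface NestedProvenanceStore { fun resolveNested(provenanceKey: String): List<String> }

// One storage per backend that bundles scan results and provenance resolution results,
// so a single "postgres" storage is configured instead of several separate ones.
class ConsolidatedPostgresScanStorage(
    private val results: ScanResultStore,
    private val packageProvenances: PackageProvenanceStore,
    private val nestedProvenances: NestedProvenanceStore
) {
    fun readScanResults(purl: String): List<String> {
        // Resolve the package to its root provenance, expand to nested provenances,
        // then collect the scan results stored for each of them.
        val root = packageProvenances.resolvePackage(purl) ?: return emptyList()
        val provenances = listOf(root) + nestedProvenances.resolveNested(root)
        return provenances.flatMap { results.readResults(it) }
    }
}
```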