Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SPDX format support for SBOM #608

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

midnightercz
Copy link

Support for SPDX format was added to fetch-depds command and also to merge_syft_sboms.
No changes were made in particular package manager generating components which are then converted to cyclonedx format. SPDX sbom can be obtained by calling Sbom.to_spdx().
New switch sbom-type was added to merge_syfy_sboms, so user can choose which output format should be generated - default is cyclonedx. Once all tooling is ready to consume spdx sboms, cutoff changes in this repository can be started.

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

@brunoapimentel
Copy link
Contributor

Just a quick heads-up: the utils/merge_syft_sbom.py script was removed from this repo and now lives here. So you'll need to propose the related changes there instead.

@midnightercz
Copy link
Author

Just a quick heads-up: the utils/merge_syft_sbom.py script was removed from this repo and now lives here. So you'll need to propose the related changes there instead.

I thought I will eventually integrate the changes into new subcommand implemented here: #593 but it wasn't merged yet when I started working on this one

cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
@eskultety eskultety changed the title [STONEBLD-2714] Added SDPX format support for SBOM Add SDPX format support for SBOM Aug 27, 2024
@MartinBasti
Copy link
Contributor

typo in commit message and PR: s/SDPX/SPDX/

cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
@midnightercz midnightercz changed the title Add SDPX format support for SBOM Add SPDX format support for SBOM Aug 29, 2024
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Show resolved Hide resolved
cachi2/core/models/sbom.py Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
@midnightercz
Copy link
Author

Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?

cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@a-ovchinnikov a-ovchinnikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, a couple of minor changes is needed. Edit: you would also need to make sure UTs pass.

cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
tests/unit/test_cli.py Show resolved Hide resolved
tests/unit/test_cli.py Outdated Show resolved Hide resolved
@a-ovchinnikov
Copy link
Collaborator

Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?

merge-sboms is a thin wrapper around an internal function that is tested elsewhere. It does not do much except some validation of input and then invoking a method for saving merge results (which is also tested elsewhere). There is not much value in doing this sort of test.

@a-ovchinnikov
Copy link
Collaborator

Pushing the change to collect early round of feedback on rework. There is still some work to do, mainly to add more tests and to restructure the history to make it more manageable.

@chmeliik
Copy link
Contributor

chmeliik commented Dec 17, 2024

Tried this out on https://github.com/cachito-testing/gomod-pandemonium

cachi2 fetch-deps --sbom-output-type spdx '[
        {"type": "gomod"},
        {"type": "gomod", "path": "terminaltor"},
        {"type": "gomod", "path": "weird"}
    ]'

Noticed a problem:

Multiple purls for one package

There are 15 packages with 2 purls (and curiously, none with more than 2)

jq < cachi2-output/bom.json '.packages | map(select(.externalRefs | length >= 2))'

Some interesting ones:

crypto/rand vs. math/rand - ⚠️ definitely not the same package!
  {
    "SPDXID": "SPDXRef-Package-crypto/rand-None-8034fbc7583154cf2c496e52e0e50888e58253e533efd316eb8f586b4e151352",
    "name": "crypto/rand",
    "externalRefs": [
      {
        "referenceLocator": "pkg:golang/crypto/rand?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      },
      {
        "referenceLocator": "pkg:golang/math/rand?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      }
    ],
    "annotations": [
      {
        "annotator": "Tool:cachi2:jsonencoded",
        "annotationDate": "2024-12-17T13:06:33Z",
        "annotationType": "OTHER",
        "comment": "{\"name\": \"cachi2:found_by\", \"value\": \"cachi2\"}"
      }
    ],
    "downloadLocation": "NOASSERTION"
  }
crypto/internal/alias vs. vendor/golang.org/x/crypto/internal/alias

Arguably a similar package? Probably best not to group them though

  {
    "SPDXID": "SPDXRef-Package-crypto/internal/alias-None-b72e17ed348a15d861a8d6a517507932ed428364d043e0848384e16f50b6dbe1",
    "name": "crypto/internal/alias",
    "externalRefs": [
      {
        "referenceLocator": "pkg:golang/crypto/internal/alias?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      },
      {
        "referenceLocator": "pkg:golang/vendor/golang.org/x/crypto/internal/alias?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      }
    ],
    "annotations": [
      {
        "annotator": "Tool:cachi2:jsonencoded",
        "annotationDate": "2024-12-17T13:06:33Z",
        "annotationType": "OTHER",
        "comment": "{\"name\": \"cachi2:found_by\", \"value\": \"cachi2\"}"
      }
    ],
    "downloadLocation": "NOASSERTION"
  }
?type=module vs. ?type=package - probably fine to group these (but not needed IMO)
  {
    "SPDXID": "SPDXRef-Package-github.com/Masterminds/semver-v1.4.2-1160d2af59246b7be80ace7615339f4f826b6de1586871b2e4e6eb2def67585b",
    "name": "github.com/Masterminds/semver",
    "versionInfo": "v1.4.2",
    "externalRefs": [
      {
        "referenceLocator": "pkg:golang/github.com/Masterminds/semver@v1.4.2?type=module",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      },
      {
        "referenceLocator": "pkg:golang/github.com/Masterminds/semver@v1.4.2?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      }
    ],
    "annotations": [
      {
        "annotator": "Tool:cachi2:jsonencoded",
        "annotationDate": "2024-12-17T13:06:33Z",
        "annotationType": "OTHER",
        "comment": "{\"name\": \"cachi2:found_by\", \"value\": \"cachi2\"}"
      }
    ],
    "downloadLocation": "NOASSERTION"
  }

@chmeliik
Copy link
Contributor

Then I also tried out SBOM merging on https://github.com/cachito-testing/pip-e2e-test

cachi2 fetch-deps --sbom-output-type spdx pip
cd ..
# the two repos are next to each other
cachi2 merge-sboms --sbom-output-type spdx gomod-pandemonium/cachi2-output/bom.json pip-e2e-test/cachi2-output/bom.json > merged.bom.json

First minor observation - the output SBOM type does not default to the input SBOM type, which feels weird

Validity problem - the SPDXRef-DocumentRoot-File- element disappears from the merged SBOM.

The individual SBOMs do have it, and the merged SBOM still references it in relationships:

    {
      "spdxElementId": "SPDXRef-DOCUMENT",
      "comment": "",
      "relatedSpdxElement": "SPDXRef-DocumentRoot-File-",
      "relationshipType": "DESCRIBES"
    },
    {
      "spdxElementId": "SPDXRef-DocumentRoot-File-",
      "comment": "",
      "relatedSpdxElement": "SPDXRef-Package-vendor/golang.org/x/sys/cpu-None-fff45ff8a23684e4ba82a21914e5418abe0d79156f3d0d8f33925961edde6d19",
      "relationshipType": "CONTAINS"
    }

But there's no such SPDXID in the .packages array

@chmeliik
Copy link
Contributor

For reference, how I found out about the issues: I'm currently working on the script that merges cachi2 SBOMs with syft SBOMs, and the cachi2 SBOM failed this check chmeliik/build-tasks-dockerfiles@d4ee394

@chmeliik
Copy link
Contributor

Another annoying thing: cachi2 merge-sboms --sbom-output-type spdx randomizes the relationships order

# use any non-empty SBOM
cp cachi2-output/bom.json bom1.json
cp cachi2-output/bom.json bom2.json
cachi2 merge-sboms --sbom-output-type spdx bom1.json bom2.json > merged1.json
cachi2 merge-sboms --sbom-output-type spdx bom1.json bom2.json > merged2.json

diff merged1.json merged2.json
# created date is different, which is fine, and relationships order doesn't match, which is annoying

@a-ovchinnikov
Copy link
Collaborator

There are 15 packages with 2 purls (and curiously, none with more than 2)

Turns out the original implementation ignored Optional[namespace] element of purls when deduplicating packages. Taking it into account resolved the first two issues. I believe it is safe to keep type=module and type=package lumped together, at least for now. As for none with more than 2 part I believe it is due to the structure of the repo: if it somehow had more variety in purl types then I would expect to see more purls merged into a single package.

the output SBOM type does not default to the input SBOM type

The current default is still CycloneDX, so it is assumed that if you don't say otherwise you expect to get CyDX as output. This could be changed to SPDX, but I am not sure trying to guess correct output type basing on input type is a good idea: what should cachi2 do when it is asked to merge one CyDX and one SPDX?

@chmeliik
Copy link
Contributor

The current default is still CycloneDX, so it is assumed that if you don't say otherwise you expect to get CyDX as output. This could be changed to SPDX, but I am not sure trying to guess correct output type basing on input type is a good idea: what should cachi2 do when it is asked to merge one CyDX and one SPDX?

Ah, I didn't know cachi2 supports that. The behaviour could be:

  • all inputs are cyclonex => output cyclonedx
  • all inputs are spdx => output spdx
  • inputs are mixed => require explicit output-type param

But defaulting to CycloneDX is fine too

@a-ovchinnikov
Copy link
Collaborator

the SPDXRef-DocumentRoot-File- element disappears from the merged SBOM ...
cachi2 merge-sboms --sbom-output-type spdx randomizes the relationships order ...

Good catch, thank you. Fixed.

The behaviour could be: ...

Yes, it definitely could be, but I am more concerned about user perspective -- having a known default and a way to override it feels more straightforward than having some logic which sometimes works by itself and sometimes requires assistance. I think I'll keep the current behavior for now and will change it if there is more popular demand for that.

midnightercz and others added 4 commits December 24, 2024 21:44
Added SDPX format support for SBOM

Support for SPDX format was added to fetch-depds command and also
to merge_syft_sboms.
No changes were made in particular package manager generating components
which are then converted to cyclonedx format. SPDX sbom can be obtained
by calling Sbom.to_spdx().
New switch sbom-type was added to merge_syfy_sboms, so user can choose
which output format should be generated - default is cyclonedx.
Once all tooling is ready to consume spdx sboms, cutoff changes
in this repository can be started.

SPDXRef-DocumentRoot-File- includes all spdx packages and is set
to be described by SPDXRef-DOCUMENT. This way of spdx generation
is closer to way syft generates spdx

Included fixes from sbom validation tool

- Added required documentNamespace attribute
- Added creationInfo.created attribute
- changed SPDXID-Package-* to SPDXRef-Package
- changed annotationDate to not include ms
- changed annotator to include prefix Tool:

Co-authered-by: Alexey Ovchinnikov <aovchinn@redhat.com>
Signed-off-by: Jindrich Luza <jluza@redhat.com>
Having merge_outputs in global utils can result in import loops
in rather unexpected places. Moving it to a different scope to
prevent those from appearing.

Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>
This commit builds on top of proposed SPDX support and
addresses the following:
 * simplifies merge logic to align it with design in
   konflux-ci/architecture#213;
 * adjusts the model to make it work with Syft generated
   SPDX Sboms;
 * adds tests for some of existing and some new SPDX-CyDX
   handling and merging cases.

This is not the final form and more work still has to be done:
 * several important test cases are pending;
 * CyDX SBOM has to be amended to better fit into the new
   structure;
 * Decisions need to be additionally documented within the code;
 * This commit and its parent must be split into several
   parts to simplify repository history.

Co-authored-by: Jindrich Luza <midnightercz@gmail.com>
Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>
Adding test conditions and simplifying overall
code structure.

Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>
cachi2/core/models/sbom.py Outdated Show resolved Hide resolved
"""

def create_document_root() -> SPDXPackage:
return SPDXPackage(name="", versionInfo="", SPDXID="SPDXRef-DocumentRoot-File-")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this element required by the spec? I tried to find references to it, but couldn't. I was also wondering why does it end with a hyphen?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found the reference for it in the Konflux SPDX design (which triggered this change here).

So the intention was to inform the folder that was processed to generate the SBOM. If so, I think we want to keep it as "SPDXRef-DocumentRoot-Directory-{dirname}", and maybe use the same dirname for the "name" package attribute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to have one "main element" for each "main package". So cachi2 fetch-deps '[{"type": "pip"}, {"type": "npm"}]' would result in something like the example here https://github.com/konflux-ci/build-tasks-dockerfiles/tree/main/sbom-utility-scripts/scripts/add-image-reference-script#list-of-updates

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that would need more changes I think, so a made-up main element is fine in the meantime.

But if you do use dirname for the name attribute, please use ./{dirname} to clearly make it look like a file rather than an actual package

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to have one "main element" for each "main package

I like this idea, it could be a future improvement.

# mypy is upset by patrition_by being broadly typed.
packages = [create_document_root()] + libs_to_packages(libraries) # type: ignore
files = files_to_packages(cydxfiles) # type: ignore
relationships = [create_root_relationship()] + link_to_root(packages) + link_to_root(files)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the spec does not require these relationships to be informed (I also checked the official examples to try to figure this out).

Now, IIUC, we're saying that every dependency is contained by this "virtual" root package SPDXRef-DocumentRoot-File-. I'm wondering if this is adding any value in the end, as it does add a lot of extra lines to the SBOM.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am following a precedent set by Syft here. Since these relationships seem to be added by default I think it is best to keep them (there will be some interaction with Syft, at least for some time).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, Syft does at least two types of relationships:

{
  "spdxElementId": "SPDXRef-DocumentRoot-Directory--home-bpimente-Projects-test-repos-porta",
  "relatedSpdxElement": "SPDXRef-Package-npm-babel-plugin-polyfill-corejs3-a77653646f05b06b",
  "relationshipType": "CONTAINS"
}

And one for the file that described the dependency (e.g. package.json):

{
  "spdxElementId": "SPDXRef-Package-npm-string-decoder-14144cdbecdae249",
  "relatedSpdxElement": "SPDXRef-File-yarn.lock-dcee26e98fae7b36",
  "relationshipType": "OTHER",
  "comment": "evident-by: indicates the package's existence is evident by the given file"
}

I'm not sure if we'd also want to add the latter at some point, but it does feel more useful than the first one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, feel free to close this thread, it's a non-blocker.

cachi2/core/models/sbom.py Show resolved Hide resolved
A stricter rule is applied to packages equivalency determination.
Now only those packages whos purls sets are identical will be
considered identical, they will be reported separately
otherwise.

This change also resulted in a redefinition of one of test cases

Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>
@brunoapimentel
Copy link
Contributor

/ok-to-test

Copy link
Member

@eskultety eskultety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't dare to comment on the overall design of this at this point in time without diving into the depths of the SPDX spec, even more so since 3 people have done that already. I can only comment on the overall structure of the PR, more specifically, what caught my eye as I was going through it and what I think would have helped me understand the topic without actually diving into the spec. To be fair, @a-ovchinnikov you've done a great job of describing majority of the functions/methods so that any non-intuitive logic is covered by an extensive commentary. That said, I think if you try to squash changes into the very first commit and introduce other changes more gradually as I tried to hint, second pass over this PR is going to be a much better experience for me such that I'll understand at least a fraction of what's actually going on.

) -> None:
"""Process a single SPDX relationship.

Add relatationship to merged relationships list while replacing spdxElementId and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/relatationship/relationship.

@@ -196,22 +195,3 @@ def get_cache_dir() -> Path:
except KeyError:
cache_dir = Path.home().joinpath(".cache")
return cache_dir.joinpath("cachi2")


def merge_outputs(outputs: Iterable[RequestOutput]) -> RequestOutput:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a cleanup of a pre-existing behaviour not strictly related to the PR, so it could be the first commit, couldn't it?

@@ -101,124 +101,3 @@ def merge(self, other: "Self") -> "Self":
pip_package_binary=self.pip_package_binary or other.pip_package_binary,
bundler_package_binary=self.bundler_package_binary or other.bundler_package_binary,
)


def merge_relationships(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels really weird for a reviewer to have to read "initial implementation", trying to get a grasp of it only to realize a few commits later in the same patch series that much of it is simply dropped.
My guess is that this commit being proposed standalone is supposed to be a way of giving credit to the original author of "initial implementation" commit? Be it as it may and while a noble gesture, strictly from review POV these line dropping changes should be squashed to the original commit to reduce the confusion and declutter the overall diff.
The common practice is to just use a bunch of Signed-off-bys with every contributor's name/email. It's not like you need to take the authorship away from commit 1. I believe readability of the changes in such a complex topic as adding SPDX support should take precedence over any noble gestures in the name of collaboration :).

Comment on lines +325 to +326
def __hash__(self) -> int:
return hash(self.algorithm + self.checksumValue)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to make instances hashable and therefore usable in maps/sets? I suppose so, but this is insufficient because the instances are still mutable, a single update to any of the values invalidates all places that such an instance would have been used as a hashable object. I know it may be unlikely, but I feel like there are enough models here where it might be quite hard to track down a potential violator. Can we make use of the frozen config option for these models?
https://docs.pydantic.dev/latest/api/config/#pydantic.config.ConfigDict.frozen

@staticmethod
def deduplicate_spdx_packages(items: Iterable[SPDXPackage]) -> list[SPDXPackage]:
"""Deduplicate SPDX packages and merge external references.

If package with same name and version is found multiple times in the list,
merge external references of all the packages into one package.
"""
# NOTE: keeping this implementation mostly intact for historical reasons.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate, this implementation was done within this very PR so indirectly referring to it as "inferior" and only keeping it intact for historical reasons doesn't seem very convincing in general.

@@ -135,7 +139,7 @@ def to_spdx(self, doc_namespace: str) -> "SPDXSbom":
for prop in component.properties:
annotations.append(
SPDXPackageAnnotation(
annotator="Tool:cachi2:jsonencoded",
annotator="Tool: cachi2:jsonencoded",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is any of Tool and jsonencoded somewhat common/used in the SPDX ecosystem?

@@ -197,6 +198,12 @@ def get_cache_dir() -> Path:
return cache_dir.joinpath("cachi2")


def first(predicate: Callable, iterable: Iterable, fallback: Any) -> Any:
def first_for(predicate: Callable, iterable: Iterable, fallback: Any) -> Any:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like this change should happen in the commit which is introducing first as utility function rather than renaming it later in the series.

Comment on lines +206 to +209
def partition_by(predicate: Callable, iterable: Iterable) -> tuple[Iterable, Iterable]:
"""Partition iterable in two by predicate."""
i1, i2 = tee(iterable)
return filterfalse(predicate, i1), filter(predicate, i2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly to first, feels like partition_by could be a standalone patch OR at least paired with first in a standalone patch without other logic around.

Comment on lines 451 to 471
cyclonedx_sboms_to_merge = []
for _sbom in sboms_to_merge:
if not isinstance(_sbom, Sbom):
cyclonedx_sboms_to_merge.append(_sbom.to_cyclonedx())
else:
cyclonedx_sboms_to_merge.append(_sbom)
# TODO: merging SBOMs should be done uniformly via "+"
sbom: Union[Sbom, SPDXSbom] = Sbom(
components=merge_component_properties(
chain.from_iterable(s.components for s in cyclonedx_sboms_to_merge)
)
)
start_sbom = sboms_to_merge[0].to_cyclonedx()
else:
spdx_sboms_to_merge = []
for _sbom in sboms_to_merge:
if not isinstance(_sbom, SPDXSbom):
spdx_sboms_to_merge.append(_sbom.to_spdx(doc_namespace="NOASSERTION"))
else:
spdx_sboms_to_merge.append(_sbom)

sbom = sum(spdx_sboms_to_merge, start=spdx_sboms_to_merge[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial implementation squash material?

Comment on lines +10 to +20
PropertyName = Literal[
"cachi2:bundler:package:binary",
"cachi2:found_by",
"cachi2:missing_hash:in_file",
"cachi2:pip:package:binary",
"cdx:npm:package:bundled",
"cdx:npm:package:development",
]

def merge_component_properties(components: Iterable[Component]) -> list[Component]:
"""Sort and de-duplicate components while merging their `properties`."""
components = sorted(components, key=Component.key)
grouped_components = groupby(components, key=Component.key)

def merge_component_group(component_group: Iterable[Component]) -> Component:
component_group = list(component_group)
prop_sets = (PropertySet.from_properties(c.properties) for c in component_group)
merged_prop_set = functools.reduce(PropertySet.merge, prop_sets)
component = component_group[0]
return component.model_copy(update={"properties": merged_prop_set.to_properties()})
class Property(pydantic.BaseModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hunk isn't strictly SPDX related, is it? IOW Could we introduce it at the beginning of the series so that reviewers can see it and forget about it as they go through the PR?

@joejstuart
Copy link

Is it known how a property like this will be replicated for SPDX?

"components": [
    {
      "externalReferences": [
        {
          "type": "distribution",
          "url": "https://github.com/cachito-testing/cachi2-generic/archive/refs/tags/v2.0.0.zip"
        }
      ],
      "name": "archive.zip",
      "properties": [
        {
          "name": "cachi2:found_by",
          "value": "cachi2"
        }
      ],
      "purl": "pkg:generic/archive.zip?checksum=sha256:386428a82f37345fa24b74068e0e79f4c1f2ff38d4f5c106ea14de4a2926e584&download_url=https://github.com/cachito-testing/cachi2-generic/archive/refs/tags/v2.0.0.zip",
      "type": "file"
    },
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants