ADR 44: describing support of SPDX SBOM format in Konflux #213

Merged
merged 16 commits into from
Dec 2, 2024
146 changes: 146 additions & 0 deletions ADR/0039-spdx-support.md
# 39. SPDX SBOM support

Date: 2024-09-24

## Status

Proposed

## Context

The SPDX SBOM format enables features that are not available in CycloneDX, such as multiple purl attributes per component. SPDX is also a widely adopted standard for software bills of materials.
This ADR describes how to enable use of the SPDX SBOM format in Konflux.

## Decision


### SBOM lifecycle in build pipeline

Initially, SBOMs are generated by Cachi2 and Syft. In a later phase of the build pipeline, SBOMs from base images are extracted and merged with those generated for the container currently being built. After the switch to SPDX, a newly built container image can serve as the base image for another container, so during the transition a build can encounter base images whose SBOMs are in either format. Therefore, any tools or tasks that work with SBOMs must be compatible with both formats.

As a result, all tools and tasks should implement the `sbomType` attribute to specify the expected SBOM format for input and output. This will also allow tools to be tested with SPDX before the entire pipeline transitions to this format.
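Since every tool and task must handle both formats, it can compare the declared `sbomType` with the document it actually receives. The sketch below is a minimal example built on a few assumptions. The values `cyclonedx` and `spdx` and the function names are hypothetical. The only fields used to tell the formats apart are `bomFormat`, present in CycloneDX JSON, and `spdxVersion`, present in SPDX JSON.

```python
import json
from typing import Any

# Hypothetical names: the ADR specifies an `sbomType` attribute but not how a
# tool implements it. "cyclonedx" and "spdx" are assumed values.
SUPPORTED_SBOM_TYPES = ("cyclonedx", "spdx")


def detect_sbom_type(doc: dict[str, Any]) -> str:
    """Guess the format of a parsed JSON SBOM from its top-level keys."""
    if doc.get("bomFormat") == "CycloneDX":
        return "cyclonedx"
    if str(doc.get("spdxVersion", "")).startswith("SPDX-"):
        return "spdx"
    raise ValueError("unrecognized SBOM format")


def load_sbom(text: str, sbom_type: str) -> dict[str, Any]:
    """Parse an SBOM and fail early if it is not in the expected `sbomType`."""
    if sbom_type not in SUPPORTED_SBOM_TYPES:
        raise ValueError(f"unsupported sbomType: {sbom_type!r}")
    doc = json.loads(text)
    actual = detect_sbom_type(doc)
    if actual != sbom_type:
        raise ValueError(f"expected a {sbom_type} SBOM, got {actual}")
    return doc
```

Failing early on a mismatch stops a misconfigured task from quietly passing an SBOM in the wrong format to the next step while both formats are in use.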


### CycloneDX -> SPDX conversion

CycloneDX (1.4) is a structured JSON document with the following structure (not the full specification):

- Document Root
  - metadata
    - tools
      - `List<Tool>`
        - vendor
        - name
  - components
    - `List<Component>`
      - name
      - version
      - purl
      - properties
        - `List<Property>`
          - name
          - value
  - formulation
    - `List<Formula>`
      - components
        - `List<Component>`

SPDX (2.3) is a structured JSON document with the following structure (not the full specification):

- Document Root
  - name
  - SPDXID
  - creationInfo
    - creators
      - `List<String>`
  - packages
    - `List<Package>`
      - SPDXID
      - name
      - versionInfo
      - externalRefs
        - `List<ExternalRef>`
          - referenceCategory
          - referenceType
          - referenceLocator
      - annotations
        - `List<Annotation>`
          - annotationDate
          - annotationType
          - annotator
          - comment
  - relationships
    - `List<Relationship>`
      - spdxElementId
      - relationshipType
      - relatedSpdxElement

#### 1:1 conversions
The following CycloneDX attributes are converted 1:1 to SPDX attributes, because they represent the same thing.

| CycloneDX Attribute | SPDX Attribute      |
|---------------------|---------------------|
| components          | packages            |
| component.name      | package.name        |
| component.version   | package.versionInfo |
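The 1:1 mapping can be sketched in Python as follows. The `SPDXRef-Package-<n>` identifiers are a hypothetical scheme, since the ADR only requires SPDXIDs to be unique. `downloadLocation` is set to `NOASSERTION` because SPDX 2.3 makes that field mandatory and this sketch does not try to derive it.

```python
from typing import Any


def convert_component(component: dict[str, Any], spdx_id: str) -> dict[str, Any]:
    """Map the 1:1 CycloneDX component fields onto an SPDX 2.3 package."""
    package: dict[str, Any] = {
        "SPDXID": spdx_id,                   # must be unique within the document
        "name": component["name"],           # component.name -> package.name
        "downloadLocation": "NOASSERTION",   # mandatory in SPDX 2.3, unknown here
    }
    if component.get("version"):             # component.version -> package.versionInfo
        package["versionInfo"] = component["version"]
    return package


def convert_components(bom: dict[str, Any]) -> list[dict[str, Any]]:
    """components -> packages, using a hypothetical sequential SPDXID scheme."""
    return [
        convert_component(component, f"SPDXRef-Package-{index}")
        for index, component in enumerate(bom.get("components", []))
    ]
```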


#### Component.purl
CycloneDX (version 1.4) supports only a single purl attribute per component. SPDX has no direct purl attribute. Instead, every package includes an externalRefs array that describes all external references for the package. The standard defines reference categories and types. A purl uses the category PACKAGE-MANAGER and the type purl, and the purl itself is stored as the referenceLocator.

| CycloneDX Attribute       | SPDX Attribute |
|---------------------------|----------------|
| `component.purl = <PURL>` | `package.externalRefs = [{"referenceCategory": "PACKAGE-MANAGER", "referenceType": "purl", "referenceLocator": <PURL>}]` |
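The purl mapping from the table above can be sketched as follows. The function names are hypothetical. The category, type and locator values are the ones the table specifies. Collecting purls in `externalRefs` is what allows a single SPDX package to carry several purls.

```python
from typing import Any


def purl_to_external_ref(purl: str) -> dict[str, str]:
    """component.purl -> one SPDX externalRefs entry."""
    return {
        "referenceCategory": "PACKAGE-MANAGER",
        "referenceType": "purl",
        "referenceLocator": purl,
    }


def add_purl(package: dict[str, Any], purl: str) -> None:
    """Append a purl to package.externalRefs, skipping exact duplicates."""
    refs = package.setdefault("externalRefs", [])
    ref = purl_to_external_ref(purl)
    if ref not in refs:
        refs.append(ref)


def package_purls(package: dict[str, Any]) -> list[str]:
    """Read all purls back from a package's externalRefs."""
    return [
        ref["referenceLocator"]
        for ref in package.get("externalRefs", [])
        if ref.get("referenceType") == "purl"
    ]
```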


#### Component.properties
CycloneDX component properties are a list of string name/value pairs attached to a component. SPDX packages have nothing directly equivalent to CycloneDX properties. SPDX package annotations are the only place where custom data can be stored, and their only free-form field is the comment, which is a plain string. For this reason, each CycloneDX property in the format {“name”: <string>, “value”: <string>} is encoded into a JSON string and stored as an annotation comment. Other tools can produce annotations as well. To make it clear that an annotation comment is JSON-encoded, the annotator should end with the string “:jsonencoded”.

| CycloneDX Attribute | SPDX Attribute |
|---------------------|----------------|
| `component.properties = [{"name": …, "value": …}]` | `package.annotations = [{…, "annotator": "<tool>:jsonencoded", "comment": "<JSON-encoded property>"}]` |
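The property encoding described above can be sketched as follows. Several details are assumptions. The sketch emits one annotation per property. It uses `annotationType` `OTHER`. Annotators take the form `Tool: <tool>:jsonencoded`, because SPDX 2.3 annotators begin with `Person:`, `Organization:` or `Tool:`. The tool name itself is hypothetical. Decoding only reads annotations whose annotator ends with `:jsonencoded`, so free-text annotations from other tools are left untouched.

```python
import json
from datetime import datetime, timezone
from typing import Any

JSON_SUFFIX = ":jsonencoded"


def property_to_annotation(prop: dict[str, str], tool: str, when: datetime) -> dict[str, str]:
    """Encode one CycloneDX property as an SPDX package annotation."""
    return {
        "annotationDate": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "annotationType": "OTHER",
        # Assumed annotator format: "Tool: <tool>" plus the marker suffix.
        "annotator": f"Tool: {tool}{JSON_SUFFIX}",
        "comment": json.dumps({"name": prop["name"], "value": prop["value"]}),
    }


def annotations_to_properties(annotations: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Decode only annotations marked as JSON-encoded; ignore the rest."""
    return [
        json.loads(annotation["comment"])
        for annotation in annotations
        if annotation.get("annotator", "").endswith(JSON_SUFFIX)
    ]
```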


#### Formulations
CycloneDX formulation describes how the container was manufactured. In SPDX, relationship elements can serve the same purpose. Every element in SPDX has an SPDXID attribute, which is an identifier unique within the whole SBOM document. A relationship element describes the relation between two elements using their SPDXIDs and a relationship type. The BUILD_TOOL_OF relationship type can express that packages were used to build the container.

| CycloneDX Attribute | SPDX Attribute |
|---------------------|----------------|
| `formulation[].components = [{…}]` | `relationships = [{"spdxElementId": <A-PACKAGE-ID>, "relationshipType": "BUILD_TOOL_OF", "relatedSpdxElement": <ROOT-DOCUMENT-ID>}]` |
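A sketch that turns formulation components into SPDX packages and links each one to the root document with BUILD_TOOL_OF, as in the table above. `SPDXRef-DOCUMENT` is the identifier SPDX 2.3 assigns to the document itself. The `SPDXRef-BuildTool-<n>` identifiers and the function name are hypothetical.

```python
from typing import Any

DOCUMENT_ID = "SPDXRef-DOCUMENT"


def formulation_to_spdx(bom: dict[str, Any]) -> tuple[list[dict[str, Any]], list[dict[str, str]]]:
    """formulation[].components -> (SPDX packages, BUILD_TOOL_OF relationships)."""
    packages: list[dict[str, Any]] = []
    relationships: list[dict[str, str]] = []
    index = 0
    for formula in bom.get("formulation", []):
        for component in formula.get("components", []):
            spdx_id = f"SPDXRef-BuildTool-{index}"  # hypothetical ID scheme
            index += 1
            package: dict[str, Any] = {
                "SPDXID": spdx_id,
                "name": component["name"],
                "downloadLocation": "NOASSERTION",
            }
            if component.get("version"):
                package["versionInfo"] = component["version"]
            packages.append(package)
            # "<package> BUILD_TOOL_OF <root document>"
            relationships.append({
                "spdxElementId": spdx_id,
                "relationshipType": "BUILD_TOOL_OF",
                "relatedSpdxElement": DOCUMENT_ID,
            })
    return packages, relationships
```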


#### Metadata.tools
The CycloneDX metadata.tools sub-attributes we are mostly interested in are vendor and name. Information about the creation of an SPDX document can be stored in creationInfo. The creationInfo.creators element is a list of strings. The standard only loosely specifies how these strings should be structured. Strings should be formatted as `<Attribute>: <Value>`. For example, the vendor should be stored as `Vendor: <vendor>`.

| CycloneDX Attribute | SPDX Attribute |
|---------------------|----------------|
| `metadata.tools = [{"vendor": "X", "name": "Y"}]` | `creationInfo.creators = ["Vendor: X", "Tool: Y"]` |
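The tools-to-creators mapping can be sketched as follows. The `Vendor:` prefix follows this ADR's convention. SPDX 2.3 itself names only the `Person:`, `Organization:` and `Tool:` prefixes. CycloneDX 1.5 also allows an object form of `metadata.tools`. This sketch handles only the array-of-tools form shown in the table, and the function name is hypothetical.

```python
from typing import Any


def tools_to_creators(bom: dict[str, Any]) -> list[str]:
    """metadata.tools -> creationInfo.creators, as "<Attribute>: <Value>" strings."""
    tools = bom.get("metadata", {}).get("tools", [])
    if not isinstance(tools, list):
        # CycloneDX 1.5 also allows an object form of metadata.tools;
        # this sketch only handles the array-of-tools form.
        raise ValueError("only the array form of metadata.tools is handled")
    creators: list[str] = []
    for tool in tools:
        if tool.get("vendor"):
            creators.append(f"Vendor: {tool['vendor']}")  # ADR convention
        if tool.get("name"):
            creators.append(f"Tool: {tool['name']}")      # SPDX-defined prefix
    return creators
```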


#### Merging SPDX
##### Packages
Packages of two SPDX documents can be merged by concatenating the two package lists. In CycloneDX, a component element can have only a single purl attribute. Packages with the same name and version but different purls therefore have to be stored as multiple component elements. SPDX package elements, on the other hand, can carry multiple purls. Multiple CycloneDX components can therefore be squashed into a single SPDX package element, with their purls concatenated into a single list. The following rules are applied when merging packages:

- Packages with the same name and versionInfo are squashed into a single package element.
- A package with versionInfo set to None (or empty) is squashed into the package with the same name and a non-empty versionInfo.

NOTE: Packages cannot be merged based on the SPDXID attribute, because the SPDX standard does not specify how SPDXID should be computed. Individual tools may compute it differently while still meeting the requirement that it be unique within the whole document.
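A minimal sketch of the two merging rules. It compares only name and versionInfo and never SPDXID. The first package seen for a given key survives, so packages from the main document take precedence. externalRefs (and therefore purls) are concatenated without duplicates. The rules do not say what happens when an empty-version package matches several versions of the same name. This sketch squashes it into the first match found, which is an assumption.

```python
import copy
from typing import Any, Optional


def _version(package: dict[str, Any]) -> str:
    """Treat a missing or None versionInfo as empty."""
    return package.get("versionInfo") or ""


def _find_match(merged: list[dict[str, Any]], package: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Find an already-merged package with the same name and a compatible versionInfo."""
    version = _version(package)
    for candidate in merged:
        if candidate["name"] != package["name"]:
            continue
        other = _version(candidate)
        # Rule 1: identical versions. Rule 2: one side has no version.
        if other == version or not other or not version:
            return candidate
    return None


def merge_packages(main: list[dict[str, Any]], other: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Concatenate two package lists, squashing packages that match by name/version."""
    merged: list[dict[str, Any]] = []
    for package in main + other:
        match = _find_match(merged, package)
        if match is None:
            merged.append(copy.deepcopy(package))  # do not mutate the inputs
            continue
        if not _version(match) and _version(package):
            match["versionInfo"] = package["versionInfo"]
        refs = match.setdefault("externalRefs", [])
        for ref in package.get("externalRefs", []):
            if ref not in refs:
                refs.append(copy.deepcopy(ref))
    return merged
```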
##### Relationships
SPDX relationships represent the graph (tree) structure of relations between the elements in the document. The root element is the SPDX document itself (its SPDXID is `SPDXRef-DOCUMENT`). Individual packages are in a CONTAINS relationship with the root document. Packages that were used to build the container are in a BUILD_TOOL_OF relationship with the root document.
Contributor

> Individual packages are in relationship CONTAINS with the root document.

Why is that important?

Contributor Author

I don't understand your question. It describes the structure of the document, so I guess that's why it's important.

Contributor

It might make this clearer to say "The root element (the document itself) is named `SPDXRef-DOCUMENT`."

The point, I think, is that an SPDX document can say "I describe this container image, which contains these things. The image was built using these other things, which aren't included". I don't know if CycloneDX 1.4 can express this.

Contributor

CycloneDX 1.4 probably can't, but we use 1.5

CycloneDX 1.5:

```jsonc
{
  "metadata": {
    "component": {
      // main component (e.g. the output container image)
    }
  },
  "components": [
    // the dependencies included in the container image
  ],
  "formulation": [
    {
      "components": [
        // build-time-only dependencies of the container image
      ]
    }
  ]
}
```

Conversion to SPDX, in my mind:

```
SPDXRef-DOCUMENT DESCRIBES .metadata.component
.metadata.component CONTAINS .components[]
.formulation[].components[] BUILD_TOOL_OF .metadata.component
```

Contributor

@chmeliik, Oct 31, 2024

In the current state, Konflux SBOMs don't have a useful .metadata.component (it's just the syft-specific nonsense component). Which IMO shouldn't stand in the way of this ADR, we should document the desired state and file a story to make it so

Contributor

I'm concerned about "base image" here: I think we're talking about builder images, not parent images.

As in, "parent image" = the last FROM instruction, "builder image" = any FROM instruction other than the last?

Currently, Konflux reports both of those in the .formulations[], distinguished by a custom property

Contributor

> As in, "parent image" = the last FROM instruction, "builder image" = any FROM instruction other than the last?

Exactly.

Contributor Author

These make sense to me. The other BaseImages relations not so much - the SBOM doesn't describe the base images, those have their own SBOMs

So you basically want:

```
SPDXRef-DOCUMENT -> DESCRIBES: SPDXRef-Image
SPDXRef-DOCUMENT -> CONTAINS: SPDXRef-Image
SPDXRef-Image -> CONTAINS: PackageA
SPDXRef-BuilderImage-Package-1 -> BUILD_TOOL_OF: SPDXRef-Image
```

So for purl, I assume it will be something like:
`pkg:docker/ubi9/ubi@sha256:abcdef?repository_url=registry.redhat.io`

Contributor

That looks correct except for

> SPDXRef-DOCUMENT -> CONTAINS: SPDXRef-Image

Contributor

@chmeliik, Nov 14, 2024

And the purl will be pkg:oci rather than pkg:docker, same as the builder/base images (but the exact format doesn't matter that much for this ADR). And FYI, Aleš is already working on the purl konflux-ci/build-tasks-dockerfiles#181

NOTE: Packages can be wrapped in a “virtual package”. This package may have an empty name and version and no other attributes, or its name may be set to the directory or container name that was used as the source for generating the SBOM document. This element carries little information and can be omitted.

The relationships of the two documents need to be merged into a single graph in a way that preserves the graph structure of the main document (the document into which the other one is merged). Once the packages are merged, relationships of the second document that refer to packages not included in the merged package list can be dropped. Any spdxElementId or relatedSpdxElement that points to the root document ID of the second document should be replaced with the root document ID of the main document. If the second document contains a “virtual package”, IDs in relationships referring to it should be replaced with the ID of the main document's “virtual package”, or with the main document ID directly if the main document has no “virtual package”.
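A sketch of the relationship merge described above. The caller is assumed to know the SPDXIDs of both root documents and of any “virtual packages”, and to pass the set of package IDs that survived the package merge. The function name and signature are hypothetical. The comments mark two behaviors that go beyond the text: self-referencing relationships produced by the remapping are dropped, and exact duplicate relationships are skipped.

```python
from typing import Optional


def merge_relationships(
    main_rels: list[dict[str, str]],
    other_rels: list[dict[str, str]],
    merged_ids: set[str],
    *,
    main_root: str,
    other_root: str,
    main_virtual: Optional[str] = None,
    other_virtual: Optional[str] = None,
) -> list[dict[str, str]]:
    """Fold the second document's relationships into the main document's graph."""
    # Second root -> main root; second virtual package -> main virtual package,
    # or the main root when the main document has none.
    remap = {other_root: main_root}
    if other_virtual:
        remap[other_virtual] = main_virtual or main_root
    valid = set(merged_ids) | {main_root} | ({main_virtual} if main_virtual else set())

    merged = list(main_rels)  # the main document's structure is kept as-is
    seen = {(r["spdxElementId"], r["relationshipType"], r["relatedSpdxElement"]) for r in merged}
    for rel in other_rels:
        src = remap.get(rel["spdxElementId"], rel["spdxElementId"])
        dst = remap.get(rel["relatedSpdxElement"], rel["relatedSpdxElement"])
        if src not in valid or dst not in valid:
            continue  # refers to a package that is not in the merged package list
        if src == dst:
            continue  # assumption: drop self-loops created by the remapping
        key = (src, rel["relationshipType"], dst)
        if key not in seen:  # assumption: skip exact duplicates
            seen.add(key)
            merged.append({"spdxElementId": src, "relationshipType": rel["relationshipType"], "relatedSpdxElement": dst})
    return merged
```

In SPDX 2.3 both root IDs are normally `SPDXRef-DOCUMENT`, so remapping the root is often a no-op. The virtual-package remapping and the filtering against surviving package IDs do most of the work.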


## Consequences
All tooling used in the pipeline needs to support the SPDX SBOM format.


## References
- CycloneDX specification: https://cyclonedx.org/specification/overview/o
- SPDX specification: https://spdx.github.io/spdx-spec/v2.3/
- SPDX JSON schema: https://github.com/spdx/spdx-spec/blob/development/v2.3/schemas/spdx-schema.json#L724