Proposal: Standard Base Image Annotations #821
Comments
In practice, this would be most useful if we took it all the way to the extreme of storing a full "tree" of references (given that a single "base" isn't always accurate), wouldn't it? For example:
FROM golang:alpine AS build
# ... build the thing statically ...
FROM scratch
COPY --from=build ...
Now imagine a critical musl vulnerability that warrants a rebuild of
In a more complex example, you might have something like this:
FROM foo
COPY --from=bar ...
COPY --from=baz ...
# then, separately
FROM that
COPY --from=buzz ...
I've discussed this a bit before with @imjasonh and we decided to simplify this a bit by just starting with the single base image use case. I think it's definitely possible to adapt the proposal to suit multiple base images, but the semantics get a bit tricky. The final image has multiple bases, some of which are ephemeral intermediary images that would never get pushed anywhere (with multi-stage builds), so we couldn't reference them directly, but we could capture their base images, which would allow you to detect when a build needs re-triggering.
For some of these cases, you might be able to just annotate a layer -- "this layer derives (eventually) from this base image". But for more complex cases, there's not an obvious way to represent this... one annotation per layer wouldn't work if a single layer somehow derived from multiple base images, e.g. imagine flattening an image. We could try to force this by having delimiters and multiple values within an annotation (honestly, not horrible), or we could have a fit-for-purpose separate tree data structure that gets attached to the image to indicate its pedigree, not unlike SBOM or signature use cases.
For single-base images, you get a nice tree structure by default. E.g. you could imagine that I have several images based on ubuntu, and ubuntu might be based on debian (it's not, but imagine it). A rebuild of debian would trigger a rebuild of ubuntu which would trigger a rebuild of my images.
Thinking about it more, a flattened list of base image dependencies sounds like it might actually work. The only change we'd need to make is adding a delimiter for both annotations so that they can support multiple values. I would expect
A list of direct base dependencies would allow me to answer all the questions I'd care about, and kind of ducks the definition of what a "base" image really is, if there's only one. We can say that they're just images that this image depends on at build time.
I doubt that I'd ever want to attempt to automatically rebase a multi-based image, as that feels really unsafe, but you'd still get the benefit of knowing where some stuff came from and whether or not an image is up to date. @imjasonh WDYT? Anything I'm missing?
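A minimal sketch of the delimiter idea from this comment, under assumptions the thread has not settled: a comma is used as the delimiter, and the two annotation values are paired by position. The `BaseRef` type and the function names are hypothetical, and the digests are shortened placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// BaseRef pairs one build-time dependency's reference with the digest it
// resolved to when the image was built. Hypothetical type.
type BaseRef struct {
	Name   string
	Digest string
}

// ParseBases splits two delimiter-separated annotation values into pairs.
// Assumption: "," is the delimiter and entries are matched by position.
func ParseBases(names, digests string) ([]BaseRef, error) {
	n := splitNonEmpty(names)
	d := splitNonEmpty(digests)
	if len(n) != len(d) {
		return nil, fmt.Errorf("got %d names but %d digests", len(n), len(d))
	}
	bases := make([]BaseRef, len(n))
	for i := range n {
		bases[i] = BaseRef{Name: n[i], Digest: d[i]}
	}
	return bases, nil
}

// splitNonEmpty splits on commas, trims whitespace, and drops empty entries.
func splitNonEmpty(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// StaleBases returns the names whose current digest differs from the
// recorded one. current maps a reference to the digest it resolves to now.
func StaleBases(bases []BaseRef, current map[string]string) []string {
	var stale []string
	for _, b := range bases {
		if cur, ok := current[b.Name]; ok && cur != b.Digest {
			stale = append(stale, b.Name)
		}
	}
	return stale
}

func main() {
	bases, err := ParseBases("golang:alpine, bar:v1", "sha256:aaa, sha256:bbb")
	if err != nil {
		panic(err)
	}
	current := map[string]string{"golang:alpine": "sha256:ccc", "bar:v1": "sha256:bbb"}
	fmt.Println(StaleBases(bases, current)) // [golang:alpine]
}
```

A flat list like this can say that a rebuild is warranted, but not which layers came from which dependency, which matches the point above that automatic rebasing of a multi-based image is out of scope.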
Some of the annotations are flexible enough to be used in more than just descriptors. This proposal is very specific to descriptors, and may need particular docs on when it does not make sense, i.e. if the blob has been "squashed" such that the base image is no longer fetched for a rebuild. But even as I'm typing that, in the case of a "squashed" image, I would like to know the digest that the image was originally built upon. I think I'm supportive of the use-case, just trying to get a heads up on all the ways this will get abused. :-D
I think what we settled on in the PR is a single top-level annotation in a manifest, pointing to the base image ref/digest. The language in the PR seems generic enough to abuse however we'd like :P
then the references seem too spindly to me. 🤔
The annotations spec already contains this text suggesting that the annotation keys are intended for manifests and image indexes:
We obviously can't stop people from annotating anything however they want, but IMO this text helps forestall bug reports where someone applied the annotation to their toaster or whatever and it caught fire.
@vbatts can you expand on the spindly-ness?
Just the clients that pull through an image-index, and go straight to the architecture of their host. After thinking about this, there is nothing that will blow up due to this. Fishing with dynamite 🧨
I'd like to propose adding two new standard annotations to the spec, to describe information about an image's base image.
(This was previously briefly inquired about in #783)
Motivation
In general, it can be useful to users to be able to answer the question, "what is this image based on", or even for registry hosts to answer "which images are based on [this image]".
Use cases include being able to:
In the case of (3), this is known not to be universally safe, but in certain constrained use cases, for example Buildpacks, it can be made safe enough that rebasing is possible. Being able to produce many updated images at once using only registry API operations can greatly decrease the time it takes to roll out a fix for a vulnerability.
Proposal
To accomplish this, I'm proposing two new annotations:
If both annotations are provided, the value of base.ref.name is assumed to refer to the same object (image manifest or manifest list) as base.digest refers to at the time the image was built.
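As a rough sketch of what this invariant enables, the following checks a manifest's annotations against a set of known-bad digests and against the digest the base reference resolves to now. The short key names are placeholders (the full annotation keys are not reproduced here), and `Resolver` is a hypothetical stand-in for a registry lookup such as a HEAD request on the manifest.

```go
package main

import "fmt"

// Placeholder keys: the text refers to the annotations only by these
// short names, so the full key strings are an assumption left open here.
const (
	keyBaseDigest = "base.digest"
	keyBaseName   = "base.ref.name"
)

// Resolver returns the digest a reference currently points to.
// Hypothetical: a real one would query the registry.
type Resolver func(ref string) (digest string, err error)

// Status summarizes what the annotations say about an image's base.
type Status struct {
	Vulnerable bool // recorded base digest is in the known-bad set
	Stale      bool // base ref now resolves to a different digest
}

// CheckBase inspects manifest annotations. It needs only one lookup of the
// base ref, because both annotations are assumed to describe the same object.
func CheckBase(annotations map[string]string, knownBad map[string]bool, resolve Resolver) (Status, error) {
	digest, ok := annotations[keyBaseDigest]
	if !ok {
		return Status{}, fmt.Errorf("missing %s annotation", keyBaseDigest)
	}
	st := Status{Vulnerable: knownBad[digest]}
	name, ok := annotations[keyBaseName]
	if !ok {
		return st, nil // no ref to resolve, so staleness is unknown
	}
	current, err := resolve(name)
	if err != nil {
		return st, fmt.Errorf("resolving %s: %w", name, err)
	}
	st.Stale = current != digest
	return st, nil
}

func main() {
	ann := map[string]string{
		keyBaseDigest: "sha256:abcdef12",
		keyBaseName:   "base-image:v6",
	}
	// Fake resolver standing in for a registry lookup.
	resolve := func(ref string) (string, error) { return "sha256:deadb33f", nil }
	st, err := CheckBase(ann, map[string]bool{"sha256:abcdef12": true}, resolve)
	if err != nil {
		panic(err)
	}
	fmt.Printf("vulnerable=%v stale=%v\n", st.Vulnerable, st.Stale) // vulnerable=true stale=true
}
```

Both answers are only as trustworthy as the annotation itself, which is why the examples qualify them with "assuming that annotation is correct".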
Examples
1. Identifying Vulnerable Base Images
Say a vulnerability is found in an image, base-image:v6, with digest base-image@sha256:abcdef12. A patch is applied and rolled out, tagged as base-image:v6 with digest base-image@sha256:deadb33f.
Given an image annotated as above (assuming that annotation is correct), I can trivially tell whether it was based on the vulnerable image, if its image.base.digest annotation is sha256:abcdef12.
2. Identifying Updated Base Images
Say I have an image, my-app:latest, annotated with:
base.digest: sha256:deadb33f
base.ref.name: base-image:v6
I can at any point query the registry to determine the current digest of base-image:v6, and if it doesn't match the image.base.digest, I can know that my image is not based on the current base-image:v6 image. Knowing this, I can perform a rebuild to pick up the latest base image.
For this to be efficient, base.ref.name and base.digest are assumed to refer to the same object, so clients only need to HEAD the base ref and compare the resulting digest, and not GET a manifest list and HEAD each child manifest.
3. Rebasing Updated Base Images
If I know, due to how my image was built, that my app's layers will be compatible with new layers in my base image, I don't even need to rebuild -- I can rebase.
That is, I can take the layers in my-app:latest, remove the base layers shared in base-image@sha256:deadb33f, and substitute them with the layers in base-image:v6, then push that image back to the registry for further validation and delivery.
This requires being able to identify a "base image seam", that is, which layers in an image belong to the base image, and which belong above the base image.
Rebasing is not guaranteed to produce valid images in all cases. But if the top-most app layers don't assume specific details of the lower base layers, this can be made to be safe.
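The layer substitution described above can be sketched as a pure list operation. This is an illustration only, not how crane or Buildpacks implement it: `Rebase` is a hypothetical function, layers are represented as digest strings ordered bottom-most first, and the digests are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// Rebase replaces the old base layers at the bottom of an image with the
// layers of a new base. It fails if the image does not start with oldBase,
// because then there is no seam to cut at.
func Rebase(image, oldBase, newBase []string) ([]string, error) {
	if len(oldBase) > len(image) {
		return nil, errors.New("image has fewer layers than the old base")
	}
	for i, l := range oldBase {
		if image[i] != l {
			return nil, fmt.Errorf("layer %d: image has %s, old base has %s", i, image[i], l)
		}
	}
	appLayers := image[len(oldBase):] // everything above the seam
	out := make([]string, 0, len(newBase)+len(appLayers))
	out = append(out, newBase...)
	out = append(out, appLayers...)
	return out, nil
}

func main() {
	image := []string{"sha256:os-v1", "sha256:pkgs-v1", "sha256:app"}
	oldBase := []string{"sha256:os-v1", "sha256:pkgs-v1"}
	newBase := []string{"sha256:os-v2", "sha256:pkgs-v2"}
	out, err := Rebase(image, oldBase, newBase)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [sha256:os-v2 sha256:pkgs-v2 sha256:app]
}
```

The sketch covers only the layer list. A real rebase also has to rewrite the image config so that its layer diff IDs and history match the new layers, and it says nothing about whether the app layers still work on top of the new base.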
More thorough examples and documentation in crane and Buildpacks.
Alternatives Considered
Annotating Multiple Base Images
An image might have multiple layers of base images -- e.g., a tagged whole app image, based on a tagged shared base image containing some OS packages, based on a bare OS image.
One could therefore annotate the image to describe any number of base images. However, this adds complexity, both to the naming and semantics of the annotations (org.containers.image.base[N].digest ?) and in being able to constructively solve any of the motivating use cases listed above.
Instead, I propose only describing one base image seam per image, and if that image itself describes a base image seam of its own, and so on, so be it. In a situation where the base OS image fixes a vulnerability, the intermediate OS-packages base image can address it, producing a new tagged image, which will in turn signal to downstream app images that they are in need of an update.
Layer Index Annotation
Instead of annotating the image with the digest of the base image, one could annotate an integer index into the list of layers that represents the base layer seam. This is relatively straightforward, and still supports rebase scenarios, but it loses valuable provenance information. In this approach you wouldn't be able to answer "is this image based on a vulnerable base image", or "is this image based on an out-of-date base image".
topLayer Annotation
Instead of annotating the image manifest with the digest of the base image, Buildpacks annotates the image with the digest of the "top layer", that is, the top layer of the base image. Layers above this layer are preserved when rebasing.
The disadvantage of this approach is in handling images containing duplicated layers. Perfectly valid images might include the same layer contents (meaning the same layer digest) multiple times, perhaps non-consecutively, which can make identifying the base layer seam using the "top layer digest" approach ambiguous and harder to specify.
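To make the ambiguity concrete, here is a small sketch that looks up a "top layer" digest in a layer list and returns every position where it occurs. `SeamCandidates` is a hypothetical helper, and the digests are placeholders rather than real values.

```go
package main

import "fmt"

// SeamCandidates returns every index at which topLayer occurs in layers
// (ordered bottom-most first). More than one result means the digest alone
// does not identify a unique base image seam.
func SeamCandidates(layers []string, topLayer string) []int {
	var idx []int
	for i, l := range layers {
		if l == topLayer {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	// The same layer contents appear twice, non-consecutively.
	layers := []string{"sha256:os", "sha256:empty", "sha256:pkgs", "sha256:empty", "sha256:app"}
	fmt.Println(SeamCandidates(layers, "sha256:empty")) // [1 3]
}
```

A specification built on the top-layer digest would have to pick a rule for this case (first match, last match, or an error).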
It's worth noting that a topLayer annotation, and an integer layer index annotation, both have the benefit of being able to describe the base image seam without needing to consult a registry, though neither can communicate base image provenance. Buildpacks expresses base image provenance information through another annotation mechanism.
Layer Descriptor Annotation
Instead of annotating the manifest with base image information, one could annotate the layer descriptors, to describe the base image seam. This would make it simpler to describe multiple base image seams, but as that's a non-goal (see above), this is not necessary, and having to validate that only one layer is annotated as a base image seam adds complexity.
cc @jonjohnsonjr @ekcasey @sclevine