Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] many: add "ref" in file customizations to support host resources #1157

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mvo5
Copy link
Contributor

@mvo5 mvo5 commented Jan 21, 2025

We have a feature request that asks to support including local
files in blueprints. This also came up recently in a dicussion
about the blueprints v2 format (that is where the name "ref" comes
from). This is tricky of course for the service and would probably
not be supported there initially (maybe later via a extra upload
option). But for image-builder-cli it does make sense and is
a useful feature that is support by other image-building tools
too.

This is a PoC to show how we can use blueprints and a new "ref"
field to give access to host files [0]. It works internally by
generating an osbuild manifest with org.osbuild.files inputs
that get populated via a curl stage [1] that pulls the files
from the host into the osbuild store. Because they are addressed
by hash (just like inline files) the manifest is still reproducible
and if anything changes in the file(s) from the host the manifest
will fail to build. Note that all of this is an implementaion detail
and we could change "osbuild" to do something else, the blueprint
API would not change.

Note also that this means that an attacker could potentially use
this in combination with custom rpm packages and maintainer scripts
to gain information about files outside the buildroot. As our isolation
is not a security feature this shouldn't be an issue (and arguably
running maintainer scripts in the rpm stage as root is already
opening a lot of possibilities to break out of the buildroot)
(obviously that is only an issue when the inputs are untrusted,
e.g. in the service in the future).

Blueprints generated like this are no longer portable as they
require the local files and when those change the manifest breaks.

They will become portable when we implement "offline" bundles,
i.e. a way to generate a manifest and their dependencies in
a single file that can then be rebuild by image-builder/osbuild.
This implementation works nicely with that idea because in the
most simple case a "offline-bundle" is just a manifest+store
directory. Because everything is looked up in the store first
by hash this will "just work". Offline bundles will also
fix the other issue with manifests that if sources like packages
go away (because they are outdated) the manifest also stops
building. Note that we can even have "half-offline" bundles
where we would put only the "file" refs into the .store we
generate but keep the package urls out.

The downside of using the store of course is that it means
duplication of data and may seem wasteful for big files.
However without that it is very hard to keep the gurantee
that a manifest is reproducible because if we access the
file directly we have a race-condition that the hash may
change anytime. Any trick like bind-mounting or hardlinking
will have the same issue. So unless we drop the reproducibility
for local file refs (which if of course an option) I see
no other way than to have a copy in the store.

An example how this would work:
https://github.com/osbuild/image-builder-cli/compare/main...mvo5:blueprints-for-local-files?expand=1#diff-dd961e42853d4345e2cf6a29f188f80386ed6b583056d7078a38c0c099c97c26R57

Note that once we have agreement on this we can also
expand to directories.

[0] PoC because tests are missing and some shortcuts in the
implemenation, the approach feels solid.
[1] IIRC coreos uses this trick as well in their manifests

Jira: RHEL-21334

mvo5 added 2 commits January 21, 2025 12:45
This commit adds a convenience package `hashutil` with a first
`Sha256sum` helper that can be used like the "sha256sum" cmdline
util. It will generate the hex encoded sha256 sum of the input
file.
We have a feature request that asks to support including local
files in blueprints. This also came up recently in a dicussion
about the blueprints v2 format (that is where the name "ref" comes
from). This is tricky of course for the service and would probably
not be supported there initially (maybe later via a extra upload
option). But for `image-builder-cli` it does make sense and is
a useful feature that is support by other image-building tools
too.

This is a PoC to show how we can use blueprints and a new "ref"
field to give access to host files [0]. It works internally by
generating an osbuild manifest with `org.osbuild.files` inputs
that get populated via a `curl` stage [1] that pulls the files
from the host into the osbuild store. Because they are addressed
by hash (just like inline files) the manifest is still reproducible
and if anything changes in the file(s) from the host the manifest
will fail to build. Note that all of this is an implementaion detail
and we could change "osbuild" to do something else, the blueprint
API would not change.

Note also that this means that an attacker could potentially use
this in combination with custom rpm packages and maintainer scripts
to gain information about files outside the buildroot. As our isolation
is not a security feature this shouldn't be an issue (and arguably
running maintainer scripts in the rpm stage as root is already
opening a lot of possibilities to break out of the buildroot).
This is only an issue when the inputs are untrusted, e.g. in the
service in the future.

Blueprints generated like this are no longer portable as they
require the local files and when those change the manifest breaks.

They will become portable when we implement "offline" bundles,
i.e. a way to generate a manifest and their dependencies in
a single file that can then be rebuild by image-builder/osbuild.
This implementation works nicely with that idea because in the
most simple case a "offline-bundle" is just a manifest+store
directory. Because everything is looked up in the store first
by hash this will "just work". Offline bundles will also
fix the other issue with manifests that if sources like packages
go away (because they are outdated) the manifest also stops
building. Note that we can even have "half-offline" bundles
where we would put only the "file" refs into the .store we
generate but keep the package urls out.

The downside of using the store of course is that it means
duplication of data and may seem wasteful for big files.
However without that it is very hard to keep the gurantee
that a manifest is reproducible because if we access the
file directly we have a race-condition that the hash may
change anytime. Any trick like bind-mounting or hardlinking
will have the same issue. So unless we drop the reproducibility
for local file refs (which if of course an option) I see
no other way than to have a copy in the store.

[0] PoC because tests are missing and some shortcuts in the
implemenation, the approach feels solid.
[1] IIRC coreos uses this trick as well in their manifests

Jira: RHEL-21334
@mvo5 mvo5 force-pushed the blueprint-for-local-files branch from 907cf9d to b2700e4 Compare January 21, 2025 12:51
@supakeen
Copy link
Member

supakeen commented Jan 22, 2025

I absolutely love the idea and flexibility this gives; it will cover quite a few usecases that I have personally, especially when we can include entire trees. It will also severely simplify blueprints when you want to, for example, write 15 quadlet files.

However. I do feel that we should only introduce this together with manifest bundles, I prefer being able to rebuild images on several (different) hosts.

--

To side track a tiny bit; we could implement this without the need for bundles if we allow a reference to a git repository (and a subpath within), with an appropriate git source in osbuild. I can split that idea out if we feel that's a worthwhile thing to have.

@mvo5
Copy link
Contributor Author

mvo5 commented Jan 22, 2025

I absolutely love the idea and flexibility this gives; it will cover quite a few usecases that I have personally, especially when we can include entire trees. It will also severely simplify blueprints when you want to, for example, write 15 quadlet files.

However. I do feel that we should only introduce this together with manifest bundles, I prefer being able to rebuild images on several (different) hosts.

Offline/manifest-bundles seem pretty straightforward, we can do them first, I have no strong opinion here, I have a PoC for image-builder-cli in https://github.com/osbuild/image-builder-cli/compare/main...mvo5:offline-bundles?expand=1#diff-dd961e42853d4345e2cf6a29f188f80386ed6b583056d7078a38c0c099c97c26R63 - the "backend" code should land in images but the concept is pretty straightforward (the linked test works fwiw).

@supakeen
Copy link
Member

I absolutely love the idea and flexibility this gives; it will cover quite a few usecases that I have personally, especially when we can include entire trees. It will also severely simplify blueprints when you want to, for example, write 15 quadlet files.
However. I do feel that we should only introduce this together with manifest bundles, I prefer being able to rebuild images on several (different) hosts.

Offline/manifest-bundles seem pretty straightforward, we can do them first, I have no strong opinion here, I have a PoC for image-builder-cli in https://github.com/osbuild/image-builder-cli/compare/main...mvo5:offline-bundles?expand=1#diff-dd961e42853d4345e2cf6a29f188f80386ed6b583056d7078a38c0c099c97c26R63 - the "backend" code should land in images but the concept is pretty straightforward (the linked test works fwiw).

I'll read through that, awesome, thanks.

@ondrejbudai
Copy link
Member

I very much support this, we indeed have quite a lot of users that want to have the files stored outside of blueprints.

I appreciate the simplicity of this solution as it just uses the pre-existing source in osbuild. @mvo5 mentioned to me that it isnt great that we firstly need to copy the file inside the osbuild's store which effectively creates a double-copy. I think that's a valid concern, but I don't consider this as a blocker. We can revisit this later when we know how this feature is used by users. The nice thing is that we can just change the implementation without changing the user interface.

Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this approach is very reasonable.

Even without offline bundles, it would be possible to optionally embed the file in the manifest as inline source ,if needed. But letting it be curl source makes sense, especially once the ref would be an actual remote URL and we would have a resolver that would generate the ref.

I think that extracting the resolver logic (at the moment only for local files, but without making the assumption that they must be local) into a common resolver and passing the resolved spec as an input to manifest generation would make sense.

Lastly, the ref name is a but overloaded, in my opinion, and initially, I though that it is just the SHA256 of the file. While in reality it is a path or a URL (we could standardize on URLs even for local files).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants