How do I get the config via oras.Pull? #142
Comments
Adding some context from the Slack channel.
_, desc, err := resolver.Resolve(ctx, ref)
// .. more stuff
return desc, layers, nil The source to desc := ocispec.Descriptor{
Digest: dgst,
MediaType: contentType,
Size: size,
}
log.G(ctx).WithField("desc.digest", desc.Digest).Debug("resolved")
return ref, desc, nil So it clearly does not include the manifest, but it just a simple created resp.Body.Close() // don't care about body contents. This leads me to wonder if the descriptor from When I stepped through the opts.dispatch, I found that it had pulled up all of the subresources - layers and config (but not annotations) - but that the config got filtered out somewhere along the line? Is this a limitation in |
@jdolitsky had kindly agreed on Slack to take a look. I have some more info. I figured out why it is losing it. The filterHandler filters out not only those components with an allowed media-type (which is a bit of a duplicate of picker, but it also checks if the descriptor has an annotation This is great for I see three ways to make the config available after the
Unfortunately, once inside a handler, you really have no way of knowing, "I am handling the config". You know that config is coming when you process the image manifest, so perhaps you could tag it, but it isn't actually handled until after it is fetched, at which point you don't really know. A solution could be that the handler does two kinds of processing:
The first solution is easiest and would work right now, but won't help with anything other than getting it to the filesystem, not making it available where it probably needs to be, in the program (and won't work with any config that doesn't have that annotation). The second and third solutions both work, provided an additional handler is made to find the content and manage the content. The third probably is easiest, as it doesn't require changing the ingester (file store and memory store). I am happy to open a PR for it. Thoughts? |
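For readers following along, here is a minimal sketch of what "you know that config is coming when you process the image manifest" means in practice. It is not oras code: the `descriptor` and `manifest` structs are hypothetical stand-ins that mirror the `config` and `layers` fields of an OCI image manifest, and the sample manifest (digests, sizes) is made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// descriptor is a hypothetical stand-in for the OCI descriptor fields
// (mediaType, digest, size, annotations).
type descriptor struct {
	MediaType   string            `json:"mediaType"`
	Digest      string            `json:"digest"`
	Size        int64             `json:"size"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// manifest mirrors only the parts of an OCI image manifest used here.
type manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	Config        descriptor   `json:"config"`
	Layers        []descriptor `json:"layers"`
}

// configFromManifest decodes raw manifest bytes and returns the config
// descriptor, which is the point where a handler could "tag" the config.
func configFromManifest(raw []byte) (descriptor, error) {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return descriptor{}, fmt.Errorf("decode manifest: %w", err)
	}
	if m.Config.Digest == "" {
		return descriptor{}, errors.New("manifest has no config descriptor")
	}
	return m.Config, nil
}

// sampleManifest is made-up data; the digests are placeholders, not real hashes.
const sampleManifest = `{
  "schemaVersion": 2,
  "config": {
    "mediaType": "my.custom.config.media.type",
    "digest": "sha256:placeholder-config",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "my.custom.media.type",
      "digest": "sha256:placeholder-layer",
      "size": 13
    }
  ]
}`

func main() {
	cfg, err := configFromManifest([]byte(sampleManifest))
	if err != nil {
		panic(err)
	}
	fmt.Printf("config: %s %s (%d bytes)\n", cfg.MediaType, cfg.Digest, cfg.Size)
}
```

Once the config descriptor is known from the manifest, a later handler can compare incoming digests against it instead of guessing from the media type alone.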
The same issue exists if you want the actual manifest, or the info on the layer. It does not write the manifest itself (and therefore its annotations) to the ingester because of return &fileWriter{
store: s,
file: file,
desc: desc,
digester: digest.Canonical.Digester(),
status: content.Status{
Ref: name,
Total: desc.Size,
StartedAt: now,
UpdatedAt: now,
},
afterCommit: afterCommit,
}, nil But I don't see anywhere to retrieve it? |
This sounds fine to me, and it'd be great if you can make it work. I would like to get some thoughts from Shiwei cc @shizhMSFT |
I am reasonably confident I can. I am going to hold off until I see comments here from Shiwei as well. |
The internal data structure of contents is a directed acyclic graph of descriptors (i.e. references to actual blobs). Usually, the images are stored in a tree structure, which is a special case of DAG.
The purpose of resp.Body.Close() // don't care about body contents. Since
Back to your original issue of getting the config. ORAS does not support the config well, as it does not make much sense since it is not a container. However, if you insist on having the config, it is possible to fetch the config by adding its type as an allowed media type, marking it as a valid node, and following your solution 1. If you are not using
The assumption of your solution 3 is that the config is already fetched. According to the existing code logic, successful fetching implies the config descriptor passes
In addition, it is trivial to make it workable but hard to make it generic for all scenarios. |
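To illustrate the point about contents being a directed acyclic graph of descriptors, here is a small sketch. It does not use containerd or oras types: `descriptor`, `graph`, `walk`, and `sampleGraph` are hypothetical names, and the digests are placeholders. The sample is an index with two manifests that share one layer, which is why it is a DAG and not a tree.

```go
package main

import "fmt"

// descriptor is a hypothetical, trimmed-down reference to a blob.
type descriptor struct {
	MediaType string
	Digest    string
}

// graph maps a parent digest to the descriptors it references.
type graph map[string][]descriptor

// walk visits root and every descriptor reachable from it exactly once,
// depth first. The seen set is what keeps shared nodes from being revisited.
func walk(g graph, root descriptor, visit func(descriptor)) {
	seen := map[string]bool{}
	var rec func(d descriptor)
	rec = func(d descriptor) {
		if seen[d.Digest] {
			return
		}
		seen[d.Digest] = true
		visit(d)
		for _, child := range g[d.Digest] {
			rec(child)
		}
	}
	rec(root)
}

// sampleGraph builds made-up data: one index, two manifests, two configs,
// and a single layer referenced by both manifests.
func sampleGraph() (graph, descriptor) {
	index := descriptor{MediaType: "application/vnd.oci.image.index.v1+json", Digest: "sha256:index"}
	m1 := descriptor{MediaType: "application/vnd.oci.image.manifest.v1+json", Digest: "sha256:m1"}
	m2 := descriptor{MediaType: "application/vnd.oci.image.manifest.v1+json", Digest: "sha256:m2"}
	c1 := descriptor{MediaType: "my.custom.config.media.type", Digest: "sha256:c1"}
	c2 := descriptor{MediaType: "my.custom.config.media.type", Digest: "sha256:c2"}
	shared := descriptor{MediaType: "my.custom.media.type", Digest: "sha256:shared"}

	g := graph{
		index.Digest: {m1, m2},
		m1.Digest:    {c1, shared},
		m2.Digest:    {c2, shared},
	}
	return g, index
}

func main() {
	g, root := sampleGraph()
	walk(g, root, func(d descriptor) {
		fmt.Println(d.Digest, d.MediaType)
	})
}
```

In this picture the config is just another child of the manifest node, which is why filtering decides whether it survives the traversal.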
Hi @shizhMSFT . This part here:
I wish I had that when I started. It doesn't appear anywhere in the containerd docs. I did eventually figure it out (with lots of code-reading, containerd slack channel, and experimentation), but do wish someone had written it up on the containerd pages (or that I had spoken with you before I dug in :-) ).
This part I don't completely agree with. oras supports pushing config to registry, even with a custom type, and I expect that with the vistas that oras opens up, we will see many more uses of config in many different ways. I wouldn't underestimate the value of what oras will make possible.
Understood.
I don't understand this part. Is this a store that has multiple "sub-stores", and so sort of demuxes the content to different places? How does it pick what goes where? Is it explained/documented anywhere that I could read?
I don't think I fully agree here. config is a fundamental and required part of every manifest (and oras recognizes it with its config option). As such, I think there should be a standard way to retrieve it from |
The config is a required part of every manifest but is not always used for non-containers like artifacts. However, as you have mentioned, providing a standard way to retrieve the config would benefit developers a lot. Let's discuss and find out. |
This is a Hybrid Store for reading. A store for writing can be constructed similarly. The |
@deitch @jdolitsky Here are the topics that need to be discussed.
|
@deitch The following example prototype pulls the config. Although it is a bit hacky, it is good to start the discussion with:
package main
import (
"context"
"errors"
"fmt"
"github.com/deislabs/oras/pkg/content"
"github.com/deislabs/oras/pkg/oras"
ocicontent "github.com/containerd/containerd/content"
"github.com/containerd/containerd/remotes/docker"
ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)
func check(e error) {
if e != nil {
panic(e)
}
}
const customConfigMediaType = "my.custom.config.media.type"
func main() {
ref := "localhost:5000/oras:test"
fileName := "hello.txt"
fileContent := []byte("Hello World!\n")
customMediaType := "my.custom.media.type"
ctx := context.Background()
resolver := docker.NewResolver(docker.ResolverOptions{})
// Push file(s) w custom mediatype to registry
memoryStore := content.NewMemoryStore()
desc := memoryStore.Add(fileName, customMediaType, fileContent)
pushContents := []ocispec.Descriptor{desc}
fmt.Printf("Pushing %s to %s...\n", fileName, ref)
desc, err := oras.Push(ctx, resolver, ref, memoryStore, pushContents,
oras.WithConfigMediaType(customConfigMediaType))
check(err)
fmt.Printf("Pushed to %s with digest %s\n", ref, desc.Digest)
// Pull file(s) from registry and save to disk
fmt.Printf("Pulling from %s and saving to %s...\n", ref, fileName)
fileStore := content.NewFileStore("")
defer fileStore.Close()
allowedMediaTypes := []string{customMediaType, customConfigMediaType}
hybridStore := newHybridStore(fileStore)
desc, _, err = oras.Pull(ctx, resolver, ref, hybridStore, oras.WithAllowedMediaTypes(allowedMediaTypes),
oras.WithPullEmptyNameAllowed())
check(err)
fmt.Printf("Pulled from %s with digest %s\n", ref, desc.Digest)
configDesc, _, err := hybridStore.GetConfig()
check(err)
fmt.Printf("Pulled config of type %s\n", configDesc.MediaType)
fmt.Printf("Try running 'cat %s'\n", fileName)
}
type hybridStore struct {
*content.Memorystore
ingester ocicontent.Ingester
config *ocispec.Descriptor
}
func newHybridStore(ingester ocicontent.Ingester) *hybridStore {
return &hybridStore{
Memorystore: content.NewMemoryStore(),
ingester: ingester,
}
}
func (s *hybridStore) GetConfig() (ocispec.Descriptor, []byte, error) {
if s.config != nil {
if desc, data, found := s.Memorystore.Get(*s.config); found {
return desc, data, nil
}
}
return ocispec.Descriptor{}, nil, errors.New("config not found")
}
// Writer begins or resumes the active writer identified by desc
func (s *hybridStore) Writer(ctx context.Context, opts ...ocicontent.WriterOpt) (ocicontent.Writer, error) {
var wOpts ocicontent.WriterOpts
for _, opt := range opts {
if err := opt(&wOpts); err != nil {
return nil, err
}
}
switch wOpts.Desc.MediaType {
case customConfigMediaType:
s.config = &wOpts.Desc
fallthrough
case ocispec.MediaTypeImageManifest, ocispec.MediaTypeImageIndex:
return s.Memorystore.Writer(ctx, opts...)
}
return s.ingester.Writer(ctx, opts...)
}
Output:
|
@shizhMSFT thanks for the example, I see it now. You are wrapping the filestore with a custom store that switches on if it is a config or not. I can see that as an intermediate solution that works today. |
My opinion on your topics (admittedly, I am new to oras, but can give an outsider's view FWIW):
I agree not. The allowed media type is exactly that: which types do we allow. If it is not there, we filter it out. You want to be consistent, I think.
I think that the config is a fundamental part of the stored item. An image manifest includes the standard descriptor stuff (media type, size, hash), and then two key things:
We return the layers (or at least save them in the ingester and return their descriptors), I think we should return the config. My view would be that it would look something like: manifest, config, layers, err := oras.Pull(...) In other words, there is another returned item. If not, then the alternate would be to change the various stores such that they recognize config as something unique, essentially standardizing your example hybrid store. Every file store and memory store would have a |
I see the different directions we are pursuing now. From your view, manifest, config, layers, err := oras.Pull() or, more generalized in the case of an OCI index (i.e. the Docker manifest list equivalent), images, err := oras.Pull() where each image in From my view, root, nodes, err := oras.Pull() where the root node is the manifest and the nodes are layers in most cases. My idea is to make Just FYI, some OCI images/artifacts do not even have a manifest. They are indexes with layers only. @jdolitsky Any comments? |
/cc @SteveLasker @sajayantony for more inputs. |
Linking #131 |
I can see it working that way as well. You turn What would the workflow be like? Can you walk through an example of how I would pull an image via its manifest reference, and how I would pull an index, where I would retrieve each? |
It is strange to use Here is the code to pull the image:
package main
import (
"context"
"fmt"
"github.com/deislabs/oras/pkg/content"
"github.com/deislabs/oras/pkg/oras"
"github.com/containerd/containerd/remotes/docker"
)
func check(e error) {
if e != nil {
panic(e)
}
}
func main() {
ref := "docker.io/library/hello-world:latest"
rootPath := "oci_store"
ctx := context.Background()
resolver := docker.NewResolver(docker.ResolverOptions{})
// Pull file(s) from registry and save to disk
fmt.Printf("Pulling from %s and saving to %s...\n", ref, rootPath)
store, err := content.NewOCIStore(rootPath)
check(err)
root, nodes, err := oras.Pull(ctx, resolver, ref, nil,
oras.WithContentProvideIngester(store),
oras.WithPullEmptyNameAllowed(),
)
check(err) // stop here if the pull failed, before root is used
store.AddReference(ref, root)
err = store.SaveIndex()
check(err)
fmt.Printf("Pulled from %s with digest %s\n", ref, root.Digest)
fmt.Println("Pulled nodes:")
for _, node := range nodes {
fmt.Println(node.Digest, node.MediaType)
}
}
Output:
Filesystem structure:
|
That was just an example. What I meant was, according to your proposed: root, nodes, err := oras.Pull() How would I retrieve layers and config, and how would I extract the config? I am guessing config is one of the |
Go through each node and fetch the actual node content from the store. Since the root node is given, it is easy to get the child node, including the config, by images.Children(ctx, store, node). Or more easily, filter the nodes with a known config media type. |
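A rough sketch of the "filter the nodes with a known config media type" option follows. It assumes a flat slice of descriptors like the `nodes` value in the proposed `root, nodes, err := oras.Pull()`; the `descriptor` type and `filterByMediaType` helper are hypothetical and use only the standard library, not containerd's `images.Children`.

```go
package main

import "fmt"

// descriptor is a hypothetical stand-in for the descriptors returned by a pull.
type descriptor struct {
	MediaType string
	Digest    string
	Size      int64
}

// filterByMediaType returns every node whose media type equals mediaType,
// preserving the original order.
func filterByMediaType(nodes []descriptor, mediaType string) []descriptor {
	var out []descriptor
	for _, n := range nodes {
		if n.MediaType == mediaType {
			out = append(out, n)
		}
	}
	return out
}

// sampleNodes is made-up data with placeholder digests.
func sampleNodes() []descriptor {
	return []descriptor{
		{MediaType: "my.custom.config.media.type", Digest: "sha256:config", Size: 2},
		{MediaType: "my.custom.media.type", Digest: "sha256:layer1", Size: 13},
		{MediaType: "my.custom.media.type", Digest: "sha256:layer2", Size: 21},
	}
}

func main() {
	configs := filterByMediaType(sampleNodes(), "my.custom.config.media.type")
	for _, c := range configs {
		fmt.Println("config candidate:", c.Digest)
	}
}
```

This only works when the caller knows the config media type in advance; fetching the actual bytes still requires reading the matching descriptor from whatever store the pull wrote into.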
Yes, that was what I meant. I was hoping for an example of it. I see some challenges with the approach:
|
I did a quick scan of the thread. A few notes as I’m still focused on conference prep.
|
@shizhMSFT do you want to PR your change in the CLI to support zero-layer artifacts, which have only a config blob? |
@sajayantony Do you mean allowing oras to push / pull an image, which has a config but no layers? |
Created a new issue #153 |
Hi Avi, ORAS is a reference implementation that demonstrates how the ORAS library can be used. It provides some utility type CLIs. I'm looking to create a more targeted experience, built off the markymarky example Josh made. I'll add auth and a few other capabilities, as I get some help from more skilled golang devs.
config pull
While we initially used the config to just identify the artifact type, we've continued to see examples where pulling the config object is useful. I could see another parameter added to
Index
As we embark on Notary v2 efforts to persist a signature as
Filtering media types
This was a remnant of an original idea that we could store different
If we went with this idea, we'd need a way to pull a fully qualified reference Now, a specific CLI, meaning a regdoc would filter the types of
ORAS to pull docker/runtime images
The docker/containerd clients are very special artifact types, with a layered filesystem, using layers (as opposed to blobs). There seem to be other tools that specialize in the docker/containerd runtime image artifact type. ORAS was really designed to support all the other artifact types. So, I'm a little concerned about adding too much detail for runtime images in ORAS, as it could get messy quickly, and there are other tools. What specifically were you looking for that other tools don't support? I can't help on the detailed implementation, but I can help on the APIs exposed. For instance the |
I had completely forgotten about this issue, went looking to solve something, and stumbled across this. Do we want to pick this up again? For the curious, I ran into a variant of the config question. It may be out of scope, but I am curious about your thoughts. "legacy" registries don't support artifacts, nor does using the standard build tools (e.g. However:
The way around that is to use either annotations on the manifest itself (which, again, many build tools do not support), or on the config. Which brings us back to the config question, in a different way: what if I want to read the config before the other layers? For example, using Not to expand the discussion too much, I think it is looking at whether it is possible to store multiple artifacts in a single layer, i.e. support "legacy" images. Is that even something we want to think about? Partially, it looks like trying to do what other tools do. On the other hand, having a single library/CLI that can support both new spec artifacts and those in a legacy format has some great potential, especially for bridging adopters in. |
Ugh, I might just have painted myself into a corner here.
And I built this lovely See here To paraphrase Obi-Wan Kenobi, "I have done this myself." This raises the interesting question, completely separate from the config question: do we want the above to be configurable, i.e. "extract just this named file from the tar archive"? The default would be as now, just take the single file, but have an option to configure it. That is fairly easy to create an option to pass to a single I keep coming back to this: are we going too far afield? Is supporting artifacts wrapped in a "legacy" image just too complicated/out-of-scope for oras? I still really like the idea that it can handle artifacts however they come, and be a bridge between old and new. But maybe too far? |
Hi @deitch and happy new year,
|
Happy New Year @SteveLasker and everyone else on this thread! Wishing everyone a healthy and successful one.
Yeah, that just might be a smart move 😄
I agree that it is something of an anti-pattern. But there are plenty of people stuffing objects ("artifacts") inside layers and using standard Docker or similar tooling to build them. I do it in cases, so does Ignite, etc. The question I am asking really isn't, do we want to encourage that? I agree here that the flexibility within a manifest - available when both the registry and the build tooling support it - is more than enough to do the right thing, and make each layer an artifact (don't you wish we could rename it from The question I am asking is, for all of those images that people have created, and continue to create - due to tooling and/or registry issues - standard OCI images with one or more artifacts in tar/tar+gz files stored as blobs, do we want to support in this library/CLI the ability to retrieve those as well? My leaning (obviously from this thread 😄 ) is yes. I like the idea of a single interface, single tool, that can handle both, and acts as a nice bridge. "This is the right way to do it, but if your tools aren't there yet, we still can work with you, but don't you want to get onto the express train to the future and do it the right way?" I understand if we do not want to support that (despite my preferences). But given how much there is out there, I think it should be a conscious decision. I hope I explained it well?
Good thing. I think that was the original issue here. So it is just a question of getting oras up to speed on that? Or possibly just my understanding of how to use it.
This is interesting. I didn't realize how far you were going with this. So everything will be tar/tar+gz, annotation goes away, and we just use the tar header for the single file in the tar archive to say what the path should be. When there are multiple artifacts, multiple tars as layers, each without annotations required, and the media-type indicates what its purpose is. Is that it? |
As I think about this more @SteveLasker, I wonder how we would do this without annotations? E.g. I have 5 artifacts in 5 layers, how do I know what the purpose and type of each file is? How do I figure out that it is just the artifact in the tar at layer2 that I want? |
I think we're really having a discussion of what could be possible, vs. what we'd like to enable with ORAS. We never want to tell someone they can't do something generic. :)
If we don't support this, we should, and I'd be happy to see a PR that enables fetching the config independently.
ORAS is generic but meant to be used for a specific artifact type scenario. The idea is Helm would use it the way they want, Singularity, Azure ARM/Bicep, ... The idea is they define how layers are used, and why there are multiple. This is also why we identified in the artifacts spec authors can define different layer (blob) mediaTypes.
You might decide to tar each subdirectory If you're doing something more complex, where two directories are stored in completely different locations on the local disk, or some content is put on client-a, while another layer is placed on the server-side, that's also ok. But, it would be related to the specific scenarios/artifact type to decide that. |
The formatting got messed up on your last comment @SteveLasker :-) Taking your example above of a few directories and some files. I can do
This has been what has driven this conversation from the beginning. If I have a layer which is just some arbitrary artifact, then the media-type can tell me what it does/can do/is, and the image annotation can tell me where it thinks it should go (precisely how a. Could contain more than one artifact (for better or for worse) These are true even with annotations, since I only can have one image annotation on a layer, while I could have multiple files. We can call it an anti-pattern (FWIW, I agree with you on this one), but, as I said, lots of such cases out there, and more importantly, as long as someone can put multiple files in (tar always will support it), someone will; implicit APIs always end up being used. The config appealed to me precisely because it is image-wide. I can have a bunch of arbitrary annotations there that, for example, say which files to pull out of the various tar streams and what to do with them. Plus, of course, they are supported by every version of standard build-tooling out there (like docker). In other words, I always need some way of indicating the purpose and potentially intended location of the contents of an artifact, whether I eliminate annotations, or just have a format that can have multiple artifacts in it. |
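To make the "media-type tells me what it is, annotation tells me where it should go" idea concrete, here is a small sketch. It assumes the destination hint lives in the standard OCI title annotation, `org.opencontainers.image.title`; the `layer` type, the `plan` helper, and the sample media types are hypothetical and the digests are placeholders.

```go
package main

import "fmt"

// annotationTitle is the title annotation key defined by the OCI image spec.
const annotationTitle = "org.opencontainers.image.title"

// layer is a hypothetical, trimmed-down layer descriptor.
type layer struct {
	MediaType   string
	Digest      string
	Annotations map[string]string
}

// plan splits layers into those that name a destination through the title
// annotation (digest -> file name) and those that do not (list of digests).
func plan(layers []layer) (named map[string]string, unnamed []string) {
	named = map[string]string{}
	for _, l := range layers {
		// Reading from a nil Annotations map is safe and yields "".
		if title := l.Annotations[annotationTitle]; title != "" {
			named[l.Digest] = title
			continue
		}
		unnamed = append(unnamed, l.Digest)
	}
	return named, unnamed
}

// sampleLayers is made-up data for illustration.
func sampleLayers() []layer {
	return []layer{
		{MediaType: "application/vnd.example.kernel", Digest: "sha256:l1",
			Annotations: map[string]string{annotationTitle: "kernel"}},
		{MediaType: "application/vnd.example.initrd", Digest: "sha256:l2",
			Annotations: map[string]string{annotationTitle: "initrd.img"}},
		{MediaType: "application/vnd.oci.image.layer.v1.tar+gzip", Digest: "sha256:l3"},
	}
}

func main() {
	named, unnamed := plan(sampleLayers())
	for digest, title := range named {
		fmt.Printf("%s -> %s\n", digest, title)
	}
	fmt.Println("no destination hint:", unnamed)
}
```

The third sample layer shows the gap discussed above: a plain tar layer with no annotation gives the puller nothing to go on, which is where an image-wide hint in the config becomes attractive.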
Hello Gents. Reading through the entirety of this thread, #131, and then the Ref3 information, I was a bit lost as to where this issue stands presently. In our setup, we are looking to consolidate all our object types into OCI, and use the config.json as the OPTIONAL addition for our needs (non-container type objects, so arbitrary info). Following all the readme and such, we are able to create the config.json and pass it in with the oras push, but, as @SteveLasker outlined, we lack a command like the following that would let us pull the config file out.
Where my-object:v2 was created like this <config.json> oras push my.registry.com/my-packaged-object:v2 </end push command> To that end, I see that #131 has "help wanted". I noticed that @deitch had done lots of the initial legwork and was back at it, so perhaps you are chasing it again or could use some help / show me where you're at so far? Cheers, |
This is one of those issues that is somewhat fundamental, triggered by a seemingly simple question, so it stays open and is a discussion point for quite some time. At least once every few months, I find myself looking for something like this, stumbling across it, and going, "Oh, look, someone raised it already! I wonder who it was?" 😄 We also are looking at some fundamental re-plumbing of oras the library to give it more flexibility around cases like this. As for the CLI having an |
If one puts something in, one should be able to get it out. |
This is really an old thread. Let me write a doc for how to push and pull the config file using the existing |
Created PR oras-project/oras-www#3 for existing |
#274 enables CLI to pull config. The implementation is a bit hacky since all configs and layers are flattened by the |
Resolved by #480 |
I have been trying to figure this out. It looks like oras.Pull() returns the descriptor for each of the layers. How do I use the library to extract the config? Looking at the OCI spec for the manifest, it includes:
oras.Pull()
?