How do I get the config via oras.Pull? #142

Closed
deitch opened this issue Nov 4, 2019 · 40 comments · Fixed by #480
Labels
duplicate This issue or pull request already exists
Milestone

Comments

@deitch
Contributor

deitch commented Nov 4, 2019

I have been trying to figure this out. It looks like oras.Pull() returns the descriptor for each of the layers.

How do I use the library to extract the config? Looking at the OCI spec for the manifest, it includes:

  • layers (we have that covered)
  • media type (in the returned descriptor)
  • schema version (not there)
  • size (in the returned descriptor)
  • annotations (in the returned descriptor)
  • config - where is that, and how do I get that from oras.Pull()?
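For orientation, here is a minimal sketch of where the config sits inside an image manifest. It uses only the standard library: the `descriptor` and `manifest` types are local stand-ins that mirror the JSON field names of the OCI image spec (real code would use the `ocispec` package), and `sampleManifest` is a made-up document, not registry output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor is a local stand-in for the OCI content descriptor.
type descriptor struct {
	MediaType   string            `json:"mediaType"`
	Digest      string            `json:"digest"`
	Size        int64             `json:"size"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// manifest is a local stand-in for the OCI image manifest.
type manifest struct {
	SchemaVersion int               `json:"schemaVersion"`
	Config        descriptor        `json:"config"`
	Layers        []descriptor      `json:"layers"`
	Annotations   map[string]string `json:"annotations,omitempty"`
}

// parseManifest decodes raw manifest JSON into the stand-in struct.
func parseManifest(raw []byte) (manifest, error) {
	var m manifest
	err := json.Unmarshal(raw, &m)
	return m, err
}

// sampleManifest is an illustrative document with placeholder digests.
const sampleManifest = `{
  "schemaVersion": 2,
  "config": {"mediaType": "my.custom.config.media.type", "digest": "sha256:aaaa", "size": 2},
  "layers": [
    {"mediaType": "my.custom.media.type", "digest": "sha256:bbbb", "size": 13,
     "annotations": {"org.opencontainers.image.title": "hello.txt"}}
  ]
}`

func main() {
	m, err := parseManifest([]byte(sampleManifest))
	if err != nil {
		panic(err)
	}
	// The config is itself a descriptor: a reference to a blob, not the blob.
	fmt.Println("config:", m.Config.MediaType, m.Config.Digest)
	for _, l := range m.Layers {
		fmt.Println("layer:", l.MediaType, l.Digest)
	}
}
```

The point of the sketch is that the config descriptor only becomes visible once the manifest body has been fetched and decoded; the descriptor of the manifest alone does not carry it.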
@deitch
Contributor Author

deitch commented Nov 6, 2019

Adding some context from the Slack channel.

oras.Pull() returns the descriptor, which is retrieved here:

	_, desc, err := resolver.Resolve(ctx, ref)
// .. more stuff
	return desc, layers, nil

The source to resolver.Resolve is here:

			desc := ocispec.Descriptor{
				Digest:    dgst,
				MediaType: contentType,
				Size:      size,
			}
			log.G(ctx).WithField("desc.digest", desc.Digest).Debug("resolved")
			return ref, desc, nil

So it clearly does not include the manifest; it is just a freshly constructed Descriptor. The interesting line is here, where it says:

            resp.Body.Close() // don't care about body contents.

This leads me to wonder whether the descriptor from resolver.Resolve() is not intended to return the config or annotations (and therefore the entire manifest), but just basic info about it.

When I stepped through the opts.dispatch, I found that it had pulled up all of the subresources - layers and config (but not annotations) - but that the config got filtered out somewhere along the line?

Is this a limitation in resolver.Resolve()? Or is there some other containerd function we are supposed to use to retrieve the actual manifest that we are missing?

@deitch
Contributor Author

deitch commented Nov 7, 2019

@jdolitsky had kindly agreed on Slack to take a look.

I have some more info. I figured out why it is losing it. The filterHandler not only filters out components whose media type is not allowed (which somewhat duplicates picker), it also checks whether the descriptor has the annotation ocispec.AnnotationTitle (which resolves to "org.opencontainers.image.title"), which the FileStore uses to select the name of the file to which to save the artifact.

This is great for layers, but doesn't work for items that aren't well-saved as a file (e.g. the config). Unfortunately, images.Dispatch() returns nothing but an error; there is no way to build up information from the process other than what is passed into the handler functions as they are created.
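The filtering rule described above can be sketched as follows. This is a simplified illustration of the behavior as described in this comment, not the oras source: `descriptor` is a local stand-in type, and `keep` and the `allowed` set are hypothetical names.

```go
package main

import "fmt"

// annotationTitle is the annotation key that names the output file.
const annotationTitle = "org.opencontainers.image.title"

// descriptor is a local stand-in for ocispec.Descriptor.
type descriptor struct {
	MediaType   string
	Annotations map[string]string
}

// keep reports whether a descriptor survives the sketched filter: its media
// type must be in the allowed set and it must carry a non-empty title.
func keep(desc descriptor, allowed map[string]bool) bool {
	if !allowed[desc.MediaType] {
		return false
	}
	// Reading from a nil map is safe in Go and yields the empty string.
	return desc.Annotations[annotationTitle] != ""
}

func main() {
	allowed := map[string]bool{
		"my.custom.media.type":        true,
		"my.custom.config.media.type": true,
	}
	layer := descriptor{
		MediaType:   "my.custom.media.type",
		Annotations: map[string]string{annotationTitle: "hello.txt"},
	}
	config := descriptor{MediaType: "my.custom.config.media.type"}

	fmt.Println("layer kept:", keep(layer, allowed))
	fmt.Println("config kept:", keep(config, allowed))
}
```

Under this rule a config is dropped even when its media type is allowed, simply because it has no title annotation.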

I see three ways to make the config available after the Dispatch().

  • give it the annotation org.opencontainers.image.title, which would cause FileStore (or any other) to save it to a file by that name. That doesn't make it very usable in the library / CLI, but does retrieve it.
  • save it to the content.Ingester, but have the ingester (FileStore, MemoryStore) be aware that this is not a typical one to write, but metadata to be retrieved
  • add an additional handler between FetchHandler and picker that specifically handles config and makes it available separately

Unfortunately, once inside a handler, you really have no way of knowing, "I am handling the config". You know that config is coming when you process the image manifest, so perhaps you could tag it, but it isn't actually handled until after it is fetched, at which point you don't really know.

A solution could be that the handler does two kinds of processing:

  1. When looking at the manifest, save the media type and sha256 of the config, so it will know to find it when it comes
  2. When looking at a fetched blob, check if the sha256+mediatype match, and if they do, handle the config content separately, saving it either to a different memory structure, or to the ingester, but in a way that it can handle it.
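The two steps above could be sketched roughly like this. It is a standard-library-only illustration, not a containerd handler: `configTracker`, `observeManifest` and `observeBlob` are hypothetical names, and `descriptor` is a local stand-in type. A mutex is included on the assumption that blobs may be processed concurrently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// descriptor is a local stand-in for ocispec.Descriptor.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
}

// configTracker remembers which blobs were announced as configs.
type configTracker struct {
	mu       sync.Mutex
	expected map[string]string // config digest -> config media type
	configs  map[string][]byte // config digest -> config content
}

func newConfigTracker() *configTracker {
	return &configTracker{
		expected: make(map[string]string),
		configs:  make(map[string][]byte),
	}
}

// observeManifest is step 1: record the digest and media type of the config.
func (t *configTracker) observeManifest(raw []byte) error {
	var m struct {
		Config descriptor `json:"config"`
	}
	if err := json.Unmarshal(raw, &m); err != nil {
		return err
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	if m.Config.Digest != "" {
		t.expected[m.Config.Digest] = m.Config.MediaType
	}
	return nil
}

// observeBlob is step 2: if digest and media type match a recorded config,
// keep the content aside and report true; otherwise report false.
func (t *configTracker) observeBlob(desc descriptor, content []byte) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if mt, ok := t.expected[desc.Digest]; !ok || mt != desc.MediaType {
		return false
	}
	t.configs[desc.Digest] = content
	return true
}

func main() {
	tracker := newConfigTracker()
	manifest := []byte(`{"config": {"mediaType": "my.custom.config.media.type", "digest": "sha256:cfg"}}`)
	if err := tracker.observeManifest(manifest); err != nil {
		panic(err)
	}
	cfg := descriptor{MediaType: "my.custom.config.media.type", Digest: "sha256:cfg"}
	layer := descriptor{MediaType: "my.custom.media.type", Digest: "sha256:layer"}
	fmt.Println("config handled separately:", tracker.observeBlob(cfg, []byte("{}")))
	fmt.Println("layer handled separately:", tracker.observeBlob(layer, []byte("Hello World!\n")))
}
```

The sketch relies on the manifest being seen before its config blob, which matches the order described above: the config is only known as a child once the manifest has been processed.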

The first solution is easiest and would work right now, but won't help with anything other than getting it to the filesystem, not making it available where it probably needs to be, in the program (and won't work with any config that doesn't have that annotation).

The second and third solutions both work, provided an additional handler is made to find the content and manage the content. The third probably is easiest, as it doesn't require changing the ingester (file store and memory store).

I am happy to open a PR for it. Thoughts?

@deitch
Contributor Author

deitch commented Nov 7, 2019

The same issue exists if you want the actual manifest, or the info on the layer. It does not write the manifest itself (and therefore its annotations) to the ingester because of filterHandler. If you then want to know about the annotations for a given layer that was downloaded, i.e. need the layers struct, you may be stuck. Unless the fileStore (as ingester) already provides a way to save that information? I looked in file.go but couldn't find it. It definitely adds the descriptor when creating the fileWriter:

	return &fileWriter{
		store:    s,
		file:     file,
		desc:     desc,
		digester: digest.Canonical.Digester(),
		status: content.Status{
			Ref:       name,
			Total:     desc.Size,
			StartedAt: now,
			UpdatedAt: now,
		},
		afterCommit: afterCommit,
	}, nil

But I don't see anywhere to retrieve it?

@jdolitsky
Contributor

add an additional handler between FetchHandler and picker that specifically handles config and makes it available separately

This sounds fine to me, and it'd be great if you can make it work. I would like to get some thoughts from Shiwei cc @shizhMSFT

@deitch
Contributor Author

deitch commented Nov 7, 2019

be great if you can make it work.

I am reasonably confident I can. I am going to hold off until I see comments here from Shiwei as well.

@shizhMSFT
Contributor

The internal data structure of contents is a directed acyclic graph of descriptors (i.e. references to actual blobs). Usually, the images are stored in a tree structure, which is a special case of DAG.

                      +------+
               +----->+Config|
               |      +------+
               |
+--------+     |      +------+
|Manifest+-----+----->+Layer |
+--------+     |      +------+
               |
               |      +------+
               +----->+Layer |
                      +------+

The purpose of resolver.Resolve() is to resolve a reference (e.g. myregistry.com/hello:latest) to the descriptor of the associated manifest. Since the descriptor is a reference, it does not contain the actual data. @deitch This explains what you have mentioned above

resp.Body.Close() // don't care about body contents.

Since oras is implemented based on containerd, the suggested way to get the actual blob is via images.Dispatch(). The start point is the manifest as it is most suitable to be a root node. The Dispatch() function does not care whether a child node is a config or a layer. They are all child nodes. It just traverses the entire tree in parallel.
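To make the tree traversal concrete, here is a small sketch using only the standard library. It is not containerd's images.Dispatch(): the walk is sequential rather than parallel, `store` is a plain map keyed by digest, and `descriptor`, `children`, `walk` and `demoStore` are hypothetical names with placeholder digests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor is a local stand-in for ocispec.Descriptor.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
}

const manifestType = "application/vnd.oci.image.manifest.v1+json"

// store maps a digest to the content of the blob it references.
type store map[string][]byte

// children returns the config and layers of a manifest node.
// Any other node is treated as a leaf.
func children(s store, desc descriptor) ([]descriptor, error) {
	if desc.MediaType != manifestType {
		return nil, nil
	}
	raw, ok := s[desc.Digest]
	if !ok {
		return nil, fmt.Errorf("blob %s not found", desc.Digest)
	}
	var m struct {
		Config descriptor   `json:"config"`
		Layers []descriptor `json:"layers"`
	}
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	return append([]descriptor{m.Config}, m.Layers...), nil
}

// walk visits desc and then every descendant, depth-first.
func walk(s store, desc descriptor, visit func(descriptor)) error {
	visit(desc)
	kids, err := children(s, desc)
	if err != nil {
		return err
	}
	for _, k := range kids {
		if err := walk(s, k, visit); err != nil {
			return err
		}
	}
	return nil
}

// demoStore builds a tiny manifest -> config + two layers tree.
func demoStore() (store, descriptor) {
	root := descriptor{MediaType: manifestType, Digest: "sha256:root"}
	s := store{
		"sha256:root": []byte(`{
			"config": {"mediaType": "my.custom.config.media.type", "digest": "sha256:cfg"},
			"layers": [
				{"mediaType": "my.custom.media.type", "digest": "sha256:layer1"},
				{"mediaType": "my.custom.media.type", "digest": "sha256:layer2"}
			]}`),
		"sha256:cfg":    []byte("{}"),
		"sha256:layer1": []byte("Hello World!\n"),
		"sha256:layer2": []byte("Goodbye!\n"),
	}
	return s, root
}

func main() {
	s, root := demoStore()
	err := walk(s, root, func(d descriptor) {
		fmt.Println(d.Digest, d.MediaType)
	})
	if err != nil {
		panic(err)
	}
}
```

As in the description above, the walk does not distinguish a config from a layer; both are simply child nodes of the manifest.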

Back to your original issue of getting the config. ORAS does not support the config well as it does not make much sense since it is not a container. However, if you insist on having the config, it is possible to fetch it by adding its type as an allowed media type, marking it as a valid node, and following your solution 1. If you are not using FileStore, you may consider the oras.WithPullEmptyNameAllowed() option with a memory store or OCI store. The previous solution may be combined with your solution 2. It is not good to modify the existing FileStore or MemoryStore; instead, create a customized hybrid store and pass it to the option oras.WithContentProvideIngester().

The assumption of your solution 3 is that the config is already fetched. According to the existing code logic, successful fetching implies the config descriptor passes filterHandler and will be picked up by picker. In that case, we don't need an additional handler. If you do want to have one, you can add the additional handler via the option WithPullBaseHandler() or WithPullCallbackHandler(), which is useful in the manifest case.

In addition, it is trivial to make it work, but hard to make it generic for all scenarios.

@deitch
Contributor Author

deitch commented Nov 8, 2019

Hi @shizhMSFT . This part here:

The internal data structure of contents is a directed acyclic graph ..
// all the way to
It just traverses the entire tree in parallel.

I wish I had that when I started. It doesn't appear anywhere in the containerd docs. I did eventually figure it out (with lots of code-reading, containerd slack channel, and experimentation), but do wish someone had written it up on the containerd pages (or that I had spoken with you before I dug in :-) ).

ORAS does not support the config well as it does not make much sense since it is not a container

This part I don't completely agree with. oras supports pushing config to registry, even with a custom type, and I expect that with the vistas that oras opens up, we will see many more uses of config in many different ways. I wouldn't underestimate the value of what oras will make possible.

It is not good to modify the existing FileStore or MemoryStore

Understood.

create a customized hybrid store and pass it to the option oras.WithContentProvideIngester().

I don't understand this part. Is this a store that has multiple "sub-stores", and so sort of demuxes the content to different places? How does it pick what goes where? Is it explained/documented anywhere that I could read?

successful fetching implies the config descriptor passes filterHandler and will be picked up by picker. In that case, we don't need an additional handler

I don't think I fully agree here. config is a fundamental and required part of every manifest (and oras recognizes it with its config option). As such, I think there should be a standard way to retrieve it from oras.Pull().

@shizhMSFT
Contributor

The config is a required part of every manifest but not always used for non-containers like artifacts.

However, as you have mentioned, providing a standard way to retrieve the config would benefit developers a lot. Let's discuss and find out.

@shizhMSFT
Contributor

shizhMSFT commented Nov 8, 2019

I don't understand this part. Is this a store that has multiple "sub-stores", and so sort of demuxes the content to different places? How does it pick what goes where? Is it explained/documented anywhere that I could read?

This is a Hybrid Store for read purposes. A store for write purposes can be constructed similarly.

The hybridStore in store.go is also a good example of a hybrid store.

@shizhMSFT
Contributor

@deitch @jdolitsky Here are the topics that need to be discussed.

  • Although oras currently does not support pushing an index, it does support pulling one. In that case, we will have multiple manifests and multiple configs. What's the best experience for developers to get those configs?
  • What if the config in the manifest is not in the allowed media type? Should we still fetch them? (I am suggesting not)
  • If we fetch the configs, where should we store the configs? It is OK for memory stores and OCI stores since they store the content by digest. How about FileStore? Configs do not have file names.

@shizhMSFT
Contributor

@deitch The following example prototype pulls the config. Although it is a bit hacky, it is a good starting point for the discussion.

package main

import (
	"context"
	"errors"
	"fmt"

	"github.com/deislabs/oras/pkg/content"
	"github.com/deislabs/oras/pkg/oras"

	ocicontent "github.com/containerd/containerd/content"
	"github.com/containerd/containerd/remotes/docker"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

func check(e error) {
	if e != nil {
		panic(e)
	}
}

const customConfigMediaType = "my.custom.config.media.type"

func main() {
	ref := "localhost:5000/oras:test"
	fileName := "hello.txt"
	fileContent := []byte("Hello World!\n")
	customMediaType := "my.custom.media.type"

	ctx := context.Background()
	resolver := docker.NewResolver(docker.ResolverOptions{})

	// Push file(s) w custom mediatype to registry
	memoryStore := content.NewMemoryStore()
	desc := memoryStore.Add(fileName, customMediaType, fileContent)
	pushContents := []ocispec.Descriptor{desc}
	fmt.Printf("Pushing %s to %s...\n", fileName, ref)
	desc, err := oras.Push(ctx, resolver, ref, memoryStore, pushContents,
		oras.WithConfigMediaType(customConfigMediaType))
	check(err)
	fmt.Printf("Pushed to %s with digest %s\n", ref, desc.Digest)

	// Pull file(s) from registry and save to disk
	fmt.Printf("Pulling from %s and saving to %s...\n", ref, fileName)
	fileStore := content.NewFileStore("")
	defer fileStore.Close()
	allowedMediaTypes := []string{customMediaType, customConfigMediaType}
	hybridStore := newHybridStore(fileStore)
	desc, _, err = oras.Pull(ctx, resolver, ref, hybridStore, oras.WithAllowedMediaTypes(allowedMediaTypes),
		oras.WithPullEmptyNameAllowed())
	check(err)
	fmt.Printf("Pulled from %s with digest %s\n", ref, desc.Digest)
	configDesc, _, err := hybridStore.GetConfig()
	check(err)
	fmt.Printf("Pulled config of type %s\n", configDesc.MediaType)
	fmt.Printf("Try running 'cat %s'\n", fileName)
}

type hybridStore struct {
	*content.Memorystore
	ingester ocicontent.Ingester
	config   *ocispec.Descriptor
}

func newHybridStore(ingester ocicontent.Ingester) *hybridStore {
	return &hybridStore{
		Memorystore: content.NewMemoryStore(),
		ingester:    ingester,
	}
}

func (s *hybridStore) GetConfig() (ocispec.Descriptor, []byte, error) {
	if s.config != nil {
		if desc, data, found := s.Memorystore.Get(*s.config); found {
			return desc, data, nil
		}
	}
	return ocispec.Descriptor{}, nil, errors.New("config not found")
}

// Writer begins or resumes the active writer identified by desc
func (s *hybridStore) Writer(ctx context.Context, opts ...ocicontent.WriterOpt) (ocicontent.Writer, error) {
	var wOpts ocicontent.WriterOpts
	for _, opt := range opts {
		if err := opt(&wOpts); err != nil {
			return nil, err
		}
	}

	switch wOpts.Desc.MediaType {
	case customConfigMediaType:
		s.config = &wOpts.Desc
		fallthrough
	case ocispec.MediaTypeImageManifest, ocispec.MediaTypeImageIndex:
		return s.Memorystore.Writer(ctx, opts...)
	}
	return s.ingester.Writer(ctx, opts...)
}

Output:

Pushing hello.txt to localhost:5000/oras:test...
WARN[0000] encountered unknown type my.custom.media.type; children may not be fetched 
WARN[0000] reference for unknown type: my.custom.media.type  digest="sha256:03ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340" mediatype=my.custom.media.type size=13
WARN[0000] encountered unknown type my.custom.config.media.type; children may not be fetched 
WARN[0000] reference for unknown type: my.custom.config.media.type  digest="sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a" mediatype=my.custom.config.media.type size=2
Pushed to localhost:5000/oras:test with digest sha256:da610e590478af5df6c0a1a7899f00e3c5d2cc9237d91855c583fe0a13fbe22a
Pulling from localhost:5000/oras:test and saving to hello.txt...
WARN[0000] reference for unknown type: my.custom.media.type  digest="sha256:03ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340" mediatype=my.custom.media.type size=13
WARN[0000] reference for unknown type: my.custom.config.media.type  digest="sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a" mediatype=my.custom.config.media.type size=2
WARN[0000] encountered unknown type my.custom.config.media.type; children may not be fetched 
WARN[0000] encountered unknown type my.custom.media.type; children may not be fetched 
Pulled from localhost:5000/oras:test with digest sha256:da610e590478af5df6c0a1a7899f00e3c5d2cc9237d91855c583fe0a13fbe22a
Pulled config of type my.custom.config.media.type
Try running 'cat hello.txt'

@deitch
Contributor Author

deitch commented Nov 10, 2019

@shizhMSFT thanks for the example, I see it now. You are wrapping the filestore with a custom store that switches on if it is a config or not. I can see that as an intermediate solution that works today.

@deitch
Contributor Author

deitch commented Nov 10, 2019

My opinion on your topics (admittedly, I am new to oras, but can give an outsider's view FWIW):

What if the config in the manifest is not in the allowed media type? Should we still fetch them? (I am suggesting not)

I agree not. The allowed media type is exactly that: which types do we allow. If it is not there, we filter it out. You want to be consistent, I think.

If we fetch the configs, where should we store the configs? It is OK for memory stores and OCI stores since they store the content by digest. How about FileStore? Configs do not have file names.

I think that the config is a fundamental part of the stored item. An image manifest includes the standard descriptor stuff (media type, size, hash), and then two key things:

  • config
  • layers

We return the layers (or at least save them in the ingester and return their descriptors), so I think we should return the config as well. My view would be that it would look something like:

manifest, config, layers, err := oras.Pull(...)

In other words, there is another returned item. If not, then the alternate would be to change the various stores such that they recognize config as something unique, essentially standardizing your example hybrid store. Every file store and memory store would have a store.GetConfig(), and every store would be expected to handle configs uniquely. I actually think I would prefer it if the interface were extended to include SetConfig() and GetConfig() in addition to Writer(), as this is unique. However, I recognize that the base for store is the ingester interface, and so may not make sense to push too far.

@shizhMSFT
Contributor

shizhMSFT commented Nov 11, 2019

I see the different directions we are pursuing now.

From your view, oras.Pull() should return the detailed images like

manifest, config, layers, err := oras.Pull()

or, more generalized, in the case of an OCI index (i.e. the equivalent of a Docker manifest list)

images, err := oras.Pull()

where each image in images contains manifest, config, and layers.

From my view, oras.Pull() returns a root node and allowed nodes like

root, nodes, err := oras.Pull()

where the root node is the manifest and the nodes are layers in most cases.

My idea is to make oras a decoupled component, generic enough, like containerd, that people can extend it to do whatever they want. That is also the main difference between containerd and docker. Meanwhile, we should provide some convenient way or packages for those who just want the conventional docker way.

Just FYI, some OCI images/artifacts do not even have a manifest. They are indexes with layers only.

@jdolitsky Any comments?

@shizhMSFT
Contributor

shizhMSFT commented Nov 11, 2019

/cc @SteveLasker @sajayantony for more inputs.

@shizhMSFT
Contributor

Linking #131

@deitch
Contributor Author

deitch commented Nov 11, 2019

From my view, oras.Pull() returns a root node and allowed nodes like

root, nodes, err := oras.Pull()
where the root node is the manifest and the nodes are layers in most cases.

I can see it working that way as well. You turn oras.Pull() into a more generic kind of pull. It certainly seems cleaner, and probably is easier to handle unforeseen future needs.

What would the workflow be like? Can you walk through an example of how I would pull an image via its manifest reference, and how I would pull an index, where I would retrieve each?

@shizhMSFT
Contributor

It is strange to use oras to pull images since it is not the goal of oras (OCI Registry as Storage: to store and retrieve generic blobs). However, we still can do as you requested.

Here is the code to pull the image docker.io/library/hello-world:latest for all platforms from Docker Hub.

package main

import (
	"context"
	"fmt"

	"github.com/deislabs/oras/pkg/content"
	"github.com/deislabs/oras/pkg/oras"

	"github.com/containerd/containerd/remotes/docker"
)

func check(e error) {
	if e != nil {
		panic(e)
	}
}

func main() {
	ref := "docker.io/library/hello-world:latest"
	rootPath := "oci_store"

	ctx := context.Background()
	resolver := docker.NewResolver(docker.ResolverOptions{})

	// Pull file(s) from registry and save to disk
	fmt.Printf("Pulling from %s and saving to %s...\n", ref, rootPath)
	store, err := content.NewOCIStore(rootPath)
	check(err)
	root, nodes, err := oras.Pull(ctx, resolver, ref, nil,
		oras.WithContentProvideIngester(store),
		oras.WithPullEmptyNameAllowed(),
	)
	check(err) // fail fast if the pull itself failed
	store.AddReference(ref, root)
	err = store.SaveIndex()
	check(err)

	fmt.Printf("Pulled from %s with digest %s\n", ref, root.Digest)
	fmt.Println("Pulled nodes:")
	for _, node := range nodes {
		fmt.Println(node.Digest, node.MediaType)
	}
}

Output:

Pulling from docker.io/library/hello-world:latest and saving to oci_store...
Pulled from docker.io/library/hello-world:latest with digest sha256:c3b4ada4687bbaa170745b3e4dd8ac3f194ca95b2d0518b417fb47e5879d9b5f
Pulled nodes:
sha256:c3b4ada4687bbaa170745b3e4dd8ac3f194ca95b2d0518b417fb47e5879d9b5f application/vnd.docker.distribution.manifest.list.v2+json
sha256:59712c0d6630673e81a77683ceb040be559c8ff7c245c73779b5def5bb90806d application/vnd.docker.distribution.manifest.v2+json
sha256:d0d4c5389b53875b0f2364f94c466f77cf6f02811fb02f0477b97d609fb50568 application/vnd.docker.distribution.manifest.v2+json
sha256:1e44d8bca6fb0464794555e5ccd3a32e2a4f6e44a20605e4e82605189904f44d application/vnd.docker.distribution.manifest.v2+json
sha256:0a5f8ec1343015eae71b3fd4eef60f9fe4d8612787cf78ab2f9aaf116695c9d6 application/vnd.docker.distribution.manifest.v2+json
sha256:12cf9ef90835465316cb0b3729c36bfd8654d7f2f697e23432fddfaa7d7e31b5 application/vnd.docker.distribution.manifest.v2+json
sha256:92c7f9c92844bbbb5d0a101b22f7c2a7949e40f8ea90c8b3bc396879d95e899a application/vnd.docker.distribution.manifest.v2+json
sha256:d1fd2e204af0a2bca3ab033b417b29c76d7950ed29a44e427d1c4d07d14f04f9 application/vnd.docker.distribution.manifest.v2+json
sha256:577ad4331d4fac91807308da99ecc107dcc6b2254bc4c7166325fd01113bea2a application/vnd.docker.distribution.manifest.v2+json
sha256:5a4bdadd9acd8779ed6fcf007a4e7ed7f919056a92c3c67824b4fded06ef0a6e application/vnd.docker.distribution.manifest.v2+json
sha256:158c64d77ced2c0887665320be9a0875daa0438c550dce56ba66de6689ad1d4f application/vnd.docker.container.image.v1+json
sha256:de6f0c40d4e5d0eb8e13fa62ccbbdabad63be2753c9b61f495e7f1f486be1443 application/vnd.docker.container.image.v1+json
sha256:d2708320a3117f18f6ff95ae5ffc97f3341ec99948a378b9207d516e205bd8b6 application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:3b4173355427082b90463dbe6b9606a6a8c14c9d1235469c62dd95aba76da642 application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:e34b597d4d9c54cb14121d3346ec811c943854da4126e8c5880963f06b2b6a94 application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:384e1cb9c170f08905207d9414a3ee93c5f6c77b2bd980221d8f34b8715cc41e application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:085b8605baffc6a0d2b4df7a830cbd73c2bc4a8aaad4851dd8296e20f05b9aa6 application/vnd.docker.container.image.v1+json
sha256:118a999e7a543bfff38b7021ffd6ae088892a107ab9667eded35d477ec0330a8 application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:ee8e362eeaf08460ae42b078293feaa1c3cfe3922dd3e3d8b1216b2fa780c73c application/vnd.docker.container.image.v1+json
sha256:618e43431df9635eee9cf7224aa92c8d6f74aa36cd3b2359604389ca36e79380 application/vnd.docker.container.image.v1+json
sha256:fce289e99eb9bca977dae136fbe2a82b6b7d4c372474c9235adc1741675f587e application/vnd.docker.container.image.v1+json
sha256:590e13f69e4afcc08e9060a320ec5e4622d2771ace9dc26b024dc786fcb5b36e application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:1b930d010525941c1d56ec53b97bd057a67ae1865eebf042686d2a2d18271ced application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:c1eda109e4da870583f2ba3224030c94909b5f60b34a489dc3607f9e7b0e2cee application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:8a6314bc97b0b30fd59b3aa6bbea5391e7297114e31ced445144e44ae698dbb6 application/vnd.docker.container.image.v1+json
sha256:7ed68418e8524939294e9bcc71ef1b51ffa05d9f2c82fa2a89faad15227ee8d9 application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:d39d9a31884d8a8934ef7790f6278c224de873743af587e4a9ad810f885a17c2 application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:df8c1d4877c5b7c3bb398e41d24224693be78303942d6750020c29d63abe7401 application/vnd.docker.container.image.v1+json
sha256:da7887b27acbb158c9c6241cac3f12d3a6e1e07f49bfee7f2bfa87f26864b2b9 application/vnd.docker.container.image.v1+json
sha256:d8aec4eeb95f50a6bb92e676e9cbd4ae700faf31be657cfe2216bc0491c50afe application/vnd.docker.image.rootfs.diff.tar.gzip
sha256:9ff41eda08873205ee308953fbbd9d307ab8def0f435b97345f200877506d6c5 application/vnd.docker.image.rootfs.foreign.diff.tar.gzip
sha256:e46172273a4e4384e1eec7fb01091c828a256ea0f87b30f61381fba9bc511371 application/vnd.docker.image.rootfs.foreign.diff.tar.gzip
sha256:af0f84283f52649b65958128c4f34206ceed508f59bd50719eb57e6a136d6844 application/vnd.docker.image.rootfs.foreign.diff.tar.gzip

Filesystem structure:

$ tree oci_store
oci_store
├── blobs
│   └── sha256
│       ├── 085b8605baffc6a0d2b4df7a830cbd73c2bc4a8aaad4851dd8296e20f05b9aa6
│       ├── 0a5f8ec1343015eae71b3fd4eef60f9fe4d8612787cf78ab2f9aaf116695c9d6
│       ├── 118a999e7a543bfff38b7021ffd6ae088892a107ab9667eded35d477ec0330a8
│       ├── 12cf9ef90835465316cb0b3729c36bfd8654d7f2f697e23432fddfaa7d7e31b5
│       ├── 158c64d77ced2c0887665320be9a0875daa0438c550dce56ba66de6689ad1d4f
│       ├── 1b930d010525941c1d56ec53b97bd057a67ae1865eebf042686d2a2d18271ced
│       ├── 1e44d8bca6fb0464794555e5ccd3a32e2a4f6e44a20605e4e82605189904f44d
│       ├── 384e1cb9c170f08905207d9414a3ee93c5f6c77b2bd980221d8f34b8715cc41e
│       ├── 3b4173355427082b90463dbe6b9606a6a8c14c9d1235469c62dd95aba76da642
│       ├── 577ad4331d4fac91807308da99ecc107dcc6b2254bc4c7166325fd01113bea2a
│       ├── 590e13f69e4afcc08e9060a320ec5e4622d2771ace9dc26b024dc786fcb5b36e
│       ├── 59712c0d6630673e81a77683ceb040be559c8ff7c245c73779b5def5bb90806d
│       ├── 5a4bdadd9acd8779ed6fcf007a4e7ed7f919056a92c3c67824b4fded06ef0a6e
│       ├── 618e43431df9635eee9cf7224aa92c8d6f74aa36cd3b2359604389ca36e79380
│       ├── 7ed68418e8524939294e9bcc71ef1b51ffa05d9f2c82fa2a89faad15227ee8d9
│       ├── 8a6314bc97b0b30fd59b3aa6bbea5391e7297114e31ced445144e44ae698dbb6
│       ├── 92c7f9c92844bbbb5d0a101b22f7c2a7949e40f8ea90c8b3bc396879d95e899a
│       ├── 9ff41eda08873205ee308953fbbd9d307ab8def0f435b97345f200877506d6c5
│       ├── af0f84283f52649b65958128c4f34206ceed508f59bd50719eb57e6a136d6844
│       ├── c1eda109e4da870583f2ba3224030c94909b5f60b34a489dc3607f9e7b0e2cee
│       ├── c3b4ada4687bbaa170745b3e4dd8ac3f194ca95b2d0518b417fb47e5879d9b5f
│       ├── d0d4c5389b53875b0f2364f94c466f77cf6f02811fb02f0477b97d609fb50568
│       ├── d1fd2e204af0a2bca3ab033b417b29c76d7950ed29a44e427d1c4d07d14f04f9
│       ├── d2708320a3117f18f6ff95ae5ffc97f3341ec99948a378b9207d516e205bd8b6
│       ├── d39d9a31884d8a8934ef7790f6278c224de873743af587e4a9ad810f885a17c2
│       ├── d8aec4eeb95f50a6bb92e676e9cbd4ae700faf31be657cfe2216bc0491c50afe
│       ├── da7887b27acbb158c9c6241cac3f12d3a6e1e07f49bfee7f2bfa87f26864b2b9
│       ├── de6f0c40d4e5d0eb8e13fa62ccbbdabad63be2753c9b61f495e7f1f486be1443
│       ├── df8c1d4877c5b7c3bb398e41d24224693be78303942d6750020c29d63abe7401
│       ├── e34b597d4d9c54cb14121d3346ec811c943854da4126e8c5880963f06b2b6a94
│       ├── e46172273a4e4384e1eec7fb01091c828a256ea0f87b30f61381fba9bc511371
│       ├── ee8e362eeaf08460ae42b078293feaa1c3cfe3922dd3e3d8b1216b2fa780c73c
│       └── fce289e99eb9bca977dae136fbe2a82b6b7d4c372474c9235adc1741675f587e
├── index.json
├── ingest
└── oci-layout

3 directories, 35 files
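The listing shows why a digest-addressed store has no trouble with configs: every blob, whatever its role, is stored under blobs/<algorithm>/<hex>. A minimal sketch of that mapping, using only the standard library (`blobPath` is a hypothetical helper, not part of oras):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// blobPath maps a digest such as "sha256:<hex>" to the location of the blob
// under the root directory of an OCI layout.
func blobPath(root, digest string) (string, error) {
	parts := strings.SplitN(digest, ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("malformed digest %q", digest)
	}
	return filepath.Join(root, "blobs", parts[0], parts[1]), nil
}

func main() {
	// Digest of the root node taken from the pull output above.
	p, err := blobPath("oci_store", "sha256:c3b4ada4687bbaa170745b3e4dd8ac3f194ca95b2d0518b417fb47e5879d9b5f")
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```

No file name is needed to place a blob, which is the property that FileStore lacks for configs.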

@deitch
Contributor Author

deitch commented Nov 12, 2019

It is strange to use oras to pull images since it is not the goal of oras (OCI Registry as Storage: to store and retrieve generic blobs)

That was just an example. What I meant was, according to your proposed:

root, nodes, err := oras.Pull()

How would I retrieve layers and config, and how would I extract the config? I am guessing config is one of the nodes?

@shizhMSFT
Contributor

How would I retrieve layers and config, and how would I extract the config? I am guessing config is one of the nodes?

Go through each node and fetch the actual node content from the store. Since the root node is given, it is easy to get the child node, including the config, by images.Children(ctx, store, node). Or more easily, filter the nodes with a known config media type.
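The simpler of the two options, filtering the returned nodes by a known config media type, might look like the sketch below. `descriptor` is a local stand-in for ocispec.Descriptor and `filterByMediaType` is a hypothetical helper; the media types and digests are copied from the pull output earlier in this thread.

```go
package main

import "fmt"

// descriptor is a local stand-in for ocispec.Descriptor.
type descriptor struct {
	MediaType string
	Digest    string
}

// filterByMediaType returns the nodes whose media type equals mediaType,
// preserving their original order.
func filterByMediaType(nodes []descriptor, mediaType string) []descriptor {
	var out []descriptor
	for _, n := range nodes {
		if n.MediaType == mediaType {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	const configType = "application/vnd.docker.container.image.v1+json"
	nodes := []descriptor{
		{
			MediaType: "application/vnd.docker.distribution.manifest.v2+json",
			Digest:    "sha256:59712c0d6630673e81a77683ceb040be559c8ff7c245c73779b5def5bb90806d",
		},
		{
			MediaType: configType,
			Digest:    "sha256:158c64d77ced2c0887665320be9a0875daa0438c550dce56ba66de6689ad1d4f",
		},
		{
			MediaType: "application/vnd.docker.image.rootfs.diff.tar.gzip",
			Digest:    "sha256:d2708320a3117f18f6ff95ae5ffc97f3341ec99948a378b9207d516e205bd8b6",
		},
	}
	for _, c := range filterByMediaType(nodes, configType) {
		fmt.Println("config node:", c.Digest)
	}
}
```

Each matching descriptor is still only a reference; the config bytes would then be read from the store by digest.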

@deitch
Contributor Author

deitch commented Nov 13, 2019

Go through each node and fetch the actual node content from the store

Yes, that was what I meant. I was hoping for an example of it.

I see some challenges with the approach:

  • the store would need to be able to store config. Right now, e.g. FileStore does not, unless the config has the right annotation, which shouldn't be required. Even if it does, it then saves it to a file, which may not be what you want. We would need to make stores aware that they are saving config.
  • You may not actually know the config media-type. From the root manifest's perspective, config is a known child, but its type is not known until you inspect the child. What if I just want to get an artifact, and then its config, but don't yet know the config? I expect (reasonably) either to have oras wisely figure out the config, or look at the manifest and use that to determine what the config is (which would be repetitive, which is why I would want oras to do it and treat it specially).

@SteveLasker
Contributor

I did a quick scan of the thread. A few notes as I’m still focused on conference prep.

  • manifest.config retrieval should be incorporated into Oras. It wasn’t needed at the time, but the latest PR on the spec calls for optional support. The config object may be null, and likely follows a different schema than oci-image. But, that’s intentional. Latest conversations say the config object MAY be other formats, not just json. Oras should have an option to retrieve the config, if not null, and persist it somewhere local, as a file. Where is a good question.
  • Oras and OCI Artifacts do not yet call out support for OCI Index support in v1. This will come as we complete v1, but i’d like to focus on nailing the manifest scenarios, including config.

@sajayantony
Contributor

@shizhMSFT do you want to PR your change in the CLI to support zero-layer blobs which only have a config blob?

@shizhMSFT
Contributor

@sajayantony Do you mean allowing oras to push / pull an image, which has a config but no layers?

@shizhMSFT
Contributor

Created a new issue #153

@SteveLasker
Contributor

Hi Avi,
Sorry for the long delay in responding. There are definitely some things we'd like to add.

ORAS is a reference implementation that demonstrates how the ORAS library can be used. It provides some utility type CLIs. I'm looking to create a more targeted experience, built off the markymarky example Josh made. I'll add auth and a few other capabilities, as I get some help from more skilled golang devs.

config pull

While we initially used the config to just identify the artifact type, we've continued to see examples where pulling the config object is useful. I could see another parameter added to oras pull --config-out ./config.json

Index

As we embark on Notary v2 efforts to persist a signature as an OCI index that references a manifest or another OCI index, we will add support. We'll likely add support in the notary v2 prototype branch.

Filtering media types

This was a remnant of an original idea that we could store different :tags with different mediaTypes. For example, if I'm working on the artifacts project, I might have the following files, which could be persisted in the same registry, with the same versioned tag, as they had different mediaTypes. This would be the same way I can save the files in the same directory on my local machine.

file                  registry url                          mediaType
artifacts-spec.docx   demo42.azurecr.io/artifacts/spec:v1   application/vnd.microsoft.word.config.v1
artifacts-spec.pptx   demo42.azurecr.io/artifacts/spec:v1   application/vnd.microsoft.powerpoint.config.v1
artifacts-spec.md     demo42.azurecr.io/artifacts/spec:v1   application/vnd.something.markdown.config.v1

If we went with this idea, we'd need a way to pull a fully qualified reference demo42.azurecr.io/artifacts/spec:v1 with a mediaType filter.
However, this idea never stuck, so the requirement to specify the mediaType or -a is no longer needed. It's part of issue #178, "Simplify ORAS default experience".

Now, a specific CLI, say a regdoc, would filter the mediaTypes it understands how to process. More generic tooling, like ORAS, shouldn't filter.
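
The distinction between a type-specific CLI that filters and generic tooling that does not can be sketched as follows. This is only an illustration: the Descriptor type and FilterByMediaType are hypothetical names, not oras APIs, and an empty allow-list is assumed to mean "no filter".

```go
package main

// Descriptor is a minimal stand-in for an OCI descriptor.
type Descriptor struct {
	MediaType string
	Digest    string
}

// FilterByMediaType keeps the descriptors whose media type is in allowed.
// With no allowed types it returns the input unchanged, which is the
// behaviour generic tooling would want.
func FilterByMediaType(descs []Descriptor, allowed ...string) []Descriptor {
	if len(allowed) == 0 {
		return descs
	}
	want := make(map[string]struct{}, len(allowed))
	for _, mt := range allowed {
		want[mt] = struct{}{}
	}
	var kept []Descriptor
	for _, d := range descs {
		if _, ok := want[d.MediaType]; ok {
			kept = append(kept, d)
		}
	}
	return kept
}
```

A specialized tool would pass the media types it knows how to process, while a generic tool would call the function with no media types at all.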

ORAS to pull docker/runtime images

The docker/containerd runtime images are a very special artifact type, with a layered filesystem, using layers (as opposed to blobs). There seem to be other tools that specialize in the docker/containerd runtime image artifact type. ORAS was really designed to support all the other artifact types. So, I'm a little concerned about adding too much detail for runtime images to ORAS, as it could get messy quickly, and there are other tools. What specifically were you looking for that other tools don't support?

I can't help on the detailed implementation, but I can help on the APIs exposed. For instance, it would be great to see a PR for oras pull --config-out [file].
But, we should also figure out how to clean up the libraries referenced from ORAS as well.

@deitch
Contributor Author

deitch commented Jan 10, 2021

I had completely forgotten about this issue, went looking to solve something, and stumbled across this.

Do we want to pick this up again?

For the curious, I ran into a variant of the config question. It may be out of scope, but I am curious about your thoughts. "Legacy" registries don't support artifacts, nor do the standard build tools (e.g. docker build). So lots of people stuff "artifacts" into a container as a file in the tar+gz of a layer.

However:

  1. Most of those build tools don't let you put annotations on an individual layer, but do support it on config
  2. If I already am stuffing an artifact as a file in a tar+gz, nothing prevents me from stuffing >1, which makes the single annotation of image.title challenging

The way around that is to use either annotations on the manifest itself (which, again, many build tools do not support), or on the config.

Which brings us back to the config question, in a different way: what if I want to read the config before the other layers? For example, using content.FileStore, which requires the title annotation on the descriptor so it knows what to call the file it extracts, but that metadata may be in the config.

Not to expand the discussion too much, but I think it comes down to whether it is possible to store multiple artifacts in a single layer, i.e. support "legacy" images. Is that even something we want to think about? Partially, it looks like trying to do what other tools do. On the other hand, having a single library/CLI that can support both new-spec artifacts and those in a legacy image has some great potential, especially for bridging adopters in.

@deitch
Contributor Author

deitch commented Jan 10, 2021

Ugh, I might just have painted myself into a corner here.

  • if you have a single artifact in a layer, media-type and annotations work. Just to get this to work requires proper tooling anyways.
  • If you have a single artifact in a layer wrapped in a tar/tar+gz, then you indicate in the annotations and/or media-type on that layer what it is, assuming you can use layer annotations/media-types, of course.
  • If you have multiple artifacts in a layer - or even multiple files, only one of which is the actual artifact you are interested in - then you need to find the actual file. The annotations - on the layer or on config - can help you, but you need to go through multiple files in the tar to find the one you want.

And I built this lovely UntarWriter wrapped by a DecompressStore... which just takes the first file it finds. It assumes (blindly) that there only ever will be one object of interest in the tar archive. 🤦🏻‍♂️

See here

To paraphrase Obi-Wan Kenobi, "I have done this myself."

This raises the interesting question, completely separate from the config question: do we want the above to be configurable, i.e. "extract just this named file from the tar archive"? The default would be as now, just take the single file, but have an option to configure it.

It is fairly easy to create an option to pass to a single UntarWriter, but much harder when it is wrapped in a DecompressStore, which can have store.Writer() called multiple times, once for each layer archive, each of which has a different file to extract out of the multiple files in it.
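
For the single-archive case, "extract just this named file" can be sketched with the standard archive/tar package. This is an illustration only and not the UntarWriter code: ExtractFile and the demo-only buildTar helper are hypothetical names, and an empty name is assumed to mean "take the first regular file", matching the current default described above.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// ExtractFile reads a tar stream and returns the contents of one regular
// file. An empty name selects the first regular file; otherwise entries
// are skipped until the header name matches exactly.
func ExtractFile(r io.Reader, name string) ([]byte, error) {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return nil, fmt.Errorf("file %q not found in archive", name)
		}
		if err != nil {
			return nil, err
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // skip directories, links and other non-file entries
		}
		if name == "" || hdr.Name == name {
			return io.ReadAll(tr)
		}
	}
}

// buildTar is a demo-only helper that packs name/content pairs, in order,
// into an in-memory tar archive.
func buildTar(files [][2]string) (io.Reader, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, f := range files {
		hdr := &tar.Header{Name: f[0], Mode: 0644, Size: int64(len(f[1])), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(f[1])); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}
```

The harder part raised above remains open in this sketch: when one store hands out a writer per layer, each layer needs its own target name, so the name would have to be looked up per descriptor rather than passed once.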

I keep coming back to this: are we going too far afield? Is supporting artifacts wrapped in a "legacy" image just too complicated/out-of-scope for oras? I still really like the idea that it can handle artifacts however they come, and be a bridge between old and new. But maybe too far?

@SteveLasker
Contributor

Hi @deitch, and happy New Year,
I'll try to bullet these out:

  • New/Old registry support: It would be good to understand which old registries we're talking about here. I'd much rather focus on a standard across all active registries, which support OCI artifacts. The outlier is Docker Hub of course. @justincormack and I keep having this conversation, and I think we'd all rather see Hub (intended to be the standard) be current.
  • Multi artifacts in a single layer: this is sort of an anti-pattern, and would suggest it's along the lines of the above item. Let's get the outdated registries updated, or just use current registries.
  • Fetching config, prior to layers: this is intended to be the standard, and outlined in the OCI Artifacts spec:

Tooling can pull the configuration object prior to layers. An artifact component may use the config to determine how and where the layer should be instantiated. The artifact component might send layer requests to different compute instances, such as OCI Image layers being distributed differently based on the target OS.

  • Annotations: as noted above, I'd really like to see ORAS move away from annotations and individual files. Rather, place all files, even single files, in tar archives. The .tar format accounts for the file names, so we can remove the dependency on annotations.

@deitch
Contributor Author

deitch commented Jan 12, 2021

Happy New Year @SteveLasker and everyone else on this thread! Wishing everyone a healthy and successful one.

bullet these out

Yeah, that just might be a smart move 😄

New/Old registry support
Multi artifacts in a single layer: this is sort of an anti-pattern, and would suggest it's along the lines of the above item. Let's get the outdated registries updated, or just use current registries.

I agree that it is something of an anti-pattern. But there are plenty of people stuffing objects ("artifacts") inside layers and using standard Docker or similar tooling to build them. I do it in cases, so does Ignite, etc.

The question I am asking really isn't, do we want to encourage that? I agree here that the flexibility within a manifest - available when both the registry and the build tooling support it - is more than enough to do the right thing, and make each layer an artifact (don't you wish we could rename it from layers?), possibly with the right annotations, but with the right media-types (see below).

The question I am asking is, for all of those images that people have created, and continue to create - due to tooling and/or registry issues - standard OCI images with one or more artifacts in tar/tar+gz files stored as blobs, do we want to support in this library/CLI the ability to retrieve those as well?

My leaning (obviously from this thread 😄 ) is yes. I like the idea of a single interface, single tool, that can handle both, and acts as a nice bridge. "This is the right way to do it, but if your tools aren't there yet, we still can work with you, but don't you want to get onto the express train to the future and do it the right way?"

I understand if we do not want to support that (despite my preferences). But given how much there is out there, I think it should be a conscious decision.

I hope I explained it well?

Fetching config, prior to layers

Good thing. I think that was the original issue here. So it is just a question of getting oras up to speed on that? Or possibly just my understanding of how to use it.

Annotations: as noted above, I'd really like to see ORAS move away from annotations and individual files. Rather, place all files, even single files, in tar archives. The .tar format accounts for the file names, so we can remove the dependency on annotations.

This is interesting. I didn't realize how far you were going with this. So everything will be tar/tar+gz, annotation goes away, and we just use the tar header for the single file in the tar archive to say what the path should be. When there are multiple artifacts, multiple tars as layers, each without annotations required, and the media-type indicates what its purpose is. Is that it?

@deitch
Contributor Author

deitch commented Jan 12, 2021

As I think about this more @SteveLasker, I wonder how we would do this without annotations? E.g. I have 5 artifacts in 5 layers, how do I know what the purpose and type of each file is? How do I figure out that it is just the artifact in the tar at layer2 that I want?

@SteveLasker
Contributor

SteveLasker commented Jan 13, 2021

New/Old registry support

I think we're really having a discussion of what could be possible, vs. what we'd like to enable with ORAS. We never want to tell someone they can't do something generic. :)

Fetching config

If we don't support this, we should, and I'm happy to see a PR that enables fetching the config independently.

Annotations

ORAS is generic but meant to be used for a specific artifact type scenario. The idea is Helm would use it the way they want, Singularity, Azure ARM/Bicep, ... The idea is they define how layers are used, and why there are multiple. This is also why we identified in the artifacts spec authors can define different layer (blob) mediaTypes.
For the general files scenario, the idea is each .tar would have a way to define the file names within the layer. If you have multiple layers, you would need to define where they should be expanded to.
In the generic case, let's say you have a directory like the following

\docs
  readme.md
  media\
    arch-diagram.png
    logo.jpg
\code
  foo\
    blah.go
  bar\
    drumbeat.go
\tests
\sbom

You might decide to tar each subdirectory \docs, \code, \tests, \sbom as separate layers. If you pass the directory to ORAS, it would tar each subdirectory, and you just need to extract it to the same location.
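
The "one layer per subdirectory" idea can be sketched as a planning step that decides which files belong to which layer before any tar is written. GroupByTopDir is a hypothetical helper for illustration, not something ORAS provides, and it assumes slash-separated relative paths.

```go
package main

import (
	"sort"
	"strings"
)

// GroupByTopDir maps each top-level directory to the sorted list of paths
// beneath it. One tar layer would then be built per map key. Files that
// sit directly at the root belong to no subdirectory and are skipped.
func GroupByTopDir(paths []string) map[string][]string {
	layers := make(map[string][]string)
	for _, p := range paths {
		p = strings.TrimPrefix(p, "/")
		parts := strings.SplitN(p, "/", 2)
		if len(parts) < 2 || parts[0] == "" {
			continue
		}
		layers[parts[0]] = append(layers[parts[0]], p)
	}
	for top := range layers {
		sort.Strings(layers[top])
	}
	return layers
}
```

Because each path keeps its top-level directory, extracting every layer into the same destination would recreate the original tree without any per-layer annotation.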

If you're doing something more complex, where two directories are stored in completely different locations on the local disk, or some content is put on client-a, while another layer is placed on the server-side, that's also ok. But, it would be related to the specific scenarios/artifact type to decide that.

@deitch
Contributor Author

deitch commented Jan 13, 2021

The formatting got messed up on your last comment @SteveLasker :-)

Taking your example above of a few directories and some files. I can do oras.Push() to send artifacts up, and oras.Pull() to get them back down. And those do take an Ingester. But what would indicate how those things should be used and what their purpose is?

If you have multiple layers, you would need to define where they should be expanded to.

This has been what has driven this conversation from the beginning. If I have a layer which is just some arbitrary artifact, then the media-type can tell me what it does/can do/is, and the image annotation can tell me where it thinks it should go (precisely how FileStore uses it). If I have something a bit more legacy - or if I just do everything tar/tar+gz, as you suggested, and no annotations - then I have a format that:

a. Could contain more than one artifact (for better or for worse)
b. Carries no information about what those artifacts are (like media-type) or what to do with it (like the annotations)

These are true even with annotations, since I only can have one image annotation on a layer, while I could have multiple files. We can call it an anti-pattern (FWIW, I agree with you on this one), but, as I said, lots of such cases out there, and more importantly, as long as someone can put multiple files in (tar always will support it), someone will; implicit APIs always end up being used.

The config appealed to me precisely because it is image-wide. I can have a bunch of arbitrary annotations there that, for example, say which files to pull out of the various tar streams and what to do with them. Plus, of course, they are supported by every version of standard build-tooling out there (like docker).

In other words, I always need some way of indicating the purpose and potentially intended location of the contents of an artifact, whether I eliminate annotations, or just have a format that can have multiple artifacts in it.

@lmcdasm

lmcdasm commented May 16, 2021

Hello Gents.

Reading through the entirety of this thread, 131, and then the Ref3 information, I was a bit lost as to where this issue stands presently.

In our setup, we are looking to consolidate all our object types into OCI, and use the config.json as the OPTIONAL addition for our needs (non-container type objects, so arbitrary info).

Following the README and such, we are able to create the config.json and pass it in with oras push, but, as @SteveLasker outlined, we would like a command like the following that would let us pull the config file out.

      oras pull my.registry.com/my-packaged-object:v2 --config-out ./config.json

Where my-packaged-object:v2 was created like this

<config.json>
{
  "config": {
    "comp": "something",
    "some key": "some value"
  }
}
</end config.json>

oras push my.registry.com/my-packaged-object:v2 \
  --manifest-config config.json:application/vnd.myco.my-object.config.v1+json artifact.txt:text/plain

</end push command>

To that end, I see that 131 has "help wanted" - I noticed that @deitch had done lots of the initial leg work and was back at it, so perhaps you are chasing it again or could use some help / show me where you're at so far?

Cheers,
dasm

@deitch
Contributor Author

deitch commented May 16, 2021

This is one of those issues that is somewhat fundamental, triggered by a seemingly simple question, so it stays open and is a discussion point for quite some time. At least once every few months, I find myself looking for something like this, stumbling across it, and going, "Oh, look, someone raised it already! I wonder who it was?" 😄

We also are looking at some fundamental re-plumbing of oras the library to give it more flexibility around cases like this.

As for the CLI having a --config-out or similar option, I think @shizhMSFT did the basic leg work above for it, and prob knows the CLI best (I hope I am correct here)?

@SteveLasker
Contributor

If one puts something in, one should be able to get it out.
The --configOut parameter sounds like a great choice.
We should probably add the option to get just the config, as I can imagine scenarios, similar to docker, where the config might provide info that routes the remaining blob pulls to another host process.

@shizhMSFT
Contributor

This is really an old thread. Let me write a doc for how to push and pull the config file using the existing oras cli. Then I will try to figure out a proper UX to explicitly pull the config.

@shizhMSFT
Contributor

Created PR oras-project/oras-www#3 for existing oras cli releases to push / pull configs.

@shizhMSFT
Contributor

shizhMSFT commented May 17, 2021

#274 enables CLI to pull config.

The implementation is a bit hacky since all configs and layers are flattened by the containerd remote resolver as descriptors. Anyway, layers and configs are both in the blob category. We are not able to know which descriptor is the config without explicitly unmarshaling the manifest, which should be internal to containerd.
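
To illustrate why the manifest has to be unmarshaled, here is a minimal sketch that picks the config out of a flat list of blob descriptors by matching the digest recorded in the manifest's config field. It is not the code from #274: SplitConfig and the types are hypothetical, and the sketch assumes the raw manifest JSON is available to the caller.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Descriptor mirrors the OCI descriptor fields used in this sketch.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// manifestConfig decodes only the "config" field of a manifest.
type manifestConfig struct {
	Config *Descriptor `json:"config"`
}

// SplitConfig separates the config descriptor from a flat list of blob
// descriptors. The first blob whose digest equals the manifest's config
// digest is treated as the config; everything else is returned as layers.
func SplitConfig(manifestJSON []byte, blobs []Descriptor) (*Descriptor, []Descriptor, error) {
	var m manifestConfig
	if err := json.Unmarshal(manifestJSON, &m); err != nil {
		return nil, nil, fmt.Errorf("parse manifest: %w", err)
	}
	var config *Descriptor
	var layers []Descriptor
	for i := range blobs {
		if m.Config != nil && config == nil && blobs[i].Digest == m.Config.Digest {
			config = &blobs[i]
			continue
		}
		layers = append(layers, blobs[i])
	}
	return config, layers, nil
}
```

Matching on digest alone means a layer with content identical to the config cannot be told apart from it, so a stricter variant might also compare media types.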

@shizhMSFT shizhMSFT added this to the v0.14.0 milestone May 7, 2022
@shizhMSFT shizhMSFT added the duplicate This issue or pull request already exists label Aug 2, 2022
@shizhMSFT shizhMSFT moved this to Todo in ORAS-Planning Aug 2, 2022
@shizhMSFT shizhMSFT linked a pull request Aug 15, 2022 that will close this issue
@shizhMSFT
Contributor

Resolved by #480

Repository owner moved this from Todo to Done in ORAS-Planning Aug 15, 2022
6 participants