Proposal: Add a manifest list API #222
From #114 (comment):
Emphatic yes, but I'd propose returning a list of descriptors instead of just mediaType and digest: #22 (comment) Whether that's in the form of an image index or just a json array or even json-lines, I don't care that much, but I would really love for this to be available. cc @justincormack this is related to what I was saying in the OCI call a couple weeks ago |
@jonjohnsonjr seeing the recording of that meeting triggered this for me, went looking back at #22 and realized this point was rather buried. Getting this as a json list of descriptors makes a lot of sense to me. Definitely don't want the list of strings like we have with tag/list today. |
Yes, descriptors seems correct. |
Excellent! That's why I brought it up 👍 From my earlier comment:
I'll campaign a little bit for my image-index-as-the-API idea.
Pros
One nice feature of making this format be an actual image index (and not a list of descriptors), would be the ability to reuse code that knows how to deal with an image index already. So if you wanted to e.g. mirror an entire repository from one place to another, I already have code that can deal with pulling and pushing an image index. I can just reuse that.
Continuing the example, if you want to keep your mirror up to date, you might need to poll tag values every so often to make sure they haven't changed. If we could just ask a registry to show us everything in one request, we can just poll once per repo instead of once per tag. Assuming we cache everything aggressively, this shouldn't even be that expensive.
If we wanted, we could even expose the digest of this top-level image index so that clients can essentially ask "hey, has anything in this entire repo changed?" with a HEAD request, without even having to actually look at the entire structure. This could also allow clients to ask for the state of a repo from the past, if registries allow you to GET the list of manifests by digest (though, that might be too expensive to keep around).
Cons
If we do expose the digest of this structure, then re-computing the digest on every push/delete can be somewhat expensive for registries, so they may not want to do that. I think that's fine -- perhaps the
Another downside is that pagination could be a little weird. For enormous repositories, we probably don't want to send the entire list in one response, and paginating the entries of an image index might frighten and confuse some clients (of course, this is something we can just specify, since it's an entirely new API). Pagination also makes the digest thing weird -- is this the digest of the whole repo? Or just this page? We may just want to punt on "digest of a repo" stuff for now. |
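The "has anything in this entire repo changed?" check from the comment above can be sketched on the client side as a digest comparison. This is only an illustration in Go: `listingDigest` and `changed` are hypothetical helper names, and the thread does not specify how a registry would expose the digest of the listing, so the sketch simply hashes the response body it is given.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// listingDigest returns the sha256 digest of a serialized listing in the
// "sha256:<hex>" form used for content digests.
func listingDigest(body []byte) string {
	sum := sha256.Sum256(body)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// changed reports whether the current listing differs from the digest
// remembered from the previous poll. (Hypothetical helper.)
func changed(previousDigest string, body []byte) bool {
	return listingDigest(body) != previousDigest
}

func main() {
	// Illustrative payloads, not real registry responses.
	first := []byte(`{"manifests":[]}`)
	prev := listingDigest(first)

	fmt.Println(changed(prev, first))
	fmt.Println(changed(prev, []byte(`{"manifests":[{"digest":"sha256:abc"}]}`)))
}
```

With pagination, the same comparison would only cover one page, which is the ambiguity the comment raises.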
Having consistency across all registries for discovering content is a definite need. Removing the Might I suggest we capture the various requirements for the listing API? There are lots of great designs, but only a few that would meet the requirements we outline. For instance, when we were iterating on the Notary v2 discovery prototypes, we needed a way to discover all the artifacts that were dependent upon another artifact. eg: what signatures exist for a given digest. Here's a hacked up Notary v2 prototype-1 version, pre the newly proposed @aviral26 just made an update for the OCI Artifact manifest, but this is a prototype, based on the Notary v2 and Artifacts requirements. We need to iterate further for the notary & artifact scenarios, but I'd love to see this evolve OCI Artifacts and a View of the Future So, just suggesting we capture and iterate on a list of requirements for the listing API. Then, we're debating which design meets the requirements, vs. which "design is better". Also note, Docker has transitioned docker/distribution to CNCF. It's now located at: distribution/distribution and we have a new set of active maintainers from GitLab, GitHub, DigitalOcean, VMware, Docker, that want/need to keep the innovations moving forward. |
Filtering based on annotation values is interesting. I don't love the other examples because they impose additional storage requirements on registries. I worry that waiting for a perfect solution (i.e. requirements for features that don't exist) will block a massive improvement ~indefinitely. Can we agree on a simple set of requirements with possible extensions? I think at a minimum we need:
It would be nice to have:
We should leave open for extension:
I really don't want to block on 4 and 5. To borrow your "cloud filesystem" metaphor, I agree that having a SQL-like API for querying the filesystem would be really nice, but as is, our filesystem can't even list files or directories... this is bad. We should keep filtering/sorting in mind so that implementations or specs don't preclude it, but I don't think it's mandatory, and I don't expect all registries to implement this stuff. It should be possible to implement a registry with zero logic, just static files. We should be able to define some optional querystring parameters for this. Similarly, I don't want to block this on notary concepts that are still prototypes. |
Oh no, you brought up winfs - yikes 😳 I like the extension model or the reserved for future space approach. Having the full list in mind, allows us to design a multi-phased approach. It's like building the house, knowing you want to add a deck or garage later. If we can get a full list, we can prioritize, while reserving space in the design for known additions. Might I suggest a PR for |
I'm happy to send a PR, but I'd like some feedback from other registry operators. Ideally, we could talk about this on the dev call. I would personally commit to implementing this for gcr.io and pkg.dev if I could get some kind of consensus and commitment from other registries. If this is missing something that anyone feels is a hard requirement, I'd like to know. If this is too onerous to implement for any registries, I'd like to know. It would be good enough for me if I could get any two of {Docker Hub, ECR, ACR, Quay} to implement this, but otherwise it's just an additional registry-specific thing that adds no value to end users. I don't want to shove every possible feature you might want into these APIs. These are meant to be the lowest common denominator, bare minimum things that are inoffensive to implement and maintain, i.e. undifferentiated heavy lifting. If your registry supports some awesome feature that nobody else has, I think it belongs in a proprietary API. As is, the only way to list repositories is to use _catalog. We told people not to use that, but offer zero alternative, so they use it. For untagged images, it's even worse. There is undefined behavior around what happens -- do images just disappear? Do they stick around forever? How do you know what exists? We really need a standard way to expose this information. I already proposed these two additional APIs in #22, but I still think they're good places to start. They are simple and aligned with existing APIs, so they should be familiar to users. They expose information that already exists while being open to extensibility. Reader: I implore you not to bikeshed this. If things can be simplified to make this more likely to be implemented, I'd love to hear it. If you want to attach a use case for a hypothetical future thing that may never exist, please weigh the expected value of that use case against the decreased likelihood of us reaching consensus. 1. 
Descriptor listing API
Concretely, some structs:
// ManifestDescriptor describes the content of a given manifest object.
type ManifestDescriptor struct {
// MediaType is the media type of the object this schema refers to.
MediaType string `json:"mediaType,omitempty"`
// Digest is the digest of the targeted content.
Digest digest.Digest `json:"digest"`
// Size specifies the size in bytes of the blob.
Size int64 `json:"size"`
// Annotations contains arbitrary metadata relating to the targeted content.
Annotations map[string]string `json:"annotations,omitempty"`
// Tags contains a list of tags associated with this object.
Tags []string `json:"tags,omitempty"`
}
// ManifestDescriptorList is a list of manifest descriptors for a given repository.
type ManifestDescriptorList struct {
// Manifests references manifest objects.
Manifests []ManifestDescriptor `json:"manifests"`
}
This is a trimmed down version of
It might make sense to keep
As an example:
{
"manifests": [{
"digest": "sha256:7a47ccc3bbe8a451b500d2b53104868b46d60ee8f5b35a24b41a86077c650210",
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 2035,
"tags": ["latest", "v1"],
"annotations": {
"org.opencontainers.image.created": "1985-04-12T23:20:50.52Z"
}
},{
"digest": "sha256:3093096ee188f8ff4531949b8f6115af4747ec1c58858c091c8cb4579c39cc4e",
"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
"size": 943
},{
"digest": "sha256:703218c0465075f4425e58fac086e09e1de5c340b12976ab9eb8ad26615c3715",
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 1201,
"tags": ["v2"],
"annotations": {
"org.opencontainers.image.created": "2001-04-12T23:20:50.52Z"
}
}]
}
Note the second manifest in particular. It is untagged. This is completely unreachable today via the registry API. Also note that it doesn't have a
If we wanted, we could define more optional annotations for things like push time, pull time, etc. Really anything. Vendor-specific annotations can use their own namespacing. This could also be a place for registries to surface user-specified metadata about artifacts.
All of the annotations should be optional. A registry may choose to populate a ton of annotations, but none should be required for conformance. Mandatory fields are
I've excluded blob descriptors from this list, as I don't think it makes much sense. Garbage Collection around blobs is pretty consistent across registries, and all blobs that are in the registry should be reachable through these manifests (ignoring GC strategies). If anyone thinks it makes sense to have a list of blobs accessible, I'd love to hear why, but I think it's out of scope.
Pagination should work identically to tags listing.
2. Repository listing API
This is the same as from _catalog:
type RepositoryList struct {
Name string `json:"name"`
Repositories []string `json:"repositories"`
}
With the added "name" from
The only thing we need to do differently from catalog is make this work for a repository and not for a registry. Perhaps a top-level thing could exist, for certain registries, but it certainly shouldn't be required if it doesn't make sense (e.g. for GCR, where everything is namespaced as
As an example:
{
"name": "library",
"repositories": [
"adminer",
"aerospike",
"alpine",
"alt",
"amazoncorretto",
"amazonlinux",
"arangodb",
"backdrop",
"bash",
"bonita",
"buildpack-deps",
"busybox"
]
}
This should list repositories immediately under "library". For registries that support multiply-nested repositories, like GCR, you should be able to subsequently call
Pagination should work identically to tags listing.
The only downside to this that I could see is that there's no place to stick arbitrary metadata like we would have in descriptors/list. Honestly, I think this is fine, but I'm open to feedback if someone disagrees. I don't mind having something like a
I don't think sorting or filtering are nearly as interesting for repositories as they are for descriptors. |
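As a rough illustration of how a client might consume the descriptor listing proposed in the comment above, here is a minimal Go sketch that decodes a response in that shape and picks out the untagged manifests. The struct fields follow the proposal, except that `Digest` is a plain string to keep the example free of external packages. `parseList` and `untagged` are hypothetical helpers, and the digests in `main` are made-up placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ManifestDescriptor mirrors the struct proposed in the thread. Digest is a
// plain string here (an assumption) to avoid depending on a digest package.
type ManifestDescriptor struct {
	MediaType   string            `json:"mediaType,omitempty"`
	Digest      string            `json:"digest"`
	Size        int64             `json:"size"`
	Annotations map[string]string `json:"annotations,omitempty"`
	Tags        []string          `json:"tags,omitempty"`
}

// ManifestDescriptorList is the top-level response object.
type ManifestDescriptorList struct {
	Manifests []ManifestDescriptor `json:"manifests"`
}

// parseList decodes a response body in the proposed format.
func parseList(body []byte) (ManifestDescriptorList, error) {
	var list ManifestDescriptorList
	err := json.Unmarshal(body, &list)
	return list, err
}

// untagged returns the digests of manifests that no tag points to.
func untagged(list ManifestDescriptorList) []string {
	var out []string
	for _, m := range list.Manifests {
		if len(m.Tags) == 0 {
			out = append(out, m.Digest)
		}
	}
	return out
}

func main() {
	// Placeholder payload in the proposed shape; digests are illustrative.
	body := []byte(`{"manifests":[
		{"digest":"sha256:aaa","size":2035,"tags":["latest","v1"]},
		{"digest":"sha256:bbb","size":943},
		{"digest":"sha256:ccc","size":1201,"tags":["v2"]}
	]}`)

	list, err := parseList(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(untagged(list))
}
```

Because `tags` and `annotations` are optional in the proposal, the sketch treats a missing `tags` field the same as an empty list.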
Ah, look at the timing: https://www.docker.com/blog/open-sourcing-the-docker-hub-cli-tool/ I worry a bit that Docker will care less about implementing a standard thing for this. |
@jonjohnsonjr Big thumbs up from me. I would lean towards having a |
That seems reasonable to me, especially if it unblocks Docker Hub. |
I'll read through the detailed post above tomorrow. But I did want to quickly note the docker hub cli.
|
@jonjohnsonjr the reason the Hub cli tool is currently a standalone binary not built into Docker cli is the standardisation issue - I talked about this on the OCI call before Christmas. We want to release something that works across registries but it's a mess now. |
Oh yeah, I think it's completely reasonable -- don't get me wrong. Before, Docker Hub was in an awkward position where there was no way to do this, so my hope was that y'all would be extra motivated to throw weight behind some standardization here. Now, y'all are just in the same boat as the rest of us :) there are good reasons to have a separate CLI, e.g. to expose Docker Hub specific stuff.
Absolutely agree, which is why I'm pushing on this. There's this tragedy of the commons where all (most?) registries now have a bespoke way to do the same thing, which is often good enough for a customer, but it hurts the ecosystem. If I want to write a tool that lists stuff in a registry (say, spinnaker), my options are:
That sucks for users, and I think OCI exists to solve this kind of problem. cc @hdonnay @samuelkarp @bainsy88 Do y'all have any interest in fixing this? |
While true, we will need a change to support the requirements of Notary v2. We are focused on two dimensions:
This is a great start. I like the paginated list of descriptors, although the notary signature scenario likely calls for slightly more info. I'll capture that in the listing API requirements.
Also great. I think we all struggle with how to provide a history of digests for a given tag. As we move into the gated-mirror scenarios, I expect we'll see customers asking for "rollback" of a tag to a previous digest when, not if, an update fails.
Having a list, which we can all prioritize will surely help. I suspect what might be a lower priority to some, might be a higher priority to others. So, hopefully, we can divide and conquer. Both, for the spec and the reference implementation.
Yup, hopefully, we finally have enough "pain" to invest in the gain.
With the newly formed CNCF distribution/distribution group, I've forwarded the links to these discussions. Hopefully, we'll get more engagement and feedback. I suspect the Notary v2 work, that must land this year, will be enough of a compelling event to provide feedback, and be a compelling customer need that we'll make good progress. It's not just my hope to leverage Notary to achieve these list goals, rather it's a requirement to meet the Notary v2 requirements.
Let's capture these in the requirements. I think all registries have proprietary APIs to handle these scenarios. I'm hopeful we can take our experience for what we like and don't like to make a spec'd api we can all implement.
I'm going to do the PM thing and ask we start with a set of requirements. It's really helped defuse the debates over different designs. Rather than argue which design is better, we can argue which designs meet the prioritized requirements, with the right usability. BTW, I do like the proposal to return a
This is a great point. We should also capture the requirements.
To balance the boil the ocean, I'm actually hopeful
Yuppppppp!!!!! We can do this… |
In the spirit of "no time like the present", and "I've got to run, it's Friday night": |
I don't think requirements are actually useful here. We aren't building a product -- we're trying to achieve consensus on extending a protocol. If that protocol is too burdensome for everyone to implement, we've failed to make any progress. What would be useful is a set of limitations, e.g. what is everyone currently capable of doing or willing to implement? What kind of features would be impossible for some people to implement? I'm looking for the lowest common denominator that is still useful. Everything I've proposed is already part of the spec, so everyone should be able to implement this unless they've made some very interesting choices in implementation (and I'd really like to hear from them, if so). I guess we could express some of these limitations as requirements, if that would help you to understand:
This doesn't make any sense to me. To borrow your metaphor again, you somehow see a dependency between
I don't understand how this can be a requirement when it doesn't exist? |
@jonjohnsonjr More than happy to be involved in this. I agree that a first pass should just standardise the simple operations of listing repositories and listing descriptors per repo. I also completely get the need to leave this open for extension.
I think the above is really important: this should work with a vanilla Registry. If you take distribution running on S3, that is really not geared up for performant listing. Having the metadata in object storage really limits how quickly these lists can be produced. It will be challenging to make that performant regardless, so adding additional complexity will add to the pain. Storing additional data would also introduce the problem that you would need a process that goes and adds all the new information to storage for all existing repos/images. |
Yes this is exactly what I'm thinking about. If your registry implementation happens to have a backend that can be easily paginated, sorted, or filtered, then of course these additional features would be useful (and likely spare you some cycles), but requiring these features is burdensome on implementations that don't have their metadata store set up for this already. I don't think Docker Hub supports tag pagination, even today. If we have any requirements that aren't trivially implementable by existing registries, this will either not get implemented or take years to roll out because registries will need to perform migrations, backfills, etc. Another example is quay's support of schema 2 images. As I understand it, adding support wasn't particularly difficult, but the backfill process took a long time. |
Do we see edge-level infrastructure and root-level infrastructure implementing the same APIs? |
I'm not sure what kind of topology you have in mind, but I think it's reasonable to expect anything that implements the tag listing API should implement these APIs as well. If clients or caches wanted to expose something similar to this, I think it's a nice way to discover content (e.g. should be compatible with |
How can we define and agree on useful? We can title the PR anything you'd like. But, if we can agree on what we're trying to solve, we can then have an actionable conversation on how we're solving it.
We can definitely add this to the PR. This was one of the issues with the
As registry operators and product owners, there are lots of great features we'd all like to add. For the listing API, we've each added APIs to unblock our customers. So, it's hard to justify a new listing API, when we have so many other top-priority requirements. What I'm suggesting is Notary v2 has strong business justification from most, if not all registry operators and products. While we might have been able to create a more focused solution, just for signing, we're taking a more generic approach so that we can support the signing of images and all artifact types. Including reference types like an SBoM, Singularity, WASM, Helm, Nydus and others. By bundling the listing API, which is a pri-0 requirement for the end to end scenarios, we can likely deliver a common solution that meets all our needs. Notary v2 is based on cross registry integration. Kinda hard to do that if we don't have a common listing API.
The
I totally get this one. Changing data storage, adding indexing is a major change. Somewhere we had a reference to minimizing storage changes. However, the need to support adding multiple signatures without changing the digest and/or tag of the artifact being signed meant we needed to add a reverse lookup (reference) model. But, this is where the business need will drive the priority to get it backlogged. Now, in comparison, supporting Notary v1 is quite complex, and doesn't meet the needs, so we actually think the net work is smaller than it could be
@stevvooe Are you referring to on-prem, or IoT scenarios? |
By having this discussion. I can PR my proposal, if you'd rather do it on a PR, but it's all just markdown and comments in the end.
I think it's somewhat obvious -- there's a big I'm happy to make concessions to support other use cases, but I think it's pretty obvious what needs to be done, and I care very little about the exact implementation details. If you're telling me that ACR will not implement repo or manifest listing APIs unless we merge the artifacts and notary stuff first, I'll be pretty frustrated, but that's exactly the kind of feedback I'm looking for.
There doesn't seem to be any mention of listing currently. Are you saying you'd like to add a rider to the notary stuff that registries must implement a listing API to be compliant? Or is this already a P0 requirement that I'm missing? I think that's fine, as long as we converge on something, but I don't want to block this on notary requirements because this proposal would benefit registries and clients that can't or won't implement notary.
If it's a requirement you've now either violated the "no additional storage" requirement (which requires a backfill) or require implementations to read and parse every single manifest to retrieve this field, which would be a performance nightmare.
Giving registries the option to surface top-level annotations via the ManifestDescriptor as I've proposed should satisfy this use-case, right?
I don't agree with this -- that's just one possible implementation that satisfies the requirements. Adding a reverse lookup for metadata is an interesting proposal. Adding weak references to the content model is also an interesting proposal. I'd like to see the semantics of both of those things defined and consider the consequences of making a breaking change to the image-spec before passing any judgement. "Notary needs this" is not really a convincing argument for breaking every client and registry on the planet to me, personally.
What does this mean?
Based on what you've said above, it seems that Notary v2 is also quite complex? |
Much of this might be easier for discussion on the OCI call, to which I see you've added an agenda item. I'll reply here for the breadth of conversation, and those that can't attend the call. The overall gist is we really, really want to land this. We just don't want to ship yet another API that doesn't get implemented because, ...we didn't capture the needs...
Yup, I get it. There are a few requests trying to converge. Rather than run parallel requests on the same thing, I'm just asking we step back and assure we're capturing all the known things so we can get adoption and spec-it, code-it, and have all the registries ship-it.
It's not that we don't want to incorporate this. I think all registries would like a standard. On the CNCF distribution call, we briefly discussed this as well. The CI/CD vendors are in an even more difficult space as they need to support multiple registries. This means each must write different code paths for each registry. Rather than assume we know all the use-cases, let's capture them first. As noted above, we and other vendors feel we must ship a Notary v2 solution in the coming months. It must have a discovery API to support the ability to push, discover and pull signatures. There's a PR in the staged notary/distribution repo that has a very rough prototype for it. However, I would not look deeply as we have a newer prototype that uses the OCI Artifact Manifest that's also work-in-progress and not ready for review as we're still iterating ourselves. We're using the prototypes to validate the specs and scenarios.
The approach we're taking is to explicitly design these as independent capabilities that enable a breadth of scenarios, including Notary. If we've done the design correctly, implementing a few, but important, changes will enable a wide range of scenarios. To your other point, it would be great to know what would block a registry from wanting to implement these. We believe all registries need an artifact signing solution and having a common spec so content can move within and across all OCI conformant registries is the customer need we must meet. It just so happens we need discovery APIs to complete the experience, so we should get the listing API as a result.
Backfill is a huge issue. It will take some more thought process, but I believe we have a design that says only new artifacts that reference existing artifacts will use this new storage/indexing requirement. So, no backfill would be required.
The design specifically calls out not changing the
As much as we'd like to add better ways to do the same capabilities, we're all over-committed to delivering new or enhanced capabilities to our customers. All registries have proprietary listing apis. So, technically, we're not blocked. CI/CD vendors have a fun time dealing with all the differences, but they're not blocked either. So, from a business need, it's just difficult to prioritize a refactoring. Particularly if the new API doesn't account for the behavior we all implement in our proprietary APIs. Thus, the need for vendor extensibility. I'll add more to the requirements later today/tomorrow to call this out. We are blocked in delivering a standard artifact signing solution that spans the various cloud and registry providers. This is something we're all hearing from our customers and it will get prioritized.
There will be a few key changes. But, they won't be unique to Notary, and we believe they will be far easier than implementing Notary v1, which doesn't satisfy the requirements of content signing. So, it's not as complex, and provides far more benefit. But, that's still for us to finish identifying so it's more obvious. |
Sure, here's the need, IMO: All manifests in a registry should be discoverable through the registry API. I would be satisfied with just a list of digests for manifests and strings for repositories, but we can obviously do better than that.
This really tunnel-visions on the problem. It is much simpler to make these things optional than to codify implementation details like this.
How could you possibly know this? It's also not true. I don't think it's fair to ignore registry implementations that aren't from giant cloud vendors.
Sure, but we should also consider the difficulty of the implementation. Adding support for your complex notary and artifacts proposals is a huge undertaking, whereas these listing proposals are dead simple and shouldn't (my hope) be difficult for any registry to satisfy, unless we keep attaching pork fat to it. |
Revisiting this after OCI-Artifact #29. Over there, we are proposing an API to query artifacts by their connection to existing manifests, including filtering by I'm not sure of the timing between the 1.0 Distribution and Artifact specs, but in my ideal world we'd see both go GA this year, and so it would be useful to include the |
I'm fairly opposed to adding I'd also not want to make filtering mandatory if we can avoid it. It's convenient but also entirely possible to implement client-side. |
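To illustrate the point that filtering can be done client-side, here is a small Go sketch that filters an already-fetched list of descriptors with a predicate. The `Descriptor` type is a cut-down stand-in for the proposed descriptor, and `filter` and `hasMediaType` are hypothetical helper names rather than part of any spec or library.

```go
package main

import "fmt"

// Descriptor is a cut-down stand-in for the proposed manifest descriptor.
type Descriptor struct {
	MediaType   string
	Digest      string
	Annotations map[string]string
}

// filter returns the descriptors for which keep returns true.
func filter(in []Descriptor, keep func(Descriptor) bool) []Descriptor {
	var out []Descriptor
	for _, d := range in {
		if keep(d) {
			out = append(out, d)
		}
	}
	return out
}

// hasMediaType builds a predicate that matches an exact media type.
func hasMediaType(mediaType string) func(Descriptor) bool {
	return func(d Descriptor) bool { return d.MediaType == mediaType }
}

func main() {
	// Illustrative data; the digests are placeholders.
	all := []Descriptor{
		{MediaType: "application/vnd.docker.distribution.manifest.v2+json", Digest: "sha256:aaa"},
		{MediaType: "application/vnd.docker.distribution.manifest.list.v2+json", Digest: "sha256:bbb"},
	}
	lists := filter(all, hasMediaType("application/vnd.docker.distribution.manifest.list.v2+json"))
	for _, d := range lists {
		fmt.Println(d.Digest)
	}
}
```

The trade-off is that the client has to page through the full listing first, which is where server-side filtering would save requests.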
I like the idea of flexibility, but defer this decision to the registry operators that have to implement this at scale.
Agreed. That follows along with many of the other registry APIs, like pagination. |
Is this captured well enough here: 3. A user can get a list of manifests, within a given registry/namespace.
The clarification above was related to the Your comment on optional and codifying an implementation is more relevant to this larger conversation. There are two approaches:
1 above is somewhat moot, as we can't agree that push should be core to a registry :(. So, we're basically saying most things are optional to an implementation of the distribution-spec. So, regardless of what gets added to distribution-spec, we'll need a definition for non-supported behavior:
HTTP 501 for spec features a registry doesn't support
I've queued up a conversation in our next call to discuss how we proceed with distribution-spec features and extensions. |
This is probably not the right place to leave this comment, but I haven't found a better one. Suggestions as to better venues are welcome. What I would find really useful, as a maintainer of base images that other teams derive from in order to build release artifacts, is the moral equivalent of I'm aware that there are lots of other things that one might want to index on and filter by. But this is, at least, not an abstract need, nor is it a special case of "find things I can garbage collect to reduce my spend". Labels are, in practice, the user-defined metadata that gets propagated along a sequence of derivation steps. (Assuming you don't clobber them; I think this is where Label Schema and OCI Annotations went wrong, by trying to assign label names that describe the "final" image, and therefore have to be clobbered in every Dockerfile that layers in more tools or configurations. But that's a problem with conventional usage patterns, not with the tooling.) In any case, they're a reasonable thing to want to apply equality / set membership filters to. One might hope to use the same syntax as Kubernetes label selectors (https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors), which at least one major cloud vendor seems to have decided are the only list filtering mechanism worth implementing at scale (https://cloud.google.com/run/docs/reference/rest/v1/namespaces.revisions/list). So, yeah. I'd be pretty happy with |
+1 this one to bubble up in folks' queues :) |
This is pulling out some comments made in #22 and a bit related to #114 . It would be useful to have some kind of
/v2/<name>/manifests/list
API similar to what we have for tags today. Returned from that should be all manifest digests within that repository. This can be useful for building user scripts to implement the GC policy outside of the registry, looking for dangling manifests that do not have any tag pointing to them, and calling the manifest delete API when the user-defined criteria are met. For example, a user script could examine the manifest and referenced config, looking for labels indicating the image was part of a nightly build, and remove any manifests pointing to a build that is two weeks old or older.