-
Notifications
You must be signed in to change notification settings - Fork 672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
specs-go: clarify mediatypes #411
Conversation
@@ -75,6 +75,7 @@ type History struct { | |||
} | |||
|
|||
// Image is the JSON structure which describes some basic information about the image. | |||
// This provides the `application/vnd.oci.image.config.v1+json` mediatype when marshalled to JSON. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is annoying that this one is inconsistent. Can't we safely add a media type field to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was thinking the same. Especially since the field is omitEmpty for JSON. Perhaps make it optional with a SHOULD, so it will have compat with older docker? Thoughts @stevvooe ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(to that end, also the oci-layout
file, and it doesn't even have a mediatype assigned for it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Fri, Oct 21, 2016 at 07:12:12AM -0700, Vincent Batts wrote:
(to that end, also the
oci-layout
file, and it doesn't even have a mediatype assigned for it)
+1 to assigning a media type to oci-layout
. I don't expect to find it in CAS, but image-layout directories might be accessed over HTTP, and it would be strange to return application/json
for it when all of our other schemas have more specific types.
On Fri, Oct 21, 2016 at 06:59:20AM -0700, Jonathan Boulle wrote:
It is annoying that this one is inconsistent. Can't we safely add a media type field to it?
This is going the wrong way. If folks want to look inside inside the blob and try to guess its type (e.g. by unmarshalling into Versioned
), they can do that. But I don't think we should require anyone to look inside the blob to figure out what it is. Descriptor references tell you the media type ahead of time, and we should be using those to identify blob types, and that approach places no restrictions on the blob itself. So while we may want to keep the Versioned
fields for backwards compat with Docker, I don't think we want to extend that approach to additional structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonboulle In general, we should not actually be embedding mediaTypes in the target types. The mediaType is a lens to the data. Really, we should remove them from the types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just follow along with literally every other format that exists and make it self-describing?
The only time I can see where we'd want this to be the recommended method for typing a blob is in signed assertions. For everything else, I'd rather have:
The descriptor that sent me here said this was a
application/vnd.oci.image.manifest.v1+json
, so I'll attach it to my manifest handler…
Instead of:
Lets see, does it have
ustar\000
at offset 257? No? Good, because I'm not sure how I would have figured out if that was using the.wh.*
whiteout handling or the new [static] whiteout handling (#24). Do the first four bytes match00 00 00 xx
? No? Ok, not UTF-32BE JSON. What about00 xx 00 xx
? … Maybe the first byte is{
? Ok, that sounds like UTF-8 JSON. Let me unmarshal it intoMediaTyped
. That worked! And it has a value frommediaType
! It says it isapplication/vnd.oci.image.manifest.v1+json
, so I'll seek back to the beginning and attach it to my manifest handler…
And again, I'm just arguing that we shouldn't be using peek-inside typing for image unpacking, etc. I'm ok with us deciding that we want to use it to save keystrokes on pre-CAS-push validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking You're arguing that there's dichotomy between "free for all, no need to have mediaType
" and "hueristic so that we can recognise every blob type". I don't think there is one.
The benefit of being able to know what a JSON blob is meant to represent is entirely separate from "I can tell what every blob in the image is without references". If you're not happy with detecting tar
files (like file
and libmagic
do) that's fine. But please let's not make all of our JSON objects meaningless blobs that require jumping through references in order to even understand what we're looking at (or keeping the type information out-of-band).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But please let's not make all of our JSON objects meaningless blobs that require jumping through references in order to even understand what we're looking at…
You're saying “assuming (for some out-of-band reason) that the blob is a JSON object which contains a self-describing mediaType
field, we can use that mediaType
field to unambiguously identify the content”. That initial assumption is what I'm worried about. If you see cases where you are comfortable making that assumption (for whatever out-of-band reasons), then great, use peek-inside type detection based on the mediaType
value. But I'd strongly recommend consumers use the referencing descriptor's mediaType
to avoid having to rely on that assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'd strongly recommend consumers use the referencing descriptor's mediaType to avoid having to rely on that assumption.
Your consistent implication that all consumers will have access to the entire image is getting annoying. If I have a tool like oci-do-something
which I pipe a JSON object to, I don't expect that it will be reading the repository. In fact, I might have a service that modifies the JSON objects (and therefore actually cannot access the original repo). So you can't "use the referencing descriptor" because there isn't one (that you can see).
Now, you might argue that we should send the out-of-band media type with it. But why should that be a requirement? What are you gaining by removing mediaType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, you might argue that we should send the out-of-band media type with it. But why should that be a requirement?
That's exactly what I'll argue ;). And unless you implement completely generic peek-inside detection (which I don't think anyone's arguing for), you're going to have to transmit some amount of media-type guidance along with your blob content. I'm suggesting that guidance be the media type.
You seem to be suggesting that that guidance be “this blob is a JSON object which contains a self-describing mediaType
field”. Maybe you transmit that information because the tool-caller knows the tool can only handle such media types and therefore only feeds matching blobs into the tool. That's how oci-image-validate
works, and I'm comfortable with that from a keystroke-saving perspective. However, I don't think we should pretend that this approach is completely free of out-of-band type guidance.
What are you gaining by removing
mediaType
?
I'm not suggesting we remove mediaType
, because some users (e.g. you with oci-do-something
, or a number of people with oci-image-validate
's autodection) can't be bothered to pass media types around. And I'm fine with that (typing out a long media type is not something I'd like to do repeatedly).
I'm just suggesting image-handling tools follow the spec's SHOULD and use descriptors to reference blob content, with peek-inside type detection being reserved for signed-assertions. And having acquired the media type from the referencing descriptor (or because we authored the blob ourselves), I see no need for image-handling tools to use peek-inside type detection.
Perhaps our difference here is that I see (almost) all tooling as being descriptor-based, while you see the tooling as being isolated-blob based. Since I'm fine leaving existing mediaType
entries in place, maybe we can just wait a year to see how that plays out and revisit this discussion then?
type ManifestList struct { | ||
specs.MediaTyped | ||
specs.Versioned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Versioned
contains MediaTyped
(which is what you currently have), isn't it redundant to list MediaTyped
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, technically. Does not change anything and is very apparent that it is media typed
@vbatts Is this to support partial decode for mediaType sniffing? If so, are there other fields that are common to all types that we want in this decode structure. |
Why did we ever do this? On 21 October 2016 at 20:35, Stephen Day notifications@github.com wrote:
|
type Descriptor struct { | ||
// MediaType contains the MIME type of the referenced object. | ||
MediaType string `json:"mediaType"` | ||
specs.MediaTyped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, so this is actually wrong because the mediatype here is the REFERENCED object not itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, same field name, and opposite intention. :-|
@vbatts can we rather remove media type fields? |
@jonboulle like completely? or just this PR? |
@vbatts like completely |
well, honestly I like the self describing nature of it. I wish the descriptor field names could be fixed, such that it could self describe and have a separate field that describes the object pointing to. mt := v1.MediaTyped{}
if err := json.Unmarshal(buf, &mt); err != nil {
// handle yer biz
}
switch mt.MediaType {
case v1.MediaTypeImageManifest:
// something specific
default:
// handle yer biz
} So we can have discovery without having a |
On Fri, Oct 28, 2016 at 07:07:03AM -0700, Vincent Batts wrote:
There was a bit more discussion of this in We SHOULD descriptors for all blob references 2, so if the initial a. Created it independently (in which case they should know what it is), (c) is the only tricky case, and the solution is to have the GC engine |
My main concern is that without @wking gave the example that In particular, the fact that descriptors don't say that they're descriptors is quite silly ( |
The tar+gzip headers do not distinguish Walking the DAG sounds like an integral part of using CAS. Can you provide more details about what you mean by "just output the JSON blob that I am trying to look at"? |
Lets say I want to modify the config for an image. Currently the way I need to do this is:
In order to update the image I now need to:
Now, my main point is that the OCI tooling hasn't added anything. The process takes precisely as long as it would using I'm going to end up writing a wrapper for all of this, but I think we should be considering who the target user of these utilities is. Looking at |
But ls, cat, and rm don't abstract away the CAS registry implementation. And the presence of low-level tools like oci-cas doesn't preclude additional tooling like oci-jq-manifest (or whatever your high-level tweaker looks like). In fact, oci-cas makes it much easier to write a CAS-engine agnostic oci-jq-manifest. I'm missing the connection between "I'll wrap DAG walking because it's tedious" and "I need to discover media types by inspecting the blob content". In your example above you start with |
@wking My point will be more clear if I just make my wrapping tool. I'm working on it now, but in order to handle it properly I need to merge this PR into my branch -- so we can just continue with merging this (and we can discuss higher-level handlers later). |
My point is that I see no need for peek-inside type detection. But the |
@wking ... You can't have it both ways. In opencontainers/image-tools#5 you want to make the tooling around images as flexible as possible (so that you can support extensions that don't match the spec). That's fine, but you can't then also say that all of the objects should no longer be self-describing. The result is that you cannot use any generic OCI tooling for images, because all of the objects are defined relative to one another and thus you couldn't effectively modify a single object (you'd have to parse the image each time to know what object you're modifying -- and even then you'd need to track the object types implicitly). In other words, you'd have to craft new tooling for every new producer of an OCI image (because they could add extensions that make it impossible to normally parse the image). Explicit is better than implicit. |
It's a Merkle DAG, you can never modify a single blob. Peek-inside type detection doesn't change that. |
Even if you're modifying the root blob, you'd still need to push a new ref. |
@cyphar The model for CAS-based engine distribution typically requires referencing an object through type-qualified references, which are called descriptors. I disagree that using CAS in this manner prevents extension, as I don't think OCI could be introduced at all if the concepts defined in manifests and manifest lists didn't work. We have been doing this in compilers and runtime since forever. The difference is the propagation of the modification. Typically, the process of modifying something requires a walk don't the qualified reference path, then anything that was touched or references something that was touched gets update. This is what happens with this technology (I also suspect this is part of why CAS is not adopted more often). When you touch something, it modifies the "address" of that thing and those changes and their references need to be updated. I don't know what is going on in opencontainers/image-tools#5, but if CAS access to a tar file is being exposed over CLI, something is going way wrong. In general, I'd like to see a proposal for the image-tools that are useful. I am not familiar with your work, but it would be helpful if you could identify the problems that have led to a "wrapper" or even how we can incorporate that work into OCI. |
On Mon, Oct 31, 2016 at 02:22:54PM -0700, Stephen Day wrote:
I disagree, but I think we should probably be discussing this point in
+1. |
I did, then you just opened three more PRs. I have no more time to waste. |
@stevvooe Okay, so here's what I'm working on at the moment (the only reason I haven't published it yet is just because right now I'm doing a bunch of go-mtree related work rather than actually getting The main issue I have with the current incarnation of @wking's The tooling I would like (and am planning on writing) actually is written around images as a first-class concept. In particular, it lets you modify images rather than blobs. So you can say "modify this field in the config of this image ref" and that will translate to replacing the config, the manifest blob that referenced it and the ref blob that referenced that. Same thing applies to changing the set of layers and so on -- the tooling is actually written around the end-user goal of actually modifying images as a first-class concept. And in my case, |
On Mon, Oct 31, 2016 at 08:22:31PM -0700, Aleksa Sarai wrote:
oci-cas is generic CAS tooling. oci-refs is generic mutable-refs
This sounds useful to me too. I don't see why having this
So your imagectl doesn't need to self-describing blobs or a type-aware
It's not just exposing tar-handling code, it's exposing a generic
I completely agree that we don't want oci-cas and oci-refs to be our
I'm still not understanding this leap. Can you present a workflow |
On Mon, Oct 31, 2016 at 09:45:02PM -0700, W. Trevor King wrote:
The only situation I've seen where self-describing blobs is useful is Are there other use-cases for self-describing blobs that do make this |
@cyphar I completely agree. CAS is too-level to be useful. Thank you for the break down! moby/moby#27455 is a proposal to docker to add a manifest tool which has the same UX problem. Basically, you need to "check out" an image, modify it, then check it back in. In the spirt of OCI being made up of proven technologies, I think it would be worthwhile to let |
lots of philosophy and bike sheds, it seems to me. Is the motivation to remove |
Due to the conflicting use of the `mediatType` field across documents, and after discussion on opencontainers/image-spec#411, this changeset removes the use of `mediaType` where it is used to refers to a document's own type. Leaving only the use of `mediaType` for descriptors, where it is used to describe the type of a referenced object. Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Due to the conflicting use of the `mediatType` field across documents, and after discussion on opencontainers/image-spec#411, this changeset removes the use of `mediaType` where it is used to refers to a document's own type. Leaving only the use of `mediaType` for descriptors, where it is used to describe the type of a referenced object. Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Due to the conflicting use of the `mediatType` field across documents, and after discussion on opencontainers/image-spec#411, this changeset removes the use of `mediaType` where it is used to refers to a document's own type. Leaving only the use of `mediaType` for descriptors, where it is used to describe the type of a referenced object. Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Due to the conflicting use of the `mediatType` field across documents, and after discussion on opencontainers/image-spec#411, this changeset removes the use of `mediaType` where it is used to refers to a document's own type. Leaving only the use of `mediaType` for descriptors, where it is used to describe the type of a referenced object. Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Due to the conflicting use of the `mediatType` field across documents, and after discussion on opencontainers/image-spec#411, this changeset removes the use of `mediaType` where it is used to refers to a document's own type. Leaving only the use of `mediaType` for descriptors, where it is used to describe the type of a referenced object. Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Due to the conflicting use of the
mediatType
field acrossdocuments, and after discussion on
#411,
this changeset removes the use of
mediaType
where it is used to refersto a document's own type. Leaving only the use of
mediaType
fordescriptors, where it is used to describe the type of a referenced object.