Need more details of annotation collection #530
Comments
Are collected annotations defined if the validation of the instance fails? Or, because the whole schema fails, should they all be dropped (they are defined but empty)?
@epoberezkin I think the conceptual algorithm is something like:
So the reason that you never get annotations from
Of course, in practice you would probably check assertions like

But I think if we conceptualize this as collecting annotations as we work from "leaf" schemas up to the root schema, and dropping those annotations whenever we reach an assertion failure, that's the right simplified mental model. I like it a lot more than "collect annotations except under keywords X, Y, Z".

Does that make sense?
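The "collect from the leaves up, drop on assertion failure" model can be sketched roughly as follows. This is a minimal illustration, not a proposed reference algorithm: the function names, the result shape (a dict keyed by schema location), and the small set of supported keywords are all assumptions made for the example.

```python
from typing import Any, Optional

# Hypothetical: the only annotation keywords this sketch knows about.
ANNOTATION_KEYWORDS = ("title", "description")

# Only three JSON types are handled; any other "type" value raises KeyError.
_TYPE_CHECKS = {
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
}


def evaluate(schema: dict, instance: Any, path: str = "") -> Optional[dict]:
    """Return {schema_location: {keyword: value}} on success, None on failure."""
    collected: dict = {}

    # Assertions: any failure discards everything gathered for this schema.
    if "type" in schema and not _TYPE_CHECKS[schema["type"]](instance):
        return None
    if "minItems" in schema and isinstance(instance, list):
        if len(instance) < schema["minItems"]:
            return None

    # allOf: every subschema must pass; their annotations are merged upward.
    for i, sub in enumerate(schema.get("allOf", [])):
        result = evaluate(sub, instance, f"{path}/allOf/{i}")
        if result is None:
            return None
        collected.update(result)

    # oneOf: exactly one branch must pass; failing branches contribute nothing.
    if "oneOf" in schema:
        passing = []
        for i, sub in enumerate(schema["oneOf"]):
            result = evaluate(sub, instance, f"{path}/oneOf/{i}")
            if result is not None:
                passing.append(result)
        if len(passing) != 1:
            return None
        collected.update(passing[0])

    # not: passes only when its subschema fails, so nothing is ever collected.
    if "not" in schema:
        if evaluate(schema["not"], instance, f"{path}/not") is not None:
            return None

    own = {k: schema[k] for k in ANNOTATION_KEYWORDS if k in schema}
    if own:
        collected[path] = own
    return collected
```

In this sketch nothing special is done for `not` or for failing `oneOf` branches: their annotations disappear simply because a failing subschema returns nothing to its parent.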
@epoberezkin I had not thought about whether dropping means "defined but empty" or "not present at all". I would lean towards "not present at all". If you don't have annotations, you shouldn't know that you could maybe have had annotations. That's a gut reaction, though; I need to think on exactly why I think that's the correct behavior.
@handrews I agree with @epoberezkin on the "defined but empty" part. It makes more sense to me that an empty array says "evaluation completed without errors, but nothing was found."

Also, as a very minor point, JSONPath contains verbiage that allows for an empty array to be returned in the case that nothing was found, but the primary return in these cases is

Still, either way (
@gregsdennis, if I'm reading @epoberezkin correctly, the only time you have these placeholder values is if annotation values were collected, but were dropped due to a schema failure. What this would give you is the ability to determine whether information was collected and then revoked as not relevant, vs simply never having been collected. What is the use case for that distinction?

In my view, the fact that a schema that failed validation happened to have some annotations in it is an implementation detail that should not be leaked. Dropping annotations due to a failed assertion means dropping them as if they were never collected.

Consider the following schema:

{
"allOf": [
{"type": "array", "minItems": 10, "description": "foo"},
{"oneOf": [A, B, C]}
]
}

where

Consider the case where the first thing that is checked is

However, if "dropping" annotations means keeping the annotation name but clearing the data to

Why do all of that extra work? What benefit does it bring?

Plus, in general with JSON (and JavaScript, but the important part is JSON), things that aren't relevant are usually simply not present. In JSON Schema, absent keywords never cause validation to fail, and there are no required placeholder values. That is the general philosophy that JSON Schema, as a JSON-based system, should follow.
@handrews I see the distinction and your confusion. I don't mean that we should continue processing annotations if we know validation will fail, only to remove those annotations. I'm simply talking about the return value. It makes more sense to me to return an empty array (or object, if your example is to be followed) when there are no annotations rather than to return null. If I know at some point during validation that it will fail, I can still shortcut collection in my implementation and return the empty array/object.

Coming from a strongly typed language, it's nice for me to know what type of entity I'm going to get. It's a question of getting a JSON object vs JSON null (which is very different from .NET null).
@handrews another question is whether you've considered collecting all keywords in the passed (sub)schemas as annotations. That is, when validation fails, it's usually obvious how it failed. When it passes, it's usually not as obvious which branch made it pass. In theory, while only some keywords are assertions, all keywords can be seen as annotations (e.g. required), and knowing which of them were used to pass validation can be useful.

The question is prompted by this: ajv-validator/ajv#661
@epoberezkin I think that's interesting, but a separate issue.
@gregsdennis I think that the question of how to represent an empty set of annotations is mostly an implementation-specific thing. Following the conventions of JSON (which come from JavaScript), it is preferable for something to be absent than to be a literal JSON

However, when it comes to working with schemas and schema results in-memory, an implementation may choose to represent that absence however it wants, and should do so in a way that is as idiomatic as possible for the implementation language. Maybe that means using a .NET

[EDIT: I realize that I misread the question and it is about using an empty data structure]

As far as an empty data structure, I'll have to think on that. It might be reasonable, or it might be that strongly typed languages should just use an empty data structure where other languages may not use anything. It's so variable: in JavaScript empty data structures are truth-y, but in Python they are false-y. Ideally JSON Schema is idiomatic JSON, and in-memory representations are as close to idiomatic for that language as is practical.
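One way to reconcile the two positions is to keep an always-present (possibly empty) structure in memory while omitting it from the serialized JSON. The following is only a sketch under that assumption; the `EvaluationResult` class and the `valid` / `annotations` output keys are hypothetical names, not anything defined by the specification.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EvaluationResult:
    valid: bool
    # Always a dict in memory, so callers never need a None check.
    annotations: dict = field(default_factory=dict)

    def to_json(self) -> str:
        out = {"valid": self.valid}
        # In the serialized form, irrelevant data is simply absent:
        # no placeholder key is written for a failed or empty result.
        if self.valid and self.annotations:
            out["annotations"] = self.annotations
        return json.dumps(out, sort_keys=True)
```

A strongly typed implementation gets a predictable in-memory type, while the JSON output follows the "absent rather than placeholder" convention.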
To summarize some off-github discussions with @Relequestual and @philsturgeon:
This last point is about what leads to "pointers" like: The runtime processing location seems useful because in situations such as the example I gave in the first comment, applications often want to define rules about "parent" schemas overriding "child" schemas. In this usage, the schema containing a I'm open to suggestion for a better, more clear name for this. An alternative to the
This is a more complicated format but avoids the need to define an in-memory runtime tree concept.
I'm leaning more and more towards
as the format, as it sticks with existing JSON Pointer concepts. Also, probably one of the most common things an application will do is check whether something is above, alongside, or on the far end of a reference, and having a

It would also allow the annotation collection process to be defined without requiring knowledge of whether you have crossed a reference or not. If you're creating one pseudo-JSON-pointer, you need to "remember" the pointer prefix that had been built at the time that you follow the reference, and keep appending to that. If you start a new pointer when crossing a reference, then you don't need to do that. You just always start the pointer from where you start processing something: an absolute pointer if you're starting with a document's root schema, a relative pointer otherwise. And then you "return" that back across the
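A rough sketch of the "start a new pointer when crossing a reference" idea is shown below. The exact notation proposed in this thread is not reproduced here; the list-of-pointers shape, the function names, and the use of `0` as the Relative JSON Pointer prefix for the reference target are assumptions made for illustration. Only `properties`, same-document `$ref`, and `title` are handled, and recursive references are not guarded against.

```python
def resolve_local(root: dict, ref: str) -> dict:
    """Resolve a same-document reference such as '#/$defs/bar'."""
    node = root
    for token in ref.lstrip("#").split("/")[1:]:
        # Undo JSON Pointer escaping (~1 is '/', ~0 is '~').
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def collect_titles(root, schema=None, trail=None, current=""):
    """Yield (trail, title); trail holds one pointer per reference hop."""
    schema = root if schema is None else schema
    trail = [] if trail is None else trail
    if "title" in schema:
        yield trail + [f"{current}/title"], schema["title"]
    # Property names are not escaped here, to keep the sketch short.
    for name, sub in schema.get("properties", {}).items():
        yield from collect_titles(root, sub, trail, f"{current}/properties/{name}")
    if "$ref" in schema:
        target = resolve_local(root, schema["$ref"])
        # Close the current pointer at the reference and begin a fresh
        # relative pointer rooted at the reference target ("0").
        yield from collect_titles(root, target, trail + [f"{current}/$ref"], "0")
```

Because each hop begins a fresh pointer, the walker never has to carry the prefix built before the reference; an application can tell how many references were crossed from the length of the list.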
I don't understand the

Personally I like the included
@gregsdennis it's a Relative JSON Pointer starting from the target of the
associated with the value "X" from something like:

{
"properties": {
"foo": {"$ref": "#/$defs/bar"}
},
"$defs": {
"properties": {
"bar": {
"title": "X"
}
}
}
}
@gregsdennis what makes the pointer-with-$ref easier for you to work with? I'm not dead set on either option, just trying to figure out the tradeoffs for both implementations and users here.
In order to indicate where an error occurred, I build a path backward from the leaf to the root. It's just easier for me if I don't have to break it up at a reference node.

I like your example in the original post. It's easy to read and makes sense. "The

Also, given how the pointer is a key, how do you intend to use the array syntax?
It would have to be a somewhat different structure, yeah. If we go with the array we'll sort the rest of it out.

Now that you mention it I do think that I tend to see error reporting like that (usually with the

Hmm... good points. I'm not in a rush to decide this so let's see what other opinions pop up :-)
Section 3.3 states that:
My reading of this is that if input value...
...is validated by schema (fragment)...
...then the output value would be:
If I'm correct so far, then what happens if the input value is...
...and the validating schema (fragment) is as follows?
(end)
@spenced the "attaches" wording was not intended to mean "write into the instance". It means to associate it in some vague and not really specified way. We're working out the mechanism in this issue. It definitely will never mean writing it into the instance, because that would constrain what instances can be used with annotations.

I might publish a bugfix / wording clarification of core and validation (like I did with hyper-schema and relative json pointer about a week ago) and include some clarification here. It would still be draft-07 (no functional changes). But this is confusing and the
@handrews Thanks for the clarification: I am happy to be wrong!
With my new understanding, I can see that annotation values are determined from the reverse relationship of validated value to the schemas which positively validate the value. This set of schemas is the source of the annotations.

As @epoberezkin notes, why discard assertions when they also provide useful information? Assertions are just extra data at the target of the relationship. Similarly, unknown keywords in the schema could also be passed through to the application in this manner.
@spenced only irrelevant annotations are dropped. If the validation of that branch fails then the annotations are not relevant, as @handrews explains here: #530 (comment)

I like the original examples for the same reasons as @gregsdennis. It's visually very clear what's going on, the code will be trivial to write, and I too am still unsure what that

As for "should the keywords be on the ref", I'm a little torn between:
and
The first is more verbose and again visually very simplistic, and the latter is going to be a little lighter to work with for biiiiiig schemas with a lot of annotations. Either way there will be complaints: "It's annoying having to stick the keyword from the key onto the end of the string, this feels like constructing URLs!" or "This is so dumb, why is it repeated so much?" There's not gonna be a winner but we should make a call.
Do you understand Relative JSON Pointers? (not sarcasm! They're not the most intuitive if you're not expecting them) I'm trying to figure out whether I need to explain what a relative JSON Pointer is, or just explain specifically how that particular relative JSON Pointer is evaluated, and why (although it seems likely to be moot for this issue as the other option is clearly more popular so far).

https://tools.ietf.org/html/draft-handrews-relative-json-pointer-01#section-3
[EDIT: I've changed my mind on annotations not tied to a single keyword, see the next comment] @philsturgeon asked for an example of keywords contributing to another keyword's annotations. With some trepidation, I will demonstrate this with the
To do this we need two features, one of which already exists:
How this would work is that annotation behavior would be defined for

As with all annotations, if the instance fails to validate against a schema at any point, all of its annotations, including those contributed by its subschemas, are dropped, and only the validation failure is propagated to the parent schema.

When handling

A benefit of this is that an implementation of

This mechanism of allowing/requiring other keywords to contribute to an annotation is important, particularly in a multi-vocabulary world, because it decouples the behavior of the keyword using the annotation from any specific vocabulary or keyword that contributes to the annotation.
While working on #600 (Keyword behavior can depend on subschema or adjacent keyword annotation results), I worked through more details on how all of this should work. I came to the conclusion that having annotations associated with anything but the collecting keyword's name is far too confusing. It creates an unpredictable interface between subschema and parent schema evaluation. The problems include:
So, in #600, you can see that I did the following:
Implementations are explicitly not required to code these keywords exactly as specified in terms of annotations, as other implementations are likely more efficient. Also, these keywords need to work in implementations that opt out of collecting annotations. Note that the schema location of the annotation is critical for implementing
With these keywords, annotations from all schema locations that have yet been processed (using the depth-first approach outlined in earlier comments) are considered. Note that the
There are a lot of issues with that but they're unrelated to annotation collection so I'll track those in #556, #557, and #561 (vocabulary support) or new issues.
Regarding
I no longer think either will work, as the whole concept of working with a post-reference-resolution structure is more complicated than it looks at first, and schemas are more naturally and reliably identified with full URIs. Better solutions are being discussed in #396 (output/error schema), so I'd rather that discussion be consolidated over there.

This issue should continue to track the general annotation collection algorithm, and the required data to associate with the annotation (instance location, schema location, value, plus something around rolling up collected multiple values for things like

Once that's done, I'll close this, and we can continue the discussion of the exact data structure, syntax, representation formats, etc. in #396, which is now also targeted at draft-08.
At this point I think everything here has either been addressed by PR #310, or is being tracked by #679 (Output formatting) or #635 (general concerns about the clarity of keyword behavior description, including annotations). I'm going to close this out because I think those two open issues are better for discussing any further work.
There should be pseudocode and a sample/recommended output format, as there is with Hyper-Schema. This should include guidance on how annotation collection can be implemented as part of the same evaluation process as validation.
`not` and non-validating branches of `oneOf`, `if`, etc. is a natural consequence of that
`$ref`
`links` and `base` in the hyper-schema spec)
`links`): how do they fit into the overall JSON Schema model? Are there any particular limitations?

Some of the above may get split out into their own issues.
The following example shows the motivation for indicating where we were in the schema when the annotation was contributed. A common request is to be able to figure out which `title` or `description` to use, where a typical desired heuristic is that values in parent schemas are preferred over values in child schemas. Note that "child" in this case is what we might call the logical child, following `$ref`s, instead of the physical child (a nested subschema within the current schema object).

So let's have a schema:
With an instance of:
The annotation results might look like this (I literally made up how `$ref` appears in the JSON Pointer-ish thing while typing this example, so don't take that as set in stone):

`"/foo"` is the instance pointer: this is the place in the instance to which the annotations apply.

Under that, the property names are the names of each annotation. All of these are also the keyword names that produced them, but in theory you could do something that doesn't map directly (please allow me to wave my hands on that; it's related to a complicated proposal that's under discussion and may not end up being relevant).

Under each annotation, the format is determined by the annotation. `readOnly` just has a boolean, because it's explicitly stated (as of draft-07) that the overall value of `readOnly` is a logical OR of all applicable values. So there's no need to keep track of which value came from where; just let the application know the final outcome.

On the other hand, the way "title" and "description" are shown here is probably the default behavior. For those, `"/properties/foo/...` are schema pointers, more or less. Pointing across `$ref` is a bit weird and I need to figure out if that's the right approach. But the idea is to be able to tell that one of those is a runtime-parent of the other, and allow applications to make a decision based on that.

It might be better to use an array of terms instead of a not-quite-JSON Pointer. And it's not yet clear to me whether the keyword itself should be at the end of the pointer-ish thing / array. That also has to do with how much we want to abstract away the keyword source of an annotation. In the case of something like `title` or `readOnly`, we really do just want literally what the keyword says.
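To make the "parent overrides child" heuristic concrete, here is a small sketch of how an application might consume such results. It assumes, purely for illustration, that `title` values arrive keyed by a pointer-ish schema location ending in the keyword name, with `$ref` appearing as a path segment, and that `readOnly` values are rolled up with a logical OR. The function names and the example locations are hypothetical.

```python
def is_runtime_parent(parent: str, child: str) -> bool:
    """True if the schema holding `parent` sits above `child` at runtime."""
    parent_schema = parent.rsplit("/", 1)[0]  # drop the trailing keyword name
    return child != parent and child.startswith(parent_schema + "/")


def preferred(values: dict) -> list:
    """Keep only values whose location has no runtime parent in `values`."""
    return [
        value
        for location, value in values.items()
        if not any(is_runtime_parent(other, location) for other in values)
    ]


def roll_up_read_only(values: list) -> bool:
    """readOnly is a logical OR of all applicable values."""
    return any(values)
```

With titles at `/properties/foo/title` and `/properties/foo/$ref/title`, `preferred` keeps only the first, since the second lies on the far side of the reference. Locations in unrelated branches are all kept, leaving the final choice to the application.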