Add $vocabularies, clarify $schema and $ref #432

handrews · 2017-10-04T05:54:05Z

This addresses issues #314 and #431.

"$schema" is now explicitly intended for meta-schema declaration.
An explanation for why it MUST only be present in root schemas
has been added so we (I) don't forget about it and freak out (again).

The concept of vocabularies is now introduced explicitly, and
the "$vocabularies" keyword is introduced to declare vocabulary
support. For compatibility (and simplicity in the case of
a single standard meta-schema conveying sufficient information),
omitting "$vocabularies" in the root schema causes it to behave
as if the "$schema" value is listed as a vocabulary. This preserves
all existing behavior.

Finally, a paragraph is added to "$ref" clarifying the conceptual
model, specifically with respect to "$schema" and meta-schema
validation.
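
To illustrate the compatibility rule, here is a minimal Python sketch of how an implementation might work out which vocabularies are in effect for a root schema. The function name `effective_vocabularies` and the `example.com` URI are assumptions made for illustration; the PR itself only states that omitting "$vocabularies" behaves as if the "$schema" value were listed as a vocabulary.

```python
def effective_vocabularies(root_schema):
    """Return the vocabulary URIs in effect for a root schema.

    Assumption: "$vocabularies" is an array of URI strings. When it is
    absent, fall back to the "$schema" value, which keeps schemas written
    before this keyword existed behaving exactly as they did.
    """
    if not isinstance(root_schema, dict):
        return []  # boolean schemas declare nothing
    declared = root_schema.get("$vocabularies")
    if declared is not None:
        return list(declared)
    meta = root_schema.get("$schema")
    return [meta] if meta is not None else []


DRAFT_06 = "http://json-schema.org/draft-06/schema#"

# A schema written without "$vocabularies": the meta-schema URI stands in.
legacy = {"$schema": DRAFT_06, "type": "object"}

# A schema that declares a second, hypothetical vocabulary explicitly.
extended = {
    "$schema": DRAFT_06,
    "$vocabularies": [DRAFT_06, "https://example.com/vocab/ui"],
}

print(effective_vocabularies(legacy))
print(effective_vocabularies(extended))
```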

epoberezkin

I believe $vocabulary should always be a single string, rather than array, so multiple vocabularies should not be allowed in the schemas. I also believe that $vocabulary should only be allowed in the root schema. I will elaborate in a separate comment.

handrews · 2017-10-06T19:45:18Z

> I believe $vocabulary should always be a single string, rather than array, so multiple vocabularies should not be allowed in the schemas. I also believe that $vocabulary should only be allowed in the root schema. I will elaborate in a separate comment.

The entire purpose of $vocabularies is to clearly support multiple vocabularies, for instance validation + hyper-schema + UI generation + some custom vocabulary.

The nature of vocabularies is that they build on each other, so in order for a general system to tell whether there is a usable vocabulary, it needs to know each vocabulary in the stack.

You are essentially saying that you want $vocabularies to be exactly like $schema, which is pointless. $vocabularies was specifically designed to not impose any file-wide requirements on schema processing.

handrews · 2017-10-06T19:53:03Z

@epoberezkin I could see an alternative where meta-schemas declare which vocabularies (plural) they describe. In this approach, implementations would examine the meta-schema to look for recognizable vocabularies, and instances would declare $schema exactly as in draft-06.

I would be just as happy with that approach.

In both approaches, a conforming implementation that only wants to support validation with no extensions can almost entirely avoid "$vocabularies" as the meta-schema is sufficiently well-known.

epoberezkin · 2017-10-06T19:56:14Z

@handrews, I definitely appreciate the progress; there are quite a few things in this PR we agree on. I will try to summarise what we agree on. Please correct me if I am wrong:

- The purpose of $schema is to declare the meta-schema that can be used to validate the schema. Historically it has also been used to define the vocabulary, but given that the meta-schema can be extended (even though the mechanism is very verbose due to unsolved questions with adding properties in recursive cases), it is in general unfit for the purpose of defining the vocabulary.
- $schema must only appear in the root schema, as it defines the JSON schema that should be used to validate the current schema as a JSON instance.
- $ref should not be seen as inclusion, as the referenced fragment can belong to another schema file (= JSON instance) and can therefore require a different meta-schema to validate.
- A $vocabulary keyword should be added to define the vocabulary of the schema.
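
As a rough illustration of the "root schema only" rule, the sketch below walks a schema and reports any subschema that carries its own "$schema". It is an assumption-laden example, not code from any existing library: the helper names are hypothetical, only the draft-06 subschema-bearing keywords are considered, and "$ref" is deliberately not followed, since the target may belong to a different schema document with its own meta-schema.

```python
# Draft-06 keywords whose value is a single subschema.
SINGLE = ("additionalItems", "additionalProperties", "contains", "not", "propertyNames")
# Keywords whose value is an array of subschemas.
ARRAYS = ("allOf", "anyOf", "oneOf")
# Keywords whose value is an object mapping names to subschemas.
MAPS = ("properties", "patternProperties", "definitions", "dependencies")


def subschemas(schema):
    """Yield (path_suffix, subschema) for the direct subschemas of `schema`."""
    if not isinstance(schema, dict):
        return
    for key in SINGLE:
        if key in schema:
            yield f"/{key}", schema[key]
    items = schema.get("items")
    if isinstance(items, list):
        for i, sub in enumerate(items):
            yield f"/items/{i}", sub
    elif items is not None:
        yield "/items", items
    for key in ARRAYS:
        for i, sub in enumerate(schema.get(key, [])):
            yield f"/{key}/{i}", sub
    for key in MAPS:
        for name, sub in schema.get(key, {}).items():
            # "dependencies" values may be arrays of property names, not schemas.
            if not isinstance(sub, list):
                yield f"/{key}/{name}", sub


def nested_schema_declarations(root, path=""):
    """Return slash-separated paths (not escaped as JSON Pointers) of
    subschemas that contain "$schema". The root itself is never reported."""
    found = []
    for suffix, sub in subschemas(root):
        here = path + suffix
        if isinstance(sub, dict) and "$schema" in sub:
            found.append(here)
        found.extend(nested_schema_declarations(sub, here))
    return found
```

Note that a property that happens to be named "$schema" (as in a meta-schema's own "properties") is not flagged, because only keyword positions are inspected.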

Things we do not agree on (again, please correct me if I am wrong):

@handrews:

- a schema can define multiple vocabularies ($vocabulary is an array of strings)
- $vocabulary can be used in subschemas
- $vocabulary is a URI (of what? the spec or the meta-schema? if the latter, will it be the same as $schema in most cases?)

@epoberezkin:

- a schema needs to support only one $vocabulary (it is a string), but a vocabulary can extend another vocabulary (the way hyper-schema extends the validation vocabulary)
- $vocabulary can be used only in the root schema
- $vocabulary is a descriptive identifier defined in the respective I-D (e.g. "validation", "hyper-schema", "ui-schema", etc.)

Arguments to allow separate vocabularies in subschemas and to allow multiple vocabularies:

- you can conveniently package multiple schemas in the same file
- you can have schemas that generate both code and UI, for example
- anything else?

Arguments to NOT allow vocabularies in subschemas:

- Validation concern. Each vocabulary assumes a specific meta-schema for validating the schema, whether the standard or an extended meta-schema for this vocabulary is used. If the subschema defines a different vocabulary from the root schema, then it is either not validated (in case $schema allows additional properties) or fails validation (in case $schema prohibits additional properties). @handrews, do you have a solution to this problem?
- Processing concern. In practice, different vocabularies are implemented by different libraries. If the vocabulary is defined only in the root schema, then it is very easy for an application to determine which library should be used for a given file. If the vocabulary can be redefined (or even worse, extended) in any subschema, then one of the following has to happen:
1. The application performs schema traversal, which is not trivial (see json-schema-traversal, which I had to abstract from ajv).
2. Each (!) library becomes responsible for processing this keyword and somehow notifying the application that some part of the schema should be processed by another library.
3. A special routing library is used.
4. Some library that implements all vocabularies is used. I find it highly unlikely that there will be such a library.
5. @handrews, do you have any workable idea in mind? All of the above seem too complex for practical purposes...
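
For comparison, here is a minimal sketch of the root-only case, where the application picks a library by looking at a single top-level keyword and never has to traverse the schema. Everything in it is hypothetical: the function name `pick_processor`, the registry of processors, and the choice to fall back to "validation" when the keyword is absent are assumptions for illustration, not part of any proposal in this thread.

```python
def pick_processor(root_schema, processors, default="validation"):
    """Choose a processing function from the root-level "$vocabulary" only.

    Assumption: "$vocabulary" is a single descriptive identifier such as
    "validation" or "hyper-schema", and a schema without it is treated as
    using the `default` vocabulary.
    """
    name = default
    if isinstance(root_schema, dict):
        name = root_schema.get("$vocabulary", default)
    try:
        return processors[name]
    except KeyError:
        raise ValueError(f"no processor registered for vocabulary {name!r}")


# Stand-ins for real libraries; each just reports which one was chosen.
processors = {
    "validation": lambda schema: "handled by validator",
    "hyper-schema": lambda schema: "handled by hyper-schema client",
}

schema = {"$vocabulary": "hyper-schema", "links": []}
print(pick_processor(schema, processors)(schema))
```

With vocabularies allowed in subschemas, the single dictionary lookup above would have to be replaced by one of the traversal or routing strategies listed in the numbered items.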

Arguments to NOT allow multiple vocabularies:

- Validation concern. I really hope we will eventually be in a place where, to use any custom keyword, one MUST use an extended meta-schema. That would prevent hours spent debugging schemas with misspelled keywords. Once we have an effective, elegant and agreed mechanism for schema extension it will be very easy to add additionalProperties: false to all meta-schemas. Multiple vocabularies make this pragmatic approach impossible.
- Processing concern. Same as above: it makes it much more difficult for the application to determine how a given schema file should be processed.

So I may be missing something, but I really don't see how a questionable convenience of being able to package multiple schemas in a single file (particularly given that a simple alternative approach is possible - just define a package of schemas in the spec) can outweigh the above concerns.

epoberezkin · 2017-10-06T19:58:31Z

@handrews please hold on - writing more, answers to your questions :)

epoberezkin · 2017-10-06T20:01:59Z

> The nature of vocabularies is that they build on each other

I completely agree with that statement

> ...so in order for a general system to tell whether there is a usable vocabulary, it needs to know each vocabulary in the stack.

Not necessarily, as the extended vocabulary knows which vocabulary it extends. hyper-schema extends validation. ui-schema may extend validation. Mixing ui-schema and hyper-schema - really? Do you have a use-case?

> You are essentially saying that you want $vocabularies to be exactly like $schema, which is pointless. $vocabularies was specifically designed to not impose any file-wide requirements on schema processing.

No, I am not saying that. A meta-schema can be extended. The whole purpose of using a vocabulary is to define which library should be used.

> I could see an alternative where meta-schemas declare which vocabularies (plural) they describe. In this approach, implementations would examine the meta-schema to look for recognizable vocabularies, and instances would declare $schema exactly as in draft-06.

That works for me too, as they are linked. It is probably even better. But it means that the vocabulary cannot be changed in a subschema.

epoberezkin · 2017-10-06T20:08:29Z

> Once we have an effective, elegant and agreed mechanism for schema extension it will be very easy to add additionalProperties: false to all meta-schemas

By the way, quite a few people asked how they can prohibit additional properties in the meta-schema.

epoberezkin · 2017-10-06T20:14:12Z

> I could see an alternative where meta-schemas declare which vocabularies (plural) they describe. In this approach, implementations would examine the meta-schema to look for recognizable vocabularies, and instances would declare $schema exactly as in draft-06.

The only problem with that approach is that the meta-schema is a schema that should validate itself. So we could say that a schema MAY include $vocabulary, but in that case it MUST have $schema, and the $vocabulary in the schema should be the same as in the meta-schema. The meta-schema for any vocabulary would then look like this:

{
  "$schema": "some_uri",
  "$vocabulary": "whatever",
  "type": ["object", "boolean"],
  "properties": {
    "$schema": {"type": "string", "format": "uri"},
    "$vocabulary": {"const": "whatever"},
    "etc.": {}
  },
  "dependencies": {
    "$vocabulary": ["$schema"],
    "etc.": []
  }
}

Or maybe it's fine to have $vocabulary without $schema as well. We can say that $schema implies $vocabulary and vice versa, and that both MAY be used together, but if so they MUST match (in which case "dependencies" above won't be needed).
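
A minimal Python sketch of the two constraints expressed by the meta-schema above, written as a direct check rather than as a full validator. The function name `check_declaration` is hypothetical, and "some_uri" and "whatever" are the placeholders from the example, not real identifiers.

```python
# The example meta-schema from above, with its placeholder values kept as-is.
META = {
    "$schema": "some_uri",
    "$vocabulary": "whatever",
    "type": ["object", "boolean"],
    "properties": {
        "$schema": {"type": "string", "format": "uri"},
        "$vocabulary": {"const": "whatever"},
    },
    "dependencies": {"$vocabulary": ["$schema"]},
}


def check_declaration(schema, meta_schema):
    """Return a list of problems with how `schema` declares its vocabulary.

    Mirrors the two rules in the meta-schema: "$vocabulary" must equal the
    meta-schema's own "$vocabulary" (the "const"), and "$vocabulary"
    requires "$schema" (the "dependencies" entry).
    """
    problems = []
    if not isinstance(schema, dict) or "$vocabulary" not in schema:
        return problems  # nothing declared, nothing to check
    expected = meta_schema.get("$vocabulary")
    if schema["$vocabulary"] != expected:
        problems.append(f"$vocabulary must be {expected!r}")
    if "$schema" not in schema:
        problems.append("$vocabulary requires $schema")
    return problems


# The meta-schema has to pass its own rules, since it validates itself.
print(check_declaration(META, META))
```

Under the relaxed variant, where "$vocabulary" may appear without "$schema", the second check would simply be dropped.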

Ok, now I am done writing...

epoberezkin · 2017-10-06T20:33:47Z

One more thought.

I actually agree that in the future we may need to be able to mix multiple vocabularies that can be used both separately and together. I hope that by then we will have a mechanism that allows defining a separate meta-schema by mixing individual meta-schemas. When (and if) we find ourselves in such a predicament, nothing would stop us from allowing $vocabulary to be an array of strings as well.

I just don't think we are there now and we may never get there, as at the moment we only have two standardised vocabularies: "validation" and "hyper-schema", the latter already inherits from "validation", it cannot be used WITHOUT it.

Once we have mixable vocabularies together with a meta-schema extension mechanism, I would be very happy to support $vocabulary as an array. At the moment we have neither. So why don't we keep things simple for now?

handrews · 2017-10-06T20:40:02Z

@epoberezkin yeah, I think we can sort this out :-)

As you observed from the proposal to move $vocabularies off into the meta-schema, I can drop the per-schema aspect. I want to think on it a bit more, but right now I doubt I have a compelling enough use case for it to push it. The only time you wouldn't just declare all vocabularies in a schema in the root (or meta-schema) is because you have vocabularies that somehow conflict. And... just don't do that.

Pushing the conflicting bits out into separate files and $ref-ing them is an acceptable workaround.

> Not necessarily, as the extended vocabulary knows which vocabulary it extends.

I'm not worried about an extended implementation knowing its base. It has to (if only to delegate it to another library). What I need is for a base implementation to understand an extended vocabulary.

If I have a "handrews-hyper-schema" vocabulary, what I need is for my implementation to be able to recognize my own private extensions. However, I MUST NOT expect peer implementations to support them. What I need, then, is the principle of graceful degradation: If the standard hyper-schema vocabulary is explicitly declared (in the schema or meta-schema), then a standard hyper-schema library can make use of my extended schemas.

I cannot think of any way to provide this without explicitly listing all of the schemas. We can explicitly list them in some form other than a flat list, but that is by far the easiest to handle. Applications don't always validate or even have a copy of the meta-schema, and may not even be able to download one.
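
The graceful-degradation idea can be sketched in a few lines of Python, assuming the flat list form of "$vocabularies". The function name `usable_vocabularies` and the `example.com` URI for the private extension are assumptions for illustration only.

```python
def usable_vocabularies(declared, supported):
    """Return the declared vocabularies that this implementation recognizes,
    in declaration order. Unknown entries are skipped rather than rejected,
    so a standard implementation can still use an extended schema."""
    return [vocab for vocab in declared if vocab in supported]


VALIDATION = "http://json-schema.org/draft-06/schema#"
HYPER_SCHEMA = "http://json-schema.org/draft-06/hyper-schema#"
PRIVATE = "https://example.com/handrews-hyper-schema"  # hypothetical extension

# What the schema author declares in the root schema.
declared = [VALIDATION, HYPER_SCHEMA, PRIVATE]

# A standard hyper-schema library knows only the two standard vocabularies.
standard_library = {VALIDATION, HYPER_SCHEMA}

# The author's own implementation also knows the private extension.
extended_library = {VALIDATION, HYPER_SCHEMA, PRIVATE}

print(usable_vocabularies(declared, standard_library))
print(usable_vocabularies(declared, extended_library))
```

Because the list is carried in the schema itself, neither library needs to fetch or inspect the meta-schema to make this decision.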

> Mixing ui-schema and hyper-schema - really? Do you have a use-case?

Um... yes. HATEOAS-driven UI. It's one of the primary use cases I have for hyper-schema. If there is no standard UI vocabulary, I'll do it with a custom one.

> when we have a mechanism that allows to define a separate meta-schema by mixing individual meta-schemas

I am only willing to accept this as a reason if you have a workable proposal now. I have not heard anything. $vocabularies solves the problem, and your objection (aside from the root schema only thing, which I'm fine changing) is basically that you don't need it. You aren't doing a lot with vocabularies, but I am.

So I have a solution to this problem and you don't. Why should I throw away my solution for your vaporware that may or may not ever materialize?

> I just don't think we are there now and we may never get there, as at the moment we only have two standardised vocabularies: "validation" and "hyper-schema", the latter already inherits from "validation", it cannot be used WITHOUT it.

No, you only see those. We also have three proposals, at least one of which has a de-facto implementation (json-schema-form). You also have no idea what I might be planning to do with schema vocabularies that I'm not proposing a standard vocabulary for (because not all vocabularies need to be standard).

With all this in mind, I prefer that $vocabularies be present in the root schema of the (for lack of a better term) instance schema. We should not require the meta-schema to be accessible.

epoberezkin · 2017-10-07T09:20:27Z

All your problems can be solved either with vocabulary extension (in which case you don't need multiple vocabularies) or with allOf, where different subschemas can be references to other files.

I am well aware of the progress of other vocabularies, but none of them is a published draft at this point, so adding features to the core to support them without having them published is premature.

You are ignoring the main question - how the meta-schema combining multiple vocabularies should be constructed. Once we agree on $merge or any other option from your vote-a-rama this problem will be solved.

I see absolutely no problem with you using multiple vocabularies even if only one is specified in the spec - there are many people using $data, for example, without any problem. I keep saying that the usage practice should precede the spec. So at this point we really only need $vocabulary, singular. Later we can allow $vocabulary to be plural by allowing an array of strings (same as with "type").

The argument "I need it, you have no idea what I might be planning to do with schema vocabularies, and therefore it should be added to the spec" is neither proper nor convincing. Firstly, these plans should be explained to and discussed with a wider community. Secondly, I would like to see substantially more support for mixing multiple vocabularies at this point, before we have more than 2 vocabularies standardised, when the second is an extension of the first (so no mixing is needed).

To summarise, I am categorically against supporting mixing schema vocabulary at this point of JSON schema evolution - there is no proven need for it.

handrews · 2017-10-08T02:08:53Z

@epoberezkin I am going to talk with the other active project members before continuing.

handrews · 2017-10-08T02:20:03Z

I'm also just going to close this for now. I don't think the conversation is productive for others to read at this point, and I want people to focus on the hyperschema rewrite.

handrews requested review from awwright and epoberezkin October 4, 2017 05:54

handrews added core Priority: High Status: Review Needed Type: Enhancement labels Oct 4, 2017

handrews added this to the draft-07 (wright-*-02) milestone Oct 4, 2017

This was referenced Oct 4, 2017

Understanding extended meta-schemas #314

Closed

Why is $schema restricted to root schemas? #431

Closed

Forbid "$schema" in subschemas

9c5eb1f

This modifies the meta-schemas to explicitly forbid "$schema" in subschemas, and adds "$vocabularies".

handrews force-pushed the vocab branch from d7a2984 to 9c5eb1f Compare October 4, 2017 16:29

Reference $vocabularies in other spec.

4fc265c

Note the intended use of the meta-schemas includes both "$schema" and "$vocabularies".

epoberezkin suggested changes Oct 6, 2017

handrews closed this Oct 8, 2017

handrews deleted the vocab branch August 23, 2019 03:26
