Add $vocabularies, clarify $schema and $ref #432

Closed
wants to merge 3 commits into from

handrews
Contributor

@handrews handrews commented Oct 4, 2017

This addresses issues #314 and #431.

"$schema" is now explicitly intended for meta-schema declaration.
An explanation for why it MUST only be present in root schemas
has been added so we (I) don't forget about it and freak out (again).

The concept of vocabularies is now introduced explicitly, and
the "$vocabularies" keyword is introduced to declare vocabulary
support. For compatibility (and simplicity in the case of
a single standard meta-schema conveying sufficient information),
omitting "$vocabularies" in the root schema causes it to behave
as if the "$schema" value is listed as a vocabulary. This preserves
all existing behavior.
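
The defaulting rule above can be sketched roughly as follows. This is only an illustration: it assumes a schema is held as a Python dict and that "$vocabularies" is an array of URI strings; the function name `effective_vocabularies` and the `example.com` URIs are hypothetical and do not come from the PR text.

```python
from typing import Any, Dict, List


def effective_vocabularies(root_schema: Dict[str, Any]) -> List[str]:
    """Return the vocabularies a root schema declares.

    Assumption: "$vocabularies" is a list of URI strings. When it is
    absent, the "$schema" value is treated as the only vocabulary,
    mirroring the compatibility rule in the description.
    """
    if "$vocabularies" in root_schema:
        return list(root_schema["$vocabularies"])
    if "$schema" in root_schema:
        return [root_schema["$schema"]]
    return []  # nothing declared; the caller chooses its own default


DRAFT06 = "http://json-schema.org/draft-06/schema#"

# Existing draft-06 schema: behaves exactly as before.
legacy = {"$schema": DRAFT06, "type": "object"}

# Schema that opts in to an extra vocabulary (hypothetical URIs).
extended = {
    "$schema": "https://example.com/meta/extended#",
    "$vocabularies": [DRAFT06, "https://example.com/vocab/ui#"],
}

print(effective_vocabularies(legacy))
print(effective_vocabularies(extended))
```

With this reading, a draft-06 schema that never mentions "$vocabularies" resolves to a single vocabulary, its "$schema" URI.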

Finally, a paragraph is added to "$ref" clarifying the conceptual
model, specifically with respect to "$schema" and meta-schema
validation.

This modifies the meta-schemas to explicitly forbid "$schema"
in subschemas, and adds "$vocabularies".
Note the intended use of the meta-schemas includes both "$schema"
and "$vocabularies".
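
As an illustration of the "no $schema in subschemas" rule, here is a minimal sketch of a check done outside the meta-schema. It assumes schemas are Python dicts or booleans and only walks a subset of the draft-06 keywords; the names `subschemas` and `misplaced_schema_keywords` are hypothetical. The walk is keyword-aware so that a property that merely happens to be named "$schema" is not reported.

```python
from typing import Any, Iterator, List, Tuple

# Keywords whose value is one subschema, a list of subschemas, or a map of
# name -> subschema. Only a subset of draft-06 is covered in this sketch.
SINGLE = ("additionalItems", "additionalProperties", "contains", "not",
          "propertyNames")
ARRAY = ("allOf", "anyOf", "oneOf")
MAP = ("definitions", "properties", "patternProperties")


def subschemas(schema: Any, path: str = "#") -> Iterator[Tuple[str, Any]]:
    """Yield (path, subschema) for every subschema below `schema`.

    Paths are for display only; names are not JSON Pointer escaped.
    """
    if not isinstance(schema, dict):
        return  # boolean schemas have no children
    children = []
    for key in SINGLE:
        if key in schema:
            children.append((f"{path}/{key}", schema[key]))
    for key in ARRAY:
        for i, sub in enumerate(schema.get(key, [])):
            children.append((f"{path}/{key}/{i}", sub))
    for key in MAP:
        for name, sub in schema.get(key, {}).items():
            children.append((f"{path}/{key}/{name}", sub))
    items = schema.get("items")  # either one schema or a list of schemas
    if isinstance(items, list):
        children += [(f"{path}/items/{i}", s) for i, s in enumerate(items)]
    elif items is not None:
        children.append((f"{path}/items", items))
    for child_path, child in children:
        yield child_path, child
        yield from subschemas(child, child_path)


def misplaced_schema_keywords(root: Any) -> List[str]:
    """Return the paths of subschemas that carry their own "$schema"."""
    return [p for p, sub in subschemas(root)
            if isinstance(sub, dict) and "$schema" in sub]


DRAFT06 = "http://json-schema.org/draft-06/schema#"
root = {
    "$schema": DRAFT06,  # allowed: this is the root schema
    "properties": {
        "$schema": {"type": "string"},  # a property *named* "$schema": fine
        "child": {"$schema": DRAFT06, "type": "integer"},  # not allowed
    },
}
print(misplaced_schema_keywords(root))
```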
Member

@epoberezkin epoberezkin left a comment

I believe $vocabulary should always be a single string, rather than array, so multiple vocabularies should not be allowed in the schemas. I also believe that $vocabulary should only be allowed in the root schema. I will elaborate in a separate comment.

@handrews
Contributor Author

handrews commented Oct 6, 2017

I believe $vocabulary should always be a single string, rather than array, so multiple vocabularies should not be allowed in the schemas. I also believe that $vocabulary should only be allowed in the root schema. I will elaborate in a separate comment.

The entire purpose of $vocabularies is to clearly support multiple vocabularies, for instance validation + hyper-schema + UI generation + some custom vocabulary.

The nature of vocabularies is that they build on each other, so in order for a general system to tell whether there is a usable vocabulary, they need to know each vocabulary in the stack.

You are essentially saying that you want $vocabularies to be exactly like $schema, which is pointless. $vocabularies was specifically designed to not impose any file-wide requirements on schema processing.

@handrews
Contributor Author

handrews commented Oct 6, 2017

@epoberezkin I could see an alternative where meta-schemas declare which vocabularies (plural) they describe. In this approach, implementations would examine the meta-schema to look for recognizable vocabularies, and instances would declare $schema exactly as in draft-06.

I would be just as happy with that approach.

In both approaches, a conforming implementation that only wants to support validation with no extensions can almost entirely avoid "$vocabularies" as the meta-schema is sufficiently well-known.

@epoberezkin
Member

epoberezkin commented Oct 6, 2017

@handrews, I definitely appreciate the progress, there are quite a few things in this PR we agree on. I will try to summarise what we agree on, please correct me if I am wrong:

  • The purpose of $schema is to define the meta-schema that can be used to validate the schema. Historically it has been used to define the vocabulary as well, but given that the meta-schema can be extended (even though the mechanism is very verbose due to unsolved questions with adding properties in recursive cases), it is in general unfit for the purpose of defining a vocabulary.
  • $schema must only appear in the root schema, as it defines the JSON schema that should be used to validate the current schema as JSON instance
  • $ref should not be seen as inclusion as the included fragment can belong to another schema file (=JSON instance) and therefore it can require a different meta-schema to validate.
  • $vocabulary keyword should be added to define the vocabulary of the schema.

Things we do not agree on (again, please correct me if I am wrong):

@handrews:

  • schema can define multiple vocabularies ($vocabulary is array of strings)
  • $vocabulary can be used in subschemas
  • $vocabulary is URI (of what? spec or meta-schema? if the latter will it be the same as $schema in most cases?)

@epoberezkin:

  • schema needs to support only one $vocabulary (it is a string); but a vocabulary can be extending another vocabulary (in a way hyper-schema extends validation vocabulary)
  • $vocabulary can be used only in root schema
  • $vocabulary is a descriptive identifier defined in the respective I-D (e.g "validation", "hyper-schema", "ui-schema", etc.)

Arguments to allow separate vocabularies in subschemas and allow multiple vocabularies

  • you can package multiple schemas in the same file conveniently
  • you can have schemas that both generate code and UI (e.g.)
  • anything else?

Arguments to NOT allow vocabularies in subschemas:

  • validation concern. Each vocabulary assumes a specific meta-schema for validating the schema, whether the standard or extended meta-schema for this vocabulary is used. If the subschema defines a different vocabulary from the root schema, then it is either not validated (in case $schema allows additional properties) or fails validation (in case $schema prohibits additional properties). @handrews, do you have a solution to this problem?
  • processing concern. In practice, different vocabularies are implemented by different libraries. If the vocabulary is defined only in the root schema, then it is very easy for an application to determine which library should be used for a given file. If the vocabulary can be re-defined (or even worse, extended) in any sub-schema, then one of the following should happen:
    1. application should perform schema traversal, which is not trivial (see json-schema-traversal I had to abstract from ajv)
    2. each (!) library becomes responsible for processing this keyword and somehow notifying the application that some part of the schema should be processed by another library.
    3. a special routing library should be used
    4. Some library that implements all vocabularies should be used - I find it highly unlikely that there will be such a library.
    5. @handrews any workable idea you have in mind? All above seem too complex for practical purposes...
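
To make the root-only case concrete, here is a minimal sketch of how an application might pick a library when the vocabulary is declared once, in the root schema. Everything in it is an assumption for illustration: the `PROCESSORS` registry, the `route` function, the stand-in handlers, and the choice of "validation" as the default when "$vocabulary" is absent. The identifiers follow the descriptive style proposed above ("validation", "hyper-schema").

```python
from typing import Any, Callable, Dict

Processor = Callable[[Dict[str, Any]], str]

# Hypothetical registry: vocabulary identifier -> library entry point.
# The lambdas stand in for real libraries.
PROCESSORS: Dict[str, Processor] = {
    "validation": lambda schema: "handled by validator",
    "hyper-schema": lambda schema: "handled by hyper-schema client",
}


def route(root_schema: Dict[str, Any], default: str = "validation") -> str:
    """Pick a processor by reading a single root-level keyword.

    No traversal is needed because, in this model, subschemas cannot
    re-define or extend the vocabulary.
    """
    vocabulary = root_schema.get("$vocabulary", default)
    try:
        processor = PROCESSORS[vocabulary]
    except KeyError:
        raise ValueError(
            f"no library registered for vocabulary {vocabulary!r}"
        ) from None
    return processor(root_schema)


print(route({"$vocabulary": "hyper-schema", "links": []}))
print(route({"type": "object"}))
```

If subschemas could change the vocabulary, a single lookup like this would no longer be enough, and the application would need one of the traversal or routing options listed above.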

Arguments to NOT allow multiple vocabularies

  • validation concern. I really hope we will eventually be in a place that to use any custom keyword one MUST use extended meta-schema. It will prevent hours spent debugging schemas with mis-spelled keywords. Once we have an effective, elegant and agreed mechanism for schema extension it will be very easy to add additionalProperties: false to all meta-schemas. Multiple vocabularies make this pragmatic approach impossible.
  • processing concern. same as above, it makes it much more difficult for the application to determine how a given schema file should be processed.

So I may be missing something, but I really don't see how a questionable convenience of being able to package multiple schemas in a single file (particularly given that a simple alternative approach is possible - just define a package of schemas in the spec) can outweigh the above concerns.

@epoberezkin
Member

@handrews please hold on - writing more, answers to your questions :)

@epoberezkin
Member

The nature of vocabularies is that they build on each other

I completely agree with that statement

, so in order for a general system to tell whether there is a usable vocabulary, they need to know each vocabulary in the stack.

Not necessarily, as the extended vocabulary knows which vocabulary it extends. hyper-schema extends validation. ui-schema may extend validation. Mixing ui-schema and hyper-schema - really? Do you have a use-case?

You are essentially saying that you want $vocabularies to be exactly like $schema, which is pointless. $vocabularies was specifically designed to not impose any file-wide requirements on schema processing.

No, I am not saying that. The meta-schema can be extended. The whole purpose of using a vocabulary is to define which library should be used.

I could see an alternative where meta-schemas declare which vocabularies (plural) they describe. In this approach, implementations would examine the meta-schema to look for recognizable vocabularies, and instances would declare $schema exactly as in draft-06.

That works for me too, as they are linked. Probably it is even better. But it means that the vocabulary cannot be changed in subschema.

@epoberezkin
Member

Once we have an effective, elegant and agreed mechanism for schema extension it will be very easy to add additionalProperties: false to all meta-schemas

By the way, quite a few people asked how they can prohibit additional properties in the meta-schema.

@epoberezkin
Member

epoberezkin commented Oct 6, 2017

I could see an alternative where meta-schemas declare which vocabularies (plural) they describe. In this approach, implementations would examine the meta-schema to look for recognizable vocabularies, and instances would declare $schema exactly as in draft-06.

The only problem with that approach is that the meta-schema is a schema that should validate itself. So we could say that a schema MAY include $vocabulary, but in this case it MUST have $schema, and the $vocabulary in the schema should be the same as in the meta-schema. The meta-schema for any vocabulary would then look like this:

{
  "$schema": "some_uri",
  "$vocabulary": "whatever",
  "type": ["object", "boolean"],
  "properties": {
    "$schema": {"type": "string", "format": "uri"},
    "$vocabulary": {"const": "whatever"},
    "etc.": {}
  },
  "dependencies": {
    "$vocabulary": ["$schema"],
    "etc.": []
  }
}

Or maybe it's fine to have $vocabulary without $schema as well, we can say that both $schema implies $vocabulary and vice versa, and both MAY be used too but if so they MUST match (in which case "dependencies" above won't be needed).
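
The "MAY use both, but they MUST match" rule could be checked along these lines. This is a sketch under stated assumptions: schemas are Python dicts, meta-schemas are available in a local map keyed by their "$schema" URI, and the names `vocabulary_consistent` and `META_SCHEMAS` are hypothetical. A validator applying the meta-schema above would enforce the same thing through "const"; this only shows the comparison on its own.

```python
from typing import Any, Dict

# Hypothetical local registry of known meta-schemas, keyed by URI.
# "some_uri" and "whatever" are the placeholders from the example above.
META_SCHEMAS: Dict[str, Dict[str, Any]] = {
    "some_uri": {"$schema": "some_uri", "$vocabulary": "whatever"},
}


def vocabulary_consistent(schema: Dict[str, Any],
                          meta_schemas: Dict[str, Dict[str, Any]]) -> bool:
    """Return True unless "$schema" and "$vocabulary" disagree.

    If only one of the two keywords is present, there is nothing to
    compare, so the schema is considered consistent.
    """
    declared = schema.get("$vocabulary")
    schema_uri = schema.get("$schema")
    if declared is None or schema_uri is None:
        return True
    meta = meta_schemas.get(schema_uri)
    if meta is None:
        raise LookupError(f"unknown meta-schema {schema_uri!r}")
    return meta.get("$vocabulary") == declared


ok = {"$schema": "some_uri", "$vocabulary": "whatever", "type": "object"}
bad = {"$schema": "some_uri", "$vocabulary": "something-else"}
print(vocabulary_consistent(ok, META_SCHEMAS))
print(vocabulary_consistent(bad, META_SCHEMAS))
```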

Ok, now I am done writing...

@epoberezkin
Member

epoberezkin commented Oct 6, 2017

One more thought.

I actually agree that in the future we may need to be able to mix multiple vocabularies that can be used both separately and together. I hope that by then we will have a mechanism that allows defining a separate meta-schema by mixing individual meta-schemas. When (and if) we find ourselves in such a predicament, nothing would stop us from allowing $vocabulary to be an array of strings as well.

I just don't think we are there now and we may never get there, as at the moment we only have two standardised vocabularies: "validation" and "hyper-schema", the latter already inherits from "validation", it cannot be used WITHOUT it.

Once we have mixable vocabularies together with meta-schema extension mechanism I would be very happy to support $vocabulary as array. At the moment we have neither. So why don't we keep things simple for now?

@handrews
Contributor Author

handrews commented Oct 6, 2017

@epoberezkin yeah, I think we can sort this out :-)

As you observed from the proposal to move $vocabularies off into the meta-schema, I can drop the per-schema aspect. I want to think on it a bit more, but right now I doubt I have a compelling enough use case for it to push it. The only time you wouldn't just declare all vocabularies in a schema in the root (or meta-schema) is because you have vocabularies that somehow conflict. And... just don't do that.

Pushing the conflicting bits out into separate files and $ref-ing them is an acceptable workaround.

Not necessarily, as the extended vocabulary knows which vocabulary it extends.

I'm not worried about an extended implementation knowing its base. It has to (if only to delegate it to another library). What I need is for a base implementation to understand an extended vocabulary.

If I have a "handrews-hyper-schema" vocabulary, what I need is for my implementation to be able to recognize my own private extensions. However, I MUST NOT expect peer implementations to support them. What I need, then, is the principle of graceful degradation: If the standard hyper-schema vocabulary is explicitly declared (in the schema or meta-schema), then a standard hyper-schema library can make use of my extended schemas.

I cannot think of any way to provide this without explicitly listing all of the schemas. We can explicitly list them in some form other than a flat list, but that is by far the easiest to handle. Applications don't always validate or even have a copy of the meta-schema, and may not even be able to download one.
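
The graceful-degradation idea can be sketched as a simple membership test over a flat list. The assumptions here are that vocabularies are identified by URI strings (the draft-06 meta-schema URIs are used as stand-ins), that the private vocabulary URI is made up, and that `usable_vocabularies` and `can_process` are hypothetical helper names.

```python
from typing import Iterable, List, Set

VALIDATION = "http://json-schema.org/draft-06/schema#"
HYPER_SCHEMA = "http://json-schema.org/draft-06/hyper-schema#"
PRIVATE = "https://example.com/handrews-hyper-schema#"  # hypothetical URI


def usable_vocabularies(declared: Iterable[str],
                        supported: Set[str]) -> List[str]:
    """Return the declared vocabularies this implementation understands,
    preserving the declared order."""
    return [v for v in declared if v in supported]


def can_process(declared: Iterable[str], required: str) -> bool:
    """A base implementation can work with the schema as long as the
    vocabulary it implements is explicitly listed."""
    return required in set(declared)


# An extended schema lists every vocabulary in its stack.
declared = [VALIDATION, HYPER_SCHEMA, PRIVATE]

# A standard hyper-schema library knows nothing about the private one.
standard_library = {VALIDATION, HYPER_SCHEMA}

print(usable_vocabularies(declared, standard_library))
print(can_process(declared, HYPER_SCHEMA))
```

Nothing in this check requires fetching or validating against the meta-schema, which is the property the flat list is meant to provide.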

Mixing ui-schema and hyper-schema - really? Do you have a use-case?

Um... yes. HATEOAS-driven UI. It's one of the primary use cases I have for hyper-schema. If there is no standard UI vocabulary, I'll do it with a custom one.

when we have a mechanism that allows defining a separate meta-schema by mixing individual meta-schemas

I am only willing to accept this as a reason if you have a workable proposal now. I have not heard anything. $vocabularies solves the problem, and your objection (aside from the root schema only thing, which I'm fine changing) is basically that you don't need it. You aren't doing a lot with vocabularies, but I am.

So I have a solution to this problem and you don't. Why should I throw away my solution for your vaporware that may or may not ever materialize?

I just don't think we are there now and we may never get there, as at the moment we only have two standardised vocabularies: "validation" and "hyper-schema", the latter already inherits from "validation", it cannot be used WITHOUT it.

No, you only see those. We also have three proposals, at least one of which has a de-facto implementation (json-schema-form). You also have no idea what I might be planning to do with schema vocabularies that I'm not proposing a standard vocabulary for (because not all vocabularies need to be standard).


With all this in mind, I prefer that $vocabularies be present in the root schema of the (for lack of a better term) instance schema. We should not require the meta-schema to be accessible.

@epoberezkin
Member

epoberezkin commented Oct 7, 2017

All your problems can be solved either with vocabulary extension (in which case you don't need multiple vocabularies) or with allOf, where different subschemas can be references to other files.

I am well aware of the progress of other vocabularies, but none of them is a published draft at this point, so adding features to the core to support them without having them published is premature.

You are ignoring the main question - how the meta-schema combining multiple vocabularies should be constructed. Once we agree on $merge or any other option from your vote-a-rama this problem will be solved.

I see absolutely no problem with you using multiple vocabularies even if only one is specified in the spec - there are many people using $data, for example, without any problem. I keep saying that the usage practice should precede the spec. So at this point we really only need $vocabulary, singular. Later we can allow $vocabulary to be plural by allowing an array of strings (same as with "type").
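
A minimal sketch of the "string now, array of strings later" idea, in the same spirit as "type": an implementation can normalize either form to a list so that the later extension does not break existing schemas. The function name `vocabulary_list` is hypothetical, and the array form is the possible future extension described above, not something currently specified.

```python
from typing import Any, Dict, List


def vocabulary_list(schema: Dict[str, Any]) -> List[str]:
    """Normalize "$vocabulary" to a list of identifiers.

    Accepts a single string (the form proposed here) or a list of
    strings (the possible later extension).
    """
    value = schema.get("$vocabulary")
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


print(vocabulary_list({"$vocabulary": "hyper-schema"}))
print(vocabulary_list({"$vocabulary": ["hyper-schema", "ui-schema"]}))
```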

The argument "I need it, you have no idea what I might be planning to do with schema vocabularies, and therefore it should be added to the spec" is neither proper nor convincing. Firstly, these plans should be explained to and discussed with a wider community. Secondly, I would like to see substantially more support for mixing multiple vocabularies at this point, before we have more than 2 vocabularies standardised, when the second is the extension of the first (so no mixing is needed).

To summarise, I am categorically against supporting mixing schema vocabulary at this point of JSON schema evolution - there is no proven need for it.

@handrews
Contributor Author

handrews commented Oct 8, 2017

@epoberezkin I am going to talk with the other active project members before continuing.

@handrews
Contributor Author

handrews commented Oct 8, 2017

I'm also just going to close this for now. I don't think the conversation is productive for others to read at this point, and I want people to focus on the hyperschema rewrite.

@handrews handrews closed this Oct 8, 2017
@handrews handrews deleted the vocab branch August 23, 2019 03:26
2 participants