conditional selection of alternate schemas (includes "switch" and other options) #64
Based on past experience working with declarative systems that add conditionals, once you do that, you have a programming language and there's no going back. People will want more and more imperative features and it gets harder to explain why they shouldn't go in. It's pretty much the same problem you see in web templating systems that try to not be a language in order to keep logic out of the templates. As admirable as their goal is, they always tend towards being a language as time goes on, and just end up being a really unsatisfying one. In my experience working with teams learning JSON Schema, wrapping one's head around this sort of use of I also confess that I messed myself up multiple times trying to construct a switch that illustrated both the continue and the empty schema the way I wanted. I think that |
Pros:
Cons:
Weighing pros and cons I think this keyword is useful and should be added, again based on my experience of answering users' questions about how to deal with tricky anyOf situations. There are usually two solutions: a verbose one with anyOf, and a succinct and clear one with switch. Conditionals and iterations (=imperative) are already here, whether we like it or not: they can be implemented via combinations of allOf/not etc. This switch:
{
"switch": [
{ "if": { "$ref": "if1" }, "then": { "$ref": "then1" } },
{ "if": { "$ref": "if2" }, "then": { "$ref": "then2" } }
]
}
is equivalent to:
{
"anyOf": [
{ "allOf": [{"$ref": "if1"}, {"$ref": "then1"}] },
{ "allOf": [{"not": {"$ref": "if1"}}, {"$ref": "if2"}, {"$ref": "then2"}] },
{ "allOf": [{"not": {"$ref": "if1"}}, {"not": {"$ref": "if2"}}] }
]
}
with the assumption that subschemas in anyOf are validated sequentially (which is always the case, as far as I've seen). Switch just provides a less verbose and more efficient alternative to it. |
Actually the assumption that anyOf is sequential is not needed... They are equivalent regardless. |
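The equivalence claimed above can be checked mechanically. The following is a minimal Python sketch, not part of any validator: it assumes each of the four $ref targets (if1, then1, if2, then2) either passes or fails for a given instance, models each as a boolean, and compares the two forms over all 16 combinations. The function names are made up for illustration.

```python
from itertools import product


def switch_valid(if1, then1, if2, then2):
    """Switch semantics without "continue": the first clause whose "if" passes decides."""
    for cond, then in ((if1, then1), (if2, then2)):
        if cond:
            return then
    return True  # no "if" matched, so the switch imposes no constraint


def any_of_valid(if1, then1, if2, then2):
    """The anyOf/allOf/not expansion quoted above, one boolean per $ref."""
    return (
        (if1 and then1)
        or ((not if1) and if2 and then2)
        or ((not if1) and (not if2))
    )


def forms_agree():
    """Compare both forms for every combination of if1, then1, if2 and then2."""
    return all(
        switch_valid(*combo) == any_of_valid(*combo)
        for combo in product([False, True], repeat=4)
    )


print(forms_agree())
```

Because the three anyOf alternatives are mutually exclusive, the order in which they are evaluated does not matter, which is why the sequential-evaluation assumption is unnecessary.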
Also, I suggest clarifying the validation process: https://github.com/epoberezkin/ajv/blob/master/KEYWORDS.md#switch-v5-proposal Originally filed here: json-schema/json-schema#205 |
@epoberezkin I definitely consider the imperative aspect to be a more significant problem than you do, so we'll have to get some more folks involved. I think that adding annotations to clarify the intent of the oneOf/anyOf forms is the better route to keep things consistent. |
I am neutral on imperativeness. {
"allOf/anyOf": [
{ "if": { "$ref": "if1" }, "then": { "$ref": "then1" } },
{ "if": { "$ref": "if2" }, "then": { "$ref": "then2" } }
]
} Same as allOf/anyOf, they are de facto imperative and switch can already be implemented via these keywords (as you can see from the example), just in a more verbose and less clear way:
|
I think I like switch more though :) |
Now that However, if you don't want to adopt this issue, I think we can close it given that it hasn't attracted much commentary. If someone else comes along and wants to champion it they can, of course, re-open it or open a new one. Anyway, please comment here within two weeks if you're going to take over advocating this, otherwise I'll close it. |
I think I've posted enough reasons to adopt switch. To summarise, the advantage to use it:
So, why don't we include it? What can I do to help get it accepted? PR? |
That's evidence that something is needed, but not evidence that "switch" is the best solution. In order to have evidence for that, users would also need to be offered the other solution(s) and we would see which is preferred. What I would like to see to move this forward is more discussion of the impact of adding an imperative construct to what is currently a declarative system. I think this is a principle that should not be broken lightly. @awwright , @Relequestual , @jdesrosiers , @sam-at-github ? ( @epoberezkin feel free to page others) |
As I explained above, switch can be expressed using existing keywords. If that's the case, how is it less declarative than the current vocabulary? |
Declarative and imperative programming systems can express the same things. But they are different styles, and I believe that a consistent style is better for JSON Schema in the long run. |
@handrews @awwright how would you feel about dropping
There are ways to achieve the same validation result via existing vocabulary (anyOf etc.), but it is more verbose and it gets really messy from an error-reporting perspective, and also from the perspective of other uses beyond validation (no control over which subschemas get validated). Maybe both deserve separate issues, in which case this can be closed. |
I don't know how I feel about this yet. It seems like a giant can of worms. I think we should encourage declarative methods where possible, and take a look at #31 to see if we can get the same effect that most people who want an if/then paradigm are going after. |
Not sure what is the proposal in #31. What people want is better control of validation with the use of predicates, whatever the syntax is. anyOf/oneOf/not are no more or less declarative than if/then - they all can be seen either as boolean expressions or as imperative. Same about select that uses some predicate to choose schema to validate - it is quite declarative. I don't understand how it can be addressed within the semantics of oneOf etc. |
Thank you for raising the problem and writing the summary! I see minimising impact as the essential point: the scope of the new version now looks very large, so some changes that are hard to implement would be dropped. Otherwise no one could make a big leap to the next version, and we would get more steps towards a solution, or some everlasting draft like RFC 6455, without any power to force implementation developers to follow this draft. So if both solutions (i really appreciate Why not just tweak how reporting is formed, attaching to every failed oneOf/anyOf selector a list of the reasons why each of the given branches failed? This would be enough to heuristically detect missing fields, the best-fitting branch, etc., relying on this info. Pros:
Cons:
After this we will have enough time to think about enhancements, with the initial issue of getting details about what went wrong in a selector generally solved. |
@mrjj :
That's basically what #31 is proposing (once you get past its slightly confusing first few comments).
This change isn't targeted at Draft 06 no matter what happens. Draft 06 core and validation are compatible with Draft 05 and mostly compatible with Draft 04 (the meaning of the "uri" format changed in Draft 05, but that's a pretty minor change). The two potentially breaking changes have been done to still allow the old forms for now, with the expectation that they will be removed in a future draft once folks have had a chance to migrate and/or we can publish a tool to assist with migration. Draft 06 hyper-schema will probably have more significant changes, but only in areas that were already problematic. |
Perfect, so i vote for #31-like solution, sorry i haven't read all relative branches till end. If approach with extending existing reporting seems viable not only for me and @awwright I'll try to help with implementation details. Concern about O complexity for all branches output seems reasonable, but it could be a matter of developers choice expressed by some kind of validator (not schema) flag "allBranchesOutput: true" or smth like this. |
It is already what all validators are doing and that generates a large number of errors in real cases that are very difficult to manage.
That is very theoretic and too "heuristic". None of the validators are doing it and it is highly unlikely that any will, as it is:
So it is very easy to vote for "the solution" that does not and most likely never will exist. @mrjj, you call #31 a "solution" and also say there are some others "like" it. I am really waiting to see at least one solution, as #31 is just a vague idea, not a solution really. So I would wait until a proposal for the spec exists about which algorithm validators should use to determine the "best-fitting" branch. But that seems to me a much less trivial problem to solve than introducing a keyword that provides an easy to implement, performant (O(N) and faster than anyOf etc.) and deterministic solution within the scope of validation. |
@mrjj @epoberezkin I still prefer the The "0/foo" relative JSON Pointer in the The subschema designated with {
"type": "object",
"selectWith": "0/foo",
"oneOf": [
{
"properties": {
"foo": {"enum": ["firstValue"]},
"bar": {
"type": "array",
"items": {"type": "number"}
}
},
"required": ["foo", "bar"]
},
{
"properties": {
"foo": {"enum": ["secondValue"]},
"buzz": {
"type": "string",
"minLength": 10
}
},
"required": ["foo", "buzz"]
}
]
}
|
@handrews If selectWith doesn't affect validation it's useless. If it does, it is not clear how it works. How would a validator know which subschema in oneOf to use based on the value in selectWith? Is it supposed to magically guess that, because there is a properties keyword inside with the same property that is used in selectWith, and that property has an enum inside it with some value, it should use this value? Do you realise what kind of abominable violation of the JSON Schema design principle it is (when only siblings can be taken into account, not grand-grand-grand-children of a sibling as in your case)? What if there is a more complex schema inside rather than just one with properties? I really don't get why you prefer such a complex solution instead of a simple mapping of values to schemas: {
"type": "object",
"selectWith": "0/foo",
"selectCases": {
"firstValue": {
"properties": {
"bar": {
"type": "array",
"items": {"type": "number"}
}
},
"required": ["foo", "bar"]
},
"secondValue": {
"properties": {
"buzz": {
"type": "string",
"minLength": 10
}
},
"required": ["foo", "buzz"]
}
},
"selectDefault": false
} Do you see how much simpler it would be to reason about and to process? Or is the theoretic purity so attractive that the woes of implementers and users don't really matter? /rant :) |
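To make the value-to-schema mapping concrete, here is a minimal Python sketch of how a validator might dispatch on it. Everything here is an assumption for illustration: selectWith, selectCases and selectDefault are only the keywords proposed in this comment, the helper names are invented, only pointers of the form "0/<property>" are handled, a missing selectDefault is assumed to mean "no constraint", and is_valid is a toy validator covering just the keywords in the example.

```python
TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}


def is_valid(schema, value):
    """Toy validator: booleans, type, minLength, items, required, properties."""
    if isinstance(schema, bool):
        return schema
    kind = schema.get("type")
    if kind and (not isinstance(value, TYPES[kind]) or isinstance(value, bool)):
        return False
    if isinstance(value, str) and len(value) < schema.get("minLength", 0):
        return False
    if isinstance(value, list) and "items" in schema:
        if not all(is_valid(schema["items"], item) for item in value):
            return False
    if isinstance(value, dict):
        if any(name not in value for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in value and not is_valid(sub, value[name]):
                return False
    return True


def validate_select(schema, instance):
    """Look up the selector value in selectCases and validate against that case."""
    base = {k: v for k, v in schema.items() if not k.startswith("select")}
    if not is_valid(base, instance):
        return False
    prefix, _, name = schema["selectWith"].partition("/")
    if prefix != "0":
        raise ValueError("this sketch only supports pointers of the form 0/<property>")
    key = instance.get(name) if isinstance(instance, dict) else None
    cases = schema["selectCases"]
    if isinstance(key, str) and key in cases:
        chosen = cases[key]
    else:
        chosen = schema.get("selectDefault", True)  # assumed default
    return is_valid(chosen, instance)


SCHEMA = {
    "type": "object",
    "selectWith": "0/foo",
    "selectCases": {
        "firstValue": {
            "properties": {"bar": {"type": "array", "items": {"type": "number"}}},
            "required": ["foo", "bar"],
        },
        "secondValue": {
            "properties": {"buzz": {"type": "string", "minLength": 10}},
            "required": ["foo", "buzz"],
        },
    },
    "selectDefault": False,
}

print(validate_select(SCHEMA, {"foo": "firstValue", "bar": [1, 2.5]}))
print(validate_select(SCHEMA, {"foo": "secondValue", "buzz": "short"}))
print(validate_select(SCHEMA, {"foo": "other"}))
```

The lookup is a single dictionary access, which is the determinism argument made above; the cost is that keys must be strings, a limitation raised later in the thread.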
I don't know why it should be |
No, because it's not. There's nothing magical about this. With the exception of the fall-through rules (which are the things to which I object most about
|
That is not "the only difficulty", it is a major difficulty
I understand that, but you are ignoring my question: how do you determine which subschema inside oneOf it points to in general case, when absolutely any set of schemas can be inside oneOf? Do you realise that pointer to data cannot be resolved into the pointer to the schema in general case? That it is only possible for the subset of JSON-Schemas? |
Don't care at all about fallthrough, very happy to kill that idea by the way. |
And talking about real-life use cases, a mapping of value to schema, as in the example above, is much more useful (and also more deterministic, as the values are unique) than the switch (where technically multiple ifs can be matched, which makes it look more imperative). |
Without instance data, that is correct, although humans can usually eyeball it in all but the most complex schemas. With instance data, though, it is easy enough- otherwise we couldn't validate anything at all. An alternative would be to use pointers-to-schema instead of pointers-to-data. The same idea works either way, and abstract reasoning would be much easier with pointers-to-schema. I don't actually recall why I wrote it as a data pointer. It does work, but schema pointers would be simpler. |
I don't actually follow your most recent example. If For one thing, JSON Properties must be strings, which is overly limiting. |
|
"eyeball" is not really possible to implement efficiently - it requires deep traversal of all subschemas. You can't really ignore implementations to that extent. In general case it doesn't have solution at all. Imagine oneOf inside oneOf, not, etc. It should be obvious from schema itself which value maps to what, not from "eyeballing". |
@handrews it is a general observation by the way that when you propose some nice theoretic idea you get so attached to it that you don't think how it is going to be implemented. But without implementations the standard is dead. You really have to change that approach and before proposing something try imagining yourself writing actual code that implements your idea. If you can clearly see that code, then the idea is fine. If it requires some complex and non-deterministic algorithms, tree traversals, etc. then this idea is better forgotten. Your selectWith is one such idea - impossible to implement efficiently, if not at all. |
You throw this at me a lot and I'm getting tired of it. You have no basis for it, you just don't like my ideas and find it easier to attack me personally than to reason about my ideas. I have written and am writing implementations, and I am quite conscious of the challenges involved. I work in a corporate environment so I can't just toss said implementations up on GitHub to placate you. You are essentially calling me both careless and stupid, and neither is an appropriate negotiating strategy in a project such as this. |
Nope. Just preferring theory over practice. |
OK, we're done. |
You are insistent on ascribing motivations to me. You insult me for not finding your solution "obvious" or agreeing with you on what is simple. You are not behaving as a good-faith member of this community, and I will not continue to discuss this with you. |
@handrews I am sorry you see it that way. I did not intend to insult you at all. You were using a stronger language on a number of occasions, both towards myself and other people, so I am not sure why you're sensitive in this case. And you closed your own issue by the way :) |
You've done it on a number of occasions without me even commenting on it by the way. And I do reason about your ideas. You just avoid answering the questions... |
Anyway, I am sorry. It wasn't my intention to cause any offence. |
Thank you. I can re-open this if you want. I closed it because I think #31 is sufficient and I did not feel like being attacked further for an issue I don't even care that much about.
Feel free to point any out, past or present, and I will make amends. While I will express technical opinions strongly, and push back on things like user statistics that aren't backed up with research, I do try to avoid ascribing motivations and attacking intelligence. I am generally happy to be called out on such things when they happen, as they are not intended. I am particularly sensitive about the "theoretical" comment both because of what it implies about me as a person (that I am dismissive of the concerns of others) and the fact that, because of the nature of my work, I cannot currently easily disprove it. It is frustrating not to be able to show examples right now, and to have you hammer me on a topic I cannot address despite the existence of evidence to the contrary makes it very difficult for me to want to discuss anything with you. Particularly when your response to my attempt to explain what I do consider is ignored and you just repeat your accusation. There aren't that many of us who are fully active on this project- basically you, me, Austin and Ben. I know I am not perfect in my interactions, but I am trying to work out how to make it work for all of us. Please do point out when I fail at this, but please also consider trying to meet me halfway. You don't know what's going on in my head or what work I have done that I cannot publish. Please think about that before characterizing me in a particular way. |
I admit that I do ignore some of your questions. This is because you are so combative on every possible point that I have started to try to minimize our discussions. It just seems to me that after a certain point, there is no benefit to replying to you- it will provoke another round of argument without advancing either side. And I'll probably lose my temper and say something regrettable. So instead I just stop replying. Perhaps we can both work on dialing it down a bit? :-) I would like to keep all of the discussions going, but I am worn down by the forcefulness of every single discussion. I can't do but so much about everyone else, but for myself given the choice between escalating and dropping out, I am increasingly choosing to drop out. Escalating wasn't working. |
You wrote: "I figured out that you have an excessively complex mental model for $ref but that didn't change my point of view that I find it extremely simple to work with and implement" That was a bit too personal. But that's fine, no offence taken. So far my mental model is the only one that passes all the tests in JS land... ;)
Sure, thank you. I will do my best...
I will try to be more patient. I was just getting a bit frustrated because our interactions reminded me of your other conversations, when you are asking questions and get some unrelated answers...
It doesn't imply anything, from my point of view, at least. Some people start from practical considerations, some from theoretic, I wasn't judgemental about it at all, both approaches work... I was just hoping that highlighting it may help understand where the difference comes from... A bit of "British Parliament style" discourse was unnecessary though, sorry.
I am happy to agree with #31 provided some specific and efficient, both from performance and implementation perspective, algorithm of mapping a value to a subschema is proposed and agreed. So I'd appreciate if we considered an explicit mapping or some other ideas in this issue a viable alternative until then. |
Blah. Not my finest moment. You are right, it was inappropriate and I apologize.
I think it's safe to say we often have different points of view :-) But I accept that that was not your intention and will do my best to keep that in mind going forward. There's a bit more going on this weekend that has me on edge (I'll send you an email- not worth going into on GitHub) and I'm sorry that has made my temper worse than usual. Let's pick up the |
By the way, I keep saying that I'd rather have a much simpler if/then pair of keywords than switch. Multiple such pairs can be combined with anyOf/allOf/oneOf, providing a simpler alternative to switch. |
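As a rough illustration of that idea, the Python sketch below models an if/then pair as a predicate and combines two pairs with allOf. It assumes that a pair imposes no constraint when its "if" does not match, which this comment does not spell out; the helper names and the conditions on foo, bar and buzz are hypothetical.

```python
def if_then(cond, then):
    """An if/then pair as a predicate: it only constrains instances matching "if"."""
    return lambda instance: (not cond(instance)) or then(instance)


def all_of(*schemas):
    """allOf as a predicate: every subschema must accept the instance."""
    return lambda instance: all(schema(instance) for schema in schemas)


# Hypothetical conditions on a dict instance, for illustration only.
def is_first(inst):
    return inst.get("foo") == "firstValue"


def is_second(inst):
    return inst.get("foo") == "secondValue"


def has_bar(inst):
    return "bar" in inst


def has_buzz(inst):
    return "buzz" in inst


schema = all_of(if_then(is_first, has_bar), if_then(is_second, has_buzz))

print(schema({"foo": "firstValue", "bar": [1, 2]}))
print(schema({"foo": "secondValue"}))
print(schema({"foo": "other"}))
```

Unlike a switch without continue, every pair whose "if" matches applies its "then" here, so there is no first-match ordering to reason about.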
The Problem (and current workarounds)
A common use case is to select an overall validation schema (or schemas) based on how a small subset of the instance validates. In the simplest case, some property in the instance is checked against a set of literal values, and the overall validation schema(s) are chosen based on that literal value. This is generally implemented with oneOf (or anyOf) and enum (although see also the constant proposal in issue #58). Some forms of this problem may also be solved with dependencies.
Note: Throughout this proposal, the elements of the oneOf/anyOf lists are referred to as branches, as in this case they are being used as implicit (or in some options, explicit) conditionals.
Single selection with oneOf
This can be read as "if foo is firstValue, bar must be present and must be a list of number, otherwise if foo is secondValue, buzz must be present and a string that is at least 10 characters long":
One difficulty with this approach is that the cause and effect are not clear. This could just as easily be read as "If bar is present as a list of numbers, foo must be set to firstValue. Otherwise if buzz is present and a string of at least ten characters, foo must be set to secondValue."
This is both a strength and a weakness. The oneOf construct can capture complex alternatives, but does not clearly express the idea of one part of the schema being the determining factor. In a small schema like this, either interpretation is easy to read, and it’s easy to spot the enum and guess that it is probably the determinant. In a more complex schema, where maybe there are two values of the enum that select one branch and one in the other, that is much less easy to spot.
Multiple selection with anyOf
Here is a similar example using anyOf, where foo being set to multiSelect can potentially validate against two branches (although it is only required to validate against one or the other). If foo is set to singleSelect it must validate against the lone schema that accepts that value. Multiple matches are prominently addressed in one of the proposals, so we will use this example to consider the options.
In order to require validation against both "multiSelect"-designated schemas, an allOf must be introduced:
Single selection with dependencies
This schema validates the same set of instances as the schema above that uses oneOf:
Note that dependencies can only specify things based on the presence or absence of properties, so the "if bar is present, else if buzz is present" interpretation must be used for this approach. In some cases, that is exactly what needs to be expressed, but it seems to be more common to use a value as the determinant rather than the presence or absence of a particular property.
If the difference between foo being set to firstValue or secondValue was a difference in exactly how bar is validated (and buzz was not part of the schema at all), then the oneOf approach still works just fine, but the dependencies approach is impossible.
Multiple selection with dependencies
Again, the logic is inverted from the most intuitive reading, with the presence or absence of the other properties determining the value of foo. Since (in this multi-select example) "foo" is the only required property, it’s just about possible to make out the intention that "foo"’s value determines how "bar", "buzz", or "zippy" is validated. But it is arguably substantially less clear than the anyOf example, and as with the single selection example, dependencies cannot handle selection based purely on a value.
The proposals
There are two possible approaches, one of which has two variants:
- A switch validation keyword, more or less as seen in many programming languages. Originally proposed by @geraintluff (with additional discussion in the old repo).
- Annotation keywords that clarify the schema author’s intent. This idea derives from @mrjj’s bounding, and I am proposing an alternate syntax here.
Clarifying intent with an annotation property
This approach does not change validation at all. Rather, it adds one or two annotation properties that allow schema readers or documentation generators to understand the intent of the schema author for how branches are selected.
selectWith: pointers from outside the branches
selectWith is an annotation keyword that appears at the same level as a oneOf or anyOf. It is either a single Relative JSON Pointer or a list of them. The pointers indicate which properties (or array indices, for that matter) are intended to determine which branch of the oneOf (or branches of the anyOf) is/are taken.
The values must allow Relative JSON Pointers (which include regular JSON Pointers) in order to allow a schema to be included in another schema as a child schema. Otherwise, the pointer would need to always have the correct full path, severely limiting re-use capabilities. The pointer is resolved with respect to the instance structure.
As an annotation property, selectWith cannot affect validation. Setting it to point to a non-existent property is legal and does not produce an error (following the general principle that nonsensical schemas are valid). Setting it to a property that will only exist on some branches is also possible and to be expected. Unspecified but allowed instance properties/array elements by default have a blank schema, allowing anything.
Here is our single-select example rewritten with selectWith:
Recall that the pointer is relative to the instance structure, so "0/foo" reads as "the schemas used to validate this instance property are the ones which determine which branch is taken."
The selectWith for the multi-select anyOf would be identical.
selector: booleans within each branch
selector is an alternative syntax directly derived from @mrjj’s bounding proposal (so called because it put bounds on what parts of the schema needed to be fully processed, and therefore constrained error reporting only to the most relevant branches of oneOf/anyOf constructs).
Instead of one annotation keyword at the top, selector is a boolean annotation keyword that may appear anywhere within child schemas in a branch. If selector is true, the effect is essentially the same as putting a pointer to that location in selectWith.
The only difference is that selectWith pointers are applied to all branches, while selector can be placed in different locations in different branches (and some branches may not have any selector). However, since unspecified properties/array elements have a blank schema (allowing anything) by default, the end effect is the same. The validation outcome remains unchanged no matter which proposal is used.
Here is the single select example using selector:
@mrjj’s original purpose with bounding was to narrow the scope of validation and therefore produce more specific errors. The approach is simply to validate anything marked "selector": true first, as anything that fails the selector validation will fail validation of the entire branch, so it is not necessary to proceed further with (or report errors related to) that branch.
selectWith and selector comparison
While they may produce slightly different short-circuit validation behavior, neither of these changes the validation outcome.
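The selector-first strategy can be sketched in a few lines of Python. This is an illustration only: selector is the proposed annotation keyword, the function names are invented, only selectors on top-level properties are looked at (the proposal allows them anywhere), and errors() is a toy validator covering just the keywords used in the single-select example.

```python
TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}


def errors(schema, value, path=""):
    """Collect error messages for the handful of keywords used below."""
    found = []
    kind = schema.get("type")
    if kind and not isinstance(value, TYPES[kind]):
        found.append(f"{path or '/'}: expected {kind}")
    if "enum" in schema and value not in schema["enum"]:
        found.append(f"{path or '/'}: not in {schema['enum']}")
    if isinstance(value, str) and len(value) < schema.get("minLength", 0):
        found.append(f"{path or '/'}: shorter than {schema['minLength']}")
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            found += errors(schema["items"], item, f"{path}/{index}")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                found.append(f"{path}/{name}: required")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                found += errors(sub, value[name], f"{path}/{name}")
    return found


def selector_errors(branch, value):
    """Errors from only those property schemas marked "selector": true."""
    found = []
    for name, sub in branch.get("properties", {}).items():
        if sub.get("selector") is True and isinstance(value, dict) and name in value:
            found += errors(sub, value[name], f"/{name}")
    return found


def one_of_with_selectors(branches, value):
    """Return (valid, errors), reporting errors only for branches whose selectors pass."""
    matches, relevant = 0, []
    for branch in branches:
        if selector_errors(branch, value):
            continue  # the whole branch must fail, so skip it and its errors
        branch_errors = errors(branch, value)
        if branch_errors:
            relevant += branch_errors
        else:
            matches += 1
    return matches == 1, relevant


BRANCHES = [
    {"properties": {"foo": {"enum": ["firstValue"], "selector": True},
                    "bar": {"type": "array", "items": {"type": "number"}}},
     "required": ["foo", "bar"]},
    {"properties": {"foo": {"enum": ["secondValue"], "selector": True},
                    "buzz": {"type": "string", "minLength": 10}},
     "required": ["foo", "buzz"]},
]

print(one_of_with_selectors(BRANCHES, {"foo": "secondValue", "buzz": "short"}))
```

A skipped branch would have failed anyway, so the pass/fail result matches plain oneOf; only the reported errors shrink to the branch the selector points at.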
selector appears within the schema doing the selecting, which makes its effect obvious as soon as you spot it. On the other hand, spotting the selectors scattered throughout a complex set of branches is tedious and error-prone, and implementations will need to walk the branches and locate all of the selectors before being able to use them for short-circuit validation or anything else.
selectWith requires a bit more interpretation for humans who may have to eyeball how a long JSON pointer actually lines up with the branches. However, all selectors are gathered in one place and can be used as soon as they are encountered.
It would be possible to use both, for flexibility (which is why I gave them different names: something may be a selector for an outer oneOf while specifying a selectWith for an inner oneOf). I feel like having both adds complexity without providing much gain.
I am obviously biased, but I prefer selectWith simply because it reads much more clearly from the top down (assuming you put it above your branches). It clearly says "These fields are intended to determine which branch should validate." You can then look across the branch schemas and see what the selection conditions are. Which might be a bit tricky if the branches are complex, but no more so than trying to spot the selector keywords.
selectWith also more closely matches how a documentation generator would use it: the documentation would reference it as part of the description of the whole branch set, so with selector it would have to collect them into essentially the selectWith value anyway.
switch
(much of this section’s wording is copied directly from @geraintluff)
The purpose of the switch keyword is to express a series of conditional relations: "If A1 then B1, else if A2 then B2, else ...".
Values for switch
The value of switch is an array. The entries in the array must be objects, each containing:
- then: a schema or a boolean
- if: a schema
- continue: a boolean
Validation of switch
For each object in the switch array:
- If "if" is specified:
  - if the data is not valid against "if", then continue to the next item in switch
- If "then" is a boolean:
  - if "then" is false, then validation fails
- If "then" is a schema:
  - if the data is not valid against "then", then validation fails
- If "continue" is set to boolean true, then:
  - continue to the next item in switch
switch examples
Here is our regular single-select implemented with switch:
And here is our regular multi-select. (Since nothing but foo is required and additional properties are allowed, it’s a bit silly to specify "bar" and "buzz" as separate schemas, but pretend they are two schemas that make sense to do this way because I don’t want to go redo all of the examples).
This actually isn’t very interesting: since the two branches associated with a foo of "multiSelect" are more concisely managed with an inner anyOf, the switch can once again only choose one of its conditions. Here is a more complex example adapted from the original proposal:
In this example, if there is a yellow indicator, there must also be a warning message.
Whether there is a warning or not, a high enough "powerLevel" requires "disbelief", otherwise it requires "confidence".
Since the "indicator" branch specifies "continue": true, we go ahead and check the other conditions whether we have a yellow indicator or not. And since continue is not specified on the "powerLevel" branch, if we match that condition we will not examine the remaining branches.
Because the last branch does not have an "if" schema, it will always match if we reach it. So the only way we do not match it is if we match the minimum power level, as that will end the processing of the switch before we consider the final branch.
Additionally, the point of allowing then to be a boolean is to provide a concise expression to say that the data must be one of the supplied options, e.g.:
Comparing the options
selectWith/selector:
anyOf/oneOf any more intuitive
switch: