format for non-string data types (complex validation rules) #759

Closed
m-mohr opened this issue Jul 9, 2019 · 4 comments · Fixed by #764

@m-mohr

m-mohr commented Jul 9, 2019

Transferred from the JSON Schema Slack Channel...

Background: I'm working on a project that passes workflows as JSON from clients to servers. We mostly validate these workflows with JSON Schema and (aim to) do the complex validation with format (see also https://github.com/Open-EO/openeo-processes/issues/67). As a result, we have quite an extensive list of formats. Some examples:

  • collection-id checks in the database whether the specified dataset exists and can be used in the workflow. It's a string and the validation with ajv works well.
  • epsg-code checks against the EPSG code database. It's an integer and doesn't work with ajv as ajv only supports format on numbers and strings.
  • We also have formats for objects and arrays. These usually need to check the whole object or array and not individual items. For example, one property influences the validation of a second property, or I'd like to check that a two-element array doesn't contain two null values.
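
To make the examples above concrete, here is a minimal sketch of format checkers that accept any JSON value, not just strings. It does not use ajv or any other library. The registry, the function names, the format name `non-null-pair`, and the small set of EPSG codes are all assumptions made for illustration; a real `epsg-code` check would query the EPSG database.

```python
# Stand-in for a real EPSG database lookup (hypothetical sample of codes).
KNOWN_EPSG_CODES = {4326, 3857, 32633}


def is_epsg_code(value):
    """Check integers against the known codes; other types are not this format's concern."""
    # bool is excluded explicitly because it is a subclass of int in Python.
    if isinstance(value, bool) or not isinstance(value, int):
        return True
    return value in KNOWN_EPSG_CODES


def is_non_null_pair(value):
    """Check a two-element array as a whole: both items must not be null at once."""
    if not isinstance(value, list) or len(value) != 2:
        return True
    return not (value[0] is None and value[1] is None)


# Hypothetical registry: format name -> checker that receives the whole instance.
FORMAT_CHECKERS = {
    "epsg-code": is_epsg_code,
    "non-null-pair": is_non_null_pair,
}


def check_format(name, value):
    """Return True if the value satisfies the named format; unknown formats are ignored."""
    checker = FORMAT_CHECKERS.get(name)
    return True if checker is None else checker(value)
```

With this sketch, `check_format("non-null-pair", [None, None])` is rejected while `check_format("non-null-pair", [None, "2019-07-09"])` is accepted, because the checker sees the array as a whole rather than its individual items.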

Specifying format on non-string data types doesn't seem very uncommon. OpenAPI also has formats for non-string types, for example int32/int64 for integer and float/double for number.
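
As a rough illustration of what a numeric format can mean, the sketch below treats OpenAPI's int32 and int64 as signed 32-bit and 64-bit ranges. The function name and the decision to reject non-integers outright are assumptions for this example, not behaviour taken from any particular validator.

```python
# Signed ranges corresponding to the OpenAPI integer formats.
INT_RANGES = {
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def fits_integer_format(name, value):
    """Return True if value is an integer within the range of the named format."""
    # bool is excluded explicitly because it is a subclass of int in Python.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    low, high = INT_RANGES[name]
    return low <= value <= high
```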

Problem: The JSON spec itself only defines formats on string values. Also, custom formats are described in the "Understanding JSON Schema" documentation as part of the string data type. This seems to lead to the problem that several libraries only implement custom formats for strings (or a subset of JSON data types). For example, ajv currently only supports format to be defined for strings and numbers, see also ajv-validator/ajv#1039. kin-openapi also only supports string formats (yes, it's OpenAPI, but it seems to be the same problem).

In contrast, https://json-schema.org/latest/json-schema-validation.html#format doesn't mention this restriction and explicitly states for individual formats that they apply to strings ('These attributes apply to string instances.'), which would not be necessary if format were only defined for strings anyway. Therefore, I couldn't really figure out from the spec whether format is meant to be only for strings or for all data types.
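
One way to read the per-format wording in the spec is that each format declares the instance types it applies to, and instances of any other type simply pass. The sketch below follows that reading. The `FORMATS` table, the helper names, and the sample EPSG codes are assumptions for illustration; `ipv4` stands in for the predefined string formats and `epsg-code` for a custom integer format.

```python
import ipaddress


def json_type(value):
    """Map a Python value to a JSON Schema type name (integers kept separate from floats)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def looks_like_ipv4(text):
    """Return True if text parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


# Hypothetical table: format name -> (applicable instance types, predicate).
FORMATS = {
    "ipv4": ({"string"}, looks_like_ipv4),
    "epsg-code": ({"integer"}, lambda code: code in {4326, 3857}),
}


def format_is_valid(name, instance):
    """Apply a format only to the instance types it declares; everything else passes."""
    entry = FORMATS.get(name)
    if entry is None:
        return True  # unknown format: nothing to check
    applicable_types, predicate = entry
    if json_type(instance) not in applicable_types:
        return True  # format does not apply to this type
    return predicate(instance)
```

Under this reading nothing ties format to strings: the string-only behaviour of the predefined formats comes from their own list of applicable types, and a custom format can declare a different list.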

Proposal: Therefore, I have asked on the Slack channel for a clarification in the spec (see the discussion).

@handrews wrote:

Technically you could do all sorts of things with format, and yes, technically implementations that only support format with strings have a bug.
BUT, format is problematic because it was an early, somewhat half-assed attempt at extensibility for complex semantic concepts. It's not consistently supported (or, as you noted, always correctly supported), and there is a tremendous amount of debate over what is really required for proper support.
Now we have the concept of annotation keywords, which can signify arbitrarily complex concepts to applications without dealing with format's problems.

So it seems there should be a clarification in the spec that format is allowed for all data types. This would clarify and help to convince implementers to support format for more data types.

Alternative: In the Slack discussion, an alternative was discussed. It was proposed to use custom keywords, which seem to be supported by ajv (JS) and Manatee.Json (.Net), but I'd still want to figure out how well supported these are by the libraries out there (e.g. in Python land). I'd be happy to switch over to custom keywords if they offer the same functionality and are well supported.
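
For comparison, the custom-keyword approach might look roughly like the sketch below. This is a toy validator written for this example, not the API of ajv, Manatee.Json, or any Python library. The keyword name `x-not-all-null`, the handler signature, and the error strings are all hypothetical.

```python
def not_all_null(enabled, instance):
    """Custom keyword handler: a non-empty array must contain at least one non-null item."""
    if not enabled or not isinstance(instance, list) or not instance:
        return True  # keyword switched off, or not applicable to this instance
    return any(item is not None for item in instance)


# Hypothetical registry: keyword name -> handler(keyword_value, instance) -> bool.
CUSTOM_KEYWORDS = {
    "x-not-all-null": not_all_null,
}

# Only the two container types are needed for this sketch.
TYPE_CHECKS = {
    "array": lambda value: isinstance(value, list),
    "object": lambda value: isinstance(value, dict),
}


def validate(schema, instance):
    """Return a list of error messages; an empty list means the instance is valid."""
    errors = []
    expected = schema.get("type")
    if expected in TYPE_CHECKS and not TYPE_CHECKS[expected](instance):
        errors.append(f"expected type {expected}")
    # Custom keywords are dispatched by name, independently of "format".
    for keyword, handler in CUSTOM_KEYWORDS.items():
        if keyword in schema and not handler(schema[keyword], instance):
            errors.append(f"{keyword} failed")
    return errors
```

A schema such as `{"type": "array", "x-not-all-null": True}` then rejects `[None, None]` and accepts `[None, 5]`. The validation logic is the same as with a custom format; what changes is that the rule has its own keyword instead of sharing format.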

It was also suggested to deprecate format in favor of custom keywords. @handrews wrote:

Actually, I really like this idea. format causes nothing but pain, really. For users, maintainers, and for us. It would still be supported in the forthcoming draft, but deprecated to encourage usage of the vocabulary concept. Which would be great because then maybe people would be motivated to use vocabularies and give us feedback on them
We could say that format will not be removed unless/until vocabularies prove to be an adequate replacement.

(Although the alternative seems to be a good way forward, clarification regarding format still would be worth the effort, I think.)

For the full discussion, please see the Slack channel. My quotes are only short excerpts.

@handrews
Contributor

handrews commented Jul 9, 2019

> Problem: The JSON spec itself only defines formats on string values.

This is absolutely not true. JSON Schema does not restrict the sort of values that format can be used with. You should file bugs with Understanding JSON Schema, AJV, etc. for any claims that format cannot work with non-strings, or failure to allow that. This is not a spec problem, it's a problem with other projects not understanding the spec.

@Relequestual
Member

> it's a problem with other projects not understanding the spec.

Or plainly lazy implementation.


I discussed this on Slack and suggested filing an issue. I agree it's an implementation issue, but I also agreed it could be made clearer in the spec that this is possible, without having to say it explicitly.

@m-mohr
Author

m-mohr commented Jul 9, 2019

@handrews Wait, that's a misunderstanding. My intention was to say that all pre-defined formats in the spec (e.g. uri) are for string types. There's no pre-defined format for any other data type in the spec. That could lead to confusion as people may think it's meant to be only for strings.

That said, I assumed "Understanding JSON Schema" is somewhat part of the official spec as it's hosted at https://json-schema.org - I can file an issue for it, but I don't know where to do that (edit: found it, opened an issue: json-schema-org/website#187). I also already filed an issue for ajv (see referenced issues above), but I didn't have clear docs that helped me prove my point that implementations should/could allow format for non-string types.

@Relequestual Fully agreed. That's the only thing I'm personally aiming for, a clarification.

> This is not a spec problem, it's a problem with other projects not understanding the spec.

If devs don't understand the spec it is eventually a spec problem. I'm assuming devs are somewhat intelligent... ;-)

@handrews
Contributor

This will get taken care of by my fix for #646.
