Explain format (and content*) more clearly #646
Comments
The spec could make it even clearer that As for |
@johandorland for
I want it to be easier for implementation authors to say "yes, we implement it up to this point, but if you want more you can add it easily". |
@handrews In that case we're on the same line. It's how I interpreted the spec already, but after reading up a bit I can imagine not everyone does. |
@johandorland thanks, this will all help make the wording more clear! |
clearer* |
I like the idea of supporting validation for these loose keywords via "plugins." It's not something I had considered. It'll be available in my next version! |
On the topic of email regex, y'all gotta read https://www.regular-expressions.info/email.html On the topic of this issue: |
I'm surprised that no one's mentioned #563 yet. Seems pertinent to me. I think implementations allowing their clients to append/modify the format validation that is available is the way to go. I'm taking a page from ajv's book and opening up the format validation so that, while I provide some stock validations, my client can define their own or even override the ones I have to suit their specific needs. |
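A minimal sketch of the "stock validations that clients can extend or override" idea described above. The `FormatRegistry` class, its method names, and both email patterns are hypothetical illustrations, not the API of ajv or of any other implementation mentioned in this thread.

```python
import re
from typing import Callable, Dict

FormatCheck = Callable[[str], bool]


class FormatRegistry:
    """Hypothetical registry mapping format names to checker callables."""

    def __init__(self) -> None:
        self._checks: Dict[str, FormatCheck] = {}

    def register(self, name: str, check: FormatCheck) -> None:
        # Registering an existing name replaces the previous checker,
        # which is how a client overrides a stock validation.
        self._checks[name] = check

    def validate(self, name: str, value: str) -> bool:
        check = self._checks.get(name)
        # Formats without a checker pass: nothing is asserted about them.
        return True if check is None else check(value)


def make_default_registry() -> FormatRegistry:
    """Build a registry with one deliberately loose stock checker."""
    registry = FormatRegistry()
    loose_email = re.compile(r"[^@\s]+@[^@\s]+")
    registry.register("email", lambda s: loose_email.fullmatch(s) is not None)
    return registry
```

A client that wants stricter behaviour can call `register("email", ...)` again with its own checker, and formats the implementation does not know about pass through unvalidated.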
@gregsdennis if you have any concerns on the implementation requirement wording around |
Concern about the
It would be good to go into more detail. "7bit" is possible in JSON but useless (a validator would have to reject any string containing a character with the high bit set); "8bit" and "binary" are not possible in JSON. "base64" is useful. "quoted-printable" is possible, but is there any use for it? Extension tokens are possible. Can we say that if contentEncoding / contentMediaType are supported by a validator, then contentEncoding MUST support base64, MAY support extension encodings or 7bit/quoted-printable (through plugins), and that 8bit/binary are NOT ALLOWED? |
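For the base64 case, a rough sketch of what such a check might look like using only the Python standard library. The function name `is_valid_base64` is hypothetical, and this only tests that the string decodes under the standard alphabet with correct padding; it says nothing about the decoded media type.

```python
import base64
import binascii


def is_valid_base64(value: str) -> bool:
    """Return True if value decodes as standard, padded base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        # binascii.Error: bad padding or non-alphabet characters.
        # ValueError: non-ASCII characters in the input string.
        return False
    return True
```

Whether a validator should treat a failure here as an assertion failure or only report it is exactly the open question in this thread; the sketch takes no position on that.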
@ebolwidt this keyword (and |
@handrews I see what you're saying. I'm not so much interested in contentMediaType validation, which is probably too broad a topic in any case - but in production applications I'm using base64 quite often, and validating that a string is valid base64 is useful. It seems it could be done with a regex (https://stackoverflow.com/questions/475074/regex-to-parse-or-validate-base64-data), although it's not clear whether this caters to every possible corner case - but it would be very useful to be able to easily assert, with either contentEncoding or format, that a string should be valid base64 data, like |
@ebolwidt this is getting into a very deep topic that will be a focus in draft-09 which is what to do with the fact that the There are some possible ideas around extending draft-08's vocabulary concept to help manage this, but it has been punted to draft-09 because it's really not obvious what the best option is. We will come back to this over the next few months. In the meantime, if you want to guarantee validity, using |
From what I am reading, it appears that you all may be moving away from using Currently (draft-07) the Regardless of where you end up going with (As for why I care, my company had code using a JSON schema validator library blow up today because we have customers with leading digits in their email addresses. Our library had used their hostname validator for the domain part of the email validator, but their hostname validator was only compliant with RFC 1034. Technically this should have been allowed by their email address validator, but the inconsistency in the spec caused an unfortunate shortcut.) |
@bvisness I haven't the foggiest idea why RFC 1123 wasn't referenced, that predates the involvement of any of the current spec maintainers. That's easily fixed in the next draft. As far as "moving away from using Basically, "optional validation" is horribly confusing, but for some formats (notably email) truly reliably validating them is quite burdensome. |
Makes sense. Honestly I think I would prefer if JSON schema didn't have any format validation at all - I might even prefer if it didn't have Obviously, as a user, it's convenient to have someone else do all the validation work for you. But since it appears the long-term "fix" is all this meta-schema and |
@bvisness the idea is that the majority of people will just keep referencing meta-schemas with Presumably, if you want to make up your own keywords and have other people implement them the same way, you're willing to dig deeper into how it all works. If you want to make up keywords and don't care if anyone else understands them, you can keep doing that how you do it now (basically, hardcode stuff in a private implementation). Classifying |
Having reviewed this and taken another look at #54, I think THAT issue might be a way to resolve this, but it's going to require a lot more sounding out and chatting to implementers and schema authors than our schedule for draft-8 allows. As such, I feel this should be shifted to draft-9, but with the assurance that draft-9 will look for a well-considered general consensus solution. |
OK, I'm going to do something about this leveraging vocabularies (sorry @Relequestual, I know I pushed you to move it out to draft-09, but I think my original intentions here need to be handled now). Other folks added a bunch of stuff here, and if those points are still relevant after draft-08 goes out they will need to be filed separately. |
[this is a bit stream-of-consciousness, but I wanted to get it filed because I keep forgetting- we'll clean up the ideas here on the way to PRs]
`format` confuses pretty much everyone. I have noticed people filing issues against various implementations complaining of imperfect enforcement (I believe @Julian has received complaints about "email", and @johandorland about "hostname", and I suspect they are not alone).
`format`, `contentMediaType`, and `contentEncoding` are essentially best effort validation keywords in practice. Many if not most implementations make at least some effort to validate `format`. I'm not sure if anything attempts that for `content*` as they are new (at least as part of the validation spec), and they would essentially require parsing the string encoding and media type which is potentially very expensive.
Complicating the matter for `format` is the fact that many of the relatively fundamental internet-related formats such as "email" and "hostname" are very old, and conformance to specifications is rather complicated.
For "hostname", RFC 1034 forbids leading digits, but this is sometimes ignored in practice, leading to ambiguous overlap with "ipv4" as a format. In practice, most programs that accept hostnames will also accept ipv4 addresses and just recognize that no DNS resolution is required, so this is rarely a concern.
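To make the leading-digit ambiguity concrete, here is a small sketch comparing a strict RFC 1034 reading with the relaxed first-character rule from RFC 1123. The function `is_hostname` and its `strict_rfc1034` flag are hypothetical names, and the patterns cover only label syntax and overall length, not every rule in either RFC.

```python
import re

# RFC 1034 label: must start with a letter.
_RFC1034_LABEL = r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
# RFC 1123 relaxes the first character to a letter or a digit.
_RFC1123_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"

_RFC1034_HOSTNAME = re.compile(rf"{_RFC1034_LABEL}(?:\.{_RFC1034_LABEL})*")
_RFC1123_HOSTNAME = re.compile(rf"{_RFC1123_LABEL}(?:\.{_RFC1123_LABEL})*")


def is_hostname(value: str, strict_rfc1034: bool = False) -> bool:
    """Check hostname syntax under either the RFC 1034 or RFC 1123 rule."""
    if len(value) > 253:
        return False
    pattern = _RFC1034_HOSTNAME if strict_rfc1034 else _RFC1123_HOSTNAME
    return pattern.fullmatch(value) is not None
```

Under the relaxed rule a string such as `127.0.0.1` is also syntactically a hostname, which is the overlap with "ipv4" described above; under the strict rule a name like `3com.com` is rejected.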
The difficulty of validating email addresses, even on the syntactical level, is well-documented (try finding a regular expression that will do it, for instance, and if you find an actual iron-clad one, let me know).
Leveraging our relatively recent keyword classification work, I think it is best to classify these primarily as annotations rather than treating them as some sort of hybrid annotation+assertion. Annotations can specify any intent, including semantic validation or parsing instructions. The specification should provide guidance on how an implementation might directly offer handlers for such intents, and how to indicate the available level of support.
Applications can, as with any annotation, then perform additional processing if the implementation either does not offer any validation, or offers only incomplete validation. The spec already says that implementations SHOULD offer an ability to turn semantic validation off, so we can extend that guidance (probably at the MAY level) to cover situations like allowing hooks for application-defined processing in addition to or in place of implementation-supplied validation.
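One way to picture the annotation-first approach, as a sketch only: the implementation reports which `format` values apply to which instance values without asserting anything, and the application decides what to do with them. Both function names below are hypothetical, and the collection step handles only flat `properties` schemas to keep the illustration short.

```python
from typing import Callable, Dict, List, Tuple

# (property name, format name, instance value)
FormatAnnotation = Tuple[str, str, str]


def collect_format_annotations(schema: dict, instance: dict) -> List[FormatAnnotation]:
    """Collect "format" annotations for string properties; asserts nothing."""
    annotations: List[FormatAnnotation] = []
    for name, subschema in schema.get("properties", {}).items():
        fmt = subschema.get("format")
        value = instance.get(name)
        if fmt is not None and isinstance(value, str):
            annotations.append((name, fmt, value))
    return annotations


def apply_handlers(
    annotations: List[FormatAnnotation],
    handlers: Dict[str, Callable[[str], bool]],
) -> List[Tuple[str, str]]:
    """Run application-supplied handlers; return (property, format) failures."""
    failures: List[Tuple[str, str]] = []
    for name, fmt, value in annotations:
        handler = handlers.get(fmt)
        # Formats with no handler are left as plain annotations.
        if handler is not None and not handler(value):
            failures.append((name, fmt))
    return failures
```

Passing an empty handler mapping corresponds to turning semantic validation off, while passing application-defined handlers corresponds to the hooks suggested above.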
And of course, all of this is dependent on an implementation supporting annotations. As with the `additionalProperties` and `additionalItems` keywords (now in the core spec and defined in terms of annotation collection), the spec should allow existing implementations that do not implement general annotation collection to remain valid and in conformance.