Split "id" into "$base" and "$anchor". #155

Closed

handrews
Contributor

This shows how #149 could be addressed. I wrote it quickly as an example, so if we go this route it probably needs some tweaking. It just seemed more clear and concise to put this up rather than try to go into more detail in the issue.

This is intended to separate the two use cases of "id" in order to simplify
the definition of each keyword and clarify the intent of schema authors.

Additionally, use the "$" prefix for core keywords, as already discussed
in the prior decision to change "id" to "$id".

@epoberezkin
Member

Am I right to assume that $anchor would not change the resolution scope, and that $base can be used in the root of the document?

@handrews
Contributor Author

"$anchor": "foo" is equivalent to <a name="foo"></a> in HTML. It just creates an identifier that can be used as a plain name fragment, and fragments cannot affect the resolution base. So you are correct: $anchor would not change the base.

$base can, as written, be used anywhere. It's essentially $id without fragments. More accurately, it specifies that fragments MUST be ignored, since it is meant for absolute URIs, or for URI references that produce absolute URIs when combined with the previous base. I gave it a different name to keep things less confusing (hopefully), but that name is certainly negotiable.

@scranen

scranen commented Nov 22, 2016

Since $base now only changes the resolution scope, I can no longer inline external schemas and reference them by their original URIs. Is that correct?

Maybe you can help me understand the proposal by explaining what happens in the following example. If http://foo.bar resolves to

{
  "$base": "http://bar.baz",
  "$anchor": "a"
}

Then which of the references in the following schema would we expect to be resolved, and why?

{
  "definitions": {
    "a": { "$ref": "http://foo.bar" },
    "b": { "$ref": "http://foo.bar#a" },
    "c": { "$ref": "http://bar.baz" },
    "d": { "$ref": "http://bar.baz#a" }
  }
}

@handrews
Contributor Author

@scranen all of those $refs should resolve to the same schema. It should be exactly the same as if http://foo.bar resolved to {"id": "http://bar.baz#a"} under v4 rules. Is that what you expected/wanted?

I did not intend to break or change any functionality, just reorganize it, so if it looks like I broke something that's a bug in my write-up. Let me know where the confusing language is.
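
A minimal Python sketch of the reading given in the reply above, under stated assumptions. The `register` function and the dict-based registry are hypothetical names, not part of any JSON Schema implementation, and only a root-level "$base" and "$anchor" are handled. Registering the anchor under the retrieval URI as well as under "$base" follows the claim that all four references resolve; the proposal text itself does not confirm that choice.

```python
from urllib.parse import urldefrag, urljoin


def register(registry, retrieval_uri, schema):
    """Record every URI under which `schema` can be referenced (hypothetical helper)."""
    retrieval_uri = urldefrag(retrieval_uri).url
    # "$base" is resolved against the retrieval URI; any fragment on it is ignored.
    base = urldefrag(urljoin(retrieval_uri, schema.get("$base", ""))).url
    for uri in {retrieval_uri, base}:
        registry[uri] = schema
        # Assumption: the anchor is reachable as a fragment of both URIs.
        if "$anchor" in schema:
            registry[uri + "#" + schema["$anchor"]] = schema


registry = {}
fetched = {"$base": "http://bar.baz", "$anchor": "a"}
register(registry, "http://foo.bar", fetched)

refs = [
    "http://foo.bar",
    "http://foo.bar#a",
    "http://bar.baz",
    "http://bar.baz#a",
]
resolved = [registry.get(ref) for ref in refs]
```

In this sketch each of the four `$ref` values from the question maps to the same schema object.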

@scranen

scranen commented Nov 22, 2016

The new text says only that $base changes the resolution scope. It no longer mentions that $base also signifies that the document in which it occurs should be the result of resolving that URI. I think that should be added back in some form. As a result, no two unequal subschemas may define the same $base URI.

I believe that the mechanism you propose would work, but I have a couple of remarks:

  • The weakest condition to guarantee well-defined semantics of this mechanism is to require that the same $base may only occur in equivalent subschemas or referenced schemas. This is quite a heavy implication for a parser. Maybe the spec should mention that a parser is allowed to give up if it encounters the same $base URI twice in different locations. The parser must be obliged to return an error if two different documents with the same $base are encountered.
  • The fact that all the $refs in the example above should be resolved means that you are forcing a parser to do some kind of fixpoint computation: unresolved references must be set aside until none of the remaining unresolved references can be resolved anymore. That again is a pretty heavy requirement for a parser. Example:
    { "definitions":
      { "a": { "$ref": "http://foo.bar" },
        "b": { "$ref": "http://x.y" } } }
    If http://foo.bar contains a $ref to http://bar.baz, [which contains a $ref to ... ]*, which contains a $ref to http://x.y, then both refs in this example should be resolved. But you won't find out that you can resolve it before you have parsed all other $refs and all $refs in all referenced schemas.
  • It should also be mentioned what a parser is to do when a reference cannot be resolved.
  • There should be a phrase (it's been a while since I went over the full spec, so maybe it has appeared in the meantime) that specifies that there may be some external URI resolution mechanism (e.g. an HTTP client for http:// URLs). The $base/$anchor mechanism should be seen as a complementary way of resolving URIs. This raises the question of whether $base/$anchor should be seen as an override or as an alternative: if I specify "$base": "http://x.y", should this instruct the parser that whenever it resolves http://x.y it should ignore the contents that an HTTP client could retrieve from that location and use this document instead, or should it signify that the document in which the $base is declared is equal to the contents that can be found at that URI? If it's an override, what happens if the parser first encounters { "$ref": "http://x.y" } and only then finds a subdocument that defines "$base": "http://x.y"? If it's an alternative, what happens if the document that contains the $base property is in fact not equal to the contents at the specified URI? Should the parser actively detect this? If the parser detects it, should it result in an error (which is equivalent to saying that it is the responsibility of the author to only produce valid JSON schemas, which is probably the most reasonable solution), or if not, how should it be resolved?
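
The fixpoint concern in the second bullet can be sketched as a worklist loop in Python. Everything here is an assumption made for illustration: `resolve_all`, `iter_subschemas` and the `fetch` callback are hypothetical names, URIs are compared as plain strings, fragments and relative references are ignored, and an external fetch is tried before an inline "$base" is found, which is only one of the two orderings the last bullet asks about.

```python
def iter_subschemas(node):
    """Yield every dict nested inside `node`, including `node` itself."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_subschemas(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_subschemas(item)


def resolve_all(root, fetch):
    """Follow $ref targets until no further progress is possible."""
    known = {}         # URI -> schema, from fetches and "$base" declarations
    refs = set()       # every $ref target seen so far
    attempted = set()  # URIs already passed to fetch()
    queue = [root]
    while queue:
        doc = queue.pop()
        for sub in iter_subschemas(doc):
            if "$base" in sub:
                known.setdefault(sub["$base"], sub)
            if "$ref" in sub:
                refs.add(sub["$ref"])
        for uri in sorted(refs - set(known) - attempted):
            attempted.add(uri)
            fetched = fetch(uri)  # returns None when nothing is retrievable
            if fetched is not None:
                known[uri] = fetched
                queue.append(fetched)
    return known, refs - set(known)


# Hypothetical documents: http://x.y is not retrievable, but is declared
# inline by a document reached through a chain of references.
documents = {
    "http://foo.bar": {"$ref": "http://bar.baz"},
    "http://bar.baz": {"definitions": {"x": {"$base": "http://x.y", "type": "string"}}},
}
root = {
    "definitions": {
        "a": {"$ref": "http://foo.bar"},
        "b": {"$ref": "http://x.y"},
    }
}
known, unresolved = resolve_all(root, documents.get)
```

Here the reference to http://x.y cannot be settled when it is first seen; it only becomes resolvable after the chain starting at http://foo.bar has been followed, which is the cost the bullet describes.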

@epoberezkin
Member

I agree with @scranen. It also means that $base is confusing and it's better to stick with $id.

@handrews
Contributor Author

@scranen I think that most or all the things you are worried about are already covered in the "Internal References" and "External References" sections. Can you take a look at those and tell me where they are inadequate?

In particular:

As a result, no two unequal subschemas may define the same $base URI.

The correct restriction is that no two unequal subschemas may define the same $base and same $anchor. That is exactly the same as with id. There is nothing complicated going on here.

$base + "#" + $anchor == id

(assuming $base does not already have a fragment, which it SHOULD NOT).
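
The identity above can be sketched in Python with the standard library. `equivalent_id` is a hypothetical helper written for illustration; it assumes "$base" is resolved against the enclosing base with ordinary URI-reference resolution and that any fragment on "$base" is dropped, as described earlier in the thread.

```python
from urllib.parse import urldefrag, urljoin


def equivalent_id(parent_base, base=None, anchor=None):
    """Return the v4-style "id" that a $base/$anchor pair would correspond to."""
    # Resolve "$base" against the enclosing base, then drop any fragment.
    resolved = urljoin(parent_base, base) if base else parent_base
    resolved = urldefrag(resolved).url
    # "$anchor" only adds a plain-name fragment; it never changes the base.
    return resolved + "#" + anchor if anchor else resolved


absolute = equivalent_id("http://foo.bar", "http://bar.baz", "a")
relative = equivalent_id("http://example.com/schemas/root.json", "item.json#ignored", "a")
anchor_only = equivalent_id("http://example.com/schemas/root.json", anchor="a")
```

The example.com URIs are made up for the example; the first call reuses the URIs from the question earlier in the thread.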

@handrews
Contributor Author

@epoberezkin we're working through a few wording issues here and have spent only a few hours on it. It's premature to declare the whole thing confusing, particularly given the epic amounts of confusion id has caused ever since it was introduced.

@awwright
Member

awwright commented Nov 22, 2016

I've proposed something like $base before, but I tend to think it would make things more complex and not less complex.

If we need it, I would like to consider calling one of the keywords "$self" instead, which would let you assign the document a URI without changing the URI base.

@handrews
Contributor Author

@scranen

It should also be mentioned what a parser is to do when a reference cannot be resolved.

That is already covered: $ref URIs need not be resolvable, and implementations SHOULD NOT assume that they can be resolved. So if the referenced URI was not defined internally and cannot be resolved externally, then you simply cannot process the schema.

Again, this is not new; it's exactly the same as it was under id.

@handrews
Contributor Author

@awwright would your $self replace or be in addition to id?

@handrews
Contributor Author

Also, why the resistance to having an anchor-naming feature, which is common in other media types?

@awwright
Member

@handrews I briefly forgot what that was referring to, sorry...

XML and now HTML let you refer to any element with an id property that matches a certain naming convention, which is exactly duplicated in the latest I-D:

To name subschemas in a JSON Schema document, subschemas can use "id" to give themselves a document-local identifier. This form of "id" keyword MUST begin with a hash ("#") to identify it as a fragment URI reference, followed by a letter ([A-Za-z]), followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), or periods (".").

Do you think this suffices, or do we still need a new keyword?
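
The naming rule quoted above translates directly into a regular expression. This is a small Python sketch; `is_plain_name_id` is a hypothetical helper name, and the pattern is transcribed from the quoted sentence rather than taken from any implementation.

```python
import re

# "#", then a letter, then any number of letters, digits,
# hyphens, underscores, colons, or periods.
PLAIN_NAME_ID = re.compile(r"#[A-Za-z][A-Za-z0-9_:.-]*")


def is_plain_name_id(value):
    """Return True if `value` is a document-local "id" per the quoted rule."""
    return PLAIN_NAME_ID.fullmatch(value) is not None


accepted = [is_plain_name_id(v) for v in ("#foo", "#a-b_c:d.e1")]
rejected = [is_plain_name_id(v) for v in ("foo", "#", "#1abc", "#/definitions/a")]
```

Note that a JSON Pointer fragment such as "#/definitions/a" does not match, because "/" is not in the allowed character set.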

@handrews
Contributor Author

@awwright I found it confusing that id basically has two functions, each of which has its own syntax. That is why I split the functions. Then each keyword has its own clear function and intent.

If XML and HTML have an equally confusing multi-purpose id, I guess I'll give up on this. I think it's a horrible idea, but if it's already ratified in other standards I can't fight that.

@handrews
Contributor Author

@awwright wait... now I'm forgetting things. HTML ids are not necessarily URIs and do not also change the base. As far as I can tell from literally two minutes with the xml:id spec, the same is true of xml:id. So I'm back to wanting two keywords, although perhaps I have the names wrong.

@handrews
Contributor Author

handrews commented Nov 22, 2016

OK, to re-summarize. I think that there are two use cases here, and I think that HTML and XML have separate keywords/elements for them. One is defining a document-local identifier, and the other is changing the base URI.

If I am right about that, I would like for JSON Schema to follow that pattern. It makes much more sense than overloading one keyword with both functions. I do not much care what the names are as long as there are two. If we use the names from HTML and XML that's great, but I'm open to whatever. $base and $anchor were just the first things that came to mind; I'm not attached to them.

If I'm wrong about HTML and XML handling these separately, I'm wrong and I probably have no case :-)

@awwright
Member

Hmm, I think I'm starting to understand where some of the difference in perspective is coming from.

The Web isn't just a collection of documents anymore, but a collection of resources. And in the Web, resources can have URIs just as well as documents can. Non-information resources don't have representations as documents, but we can say things about them, inside documents.

This concept was enthusiastically embraced by RDF, RDFa, Turtle, and others, and is now seeing adoption by JSON technologies like JSON-LD.

So while HTML documents have traditionally allowed you to create resources with fragment ids (by using the "id" attribute), that's not an absolute limitation. HTML+RDFa lets you define resources with their own URIs outright. JSON-LD also allows description of multiple resources in a single document.

Anticipating a vast ecosystem of schemas, JSON Schema went this direction. So if you want the traditional form where all the described resources (schemas) share the same base URI, all you have to do is prepend a "#" to the "id" argument.

I think if we just make it clear that we're using URIs as ids, instead of document-local identifiers exclusively, things become a lot more intuitive.

So having a separate "anchor" keyword... it just seems like three extra characters I have to type (if "id":"#a" means the same thing as "anchor":"a").

@handrews
Contributor Author

@awwright thanks for that explanation. I don't much like it, but it's pretty convincing :-D

Since no one else seems enthusiastic about splitting these, I'll go ahead and close this PR and issue #149. We should instead merge #154 after appropriate review.

@handrews closed this Nov 22, 2016
@epoberezkin
Member

epoberezkin commented Nov 22, 2016

@handrews:

I think that there are two use cases here

Could you please say what they are in the real world, not in theoretical XML land?

HTML and XML have separate keywords/elements for them. One is defining a document-local identifier, and the other is changing the base URI.

HTML has baseURI. I never knew it existed, and I don't know why anyone would use it. The fact that XML and HTML have it doesn't mean it is used. Why do we bring lots of useless stuff into the standard? What is the real use case for it?

JSON was purposefully designed as a much simpler thing than XML; there are many concepts in XML that don't exist in JSON. That was the main reason for its widespread adoption. Why do we need to copy useless abstractions from XML instead of only having things that are needed by 90% of users and not having things that are only needed by 10% (maybe even less than 1% in this case)?

We already know that base URI change is exactly where implementations lack interoperability. What sense does it make to make it more complex instead of removing it completely?

@seagreen
Collaborator

@epoberezkin: I think those are good things to talk about. Since this is closed, want to bring up these concerns on Issue #14?
