Split "id" into "$base" and "$anchor". #155

handrews · 2016-11-21T20:23:26Z

This shows how #149 could be addressed. I wrote it quickly as an example, so if we go this route it probably needs some tweaking. It just seemed more clear and concise to put this up rather than try to go into more detail in the issue.

This is intended to separate the two use cases of "id" in order to simplify
the definition of each keyword and clarify the intent of schema authors.

Additionally, use the "$" prefix for core keywords, as already discussed
in the prior decision to change "id" to "$id".


epoberezkin · 2016-11-22T10:40:56Z

Assuming $anchor would not change resolution scope and $base can be used in the root of the document?

handrews · 2016-11-22T17:10:29Z

"$anchor": "foo" is equivalent to <a name="foo"></a> in HTML. It just creates an identifier that can be used as a plain name fragment, and fragments cannot affect the resolution base. So you are correct: $anchor would not change the base.

$base can, as written, be used anywhere. It's essentially $id without fragments, or more accurately specifying that fragments MUST be ignored since it is for absolute URIs or URI references that, when combined with the previous base, produce absolute URIs. I gave it a different name to keep things less confusing (hopefully), but that name is certainly negotiable.
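A minimal sketch of the distinction described above, using Python's standard `urllib.parse`. The helper names `apply_base` and `anchor_uri` and the `example.com` URIs are hypothetical, not part of the proposal. The sketch assumes that `$base` is resolved against the previous base with any fragment ignored, and that `$anchor` only names a fragment on the current base.

```python
from urllib.parse import urljoin, urldefrag


def apply_base(previous_base: str, base_value: str) -> str:
    """Resolve a "$base" value against the previous base, dropping any fragment."""
    resolved = urljoin(previous_base, base_value)
    return urldefrag(resolved).url


def anchor_uri(current_base: str, anchor: str) -> str:
    """Build the URI named by "$anchor"; the base itself is left unchanged."""
    return f"{current_base}#{anchor}"


# "$base" moves the resolution base; the fragment on the value is ignored.
base = apply_base("http://example.com/schemas/root.json", "item.json#ignored")

# "$anchor" only adds a plain name fragment on top of that base.
named = anchor_uri(base, "foo")
print(base)
print(named)
```

Under these assumptions the base after `$anchor` is still `base`, which matches the point that fragments cannot affect the resolution base.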

scranen · 2016-11-22T20:23:03Z

Since $base now only changes the resolution scope, I can no longer inline external schemas and reference them by their original URIs. Is that correct?

Maybe you can help me understand the proposal by explaining what happens in the following example. If http://foo.bar resolves to

{ "$base": "http://bar.baz",
  "$anchor": "a" }

Then which of the references in the following schema would we expect to be resolved, and why?

{ "definitions": 
  { "a": { "$ref": "http://foo.bar" },
    "b": { "$ref": "http://foo.bar#a" },
    "c": { "$ref": "http://bar.baz" },
    "d": { "$ref": "http://bar.baz#a" } } 
}

handrews · 2016-11-22T20:28:28Z

@scranen all of those $refs should resolve to the same schema. It should be exactly the same as if http://foo.bar resolved to {"id": "http://bar.baz#a"} under v4 rules. Is that what you expected/wanted?
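One way to read that answer is as an index from URIs to schemas. The sketch below is an illustration only: `index_document` is a hypothetical helper, it handles a root-level `$base`/`$anchor` pair and nothing else, and registering the anchor under both the retrieval URI and the declared base is an assumption made to match the claim that all four references resolve.

```python
def index_document(retrieval_uri: str, schema: dict) -> dict:
    """Map every URI assumed to identify `schema` to the schema itself."""
    index = {retrieval_uri: schema}

    # The declared base is treated as a second name for the same document.
    base = schema.get("$base", retrieval_uri)
    index[base] = schema

    anchor = schema.get("$anchor")
    if anchor is not None:
        # Assumption: the anchor is reachable from both the retrieval URI
        # and the declared base.
        index[f"{retrieval_uri}#{anchor}"] = schema
        index[f"{base}#{anchor}"] = schema
    return index


fetched = {"$base": "http://bar.baz", "$anchor": "a"}
index = index_document("http://foo.bar", fetched)

refs = [
    "http://foo.bar",
    "http://foo.bar#a",
    "http://bar.baz",
    "http://bar.baz#a",
]
same_target = all(index[ref] is fetched for ref in refs)
print(same_target)
```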

I did not intend to break or change any functionality, just reorganize it, so if it looks like I broke something that's a bug in my write-up. Let me know where the confusing language is.

scranen · 2016-11-22T21:21:51Z

The new text only says that $base changes the resolution scope; it no longer mentions that it also signifies that the document in which it occurs should be the result of resolving that URI. I think that should be added back in some form. As a result, no two unequal subschemas may define the same $base URI.

I believe that the mechanism you propose would work, but I have a couple of remarks:

1. The weakest condition to guarantee well-defined semantics of this mechanism is to require that the same $base may only occur in equivalent subschemas or referenced schemas. This is quite a heavy implication for a parser. Maybe the spec should mention that a parser is allowed to give up if it encounters the same $base URI twice in different locations. The parser must be obliged to return an error if two different documents with the same $base are encountered.
2. The fact that all the $refs in the example above should be resolved means that you are forcing a parser to do some kind of fixpoint computation: unresolved references must be ignored until such a time that none of the unresolved references can be resolved anymore. That again is a pretty heavy requirement for a parser. Example:
   ```
   { "definitions":
     { "a": { "$ref": "http://foo.bar" },
       "b": { "$ref": "http://x.y" } } }
   ```
   If http://foo.bar contains a $ref to http://bar.baz, [which contains a $ref to ... ]*, which contains a $ref to http://x.y, then both refs in this example should be resolved. But you won't find out that you can resolve it before you have parsed all other $refs and all $refs in all referenced schemas.
3. It should also be mentioned what a parser is to do when a reference cannot be resolved.
4. There should be a phrase (it's been a while since I went over the full spec, so maybe it has appeared in the meanwhile) that specifies that there may be some external URI resolution mechanism (e.g. an HTTP client for http:// URLs). The $base/$anchor mechanism should be seen as a complementary way of resolving URIs. This poses the question whether $base/$anchor should be seen as an override or as an alternative: if I specify a "$base": "http://x.y", should this instruct the parser that whenever it resolves http://x.y it should ignore the contents that an HTTP client could retrieve from that location and instead use this document, or should it signify that the document in which the $base is declared is equal to the contents that can be found at that URI? If it's an override, what happens if the parser first encounters { "$ref": "http://x.y" } and only then finds a subdocument that defines "$base": "http://x.y"? If it's an alternative, what happens if the document that contains the $base property is in fact not equal to the contents at the specified URI? Should the parser detect this actively? If the parser detects it, should it result in an error (which is equivalent to saying that it is the responsibility of the author to only produce valid JSON schemas, which is probably the most reasonable solution), or if not how should it be resolved?
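The fixpoint computation described in these remarks could look roughly like the sketch below. It is not taken from the spec or from any implementation: `collect_refs`, `declared_bases`, `resolve_fixpoint`, and the in-memory `store` standing in for an HTTP client are all hypothetical, `$base` values are assumed to be absolute URIs, and a reference that cannot be fetched is simply retried on each pass until a pass makes no progress.

```python
from urllib.parse import urldefrag


def collect_refs(node):
    """Yield every "$ref" string found anywhere in a JSON value."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from collect_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_refs(item)


def declared_bases(node):
    """Yield (uri, subschema) for every "$base" declaration in a JSON value."""
    if isinstance(node, dict):
        base = node.get("$base")
        if isinstance(base, str):
            yield urldefrag(base).url, node
        for value in node.values():
            yield from declared_bases(value)
    elif isinstance(node, list):
        for item in node:
            yield from declared_bases(item)


def ref_targets(node):
    """Document URIs (fragment stripped) referenced from a JSON value."""
    return {urldefrag(ref).url for ref in collect_refs(node)} - {""}


def resolve_fixpoint(root, fetch):
    """Resolve references transitively; return (known, still_unresolved)."""
    known = dict(declared_bases(root))
    pending = ref_targets(root)
    progress = True
    while pending and progress:
        progress = False
        for uri in sorted(pending):
            if uri not in known:
                document = fetch(uri)
                if document is None:
                    continue  # retry later; a "$base" found elsewhere may define it
                known[uri] = document
                for base, subschema in declared_bases(document):
                    known.setdefault(base, subschema)
                pending |= ref_targets(document)
            pending.discard(uri)
            progress = True
    return known, pending


# Hypothetical stand-in for external retrieval; http://x.y is NOT fetchable
# and is only defined by a "$base" inside http://bar.baz.
store = {
    "http://foo.bar": {"$ref": "http://bar.baz"},
    "http://bar.baz": {"definitions": {"x": {"$base": "http://x.y", "type": "string"}}},
}
root = {"definitions": {"a": {"$ref": "http://foo.bar"}, "b": {"$ref": "http://x.y"}}}

known, unresolved = resolve_fixpoint(root, store.get)
print(sorted(known), unresolved)
```

In this run the reference to http://x.y fails on the first pass and only succeeds after http://bar.baz has been loaded, which is the ordering problem the remark points at. The sketch keeps the first `$base` declaration it sees and does not address the override-versus-alternative question.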

epoberezkin · 2016-11-22T22:25:39Z

I agree with @scranen. It also means that $base is confusing and it's better to stick with $id.

handrews · 2016-11-22T22:28:02Z

@scranen I think that most or all the things you are worried about are already covered in the "Internal References" and "External References" sections. Can you take a look at those and tell me where they are inadequate?

In particular:

> As a result, no two unequal subschemas may define the same $base URI.

The correct restriction is that no two unequal subschemas may define the same $base and same $anchor. That is exactly the same as with id. There is nothing complicated going on here.

$base + "#" + $anchor == id

(assuming $base does not already have a fragment, which it SHOULD NOT).
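A small sketch of that identity and of the uniqueness restriction, assuming v4-style `id` values. The helpers `split_id`, `join_id`, and `find_conflicts` are hypothetical names introduced only for illustration.

```python
from urllib.parse import urldefrag


def split_id(id_value: str):
    """Split a v4-style "id" into a ($base, $anchor) pair; either part may be None."""
    base, fragment = urldefrag(id_value)
    return (base or None, fragment or None)


def join_id(base, anchor):
    """Recombine the pair, mirroring `$base + "#" + $anchor == id`."""
    if anchor is None:
        return base or ""
    return f"{base or ''}#{anchor}"


def find_conflicts(declarations):
    """Return (base, anchor) pairs that are claimed by two unequal subschemas."""
    seen = {}
    conflicts = set()
    for base, anchor, schema in declarations:
        key = (base, anchor)
        if key in seen and seen[key] != schema:
            conflicts.add(key)
        seen.setdefault(key, schema)
    return conflicts


pair = split_id("http://bar.baz#a")
print(pair, join_id(*pair))

clash = find_conflicts([
    ("http://bar.baz", "a", {"type": "string"}),
    ("http://bar.baz", "a", {"type": "number"}),
    ("http://bar.baz", "b", {"type": "number"}),
])
print(clash)
```

Two subschemas that share a `$base` but use different anchors are not reported, which is the difference from the stricter restriction quoted above.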

handrews · 2016-11-22T22:29:12Z

@epoberezkin we're working through a few wording issues here and have spent only a few hours on it. It's premature to declare the whole thing confusing, particularly given the epic amounts of confusion id has caused ever since it was introduced.

awwright · 2016-11-22T22:29:54Z

I've proposed something like $base before, but I tend to think it would make things more complex and not less complex.

If we need it, I would like to consider calling one of the keywords "$self" instead, which would let you assign the document a URI without changing the URI base.

handrews · 2016-11-22T22:31:29Z

@scranen

> It should also be mentioned what a parser is to do when a reference cannot be resolved.

That is already covered: $ref URIs need not be resolvable, and implementations SHOULD NOT assume that they can be resolved. So if the referenced URI was not defined internally and cannot be resolved externally, then you simply cannot process the schema.

Again, this is not new; it's exactly the same as it was under id.

handrews · 2016-11-22T22:31:52Z

@awwright would your $self replace or be in addition to id?

handrews · 2016-11-22T22:32:23Z

Also why the resistance to having an anchor naming feature which is common in other media types?

awwright · 2016-11-22T22:45:09Z

@handrews I briefly forgot what that was referring to, sorry...

XML and now HTML let you refer to any element with an id property that matches a certain naming convention, which is exactly duplicated in the latest I-D:

> To name subschemas in a JSON Schema document, subschemas can use "id" to give themselves a document-local identifier. This form of "id" keyword MUST begin with a hash ("#") to identify it as a fragment URI reference, followed by a letter ([A-Za-z]), followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), or periods (".").

Do you think this suffices, or do we still need a new keyword?
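For reference, the naming rule quoted from the I-D translates directly into a regular expression. This is a sketch of that one sentence only; `is_plain_name_id` is a hypothetical helper name.

```python
import re

# "#", then a letter, then any number of letters, digits, "-", "_", ":" or ".".
PLAIN_NAME_ID = re.compile(r"#[A-Za-z][A-Za-z0-9\-_:.]*")


def is_plain_name_id(value: str) -> bool:
    """Check whether an "id" value is a document-local plain name fragment."""
    return PLAIN_NAME_ID.fullmatch(value) is not None


for candidate in ["#foo", "#a-b_c:d.e1", "#1abc", "foo", "#/definitions/a"]:
    print(candidate, is_plain_name_id(candidate))
```

JSON Pointer fragments such as `#/definitions/a` do not match, because `/` is not in the allowed character set.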

handrews · 2016-11-22T22:47:24Z

@awwright I found it confusing that id basically has two functions, each of which has its own syntax. That is why I split the functions. Then each keyword has its own clear function and intent.

If XML and HTML have an equally confusing multi-purpose id, I guess I'll give up on this. I think it's a horrible idea, but if it's already ratified in other standards I can't fight that.

handrews · 2016-11-22T22:49:34Z

@awwright wait... now I'm forgetting things. HTML ids are not necessarily URIs and do not also change the base. As far as I can tell from literally two minutes with the xml:id spec, the same is true of xml:id. So I'm back to wanting two keywords, although perhaps I have the names wrong.

handrews · 2016-11-22T22:52:47Z

OK, to re-summarize. I think that there are two use cases here, and I think that HTML and XML have separate keywords/elements for them. One is defining a document-local identifier, and the other is changing the base URI.

If I am right about that, I would like for JSON Schema to follow that pattern. It makes much more sense than overloading one keyword with both functions. I do not much care what the names are as long as there are two- if we use the names from HTML and XML that's great, but I'm open to whatever. $base and $anchor were just the first things that came to mind, I'm not attached to them.

If I'm wrong about HTML and XML handling these separately, I'm wrong and I probably have no case :-)

awwright · 2016-11-22T23:32:33Z

Hmm, I think I'm starting to understand where some of the difference in perspective is coming from.

The Web isn't just a collection of documents anymore, but a collection of resources. And in the Web, resources can have URIs just as well as documents can. Non-information resources don't have representations as documents, but we can say things about them, inside documents.

This concept was enthusiastically embraced by RDF, RDFa, Turtle, and others, and is now seeing adoption by JSON technologies like JSON-LD.

So while HTML documents have traditionally allowed you to create resources with fragment ids (by using the "id" attribute), that's not an absolute limitation. HTML+RDFa lets you define resources with their own URIs outright. JSON-LD also allows description of multiple resources in a single document.

Anticipating a vast ecosystem of schemas, JSON Schema went this direction. So if you want the traditional form where all the described resources (schemas) share the same base URI, all you have to do is prepend a "#" to the "id" argument.

I think if we just make it clear that we're using URIs as ids, instead of document-local identifiers exclusively, things become a lot more intuitive.

So having a separate "anchor" keyword... it just seems like three extra characters I have to type (if "id":"#a" means the same thing as "anchor":"a").
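To illustrate the "URIs as ids" reading, here is a rough sketch of how a v4-style `id` could be turned into an absolute URI for each subschema. The walker `identify` and the `example.com` URIs are hypothetical, and the traversal is simplified: it visits every object in the document, including places such as `enum` values where a real implementation would not look for schemas.

```python
from urllib.parse import urljoin, urldefrag


def identify(node, base, found=None):
    """Collect {absolute URI: subschema} for every string "id" in a schema."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        id_value = node.get("id")
        if isinstance(id_value, str):
            uri = urljoin(base, id_value)
            found[uri] = node
            # Only the non-fragment part becomes the base for nested schemas,
            # so "#a" names this subschema without changing the base.
            base = urldefrag(uri).url
        for value in node.values():
            identify(value, base, found)
    elif isinstance(node, list):
        for item in node:
            identify(item, base, found)
    return found


schema = {
    "id": "http://example.com/root.json",
    "definitions": {
        "a": {"id": "#a"},
        "b": {
            "id": "other.json",
            "definitions": {"c": {"id": "#c"}},
        },
    },
}

found = identify(schema, "")
for uri in sorted(found):
    print(uri)
```

Here `"id": "#a"` stays on the root document's base, while `"id": "other.json"` changes the base for everything nested under it, including `"#c"`.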

handrews · 2016-11-22T23:44:47Z

@awwright thanks for that explanation. I don't much like it, but it's pretty convincing :-D

Since no one else seems enthusiastic about splitting these, I'll go ahead and close this PR and issue #149. We should instead merge #154 after appropriate review.

epoberezkin · 2016-11-22T23:54:52Z

@handrews:

> I think that there are two use cases here

Could you please tell me what they are in the real world, not in theoretical XML land?

> HTML and XML have separate keywords/elements for them. One is defining a document-local identifier, and the other is changing the base URI.

HTML has baseURI. I never knew it existed, and I don't know why anyone would use it. The fact that XML and HTML have it doesn't mean it is used. Why do we bring lots of useless stuff into the standard? What is the real use case for it?

JSON was purposefully designed as a much simpler thing than XML, there are many concepts in XML that don't exist in JSON. That was the main reason for its widespread adoption. Why do we need to copy useless abstractions from XML instead of only having things that are needed by 90% of users and not having things that are only needed by 10% (maybe even less than 1% in this case)?

We already know that base URI change is exactly where implementations lack interoperability. What sense does it make to make it more complex instead of removing it completely?

seagreen · 2016-11-23T03:26:10Z

@epoberezkin: I think those are good things to talk about. Since this is closed, want to bring up these concerns on Issue #14?

This was referenced Nov 21, 2016

Replace non-root "id" (or "$id") with "anchor" (or "$anchor") #149

Closed

Change "id" to "$id", retain "id" as deprecated. #154

Closed

handrews mentioned this pull request Nov 22, 2016

Need a contributor's guide #137

Closed

handrews closed this Nov 22, 2016

handrews mentioned this pull request Feb 3, 2017

JSON pointer usage is not explained properly in core #243

Closed

handrews deleted the anchor branch September 16, 2017 09:37
