Eliminate template pre-processing #142

Closed
handrews opened this issue Nov 14, 2016 · 13 comments


@handrews
Contributor

handrews commented Nov 14, 2016

NOTE: I'm splitting this out from #52 in order to get more attention on this specific point and the related pull request #129. I'll probably split other stuff out from #52 and then close it. In hindsight, it was not a good idea to dump everything together there.

URI Template pre-processing is confusing, significantly complicates implementation, and does not address all limitations in the current rules for filling out templates with instance data.

The current approach tries to circumvent the URI Template spec's variable name limitations within the actual string used to express the URI Template. A simpler yet more powerful approach is to make the URI Template strings normal URI Templates, and use a mapping object to translate legal URI Template names to expressions that can identify any part of the instance. The proposed keyword for the map is hrefVars.

Relative JSON Pointers come closest to meeting the necessary requirements: starting from any point in the instance (specifically, the point from which the LDO including the template is defined), they can identify nearly any other point in the instance (see #115 for limitations with respect to arrays). See #126 for a discussion of whether Relative JSON Pointers should be a separate I-D or should start as part of JSON Schema. Either would work fine for this proposal.

To preserve the current behavior when preprocessing is not needed, if a template variable "x" does not appear in hrefVars, it may be considered present with a relative pointer of "0/x".
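A minimal sketch of this default rule, assuming hrefVars is available as a plain dict from variable name to relative pointer string; the helper name `pointer_for` is hypothetical:

```python
def pointer_for(var_name: str, href_vars: dict) -> str:
    """Return the relative JSON pointer used to fill template variable var_name."""
    if var_name in href_vars:
        return href_vars[var_name]  # an explicit mapping always wins
    # Default: a property of the current instance, i.e. "0/<var_name>".
    # RFC 6901 escaping ("~" -> "~0", "/" -> "~1") is defensive only, since
    # unencoded URI Template variable names cannot contain these characters.
    return "0/" + var_name.replace("~", "~0").replace("/", "~1")


href_vars = {"z": "0/y/z"}
print(pointer_for("z", href_vars))  # 0/y/z
print(pointer_for("x", href_vars))  # 0/x
```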

Examples based on current pre-processing features

Here is a subset of the table of examples for pre-processing, followed by a schema showing links using these variable names.

Input                   Output
"{(escape space)}"      "{escape%20space}"
"{(a (b)))}"            "{a%20%28b%29}"
"{()}"                  "{%65mpty}"
"{+$*}"                 "{+%73elf*}"

{
    "links": [
        {
            "rel": "foo",
            "href": "/{(escape space)}/{(a (b)))}/{()}"
        },
        {
            "rel": "bar",
            "href": "/{+$*}"
        }
    ]
}

Note that making use of $ in the "self" case with the "bar" link requires using the URI Template "+" operator to allow percent-encoded sequences, as the $ is replaced by a percent-encoded sequence during pre-processing. This is particularly confusing since, without pre-processing, the "+" would make the "$" a literal dollar sign that did not need to be percent-encoded.

Given this instance:

{
    "escape space": "x",
    "a (b)": "y",
    "": "z"
}

the "foo" link would expand to "/x/y/z"

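Given an instance of [1, 2, 3, 4], the "bar" link would expand to "/1,2,3,4". The "*" suffix is the URI Template "explode" modifier; with the "+" operator (or no operator), RFC 6570 joins exploded list elements with commas. Producing one path segment per element, as in "/1/2/3/4", would require the "/" operator (e.g. "{/list*}").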
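To make the complexity concrete, here is a rough sketch of the pre-processing rules illustrated by the table above. It assumes, as simplifications, one variable per expression and that every character other than ASCII letters, digits and "_" is percent-encoded; the function names are hypothetical, and this is an illustration of the rules as summarized here rather than a conforming implementation.

```python
import re


def _pct_encode(name: str) -> str:
    # Keep ASCII letters, digits and "_" (plain URI Template varchars);
    # percent-encode every other character as UTF-8 bytes.
    out = []
    for ch in name:
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def _preprocess_expression(body: str) -> str:
    # Simplification: a single variable per expression, after an optional operator.
    op = body[0] if body and body[0] in "+#./;?&" else ""
    rest = body[len(op):]
    if rest.startswith("("):
        # Parenthesized name: "))" is an escaped ")", a single ")" ends the name.
        chars, i, closed = [], 1, False
        while i < len(rest):
            if rest[i] == ")":
                if i + 1 < len(rest) and rest[i + 1] == ")":
                    chars.append(")")
                    i += 2
                    continue
                closed = True
                break
            chars.append(rest[i])
            i += 1
        if not closed:
            raise ValueError(f"unterminated '(' in {{{body}}}")
        name, suffix = "".join(chars), rest[i + 1:]  # suffix: e.g. "*"
        encoded = _pct_encode(name) if name else "%65mpty"  # "()" is the empty name
    elif rest.startswith("$"):
        encoded, suffix = "%73elf", rest[1:]  # "$" stands for the instance itself
    else:
        return "{" + body + "}"  # ordinary variable name: left unchanged
    return "{" + op + encoded + suffix + "}"


def preprocess(template: str) -> str:
    return re.sub(r"\{([^}]*)\}", lambda m: _preprocess_expression(m.group(1)), template)


print(preprocess("/{(escape space)}/{(a (b)))}/{()}"))  # /{escape%20space}/{a%20%28b%29}/{%65mpty}
print(preprocess("/{+$*}"))                             # /{+%73elf*}
```

Even this simplified version needs its own escape convention ("))"), two magic names, and a second encoding pass before the result is a valid URI Template.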

Here is what a mapping approach might look like, which would produce the same results when applied to the same two example instances.

{
    "links": [
        {   
            "rel": "foo",
            "href": "/{space}/{complicated}/{empty}",
            "hrefVars": {
                "space": "0/escape space",
                "complicated": "0/a (b)",
                "empty": "0/"
            }   
        },  
        {   
            "rel": "bar",
            "href": "/{self*}",
            "hrefVars": {
                "self": "0" 
            }   
        }   
    ]   
}

The hrefVars keyword defines the map for href. Note that we no longer need the "+" operator at all. URI Template modifiers such as the "*" explode suffix are not part of the template variable name and therefore do not appear in the map.
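A minimal sketch of filling such a mapped template, assuming a Python dict/list instance, Relative JSON Pointers without the trailing "#" form, and only simple "{name}" expressions with scalar values (operators, modifiers such as "*", and list/object values are left out). The function names are hypothetical. The template itself is now an ordinary URI Template; all of the JSON-specific addressing lives in the map.

```python
import json
import re
from urllib.parse import quote

_REL = re.compile(r"(0|[1-9][0-9]*)((?:/.*)?)")


def resolve_relative(instance, current_path, rel):
    """Evaluate a relative JSON pointer (without the "#" form).

    current_path lists the keys/indices from the root of `instance` to the
    location where the link is defined ([] for the root itself).
    """
    m = _REL.fullmatch(rel)
    if m is None:
        raise ValueError(f"unsupported relative pointer: {rel!r}")
    up = int(m.group(1))
    if up > len(current_path):
        raise ValueError(f"{rel!r} climbs above the root")
    path = list(current_path[:len(current_path) - up])
    if m.group(2):
        # RFC 6901 unescaping: "~1" -> "/" first, then "~0" -> "~".
        path += [t.replace("~1", "/").replace("~0", "~")
                 for t in m.group(2).split("/")[1:]]
    value = instance
    for token in path:
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


def expand_href(href, href_vars, instance, current_path):
    """Fill simple "{name}" expressions (no operators, no modifiers)."""
    def fill(m):
        name = m.group(1)
        rel = href_vars.get(name, "0/" + name)  # unmapped: default "0/<name>"
        value = resolve_relative(instance, current_path, rel)
        if isinstance(value, (list, dict)):
            raise ValueError("composite values need RFC 6570 list/map expansion")
        text = value if isinstance(value, str) else json.dumps(value)
        return quote(text, safe="")  # percent-encode all but unreserved chars
    return re.sub(r"\{([A-Za-z0-9_]+)\}", fill, href)


foo = {
    "href": "/{space}/{complicated}/{empty}",
    "hrefVars": {"space": "0/escape space", "complicated": "0/a (b)", "empty": "0/"},
}
instance = {"escape space": "x", "a (b)": "y", "": "z"}
print(expand_href(foo["href"], foo["hrefVars"], instance, []))  # /x/y/z
```

Because current_path records where the link is defined, the same function also handles links inside array items: passing ["bar", 0] lets a leading 2 climb past the array to the enclosing object.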

Examples that are impossible with current pre-processing

Given this schema:

{
    "type": "object",
    "properties": {
        "foo": {"type": "string"},
        "bar": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "x": {"type": "number"},
                    "y": {
                        "type": "object",
                        "properties": {"z": {"type": "boolean"}}
                    }
                },
                "links": [{
                    "rel": "item",
                    "href": "/{foo}/{x}/{z}",
                    "hrefVars": {
                        "foo": "2/foo",
                        "z": "0/y/z"
                    }
                }]
            }
        }
    }
}

Applying it to this instance:

{
    "foo": "oof",
    "bar": [{"x": 42, "y": {"z": true}}, {"x": 0, "y": {"z": false}}]
}

would produce an item link of "/oof/42/true" for the first array element, and "/oof/0/false" for the second. Note that since "x" was not in hrefVars it is treated as if mapped to "0/x", which produces the same behavior as the current specification for non-preprocessed variables.

This demonstrates the support for referencing data in enclosing and nested instances.

In conclusion

Mapping is much easier to implement and much easier to read than pre-processing. As proposed, it offers a superset of the current functionality. The main implementation concern would be introducing Relative JSON Pointers, but they are required for several proposals currently under consideration.

In my experience on past projects, schema authors found variable mapping easy to work with. It was not a significant source of confusion or bugs, either in schemas or in the code that processed the schemas using this feature.

@handrews
Contributor Author

handrews commented Nov 14, 2016

Paging known hyper-schema users @jdesrosiers, @slurmulon, @Anthropic :-)

@handrews
Contributor Author

Forgot to add: this approach would apply to any current template keyword, so there would be a baseVars corresponding to base. This can be seen in the pull request; I just forgot to mention it here.

@handrews
Contributor Author

@awwright: Note that if jumping straight to this approach is too much for folks, it's entirely possible to support both template pre-processing and this mapping approach. We just declare that template pre-processing happens first, and then the mapping rules apply (which only makes a difference if hrefVars is present, so it's entirely backwards compatible).

I personally have no intention of implementing template pre-processing in my tools, because it's extremely complicated for very little benefit. But if the main concern is not changing things too quickly or drastically, that would be an option.

@handrews
Contributor Author

Updating with thoughts from an IRC discussion with @awwright :

  • While the URI Templates RFC allows the "." character in variable names, which looks useful for hierarchical access, it does not require implementations to treat the dot hierarchically, and in any case using it that way would not work for us:
    • dot-separated variable names cannot express all useful JSON Paths
    • the empty string is a valid property name in JSON, but two adjacent dots are not valid in a variable name
    • there is no way to address JSON values in a containing or sibling part of the instance, only in the immediate instance or its children
  • Mapping to Relative JSON pointers avoids all of these problems
    • they allow referencing containing and sibling instances
    • they allow all legal JSON property names, so the URI Template variables do not need to support them (either directly or through pre-processing)
  • Relative JSON pointers themselves are not that important, the important part is mapping to avoid preprocessing
    • Any other sufficiently expressive system would work
    • Relative JSON pointers are convenient because we are already using JSON Pointer extensively so the general usage is familiar

@handrews
Contributor Author

Relative pointer alternative: Dotted URI Template variables

@awwright has expressed doubts about Relative JSON Pointer, and indicated a preference for using dot-separated URI Template variable names to reference nested properties and array elements.

Here are my thoughts. Warning: I'm not even pretending to be objective here :-) While I'm not specifically attached to Relative JSON Pointers, I don't see how dot-separated variables can come close to their functionality.

The URI Template specification RFC 6570 defines variables as:

varname       =  varchar *( ["."] varchar )
varchar       =  ALPHA / DIGIT / "_" / pct-encoded
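The grammar above translates directly into a regular expression. This sketch (the helper name `is_varname` is hypothetical) shows which names it accepts: dotted names like "a.b.c" are valid, while the empty string, empty segments ("a..b"), leading or trailing dots, and unencoded spaces are not, so property names such as "" or "escape space" cannot be written directly as variables.

```python
import re

# RFC 6570, section 2.3:
#   varname = varchar *( ["."] varchar )
#   varchar = ALPHA / DIGIT / "_" / pct-encoded
VARCHAR = r"(?:[A-Za-z0-9_]|%[0-9A-Fa-f]{2})"
VARNAME = re.compile(rf"{VARCHAR}(?:\.?{VARCHAR})*")


def is_varname(name: str) -> bool:
    return VARNAME.fullmatch(name) is not None


for candidate in ["a.b.c", "escape%20space", "a..b", ".a", "a.", "", "escape space"]:
    print(repr(candidate), is_varname(candidate))
```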

Pros of dot-separated template variables

  • It's a familiar idiom from many programming languages
  • Judging by the ABNF, the dot is clearly meant to support some sort of structured variable access
  • @awwright probably has more and will hopefully comment :-)

Cons of dot-separated template variables

  • It would still require preprocessing, as the variable still cannot express all possible JSON property names (and would also need to escape a literal dot)
  • It's clearly not required by the spec to be used for variable structure, so there are no interoperability guarantees; the spec is completely silent on why the dot is given special treatment at all
  • It would not support referencing the current index in an array or the set of property names of an object, both features that were already part of or reasonably proposed for relative pointers
  • There is no way to reference the entire instance value, which is sometimes needed for objects and arrays, and always needed for instances that are strings, numbers, bools, or null
  • There is no way to reference parent or sibling fields

To me, the last two are deal-breakers. It's not clear to me how dotted variables could be defined to handle those; it would likely have to involve more preprocessing of the kind we are trying to move away from. Or it would require an additional separate solution, and I haven't (yet) heard a good reason why we should have two approaches to complex instance data referencing when there is a very simple and consistent approach that has been successfully implemented "in the wild".

@Anthropic
Collaborator

@handrews we get away with dot and bracket notation for the most part, as we process any key internally into an array-based key anyway. The objectpath library can transform any a.b[c].d into the key ['a', 'b', c, 'd'] and back to a['b'][c]['d'] for use in JS.

Potentially some form of functional notation may be easier to work with, e.g. "parent(3).x[path(parent(3).y.ref.id)].z", processed with regexes into something interpretable in each language, while still offering simplicity for simple keys, e.g. "parent(2).array[3].property". Here parent/self would only be accepted as the first element of a path, and path(...) would let one path supply the key or index used in another.

I don't have strong opinion on the end format, so long as it is easy to translate and covers every possible scenario in the simplest way possible ;)

@handrews
Contributor Author

@Anthropic While I can see building a useful syntax in that way, not all languages will be able to handle it as easily as JavaScript, which means it adds a relatively complex parsing requirement to the system.

Additionally, we end up with yet more preprocessing and escaping needs: those strings would not be valid variables in URI Templates as-is. So we're back to layering a macro language on top of URI Templates. Ick.

I also really like that, since we're using JSON Pointer elsewhere, relative pointers are easily applied to JSON Pointers to produce new JSON Pointers. JSON Schema already uses JSON Pointer, so implementations already need to know how to apply JSON Pointers to instances. This means that using a relative pointer is extremely simple:

  • Take the JSON Pointer for the current location
  • Apply the relative pointer to produce a new JSON Pointer
  • Apply the resulting JSON Pointer to the instance

Applying the relative pointer to an existing JSON Pointer is intentionally a very simple process both in terms of parsing and executing. You only need to parse and make use of the leading term (before the first "/") of the relative pointer, after which you can just blindly tack the rest of the relative pointer onto the intermediate result. Relative pointers are also visually consistent with JSON Pointers that appear elsewhere such as in "$ref" URIs.
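The process described above can be sketched in a few lines, assuming string JSON Pointers (RFC 6901) and relative pointers without the trailing "#" form; the function names are hypothetical. Only the leading integer is parsed: it says how many reference tokens to drop, and the remainder is appended unchanged. For example, from "/bar/0" a leading 1 selects the enclosing array and 2 selects the root.

```python
import re


def apply_relative_pointer(base: str, rel: str) -> str:
    """Combine an absolute JSON Pointer with a relative JSON Pointer."""
    m = re.fullmatch(r"(0|[1-9][0-9]*)((?:/.*)?)", rel)
    if m is None:
        raise ValueError(f"unsupported relative pointer: {rel!r}")
    up, suffix = int(m.group(1)), m.group(2)
    tokens = base.split("/")[1:]  # "" (the whole document) has no tokens
    if up > len(tokens):
        raise ValueError(f"{rel!r} climbs above the root of {base!r}")
    kept = tokens[:len(tokens) - up]
    # Remaining tokens are still RFC 6901-escaped, so they can be rejoined as-is.
    return "".join("/" + t for t in kept) + suffix


def resolve_pointer(instance, pointer: str):
    """Apply an absolute JSON Pointer (RFC 6901) to an instance."""
    value = instance
    for raw in pointer.split("/")[1:]:
        token = raw.replace("~1", "/").replace("~0", "~")
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


instance = {"foo": "oof", "bar": [{"x": 42, "y": {"z": True}}]}
here = "/bar/0"  # location where the link is defined
print(apply_relative_pointer(here, "0/y/z"))  # /bar/0/y/z
print(apply_relative_pointer(here, "1"))      # /bar
print(resolve_pointer(instance, apply_relative_pointer(here, "2/foo")))  # oof
```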

Finally, other proposals like $data need the same functionality, but do not involve URI Templates. We could, of course, specify that such keywords use the same mini-language as we stuff into URI Templates, but again that's a more significant implementation burden than just using relative pointers.

So far no one has explained why relative pointers are bad. In particular, why they are worse than implementing our own mini-language within URI Templates. If the dot notation alone worked for us (as it does when used with some data type that does not have containing/sibling data) I would be all for it. But it does not.

@Anthropic
Collaborator

@handrews I agree with you by the way, just considering all options.

Well, I don't agree it would be significantly harder to parse in different languages, but I certainly agree that mappings are easier to handle than inline syntax anyway. I would go so far as to say JSON Pointers themselves need the same mapping ability...
In fact now I am curious about the potential need for a substitution keyword instead.

{
  "title": "My name is {name}",
  "$substitute": { "name": "/model/name" }
}

@handrews
Contributor Author

> Well, I don't agree it would be significantly harder to parse in different languages

By "harder" I mean I would find it more irritating ;-)
Although the chances of me being forced to implement this all in C seem low.

> In fact now I am curious about the potential need for a substitution keyword instead.

Interesting. I think that could get very complicated very quickly, but on the other hand I'm fairly sure something at least kind of like that was proposed somewhere. Can't find it now.

I'm not sure I follow about json-pointers themselves needing the same mapping ability, though.

@Anthropic
Collaborator

@handrews I could use substitution to assign indexes dynamically to array pointers "obj/array/{i1}/arr/{i2}/property" etc...

@handrews
Contributor Author

handrews commented Dec 1, 2016

So this would basically be like "$data" except instead of loading the instance value from the pointer into the schema to replace the "$data" object, it replaces the substitution variables wherever they appear in the schema?

Interesting, although I see all sorts of complications with scoping and references.

If you want to pursue "$substitute", please file it as a separate issue. I don't think we need to solve it in order to resolve this issue.

@Anthropic
Collaborator

Well, it depends. If it were worth considering elsewhere, it would affect the use of hrefVars, since it essentially does the same thing, so this could end up being:

{
  "links": [{
    "rel": "item",
    "href": "/{foo}/{x}/{z}",
    "$subst": {
      "foo": "1/foo",
      "z": "0/y/z",
      "x": "3/x"
    }
  }]
}

@handrews
Contributor Author

I'm going to close this as PR #179 is a first step towards the larger problem (see #108) that will probably take us in a slightly different direction here. Which I think will ultimately be better.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants