Proposal: JSON Schema $ref for aliases (breaking change) #259

Open
wants to merge 2 commits into
base: main

Conversation

drwpow
Contributor

@drwpow drwpow commented Jan 11, 2025

Summary

This makes a breaking change to aliases, changing the syntax:

- { "$value": "{color.blue.01}" }
+ { "$value": { "$ref": "#/color/blue/01/$value" } }

This provides a solution for #166, and aligns the DTCG format more closely with common syntax that addresses similar problems.
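As a rough illustration of the upgrade path, the mapping from the old alias string to the proposed object can be sketched in a few lines. The function name `alias_to_ref` is hypothetical and not part of the PR; the sketch assumes dot-separated segments and applies JSON Pointer (RFC 6901) escaping to each one.

```python
def alias_to_ref(alias: str) -> dict:
    """Convert a legacy "{a.b.c}" alias string into the proposed $ref object."""
    if not (alias.startswith("{") and alias.endswith("}")):
        raise ValueError(f"not a legacy alias: {alias!r}")
    segments = alias[1:-1].split(".")
    # JSON Pointer escaping: "~" becomes "~0" first, then "/" becomes "~1".
    escaped = [s.replace("~", "~0").replace("/", "~1") for s in segments]
    # The old syntax pointed at a token; the new one points at its $value.
    return {"$ref": "#/" + "/".join(escaped) + "/$value"}


upgraded = alias_to_ref("{color.blue.01}")
print(upgraded)
```

A migration tool would presumably walk every `$value` in a document and apply a conversion like this one.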

Reasoning

DTCG users have kept their files separately for a long time, and have asked for the ability to reference tokens in other files (#166). Since we were already borrowing from JSON Schema in places (primarily the $ character to mark reserved keys), this brings JSON Schema’s $ref keyword to the concept of token aliasing.

The $ref keyword (aka “JSON pointers”) comes both from JSON Schema and the OpenAPI specification (previously Swagger), and has been in use for over a decade. It is syntactically already consistent with the DTCG format, and functionally consistent as well. Most languages have lots of existing tooling to parse and resolve these (e.g. JavaScript).

For the DTCG format, it is a way to keep all the functionality of aliases while extending it for future needs, borrowing from prior art where this syntax is known, used, and well-defined. For simplicity, this proposal aims to replace the previous alias syntax, rather than keep 2 conflicting ways to accomplish the same thing (which will be expanded on in points below).

What’s gained

  • The ability for tokens to alias tokens in remote files, supported by the spec
  • The ability for tokens to reuse any part of other tokens (still bound by the requirement it must ultimately resolve to a valid schema)

What’s lost

  • Nothing! This is backwards-compatible (behavior-wise), and the previous alias syntax can be cleanly upgraded with no change in behavior.

Pros

  • This greatly expands what’s possible with the DTCG format, now that any part of the document (or remote documents, or partial remote documents) may be reused.

    • For example, aliased tokens could now even reuse parts of $extensions if they wanted to, while selectively applying overrides
    • Because it allows overrides, you can reuse more DTCG syntax than you could before, even sub-schemas in remote documents (e.g. extending public design systems)
  • While this is flagged as a breaking syntax change, it’s a backwards compatible functionality upgrade to aliases. There is no end-behavior nor features that should be lost with this change.

  • Figma Styles and Variables use the / character in names, so now token names will map 1:1

    • Are there other cases where this is true?
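  • This is backwards compatible for token names. Tokens can still use / and ~ characters in their names; they just have to be escaped as ~1 and ~0 respectively ONLY INSIDE $ref (a # in a name is percent-encoded as %23 in the URI fragment rather than tilde-escaped), according to the JSON Pointer spec (RFC 6901):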

    {
      "my/token": { "$type": "number", "$value": 5 },
      "other-token": { "$ref": "#/my~1token" }
    }
  • For DTCG parsers, this simplifies parsing/handling of aliases. Consider the old syntax:

    {
      "font-1": { "$type": "fontFamily", "$value": "Inter" },
      "font-2": { "$type": "fontFamily", "$value": "{font-1}" }
    }

    When a schema isn’t distinguishable by structure, tool makers have to do additional work string matching to do basic typecasting and discrimination.

  • Tool makers also benefit from over a decade of prior art and tooling for handling this syntax (even if they may not have used it directly before)
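To make the escaping rule concrete, here is a minimal sketch of resolving a same-document pointer against the "my/token" example above. `resolve_pointer` is a hypothetical helper written for this illustration; it only handles objects (no array indices, no remote documents), and real tools would more likely rely on an existing JSON Pointer library.

```python
def resolve_pointer(document: dict, ref: str):
    """Resolve a same-document reference such as "#/my~1token"."""
    if not ref.startswith("#"):
        raise ValueError("only same-document references are handled in this sketch")
    node = document
    # "#/a/b" -> "/a/b" -> ["", "a", "b"]; drop the empty leading segment.
    for raw in ref[1:].split("/")[1:]:
        # Unescape in the order RFC 6901 requires: "~1" first, then "~0".
        key = raw.replace("~1", "/").replace("~0", "~")
        node = node[key]
    return node


tokens = {
    "my/token": {"$type": "number", "$value": 5},
    "other-token": {"$ref": "#/my~1token"},
}
resolved = resolve_pointer(tokens, tokens["other-token"]["$ref"])
print(resolved)
```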

Cons

  • Given that aliases are core syntax, this is a disruptive change, not just for toolmakers but for consumers of the DTCG spec. We likely would want to talk about a deprecation strategy
  • It does change the structure, converting a string node ("{my.token.alias}") to an object ({ "$ref": "#/my/token/alias" }). Again, though it will be easier to work with long-term, in the short term it will impose migration pains.
  • It is more verbose, especially requiring repeat of $value e.g.:
    - "$value": "{color.blue.500}"
    + "$value": { "$ref": "#/color/blue/500/$value" }
  • It does require some extra explanation around “what counts as an alias” vs what doesn’t (see comment)
    • Proposal: count anything as an alias if a $value of one token points to another token’s $value (which means aliasing the entire token object itself, or a group, means aliases are created)

Alternatives

  • We could extend the current {…} alias syntax to support remote files. But I think realistically, it would follow the JSON Pointer spec underneath. But that would require extensive documentation to outline this minor detail, and would still leave toolmakers without the ability to use any of the myriad tools available for JSON pointers. If we’re adhering to those principles underneath, why not just use the syntax directly?

  • Those familiar with JSON Schema will also be familiar with its counterpart $defs, which allows you to declare reusable parts of a document that can be $ref’d anywhere (but by default are ignored/not parsed). $defs are NOT being proposed here, because it doesn’t solve an immediate problem, and introduces more complexity than necessary (but could be an additive followup)

  • An alternate idea is to keep aliases the same {color.blue.05}, and introduce this new concept as a reference (and distinguish between the terms). Aliases would just be the “legacy” way to declare token aliases. I didn’t initially propose this because:

    • There were no advantages I could find. Even the automatic $type inheritance is possible with $ref, but $ref carries more benefits

    • I couldn’t think of a sensible use case for distinguishing “aliases” from “references” after reading this note in the JSON Schema spec (2019-09):

      Attempting to remove all references and produce a single schema document does not, in all cases, produce a schema with identical behavior to the original form.

      In other words, even the concept of $ref feels spiritually identical to the existing DTCG alias syntax: there are times when you do want to preserve and reference those values later, and they are significant in some ways (even including overrides).

    However, I think we could define some sort of deprecation strategy where the old syntax is supported for some time before switching over.

Notes

  • This does introduce some confusion in token IDs. For example, is the “official” token ID now color.blue.05 or color/blue/05?
    • Further, does this mean . is allowed in token names again?
  • As someone that’s worked extensively with JSON Schema syntax, it’s often not enough to say “follow the spec” because there are, like, dozens of conflicting versions that all have breaking changes. So I’m proposing specifically the 2020-12 version, in the case that tool makers run into one of these conflicts
    • Specifically, 2020-12 DOES allow $ref to have sibling keys (“overrides”), which is IMPORTANT! Without this, I believe this would lose some functionality of aliases—specifically their ability to inherit $type automatically.
    • 99.9% of the time it’s not an issue; this is just coverage for that one weird edge case 😅
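The “overrides” behavior mentioned above could look roughly like the following. This is a sketch under an assumption: that sibling keys next to `$ref` are shallow-merged over the referenced object. JSON Schema 2020-12 only says that siblings are permitted alongside `$ref`; the merge semantics and the name `resolve_with_overrides` are illustrative, not taken from the PR.

```python
def resolve_with_overrides(document: dict, node: dict) -> dict:
    """Resolve node["$ref"] and let sibling keys override the referenced object."""
    target = document
    for raw in node["$ref"].lstrip("#").split("/")[1:]:
        target = target[raw.replace("~1", "/").replace("~0", "~")]
    # Everything except $ref is treated as an override (assumed semantics).
    siblings = {key: value for key, value in node.items() if key != "$ref"}
    return {**target, **siblings}


tokens = {
    "color": {"blue": {"$type": "color", "$value": "#218bff"}},
    "action": {"$ref": "#/color/blue", "$description": "Action BG color"},
}
action = resolve_with_overrides(tokens, tokens["action"])
print(action)
```

Here `$type` and `$value` are inherited from the referenced token while `$description` is supplied locally, which is the inheritance case the note above is concerned with.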

Edit: an earlier version referenced the upcoming Resolver Spec proposal, but I’ve realized the two aren’t related, and Resolver Spec isn’t a motivator for this. Mentions have been removed.

@drwpow changed the title from “Proposal: breaking changes to Aliases (JSON Schema $ref)s” to “Proposal: JSON Schema $ref for aliases (breaking change)” on Jan 11, 2025

netlify bot commented Jan 11, 2025

Deploy Preview for dtcg-tr ready!

Name Link
🔨 Latest commit 7e3fa6b
🔍 Latest deploy log https://app.netlify.com/sites/dtcg-tr/deploys/6781b4e21beb8a000806df3e
😎 Deploy Preview https://deploy-preview-259--dtcg-tr.netlify.app

netlify bot commented Jan 11, 2025

Deploy Preview for dtcg-tr ready!

Name Link
🔨 Latest commit 299d636
🔍 Latest deploy log https://app.netlify.com/sites/dtcg-tr/deploys/67868f24fda59e0008beb3ea
😎 Deploy Preview https://deploy-preview-259--dtcg-tr.netlify.app

@c1rrus
Member

c1rrus commented Jan 11, 2025

I've literally just read this, so this is a bit of a knee-jerk reaction (perhaps I'll feel differently after sleeping on it :-P), but... I have concerns about this proposal:

  • While I take your point about the prior art and widespread usage and support of JSON Schema, I'd argue that within the domain of design systems and design tokens, the current {foo.bar.baz} syntax is more familiar. Mainly because it's inspired by Style Dictionary's syntax, and Style Dictionary has been the most widely used design token export tool for a while now. Newer ones like Cobalt UI/Terrazzo support DTCG and therefore also use the current syntax.
  • Yes, the current syntax requires a little bit of string parsing to detect a reference versus an actual value, but I'm not convinced that if ( token.$value.charAt(0) === '{' ) is any more difficult than if ( typeof token.$value === 'object' && token.$value.$ref !== undefined ).
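For comparison outside JavaScript, the two checks quoted above translate to Python roughly as follows. Both function names are hypothetical; the sketch only shows that one check inspects string contents while the other inspects structure.

```python
def is_legacy_alias(value) -> bool:
    """Current syntax: an alias is a string wrapped in curly braces."""
    return isinstance(value, str) and value.startswith("{") and value.endswith("}")


def is_ref(value) -> bool:
    """Proposed syntax: an alias is an object carrying a $ref key."""
    return isinstance(value, dict) and "$ref" in value


print(is_legacy_alias("{font-1}"), is_legacy_alias("Inter"))
print(is_ref({"$ref": "#/font-1/$value"}), is_ref("Inter"))
```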

Re-reading #166 and thinking about the resolver spec, I wonder whether "referencing tokens in another file" is the right problem to be solving. Clearly, there's a strong demand for being able to split up your design tokens across multiple files - both to keep things manageable when you have a large number of design tokens, and to do stuff like theming by varying which files get included by some mechanism. But that doesn't necessarily require references that point to specific files.

In many cases, the intent is to merge all the source files into a single collection of tokens, and then to do stuff with it. This is how tools like Theo and Style Dictionary have been doing it forever and I'd argue the Resolver Spec is another (more advanced) take on that concept too. Resolving design token references doesn't happen until after that file merging process has been done, so there's no need for them to specify a file that a token is in.

We can debate whether or not that kind of merge-and-then-resolve-references approach is the "best" way to go, but my point is enabling tokens spread across multiple files can be achieved without changing the current reference syntax.
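The merge-and-then-resolve approach described in this comment can be sketched as below. This is an illustration, not how Theo or Style Dictionary actually implement it; `deep_merge`, `resolve_alias`, and the two in-memory "files" are made up for the example.

```python
def deep_merge(base: dict, extra: dict) -> dict:
    """Recursively merge two token trees; values in `extra` win on conflict."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_alias(tokens: dict, value):
    """Follow "{a.b.c}" aliases in the merged tree until a literal value is reached."""
    seen = set()
    while isinstance(value, str) and value.startswith("{") and value.endswith("}"):
        if value in seen:
            raise ValueError(f"circular alias: {value}")
        seen.add(value)
        node = tokens
        for part in value[1:-1].split("."):
            node = node[part]
        value = node["$value"]
    return value


# Two hypothetical source files, already parsed from JSON.
palette = {"color": {"blue": {"500": {"$type": "color", "$value": "#218bff"}}}}
semantic = {"color": {"action": {"bg": {"$type": "color", "$value": "{color.blue.500}"}}}}

merged = deep_merge(palette, semantic)
resolved = resolve_alias(merged, merged["color"]["action"]["bg"]["$value"])
print(resolved)
```

Because aliases are resolved only after merging, the alias string never needs to say which file `color.blue.500` came from.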

So, considering how big of a breaking change this would be, I wonder if it's worth it.

@drwpow
Contributor Author

drwpow commented Jan 13, 2025

@c1rrus great thoughts, thank you! 🙏 I wanted to think on the points you raised for a few days.

In many cases, the intent is to merge all the source files into a single collection of tokens, and then to do stuff with it. This is how tools like Theo and Style Dictionary have been doing it forever and I'd argue the Resolver Spec is another (more advanced) take on that concept too. Resolving design token references doesn't happen until after that file merging process has been done, so there's no need for them to specify a file that a token is in.

This is a great callout, and I think really gets at the mental model of how people think about the DTCG spec now, how people want to think about it, etc. So this is exactly the conversation I wanted to have with this proposal, even if it doesn’t move forward.

Specifically, “Resolving design token references doesn't happen until after that file merging process has been done” can be preserved with this proposal with no changes. Again, see the comment from the JSON Schema spec:

Attempting to remove all references and produce a single schema document does not, in all cases, produce a schema with identical behavior to the original form.

I interpret this as working exactly as aliases work today: you don’t have to resolve those preemptively, and you can (and should) preserve all the pointers as actual pointers (not deep clones of the data). I do realize the spec changes as written are confusing about that, so I should update the examples. But this can work exactly the same as aliases do. And leaning on how JSON pointers are used today, the filenames themselves are incidental; again, they can work the same. It just removes the restriction of keeping tokens in files by necessity and moves it to a pattern where you can put any token in any file by choice. You can point to tokens in the same file, or remote files. Up to you.
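One way to see that same-file and remote references share a single shape is to split a `$ref` into its document part and its pointer part. The sketch below uses Python's standard `urllib.parse`; the helper name `split_ref` and the path `tokens/base.json` are hypothetical, and the sketch does not fetch or load anything.

```python
from urllib.parse import unquote, urldefrag


def split_ref(ref: str) -> tuple[str, list[str]]:
    """Split a $ref into (document location, decoded pointer segments)."""
    location, fragment = urldefrag(ref)
    segments = [
        # Undo percent-encoding first, then JSON Pointer escaping ("~1", then "~0").
        unquote(part).replace("~1", "/").replace("~0", "~")
        for part in fragment.split("/")[1:]
    ]
    return location, segments


local = split_ref("#/color/blue/01/$value")
remote = split_ref("tokens/base.json#/color/blue/01/$value")
print(local)
print(remote)
```

An empty location means "this document", so a resolver can treat both cases with the same code path and keep the pointer itself intact until output time.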

Yes, the current syntax requires a little bit of string parsing to detect a reference versus an actual value, but I'm not convinced that if ( token.$value.charAt(0) === '{' ) is any more difficult than if ( typeof token.$value === 'object' && token.$value.$ref !== undefined ).

Yeah, performance wasn’t a primary motivator of the change, but this is mostly only true in JavaScript. I agree with you: it is fairly trivial to parse and typecast on the fly there. But in other languages that rely more heavily on fixed structures and memory allocation, leaning on structural type discrimination (like serde for Rust) is more ergonomic than string parsing and pattern-matching. Not to mention that for JSON $refs, most languages already have existing tooling to understand, parse, and resolve these.


As an aside, after writing this I did think about the relation between this and the Resolver Spec more, and I reached the conclusion the Resolver Spec in its current form is sufficient, and whether we accept or reject this proposal won’t change that. The problems I also thought I would hit in the Resolver Spec ended up not being problems at all when I worked through a complex example!

So all that to say I’d like to evaluate this solely on the basis of:

  1. Does this solve problems for the current state of DTCG, and DTCG alone? (I think so)
  2. Does this “stand on the shoulders of giants” and use familiar, prior art? (I think so)

And we may come to the conclusion “no” and that’s OK with me. I’m happy whether this is accepted or rejected; so long as we evaluated it.

@markacianfrani

As a tool author, it is very difficult to reliably tell if a given spec is a single file vs. multi file with remote refs. I don't work with JSON Schema enough, but there's enough prior art that I could easily find a solution within 5 minutes, so this looks pretty good.

At the same time, parsing "{" also wasn't that big of a pain point for me. And it does push us farther from human readable but I think that ship has sailed.

I also would encourage more breaking changes. As a consumer, I understand the (draft) contract.

```css
:root {
  --color-palette-black: #000000;
  --color-text-primary: var(--color-palette-black);
```

Contributor Author

Note: I also updated this earlier section introducing the term “Alias,” and used CSS variables rather than Sass variables. I think it better preserves this idea we want to keep—of aliases carrying through to runtime better. Plus I think in 2024, CSS variables are more widely-used than Sass variables (completely guessing; I have no data to back this up).

{
  "color": {
    "blue": {
      "4": { "$value": "#218bff", "$type": "color" }

Contributor Author

Note: color syntax is waiting on #257, but can be updated here if necessary. It’s unrelated to this proposal.

@drwpow
Contributor Author

drwpow commented Jan 14, 2025

As a tool author, it is very difficult to reliably tell if a given spec is a single file vs. multi file with remote refs. I don't work with JSON Schema enough, but there's enough prior art that I could easily find a solution within 5 minutes, so this looks pretty good.

That’s a great point, and it is a little bit of a papercut for sure—you don’t always know the depth of schemas that refer to other schemas until you try resolving it. But this is also a common problem that has over a decade of prior art and workarounds, too, so it’s not new.

This goes beyond this PR, but in many setups today, people are already using multiple token files for their design system. And they have to manage and describe that “meta” layer that determines how everything fits together. By using JSON pointers ($refs), some people MAY want to just have a single document that describes their entire token structure, and offload maintaining/building that meta layer. This gives them the option to do so (but is not a requirement).

Of course, there are also more complex use cases with modes, etc., that will be solved by the still-in-progress Resolver proposal (which I have seen a preview of, and am a big fan of!). So if we accept JSON $refs, we don’t require people to use a specific structure, or set them up in a way that makes the Resolver proposal harder to use. We just give people more options to organize tokens in a way that works best for them.

@kaelig
Member

kaelig commented Jan 22, 2025

@drwpow walked me through the PR, and I grilled him on various aspects to ensure we weren't missing anything. I feel comfortable with this proposal, as it helps with adoption and tool creation (JSON ref libraries already exist for several languages).

The only possible issue is that we'd no longer support "inline aliases" interpolated in a value. For example: {spacing.inline.small} 1em 0 {spacing.inline.small}. I don't know if this is a limitation we want to embrace or if there are legitimate use-cases where interpolation is needed. Editors & community, please advise!
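For context, the kind of interpolation in question can be sketched with a regular expression over the old syntax. This is an illustration of what some tools do with inline aliases, not a statement of what the current spec requires; the `interpolate` helper and the `0.5em` value are made up for the example.

```python
import re

# Matches one "{dot.separated.path}" occurrence anywhere inside a string.
ALIAS = re.compile(r"\{([^{}]+)\}")


def interpolate(tokens: dict, value: str) -> str:
    """Replace every inline "{a.b.c}" alias in a string with the target's $value."""
    def lookup(match: re.Match) -> str:
        node = tokens
        for part in match.group(1).split("."):
            node = node[part]
        return str(node["$value"])

    return ALIAS.sub(lookup, value)


tokens = {"spacing": {"inline": {"small": {"$value": "0.5em"}}}}
result = interpolate(tokens, "{spacing.inline.small} 1em 0 {spacing.inline.small}")
print(result)
```

A `$ref` object replaces a whole JSON node, so it has no equivalent of substituting part of a string; that is the limitation raised here.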

@drwpow
Contributor Author

drwpow commented Feb 3, 2025

Started playing around with an implementation in Terrazzo, and I realized there’s an error with the proposal that needs to be corrected. And it may affect some people’s opinions about the approach:

1. $value shortcuts

The way this is described, both would essentially be the same:

{
  "color-blue-500": { "$type": "color", "$value": "#218bff" },
  "color-action-bg": { "$ref": "#/color-blue-500" }
}
{
  "color-blue-500": { "$type": "color", "$value": "#218bff" },
  "color-action-bg": { "$type": "color", "$value": { "$ref": "#/color-blue-500/$value" } }
}

The difference in the latter is that the trailing /$value in the pointer is REQUIRED; otherwise the reference would be invalid (the value of a color token can’t be a token itself). But that’s a little bit of extra boilerplate, isn’t it? We could take 2 approaches:

Solution 1 (recommended): Make the $value explicitly required (simple, correct, but verbose)

We could simply require the full value: "$value": { "$ref": "#/color-blue-500/$value" }. This is correct. And it’s requiring precision because it’s necessary. This is efficient and has no downsides; just has some repetition in code.

Solution 2: Allow reserved words (more complex, more ambiguous, but less verbose)

Or we could just say “when a $ref sits under a reserved key ($value, $type, $description, etc.), and the reference resolves to an object that has that same reserved key, pretend the alias points there,” e.g.:

  • "$value": { "$ref": "#/color-blue-500" }: color-blue-500 also has $value so it can be shortened
  • "$value": { "$ref": "#/color-blue-500/$value" }: still allowed because it’s a full pointer
  • "$description": { "$ref": "#/color-blue-500" }: error because color-blue-500 doesn’t have a $description key defined

The downside here is this breaks JSON pointers somewhat—altering their behavior. It could likely lead to errors unless this behavioral quirk is known and replicated in all tools.

From a technical standpoint there’s nothing wrong with this approach, i.e. no way I can think of this would break or cause unwanted effects. Just increases burden on toolmakers to make sure they implement this extra little bit of logic.
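The extra logic Solution 2 asks of toolmakers might look something like the sketch below. It is one possible reading of the rule, with an added assumption: a target counts as a token when it is an object with a `$value` key. The names `resolve_pointer` and `resolve_reserved` are hypothetical.

```python
def resolve_pointer(document, ref: str):
    """Minimal same-document JSON Pointer lookup (objects only)."""
    node = document
    for raw in ref.lstrip("#").split("/")[1:]:
        node = node[raw.replace("~1", "/").replace("~0", "~")]
    return node


def resolve_reserved(document: dict, reserved_key: str, ref: str):
    """Resolve a $ref found under a reserved key, allowing the shortened form."""
    target = resolve_pointer(document, ref)
    # Assumption: an object with a $value key is a token, so the shortcut applies.
    if isinstance(target, dict) and "$value" in target:
        if reserved_key not in target:
            raise KeyError(f"{ref} has no {reserved_key}")
        return target[reserved_key]
    # Otherwise the pointer was already a full pointer to the value itself.
    return target


tokens = {"color-blue-500": {"$type": "color", "$value": "#218bff"}}
short = resolve_reserved(tokens, "$value", "#/color-blue-500")
full = resolve_reserved(tokens, "$value", "#/color-blue-500/$value")
print(short, full)
```

With this sketch, `"$description": { "$ref": "#/color-blue-500" }` raises a `KeyError`, matching the third bullet above. An off-the-shelf JSON Pointer library would not apply the shortcut, which is the interoperability risk noted above.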

2. Alias boundaries

Not an “error” per se, but an omission that needs to be clarified, is what, then, counts as an alias? For example, if we were to say “color-action-bg aliases color-blue-500,” which of the following code would still keep that statement true?

  1. "color-action-bg": { "$ref": "#/color-blue-500" } (whole-token)
  2. "color-action-bg": { "$value": { "$ref": "#/color-blue-500/$value" } } ($value only)
  3. "color-action-bg": { "$description": { "$ref": "#/color-blue-500/$description" } } ($description, or any other non-$value property)

The difference here is aliases can’t be “flattened” until the very last step under the current language, and the reference must be preserved. But if users can compose any part of their schema, what constitutes an “alias” now?

Solutions

How should we amend the language?

  1. (Recommended) A token is an alias of another token if its $value points to another token’s, e.g.:

    • "color-action-bg": { "$ref": "#/color-blue-500" }: this is an alias because $value is aliased by proxy of the entire object pointing to another token
    • "color-action-bg": { "$value": { "$ref": "#/color-blue-500/$value" } }: this is an alias because $value directly points to another token’s
    • "color-action-bg": { "$description": { "$ref": "#/color-blue-500/$description" } }: this is NOT an alias because $value does not point to another token
  2. A token is only an alias of another token if the entire token points to another token AND $value is not overridden, e.g.:

    • "color-action-bg": { "$ref": "#/color-blue-500", "$description": "Action BG color" }: this is an alias because the entire token points to another token AND $value is not overridden (even if some other properties are)
    • "color-action-bg": { "$ref": "#/color-blue-500", "$value": "#218bff" }: this is NOT an alias because even though the token originally pointed to another token, $value was overridden making this a unique token.
    • "color-action-bg": { "$value": { "$ref": "#/color-blue-500/$value" } }: this is NOT an alias because the entire token object doesn’t point to another token
  3. A token is only an alias of another token if $value is aliased directly, e.g.:

    • "color-action-bg": { "$ref": "#/color-blue-500" }: this is NOT an alias because it is happening one level up from $value
    • "color-action-bg": { "$value": { "$ref": "#/color-blue-500/$value" } }: this is an alias because $value directly points to another token’s
    • "color-action-bg": { "$description": { "$ref": "#/color-blue-500/$description" } }: this is NOT an alias because $value does not point to another token

Out of all the approaches, #3 has the benefit of being the easiest to statically analyze: you could tell which tokens are aliases without having to resolve any pointers. But #3 would be the most restrictive, too, preventing people from aliasing entire groups like the other methods could.

As the spec proposer, just for the sake of argument, I would probably stick with #1 where even though it’s hard to statically-analyze the number of final tokens without doing work resolving everything, that seems like a worthwhile tradeoff to give schema authors more raw power to generate tokens and create aliases more freely without restrictions. In other words, I don’t really know if it’s advantageous to make token counts easier up front, especially considering the aliases have to be resolved one way or another, and all approaches are the same amount of work in the end.
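The three candidate definitions can be compared side by side with a small classifier. This is a simplified sketch: it inspects only the token object itself and assumes a whole-object `$ref` points at a token rather than a group, which for definition #1 a real tool could only confirm by resolving the pointer. The function `is_alias` and its `rule` parameter are hypothetical.

```python
def is_alias(token: dict, rule: int) -> bool:
    """Classify a token as an alias under candidate definition 1, 2, or 3."""
    whole_ref = "$ref" in token
    value = token.get("$value")
    direct_ref = isinstance(value, dict) and "$ref" in value
    inherited_value = whole_ref and "$value" not in token
    if rule == 1:  # $value points to another token's, directly or by proxy
        return direct_ref or inherited_value
    if rule == 2:  # whole token is referenced and $value is not overridden
        return inherited_value
    if rule == 3:  # only a direct $ref under $value counts
        return direct_ref
    raise ValueError(f"unknown rule: {rule}")


whole = {"$ref": "#/color-blue-500"}
value_only = {"$value": {"$ref": "#/color-blue-500/$value"}}
description_only = {"$description": {"$ref": "#/color-blue-500/$description"}}
overridden = {"$ref": "#/color-blue-500", "$value": "#218bff"}

for rule in (1, 2, 3):
    print(rule, [is_alias(t, rule) for t in (whole, value_only, description_only, overridden)])
```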


I’ll update the PR description with Solution 1 for both issues ($value shortcuts and alias boundaries) as my rough proposals, but I could be easily swayed 🙂. Would really just love thoughts in general, and for people to poke holes in this more.

4 participants