Exotic Types as fields of data provider structs #523
Comments
Notes from 2021-03-09:
Decision: Exotic types in data structs are OK. Add some documentation in the style guide explaining when to use them.
After #539 is implemented, I would like to spend some time profiling code size. I have trouble seeing how the compiler could know that it can eliminate the |
In documenting this, also discuss expectations for validation of user input, which will be necessary in Serde impls of exotic types.
In #480, @gregtatum is proposing that we put an exotic type (in this case `Pattern`) in the data struct definition, and I expect @echeran will do something similar in enumerated properties. This is novel in the data provider architecture, and I want to discuss the implications.
Definition: When I say "exotic type", I mean an opaque type defined by ICU4X that implements Serde in a non-obvious way. I do not consider `TinyStr` and `LiteMap` to be exotic types, because they are serialization-compatible with `String` and `HashMap`.
Advantages of Exotic Types
Putting exotic types in data structs makes the serialized form more opaque to the reader of the code. Using only simple types in the data structs would make it easier to reason about the lifecycle of data structs.
However, as Greg pointed out in #480, putting exotic types in the data provider means that deserialization into those exotic types happens during data loading rather than in the constructor, which is a nice advantage that is consistent with the "pre-processing of data" design goal I outlined in data_provider.md.
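To make the difference concrete, here is a minimal sketch of the two layouts. Every name in it (`Pattern`, `DatePatternsSimpleV1`, `DatePatternsExoticV1`, `Formatter`, `load_exotic`) is a hypothetical stand-in, not the actual ICU4X API, and the whitespace-splitting "parser" is only a placeholder for real CLDR pattern parsing. `load_exotic` stands in for what a Serde `Deserialize` impl would do during data loading.

```rust
/// Hypothetical stand-in for an exotic type; not the real ICU4X `Pattern`.
#[derive(Debug, PartialEq)]
pub struct Pattern {
    tokens: Vec<String>,
}

impl Pattern {
    /// Toy parser that splits on whitespace. Real CLDR pattern parsing is
    /// considerably more involved; this only marks where the work happens.
    pub fn parse(source: &str) -> Result<Self, String> {
        if source.trim().is_empty() {
            return Err("empty pattern".to_string());
        }
        Ok(Pattern {
            tokens: source.split_whitespace().map(str::to_string).collect(),
        })
    }
}

/// Option A: only simple types in the data struct.
pub struct DatePatternsSimpleV1 {
    pub full: String,
}

/// Option B: the exotic type sits directly in the data struct.
pub struct DatePatternsExoticV1 {
    pub full: Pattern,
}

pub struct Formatter {
    pattern: Pattern,
}

impl Formatter {
    /// With option A, parsing (and its failure mode) lives in the constructor.
    pub fn try_new_from_simple(data: &DatePatternsSimpleV1) -> Result<Self, String> {
        Ok(Formatter {
            pattern: Pattern::parse(&data.full)?,
        })
    }

    /// With option B, the constructor is infallible: the data is already parsed.
    pub fn new_from_exotic(data: DatePatternsExoticV1) -> Self {
        Formatter { pattern: data.full }
    }

    pub fn token_count(&self) -> usize {
        self.pattern.tokens.len()
    }
}

/// Stand-in for deserialization at data-loading time (assumption: a real
/// implementation would do this inside a Serde `Deserialize` impl).
pub fn load_exotic(serialized: &str) -> Result<DatePatternsExoticV1, String> {
    Ok(DatePatternsExoticV1 {
        full: Pattern::parse(serialized)?,
    })
}
```

In this sketch, invalid data surfaces from `load_exotic` under option B, whereas under option A it only surfaces when a `Formatter` is constructed.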
On the other hand, we could design data structs to contain "low-level" representations of ICU4X concepts, like the list of tokens in a `Pattern` rather than the CLDR syntax for a `Pattern`. This would resolve the "pre-processing of data" issue, but it may tie our hands a bit more by encoding the internal representation of types into the data struct.
Stability
As a reminder, the reason we have versioned data keys and data structs (like `plurals/ordinal@1` and `PluralRulesV1`) is so that a data file from a newer ICU4X can power code from an older ICU4X (and vice-versa as much as possible), as discussed in data_pipeline.md.
We need to think about what implications exotic types have on data struct stability. For example, if you have a struct like `FooV1 { bar: Bar }`, then the serialized form of `Bar` must also be stable.
Here are some implications to start with:
`#[non_exhaustive]` cannot work for fields of data structs: if a new variant gets added to an enum, for example, a newer data file that contains the new variant won't be able to power an older ICU4X client, which doesn't know how to interpret that variant.
Portability
Currently, the Rust structs are the source of truth for the schema of the serialized files. However, I want us to keep the door open to using a cross-platform declarative syntax like JSON Schema or YANG to define the data struct schemas.
In other words, the serialized form of exotic types should be rooted in the building blocks of modern data markup languages: strings, numbers, arrays, and maps, with possible restrictions on those types (e.g., a string that is 1 character long, or a number that is in a certain range).
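As a rough illustration of "building blocks with restrictions", the sketch below wraps a one-character string and a bounded number in validated newtypes. `SingleChar` and `Minute` are hypothetical names chosen for this example, and "one character" is taken to mean one Unicode scalar value, which is an assumption. In a real Serde impl, this validation would run during deserialization.

```rust
/// Hypothetical exotic type whose serialized form is a plain string,
/// restricted to exactly one character (here: one Unicode scalar value).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SingleChar(char);

impl SingleChar {
    pub fn new(s: &str) -> Result<Self, String> {
        let mut chars = s.chars();
        // Accept only when there is a first character and no second one.
        match (chars.next(), chars.next()) {
            (Some(c), None) => Ok(SingleChar(c)),
            _ => Err(format!("expected exactly 1 character, got {:?}", s)),
        }
    }

    pub fn get(self) -> char {
        self.0
    }
}

/// Hypothetical exotic type whose serialized form is a plain number,
/// restricted to the range 0..=59.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Minute(u8);

impl Minute {
    pub fn new(n: u64) -> Result<Self, String> {
        if n <= 59 {
            Ok(Minute(n as u8)) // lossless: n is at most 59
        } else {
            Err(format!("{} is outside the range 0..=59", n))
        }
    }

    pub fn get(self) -> u8 {
        self.0
    }
}
```

Both restrictions are the kind that a declarative schema language can typically express as a length bound and a numeric range, so the Rust type and a cross-platform schema could describe the same set of valid values.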
Portability is also an argument in favor of making the deserialization step as simple as possible. Going back to the `Pattern` exotic type, if I wanted to implement a data provider in JavaScript that serves a data struct across the FFI boundary, I would have to implement CLDR pattern parsing code.
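For comparison, here is a sketch of the "low-level" alternative mentioned earlier, in which the data struct carries a token list instead of CLDR pattern syntax. `PatternToken`, `PatternTokensV1`, and `render` are hypothetical and do not reflect ICU4X's actual internal representation. `render` also ignores CLDR quoting rules for literals. The point is only that a provider on the other side of an FFI boundary could emit these tokens from arrays, strings, and numbers without implementing a pattern parser.

```rust
/// Hypothetical low-level token; built only from strings, chars, and numbers.
#[derive(Debug, Clone, PartialEq)]
pub enum PatternToken {
    /// Text copied to the output as-is.
    Literal(String),
    /// A field symbol repeated `length` times, e.g. 'y' x 4.
    Field { symbol: char, length: u8 },
}

/// Hypothetical data struct holding the pre-tokenized pattern.
pub struct PatternTokensV1 {
    pub tokens: Vec<PatternToken>,
}

/// Consumes the tokens directly; no parsing step is involved.
/// Simplification: literals are not quoted or escaped.
pub fn render(data: &PatternTokensV1) -> String {
    let mut out = String::new();
    for token in &data.tokens {
        match token {
            PatternToken::Literal(text) => out.push_str(text),
            PatternToken::Field { symbol, length } => {
                for _ in 0..*length {
                    out.push(*symbol);
                }
            }
        }
    }
    out
}
```

The cost, as noted above, is that the token layout becomes part of the stable serialized form.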