Skip to content
This repository has been archived by the owner on Mar 29, 2024. It is now read-only.

OmnICU API surface #15

Closed
sffc opened this issue Jan 23, 2020 · 9 comments
Closed

OmnICU API surface #15

sffc opened this issue Jan 23, 2020 · 9 comments

Comments

@sffc
Copy link
Member

sffc commented Jan 23, 2020

I made a doc discussing the API surface for OmnICU in several host languages. I have a section about Rust.

https://docs.google.com/document/d/1tXACn0p2EuzSCJ0Gd8Nd1RV-DgdYG54N6lDwqYeQE4Q/edit#

One big open question is about named arguments delivery. In ECMAScript, we just use object literals and it's easy. In Rust, I suggested two options:

  1. Struct that implements the Default trait
  2. Builder pattern

Thoughts?

@zbraniecki
Copy link
Member

I'm in favor of (1).

I just finished implementing custom types and formatters for fluent-rs and here's an example of how a (custom) DateTime looks with an option bag that will be extended to match ECMA402:

https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/tests/custom_types.rs#L51-L124

I think that a call like:

DateTime::new(41312314141, DateTimeOptions {
  date_style: DateTimeStyleValues::Full,
  time_style: DateTimeStyleValues::Medium,
  hour_cycle: DateTimeStyleHourCycles::H24
});

is more rusty than a builder pattern. It also allows us to do more with the options struct - clone, serialize, parse from JSON, etc.

@sffc
Copy link
Member Author

sffc commented Jan 24, 2020

@filmil suggested another alternative: an array of enums. Something like this:

mod NumberFormat {
    #[derive(Debug)]
    pub enum Opt<'a> {
        style(Style),
        currency(&'a str),
        maximumFractionDigits(i32),
    }
    
    pub fn build(locale: &str, options: &[Opt]) {
        print!("{:?} {:?}", locale, options);
    }
}

fn main() {
    NumberFormat::build("pt-BR", &[
        NumberFormat::Opt::style(NumberFormat::Style::CURRENCY),
        NumberFormat::Opt::currency("EUR"),
    ]);
}

@zbraniecki
Copy link
Member

zbraniecki commented Jan 24, 2020

That seems semantically weak. You need to handle all cases of:

    NumberFormat::build("pt-BR", &[
        NumberFormat::Opt::style(NumberFormat::Style::CURRENCY),
        NumberFormat::Opt::style(NumberFormat::Style::DECIMAL),
    ]);

and eventually, you'll want to parse it into some semantically readable structure with a single style, or iterate over the array each time to find which style it is.

What's the benefit of that over:

    NumberFormat::new("pt-BR", NumberFormat::Options {
        style: NumberFormat::Style::CURRENCY,
        currency: "EUR"
    });

For Options you can also then easily implement From and Into, add Hash, Default, constructors, validators, merging etc.

@zbraniecki
Copy link
Member

I'd also argue that we should do:

    let langid: LanguageIdentifier = "pt-BR".parse().unwrap();

    NumberFormat::new(langid, NumberFormat::Options {
        style: NumberFormat::Style::CURRENCY,
        currency: "EUR"
    });

or have a fallible constructor:

impl NumberFormat {
    pub fn try_new(loc: impl TryInto<LanguageIdentifier>) -> Result<Self, ()>;
}

to allow for parsing of the locale to fail, or the locale to not be available - it's also worth here talking about where the language negotiation should happen. I'm not a huge fan of the implict negotiation, and would prefer an explicit one:

let available_locales = NumberFormat::GetAvailableLocales();
let negotiated = negotiate(requested, available_locales);
let nf = NumberFormat::new(negotiated);

@filmil
Copy link

filmil commented Jan 24, 2020

That seems semantically weak. You need to handle all cases of: [...]

Fair point, and I don't insist on my proposed aprpoach.

Which reminds me: could you recommend other similarly configured APIs that we can take a look and learn from?

@filmil
Copy link

filmil commented Jan 24, 2020

I'd also argue that we should do:

    let langid: LanguageIdentifier = "pt-BR".parse().unwrap();

    NumberFormat::new(langid, NumberFormat::Options {
        style: NumberFormat::Style::CURRENCY,
        currency: "EUR"
    });

If you refer to decoupling langid parsing from formatter initialization, that seems reasonable to me too, as a way to separate the parsing and formatting concerns

or have a fallible constructor:

impl NumberFormat {
    pub fn try_new(loc: impl TryInto<LanguageIdentifier>) -> Result<Self, ()>;
}

IMHO this makes try_new unnecessarily fallible because of a langid issue and you give a reason why.

I'm not a huge fan of the implict negotiation, and would prefer an explicit one:

ECMA402 has an abstract operation SupportedLocales. It looks to me like it's not exposed so that users don't need to worry about the available_locales parameter, making the API surface more compact. I can see how in JS this makes sense. I also like using APIs with few methods to memorize. OTOH we may want less compact APIs on the rust side, so that the users can compose the bits that they do want to use. @sffc do you have data from past experience with the API that would be relevant?

@zbraniecki
Copy link
Member

zbraniecki commented Jan 25, 2020

IMHO this makes try_new unnecessarily fallible because of a langid issue and you give a reason why.

There are three reasons why constructor like this may fail:

  1. If you pass a locale as a string, it may be incorrect locale code.
  2. If you pass a locale that we don't have data for
  3. If your options cannot be resolved within the dataset we have, either due to errors in the options, or in the data

===

  1. The first I targeted with the proposal of separating the locale construction. In particular, with rust and its procedural macros, it should be painless, because while in demos and examples the FooFormater::new("en") is common, in reality you very rarely construct a formatted for a locale from the source code. You usually take the locale code from somewhere and parsing it is easier. In cases where you do need to pass a string, a proc macro makes it perfectly safe: FooFormatter::new(langid!("en")) and now you have a build time parsed locale code :)

  2. The second one is the subject of my negotiation comment. In ICU and ECMA402 the negotiation is performed implicitly - I may request data for locale af-ZH, but will get af or something else on the fallback chain, but basically always it'll return something (correct me if I'm wrong on ICU design!).
    I feel like it's a big magical in result and potentially performance suboptimal.

I'd like to aim for a simple raw low-level API design where negotiation is either explicit, or explicitly visible from the callsite.
I'm talking either FooFormatter::try_new(langid) which may return Err(MissingData) or FooFormatter::negotiate_for_locale(langid) which is infallible and always returns data for some locale, ending on last fallback locale.
That still assumes that some locales data is available. I'm not sure how to handle a scenario where no data is available at all without a fallible constructor.

  1. Last element is the most interesting one. Impossible options combination. In the proposal I used a struct for options, which could contain some constructor validation on its own. A constructor could return an Error if you try to options that are illegal together or ommit fieds that are required with some option.
    Alternatively, we could permit building such a struct, but could expose a method like validate which would evaluate and report if the combination is illegal.
    Yet alternatively, we could allow for any options combinations and only validate them when passed to the formatter's constructor. I like this variant the least since it clusters more operations into a single Result state, so having a way to evaluate options during their construction would be preferable.

Finally, there are errors that come from a failure of combining options with data. Let's say I construct a formatter for ru and lets say I seem to have the ru locale data, but when I start fetching particular information required because of my options for ru I end up with an error - say, somehow building the right date/time pattern failed. I'd like the constructor to return an error in such case.

All those cases make me feel that constructor should be fallible, and then we may consider making format infallible since all the data should be ready and we should have nice control over the input type.

@sffc
Copy link
Member Author

sffc commented Jun 29, 2020

This discussion was continued on unicode-org/icu4x#112 and we decided to move forward with the options struct and default trait.

@sffc sffc closed this as completed Jun 29, 2020
@sffc
Copy link
Member Author

sffc commented Jun 29, 2020

I don't know if the other discussions in this thread are still open. Can you open follow-up issues in the ICU4X repo if necessary?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants