Updating data provider docs #3740

Merged: 10 commits, Aug 8, 2023

2 changes: 2 additions & 0 deletions Cargo.lock

3 changes: 3 additions & 0 deletions components/icu/Cargo.toml
@@ -53,6 +53,9 @@ icu_relativetime = { version = "0.1.0", path = "../../experimental/relativetime"
 icu_provider = { version = "1.2.0", path = "../../provider/core", default-features = false }
 
 [dev-dependencies]
+icu_datetime = { version = "1.2.0", path = "../datetime", features = ["serde"] }
+icu_provider_adapters = { path = "../../provider/adapters", features = ["serde"] }
+icu_provider_blob = { path = "../../provider/blob" }
 writeable = { path = "../../utils/writeable" }
 
 [features]
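
The `icu_provider_adapters` dev-dependency added here backs the `LocaleFallbackProvider` in the new explicit-data doc example in `lib.rs`. That example's `expect` message names the locales it may end up loading: 'es-US', 'es', or 'und'. The std-only Rust sketch below illustrates the idea under one assumption: fallback works by plainly truncating the last subtag. `fallback_chain` and `load_with_fallback` are hypothetical helpers, not ICU4X API. ICU4X's real fallback also consults CLDR parent-locale and likely-subtags data, so its chain can differ for some locales.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not ICU4X API): builds a simplified fallback chain by
/// repeatedly dropping the last `-`-separated subtag, always ending at "und".
pub fn fallback_chain(locale: &str) -> Vec<String> {
    let mut chain = Vec::new();
    let mut current = locale;
    while !current.is_empty() && current != "und" {
        chain.push(current.to_string());
        // Drop the last subtag; a locale without `-` truncates to the empty string.
        current = match current.rfind('-') {
            Some(idx) => &current[..idx],
            None => "",
        };
    }
    // The root locale is always the last resort.
    chain.push("und".to_string());
    chain
}

/// Hypothetical lookup: returns the first locale in the chain that has an entry,
/// paired with its value, or `None` if not even "und" is present.
pub fn load_with_fallback(
    data: &HashMap<&'static str, &'static str>,
    locale: &str,
) -> Option<(String, &'static str)> {
    fallback_chain(locale)
        .into_iter()
        .find_map(|loc| data.get(loc.as_str()).map(|value| (loc, *value)))
}
```

Suppose only 'es' and 'und' are loaded. A request for 'es-US' then resolves to 'es', while 'fr-CA' falls through to the root entry.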
121 changes: 70 additions & 51 deletions components/icu/README.md

121 changes: 70 additions & 51 deletions components/icu/src/lib.rs
@@ -18,83 +18,100 @@
 //!
 //! # Data Management
 //!
-//! Most functionality relies on data which clients have to provide to the APIs.
-//!
-//! `ICU4X` uses the concept of a [`DataProvider`] to separate data from logic.
-//! Data providers come in many different forms; the following providers are provided
-//! by `ICU4X` in separate crates:
-//! * [`BlobDataProvider`]: uses an in-memory serde-serialized blob. This is the most flexible provider, and
-//!   data can be updated at runtime.
-//! * `BakedDataProvider`: a code-generated provider that contains the data directly in Rust code. This is
-//!   the most efficient provider as it's serialization-free, and allows for compile-time optimizations.
-//! * [`FsDataProvider`]: uses a file system tree of Serde files. This is mostly useful for development and
-//!   not recommended in production for performance reasons.
-//! * [`icu_provider_adapters`]: this crate contains APIs to combine providers or
-//!   provide additional functionality such as locale fallback.
+//! Most internationalization algorithms are data-driven based on surveys of locale experts.
+//! ICU4X offers multiple ways to manage locale data: many clients can start by using the
+//! extensive data compiled into the library, while users with additional requirements can
+//! provide data explicitly using [`DataProvider`]s.
 //!
-//! The data that is required by these providers (in `BakedDataProvider`'s case, the provider itself) can be
-//! generated and customized using the [`icu_datagen`] crate.
+//! ## Compiled data
 //!
-//! # Example
+//! Compiled data is exposed through idiomatic Rust constructors like `new` or `try_new`:
 //!
 //! ```
-//! use icu::calendar::DateTime;
-//! use icu::datetime::{options::length, DateTimeFormatter};
+//! use icu::datetime::DateTimeFormatter;
 //! use icu::locid::locale;
-//! use writeable::assert_writeable_eq;
 //!
-//! let options = length::Bag::from_date_time_style(
-//!     length::Date::Long,
-//!     length::Time::Medium,
-//! )
-//! .into();
-//!
 //! let dtf = DateTimeFormatter::try_new(
-//!     &locale!("es").into(),
-//!     options,
+//!     &locale!("es-US").into(),
+//!     Default::default(),
 //! )
-//! .expect("Failed to create DateTimeFormatter instance.");
+//! .expect("compiled data should include 'es-US'");
+//! ```
+//!
+//! Clients using compiled data benefit from simple code and optimal zero-cost data loading. Additionally,
+//! ICU4X's APIs are designed such that dead-code elimination can optimize away unused compiled data.

Member: nit: perhaps mention that this is per data type not per locale

Member Author: I don't really want to get into defining what a data key is here.

+//!
+//! By default, most of the data available in [CLDR] is included. Users can customize data by using
+//! the `icu4x-datagen` tool (with the `-⁠-format mod` flag) to, for example, select a smaller set of
+//! locales, and then compiling with the `ICU4X_DATA_DIR` variable.
+//!
+//! ## Explicit data
+//!
+//! Powerful data management is possible with [`DataProvider`]s, which are passed to ICU4X APIs via
+//! special constructors:
+//!
+//! ```no_run
+//! use icu::datetime::DateTimeFormatter;
+//! use icu::locid::locale;
+//! use icu_provider_adapters::fallback::LocaleFallbackProvider;
+//! use icu_provider_blob::BlobDataProvider;
 //!
-//! let date = DateTime::try_new_iso_datetime(2020, 9, 12, 12, 35, 0)
-//!     .expect("Failed to parse date.");
-//! let date = date.to_any();
+//! let data: Box<[u8]> = todo!();
 //!
-//! let formatted_date = dtf.format(&date).expect("Formatting failed");
-//! assert_writeable_eq!(formatted_date, "12 de septiembre de 2020, 12:35:00");
+//! let provider = BlobDataProvider::try_new_from_blob(data).expect("data should be valid");
 //!
-//! let formatted_date_string =
-//!     dtf.format_to_string(&date).expect("Formatting failed");
-//! assert_eq!(formatted_date_string, "12 de septiembre de 2020, 12:35:00");
+//! let provider = LocaleFallbackProvider::try_new_with_buffer_provider(provider).expect("provider should include fallback data");
+//!
+//! let dtf = DateTimeFormatter::try_new_with_buffer_provider(
+//!     &provider,
+//!     &locale!("es-US").into(),
+//!     Default::default(),
+//! )
+//! .expect("data should include 'es-US', 'es', or 'und'");
 //! ```
 //!
+//! Explicit data management can be used if the compiled-data constructors are too limiting. It allows:
+//! * Accessing data without fallback
+//! * Custom [`DataProvider`]s backed by sources like the operating system
+//! * Lazily loading or updating data from I/O
+//! * Composing data providers from different sources
+//! * Manually including/excluding data
+//! * ... and more. See our [data management tutorial]
+//!
+//! The following [`DataProvider`]s are available in separate crates:
+//! * [`BlobDataProvider`]: deserializes data from an in-memory blob, which can be updated at runtime.
+//! * `BakedDataProvider`: a code-generated provider that contains the data directly in Rust code. This is the
+//!   same provider that is used internally by compiled data.
+//! * [`FsDataProvider`]: uses a file system tree of Serde files. This is mostly useful for development and
+//!   not recommended in production for performance reasons.
+//! * [`icu_provider_adapters`]: this crate contains provider adapters to combine providers,
+//!   provide additional functionality such as locale fallback, and more.
+//!
+//! The data that is required by these providers (in `BakedDataProvider`'s case, the provider itself) can be
+//! generated and customized using the [`icu4x-datagen`] tool.
+//!
 //! # Features
 //!
 //! ICU4X components share a set of common Cargo features that control whether core pieces of
 //! functionality are compiled. These features are:
 //!
-//! - `std`: Whether to include `std` support. Without this Cargo feature, `icu` is `#[no_std]`-compatible
-//! - `serde`: Whether to include `serde::Deserialize` implementations for data structs, such as [`SymbolsV1`],
-//!   and `serde::{Serialize, Deserialize}` implementations for core library types, such as [`Locale`]. These are
-//!   required with `serde`-backed providers like [`BlobDataProvider`][^1].
+//! - `compiled_data` (default): Whether to include compiled data. Without this flag, only constructors with
+//!   explicit `provider` arguments are available.
+//! - `std`: Whether to include `std` support. Without this Cargo feature, `icu` is `#[no_std]`-compatible.
+//! - `sync`: makes most ICU4X objects implement `Send + Sync`. Has a small performance impact when used with non-static data.
+//! - `logging`: Enables logging through the `log` crate.
+//! - `serde`: Activates `serde` implementations for core library types, such as [`Locale`], as well
+//!   as `*_with_buffer_provider` constructors for explicit data management.
 //! - `experimental`: Whether to enable experimental preview features. Modules enabled with
 //!   this feature may not be production-ready and could change at any time.
 //!
 //! The following Cargo features are only available on the individual crates, but not on this meta-crate:
 //!
-//! - `datagen`: Whether to implement `serde::Serialize` and functionality that is only required during data generation.
+//! - `datagen`: Whether to implement functionality that is only required during data generation.
 //! - `bench`: Whether to enable exhaustive benchmarks. This can be enabled on individual crates
 //!   when running `cargo bench`.
 //!
-//! There are additional features that, when enabled on specific crates, enable functionality across ICU4X:
-//!
-//! - `icu_provider/sync`: makes [`DataPayload`] implement `Send + Sync`, which in turn
-//!   makes most ICU4X objects also implement `Send + Sync`.
-//! - `icu_provider/deserialize_*`: enables ICU4X buffer providers to read various different
-//!   serialization formats. See [`BufferProvider`](icu_provider::BufferProvider) for details.
-//!
-//! [^1]: If using blob data, you need to enable one of the deserialization Cargo features on the `icu_provider` crate, as noted above.
-//!
-//!
+//! [CLDR]: http://cldr.unicode.org/
 //! [`DataProvider`]: icu_provider::DataProvider
 //! [`DataPayload`]: icu_provider::DataPayload
 //! [`FsDataProvider`]: https://docs.rs/icu_provider_fs/latest/icu_provider_fs/struct.FsDataProvider.html
@@ -103,6 +120,8 @@
 //! [`icu_datagen`]: https://docs.rs/icu_datagen/latest/icu_datagen/
 //! [`Locale`]: crate::locid::Locale
 //! [`SymbolsV1`]: crate::decimal::provider::DecimalSymbolsV1
+//! [`icu4x-datagen`]: https://docs.rs/icu_datagen/latest/icu_datagen/
+//! [data management tutorial]: https://github.com/unicode-org/icu4x/blob/main/docs/tutorials/data_provider.md#loading-additional-data-at-runtime
 
 // https://github.com/unicode-org/icu4x/blob/main/docs/process/boilerplate.md#library-annotations
 #![cfg_attr(not(any(test, feature = "std")), no_std)]
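
The rewritten `# Data Management` docs distinguish two paths. Compiled data is reached through plain `new`/`try_new` constructors. Explicit data is a provider passed to a `*_with_buffer_provider` constructor. The std-only Rust sketch below illustrates that split using hypothetical types (`Symbols`, `SymbolsSource`, `BakedSource`, `RuntimeSource`, `Formatter`). It also assumes a made-up `locale=separator` text blob in place of a real serialized blob. None of these names are ICU4X API, and unlike ICU4X's compiled data, the sketch performs no locale fallback.

```rust
use std::collections::HashMap;

/// Hypothetical data struct: the decimal separator for one locale.
#[derive(Debug, Clone, PartialEq)]
pub struct Symbols {
    pub decimal_separator: String,
}

/// Hypothetical provider trait that separates data loading from formatting logic.
pub trait SymbolsSource {
    fn load(&self, locale: &str) -> Option<Symbols>;
}

/// Stand-in for compiled ("baked") data: a static table inside the binary.
pub struct BakedSource;

static BAKED: &[(&str, &str)] = &[("en", "."), ("es", ","), ("es-US", ".")];

impl SymbolsSource for BakedSource {
    fn load(&self, locale: &str) -> Option<Symbols> {
        BAKED
            .iter()
            .find(|(loc, _)| *loc == locale)
            .map(|(_, sep)| Symbols { decimal_separator: sep.to_string() })
    }
}

/// Stand-in for explicit data: a table parsed at runtime from a byte blob.
pub struct RuntimeSource {
    table: HashMap<String, String>,
}

impl RuntimeSource {
    /// Parses hypothetical `locale=separator` lines; returns `None` on malformed input.
    pub fn try_new_from_blob(blob: &[u8]) -> Option<Self> {
        let text = std::str::from_utf8(blob).ok()?;
        let mut table = HashMap::new();
        for line in text.lines().filter(|l| !l.is_empty()) {
            let (loc, sep) = line.split_once('=')?;
            table.insert(loc.to_string(), sep.to_string());
        }
        Some(Self { table })
    }
}

impl SymbolsSource for RuntimeSource {
    fn load(&self, locale: &str) -> Option<Symbols> {
        self.table
            .get(locale)
            .map(|sep| Symbols { decimal_separator: sep.clone() })
    }
}

/// Hypothetical formatter whose logic does not depend on where its data came from.
pub struct Formatter {
    symbols: Symbols,
}

impl Formatter {
    /// "Compiled data" style constructor: no provider argument.
    pub fn try_new(locale: &str) -> Option<Self> {
        Self::try_new_with_source(&BakedSource, locale)
    }

    /// "Explicit data" style constructor: the caller chooses the source.
    pub fn try_new_with_source(source: &impl SymbolsSource, locale: &str) -> Option<Self> {
        source.load(locale).map(|symbols| Self { symbols })
    }

    /// Joins a whole part and a fraction (assumed < 100, zero-padded to two digits).
    pub fn format(&self, whole: u64, hundredths: u8) -> String {
        format!("{}{}{:02}", whole, self.symbols.decimal_separator, hundredths)
    }
}
```

The sketch mirrors one design point: `Formatter::format` never knows whether its `Symbols` came from the static table or from the parsed blob. Only the constructor differs.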
5 changes: 3 additions & 2 deletions provider/core/src/buf.rs
@@ -21,9 +21,10 @@ impl DataMarker for BufferMarker {
 /// A data provider that returns opaque bytes.
 ///
 /// Generally, these bytes are expected to be deserializable with Serde. To get an object
-/// implementing [`DataProvider`] via Serde, use [`as_deserializing()`], which requires
-/// enabling at least one of the deserialization Cargo features:
+/// implementing [`DataProvider`] via Serde, use [`as_deserializing()`].
 ///
+/// Passing a `BufferProvider` to a `*_with_buffer_provider` constructor requires enabling
+/// the deserialization Cargo feature for the expected format(s):
 /// - `deserialize_json`
 /// - `deserialize_postcard_1`
 /// - `deserialize_bincode_1`
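
The updated `BufferProvider` docs say two things: the returned bytes are opaque, and a `*_with_buffer_provider` constructor only works if the deserialization feature for the blob's format is enabled. The std-only Rust sketch below models that contract. It assumes a hypothetical `Format` tag and an `EnabledFormats` value standing in for the compiled-in `deserialize_*` features. In ICU4X this choice is made at compile time through the Cargo features listed in the doc comment, not through a runtime struct, and these names are illustrative only.

```rust
/// Hypothetical tag for the serialization format of an opaque byte buffer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Json,
    Postcard1,
    Bincode1,
}

/// Hypothetical stand-in for which `deserialize_*` features a binary was built with.
pub struct EnabledFormats {
    pub json: bool,
    pub postcard_1: bool,
    pub bincode_1: bool,
}

impl EnabledFormats {
    fn supports(&self, format: Format) -> bool {
        match format {
            Format::Json => self.json,
            Format::Postcard1 => self.postcard_1,
            Format::Bincode1 => self.bincode_1,
        }
    }
}

/// Opaque bytes plus the format they are tagged with, as a buffer provider might return them.
pub struct Buffer<'a> {
    pub format: Format,
    pub bytes: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub enum LoadError {
    /// The bytes are in a format whose deserializer was not enabled.
    UnavailableFormat(Format),
}

/// Hypothetical check a deserializing layer performs before parsing anything.
pub fn check_deserializable(buffer: &Buffer<'_>, enabled: &EnabledFormats) -> Result<(), LoadError> {
    if enabled.supports(buffer.format) {
        Ok(())
    } else {
        Err(LoadError::UnavailableFormat(buffer.format))
    }
}
```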