Updating data provider docs (#3740)
robertbastian authored Aug 8, 2023
1 parent 7dd1460 commit 6749f02
Showing 9 changed files with 217 additions and 140 deletions.
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions components/icu/Cargo.toml
@@ -53,6 +53,9 @@ icu_relativetime = { version = "0.1.0", path = "../../experimental/relativetime"
icu_provider = { version = "1.2.0", path = "../../provider/core", default-features = false }

[dev-dependencies]
icu_datetime = { version = "1.2.0", path = "../datetime", features = ["serde"] }
icu_provider_adapters = { path = "../../provider/adapters", features = ["serde"] }
icu_provider_blob = { path = "../../provider/blob" }
writeable = { path = "../../utils/writeable" }

[features]
121 changes: 70 additions & 51 deletions components/icu/README.md

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

121 changes: 70 additions & 51 deletions components/icu/src/lib.rs
@@ -18,83 +18,100 @@
//!
//! # Data Management
//!
//! Most functionality relies on data which clients have to provide to the APIs.
//!
//! `ICU4X` uses the concept of a [`DataProvider`] to separate data from logic.
//! Data providers come in many different forms; the following providers are provided
//! by `ICU4X` in separate crates:
//! * [`BlobDataProvider`]: uses an in-memory serde-serialized blob. This is the most flexible provider, and
//! data can be updated at runtime.
//! * `BakedDataProvider`: a code-generated provider that contains the data directly in Rust code. This is
//! the most efficient provider as it's serialization-free, and allows for compile-time optimizations.
//! * [`FsDataProvider`]: uses a file system tree of Serde files. This is mostly useful for development and
//! not recommended in production for performance reasons.
//! * [`icu_provider_adapters`]: this crate contains APIs to combine providers or
//! provide additional functionality such as locale fallback.
//! Most internationalization algorithms are data-driven based on surveys of locale experts.
//! ICU4X offers multiple ways to manage locale data: many clients can start by using the
//! extensive data compiled into the library, while users with additional requirements can
//! provide data explicitly using [`DataProvider`]s.
//!
//! The data that is required by these providers (in `BakedDataProvider`'s case, the provider itself) can be
//! generated and customized using the [`icu_datagen`] crate.
//! ## Compiled data
//!
//! # Example
//! Compiled data is exposed through idiomatic Rust constructors like `new` or `try_new`:
//!
//! ```
//! use icu::calendar::DateTime;
//! use icu::datetime::{options::length, DateTimeFormatter};
//! use icu::datetime::DateTimeFormatter;
//! use icu::locid::locale;
//! use writeable::assert_writeable_eq;
//!
//! let options = length::Bag::from_date_time_style(
//! length::Date::Long,
//! length::Time::Medium,
//! )
//! .into();
//!
//! let dtf = DateTimeFormatter::try_new(
//! &locale!("es").into(),
//! options,
//! &locale!("es-US").into(),
//! Default::default(),
//! )
//! .expect("Failed to create DateTimeFormatter instance.");
//! .expect("compiled data should include 'es-US'");
//! ```
//!
//! Clients using compiled data benefit from simple code and optimal zero-cost data loading. Additionally,
//! ICU4X's APIs are designed such that dead-code elimination can optimize away unused compiled data.
//!
//! By default, most of the data available in [CLDR] is included. Users can customize data by using
//! the `icu4x-datagen` tool (with the `-⁠-format mod` flag) to, for example, select a smaller set of
//! locales, and then compiling with the `ICU4X_DATA_DIR` variable.
//!
//! ## Explicit data
//!
//! Powerful data management is possible with [`DataProvider`]s, which are passed to ICU4X APIs via
//! special constructors:
//!
//! ```no_run
//! use icu::datetime::DateTimeFormatter;
//! use icu::locid::locale;
//! use icu_provider_adapters::fallback::LocaleFallbackProvider;
//! use icu_provider_blob::BlobDataProvider;
//!
//! let date = DateTime::try_new_iso_datetime(2020, 9, 12, 12, 35, 0)
//! .expect("Failed to parse date.");
//! let date = date.to_any();
//! let data: Box<[u8]> = todo!();
//!
//! let formatted_date = dtf.format(&date).expect("Formatting failed");
//! assert_writeable_eq!(formatted_date, "12 de septiembre de 2020, 12:35:00");
//! let provider = BlobDataProvider::try_new_from_blob(data).expect("data should be valid");
//!
//! let formatted_date_string =
//! dtf.format_to_string(&date).expect("Formatting failed");
//! assert_eq!(formatted_date_string, "12 de septiembre de 2020, 12:35:00");
//! let provider = LocaleFallbackProvider::try_new_with_buffer_provider(provider).expect("provider should include fallback data");
//!
//! let dtf = DateTimeFormatter::try_new_with_buffer_provider(
//! &provider,
//! &locale!("es-US").into(),
//! Default::default(),
//! )
//! .expect("data should include 'es-US', 'es', or 'und'");
//! ```
//!
//! Explicit data management can be used if the compiled-data constructors are too limiting. It allows:
//! * Accessing data without fallback
//! * Custom [`DataProvider`]s backed by sources like the operating system
//! * Lazily loading or updating data from I/O
//! * Composing data providers from different sources
//! * Manually including/excluding data
//! * ... and more. See our [data management tutorial]
//!
//! The following [`DataProvider`]s are available in separate crates:
//! * [`BlobDataProvider`]: deserializes data from an in-memory blob, which can be updated at runtime.
//! * `BakedDataProvider`: a code-generated provider that contains the data directly in Rust code. This is the
//! same provider that is used internally by compiled data.
//! * [`FsDataProvider`]: uses a file system tree of Serde files. This is mostly useful for development and
//! not recommended in production for performance reasons.
//! * [`icu_provider_adapters`]: this crate contains provider adapters to combine providers,
//! provide additional functionality such as locale fallback, and more.
//!
//! The data that is required by these providers (in `BakedDataProvider`'s case, the provider itself) can be
//! generated and customized using the [`icu4x-datagen`] tool.
//!
//! # Features
//!
//! ICU4X components share a set of common Cargo features that control whether core pieces of
//! functionality are compiled. These features are:
//!
//! - `std`: Whether to include `std` support. Without this Cargo feature, `icu` is `#[no_std]`-compatible
//! - `serde`: Whether to include `serde::Deserialize` implementations for data structs, such as [`SymbolsV1`],
//! and `serde::{Serialize, Deserialize}` implementations for core library types, such as [`Locale`]. These are
//! required with `serde`-backed providers like [`BlobDataProvider`][^1].
//! - `compiled_data` (default): Whether to include compiled data. Without this flag, only constructors with
//! explicit `provider` arguments are available.
//! - `std`: Whether to include `std` support. Without this Cargo feature, `icu` is `#[no_std]`-compatible.
//! - `sync`: makes most ICU4X objects implement `Send + Sync`. Has a small performance impact when used with non-static data.
//! - `logging`: Enables logging through the `log` crate.
//! - `serde`: Activates `serde` implementations for core library types, such as [`Locale`], as well
//! as `*_with_buffer_provider` constructors for explicit data management.
//! - `experimental`: Whether to enable experimental preview features. Modules enabled with
//! this feature may not be production-ready and could change at any time.
//!
//! The following Cargo features are only available on the individual crates, but not on this meta-crate:
//!
//! - `datagen`: Whether to implement `serde::Serialize` and functionality that is only required during data generation.
//! - `datagen`: Whether to implement functionality that is only required during data generation.
//! - `bench`: Whether to enable exhaustive benchmarks. This can be enabled on individual crates
//! when running `cargo bench`.
//!
//! There are additional features that, when enabled on specific crates, enable functionality across ICU4X:
//!
//! - `icu_provider/sync`: makes [`DataPayload`] implement `Send + Sync`, which in turn
//! makes most ICU4X objects also implement `Send + Sync`.
//! - `icu_provider/deserialize_*`: enables ICU4X buffer providers to read various different
//! serialization formats. See [`BufferProvider`](icu_provider::BufferProvider) for details.
//!
//! [^1]: If using blob data, you need to enable one of the deserialization Cargo features on the `icu_provider` crate, as noted above.
//!
//!
//! [CLDR]: http://cldr.unicode.org/
//! [`DataProvider`]: icu_provider::DataProvider
//! [`DataPayload`]: icu_provider::DataPayload
//! [`FsDataProvider`]: https://docs.rs/icu_provider_fs/latest/icu_provider_fs/struct.FsDataProvider.html
@@ -103,6 +120,8 @@
//! [`icu_datagen`]: https://docs.rs/icu_datagen/latest/icu_datagen/
//! [`Locale`]: crate::locid::Locale
//! [`SymbolsV1`]: crate::decimal::provider::DecimalSymbolsV1
//! [`icu4x-datagen`]: https://docs.rs/icu_datagen/latest/icu_datagen/
//! [data management tutorial]: https://github.com/unicode-org/icu4x/blob/main/docs/tutorials/data_provider.md#loading-additional-data-at-runtime
// https://github.com/unicode-org/icu4x/blob/main/docs/process/boilerplate.md#library-annotations
#![cfg_attr(not(any(test, feature = "std")), no_std)]
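Reviewer sketch: the new "Explicit data" example expects the provider to supply data for 'es-US', 'es', or 'und'. The std-only snippet below illustrates that lookup order under a deliberately simplified assumption: fallback only strips the last subtag until it reaches `und`. It is not the `LocaleFallbackProvider` implementation, whose behavior is driven by fallback data shipped with the provider, and the names `fallback_chain` and `load_with_fallback` are hypothetical.

```rust
use std::collections::HashMap;

/// Builds the list of locale identifiers to try, most specific first.
/// Simplifying assumption: each step drops the last `-`-separated subtag,
/// and the chain always ends with the root locale "und".
fn fallback_chain(locale: &str) -> Vec<String> {
    let mut chain = Vec::new();
    let mut current = locale;
    while !current.is_empty() && current != "und" {
        chain.push(current.to_string());
        current = match current.rfind('-') {
            Some(idx) => &current[..idx],
            None => "",
        };
    }
    chain.push("und".to_string());
    chain
}

/// Returns the first candidate in the chain that has data, together with
/// the locale that actually supplied it. `data` stands in for a provider.
fn load_with_fallback(
    data: &HashMap<&'static str, &'static str>,
    locale: &str,
) -> Option<(String, &'static str)> {
    for candidate in fallback_chain(locale) {
        if let Some(payload) = data.get(candidate.as_str()) {
            return Some((candidate, *payload));
        }
    }
    None
}
```

With a map that holds entries only for `es` and `und`, a request for `es-US` resolves to the `es` entry. If the map holds neither the requested locale nor any of its ancestors, the lookup returns `None`, which is the case the `expect` message in the doc example describes.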
5 changes: 3 additions & 2 deletions provider/core/src/buf.rs
@@ -21,9 +21,10 @@ impl DataMarker for BufferMarker {
/// A data provider that returns opaque bytes.
///
/// Generally, these bytes are expected to be deserializable with Serde. To get an object
/// implementing [`DataProvider`] via Serde, use [`as_deserializing()`], which requires
/// enabling at least one of the deserialization Cargo features:
/// implementing [`DataProvider`] via Serde, use [`as_deserializing()`].
///
/// Passing a `BufferProvider` to a `*_with_buffer_provider` constructor requires enabling
/// the deserialization Cargo feature for the expected format(s):
/// - `deserialize_json`
/// - `deserialize_postcard_1`
/// - `deserialize_bincode_1`