Commit

update intro
Signed-off-by: Eric Liang <ekhliang@gmail.com>
ericl committed May 16, 2023
1 parent caa4406 commit 866d202
Showing 1 changed file with 3 additions and 5 deletions.
8 changes: 3 additions & 5 deletions doc/source/data/data.rst
@@ -8,16 +8,14 @@ Ray Data: Scalable Datasets for ML
 
 .. _data-intro:
 
-Ray Data scales common ML data processing patterns that arise in batch inference
-and distributed training applications. These problems occur when it becomes necessary to
-combine data preprocessing and model computations in the same job. Ray Data does this by providing
+Ray Data scales common ML data processing patterns in batch inference
+and distributed training applications. Ray Data does this by providing
 streaming distributed transformations
 such as maps (:meth:`map_batches <ray.data.Dataset.map_batches>`),
 global and grouped aggregations (:class:`GroupedData <ray.data.grouped_data.GroupedData>`), and
 shuffling operations (:meth:`random_shuffle <ray.data.Dataset.random_shuffle>`,
 :meth:`sort <ray.data.Dataset.sort>`,
-:meth:`repartition <ray.data.Dataset.repartition>`),
-and is compatible with a variety of file formats, data sources, and distributed frameworks.
+:meth:`repartition <ray.data.Dataset.repartition>`).
 
 Read on for an overview of the main use cases and operations supported by Ray Data.
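
For readers unfamiliar with the three operation families named in the revised intro (batch-wise maps, grouped aggregations, and shuffling operations), the following is a minimal single-process sketch of what each one computes. It uses only the Python standard library and is not Ray Data's API or implementation: the helper functions `iter_batches`, `map_batches`, `grouped_sum`, and `repartition`, and the row layout with `id`, `group`, and `x` fields, are hypothetical stand-ins chosen for illustration. Ray Data's own versions of these operations run distributed and streaming, which this sketch does not attempt to model.

```python
import random
from collections import defaultdict
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def map_batches(rows, fn, batch_size):
    """Batch-wise map: apply fn to each batch and flatten the results."""
    return [row for batch in iter_batches(rows, batch_size) for row in fn(batch)]


def grouped_sum(rows, key, value):
    """Grouped aggregation: sum the `value` field per distinct `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def repartition(rows, num_blocks):
    """Split rows into num_blocks contiguous blocks of near-equal size."""
    size, extra = divmod(len(rows), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        end = start + size + (1 if i < extra else 0)
        blocks.append(rows[start:end])
        start = end
    return blocks


# Hypothetical toy dataset: ten rows spread over three groups.
rows = [{"id": i, "group": i % 3, "x": i} for i in range(10)]

# Map: double the "x" field, four rows at a time.
doubled = map_batches(
    rows, lambda batch: [{**r, "x": r["x"] * 2} for r in batch], batch_size=4
)

# Grouped aggregation: total "x" per group.
sums = grouped_sum(doubled, key="group", value="x")

# Shuffling operations: random shuffle (seeded), sort by id, then repartition.
shuffled = random.Random(0).sample(doubled, k=len(doubled))
restored = sorted(shuffled, key=lambda r: r["id"])
blocks = repartition(restored, num_blocks=3)
```

In this toy version every step materializes a full list in memory. The point of the sketch is only to show the shape of each operation, so that the method names in the diff above are easier to place.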


0 comments on commit 866d202
