[train/docs] Restructure Ray Train docs with framework-specific guides (ray-project#37892)

This PR restructures the Ray Train docs to better mimic typical user journeys.

Primarily, we restructure the guides to be grouped by framework. Previously, we grouped them by task (e.g. training, data loading, checkpointing), with tabbed examples for some of the frameworks. Now, we group by framework on the first level and by task on the second level. The idea is that PyTorch users, for example, don't care how things are done for XGBoost; they just want to successfully train their PyTorch models.

This PR emphasizes PyTorch support, guided by user feedback showing that PyTorch and related libraries are the most commonly used.

Lastly, this PR declutters the Ray Train documentation by removing duplicates (e.g. previously there were 4 different "quick start" examples for PyTorch).

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
3 people authored and arvind-chandra committed Aug 31, 2023
1 parent ff723c1 commit f6015b7
Showing 54 changed files with 1,810 additions and 2,221 deletions.
13 changes: 13 additions & 0 deletions doc/source/_static/js/custom.js
@@ -37,20 +37,33 @@ document.addEventListener("DOMContentLoaded", function() {
let navItem = navItems[i];
const stringList = [
"User Guides", "Examples",
// Ray Core
"Ray Core", "Ray Core API",
"Ray Clusters", "Deploying on Kubernetes", "Deploying on VMs",
"Applications Guide", "Ray Cluster Management API",
// Ray AIR
"Ray AIR API",
// Ray Data
"Ray Data", "Ray Data API", "Integrations",
// Ray Train
"Ray Train", "Ray Train API",
"Distributed PyTorch", "Advanced Topics", "More Frameworks",
"Ray Train Internals",
// Ray Tune
"Ray Tune", "Ray Tune Examples", "Ray Tune API",
// Ray Serve
"Ray Serve", "Ray Serve API",
"Production Guide", "Advanced Guides",
"Deploy Many Models",
// Ray RLlib
"Ray RLlib", "Ray RLlib API",
// More libraries
"More Libraries", "Ray Workflows (Alpha)",
// Monitoring/debugging
"Monitoring and Debugging",
// References
"References", "Use Cases",
// Developer guides
"Developer Guides", "Getting Involved / Contributing",
];

46 changes: 26 additions & 20 deletions doc/source/_toc.yml
@@ -59,29 +59,35 @@ parts:
- file: train/train
title: Ray Train
sections:
- file: train/getting-started
title: "Getting Started"
- file: train/key-concepts
title: "Key Concepts"
- file: train/user-guides
title: "User Guides"
- file: train/distributed-pytorch
sections:
- file: train/distributed-pytorch/converting-existing-training-loop
- file: train/distributed-pytorch/data-loading-preprocessing
- file: train/distributed-pytorch/using-gpus
- file: train/distributed-pytorch/persistent-storage
title: Configuring Persistent Storage
- file: train/distributed-pytorch/monitoring-logging
- file: train/distributed-pytorch/checkpoints
- file: train/distributed-pytorch/experiment-tracking
- file: train/distributed-pytorch/fault-tolerance
- file: train/distributed-pytorch/advanced
sections:
- file: train/distributed-pytorch/reproducibility
- file: train/distributed-pytorch/automatic-mixed-precision
- file: train/distributed-pytorch/hyperparameter-optimization
title: Hyperparameter optimization
- file: train/more-frameworks
sections:
- file: train/distributed-tensorflow-keras
- file: train/distributed-xgboost-lightgbm
- file: train/horovod
- file: train/internals/index
sections:
- file: train/config_guide
title: "Configuring Ray Train"
- file: train/dl_guide
title: "Deep Learning Guide"
- file: train/hf_trainers
title: "Hugging Face Trainers"
- file: train/gbdt
title: "XGBoost/LightGBM guide"
- file: train/architecture
title: "Ray Train Architecture"
- file: train/train-with-tune
title: "Using Ray Train with Ray Tune"
- file: train/check-ingest
title: "Configuring Training Datasets"
- file: train/predictors
- file: train/benchmarks
- file: train/internals/architecture
- file: train/internals/benchmarks
- file: train/internals/environment-variables
- file: train/examples
title: "Examples"
sections:
2 changes: 1 addition & 1 deletion doc/source/data/batch_inference.rst
@@ -462,7 +462,7 @@ Models that have been trained with :ref:`Ray Train <train-docs>` can then be use

checkpoint = result.checkpoint

**Step 3:** Use Ray Data for batch inference. To load in the model from the :class:`Checkpoint <ray.air.checkpoint.Checkpoint>` inside the Python class, use one of the :ref:`framework-specific Checkpoint classes <train-framework-catalog>`.
**Step 3:** Use Ray Data for batch inference. To load in the model from the :class:`Checkpoint <ray.air.checkpoint.Checkpoint>` inside the Python class, use one of the framework-specific Checkpoint classes.

In this case, we use the :class:`XGBoostCheckpoint <ray.train.xgboost.XGBoostCheckpoint>` to load the model.

2 changes: 1 addition & 1 deletion doc/source/data/iterating-over-data.rst
@@ -273,7 +273,7 @@ into disjoint shards.

If you're using :ref:`Ray Train <train-docs>`, you don't need to split the dataset.
Ray Train automatically splits your dataset for you. To learn more, see
:ref:`Configuring training datasets <air-ingest>`.
:ref:`Configuring training datasets <data-ingest-torch>`.

.. testcode::

2 changes: 1 addition & 1 deletion doc/source/data/preprocessors.rst
@@ -15,7 +15,7 @@ Ray AIR provides several common preprocessors out of the box and interfaces to d
Overview
--------

The most common way of using a preprocessor is by passing it as an argument to the constructor of a Ray Train :ref:`Trainer <train-getting-started>` in conjunction with a :ref:`Ray Data dataset <data>`.
The most common way of using a preprocessor is by passing it as an argument to the constructor of a Ray Train :ref:`Trainer <train-docs>` in conjunction with a :ref:`Ray Data dataset <data>`.
For example, the following code trains a model with a preprocessor that normalizes the data.

.. literalinclude:: doc_code/preprocessors.py
2 changes: 1 addition & 1 deletion doc/source/data/working-with-pytorch.rst
@@ -82,7 +82,7 @@ Ray Data integrates with :ref:`Ray Train <train-docs>` for easy data ingest for

...

For more details, see the :ref:`Ray Train user guide <train-datasets>`.
For more details, see the :ref:`Ray Train user guide <data-ingest-torch>`.

.. _transform_pytorch:

4 changes: 0 additions & 4 deletions doc/source/ray-air/api/configs.rst
@@ -4,10 +4,6 @@ Ray AIR Configurations

.. TODO(ml-team): Add a general AIR configuration guide that covers all of these configs.
.. seealso::

See :ref:`this Ray Train configuration user guide <train-config>` for more details.

.. currentmodule:: ray

.. autosummary::
2 changes: 1 addition & 1 deletion doc/source/ray-air/api/dataset-ingest.rst
@@ -3,7 +3,7 @@ Ray Data Ingest into AIR Trainers

.. seealso::

See this :ref:`AIR Data ingest guide <air-ingest>` for usage examples.
See this :ref:`AIR Data ingest guide <data-ingest-torch>` for usage examples.

.. currentmodule:: ray

5 changes: 0 additions & 5 deletions doc/source/ray-air/api/predictor.rst
@@ -1,11 +1,6 @@
Predictor
=========

.. seealso::

See this :ref:`user guide on performing model inference <air-predictors>` in
AIR for usage examples.

.. currentmodule:: ray.train

Predictor Interface
8 changes: 2 additions & 6 deletions doc/source/ray-air/computer-vision.rst
@@ -183,7 +183,7 @@ Training vision models
:end-before: __torch_trainer_stop__
:dedent:

For more in-depth examples, see :ref:`Using Trainers <train-getting-started>`.
For more in-depth examples, see :ref:`the Ray Train documentation <train-docs>`.

.. tab-item:: TensorFlow

@@ -202,7 +202,7 @@ Training vision models
:end-before: __tensorflow_trainer_stop__
:dedent:

For more information, check out :ref:`the Ray Train documentation <train-getting-started>`.
For more information, check out :ref:`the Ray Train documentation <train-docs>`.

Creating checkpoints
--------------------
@@ -259,8 +259,6 @@ image datasets.
:end-before: __torch_batch_predictor_stop__
:dedent:

For more in-depth examples, read :ref:`Using Predictors for Inference <air-predictors>`.

.. tab-item:: TensorFlow

To create a :class:`~ray.train.batch_predictor.BatchPredictor`, call
@@ -272,8 +270,6 @@ image datasets.
:end-before: __tensorflow_batch_predictor_stop__
:dedent:

For more information, read :ref:`Using Predictors for Inference <air-predictors>`.

Serving vision models
---------------------

2 changes: 1 addition & 1 deletion doc/source/ray-air/examples/batch_forecasting.ipynb
@@ -1167,7 +1167,7 @@
"- We will restore a Prophet or ARIMA model directly from checkpoint, and demonstrate it can be used for prediction.\n",
"\n",
"```{tip}\n",
"[Ray AIR Predictors](air-predictors) make batch inference easy since they have internal logic to parallelize the inference.\n",
"Ray AIR Predictors make batch inference easy since they have internal logic to parallelize the inference.\n",
"```\n"
]
},
2 changes: 1 addition & 1 deletion doc/source/ray-air/examples/batch_tuning.ipynb
@@ -984,7 +984,7 @@
"metadata": {},
"source": [
"```{tip}\n",
"[Ray AIR Predictors](air-predictors) make batch inference easy since they have internal logic to parallelize the inference.\n",
"Ray AIR Predictors make batch inference easy since they have internal logic to parallelize the inference.\n",
"```\n",
"\n",
"Finally, we will restore the best and worst models from checkpoint and make predictions. \n",
@@ -74,7 +74,7 @@
"source": [
"First, we load and preprocess the MNIST dataset.\n",
"\n",
"Assumption for this tutorial: your existing code is using the `tf.data.Dataset` native to Tensorflow. This tutorial continues to use `tf.data.Dataset` to allow you to make as few code changes as possible. **Everything in this tutorial is also possible if you choose to use Ray Data, and you will also get the benefits of efficient preprocessing and multi-worker batch prediction.** See [here](train-datasets) for resources to get started with Ray Data."
"Assumption for this tutorial: your existing code is using the `tf.data.Dataset` native to Tensorflow. This tutorial continues to use `tf.data.Dataset` to allow you to make as few code changes as possible. **Everything in this tutorial is also possible if you choose to use Ray Data, and you will also get the benefits of efficient preprocessing and multi-worker batch prediction.** See [here](data-ingest-torch) for resources to get started with Ray Data."
]
},
{
@@ -519,9 +519,7 @@
"\n",
"A few notes on the configs set below:\n",
"- `train_loop_config` sets the hyperparameters passed into the training loop as the `config` parameter\n",
"- `scaling_config` configures **how many parallel workers to use**, the **resources required per worker**, and whether we want to **enable GPU training** or not.\n",
"\n",
"See this [configuration guide](train-config) for more details on how to configure the trainer."
"- `scaling_config` configures **how many parallel workers to use**, the **resources required per worker**, and whether we want to **enable GPU training** or not."
]
},
{
@@ -617,8 +615,6 @@
"\n",
"In our [other examples](ref-ray-examples) you can learn how to do more things with Ray, such as **serving your model with Ray Serve** or **tune your hyperparameters with Ray Tune**. You can also learn how to perform {ref}`offline batch inference <batch_inference_home>` with Ray Data.\n",
"\n",
"See [this table](train-framework-catalog) for a full catalog of frameworks that AIR supports out of the box.\n",
"\n",
"We hope this tutorial gave you a good starting point to leverage Ray AIR. If you have any questions, suggestions, or run into any problems pelase reach out on [Discuss](https://discuss.ray.io/), [GitHub](https://github.com/ray-project/ray) or the [Ray Slack](https://forms.gle/9TSdDYUgxYs8SA9e8)!"
]
}
2 changes: 1 addition & 1 deletion doc/source/ray-air/examples/gptj_batch_prediction.ipynb
@@ -224,7 +224,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You may notice that we are not using an AIR {class}`Predictor <ray.train.predictor.Predictor>` here. This is because Predictors are mainly intended to be used with AIR {class}`Checkpoints <ray.train.Checkpoint>`, which we don't for this example. See {ref}`air-predictors` for more information and usage examples."
"You may notice that we are not using an AIR {class}`Predictor <ray.train.predictor.Predictor>` here. This is because Predictors are mainly intended to be used with AIR {class}`Checkpoints <ray.train.Checkpoint>`, which we don't for this example. See {class}`ray.train.predictor.Predictor` for more information and usage examples."
]
}
],
@@ -224,7 +224,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You may notice that we are not using an AIR {class}`Predictor <ray.train.predictor.Predictor>` here. This is because AIR does not implement an out of the box Predictor for Diffusers. We could implement it ourselves, but Predictors are mainly intended to be used with AIR {class}`Checkpoints <ray.air.checkpoint.Checkpoint>`, and those are not necessary for this example. See {ref}`air-predictors` for more information and usage examples."
"You may notice that we are not using an AIR {class}`Predictor <ray.train.predictor.Predictor>` here. This is because AIR does not implement an out of the box Predictor for Diffusers. We could implement it ourselves, but Predictors are mainly intended to be used with AIR {class}`Checkpoints <ray.air.checkpoint.Checkpoint>`, and those are not necessary for this example. See {class}`ray.train.predictor.Predictor` for more information and usage examples."
]
}
],
2 changes: 1 addition & 1 deletion doc/source/ray-overview/use-cases.rst
@@ -130,7 +130,7 @@ Learn more about the Tune library with the following talks and user guides.
Distributed Training
--------------------

The :ref:`Ray Train <train-userguides>` library integrates many distributed training frameworks under a simple Trainer API,
The :ref:`Ray Train <train-docs>` library integrates many distributed training frameworks under a simple Trainer API,
providing distributed orchestration and management capabilities out of the box.

In contrast to training many models, model parallelism partitions a large model across many machines for training. Ray Train has built-in abstractions for distributing shards of models and running training in parallel.
6 changes: 3 additions & 3 deletions doc/source/ray-references/glossary.rst
@@ -99,7 +99,7 @@ documentation, sorted alphabetically.
to compute and apply one gradient update to the model weights.

Batch predictor
A :ref:`Ray AIR Batch Predictor<air-predictors>` builds on the Predictor class
A :class:`Ray AIR Batch Predictor<ray.train.predictor.Predictor>` builds on the Predictor class
to parallelize inference on a large dataset. A Batch predictor shards the
dataset to allow multiple workers to do inference on a smaller number of data
points and then aggregating all the worker predictions at the end.
@@ -413,7 +413,7 @@ documentation, sorted alphabetically.
.. TODO: Policy evaluation
Predictor
:ref:`An interface for performing inference<air-predictors>` (prediction)
:class:`An interface for performing inference<ray.train.predictor.Predictor>` (prediction)
on input data with a trained model.

Preprocessor
@@ -603,7 +603,7 @@ documentation, sorted alphabetically.
(e.g., for sharing computed gradients).

Trainer configuration
:ref:`A Trainer can be configured in various ways<train-config>`. Some
A Trainer can be configured in various ways. Some
configurations are shared across all trainers, like the RunConfig, which
configures things like the experiment storage, and ScalingConfig, which
configures the number of training workers as well as resources needed per