[RLlib; docs] Docs do-over (new API stack): Env pages vol 02. #48542

Merged (51 commits)
206 changes: 206 additions & 0 deletions doc/source/rllib/external-envs.rst
@@ -0,0 +1,206 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-external-env-setups-doc:


External Environments and Applications
======================================

In many situations, it doesn't make sense for an RL environment to be "stepped" by RLlib,
for example when you train a policy inside a complex simulator that runs its own execution loop,
like a game engine or a robotics simulation. A natural and user-friendly approach is to flip this setup around:
instead of RLlib "stepping" the env, the agents in the simulation fully control
their own stepping. An external RLlib-powered service is available for either querying
individual actions or for accepting batched sample data. The service covers the task
of training the policies, but doesn't pose any restrictions on when and how often
the simulation should step.

.. figure:: images/envs/external_env_setup_client_inference.svg
:width: 600

**External application with client-side inference**: An external simulator (for example a game engine)
connects to RLlib, which runs as a server through a TCP-capable, custom ``EnvRunner``.
Review comment (Collaborator): Do we want to use inline code formatting for all classes like ``EnvRunner``?

The simulator sends batches of data from time to time to the server and in turn receives weight updates.
For better performance, actions are computed locally on the client side.
Review comment (Contributor): Suggested change: "For better performance, RLlib computes actions locally on the client side."


.. todo (sven): show new image here with UE5
.. .. figure:: images/rllib-training-inside-a-unity3d-env.png
.. scale: 75 %
.. A Unity3D soccer game being learnt by RLlib via the ExternalEnv API.

RLlib provides an `external messaging protocol <https://github.com/ray-project/ray/blob/master/rllib/env/utils/external_env_protocol.py>`__
Review comment (Collaborator): This is so cool!
Reply (Author): Yeah, let's make this a widely adopted standard! :)

called :ref:`RLlink <rllink-protocol-docs>` for this purpose, as well as the option to customize your :py:class:`~ray.rllib.env.env_runner.EnvRunner` class
to communicate through :ref:`RLlink <rllink-protocol-docs>` with one or more clients.
An example `TCP-based EnvRunner implementation with RLlink is available here <https://github.com/ray-project/ray/blob/master/rllib/examples/envs/env_connecting_to_rllib_w_tcp_client.py>`__.
It also contains a dummy (CartPole) client that you can use for testing and as a template for how your external application or simulator should
utilize the :ref:`RLlink <rllink-protocol-docs>` protocol.
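
The following is a minimal sketch of how you might plug such a custom, RLlink-speaking EnvRunner into an
algorithm config. The class name ``TcpClientInferenceEnvRunner`` and the spaces are assumptions based on the
example files linked above, not a fixed API:

.. code-block:: python

    import gymnasium as gym
    from ray.rllib.algorithms.ppo import PPOConfig
    # Assumption: this class name matches the EnvRunner in the example file linked above.
    from ray.rllib.env.tcp_client_inference_env_runner import TcpClientInferenceEnvRunner

    config = (
        PPOConfig()
        # No env is stepped locally; only declare the spaces the external clients use.
        .environment(
            observation_space=gym.spaces.Box(-1.0, 1.0, (4,)),
            action_space=gym.spaces.Discrete(2),
        )
        .env_runners(
            # Swap in the custom EnvRunner that speaks RLlink over TCP.
            env_runner_cls=TcpClientInferenceEnvRunner,
            num_env_runners=1,
        )
    )
    algo = config.build()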

.. note::
External application support is still work in progress on RLlib's new API stack. The Ray team
is working on more examples for custom EnvRunner implementations (besides
`the already available TCP-based one <https://github.com/ray-project/ray/blob/master/rllib/env/tcp_client_inference_env_runner.py>`__)
as well as various client-side, non-Python RLlib adapters, for example for popular game engines and other
simulation software.


.. _rllink-protocol-docs:

The RLlink Protocol
-------------------

RLlink is a simple, stateful protocol designed for communication between a reinforcement learning (RL) server (for example, RLlib) and an
Review comment (Collaborator): Dumb question: Why not use plain HTTP/2? It is standard and provides security and serialization via Protobuf.
Reply (Author): Definitely in the next iteration! Trying to keep it as simple as possible for this very first iteration. For now, this is just about the message types (what to say when and what to expect back from the server), not really the actual implementation of the messages.

external client acting as an environment simulator. The protocol enables seamless exchange of RL-specific data such as episodes,
configuration, and model weights, while also facilitating on-policy training workflows.

Key Features
~~~~~~~~~~~~

- **Stateful Design**: The protocol maintains some state through sequences of message exchanges (for example, request-response pairs like `GET_CONFIG` -> `SET_CONFIG`).
- **Strict Request-Response Design**: The protocol is strictly (client) request -> (server) response based. Because the client simulation must run in its own execution loop, the server side refrains from sending any unsolicited messages to the clients.
- **RL-Specific Capabilities**: Tailored for RL workflows, including episode handling, model weight updates, and configuration management.
- **Flexible Sampling**: Supports both on-policy and off-policy data collection modes.
- **JSON**: For easier debugging and faster iteration, the first versions of RLlink are entirely JSON-based, non-encrypted, and non-secure.

Message Structure
~~~~~~~~~~~~~~~~~

RLlink messages consist of a header and a body:

- **Header**: An 8-byte length field indicating the size of the body, for example `00000016` for a body of length 16 (making the entire message 24 bytes long).
- **Body**: JSON-encoded content with a `type` field indicating the message type.

Example Messages: PING and EPISODES_AND_GET_STATE
+++++++++++++++++++++++++++++++++++++++++++++++++

Here is a complete, simple example of the `PING` message. Note the 8-byte header
encoding the length of the following body (`16`), followed by the message body with the mandatory "type" field.

.. code-block::

00000016{"type": "PING"}


The client should send the `PING` message right after initiating a new connection. The server
then responds with:

.. code-block::

00000016{"type": "PONG"}
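
To illustrate this framing outside of the official example code, here's a minimal Python sketch of client-side
helpers that send a length-prefixed RLlink message over a plain TCP socket and read back the response. The
helper names and the server address are hypothetical:

.. code-block:: python

    import json
    import socket

    def send_rllink_message(sock: socket.socket, body: dict) -> None:
        # Encode the JSON body, then prepend the 8-byte, zero-padded length header.
        payload = json.dumps(body).encode("utf-8")
        sock.sendall(f"{len(payload):08d}".encode("ascii") + payload)

    def recv_rllink_message(sock: socket.socket) -> dict:
        # Read exactly 8 header bytes, then exactly that many body bytes.
        body_len = int(_recv_exact(sock, 8).decode("ascii"))
        return json.loads(_recv_exact(sock, body_len).decode("utf-8"))

    def _recv_exact(sock: socket.socket, num_bytes: int) -> bytes:
        data = b""
        while len(data) < num_bytes:
            chunk = sock.recv(num_bytes - len(data))
            if not chunk:
                raise ConnectionError("Socket closed before the full message arrived.")
            data += chunk
        return data

    # Usage: initial handshake against a locally running RLlink server (address assumed).
    with socket.create_connection(("localhost", 5555)) as sock:
        send_rllink_message(sock, {"type": "PING"})
        assert recv_rllink_message(sock)["type"] == "PONG"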


Here is an example of an `EPISODES_AND_GET_STATE` message sent by the client to the server and carrying
a batch of sampling data. With the same message, the client asks the server to send back the updated model weights.

.. _example-rllink-episode-and-get-state-msg:

.. code-block:: javascript

    {
        "type": "EPISODES_AND_GET_STATE",
        "episodes": [
            {
                "obs": [[...]],       // List of observations
                "actions": [...],     // List of actions
                "rewards": [...],     // List of rewards
                "is_terminated": false,
                "is_truncated": false
            }
        ],
        "env_steps": 128
    }
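
When assembling this message on the client side, a small sanity-check helper like the following hypothetical one
can catch the most common formatting mistake early: each episode's "obs" list must contain exactly one more entry
than its "actions" and "rewards" lists, because it includes the initial reset observation (see the message overview
below):

.. code-block:: python

    def validate_episodes(episodes: list) -> None:
        # Check the length invariant and the mandatory boolean flags per episode.
        for i, ep in enumerate(episodes):
            assert len(ep["obs"]) == len(ep["actions"]) + 1 == len(ep["rewards"]) + 1, (
                f"Episode {i}: {len(ep['obs'])} obs vs. {len(ep['actions'])} actions "
                f"and {len(ep['rewards'])} rewards."
            )
            assert isinstance(ep["is_terminated"], bool)
            assert isinstance(ep["is_truncated"], bool)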


Overview of all Message Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Requests: Client → Server
+++++++++++++++++++++++++

- **``PING``**

- Example: ``{"type": "PING"}``
- Purpose: Initial handshake to establish communication.
- Expected Response: ``{"type": "PONG"}``.

- **``GET_CONFIG``**

- Example: ``{"type": "GET_CONFIG"}``
- Purpose: Request the relevant configuration (for example, how many timesteps to collect for a single `EPISODES_AND_GET_STATE` message; see below).
- Expected Response: ``{"type": "SET_CONFIG", "env_steps_per_sample": 500, "force_on_policy": true}``.

- **``EPISODES_AND_GET_STATE``**

- Example: :ref:`See here for an example message <example-rllink-episode-and-get-state-msg>`
- Purpose: Combine ``EPISODES`` and ``GET_STATE`` into a single request. This is useful for workflows requiring on-policy (synchronous) updates to model weights after data collection.
- Body:

- ``episodes``: A list of JSON objects (dicts), each with mandatory keys "obs" (list of observations in the episode), "actions" (list of actions in the episode), "rewards" (list of rewards in the episode), "is_terminated" (bool), and "is_truncated" (bool). Note that the "obs" list has one item more than the lists for "actions" and "rewards" due to the initial "reset" observation.
- ``weights_seq_no``: Sequence number for the model weights version, ensuring synchronization.

- Expected Response: ``{"type": "SET_STATE", "weights_seq_no": 123, "mlir_file": ".. [b64 encoded string of the binary .mlir file with the model in it] .."}``.


Responses: Server → Client
++++++++++++++++++++++++++

- **``PONG``**

- Example: ``{"type": "PONG"}``
- Purpose: Acknowledgment of the ``PING`` request to confirm connectivity.

- **``SET_STATE``**

- Example: ``{"type": "PONG"}``
- Purpose: Provide the client with the current state (for example, model weights).
- Body:

- ``onnx_file``: Base64-encoded, compressed ONNX model file.
- ``weights_seq_no``: Sequence number for the model weights, ensuring synchronization.

- **``SET_CONFIG``**

- Purpose: Send relevant configuration details to the client.
- Body:

- ``env_steps_per_sample``: Number of total env steps collected for one ``EPISODES_AND_GET_STATE`` message.
- ``force_on_policy``: Whether on-policy sampling is enforced. If true, the client should wait after sending the ``EPISODES_AND_GET_STATE`` message for the ``SET_STATE`` response before continuing to collect the next round of samples.


Workflow Examples
+++++++++++++++++

**Initial Handshake**

1. Client sends ``PING``.
2. Server responds with ``PONG``.

**Configuration Request**

1. Client sends ``GET_CONFIG``.
2. Server responds with ``SET_CONFIG``.

**Training (on-policy)**

1. Client collects on-policy data and sends ``EPISODES_AND_GET_STATE``.
2. Server processes the episodes and responds with ``SET_STATE``.
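
Putting these workflows together, a hypothetical external client's main loop for the on-policy case could look
like the following sketch. It reuses the ``send_rllink_message`` / ``recv_rllink_message`` helpers sketched
earlier; the ``collect_episodes`` placeholder stands in for your simulator's own stepping and client-side
inference logic:

.. code-block:: python

    import random
    import socket

    def collect_episodes(num_env_steps: int):
        # Placeholder: replace with your simulator's stepping + local action computation.
        # Here: one dummy episode of random 4-dim observations and binary actions.
        episode = {
            "obs": [[random.random() for _ in range(4)] for _ in range(num_env_steps + 1)],
            "actions": [random.randint(0, 1) for _ in range(num_env_steps)],
            "rewards": [1.0] * num_env_steps,
            "is_terminated": False,
            "is_truncated": True,
        }
        return [episode], num_env_steps

    with socket.create_connection(("localhost", 5555)) as sock:
        # Handshake and configuration.
        send_rllink_message(sock, {"type": "PING"})
        assert recv_rllink_message(sock)["type"] == "PONG"
        send_rllink_message(sock, {"type": "GET_CONFIG"})
        config = recv_rllink_message(sock)  # -> {"type": "SET_CONFIG", ...}

        weights_seq_no = 0
        while True:
            # Collect the configured number of env steps with the current model.
            episodes, env_steps = collect_episodes(config["env_steps_per_sample"])

            # Send the batch and, in the same message, request updated weights.
            send_rllink_message(sock, {
                "type": "EPISODES_AND_GET_STATE",
                "episodes": episodes,
                "env_steps": env_steps,
                "weights_seq_no": weights_seq_no,
            })

            # On-policy: block until the SET_STATE response with new weights arrives.
            state = recv_rllink_message(sock)
            weights_seq_no = state["weights_seq_no"]
            # state["onnx_file"] holds the b64-encoded, compressed ONNX model, which the
            # client decodes and loads for the next round of local inference.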


.. note::
This protocol is an initial draft of an attempt to develop a widely adopted standard for communication between an external
client and a remote RL service. Expect many changes, enhancements, and upgrades as it moves toward maturity, including
the addition of a security layer and compression.
For now, however, it offers a lightweight, simple, yet powerful interface for integrating external environments with RL
frameworks.


Example: External client connecting to TCP-based EnvRunner
----------------------------------------------------------

An example `TCP-based EnvRunner implementation with RLlink is available here <https://github.com/ray-project/ray/blob/master/rllib/env/tcp_client_inference_env_runner.py>`__.
See `here for the full end-to-end example <https://github.com/ray-project/ray/blob/master/rllib/examples/envs/env_connecting_to_rllib_w_tcp_client.py>`__.

Feel free to alter the underlying logic of your custom EnvRunner; for example, you could implement a shared-memory-based
communication layer instead of the TCP-based one.
57 changes: 57 additions & 0 deletions doc/source/rllib/hierarchical-envs.rst
@@ -0,0 +1,57 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-hierarchical-environments-doc:


Hierarchical Environments
=========================

You can implement hierarchical training as a special case of multi-agent RL. For example, consider a two-level hierarchy of policies,
where a top-level policy issues high-level tasks that one or more low-level policies execute at a finer timescale.
The following timeline shows one step of the top-level policy, which corresponds to four low-level actions:

.. code-block:: text

top-level: action_0 -------------------------------------> action_1 ->
low-level: action_0 -> action_1 -> action_2 -> action_3 -> action_4 ->

Alternatively, you could implement an environment in which the two agent types don't act simultaneously,
Review comment (Collaborator): Awesome explanation!

but the low-level agents wait for the high-level agent to issue an action, then act `n` times before handing
control back to the high-level agent:

.. code-block:: text

top-level: action_0 -----------------------------------> action_1 ->
low-level: ---------> action_0 -> action_1 -> action_2 ------------>


You can implement any of these hierarchical action patterns as a multi-agent environment with various
types of agents, for example a high-level agent and a low-level agent. When set up with the correct
agent-to-module mapping functions, from RLlib's perspective the problem becomes a simple independent
multi-agent problem with different types of policies.

Your configuration might look something like the following:

.. testcode::

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .multi_agent(
            policies={"top_level", "low_level"},
            policy_mapping_fn=(
                lambda aid, eps, **kw: "low_level" if aid.startswith("low_level") else "top_level"
            ),
            policies_to_train=["top_level"],
        )
    )


In this setup, the multi-agent env implementation should provide the appropriate rewards at any hierarchy level.
The environment class is also responsible for routing between agents, for example conveying `goals <https://arxiv.org/pdf/1703.01161.pdf>`__ from higher-level
agents to lower-level agents as part of the lower-level agent's observation.
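
To make this routing concrete, the following is a minimal sketch (not the linked example's actual code) of a
hierarchical multi-agent env in which the high-level agent's action becomes a "goal" that the low-level agent
sees in its observation and then pursues for a fixed number of steps. All agent IDs, spaces, dynamics, and
rewards are illustrative assumptions, and episode termination is omitted for brevity:

.. code-block:: python

    import gymnasium as gym
    import numpy as np
    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class HierarchicalEnvSketch(MultiAgentEnv):
        """The top-level agent sets a goal; a low-level agent acts toward it for `n` steps."""

        def __init__(self, config=None):
            super().__init__()
            self.agents = self.possible_agents = ["top_level", "low_level_0"]
            self.observation_spaces = {
                "top_level": gym.spaces.Box(-1.0, 1.0, (4,), np.float32),
                # Low-level obs = raw obs (4 dims) + current goal from the top level (2 dims).
                "low_level_0": gym.spaces.Box(-1.0, 1.0, (6,), np.float32),
            }
            self.action_spaces = {
                "top_level": gym.spaces.Box(-1.0, 1.0, (2,), np.float32),  # The "goal".
                "low_level_0": gym.spaces.Discrete(4),
            }
            self._n = 4  # Low-level steps per high-level action.

        def reset(self, *, seed=None, options=None):
            self._raw_obs = np.zeros(4, np.float32)
            self._goal = np.zeros(2, np.float32)
            self._steps_left = 0
            # Only the top-level agent acts first.
            return {"top_level": self._raw_obs}, {}

        def step(self, action_dict):
            if "top_level" in action_dict:
                # Route the high-level action as a goal into the low-level observation.
                self._goal = np.asarray(action_dict["top_level"], np.float32)
                self._steps_left = self._n
                obs = {"low_level_0": np.concatenate([self._raw_obs, self._goal])}
                return obs, {}, {"__all__": False}, {"__all__": False}, {}

            # Low-level action: step the underlying simulation (placeholder dynamics).
            self._raw_obs = np.clip(self._raw_obs + 0.01, -1.0, 1.0)
            self._steps_left -= 1
            rewards = {"low_level_0": 0.1}  # For example, distance-to-goal shaping.
            if self._steps_left > 0:
                obs = {"low_level_0": np.concatenate([self._raw_obs, self._goal])}
            else:
                # Hand control back to the top-level agent and assign its (sparse) reward.
                obs = {"top_level": self._raw_obs}
                rewards["top_level"] = float(-np.abs(self._raw_obs[:2] - self._goal).sum())
            return obs, rewards, {"__all__": False}, {"__all__": False}, {}

With the agent IDs above, the ``policy_mapping_fn`` from the preceding config routes ``low_level_0`` to the
``low_level`` policy and ``top_level`` to the ``top_level`` policy.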

See `this runnable example of a hierarchical env <https://github.com/ray-project/ray/blob/master/rllib/examples/hierarchical/hierarchical_training.py>`__.
1 change: 0 additions & 1 deletion doc/source/rllib/images/env_classes_overview.svg

This file was deleted.

1 change: 0 additions & 1 deletion doc/source/rllib/images/multi-agent.svg

This file was deleted.

1 change: 0 additions & 1 deletion doc/source/rllib/images/rllib-envs.svg

This file was deleted.

1 change: 0 additions & 1 deletion doc/source/rllib/images/rllib-external.svg

This file was deleted.
