[RLlib] DreamerV3: Main algo code and required changes to some RLlib APIs (RolloutWorker) #35386
Conversation
rllib/algorithms/algorithm_config.py
Outdated
@@ -296,6 +295,9 @@ def __init__(self, algo_class=None):
        self.auto_wrap_old_gym_envs = True

+       # `self.rollouts()`
+       # TODO (sven): Clean up the configuration of fully customizable
We can now publicly configure the class used for rollouts. This used to be configurable via config.debugging(worker_cls=..), but that path was not working correctly.
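A minimal sketch of how this newly public setting could be used (MyEnvRunner is hypothetical; the parameter name follows the AlgorithmConfig.rollouts(env_runner_class=...) setting this PR adds):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Hypothetical custom rollout class; any class implementing the rollout
# worker interface could be plugged in here.
class MyEnvRunner:
    ...

# The rollout class is now publicly configurable, replacing the broken
# `config.debugging(worker_cls=..)` path.
config = PPOConfig().rollouts(env_runner_class=MyEnvRunner)
```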
rllib/algorithms/algorithm_config.py
Outdated
@@ -838,7 +843,7 @@ def validate(self) -> None:
            self.model["_disable_action_flattening"] = True
        if self.model.get("custom_preprocessor"):
            deprecation_warning(
-               old="model_config['custom_preprocessor']",
+               old="AlgorithmConfig.training(model={'custom_preprocessor': ...})",
Enhanced the deprecation message.
rllib/algorithms/algorithm_config.py
Outdated
@@ -2716,12 +2731,22 @@ def get_multi_agent_setup(
            # Normal env (gym.Env or MultiAgentEnv): These should have the
            # `observation_space` and `action_space` properties.
            elif env is not None:
-               if hasattr(env, "observation_space") and isinstance(
+               if hasattr(env, "single_observation_space") and isinstance(
Support new gym.vector.Env envs, which have single_observation_space and single_action_space properties.
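A hedged sketch of the kind of check this enables (infer_spaces is an illustrative helper, not the actual RLlib code):

```python
import gymnasium as gym

def infer_spaces(env):
    # New `gym.vector.Env` instances expose `single_observation_space` /
    # `single_action_space` (the spaces of one sub-env) in addition to
    # the batched `observation_space` / `action_space`.
    if hasattr(env, "single_observation_space") and isinstance(
        env.single_observation_space, gym.Space
    ):
        return env.single_observation_space, env.single_action_space
    # Plain gym.Env (or MultiAgentEnv): fall back to the usual properties.
    return env.observation_space, env.action_space

# A vectorized CartPole reports the spaces of a single sub-environment.
obs_space, act_space = infer_spaces(gym.vector.make("CartPole-v1", num_envs=4))
```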
@@ -60,7 +60,7 @@ def _forward_exploration(self, batch: NestedDict) -> Mapping[str, Any]:

        return output

-   @override(TfRLModule)
+   @override(RLModule)
Fixed the @override decorator to reference RLModule.
rllib/core/learner/learner.py
Outdated
@@ -352,7 +352,7 @@ def _configure_optimizers_per_module_helper(
            pairs.append(pair)
        elif isinstance(pair_or_pairs, dict):
            # pair_or_pairs is a NamedParamOptimizerPairs
-           for name, pair in pairs.items():
+           for name, pair in pair_or_pairs.items():
This was a bug, but it was not visible for Learners that only use the default (single) optimizer path.
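A simplified repro of why it stayed hidden (names are illustrative): the buggy loop iterated the accumulator list, which would raise immediately, but this branch is only entered when a dict of named optimizer pairs is passed in, which the default single-optimizer path never does.

```python
pairs = []        # accumulator for anonymous (param, optimizer) pairs
named_pairs = {}  # accumulator for named pairs
pair_or_pairs = {"world_model": ("wm_params", "wm_optim")}  # user input

# Buggy: `for name, pair in pairs.items()` -- a list has no `.items()`,
# so this raised as soon as anyone actually passed named pairs.
for name, pair in pair_or_pairs.items():  # the fix
    named_pairs[name] = pair
```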
rllib/core/learner/learner.py
Outdated
@@ -435,8 +435,25 @@ def compute_gradients(self, loss: Mapping[str, Any]) -> ParamDictType:
            The gradients in the same format as self._params.
        """

+   @OverrideToImplementCustomLogic
Sorted this into a better position. These three should always appear together, in this order (see the sketch below):
1. compute_gradients
2. postprocess_gradients
3. apply_gradients
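For reference, a minimal sketch of how the three phases chain inside a Learner update step (illustrative only; the real update logic does more):

```python
def update_step_sketch(self, loss):
    # 1) Differentiate the loss(es) w.r.t. self._params.
    grads = self.compute_gradients(loss)
    # 2) Optionally clip/rescale/transform the raw gradients.
    grads = self.postprocess_gradients(grads)
    # 3) Hand the final gradients to the optimizer(s).
    self.apply_gradients(grads)
```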
rllib/core/learner/learner.py
Outdated
    @abc.abstractmethod
-   def apply_gradients(self, gradients: ParamDictType) -> None:
+   def apply_gradients(self, gradients_dict: ParamDictType) -> None:
Consistent argument naming.
rllib/core/learner/learner.py
Outdated
        forward passes within this method, and to use the "forward_train" outputs to
        compute the required tensors for loss calculation.
        "fwd_out". The returned dictionary must contain a key called
        `self.TOTAL_LOSS_KEY`, which will be used to compute gradients. It is
Use constant name for this key.
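A toy sketch of the resulting convention (the loss terms are placeholders; RLlib defines the constant on the Learner class, inlined here to keep the sketch self-contained):

```python
class ToyLearner:
    TOTAL_LOSS_KEY = "total_loss"

    def compute_loss(self, *, fwd_out, batch):
        pi_loss = fwd_out["pi_loss"]
        vf_loss = fwd_out["vf_loss"]
        # Gradients are computed from `self.TOTAL_LOSS_KEY`; all other
        # entries are carried along purely for metrics/logging.
        return {
            self.TOTAL_LOSS_KEY: pi_loss + vf_loss,
            "pi_loss": pi_loss,
            "vf_loss": vf_loss,
        }
```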
rllib/core/learner/learner.py
Outdated
@@ -811,7 +807,7 @@ def update(
        reduce_fn: Callable[[List[Mapping[str, Any]]], ResultDict] = (
            _reduce_mean_results
        ),
-   ) -> Mapping[str, Any]:
+   ) -> Union[Mapping[str, Any], List[Mapping[str, Any]]]:
If reduce_fn is not given, update() might return a list of result dicts.
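A small sketch of the two return shapes (reduce_mean mimics what a default mean-reducing reduce_fn does; it is not the actual _reduce_mean_results implementation):

```python
from typing import Any, Dict, List

def reduce_mean(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Average every metric across the per-minibatch result dicts.
    return {k: sum(r[k] for r in results) / len(results) for k in results[0]}

per_minibatch = [{"loss": 1.0}, {"loss": 3.0}]

reduced = reduce_mean(per_minibatch)  # -> {"loss": 2.0}   (Mapping[str, Any])
unreduced = per_minibatch             # reduce_fn=None -> List[Mapping[str, Any]]
```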
rllib/core/learner/tf/tf_learner.py
Outdated
@@ -124,7 +122,7 @@ def postprocess_gradients(
        return gradients_dict

    @override(Learner)
-   def apply_gradients(self, gradients: ParamDictType) -> None:
+   def apply_gradients(self, gradients_dict: ParamDictType) -> None:
Same here: more consistent argument naming.
@@ -490,11 +489,13 @@ def helper(_batch):
            # constraint on forward_train and compute_loss APIs. This seems to be
            # inefficient. Make it efficient.
            _batch = NestedDict(_batch)
-           with tf.GradientTape() as tape:
+           with tf.GradientTape(persistent=True) as tape:
Necessary for multiple optimizers that operate on the same RLModule.
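A self-contained TF2 illustration of why persistent=True matters when more than one optimizer takes gradients from the same tape:

```python
import tensorflow as tf

x = tf.Variable(2.0)  # stands in for world-model params
y = tf.Variable(3.0)  # stands in for actor params

with tf.GradientTape(persistent=True) as tape:
    loss_a = x * x
    loss_b = x * y

# A non-persistent tape is released after the first `gradient()` call;
# `persistent=True` permits one call per optimizer on the same tape.
grads_a = tape.gradient(loss_a, [x])     # [4.0]
grads_b = tape.gradient(loss_b, [x, y])  # [3.0, 2.0]
del tape  # free the tape's resources once all gradients are taken
```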
@@ -93,24 +94,8 @@ def compute_gradients(

        return grads

-   @OverrideToImplementCustomLogic_CallToSuperRecommended
Moved down to a better location in the file.
            self._params[pid].grad = grad

        # for each optimizer call its step function with the gradients
        for optim in self._optimizer_parameters:
            optim.step()

+   @OverrideToImplementCustomLogic_CallToSuperRecommended
.. to here :)
rllib/core/models/catalog.py
Outdated
@@ -25,93 +25,6 @@
from ray.rllib.utils.spaces.space_utils import get_base_struct_from_space


-def _multi_action_dist_partial_helper(
Moved down to make the main Catalog class in this file more prominent. We should generally move private functions to the end of a file, to avoid confusion and keep the main class(es) visible.
Why are these changes needed?

DreamerV3:
- Main algo code (dreamerv3.py, README), plus compilation and model-size (architecture) tests.
- Added DreamerV3Catalog.
- Added the DreamerV3 Algorithm class and config.

Some changes to RLlib:
- New AlgorithmConfig.rollouts(env_runner_class=...) setting.
- TfLearner.update() now uses tf.GradientTape(persistent=True). Managed to keep the Learner API as-is by simply overriding the DreamerV3TfLearner.compute_gradients() method (see the sketch below). Without this override, DreamerV3 on tf will not learn, as computing gradients for the TOTAL_LOSS_KEY over all model params messes up the world-model gradients.
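A hedged sketch of the idea behind that override (the loss/variable partitioning below is illustrative, not the actual DreamerV3 code):

```python
def compute_gradients_sketch(tape, losses, params):
    """Take per-component gradients instead of d(total_loss)/d(all_params).

    `losses`: component name -> scalar loss tensor.
    `params`: component name -> list of that component's tf.Variables.
    `tape` must have been created with `persistent=True`.
    """
    grads = {}
    for component, loss in losses.items():
        # Each loss only flows into its own sub-model's variables, so e.g.
        # actor/critic losses cannot corrupt the world-model gradients.
        grads[component] = tape.gradient(loss, params[component])
    return grads
```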
Related issue number

Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.