[RLlib] AlgorithmConfig: Next steps (volume 01) #29395

sven1977 · 2022-10-17T10:16:13Z

This PR takes the next step in our journey to fully move off of python config dicts and utilize AlgorithmConfig (and its subclasses) in all of RLlib under the hood.

In particular, this PR:

Makes all Algorithms, RolloutWorker, Sampler, WorkerSet, and PolicyMap utilize AlgorithmConfig under the hood. If an old-style python dict is passed into any of these constructors, it is automatically converted to an AlgorithmConfig. If a user (or RLlib) still treats configs as dicts, AlgorithmConfig can handle getattr, setattr, etc. gracefully. These shim helper methods will be removed in the future.
NOTE: Policies (except for PG Policies) still use old-style python dicts as configs under the hood. PG was converted in this PR for demonstration purposes. We need to convert all Policies to accept AlgorithmConfig in the future, but this is mostly a one-line change in each Policy class.
AlgorithmConfig objects can now be frozen (all Algorithms now do this in their ctor) to make sure no one can alter the config once it has been passed into an Algorithm's ctor.
Most of the config validation logic now lives inside the AlgorithmConfig class itself, so a lot of the validate_config code disappears. Eventually, we should get rid of validate_config entirely (validation should be handled by each config class directly).
Deprecated passing env into an Algorithm's ctor. It should no longer be used and now causes an error.
The evaluation config is now an automatically generated (full) AlgorithmConfig inside the main AlgorithmConfig (property: self.evaluation_config). The self.evaluation_config property within the evaluation config itself is None. This helps us do all validation checking early on (before freezing) and simplifies handling of eval configs. The typical eval-config override (via a dict) is still supported, but will eventually be replaced by objects as well (e.g. a to-be-designed AlgorithmConfigOverride class).
RolloutWorker:
** Dramatically reduced the complexity of its ctor signature (most args should already have been defined in the config anyway).
** Now uses AlgorithmConfig under the hood (passing in a dict is still supported, though).
** Moved some of the "conversion" logic that it used to perform on the config into AlgorithmConfig itself (e.g. the construction and unification of the policies dict in a multi-agent setup).

Why are these changes needed?

Related issue number

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

sven1977 · 2022-10-18T08:46:18Z

rllib/utils/tf_utils.py

@@ -18,6 +18,7 @@
 )

 if TYPE_CHECKING:
+    from ray.rllib.policy.policy import Policy


Annotation bug fix.

sven1977 · 2022-10-18T09:58:02Z

rllib/algorithms/algorithm.py

@@ -311,40 +308,50 @@ def from_state(state: Dict) -> "Algorithm":
    @PublicAPI
    def __init__(
        self,
-        config: Optional[Union[PartialAlgorithmConfigDict, AlgorithmConfig]] = None,
-        env: Optional[Union[str, EnvType]] = None,
+        config: Union[AlgorithmConfig, PartialAlgorithmConfigDict],


Deprecated passing env into an Algorithm's ctor. It should no longer be used and now causes an error.
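As a rough illustration of what "now causes an error" means for callers, the sketch below uses a hypothetical `ToyAlgorithm` class (not RLlib's `Algorithm`). Its constructor rejects a separately passed `env` and expects the environment to be part of the config. The error type and message are assumptions made for this example.

```python
class ToyAlgorithm:
    """Hypothetical stand-in for an algorithm class; not RLlib code."""

    def __init__(self, config, env=None):
        # Assumption: the deprecated argument is rejected with a ValueError.
        if env is not None:
            raise ValueError(
                "Passing `env` to the constructor is deprecated; "
                "set it in the config instead."
            )
        self.config = dict(config)


# Supported style: the environment is part of the config.
algo = ToyAlgorithm({"env": "CartPole-v1"})


def build_with_deprecated_env():
    """Return the error message raised by the deprecated call style."""
    try:
        ToyAlgorithm({}, env="CartPole-v1")
    except ValueError as err:
        return str(err)
    return None
```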

sven1977 · 2022-10-18T09:59:35Z

rllib/algorithms/algorithm.py

        # TODO: In the future, only support AlgorithmConfig objects here.
-        if isinstance(config, AlgorithmConfig):
-            config = config.to_dict()
+        if isinstance(config, dict):


Keep supporting old-style python config dicts for a while.
From here on, we translate these into AlgorithmConfig objects, freeze the AlgorithmConfig object (so it cannot be changed by anyone anymore), and use it under the hood in all Algorithms.
The added backward-compat mechanism for looking up old-style dict keys (str) on those AlgorithmConfig objects makes sure this even works for custom algos that still think they are dealing with a config dict.
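The conversion and the str-key lookup described above can be sketched as follows. `ToyConfig` and `to_config` are hypothetical names, and the set of supported dict-style methods is an assumption; only `update_from_dict` is a name taken from the diff. Freezing is left out of this sketch.

```python
class ToyConfig:
    """Hypothetical config object; not RLlib's AlgorithmConfig."""

    def __init__(self):
        self.lr = 0.001
        self.gamma = 0.99

    def update_from_dict(self, overrides):
        # Copy every dict entry onto the object; unknown keys are rejected.
        for key, value in overrides.items():
            if not hasattr(self, key):
                raise KeyError(f"Unknown config key: {key}")
            setattr(self, key, value)
        return self

    def __getitem__(self, key):
        # Backward-compat shim: config["lr"] behaves like config.lr.
        return getattr(self, key)

    def get(self, key, default=None):
        # Dict-style lookup with a fallback value.
        return getattr(self, key, default)


def to_config(config):
    """Accept either a plain dict or a ToyConfig and return a ToyConfig."""
    if isinstance(config, dict):
        return ToyConfig().update_from_dict(config)
    return config


cfg = to_config({"lr": 0.01})
```

Code that still indexes the config with string keys (`cfg["lr"]`) keeps working, while new code can use attribute access (`cfg.lr`).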

sven1977 · 2022-10-18T10:00:12Z

rllib/algorithms/algorithm.py

+            else:
+                config = default_config.update_from_dict(config)
+
+        if env is not None:


From here on, everything config-related inside an algo is an AlgorithmConfig object.

sven1977 · 2022-10-18T10:00:38Z

rllib/algorithms/algorithm.py

@@ -427,19 +438,24 @@ def default_logger_creator(config):

    @OverrideToImplementCustomLogic
    @classmethod
-    def get_default_config(cls) -> AlgorithmConfigDict:
-        return AlgorithmConfig().to_dict()
+    def get_default_config(cls) -> Union[AlgorithmConfig, AlgorithmConfigDict]:


This method can still be overridden in two ways: return a dict OR return an instantiated AlgorithmConfig object.
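A minimal sketch of how a caller might normalize both return styles is shown below. All class and function names here are hypothetical, and this is not the actual resolution logic in `Algorithm.__init__`.

```python
class ToyConfig:
    """Hypothetical config object holding its settings in a dict."""

    def __init__(self, **settings):
        self.settings = dict(settings)


class DictStyleAlgo:
    @classmethod
    def get_default_config(cls):
        # Old style: return a plain dict.
        return {"lr": 0.001}


class ObjectStyleAlgo:
    @classmethod
    def get_default_config(cls):
        # New style: return an instantiated config object.
        return ToyConfig(lr=0.0005)


def resolve_default_config(algo_cls):
    """Return a ToyConfig regardless of which style the class uses."""
    default = algo_cls.get_default_config()
    if isinstance(default, dict):
        default = ToyConfig(**default)
    return default
```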

sven1977 · 2022-10-18T10:02:12Z

rllib/algorithms/algorithm.py

-                )
-
-            self.config["evaluation_config"] = eval_config
+            self.validate_config(self.config.evaluation_config)


Most of the config validation logic now lives inside the AlgorithmConfig class itself, so a lot of the validate_config code disappears. Eventually, we should get rid of validate_config entirely (validation should be handled by each config class directly).

this is fantastic!
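To illustrate the idea of validation living on the config class rather than in a separate validate_config method on the algorithm, here is a small sketch. The class, the `validate` method name as used here, and the specific checks are assumptions for this example, not RLlib's actual rules.

```python
class ToyConfig:
    """Hypothetical config object that validates its own settings."""

    def __init__(self, num_workers=0, train_batch_size=32):
        self.num_workers = num_workers
        self.train_batch_size = train_batch_size

    def validate(self):
        # Each config class checks its own fields (illustrative rules only).
        if self.num_workers < 0:
            raise ValueError("`num_workers` must be >= 0")
        if self.train_batch_size <= 0:
            raise ValueError("`train_batch_size` must be > 0")
        return self


def is_valid(config):
    """Return True if the config passes its own validation."""
    try:
        config.validate()
    except ValueError:
        return False
    return True
```

A subclass for a specific algorithm could extend `validate` with its own checks and call `super().validate()` first, so the algorithm class itself needs no validation code.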

sven1977 · 2022-10-18T10:04:20Z

rllib/algorithms/algorithm.py

@@ -576,65 +592,12 @@ def setup(self, config: PartialAlgorithmConfigDict):

        # Evaluation WorkerSet setup.
        # User would like to setup a separate evaluation worker set.
-
-        # Update with evaluation settings:
-        user_eval_config = copy.deepcopy(self.config["evaluation_config"])


The evaluation config is now an automatically generated (full) AlgorithmConfig inside the main AlgorithmConfig (property: self.evaluation_config). The self.evaluation_config property within the evaluation config itself is None. This helps us do all validation checking early on (before freezing) and simplifies handling of eval configs.

The typical eval-config override (via a dict) is still supported, but will eventually be replaced by objects as well (e.g. a to-be-designed AlgorithmConfigOverride class).
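The structure described here (a full copy of the main config with the dict overrides applied, whose own evaluation config is None) might look roughly like the sketch below. `ToyConfig`, `evaluation_overrides`, and `build_evaluation_config` are hypothetical names, not RLlib's.

```python
import copy


class ToyConfig:
    """Hypothetical config object; not RLlib's AlgorithmConfig."""

    def __init__(self):
        self.explore = True
        self.num_workers = 2
        # Dict of overrides supplied by the user for evaluation only.
        self.evaluation_overrides = {}
        self.evaluation_config = None

    def build_evaluation_config(self):
        # Start from a full copy of the main config ...
        eval_config = copy.deepcopy(self)
        # ... which itself carries no nested evaluation config ...
        eval_config.evaluation_overrides = {}
        eval_config.evaluation_config = None
        # ... and apply the user's dict overrides on top.
        for key, value in self.evaluation_overrides.items():
            setattr(eval_config, key, value)
        self.evaluation_config = eval_config
        return eval_config


main = ToyConfig()
main.evaluation_overrides = {"explore": False}
eval_cfg = main.build_evaluation_config()
```

Because the evaluation config is a complete config object, it can go through the same validation as the main one before anything is frozen.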

sven1977 · 2022-10-18T10:05:16Z

rllib/algorithms/algorithm.py

-                "returns a subclass of DefaultCallbacks, got "
-                f"{config['callbacks']}!"
-            )
+        from ray.rllib.models.catalog import MODEL_DEFAULTS


A lot of the code here was moved into AlgorithmConfig itself. We should eventually deprecate all validate_config methods.

sven1977 · 2022-10-18T10:05:57Z

rllib/algorithms/algorithm_config.py

@@ -152,13 +160,18 @@ def __init__(self, algo_class=None):
        }

        # `self.multi_agent()`
-        self.policies = {}
+        self._is_multi_agent = False
+        self.policies = {DEFAULT_POLICY_ID: PolicySpec()}


Moved more default settings here into the ctor (where all defaults belong).

sven1977 · 2022-10-18T10:06:36Z

rllib/algorithms/algorithm_config.py

        self.observation_fn = None
        self.count_steps_by = "env_steps"
+        self._multi_agent_legacy_dict = {}
+        self._set_ma_legacy_dict()


Shim for backward compatibility, in case users still access the "multiagent" key inside an AlgorithmConfig object and poke around in that dict (e.g. access config["multiagent"]["policies"]).
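One way such a shim can work is sketched below. `ToyConfig` is hypothetical, and it rebuilds the nested dict on demand, whereas the diff above stores it in `_multi_agent_legacy_dict`. The keys included in the nested dict are assumptions.

```python
class ToyConfig:
    """Hypothetical config with flat multi-agent attributes."""

    def __init__(self):
        self.policies = {"default_policy": None}
        self.policy_mapping_fn = None
        self.count_steps_by = "env_steps"

    def _legacy_multiagent_dict(self):
        # Rebuild the old nested layout from the flat attributes on demand.
        return {
            "policies": self.policies,
            "policy_mapping_fn": self.policy_mapping_fn,
            "count_steps_by": self.count_steps_by,
        }

    def __getitem__(self, key):
        # Old-style access: config["multiagent"]["policies"].
        if key == "multiagent":
            return self._legacy_multiagent_dict()
        return getattr(self, key)


cfg = ToyConfig()
legacy_policies = cfg["multiagent"]["policies"]
```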

sven1977 · 2022-10-18T10:07:02Z

rllib/algorithms/algorithm_config.py

@@ -170,14 +183,15 @@ def __init__(self, algo_class=None):
        self.output_config = {}
        self.output_compress_columns = ["obs", "new_obs"]
        self.output_max_file_size = 64 * 1024 * 1024
+        self.offline_sampling = False


Moved this here (only used by CRR and CQL, BUT required by RolloutWorker, which had to do a hasattr check :/ )
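The benefit of defining the default in the base constructor can be shown with a small sketch. The class and function names are hypothetical; only `offline_sampling` comes from the diff.

```python
class ToyConfig:
    """Hypothetical base config; not RLlib's AlgorithmConfig."""

    def __init__(self):
        # Default lives in the constructor, so every config has the attribute.
        self.offline_sampling = False


class ToyOfflineConfig(ToyConfig):
    """Hypothetical config for an algorithm that samples offline."""

    def __init__(self):
        super().__init__()
        self.offline_sampling = True


def uses_offline_sampling(config):
    # No hasattr() guard needed: the attribute always exists.
    return config.offline_sampling
```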

sven1977 · 2022-10-18T10:07:38Z

rllib/algorithms/algorithm_config.py

@@ -214,6 +228,9 @@ def __init__(self, algo_class=None):
        self._disable_action_flattening = False
        self._disable_execution_plan_api = True

+        # Has this config object been frozen (cannot alter its attributes anymore).


AlgorithmConfig objects can now be frozen (all Algorithms now do this in their ctor) to make sure no one can alter the config once it has been passed into an Algorithm's ctor.
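A minimal sketch of freezing, assuming it is implemented by guarding attribute assignment, is given below. `ToyConfig`, the `_is_frozen` flag name, and the error type are assumptions for this example and may differ from the actual AlgorithmConfig code.

```python
class ToyConfig:
    """Hypothetical config object that can be frozen."""

    def __init__(self):
        self.lr = 0.001
        self._is_frozen = False

    def freeze(self):
        # After this call, any attribute assignment raises.
        self._is_frozen = True

    def __setattr__(self, key, value):
        # getattr with a default covers the first assignments in __init__,
        # which happen before `_is_frozen` exists.
        if getattr(self, "_is_frozen", False):
            raise AttributeError(f"Cannot set `{key}`: config is frozen.")
        super().__setattr__(key, value)


cfg = ToyConfig()
cfg.lr = 0.01  # Allowed: the config is not frozen yet.
cfg.freeze()
```

Any later attempt such as `cfg.lr = 0.5` raises `AttributeError`, so code that receives the config cannot silently change it.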

…, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (ray-project#29395) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>

…utWorker, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (ray-project#29395)" (ray-project#29742) This reverts commit 182744b. Signed-off-by: Weichen Xu <weichen.xu@databricks.com>

sven1977 added 6 commits September 28, 2022 21:57

wip

250a88a

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' into algo_configs_next_steps_1

b268d14

# Conflicts: # rllib/policy/policy.py

wip

dc8ea88

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into algo…

1edaf5f

…_configs_next_steps_1

wip

9b619da

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

c810b41

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 requested review from gjoliver, avnishn, ArturNiederfahrenhorst, smorad, maxpumperla, kouroshHakha and krfricke as code owners October 17, 2022 10:16

wip

e6ee47e

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 commented Oct 18, 2022

sven1977 added 4 commits October 18, 2022 11:42

wip

4ac1944

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

d777f7a

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

ad2aafc

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

60b8b41

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 commented Oct 18, 2022

sven1977 added 14 commits October 21, 2022 16:30

wip

f5d1100

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

2bcd192

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into algo…

96fff6b

…_configs_next_steps_1

wip

c3d9acd

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

9ff3ecc

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' into algo_configs_next_steps_1

73c7b21

wip

1c6f5ec

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

dfaf935

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

b4c17a5

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

6e7a684

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

d4f8ff7

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

6da2356

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

8e81a4d

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

6443d8e

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 requested a review from a team as a code owner October 25, 2022 16:19

sven1977 added 5 commits October 25, 2022 18:45

wip

cab6ddd

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

1d66f41

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

e5920ea

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

1617b46

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' into algo_configs_next_steps_1

cdc2157

sven1977 merged commit 182744b into ray-project:master Oct 26, 2022

cadedaniel mentioned this pull request Oct 26, 2022

[CI] linux://python/ray/tune:test_progress_reporter is failing/flaky on master. #29714

Closed

krfricke added a commit that referenced this pull request Oct 27, 2022

Revert "[RLlib] AlgorithmConfig: Next steps (volume 01); Algos, Rollo…

7528f9c

…utWorker, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (#29395)" This reverts commit 182744b.

gjoliver pushed a commit that referenced this pull request Oct 27, 2022

Revert "[RLlib] AlgorithmConfig: Next steps (volume 01); Algos, Rollo…

12b579d

…utWorker, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (#29395)" (#29742) This reverts commit 182744b.

sven1977 added a commit that referenced this pull request Oct 27, 2022

Revert "Revert "[RLlib] AlgorithmConfig: Next steps (volume 01); Algo…

ecb2b12

…s, RolloutWorker, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (#29395)" (#29742)" This reverts commit 12b579d.

bveeramani mentioned this pull request Oct 27, 2022

[CI] linux://python/ray/tune:test_result_grid is failing/flaky on master. #29719

Closed

sven1977 mentioned this pull request Nov 10, 2022

[RLlib] Fix tf + CNN (e.g. Atari) + GPU issues with release tests. #30176

Merged

7 tasks

sven1977 deleted the algo_configs_next_steps_1 branch May 5, 2023 20:04
