Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] AlgorithmConfig: Next steps (volume 01) #29395

Merged
merged 74 commits into from
Oct 26, 2022

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Oct 17, 2022

This PR takes the next step in our journey to fully move off of python config dicts and utilize AlgorithmConfig (and its subclasses) in all of RLlib under the hood.

In particular, this PR:

  • Makes all Algorithms, RolloutWorker, Sampler, WorkerSet, and PolicyMap utilize AlgorithmConfig under the hood. In case an old-style python dict is passed into any of these constructors, an automatic conversion (to AlgorithmConfig) is applied. In case a user (or RLlib) still treats configs as dicts, AlgorithmConfig can handle getattr, setattr, etc.. gracefully. These shim-helper-methods will be removed in the future.
  • NOTE: Policies (except for PG Policies) still utilize old-style python dicts as configs under their hoods. PG was done in this PR for demonstration purposes. We need to convert all Policies to accepting AlgorithmConfig in the future, but this is a one-line change in each Policy class, mostly.
  • AlgorithmConfig objects can now be frozen (all Algorithms do this in their ctor now) to make sure no one can alter the config anymore once passed into an Algo's c'tor.
  • We do most of the config validation logic now inside the AlgorithmConfig class itself. So a lot of the validate_config code disappears. Eventually, we should get rid of validate_config entirely (should be handled by each config class directly).
  • Deprecated passing env into Algo's c'tor. Should no longer be used and now causes an error.
  • Evaluation config is now an automatically generated (full) AlgorithmConfig inside the main AlgorithmConfig (property: self.evaluation_config). The self.evaluation_config property within the evaluation config is None. This helps us do all validation checking early on (before freezing) and simplifies handling of eval configs. The typical eval-config override (via a dict) is still supported, but will eventually be replaced by objects (e.g. a to-be-designed AlgorithmConfigOverride class) as well.
  • RolloutWorker:
    ** Dramatically reduced complexity of its ctor signature (due to the fact that most args should have already been defined in the config anyways).
    ** Now uses AlgorithmConfig under the hood (passing in a dict is still supported, though).
    ** Moved some of the "conversion" logic that it used to perform on the config into AlgorithmConfig itself (e.g. the construction and unification of the policies dict in a multi-agent setup).

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
# Conflicts:
#	rllib/policy/policy.py
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@@ -18,6 +18,7 @@
)

if TYPE_CHECKING:
from ray.rllib.policy.policy import Policy
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotation bug fix.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@@ -311,40 +308,50 @@ def from_state(state: Dict) -> "Algorithm":
@PublicAPI
def __init__(
self,
config: Optional[Union[PartialAlgorithmConfigDict, AlgorithmConfig]] = None,
env: Optional[Union[str, EnvType]] = None,
config: Union[AlgorithmConfig, PartialAlgorithmConfigDict],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecated passing env into Algo's c'tor. Should no longer be used and now causes an error.

# TODO: In the future, only support AlgorithmConfig objects here.
if isinstance(config, AlgorithmConfig):
config = config.to_dict()
if isinstance(config, dict):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retro-support old-style python config dicts for a while.
We will translate these into AlgorithmConfig objects from here on, freeze the AlgorithmConfig object (so it cannot be changed anymore by anyone), and use it under the hood in all Algorithms.
The added backward-compat mechanism to look up old-style dict keys (str) from those AlgorithmConfig objects makes sure, this even works for custom algos that still think they are dealing with a config dict.

else:
config = default_config.update_from_dict(config)

if env is not None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From here on, everything config-related inside an algo is a AlgorithmConfig object.

@@ -427,19 +438,24 @@ def default_logger_creator(config):

@OverrideToImplementCustomLogic
@classmethod
def get_default_config(cls) -> AlgorithmConfigDict:
return AlgorithmConfig().to_dict()
def get_default_config(cls) -> Union[AlgorithmConfig, AlgorithmConfigDict]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can still be overridden in two ways: return dict OR return instantiated AlgorithmConfig object.

)

self.config["evaluation_config"] = eval_config
self.validate_config(self.config.evaluation_config)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do most of the config validation logic now inside the AlgorithmConfig class itself. So a lot of the validate_config code disappears. Eventually, we should get rid of validate_config entirely (should be handled by each config class directly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fantastic!

@@ -576,65 +592,12 @@ def setup(self, config: PartialAlgorithmConfigDict):

# Evaluation WorkerSet setup.
# User would like to setup a separate evaluation worker set.

# Update with evaluation settings:
user_eval_config = copy.deepcopy(self.config["evaluation_config"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Evaluation config is now an automatically generated (full) AlgorithmConfig inside the main AlgorithmConfig (property: self.evaluation_config). The self.evaluation_config property within the evaluation config is None. This helps us do all validation checking early on (before freezing) and simplifies handling of eval configs.

The typical eval-config override (via a dict) is still supported, but will eventually be replaced by objects (e.g. a to-be-designed AlgorithmConfigOverride class) as well.

"returns a subclass of DefaultCallbacks, got "
f"{config['callbacks']}!"
)
from ray.rllib.models.catalog import MODEL_DEFAULTS
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the code here was moved into AlgorithmConfig itself. We should eventually deprecate all validate_config methods.

@@ -152,13 +160,18 @@ def __init__(self, algo_class=None):
}

# `self.multi_agent()`
self.policies = {}
self._is_multi_agent = False
self.policies = {DEFAULT_POLICY_ID: PolicySpec()}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved more default settings here into ctor (where all defaults belong).

self.observation_fn = None
self.count_steps_by = "env_steps"
self._multi_agent_legacy_dict = {}
self._set_ma_legacy_dict()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shim for backward compatibility, iff users still access the "multiagent" key inside a AlgorithmConfig object and poke around in that dict (e.g. access config["multiagent"]["policies"]).

@@ -170,14 +183,15 @@ def __init__(self, algo_class=None):
self.output_config = {}
self.output_compress_columns = ["obs", "new_obs"]
self.output_max_file_size = 64 * 1024 * 1024
self.offline_sampling = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this here (only used by CRR and CQL, BUT required by RolloutWorker, which had to do a hasattr check :/ )

@@ -214,6 +228,9 @@ def __init__(self, algo_class=None):
self._disable_action_flattening = False
self._disable_execution_plan_api = True

# Has this config object been frozen (cannot alter its attributes anymore).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AlgorithmConfig objects can now be frozen (all Algorithms do this in their ctor now) to make sure no one can alter the config anymore once passed into an Algo's c'tor.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 requested a review from a team as a code owner October 25, 2022 16:19
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 merged commit 182744b into ray-project:master Oct 26, 2022
krfricke added a commit that referenced this pull request Oct 27, 2022
…utWorker, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (#29395)"

This reverts commit 182744b.
gjoliver pushed a commit that referenced this pull request Oct 27, 2022
…utWorker, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (#29395)" (#29742)

This reverts commit 182744b.
sven1977 added a commit that referenced this pull request Oct 27, 2022
…s, RolloutWorker, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (#29395)" (#29742)"

This reverts commit 12b579d.
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
…, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (ray-project#29395)

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
…utWorker, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (ray-project#29395)" (ray-project#29742)

This reverts commit 182744b.

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@sven1977 sven1977 deleted the algo_configs_next_steps_1 branch May 5, 2023 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants