[RLlib] AlgorithmConfig: Next steps (volume 01) #29395
# Conflicts: # rllib/policy/policy.py
…_configs_next_steps_1
@@ -18,6 +18,7 @@
)

if TYPE_CHECKING:
    from ray.rllib.policy.policy import Policy
Annotation bug fix.
@@ -311,40 +308,50 @@ def from_state(state: Dict) -> "Algorithm":
@PublicAPI
def __init__(
    self,
    config: Optional[Union[PartialAlgorithmConfigDict, AlgorithmConfig]] = None,
    env: Optional[Union[str, EnvType]] = None,
    config: Union[AlgorithmConfig, PartialAlgorithmConfigDict],
Deprecated passing env into the Algorithm's constructor. It should no longer be used and now causes an error.
# TODO: In the future, only support AlgorithmConfig objects here.
if isinstance(config, AlgorithmConfig):
    config = config.to_dict()
if isinstance(config, dict):
Retro-support old-style python config dicts for a while.
We will translate these into AlgorithmConfig objects from here on, freeze the AlgorithmConfig object (so that no one can change it anymore), and use it under the hood in all Algorithms.
The added backward-compat mechanism for looking up old-style dict keys (str) on those AlgorithmConfig objects makes sure this even works for custom algos that still assume they are dealing with a config dict.
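The conversion and key-lookup shim described in this comment could look roughly like the sketch below. It is only an illustration: `SketchConfig`, `to_config`, and the `lr`/`gamma` settings are hypothetical names, and this is not RLlib's actual `AlgorithmConfig` code.

```python
from typing import Any, Dict, Union


class SketchConfig:
    """Hypothetical stand-in for a config object (not RLlib's AlgorithmConfig)."""

    def __init__(self) -> None:
        # Defaults live in the constructor; names and values are made up.
        self.lr = 0.001
        self.gamma = 0.99

    def update_from_dict(self, overrides: Dict[str, Any]) -> "SketchConfig":
        # Copy old-style dict entries onto attributes, rejecting unknown keys.
        for key, value in overrides.items():
            if key not in vars(self):
                raise KeyError(f"Unknown config key: {key!r}")
            setattr(self, key, value)
        return self

    def __getitem__(self, key: str) -> Any:
        # Backward-compat shim: lets legacy code keep writing config["lr"].
        return vars(self)[key]


def to_config(config: Union[SketchConfig, Dict[str, Any]]) -> SketchConfig:
    # Accept either style and always hand back a config object.
    if isinstance(config, dict):
        return SketchConfig().update_from_dict(config)
    return config


cfg = to_config({"lr": 0.01})
legacy_lr = cfg["lr"]  # dict-style lookup on the object
```

With a shim like this, code written against the dict API keeps working while everything downstream holds a single object type.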
else:
    config = default_config.update_from_dict(config)

if env is not None:
From here on, everything config-related inside an algo is an AlgorithmConfig object.
@@ -427,19 +438,24 @@ def default_logger_creator(config):

@OverrideToImplementCustomLogic
@classmethod
def get_default_config(cls) -> AlgorithmConfigDict:
    return AlgorithmConfig().to_dict()
def get_default_config(cls) -> Union[AlgorithmConfig, AlgorithmConfigDict]:
This method can still be overridden in two ways: return dict OR return instantiated AlgorithmConfig object.
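A caller that has to cope with both return styles might normalize them as sketched here. All class and function names (`SketchConfig`, `OldStyleAlgo`, `NewStyleAlgo`, `resolve_default_config`) are hypothetical and only illustrate the idea; the real handling inside `Algorithm` may differ.

```python
from typing import Any, Dict, Union


class SketchConfig:
    """Hypothetical config object holding its settings in a plain dict."""

    def __init__(self, **settings: Any) -> None:
        self.settings: Dict[str, Any] = {"lr": 0.001, **settings}


class OldStyleAlgo:
    @classmethod
    def get_default_config(cls) -> Dict[str, Any]:
        # Old style: return a python dict.
        return {"lr": 0.05}


class NewStyleAlgo:
    @classmethod
    def get_default_config(cls) -> SketchConfig:
        # New style: return an instantiated config object.
        return SketchConfig(lr=0.01)


def resolve_default_config(algo_cls: type) -> SketchConfig:
    # Normalize both override styles into a config object.
    default: Union[SketchConfig, Dict[str, Any]] = algo_cls.get_default_config()
    if isinstance(default, dict):
        return SketchConfig(**default)
    return default


old_default = resolve_default_config(OldStyleAlgo)
new_default = resolve_default_config(NewStyleAlgo)
```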
rllib/algorithms/algorithm.py
Outdated
)

self.config["evaluation_config"] = eval_config
self.validate_config(self.config.evaluation_config)
We do most of the config validation logic now inside the AlgorithmConfig class itself. So a lot of the validate_config code disappears. Eventually, we should get rid of validate_config entirely (should be handled by each config class directly).
this is fantastic!
@@ -576,65 +592,12 @@ def setup(self, config: PartialAlgorithmConfigDict):

# Evaluation WorkerSet setup.
# User would like to setup a separate evaluation worker set.

# Update with evaluation settings:
user_eval_config = copy.deepcopy(self.config["evaluation_config"])
Evaluation config is now an automatically generated (full) AlgorithmConfig inside the main AlgorithmConfig (property: self.evaluation_config). The self.evaluation_config property within the evaluation config is None. This helps us do all validation checking early on (before freezing) and simplifies handling of eval configs.
The typical eval-config override (via a dict) is still supported, but will eventually be replaced by objects (e.g. a to-be-designed AlgorithmConfigOverride class) as well.
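One way to picture the derived evaluation config is the sketch below: a full copy of the main config with the dict overrides applied and its own nested evaluation config set to None. `SketchConfig`, `evaluation_overrides`, and `build_evaluation_config` are hypothetical names chosen for illustration, not the attributes this PR introduces.

```python
import copy
from typing import Any, Dict, Optional


class SketchConfig:
    """Hypothetical config object (not RLlib's AlgorithmConfig)."""

    def __init__(self) -> None:
        self.explore = True
        self.num_workers = 2
        # Old-style override dict supplied by the user (assumed name).
        self.evaluation_overrides: Dict[str, Any] = {}
        # Full, derived evaluation config; None until built.
        self.evaluation_config: Optional["SketchConfig"] = None

    def build_evaluation_config(self) -> "SketchConfig":
        # Start from a full copy of the main config ...
        eval_cfg = copy.deepcopy(self)
        # ... make sure the copy carries no nested evaluation config ...
        eval_cfg.evaluation_config = None
        eval_cfg.evaluation_overrides = {}
        # ... then apply the user's overrides on top.
        for key, value in self.evaluation_overrides.items():
            setattr(eval_cfg, key, value)
        self.evaluation_config = eval_cfg
        return eval_cfg


main_cfg = SketchConfig()
main_cfg.evaluation_overrides = {"explore": False}
eval_cfg = main_cfg.build_evaluation_config()
```

Because the evaluation config is a complete object of the same type, the same validation code can run on it before anything is frozen.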
"returns a subclass of DefaultCallbacks, got " | ||
f"{config['callbacks']}!" | ||
) | ||
from ray.rllib.models.catalog import MODEL_DEFAULTS |
A lot of the code here was moved into AlgorithmConfig itself. We should eventually deprecate all validate_config methods.
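As a rough picture of what validation inside the config class can look like, consider this sketch. The class name, the `validate` method, and the two checks are assumptions made for illustration; they are not the checks this PR actually moves.

```python
class SketchConfig:
    """Hypothetical config object that validates itself."""

    def __init__(self) -> None:
        self.train_batch_size = 4000
        self.count_steps_by = "env_steps"

    def validate(self) -> None:
        # Each check lives next to the setting it guards, so the
        # algorithm class no longer needs its own validation method.
        if self.train_batch_size <= 0:
            raise ValueError("train_batch_size must be positive")
        if self.count_steps_by not in ("env_steps", "agent_steps"):
            raise ValueError(
                f"Unsupported count_steps_by: {self.count_steps_by!r}"
            )


good = SketchConfig()
good.validate()  # passes silently

bad = SketchConfig()
bad.train_batch_size = 0
try:
    bad.validate()
    validation_error = None
except ValueError as err:
    validation_error = str(err)
```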
@@ -152,13 +160,18 @@ def __init__(self, algo_class=None):
}

# `self.multi_agent()`
self.policies = {}
self._is_multi_agent = False
self.policies = {DEFAULT_POLICY_ID: PolicySpec()}
Moved more default settings here into ctor (where all defaults belong).
rllib/algorithms/algorithm_config.py
Outdated
self.observation_fn = None
self.count_steps_by = "env_steps"
self._multi_agent_legacy_dict = {}
self._set_ma_legacy_dict()
Shim for backward compatibility, in case users still access the "multiagent" key inside an AlgorithmConfig object and poke around in that dict (e.g. access config["multiagent"]["policies"]).
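A simplified picture of such a shim follows. Unlike the diff above, which stores the legacy dict on the object, this sketch rebuilds it on every lookup; `SketchConfig` and `_LEGACY_MA_KEYS` are hypothetical, and the set of mirrored keys is an assumption.

```python
from typing import Any, Dict


class SketchConfig:
    """Hypothetical config object with a legacy "multiagent" view."""

    # Settings that used to live under config["multiagent"] (assumed set).
    _LEGACY_MA_KEYS = ("policies", "policy_mapping_fn", "count_steps_by")

    def __init__(self) -> None:
        self.policies: Dict[str, Any] = {"default_policy": None}
        self.policy_mapping_fn = None
        self.count_steps_by = "env_steps"

    def __getitem__(self, key: str) -> Any:
        if key == "multiagent":
            # Build the old nested dict from the flat attributes.
            return {k: getattr(self, k) for k in self._LEGACY_MA_KEYS}
        return getattr(self, key)


cfg = SketchConfig()
legacy_policies = cfg["multiagent"]["policies"]
```

Note that in this simplified version, assigning a new key into the returned dict does not write back to the config object; only reads are covered.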
@@ -170,14 +183,15 @@ def __init__(self, algo_class=None):
self.output_config = {}
self.output_compress_columns = ["obs", "new_obs"]
self.output_max_file_size = 64 * 1024 * 1024
self.offline_sampling = False
Moved this here (only used by CRR and CQL, BUT required by RolloutWorker, which had to do a hasattr check :/ )
@@ -214,6 +228,9 @@ def __init__(self, algo_class=None):
self._disable_action_flattening = False
self._disable_execution_plan_api = True

# Has this config object been frozen (cannot alter its attributes anymore).
AlgorithmConfig objects can now be frozen (all Algorithms do this in their ctor now) to make sure no one can alter the config anymore once passed into an Algo's c'tor.
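Freezing can be pictured with a minimal sketch like the one below, which blocks attribute assignment once a flag is set. The names `SketchConfig`, `freeze`, and `_is_frozen` are illustrative assumptions rather than the PR's actual implementation.

```python
from typing import Any


class SketchConfig:
    """Hypothetical config object that can be frozen."""

    def __init__(self) -> None:
        # Bypass our own __setattr__ while the flag does not exist yet.
        object.__setattr__(self, "_is_frozen", False)
        self.lr = 0.001

    def freeze(self) -> None:
        object.__setattr__(self, "_is_frozen", True)

    def __setattr__(self, name: str, value: Any) -> None:
        if self._is_frozen:
            raise AttributeError(
                f"Cannot set {name!r}: this config object is frozen"
            )
        super().__setattr__(name, value)


cfg = SketchConfig()
cfg.lr = 0.01  # allowed before freezing
cfg.freeze()
try:
    cfg.lr = 0.1  # rejected after freezing
    frozen_error = None
except AttributeError as err:
    frozen_error = str(err)
```

This is a shallow freeze: a mutable value stored on the object (a dict, for example) could still be changed in place.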
…_configs_next_steps_1
…, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (ray-project#29395) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
…utWorker, PolicyMap, WorkerSet use AlgorithmConfig objects under the hood. (ray-project#29395)" (ray-project#29742) This reverts commit 182744b. Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
This PR takes the next step in our journey to fully move off of python config dicts and utilize AlgorithmConfig (and its subclasses) in all of RLlib under the hood.
In particular, this PR:
- Algorithms, RolloutWorker, Sampler, WorkerSet, and PolicyMap utilize AlgorithmConfig under the hood. In case an old-style python dict is passed into any of these constructors, an automatic conversion (to AlgorithmConfig) is applied. In case a user (or RLlib) still treats configs as dicts, AlgorithmConfig can handle getattr, setattr, etc. gracefully. These shim-helper-methods will be removed in the future.
- Most of the config validation logic now lives in the AlgorithmConfig class itself. So a lot of the validate_config code disappears. Eventually, we should get rid of validate_config entirely (should be handled by each config class directly).
** Dramatically reduced complexity of its ctor signature (due to the fact that most args should have already been defined in the config anyways).
** Now uses AlgorithmConfig under the hood (passing in a dict is still supported, though).
** Moved some of the "conversion" logic that it used to perform on the config into AlgorithmConfig itself (e.g. the construction and unification of the policies dict in a multi-agent setup).