-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Fix env_check for parametric actions (with action mask) #34790
[RLlib] Fix env_check for parametric actions (with action mask) #34790
Conversation
703bd8c
to
0d0c0f9
Compare
@avnishn Could you review this PR, please? Failing checks don't seem to be related to the changes proposed. |
I think that is very much the case. We have to think about if we want to make an opinionated design decision on how to support action masking in RLlib. On an initial thought, I think that we'd rather not make an opinionated design decision here. I'll talk with @sven1977 who probably has more opinions on this and get back to you. |
@@ -104,6 +106,10 @@ def test_step(self): | |||
with pytest.raises(ValueError, match=error): | |||
check_env(env) | |||
|
|||
def test_parametric_actions(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
@@ -212,6 +213,10 @@ def check_gym_environments(env: Union[gym.Env, "old_gym.Env"]) -> None: | |||
space_type, | |||
) | |||
) | |||
# sample a valid action in case of parametric actions | |||
if isinstance(reset_obs, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really cool feature of gymnasium, actually :) which I didn't know about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome PR. Thanks for filing this @inpefess . I would like to suggest one enhancement to make the "action_mask" key not hard-coded.
Can we add an additional config setting in therllib/algorithms/algorithm_config.py::AlgorithmConfig::environment()
method to be able to customize the actual value of this action mask key in the observation_space?
something along the lines of:
config.environment("my_env", env_config=..., action_mask_key="action_mask")
Set the default value of self.action_mask_key = "action_mask"
in the AlgorithmConfig c'tor.
Then use that value instead of the hard-coded one in the pre-check.
You might have to change the signature of check_env
to pass along the AlgorithmConfig (from within the RolloutWorker) so that it has access to that configuration.
efdb078
to
9147f8d
Compare
@@ -1377,6 +1376,9 @@ def environment( | |||
(gym.wrappers.EnvCompatibility). If False, RLlib will produce a | |||
descriptive error on which steps to perform to upgrade to gymnasium | |||
(or to switch this flag to True). | |||
action_mask_key: If observation is a dictionary, expect the value by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks for adding this so quickly! I think it's ready to merge now. Just waiting for tests to finish ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great now. Thanks again @inpefess !
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
524549d
to
5f753c4
Compare
…ay-project#34790) Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
@inpefess @sven1977 @avnishn I was browsing through this PR from earlier this year, and wanted to verify if my assumption is correct about the changes in this PR: the env_check for parametric actions (with action mask) will only work if the underlying environment is a |
Why are these changes needed?
Ray RLlib has a great example of using a parametric actions environment, but now it works only with
self._skip_env_checking = True
. Gymnasium action spaces have amask
argument to theirsample
method. We apply this feature to fix theenv_check
behaviour in the parametric actions environments case.Related issue number
Closes #23925
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.Discussion
rllib/utils/tests/test_env_check.py
because it seems to belong there, but seven tests from this module fail in themaster
branchinfo
, for example). If it's a standard or recommended way to do that in Ray RLlib, then one should also mention it in the documentation (now it's not mentioned at all, although the previous version of the documentation page was a bit more verbose). It seems to be a "standard" way, for example, in AlphaZero it's the same and in a couple of other examples.ParametricActionsCartPole
using theaction_mask
key, there is aParametricActionsCartPoleNoEmbeddings
having a key with the same meaning but calledvalid_avail_actions_mask
. I've changed it toaction_mask