[RLlib] Fix env_check for parametric actions (with action mask) #34790

inpefess · 2023-04-26T16:09:41Z

Why are these changes needed?

Ray RLlib has a great example of using a parametric actions environment, but it currently works only with self._skip_env_checking = True. Gymnasium action spaces accept a mask argument in their sample method. This PR uses that feature to fix the env_check behaviour for parametric actions environments.
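As a rough illustration of what masked sampling means, the pure-Python sketch below mimics sampling from a discrete action space while honouring a 0/1 mask. The name `sample_masked` is hypothetical and is not part of Gymnasium or RLlib. With Gymnasium installed, the corresponding call would be `space.sample(mask=mask)` on a `Discrete` space, with the mask given as an `np.int8` array. Raising an error when no action is allowed is a choice made only for this sketch.

```python
import random
from typing import Optional, Sequence


def sample_masked(
    n: int,
    mask: Optional[Sequence[int]] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample an action index in [0, n), restricted to entries where mask == 1.

    Hypothetical helper for illustration only (not a Gymnasium or RLlib API).
    """
    rng = rng or random.Random()
    if mask is None:
        # No mask: every action is allowed.
        return rng.randrange(n)
    if len(mask) != n:
        raise ValueError(f"mask has length {len(mask)}, expected {n}")
    valid = [i for i, allowed in enumerate(mask) if allowed == 1]
    if not valid:
        raise ValueError("mask allows no action")
    return rng.choice(valid)


# Only actions 1 and 3 are valid, so an unmasked sample could fail an env check
# while a masked sample never does.
mask = [0, 1, 0, 1]
samples = {sample_masked(4, mask, random.Random(seed)) for seed in range(20)}
```

Every value collected in `samples` is a valid action, which is the property the environment pre-check needs in order to step a parametric actions environment safely.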

Related issue number

Closes #23925

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Discussion

- I added a test to rllib/utils/tests/test_env_check.py because it seems to belong there, but seven tests from this module fail in the master branch.
- The assumption that having a dictionary with the "action_mask" key is a standard parametric actions implementation seems too strong (one can use other names or add the mask to info, for example). If it's a standard or recommended way to do that in Ray RLlib, then one should also mention it in the documentation (now it's not mentioned at all, although the previous version of the documentation page was a bit more verbose). It seems to be a "standard" way, for example, in AlphaZero it's the same and in a couple of other examples.
- Apart from ParametricActionsCartPole using the action_mask key, there is a ParametricActionsCartPoleNoEmbeddings with a key of the same meaning but called valid_avail_actions_mask. I've changed it to action_mask.

inpefess · 2023-05-21T13:32:52Z

@avnishn Could you review this PR, please? Failing checks don't seem to be related to the changes proposed.

avnishn · 2023-05-25T22:17:39Z

The assumption that having a dictionary with the "action_mask" key is a standard parametric actions implementation seems too strong (one can use other names or add the mask to info, for example). If it's a standard or recommended way to do that in Ray RLlib, then one should also mention it in the documentation (now it's not mentioned at all, although the previous version of the documentation page was a bit more verbose). It seems to be a "standard" way, for example, in AlphaZero it's the same and in a couple of other examples.

I think that is very much the case. We have to think about whether we want to make an opinionated design decision on how to support action masking in RLlib.

On first thought, I think that we'd rather not make an opinionated design decision here. I'll talk with @sven1977, who probably has more opinions on this, and get back to you.

sven1977 · 2023-05-26T07:26:28Z

rllib/utils/tests/test_check_env.py

@@ -104,6 +106,10 @@ def test_step(self):
        with pytest.raises(ValueError, match=error):
            check_env(env)

+    def test_parametric_actions(self):


sven1977 · 2023-05-26T07:26:54Z

rllib/utils/pre_checks/env.py

@@ -212,6 +213,10 @@ def check_gym_environments(env: Union[gym.Env, "old_gym.Env"]) -> None:
                    space_type,
                )
            )
+    # sample a valid action in case of parametric actions
+    if isinstance(reset_obs, dict):


This is a really cool feature of gymnasium, actually :) which I didn't know about.

sven1977

Awesome PR. Thanks for filing this, @inpefess. I would like to suggest one enhancement to make the "action_mask" key not hard-coded.
Can we add an additional config setting in the rllib/algorithms/algorithm_config.py::AlgorithmConfig::environment() method to be able to customize the actual value of this action mask key in the observation_space?

something along the lines of:

config.environment("my_env", env_config=..., action_mask_key="action_mask")

Set the default value of self.action_mask_key = "action_mask" in the AlgorithmConfig c'tor.

Then use that value instead of the hard-coded one in the pre-check.
You might have to change the signature of check_env to pass along the AlgorithmConfig (from within the RolloutWorker) so that it has access to that configuration.
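The suggestion above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the code merged in this PR: `pick_check_action` and `DEFAULT_ACTION_MASK_KEY` are hypothetical names, and the real pre-check operates on Gymnasium spaces and an AlgorithmConfig rather than plain lists and keyword arguments. It only shows the idea of looking up the mask under a configurable key instead of a hard-coded one.

```python
import random
from typing import Any, Optional

# Assumed default, mirroring the proposed `action_mask_key="action_mask"`.
DEFAULT_ACTION_MASK_KEY = "action_mask"


def pick_check_action(
    reset_obs: Any,
    n_actions: int,
    action_mask_key: str = DEFAULT_ACTION_MASK_KEY,
    rng: Optional[random.Random] = None,
) -> int:
    """Choose an action for the env pre-check, honouring a mask if one is present.

    Hypothetical helper for illustration only (not an RLlib API).
    """
    rng = rng or random.Random()
    mask = None
    # Only dict observations can carry a mask under the configured key.
    if isinstance(reset_obs, dict):
        mask = reset_obs.get(action_mask_key)
    if mask is None:
        # No mask found: fall back to sampling any action.
        return rng.randrange(n_actions)
    valid = [i for i, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError(f"'{action_mask_key}' allows no action")
    return rng.choice(valid)


# An environment that stores its mask under a non-default key.
obs = {"valid_avail_actions_mask": [1, 0, 0], "cart": [0.0, 0.0, 0.0, 0.0]}
action = pick_check_action(obs, 3, action_mask_key="valid_avail_actions_mask")
```

With the key passed explicitly, an environment that names its mask `valid_avail_actions_mask` is handled the same way as one that uses `action_mask`.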

sven1977 · 2023-05-26T15:09:04Z

rllib/algorithms/algorithm_config.py

@@ -1377,6 +1376,9 @@ def environment(
                (gym.wrappers.EnvCompatibility). If False, RLlib will produce a
                descriptive error on which steps to perform to upgrade to gymnasium
                (or to switch this flag to True).
+             action_mask_key: If observation is a dictionary, expect the value by


Awesome, thanks for adding this so quickly! I think it's ready to merge now. Just waiting for tests to finish ..

sven1977

Looks great now. Thanks again @inpefess !

afennelly-mitre · 2023-11-29T17:38:02Z

@inpefess @sven1977 @avnishn I was browsing through this PR from earlier this year, and wanted to verify if my assumption is correct about the changes in this PR:

The env_check for parametric actions (with action mask) will only work if the underlying environment is a gym.Env, and will not work if the environment is, say, a VectorEnv or MultiAgentEnv, since the logic is only added to the check_gym_environments() method?

inpefess requested review from sven1977, gjoliver, avnishn, ArturNiederfahrenhorst, smorad, maxpumperla, kouroshHakha and krfricke as code owners April 26, 2023 16:09

inpefess force-pushed the env-check-with-action-mask branch 3 times, most recently from 703bd8c to 0d0c0f9 Compare April 30, 2023 15:26

avnishn approved these changes May 25, 2023

avnishn self-requested a review May 25, 2023 22:13

sven1977 reviewed May 26, 2023

inpefess force-pushed the env-check-with-action-mask branch from efdb078 to 9147f8d Compare May 26, 2023 14:39

sven1977 reviewed May 26, 2023

sven1977 approved these changes May 26, 2023

Boris Shminke added 9 commits June 19, 2023 11:05

action mask must be int8

d309c75

Signed-off-by: Boris Shminke <boris@shminke.ml>

sample using action mask if present

bad4299

Signed-off-by: Boris Shminke <boris@shminke.ml>

test with parametic actions

70c6137

Signed-off-by: Boris Shminke <boris@shminke.ml>

don't disable env_check

148d9f2

Signed-off-by: Boris Shminke <boris@shminke.ml>

action mask must be int8

3726d2d

Signed-off-by: Boris Shminke <boris@shminke.ml>

don't disable env_check

0c30468

Signed-off-by: Boris Shminke <boris@shminke.ml>

add action_mask_key to AlgorithmConfig

f6d7672

Signed-off-by: Boris Shminke <boris@shminke.ml>

add AlgorithmConfig as an optional argument to check_env

dc08e19

Signed-off-by: Boris Shminke <boris@shminke.ml>

action_mask_key can be valid_avail_actions_mask or anything

5f753c4

Signed-off-by: Boris Shminke <boris@shminke.ml>

inpefess force-pushed the env-check-with-action-mask branch from 524549d to 5f753c4 Compare June 19, 2023 09:05

sven1977 merged commit 05eea38 into ray-project:master Jun 20, 2023

akshay-anyscale mentioned this pull request Jul 21, 2023

Add service deployment instructions to stable diffusion template #37645

Closed

8 tasks

arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023

[RLlib] Fix env_check for parametric actions (with action mask). (r…

bcccdd1

…ay-project#34790) Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
