Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Fix env_check for parametric actions (with action mask) #34790

Merged
merged 9 commits into from
Jun 20, 2023

Conversation

inpefess
Copy link
Contributor

@inpefess inpefess commented Apr 26, 2023

Why are these changes needed?

Ray RLlib has a great example of using a parametric actions environment, but now it works only with self._skip_env_checking = True. Gymnasium action spaces have a mask argument to their sample method. We apply this feature to fix the env_check behaviour in the parametric actions environments case.

Related issue number

Closes #23925

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Discussion

  1. I added a test to rllib/utils/tests/test_env_check.py because it seems to belong there, but seven tests from this module fail in the master branch
  2. The assumption that having a dictionary with the "action_mask" key is a standard parametric actions implementation seems too strong (one can use other names or add the mask to info, for example). If it's a standard or recommended way to do that in Ray RLlib, then one should also mention it in the documentation (now it's not mentioned at all, although the previous version of the documentation page was a bit more verbose). It seems to be a "standard" way, for example, in AlphaZero it's the same and in a couple of other examples.
  3. Apart from ParametricActionsCartPole using the action_mask key, there is a ParametricActionsCartPoleNoEmbeddings having a key with the same meaning but called valid_avail_actions_mask. I've changed it to action_mask

@inpefess
Copy link
Contributor Author

@avnishn Could you review this PR, please? Failing checks don't seem to be related to the changes proposed.

@avnishn avnishn self-requested a review May 25, 2023 22:13
@avnishn
Copy link
Member

avnishn commented May 25, 2023

The assumption that having a dictionary with the "action_mask" key is a standard parametric actions implementation seems too strong (one can use other names or add the mask to info, for example). If it's a standard or recommended way to do that in Ray RLlib, then one should also mention it in the documentation (now it's not mentioned at all, although the previous version of the documentation page was a bit more verbose). It seems to be a "standard" way, for example, in AlphaZero it's the same and in a couple of other examples.

I think that is very much the case. We have to think about if we want to make an opinionated design decision on how to support action masking in RLlib.

On an initial thought, I think that we'd rather not make an opinionated design decision here. I'll talk with @sven1977 who probably has more opinions on this and get back to you.

@@ -104,6 +106,10 @@ def test_step(self):
with pytest.raises(ValueError, match=error):
check_env(env)

def test_parametric_actions(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

@@ -212,6 +213,10 @@ def check_gym_environments(env: Union[gym.Env, "old_gym.Env"]) -> None:
space_type,
)
)
# sample a valid action in case of parametric actions
if isinstance(reset_obs, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really cool feature of gymnasium, actually :) which I didn't know about.

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome PR. Thanks for filing this @inpefess . I would like to suggest one enhancement to make the "action_mask" key not hard-coded.
Can we add an additional config setting in therllib/algorithms/algorithm_config.py::AlgorithmConfig::environment() method to be able to customize the actual value of this action mask key in the observation_space?

something along the lines of:

config.environment("my_env", env_config=..., action_mask_key="action_mask")

Set the default value of self.action_mask_key = "action_mask" in the AlgorithmConfig c'tor.

Then use that value instead of the hard-coded one in the pre-check.
You might have to change the signature of check_env to pass along the AlgorithmConfig (from within the RolloutWorker) so that it has access to that configuration.

@inpefess inpefess force-pushed the env-check-with-action-mask branch from efdb078 to 9147f8d Compare May 26, 2023 14:39
@@ -1377,6 +1376,9 @@ def environment(
(gym.wrappers.EnvCompatibility). If False, RLlib will produce a
descriptive error on which steps to perform to upgrade to gymnasium
(or to switch this flag to True).
action_mask_key: If observation is a dictionary, expect the value by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for adding this so quickly! I think it's ready to merge now. Just waiting for tests to finish ..

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now. Thanks again @inpefess !

Boris Shminke added 9 commits June 19, 2023 11:05
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
Signed-off-by: Boris Shminke <boris@shminke.ml>
@inpefess inpefess force-pushed the env-check-with-action-mask branch from 524549d to 5f753c4 Compare June 19, 2023 09:05
@sven1977 sven1977 merged commit 05eea38 into ray-project:master Jun 20, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
@afennelly-mitre
Copy link

@inpefess @sven1977 @avnishn I was browsing through this PR from earlier this year, and wanted to verify if my assumption is correct about the changes in this PR:

the env_check for parametric actions (with action mask) will only work if the underlying environment is a gym.Env, and will not work if the environment is say, a VectorEnv or MultiAgentEnv since the logic is only added to the check_gym_environments() method?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RLlib] check_gym_environments not working with parametric actions
4 participants