Fix ray_rllib solver after the upgrade to ray 2.7.0 #284

Merged
merged 3 commits on Nov 30, 2023

Conversation

@nhuet (Contributor) commented Nov 24, 2023

The RayRLlib solver class wraps ray.rllib algorithms.
Since we upgraded to gymnasium, the ray dependency was also upgraded from
1.10 to >=2.7.0, and the rllib API changed a bit.

Main changes/fixes:

  • ray.rllib.agents becomes ray.rllib.algorithms, Trainer -> Algorithm
  • policy_mapping_fn now maps agent_id + episode + worker -> policy_id
    (illustrated, together with the other API changes, in the sketch after this list)
  • algo.compute_action -> compute_single_action()
  • MultiAgentEnv now follows the gym 0.26/gymnasium convention => possibility
    to use the rllib MultiAgentEnvCompatibility wrapper. Consistently with the
    work done in skdecide.hub.domain.gym, we choose here to keep the old class
    AsRLlibMultiAgentEnv with the gym 0.21 API and wrap it.
    More precisely:
    • the old env is renamed AsLegacyRLlibMultiAgentEnv and derives from AsLegacyGymV21Env
      in order to share some methods and define some missing ones (such as render(), seed(), close())
    • the new AsRLlibMultiAgentEnv derives from MultiAgentEnvCompatibility which itself
      derives from MultiAgentEnv
  • MultiAgentEnv needs a get_agent_ids() method (or at least an _agent_ids
    attribute)
  • MultiAgentEnv needs to define observation_space and action_space
    attributes, preferably as gym.spaces.Dict, so that the new methods
    action_space_sample() and observation_space_sample() are implemented implicitly
  • the done output of step() must be a dictionary with a key per agent plus a
    "__all__" key
  • algo.load() now takes the directory path directly (which is overwritten at
    each store)
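
A minimal sketch of the new API surface listed above, assuming illustrative agent ids, spaces and policy names (this is not the PR's actual code):

import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class SketchMultiAgentEnv(MultiAgentEnv):
    def __init__(self):
        super().__init__()
        # get_agent_ids() falls back to this attribute if not overridden.
        self._agent_ids = {"agent_0", "agent_1"}
        # Declaring the spaces as gym.spaces.Dict lets rllib provide
        # action_space_sample() / observation_space_sample() implicitly.
        self.observation_space = gym.spaces.Dict(
            {aid: gym.spaces.Box(-1.0, 1.0, shape=(4,)) for aid in self._agent_ids}
        )
        self.action_space = gym.spaces.Dict(
            {aid: gym.spaces.Discrete(2) for aid in self._agent_ids}
        )

    def reset(self, *, seed=None, options=None):
        obs = {aid: self.observation_space[aid].sample() for aid in self._agent_ids}
        return obs, {}

    def step(self, action_dict):
        obs = {aid: self.observation_space[aid].sample() for aid in action_dict}
        rewards = {aid: 0.0 for aid in action_dict}
        # The done output is now a dictionary with one key per agent
        # plus the special "__all__" key.
        dones = {aid: False for aid in action_dict}
        dones["__all__"] = False
        truncateds = {aid: False for aid in action_dict}
        truncateds["__all__"] = False
        return obs, rewards, dones, truncateds, {}


# New policy_mapping_fn signature: agent_id + episode + worker -> policy_id.
def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    return "policy_0"


# compute_action() is replaced by compute_single_action(), e.g.:
#   action = algo.compute_single_action(obs, policy_id="policy_0")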

Other fixes done here:

  • __init__() of RayRLlib was using a mutable object as the default value for
    policy_config, which can lead to unforeseen bugs (see the sketch after this list)
  • the main script has been moved as a test into tests/, together with a test
    dedicated to AsRLlibMultiAgentEnv
  • fixes the "__all__" value of the done dictionary, as MultiAgent domains are
    supposed to return a dictionary agent_id -> boolean instead of a
    single boolean
  • updates examples/rllib_solver.py
  • also tests a single-agent domain to check that the autocast works properly
    with rllib
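
A minimal sketch of the mutable-default fix, assuming a simplified constructor (the real RayRLlib __init__ takes more parameters; only policy_config is shown):

class RayRLlib:
    def __init__(self, policy_config=None):
        # A None default avoids sharing one mutable dict across all instances,
        # which is what happens with a literal {} default value.
        self.policy_config = {} if policy_config is None else policy_config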

@nhuet nhuet force-pushed the rllib branch 5 times, most recently from b099f6f to 7df3abc on November 30, 2023 11:52
@nhuet nhuet marked this pull request as ready for review November 30, 2023 11:52
@neo-alex (Collaborator) left a comment:

Thanks, please consider the few suggested minor changes and then LGTM.

def test_as_rllib_env_with_autocast_from_singleagent_to_multiagents():
    ENV_NAME = "CartPole-v1"

    domainupcasted = GymDomain(gym.make(ENV_NAME))
@neo-alex (Collaborator):
As in the previous PR, I suggest renaming it upcast_domain

@nhuet (Contributor, Author):

Of course, I'll change it right away.

    assert RayRLlib.check_domain(domain)

    # solver factory
    config_factory = lambda: PPO.get_default_config().resources(
@neo-alex (Collaborator):
A config_factory seems overkill here (why not just config?)

@nhuet (Contributor, Author) commented Nov 30, 2023:

Because it crashes later when reusing it with the loaded solver (line 89). When loading, we reinitialize the underlying algorithm, and it does not want to reuse the previous config, which was "frozen" during training. So we need a fresh new config at that second step, which tests save/load.
And to be sure to have the same config values as before, I used this factory (perhaps a bit overkill, but I avoid writing the same values twice when I can, since sooner or later that introduces bugs when they get edited in only one place...).
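
A sketch of the pattern under discussion; the .resources(...) arguments from the test are omitted and the variable names are illustrative:

from ray.rllib.algorithms.ppo import PPO

config_factory = lambda: PPO.get_default_config()

training_config = config_factory()  # passed to the solver for training; rllib freezes it
reload_config = config_factory()    # fresh, unfrozen config for the solver re-created at load time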

@nhuet (Contributor, Author):
I added a comment in the code to explain it.

@neo-alex (Collaborator):
Oh OK, and then I suppose that the config being frozen is an issue (even if we don't modify it?)...

…class definition

We faced serialization issues with the ray.rllib solver and python 3.10.
This seemed to be due to the NewType definition inside the Domain class.
Indeed, the NewType assumes that the resulting type belongs to the
module and not to the class, which causes pickle issues.

See for instance https://stackoverflow.com/a/4677063 for a similar issue.
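
A minimal sketch of the kind of failure described, assuming illustrative names (Domain and AgentId here are not skdecide's actual definitions):

import pickle
from typing import NewType


class Domain:
    # NewType records only the plain name, so pickle looks for "AgentId"
    # at module level and cannot find it there.
    AgentId = NewType("AgentId", str)


try:
    pickle.dumps(Domain.AgentId)
except Exception as exc:
    print("pickling failed:", exc)

# The fix described above: define the NewType at module level instead.
AgentId = NewType("AgentId", str)
pickle.dumps(AgentId)  # works on python >= 3.10
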
The tests hang forever on macOS 11 in GitHub actions with the
following message looping:

> (autoscaler +5h31m36s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.

We limit the CPU quota allowed per worker to avoid this.
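
For illustration, one way to cap the CPU share per rollout worker with the ray 2.7 AlgorithmConfig API; the exact values used in the PR are not shown here:

from ray.rllib.algorithms.ppo import PPO

config = (
    PPO.get_default_config()
    .rollouts(num_rollout_workers=1)
    .resources(num_cpus_per_worker=0.5)
)
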
@neo-alex neo-alex merged commit fbf01e6 into airbus:master Nov 30, 2023
@nhuet nhuet deleted the rllib branch December 12, 2023 08:33