Fix ray_rllib solver after the upgrade to ray 2.7.0 #284
Conversation
b099f6f to 7df3abc
Thanks, please consider the few suggested minor changes and then LGTM.
def test_as_rllib_env_with_autocast_from_singleagent_to_multiagents():
    ENV_NAME = "CartPole-v1"

    domainupcasted = GymDomain(gym.make(ENV_NAME))
As in the previous PR, I suggest renaming it upcast_domain
Of course, I'll change it right away.
    assert RayRLlib.check_domain(domain)

    # solver factory
    config_factory = lambda: PPO.get_default_config().resources(
A config_factory seems overkill here (why not just config?)
Because it crashes later when reusing it with the loaded solver (line 89). When loading, we reinitialize the underlying algorithm, and it refuses to use the previous config, which has been "frozen" during training. So we need a fresh new config at that second step, which is the one testing save/load.
And to be sure of having the same config values as before, I used this factory (perhaps a bit overkill, but I avoid writing the same values twice when I can, since sooner or later it introduces bugs when they get edited in only one place...).
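For reference, here is a minimal sketch of the factory pattern being discussed, assuming ray[rllib] >= 2.7's PPO; the variable names are illustrative only:

```python
from ray.rllib.algorithms.ppo import PPO

# Zero-argument factory returning a fresh, unfrozen AlgorithmConfig each call.
# The config used for training gets frozen once the algorithm is built, so the
# solver re-created at load time needs a brand new config with the same values.
config_factory = lambda: PPO.get_default_config()

train_config = config_factory()   # config for the solver that trains and saves
reload_config = config_factory()  # fresh copy for the solver that loads back
```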
I added a comment in the code to explain it.
Oh OK, and then I suppose that the config being frozen is an issue (even if we don't modify it?)...
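For context, a minimal illustration of the freezing behavior under discussion, assuming ray >= 2.7's AlgorithmConfig API (the lr value is illustrative):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig()
config.freeze()  # RLlib freezes the config when the algorithm is built from it
# config.training(lr=1e-4)  # mutating a frozen config is rejected by RLlib
```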
This solver class wraps ray.rllib algorithms. Since we upgraded to gymnasium, the ray dependency also moved from 1.10 to >=2.7.0 and the rllib API changed a bit.

Main changes/fixes (some of them illustrated in the sketch after this list):
- ray.rllib.agents becomes ray.rllib.algorithms, Trainer -> Algorithm
- policy_mapping_fn: now maps agent_id + episode + worker -> policy_id
- algo.compute_action() -> compute_single_action()
- MultiAgentEnv now follows the gym 0.26/gymnasium convention => possibility to use the MultiAgentEnvCompatibility rllib wrapper. Consistently with the work done in skdecide.hub.domain.gym, we choose here to keep the old class AsRLlibMultiAgentEnv with the gym 0.21 API and wrap it. More precisely:
  - the old env is renamed AsLegacyRLlibMultiAgentEnv and derives from AsLegacyGymV21Env in order to share some methods and define some missing ones (such as render(), seed(), close())
  - the new AsRLlibMultiAgentEnv derives from MultiAgentEnvCompatibility, which itself derives from MultiAgentEnv
  - MultiAgentEnv needs a get_agent_ids() (or at least an _agent_ids attribute)
  - MultiAgentEnv needs to define observation_space and action_space attributes, preferably as gym.spaces.Dict in order to allow implicit implementation of the new methods action_space_sample() and observation_space_sample()
  - the done output of step() must be a dictionary with a key per agent plus an "__all__" key
- algo.load() takes directly the directory path (which is overwritten at each store)

Other fixes done here:
- __init__() of RayRLlib was using a mutable object as the default value for policy_config, which can lead to unforeseen bugs
- the main script is moved as a test into tests/, together with a test dedicated to AsRLlibMultiAgentEnv
- fix of the "__all__" value of the done dictionary, as MultiAgent domains are supposed to return a dictionary agent_id -> boolean instead of a single boolean
- update of examples/rllib_solver.py
- also test a single agent domain to check that the autocast works properly with rllib
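A minimal sketch of some of the API points above, assuming ray[rllib] >= 2.7 and gymnasium; the agent id "agent_0", the example spaces and the "policy_0" policy name are illustrative placeholders, not names from this PR:

```python
import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOConfig


# policy_mapping_fn now maps agent_id + episode + worker -> policy_id:
def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    return "policy_0"


# MultiAgentEnv spaces, preferably declared as gym.spaces.Dict keyed by agent id,
# so that observation_space_sample()/action_space_sample() come for free:
observation_space = gym.spaces.Dict(
    {"agent_0": gym.spaces.Box(low=-1.0, high=1.0, shape=(4,))}
)
action_space = gym.spaces.Dict({"agent_0": gym.spaces.Discrete(2)})

# The done output of step() is now a dict with one key per agent plus "__all__":
done = {"agent_0": False, "__all__": False}

# Trainer has become Algorithm, configured through AlgorithmConfig objects:
config = PPOConfig().multi_agent(
    policies={"policy_0"},
    policy_mapping_fn=policy_mapping_fn,
)
# Single actions are then computed with algo.compute_single_action(obs, ...)
# instead of the old algo.compute_action(obs).
```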
…class definition. We faced serialization issues with the ray.rllib solver and python 3.10. This seemed to be due to the NewType definitions inside the Domain class: NewType assumes that the resulting type belongs to the module and not to the class, which causes pickle issues. See for instance https://stackoverflow.com/a/4677063 for a similar issue.
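As a minimal illustration of the pattern behind this fix (class and type names here are hypothetical, not the actual scikit-decide definitions):

```python
from typing import NewType

# Before the fix (problematic): the NewType is created inside the class body,
# but NewType assumes the resulting type lives at module level, so pickle may
# fail to resolve it by name later (the issue seen with python 3.10).
class DomainBefore:
    T_observation = NewType("T_observation", object)

# After the fix: the NewType is defined at module level, where pickle can
# resolve it, and the class simply refers to it.
T_observation = NewType("T_observation", object)

class DomainAfter:
    pass  # annotations/methods refer to the module-level T_observation
```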
The tests are hanging forever on macOS 11 in GitHub Actions with the following message looping:

> (autoscaler +5h31m36s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.

We limit the CPU quota allowed per worker to avoid this.
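A hedged sketch of how such a per-worker CPU cap can be expressed with the ray >= 2.7 config API (the 0.5 value is illustrative, not necessarily the quota used in CI):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Cap the CPU resources requested by each rollout worker so that the small CI
# runner can schedule all actors instead of waiting forever on {'CPU': 1.0}.
config = PPOConfig().resources(num_cpus_per_worker=0.5)
```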