[RLlib] Add PPO multi-agent StatelessCartPole learning tests. #47196
Conversation
…appo_multi_agent_cartpole_tests
LGTM. Would be nice to have some comments and a refactoring into helper functions.
rllib/core/learner/learner_group.py (outdated)
    num_iters,
):
    # Count total number of timesteps per module ID.
    if isinstance(episodes[0], MultiAgentEpisode):
Why isn't it possible to use a generator/iterator that can run empty, so that the learner simply returns once it does?
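Picking up the helper-function suggestion above: a minimal sketch of what such a per-module timestep count could look like as a standalone helper. Everything here (the function name, the `agent_episodes` attribute access, the optional mapping callable) is illustrative only, not the PR's actual code.

```python
from collections import defaultdict


def count_timesteps_per_module(episodes, agent_to_module_mapping=None):
    """Counts the total number of env timesteps per module ID.

    Works for both single-agent and multi-agent episodes (duck-typed via the
    presence of an `agent_episodes` dict on multi-agent episodes).
    """
    counts = defaultdict(int)
    for episode in episodes:
        if hasattr(episode, "agent_episodes"):
            # Multi-agent: one sub-episode per agent; map each agent to its module.
            for agent_id, agent_eps in episode.agent_episodes.items():
                module_id = (
                    agent_to_module_mapping(agent_id)
                    if agent_to_module_mapping
                    else agent_id
                )
                counts[module_id] += len(agent_eps)
        else:
            # Single-agent: all timesteps belong to RLlib's default module ID.
            counts["default_policy"] += len(episode)
    return dict(counts)
```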
rllib/env/single_agent_episode.py (outdated)
    ),
    lookback=self.observations.lookback,
    space=self.observation_space,
_lb = (
Could we leave a comment explaining what we are calculating here and when this case can occur, please?
rllib/env/single_agent_episode.py (outdated)
    space=self.action_space,
)

_lb = (
Also, could we refactor this into a helper function?
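As a sketch of such a helper (purely illustrative; the name, signature, and exact semantics are assumptions, not the PR's actual `_lb` computation): when slicing an episode, the sliced buffer's lookback can only be as long as the data that actually precedes the slice.

```python
def _compute_sliced_lookback(requested_lookback: int, slice_start: int) -> int:
    """Clips the requested lookback length for a sliced buffer.

    A slice starting at index `slice_start` has only `slice_start` items in
    front of it, so the new lookback can never exceed that.
    """
    return min(requested_lookback, max(slice_start, 0))


# Slicing at index 2 with a lookback of 5 can only look back 2 steps;
# slicing at index 10 keeps the full lookback of 5.
assert _compute_sliced_lookback(5, 2) == 2
assert _compute_sliced_lookback(5, 10) == 5
```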
)
.environment("multi_stateless_cart")
.env_runners(
    env_to_module_connector=lambda env: MeanStdFilter(multi_agent=True),
I remember we still have this open question of whether we also need to add this connector to the Learner or not. I think we do not: MeanStdFilter rewrites the observations, and the Learner then receives these rewritten observations, correct?
Correct. This connector (and most other env-to-module connectors) writes directly back into the episode, making the change to the observations permanent. Hence there is no need to also add it to the Learner pipeline, since that pipeline then operates on the already-changed episodes/observations.
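For context, a minimal sketch of how the filter is wired up on the EnvRunner side only, based on the diff above (the exact import path of `MeanStdFilter` may differ between RLlib versions):

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module.mean_std_filter import MeanStdFilter

config = (
    PPOConfig()
    .environment("multi_stateless_cart")
    .env_runners(
        # The filter normalizes observations and writes them back into the
        # episodes on the EnvRunner side, so the Learner pipeline already
        # receives the normalized data and needs no second copy of the filter.
        env_to_module_connector=lambda env: MeanStdFilter(multi_agent=True),
    )
)
```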
…ppo_multi_agent_stateless_cartpole

Signed-off-by: sven1977 <svenmika1977@gmail.com>

# Conflicts:
#   rllib/BUILD
#   rllib/core/learner/learner_group.py
#   rllib/env/single_agent_episode.py
#   rllib/tuned_examples/ppo/multi_agent_pendulum_ppo.py
#   rllib/utils/minibatch_utils.py
…ppo_multi_agent_stateless_cartpole
…ppo_multi_agent_stateless_cartpole
…ppo_multi_agent_stateless_cartpole
…ppo_multi_agent_stateless_cartpole
…ppo_multi_agent_stateless_cartpole
Add PPO multi-agent StatelessCartPole learning tests to CI.
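A rough sketch of what such a multi-agent StatelessCartPole PPO setup typically looks like. The registered env name comes from the diff above; the env class import path, the `num_agents` config key, and the policy names are assumptions for illustration and may differ from the actual tuned example added here.

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env

# Multi-agent StatelessCartPole ships with RLlib's example envs; the exact
# import path may differ between RLlib versions.
from ray.rllib.examples.envs.classes.multi_agent import MultiAgentStatelessCartPole

register_env(
    "multi_stateless_cart",
    lambda cfg: MultiAgentStatelessCartPole(config={"num_agents": 2}),
)

config = (
    PPOConfig()
    .environment("multi_stateless_cart")
    # StatelessCartPole hides the velocity components of the observation,
    # so the module needs memory (e.g. an LSTM) to learn the task.
    .multi_agent(
        policies={"p0", "p1"},
        # Map agent 0 -> "p0", agent 1 -> "p1".
        policy_mapping_fn=lambda agent_id, *args, **kwargs: f"p{agent_id}",
    )
)
```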
Why are these changes needed?
Related issue number
Checks
I've signed off every commit (using git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.