[RLlib] Add PPO multi-agent StatelessCartPole learning tests. #47196
Conversation
…appo_multi_agent_cartpole_tests
LGTM. Would be nice to have some comments and a refactoring into helper functions.
rllib/core/learner/learner_group.py (outdated)
    num_iters,
):
    # Count total number of timesteps per module ID.
    if isinstance(episodes[0], MultiAgentEpisode):
Why isn't it possible to use a generator/iterator that can run empty, so that the learner simply returns once it does?
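Picking up the helper-function suggestion above: a minimal sketch of what such a per-module timestep count could look like as a standalone helper. Everything here (the function name, the `agent_episodes` attribute access, the optional mapping callable) is illustrative only, not the PR's actual code.

```python
from collections import defaultdict


def count_timesteps_per_module(episodes, agent_to_module_mapping=None):
    """Counts the total number of env timesteps per module ID.

    Works for both single-agent and multi-agent episodes (duck-typed via the
    presence of an `agent_episodes` dict on multi-agent episodes).
    """
    counts = defaultdict(int)
    for episode in episodes:
        if hasattr(episode, "agent_episodes"):
            # Multi-agent: one sub-episode per agent; map each agent to its module.
            for agent_id, agent_eps in episode.agent_episodes.items():
                module_id = (
                    agent_to_module_mapping(agent_id)
                    if agent_to_module_mapping
                    else agent_id
                )
                counts[module_id] += len(agent_eps)
        else:
            # Single-agent: all timesteps belong to RLlib's default module ID.
            counts["default_policy"] += len(episode)
    return dict(counts)
```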
rllib/env/single_agent_episode.py (outdated)
    ),
    lookback=self.observations.lookback,
    space=self.observation_space,
_lb = (
Could we leave a comment explaining what we are calculating here and when this case can occur, please?
rllib/env/single_agent_episode.py (outdated)
    space=self.action_space,
)

_lb = (
Also, could we refactor this into a helper function?
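As a sketch of such a helper (purely illustrative; the name, signature, and exact semantics are assumptions, not the PR's actual `_lb` computation): when slicing an episode, the sliced buffer's lookback can only be as long as the data that actually precedes the slice.

```python
def _compute_sliced_lookback(requested_lookback: int, slice_start: int) -> int:
    """Clips the requested lookback length for a sliced buffer.

    A slice starting at index `slice_start` has only `slice_start` items in
    front of it, so the new lookback can never exceed that.
    """
    return min(requested_lookback, max(slice_start, 0))


# Slicing at index 2 with a lookback of 5 can only look back 2 steps;
# slicing at index 10 keeps the full lookback of 5.
assert _compute_sliced_lookback(5, 2) == 2
assert _compute_sliced_lookback(5, 10) == 5
```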
)
.environment("multi_stateless_cart")
.env_runners(
    env_to_module_connector=lambda env: MeanStdFilter(multi_agent=True),
I remember we still have this open question of whether we also need to add this connector to the Learner or not. I think we do not: MeanStdFilter rewrites the observations, and the Learner then receives these rewritten observations, correct?
Correct. This connector (and most other env-to-module connectors) writes directly back into the episode, making the change to the observations permanent. Hence there is no need to also add it to the Learner pipeline, since that pipeline then operates on the already-changed episodes/observations.
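For context, a minimal sketch of how the filter is wired up on the EnvRunner side only, based on the diff above (the exact import path of `MeanStdFilter` may differ between RLlib versions):

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module.mean_std_filter import MeanStdFilter

config = (
    PPOConfig()
    .environment("multi_stateless_cart")
    .env_runners(
        # The filter normalizes observations and writes them back into the
        # episodes on the EnvRunner side, so the Learner pipeline already
        # receives the normalized data and needs no second copy of the filter.
        env_to_module_connector=lambda env: MeanStdFilter(multi_agent=True),
    )
)
```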
…ppo_multi_agent_stateless_cartpole

Signed-off-by: sven1977 <svenmika1977@gmail.com>

# Conflicts:
#   rllib/BUILD
#   rllib/core/learner/learner_group.py
#   rllib/env/single_agent_episode.py
#   rllib/tuned_examples/ppo/multi_agent_pendulum_ppo.py
#   rllib/utils/minibatch_utils.py
…ppo_multi_agent_stateless_cartpole
…ppo_multi_agent_stateless_cartpole
…ppo_multi_agent_stateless_cartpole
…ppo_multi_agent_stateless_cartpole
…ppo_multi_agent_stateless_cartpole
Add PPO multi-agent StatelessCartPole learning tests to CI.
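A rough sketch of what such a multi-agent StatelessCartPole PPO setup typically looks like. The registered env name comes from the diff above; the env class import path, the `num_agents` config key, and the policy names are assumptions for illustration and may differ from the actual tuned example added here.

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env

# Multi-agent StatelessCartPole ships with RLlib's example envs; the exact
# import path may differ between RLlib versions.
from ray.rllib.examples.envs.classes.multi_agent import MultiAgentStatelessCartPole

register_env(
    "multi_stateless_cart",
    lambda cfg: MultiAgentStatelessCartPole(config={"num_agents": 2}),
)

config = (
    PPOConfig()
    .environment("multi_stateless_cart")
    # StatelessCartPole hides the velocity components of the observation,
    # so the module needs memory (e.g. an LSTM) to learn the task.
    .multi_agent(
        policies={"p0", "p1"},
        # Map agent 0 -> "p0", agent 1 -> "p1".
        policy_mapping_fn=lambda agent_id, *args, **kwargs: f"p{agent_id}",
    )
)
```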
Why are these changes needed?
Related issue number
Checks
I've signed off every commit (using git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.