[RLlib] APPO enhancements (new API stack) vol 02: Cleanup loss function, add GAE-lambda to vtrace, make rho-clip configurable. #48800

sven1977 · 2024-11-19T12:11:40Z

APPO enhancements (new API stack) vol 02:

- Clean up the loss function (cleaner, more consistent naming)
- Add GAE-lambda to vtrace
- Make rho-clip configurable
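The items above are easier to follow with a reference implementation. Below is a minimal NumPy sketch of V-trace (Espeholt et al., 2018) for a single time-major trajectory, with lambda scaling the trace coefficients c_t and separate, configurable clip thresholds for rho, c, and the policy-gradient rho. It is not the code from this PR. The function and parameter names (`vtrace_targets`, `clip_rho_threshold`, `clip_c_threshold`, `clip_pg_rho_threshold`, `lambda_`) are hypothetical. The sketch follows the paper's placement of lambda inside c_t, while RLlib's implementation may apply it differently, for example via the discount.

```python
import numpy as np


# Hypothetical helper for illustration; not RLlib's API.
def vtrace_targets(
    log_rhos,
    discounts,
    rewards,
    values,
    bootstrap_value,
    lambda_=1.0,
    clip_rho_threshold=1.0,
    clip_c_threshold=1.0,
    clip_pg_rho_threshold=1.0,
):
    """Compute V-trace value targets and policy-gradient advantages.

    All array inputs have shape [T] (one trajectory, time-major).
    log_rhos: log(pi(a_t|x_t) / mu(a_t|x_t)), target vs. behavior policy.
    discounts: per-step discount, 0.0 at termination timesteps.
    values: V(x_t) from the current value function.
    bootstrap_value: V(x_T), used after the last step.
    """
    log_rhos = np.asarray(log_rhos, dtype=np.float64)
    discounts = np.asarray(discounts, dtype=np.float64)
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)

    rhos = np.exp(log_rhos)
    # Clipped importance weights used in the TD errors (configurable rho-bar).
    clipped_rhos = np.minimum(clip_rho_threshold, rhos)
    # Trace coefficients c_t, scaled by lambda as in the IMPALA paper.
    cs = lambda_ * np.minimum(clip_c_threshold, rhos)

    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discounts * values_tp1 - values)

    # Backward recursion:
    # v_t - V(x_t) = delta_t + gamma_t * c_t * (v_{t+1} - V(x_{t+1})).
    vs_minus_v = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(values))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = values + vs_minus_v

    # Policy-gradient advantages use their own (configurable) rho clip.
    vs_tp1 = np.append(vs[1:], bootstrap_value)
    pg_rhos = np.minimum(clip_pg_rho_threshold, rhos)
    pg_advantages = pg_rhos * (rewards + discounts * vs_tp1 - values)
    return vs, pg_advantages


# On-policy example: log_rhos are all 0, so no clipping is active.
vs, adv = vtrace_targets(
    log_rhos=[0.0, 0.0, 0.0],
    discounts=[0.9, 0.9, 0.9],
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    bootstrap_value=0.0,
)
# vs is approximately [2.71, 1.9, 1.0]
```

With on-policy data (all log_rhos equal to 0), `lambda_ = 1`, and the default thresholds, the targets reduce to discounted n-step returns bootstrapped from V(x_T). Raising `clip_rho_threshold` above 1 lets importance weights greater than 1 pass through to the TD errors. Setting `lambda_ = 0` cuts the traces entirely and yields one-step TD targets.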

Why are these changes needed?

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>

simonsays1980

LGTM. Everything is getting cleaned up now. This looks really good. Do we already have first training results?

simonsays1980 · 2024-11-19T14:11:50Z

rllib/algorithms/appo/torch/appo_torch_learner.py

+
+        # Target policy.
+        target_action_dist = action_dist_cls_train.from_logits(
+            module.forward_target(batch)[TARGET_ACTION_DIST_LOGITS_KEY]


Can we do this here if we want to use multiple learners? I remember that for SAC and CQL we needed to run a single forward pass before the loss calculation.

You are right, but it does seem to run fine in the CI (multi-GPU). 🤷‍♂️

simonsays1980 · 2024-11-19T14:13:49Z

rllib/algorithms/appo/torch/appo_torch_learner.py

-        # The discount factor that is used should be gamma except for timesteps where
-        # the episode is terminated. In that case, the discount factor should be 0.
+        # The discount factor that is used should be `gamma * lambda_`, except for
+        # termination timesteps, in which case the discount factor should be 0.
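The discount rule stated in the diff comment can be sketched in NumPy as follows. This is a minimal sketch, not the code from `appo_torch_learner.py`. The function name `compute_discounts` and its boolean `terminateds` input (True at the timestep where the episode terminated) are hypothetical.

```python
import numpy as np


# Hypothetical helper for illustration; not RLlib's API.
def compute_discounts(terminateds, gamma, lambda_):
    """Per-timestep discount factors following the diff comment:
    gamma * lambda_ at regular timesteps, 0.0 at termination timesteps.

    terminateds: bool array of shape [T], True where the episode terminated.
    """
    terminateds = np.asarray(terminateds, dtype=bool)
    # Zero out the discount wherever the episode terminated so that no value
    # is bootstrapped across the episode boundary.
    return np.where(terminateds, 0.0, gamma * lambda_)


discounts = compute_discounts([False, False, True, False], gamma=0.99, lambda_=0.95)
# about 0.9405 everywhere except index 2, which is 0.0
```

Only terminations zero out the discount in this sketch. A truncated (time-limited) episode would still bootstrap from the value of its last observation, so it is deliberately not handled here.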


Why not gamma^(t-1)?
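For reference, V-trace (Espeholt et al., 2018) is computed with a backward recursion that applies only the per-step discount gamma_t (0 at terminations). Unrolling that recursion compounds the per-step factors into the gamma^(t-s) term of the n-step form. Here rho-bar and c-bar are the clip thresholds and lambda scales the trace coefficients, as in the paper:

```latex
\rho_t = \min\!\Bigl(\bar{\rho}, \tfrac{\pi(a_t \mid x_t)}{\mu(a_t \mid x_t)}\Bigr), \qquad
c_i = \lambda \min\!\Bigl(\bar{c}, \tfrac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)}\Bigr), \qquad
\delta_t V = \rho_t \bigl(r_t + \gamma_t V(x_{t+1}) - V(x_t)\bigr)

v_s = V(x_s) + \delta_s V + \gamma_s c_s \bigl(v_{s+1} - V(x_{s+1})\bigr)
\;\Longrightarrow\;
v_s = V(x_s) + \sum_{t=s}^{s+n-1} \Bigl(\prod_{i=s}^{t-1} \gamma_i c_i\Bigr) \delta_t V

\text{With } \gamma_i = \gamma \text{ for all } i: \quad
\prod_{i=s}^{t-1} \gamma_i c_i = \gamma^{t-s} \prod_{i=s}^{t-1} c_i
```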


sven1977 added 2 commits November 19, 2024 12:28

wip

5cffa87

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

dff411c

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 requested a review from simonsays1980 as a code owner November 19, 2024 12:11

sven1977 assigned simonsays1980 Nov 19, 2024

sven1977 added the rllib, rllib-algorithms, and rllib-newstack labels Nov 19, 2024

simonsays1980 approved these changes Nov 19, 2024


sven1977 enabled auto-merge (squash) November 19, 2024 15:29

github-actions bot added the go label (add ONLY when ready to merge, run all tests) Nov 19, 2024

sven1977 merged commit 4165b1a into ray-project:master Nov 19, 2024
6 checks passed

sven1977 deleted the appo_enhancements_02_fix_rho branch November 20, 2024 11:30

dentiny pushed a commit to dentiny/ray that referenced this pull request Dec 7, 2024

[RLlib] APPO enhancements (new API stack) vol 02: Cleanup loss function, add GAE-lambda to vtrace, make rho-clip configurable. (ray-project#48800) Signed-off-by: hjiang <dentinyhao@gmail.com>

91f5f14
