[RLlib] Fix PPOTorchPolicy producing float metrics when not using critic #27980
Conversation
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
@@ -148,8 +148,8 @@ def reduce_mean_valid(t):
             mean_vf_loss = reduce_mean_valid(vf_loss_clipped)
         # Ignore the value function.
         else:
-            value_fn_out = 0
-            vf_loss_clipped = mean_vf_loss = 0.0
+            value_fn_out = torch.tensor(0.0).to(self.device)
Can you change this to `mean_policy_loss.device`? This will make sure that - in case we have multi-GPU - the tensor is really on the correct GPU and NOT on the Policy.device (which is the CPU, I believe).
Done. Looking at TorchPolicyV2 though, I think self.device will only be "cpu" in the case of fake GPUs or no GPUs.
TorchPolicyV2, l. 134ff:

self.devices = [
    torch.device("cuda:{}".format(i))
    for i, id_ in enumerate(gpu_ids)
    if i < num_gpus
]
self.device = self.devices[0]
But still, if you had multiple GPUs, which one would be stored in `self.device`?
It's better to definitely use the same device as all the other tensors in this particular instance of the loss function (which is called on each(!) of the GPUs).
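For illustration, here is a minimal, self-contained sketch of the device handling being asked for here (the name `mean_policy_loss` stands in for any tensor already computed inside the loss function on the current GPU tower; this is not the exact merged code):

```python
import torch


def zero_vf_metrics(mean_policy_loss: torch.Tensor):
    """Create zero-valued value-function metrics on the same device as the
    policy-loss tensor, so that on a multi-GPU setup each tower's loss
    produces tensors on its own GPU rather than on the Policy's device."""
    zero = torch.tensor(0.0).to(mean_policy_loss.device)
    value_fn_out = zero
    vf_loss_clipped = mean_vf_loss = zero
    return value_fn_out, vf_loss_clipped, mean_vf_loss


# Usage inside a loss function: when the critic is disabled, call
# zero_vf_metrics(mean_policy_loss) instead of assigning Python floats.
```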
rllib/utils/torch_utils.py (outdated diff)
@@ -211,6 +211,9 @@ def explained_variance(y: TensorType, pred: TensorType) -> TensorType:
         The explained variance given a pair of labels and predictions.
     """
     y_var = torch.var(y, dim=[0])
+    if y_var == 0.0:
+        # Model case in which y does not vary with explained variance of -1
+        return torch.tensor(-1.0)
We should do `torch.tensor(-1.0).to(pred.device)` instead to make sure this new code is also GPU compatible.
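A sketch of how the guard looks with that suggestion applied (the tail of the function is reconstructed for context only and may differ from the actual implementation in rllib/utils/torch_utils.py):

```python
import torch


def explained_variance(y: torch.Tensor, pred: torch.Tensor) -> torch.Tensor:
    """Sketch of the zero-variance guard with the device fix applied."""
    y_var = torch.var(y, dim=[0])
    if y_var == 0.0:
        # If the labels do not vary at all, report an explained variance of -1,
        # created on pred's device so the helper stays GPU compatible.
        return torch.tensor(-1.0).to(pred.device)
    diff_var = torch.var(y - pred, dim=[0])
    # Clamp from below at -1.0 (reconstructed tail, for context only).
    min_ = torch.tensor(-1.0).to(pred.device)
    return torch.max(min_, 1 - (diff_var / y_var))
```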
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
Thanks for the fixes @ArturNiederfahrenhorst , looks good now.
Signed-off-by: Artur Niederfahrenhorst artur@anyscale.com
Why are these changes needed?
PPOTorchPolicy produces plain Python float metrics when not using a critic. This causes errors when, for example, the metrics are moved to another device. The change also unifies the behaviour with TF, where vf_explained_var defaults to -1.
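A minimal sketch of the failure mode (an assumed reproduction, not taken from the PR itself): a plain Python float has no `.to()` method, while a zero-dimensional tensor can be moved between devices.

```python
import torch

metric_as_float = 0.0                 # what the old code produced without a critic
metric_as_tensor = torch.tensor(0.0)  # what the fix produces

metric_as_tensor.to("cpu")            # fine; "cuda:0" would also work if available
try:
    metric_as_float.to("cpu")
except AttributeError as e:
    # AttributeError: 'float' object has no attribute 'to'
    print(e)
```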
Related issue number
Closes #27822
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.