
[RLlib] Fix PPOTorchPolicy producing float metrics when not using critic #27980

Merged · 4 commits · Aug 22, 2022

Conversation

ArturNiederfahrenhorst (Contributor) commented Aug 18, 2022

Signed-off-by: Artur Niederfahrenhorst artur@anyscale.com

Why are these changes needed?

PPOTorchPolicy produces plain Python float metrics when not using a critic. These floats cause errors when, for example, they are moved to another device. This change also unifies the behaviour with TF, which defaults to vf_explained_var=-1.
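A minimal sketch of the failure mode described above (the metrics dict and its values are made up for illustration): tensors can be moved between devices with `.to()`, but plain floats cannot, so code that treats every metric as a tensor breaks.

```python
import torch

# Hypothetical metrics dict: one entry is a tensor, one is a plain float.
metrics = {"policy_loss": torch.tensor(1.5), "vf_explained_var": -1.0}

moved = {}
for name, value in metrics.items():
    if isinstance(value, torch.Tensor):
        moved[name] = value.to("cpu")  # tensors support .to(device)
    else:
        # A plain Python float has no .to() method, so code that blindly
        # calls value.to(device) on every metric raises AttributeError here.
        assert not hasattr(value, "to")
        moved[name] = value
```

Emitting all metrics as tensors, as this PR does, avoids the need for such type checks downstream.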

Related issue number

Closes #27822

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
```diff
@@ -148,8 +148,8 @@ def reduce_mean_valid(t):
     mean_vf_loss = reduce_mean_valid(vf_loss_clipped)
 # Ignore the value function.
 else:
-    value_fn_out = 0
     vf_loss_clipped = mean_vf_loss = 0.0
+    value_fn_out = torch.tensor(0.0).to(self.device)
```
Contributor:
Can you change this to mean_policy_loss.device? This will make sure that - in case we have multi-GPU - the tensor is really on the correct GPU and NOT on the Policy.device (which is the CPU, I believe).

Contributor Author:

Done. Looking at TorchPolicyV2 though, I think self.device will only be "cpu" in the case of fake GPUs or no GPUs.

l. 134ff:

```python
self.devices = [
    torch.device("cuda:{}".format(i))
    for i, id_ in enumerate(gpu_ids)
    if i < num_gpus
]
self.device = self.devices[0]
```

Contributor:

But still, if you had multiple GPUs, which one would be stored in self.device?
It's better to definitely use the same device as all the other tensors in this particular instance of the loss function (which is called on each(!) of the GPUs).
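The pattern suggested here can be sketched as follows (tensor values are stand-ins): derive the device from a tensor that already participates in this instance of the loss computation, rather than from the policy's configured device.

```python
import torch

# Stand-in for the policy loss tensor computed earlier in the loss function;
# on a multi-GPU setup it lives on whichever GPU runs this loss instance.
mean_policy_loss = torch.tensor(0.5)

# Create the dummy value-function output on the same device as the loss,
# so all tensors produced by one loss instance share a device.
value_fn_out = torch.tensor(0.0).to(mean_policy_loss.device)
```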

```diff
@@ -211,6 +211,9 @@ def explained_variance(y: TensorType, pred: TensorType) -> TensorType:
         The explained variance given a pair of labels and predictions.
     """
     y_var = torch.var(y, dim=[0])
+    if y_var == 0.0:
+        # Model the case in which y does not vary with an explained variance of -1.
+        return torch.tensor(-1.0)
```
Contributor:

We should do torch.tensor(-1.0).to(pred.device) instead to make sure this new code is also GPU compatible.
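Combining the hunk above with this suggestion, the patched helper might look like the sketch below. Only the zero-variance guard comes from the diff; the rest of the body is an assumed illustration of the usual explained-variance computation.

```python
import torch

def explained_variance(y: torch.Tensor, pred: torch.Tensor) -> torch.Tensor:
    """Sketch of the patched helper; only the zero-variance guard is from the diff."""
    y_var = torch.var(y, dim=[0])
    if y_var == 0.0:
        # y does not vary: report an explained variance of -1, matching the
        # TF default mentioned in the PR description. Creating the tensor
        # on pred's device keeps the helper GPU compatible.
        return torch.tensor(-1.0).to(pred.device)
    diff_var = torch.var(y - pred, dim=[0])
    # Clamp from below at -1 so a very poor predictor cannot drive the
    # metric to arbitrarily negative values.
    return torch.max(torch.tensor(-1.0).to(pred.device), 1.0 - diff_var / y_var)
```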

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
sven1977 (Contributor) left a comment:

Thanks for the fixes @ArturNiederfahrenhorst , looks good now.

Development

Successfully merging this pull request may close these issues.

[RL Lib] Reporting metric / stats wrong type.