[RLlib] Fix PPOTorchPolicy producing float metrics when not using critic #27980
Conversation
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
@@ -148,8 +148,8 @@ def reduce_mean_valid(t):
             mean_vf_loss = reduce_mean_valid(vf_loss_clipped)
         # Ignore the value function.
         else:
-            value_fn_out = 0
-            vf_loss_clipped = mean_vf_loss = 0.0
+            value_fn_out = torch.tensor(0.0).to(self.device)
Can you change this to `mean_policy_loss.device`? This will make sure that - in case we have multi-GPU - the tensor is really on the correct GPU and NOT on the Policy.device (which is the CPU, I believe).
Done. Looking at TorchPolicyV2 though, I think self.device will only be "cpu" in the case of fake GPUs or no GPUs.
TorchPolicyV2, l. 134ff:

self.devices = [
    torch.device("cuda:{}".format(i))
    for i, id_ in enumerate(gpu_ids)
    if i < num_gpus
]
self.device = self.devices[0]
But still, if you had multiple GPUs, which one would be stored in `self.device`?
It's better to definitely use the same device as all the other tensors in this particular instance of the loss function (which is called on each(!) of the GPUs).
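For illustration, here is a minimal, self-contained sketch of the device handling being asked for here (the name `mean_policy_loss` stands in for any tensor already computed inside the loss function on the current GPU tower; this is not the exact merged code):

```python
import torch


def zero_vf_metrics(mean_policy_loss: torch.Tensor):
    """Create zero-valued value-function metrics on the same device as the
    policy-loss tensor, so that on a multi-GPU setup each tower's loss
    produces tensors on its own GPU rather than on the Policy's device."""
    zero = torch.tensor(0.0).to(mean_policy_loss.device)
    value_fn_out = zero
    vf_loss_clipped = mean_vf_loss = zero
    return value_fn_out, vf_loss_clipped, mean_vf_loss


# Usage inside a loss function: when the critic is disabled, call
# zero_vf_metrics(mean_policy_loss) instead of assigning Python floats.
```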
rllib/utils/torch_utils.py (outdated diff)
@@ -211,6 +211,9 @@ def explained_variance(y: TensorType, pred: TensorType) -> TensorType:
         The explained variance given a pair of labels and predictions.
     """
     y_var = torch.var(y, dim=[0])
+    if y_var == 0.0:
+        # Model case in which y does not vary with explained variance of -1
+        return torch.tensor(-1.0)
We should do `torch.tensor(-1.0).to(pred.device)` instead to make sure this new code is also GPU compatible.
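A sketch of how the guard looks with that suggestion applied (the tail of the function is reconstructed for context only and may differ from the actual implementation in rllib/utils/torch_utils.py):

```python
import torch


def explained_variance(y: torch.Tensor, pred: torch.Tensor) -> torch.Tensor:
    """Sketch of the zero-variance guard with the device fix applied."""
    y_var = torch.var(y, dim=[0])
    if y_var == 0.0:
        # If the labels do not vary at all, report an explained variance of -1,
        # created on pred's device so the helper stays GPU compatible.
        return torch.tensor(-1.0).to(pred.device)
    diff_var = torch.var(y - pred, dim=[0])
    # Clamp from below at -1.0 (reconstructed tail, for context only).
    min_ = torch.tensor(-1.0).to(pred.device)
    return torch.max(min_, 1 - (diff_var / y_var))
```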
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
Thanks for the fixes @ArturNiederfahrenhorst , looks good now.
Signed-off-by: Artur Niederfahrenhorst artur@anyscale.com
Why are these changes needed?
PPOTorchPolicy produces plain Python float metrics when not using a critic. This causes errors when, for example, the metrics are moved to another device. The change also unifies the behaviour with TF, where vf_explained_var defaults to -1.
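A minimal sketch of the failure mode (an assumed reproduction, not taken from the PR itself): a plain Python float has no `.to()` method, while a zero-dimensional tensor can be moved between devices.

```python
import torch

metric_as_float = 0.0                 # what the old code produced without a critic
metric_as_tensor = torch.tensor(0.0)  # what the fix produces

metric_as_tensor.to("cpu")            # fine; "cuda:0" would also work if available
try:
    metric_as_float.to("cpu")
except AttributeError as e:
    # AttributeError: 'float' object has no attribute 'to'
    print(e)
```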
Related issue number
Closes #27822
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.