
Fix Python DataParallel RNN in no_grad mode #21197

Closed
mrshenli wants to merge 2 commits into pytorch:master from mrshenli:nograd

Conversation

mrshenli (Contributor)

Fixes #21108

When grad is disabled, the outputs of Python autograd functions are wrapped as detached aliases, which prevents calling Tensor.set_() on them after the recent changes to Tensors and Variables. This breaks models that call rnn.flatten_parameters() in the forward pass, because that function calls set_().
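
For context, a minimal sketch of the failure mode (an illustrative module, not the snippet from the linked issue; assumes at least two CUDA devices):

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.LSTM(10, 20, batch_first=True)

    def forward(self, x):
        # flatten_parameters() calls Tensor.set_() on the RNN weights. Under
        # torch.no_grad(), the broadcast replicas are detached aliases, and
        # calling set_() on them raises an error.
        self.rnn.flatten_parameters()
        return self.rnn(x)[0]


model = nn.DataParallel(Model().cuda())
with torch.no_grad():
    out = model(torch.randn(4, 5, 10).cuda())  # fails without this fix
```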

The proposed solution is to avoid using the autograd Broadcast function when running in no_grad mode.
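
A rough sketch of that idea (hedged; the helper name mirrors torch/nn/parallel/replicate.py, but this is not the verbatim diff):

```python
import torch
from torch.cuda import comm
from torch.nn.parallel._functions import Broadcast


def _broadcast_coalesced_reshape(tensors, devices, detach=False):
    if detach:
        # no_grad path: plain copies with no autograd history, so in-place
        # ops like Tensor.set_() still work on the replicas.
        return comm.broadcast_coalesced(tensors, devices)
    # grad-enabled path: go through the autograd function so gradients
    # flow back to the source device.
    tensor_copies = Broadcast.apply(devices, *tensors)
    # regroup the flat list of copies into one list of tensors per device
    return [tensor_copies[i:i + len(tensors)]
            for i in range(0, len(tensor_copies), len(tensors))]
```

DataParallel.forward would then pass detach=not torch.is_grad_enabled() when replicating the module, so the detached fast path is taken exactly when grad is disabled.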

@apsdehal

mrshenli added the oncall: distributed and triaged labels May 31, 2019
mrshenli (Contributor, Author)

This is blocking facebookresearch/mmf/issues/76

pytorchbot added the module: nn label May 31, 2019
facebook-github-bot (Contributor) left a comment

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)

@mrshenli merged this pull request in 51ebbe9.

facebook-github-bot pushed a commit that referenced this pull request Jun 3, 2019
Summary:
Retry #21197

The previous one failed because it used some Python 3-only syntax.

@ezyang Do we still have multi-GPU Python 2 tests? I am curious why the CI tests did not catch this error.
Pull Request resolved: #21262

Differential Revision: D15598941

Pulled By: mrshenli

fbshipit-source-id: 95f416589448c443685d6d236d205b011998a715
mrshenli deleted the nograd branch June 14, 2019
BramVanroy
Is this fix part of 1.2?

gchanan (Contributor) commented Oct 14, 2019

@BramVanroy yes, this made 1.2.

Emrys365
Could you also modify the replicate part in data_parallel()? https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/data_parallel.py#L217

The same problem still happens when I call torch.nn.parallel.data_parallel instead of torch.nn.DataParallel. You can refer to this code snippet for reproducing the problem: #21108 (comment)
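
For reference, a hedged sketch of what that functional-API reproduction looks like (the actual snippet is in the linked comment; the module here is illustrative):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import data_parallel


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.LSTM(10, 20, batch_first=True)

    def forward(self, x):
        self.rnn.flatten_parameters()  # calls set_() on the replicas
        return self.rnn(x)[0]


with torch.no_grad():
    # data_parallel() replicates through the autograd Broadcast without the
    # no_grad-aware detach guard, so the same set_() error persists here
    # even after the DataParallel fix.
    out = data_parallel(Model().cuda(), torch.randn(4, 5, 10).cuda())
```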

Labels
Merged · module: nn · oncall: distributed · triaged
Development

Successfully merging this pull request may close these issues:

[DataParallel] flatten_parameters doesn't work under torch.no_grad (#21108)