Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][FSDP] fix weight init when using apply() (fixes #490 and #444) #543

Merged
merged 6 commits into from
Mar 20, 2021

Conversation

myleott
Copy link
Contributor

@myleott myleott commented Mar 20, 2021

This also required reworking _all_buffers_to to no longer use apply, since it is called from within _lazy_init and created some circular logic: apply -> summon_full_params -> _lazy_init -> _all_buffers_to -> apply.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 20, 2021
@myleott myleott requested review from min-xu-ai and sshleifer March 20, 2021 13:00
@myleott myleott force-pushed the fsdp_apply branch 2 times, most recently from 46f422f to a31d5dd Compare March 20, 2021 13:09
@myleott myleott changed the title [fix][FSDP] Fix FSDP weight init (fixes #490 and #444) [fix][FSDP] fix weight init when using apply() (fixes #490 and #444) Mar 20, 2021
@myleott myleott marked this pull request as draft March 20, 2021 14:23
@myleott
Copy link
Contributor Author

myleott commented Mar 20, 2021

Doesn't work with SyncBN, will fix

Copy link
Contributor

@sshleifer sshleifer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice tests!

fairscale/nn/data_parallel/fully_sharded_data_parallel.py Outdated Show resolved Hide resolved
tests/nn/data_parallel/test_fsdp_apply.py Outdated Show resolved Hide resolved
Copy link
Contributor

@min-xu-ai min-xu-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nice. Some nonblocking comments.

tests/nn/data_parallel/test_fsdp_apply.py Show resolved Hide resolved
tests/nn/data_parallel/test_fsdp_apply.py Outdated Show resolved Hide resolved
tests/nn/data_parallel/test_fsdp.py Show resolved Hide resolved
tests/ci_test_list_2.txt Show resolved Hide resolved
@min-xu-ai
Copy link
Contributor

The test failures might be related to the cast_buffer change?

@myleott
Copy link
Contributor Author

myleott commented Mar 20, 2021

The test failures might be related to the cast_buffer change?

It's very weird, tests only seem to fail on 1.6, but pass in 1.7.1 and 1.8 (and on my local machine)... will dig a bit

@myleott myleott marked this pull request as ready for review March 20, 2021 20:13
@myleott myleott merged commit fa1b85f into master Mar 20, 2021
@myleott myleott deleted the fsdp_apply branch March 20, 2021 21:15
facebook-github-bot pushed a commit to facebookresearch/vissl that referenced this pull request Apr 15, 2021
Summary:
Before this PR (facebookresearch/fairscale#543) was merged, we used to need the extra cuda() calls. Now, they are not needed.

Unfortunately, this doesn't solve the long model init time issue we have. A FSDP model init still take >20 mins for me. This is really bad for debugging the regnet128 conv layer crash problem I am debugging.

The following debugging output shows that most delays are in FSDP wrapping, some in BN wrapping and some in the layer wrapping.

```
INFO 2021-04-14 12:18:35,883 regnet_2.py: 159: block created
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:19:07,388 regnet_2.py: 163: block bn wrapped
INFO 2021-04-14 12:19:18,388 regnet_2.py: 166: block wrapped
```

In any case, this PR is pretty safe and should go in so that we don't need to do an extra `cuda()` call before wrapping.

Pull Request resolved: fairinternal/ssl_scaling#75

Reviewed By: prigoyal

Differential Revision: D27776285

Pulled By: min-xu-ai

fbshipit-source-id: 3e43c6fe750fd6ee35933400b03a069d62040d8a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FSDP] improve robustness to mismatch between torch.cuda.current_device and model's device
4 participants