This repository has been archived by the owner on Mar 19, 2024. It is now read-only.
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Summary: Before this PR (facebookresearch/fairscale#543) was merged, we needed extra `cuda()` calls before wrapping. They are no longer needed.

Unfortunately, this doesn't solve our long model init time issue. An FSDP model init still takes >20 min for me, which is really bad for debugging the regnet128 conv layer crash I am working on. The debugging output below shows that most of the delay is in FSDP wrapping: some in the BN wrapping and some in the layer wrapping.

```
INFO 2021-04-14 12:18:35,883 regnet_2.py: 159: block created
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:19:07,388 regnet_2.py: 163: block bn wrapped
INFO 2021-04-14 12:19:18,388 regnet_2.py: 166: block wrapped
```

In any case, this PR is pretty safe and should go in so that we no longer need an extra `cuda()` call before wrapping.

Pull Request resolved: fairinternal/ssl_scaling#75
Reviewed By: prigoyal
Differential Revision: D27776285
Pulled By: min-xu-ai
fbshipit-source-id: 3e43c6fe750fd6ee35933400b03a069d62040d8a
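The per-step delays can be read directly off the timestamps in the debugging log. Below is a minimal, stdlib-only Python sketch that parses that log format and reports the time elapsed before each message. The helpers `parse` and `phase_durations` are hypothetical illustrations, not part of VISSL or fairscale, and the thirteen identical `cpu` lines are collapsed to one because they share a timestamp.

```python
from datetime import datetime

# Log lines from the commit message; the 13 identical "cpu" lines
# (same timestamp) are collapsed into a single entry here.
LOG = """\
INFO 2021-04-14 12:18:35,883 regnet_2.py: 159: block created
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:19:07,388 regnet_2.py: 163: block bn wrapped
INFO 2021-04-14 12:19:18,388 regnet_2.py: 166: block wrapped
"""


def parse(line):
    """Split 'LEVEL DATE TIME,ms FILE: LINENO: MESSAGE' into (timestamp, message)."""
    _level, date, time, rest = line.split(" ", 3)
    # %f accepts the 3-digit millisecond field ("883" -> 883000 microseconds).
    ts = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S,%f")
    message = rest.split(": ", 2)[2]
    return ts, message


def phase_durations(log):
    """Return (message, seconds elapsed since the previous log line) pairs."""
    entries = [parse(line) for line in log.splitlines() if line.strip()]
    return [
        (msg, (ts - prev_ts).total_seconds())
        for (prev_ts, _), (ts, msg) in zip(entries, entries[1:])
    ]


if __name__ == "__main__":
    for msg, secs in phase_durations(LOG):
        print(f"{msg:<18} +{secs:7.3f} s")
```

For this single block, the log attributes about 31.5 s to the step ending in `block bn wrapped` and 11.0 s to the step ending in `block wrapped`, while block creation itself is nearly instantaneous.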