remove extra cuda calls (#75)

Summary: Before facebookresearch/fairscale#543 was merged, we needed the extra `cuda()` calls. Now they are no longer needed.

Unfortunately, this doesn't solve the long model init time issue we have. An FSDP model init still takes >20 mins for me. This is really bad for debugging the regnet128 conv layer crash I am working on.

The following debugging output shows that most delays are in FSDP wrapping, some in BN wrapping and some in the layer wrapping.

```
INFO 2021-04-14 12:18:35,883 regnet_2.py: 159: block created
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:18:35,884 regnet_2.py: 161: cpu
INFO 2021-04-14 12:19:07,388 regnet_2.py: 163: block bn wrapped
INFO 2021-04-14 12:19:18,388 regnet_2.py: 166: block wrapped
```

In any case, this PR is pretty safe and should go in so that we don't need to do an extra `cuda()` call before wrapping.

Pull Request resolved: fairinternal/ssl_scaling#75
Reviewed By: prigoyal
Differential Revision: D27776285
Pulled By: min-xu-ai
fbshipit-source-id: 3e43c6fe750fd6ee35933400b03a069d62040d8a
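In the log above, roughly 31.5 s pass between "block created" and "block bn wrapped", and another 11 s before "block wrapped". A minimal sketch of how such per-stage timings could be collected is shown here. The helper name `timed_stage` and the logger name `init_timing` are hypothetical and are not part of VISSL or fairscale; the real instrumentation in `regnet_2.py` is not shown in this commit.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("init_timing")


@contextmanager
def timed_stage(name, timings=None):
    """Log how long the enclosed block takes.

    If `timings` is a dict, the elapsed seconds are also stored under `name`
    so that stages can be compared after model construction finishes.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if timings is not None:
            timings[name] = elapsed
        logger.info("%s took %.3f s", name, elapsed)


def slowest_stage(timings):
    """Return the name of the stage with the largest recorded duration."""
    return max(timings, key=timings.get)
```

Each stage of block construction (creating the block, wrapping BN, wrapping the block) could be placed in its own `with timed_stage("...", timings):` body, after which `slowest_stage(timings)` points at the stage worth investigating first.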
facebookresearch · Apr 15, 2021 · c29fe66
1 parent 20295c5
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/vissl/models/trunks/regnet_fsdp.py b/vissl/models/trunks/regnet_fsdp.py
@@ -106,7 +106,7 @@ def __init__(
                 bot_mul,
                 group_width,
                 params.se_ratio,
-            ).cuda()
+            )
             # Init weight before wrapping and sharding.
             init_weights(block)
 
@@ -127,7 +127,7 @@ class RegNetFSDP(FSDP):
     """
 
     def __init__(self, model_config: AttrDict, model_name: str):
-        module = _RegNetFSDP(model_config, model_name).cuda()
+        module = _RegNetFSDP(model_config, model_name)
         super().__init__(module, **model_config.FSDP_CONFIG)