
Can't pickle with xformers after 81bc427 #203

Closed
jramapuram opened this issue Feb 3, 2022 · 11 comments

jramapuram commented Feb 3, 2022

🐛 Bug

I'm running into pickling errors after commit 81bc427

Pickling is needed for some DDP contexts.

Pseudocode

import pickle

import torch.nn as nn
from xformers.factory import xFormer, xFormerConfig

class ViTB(nn.Module):
    def __init__(self, model_config):
        super().__init__()
        # model_config: a dict describing the stack,
        # e.g. a generic ViT-B encoder config (see the YAML further down)
        self.xformer = xFormer.from_config(xFormerConfig([model_config]))

pickle.dumps(ViTB(model_config))

I haven't bisected, but on head (093224e at this time) this results in:

AttributeError: Can't pickle local object '_init_from_params.<locals>.init_method'
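For context: pickle can only serialize a function it can re-import by its qualified name, so a function defined inside another function (here `init_method` inside `_init_from_params`) always fails this way. A minimal sketch of the failure mode and the usual workaround, binding arguments with `functools.partial` over a module-level function (the names below are illustrative, not xformers' actual fix):

```python
import pickle
from functools import partial

def _init_from_params(gain):
    # A closure like this has the qualified name
    # "_init_from_params.<locals>.init_method", which pickle cannot
    # re-import, hence the AttributeError above.
    def init_method(tensor):
        return gain * tensor
    return init_method

def _scaled_init(tensor, gain):
    # Module-level function: picklable by qualified name.
    return gain * tensor

broken = _init_from_params(0.02)
try:
    pickle.dumps(broken)
except AttributeError as e:
    print(e)  # e.g. Can't pickle local object '_init_from_params.<locals>.init_method'

# Workaround: bind arguments with functools.partial instead of closing
# over them; the partial of a module-level function pickles fine.
fixed = partial(_scaled_init, gain=0.02)
restored = pickle.loads(pickle.dumps(fixed))
```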

Expected behavior

Model pickles correctly (needed for certain DDP contexts).

Environment

  • PyTorch Version (e.g., 1.0): 1.10.2 (tried earlier versions, same issue).
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed PyTorch (conda, pip, source): conda
@blefaudeux (Contributor)

Oh, thanks for the report, this is unexpected! (And the repro will make a perfect unit test so that this does not reappear.)

blefaudeux self-assigned this Feb 3, 2022
jramapuram changed the title from "Can't pickle xformers after 81bc427" to "Can't pickle with xformers after 81bc427" Feb 3, 2022

blefaudeux commented Feb 3, 2022

Ah, this could be related to the FusedMLP, which pulls in Triton; that would at least make sense with respect to the linked commit. Just in case, is Triton installed on your machines @jramapuram? (Still writing a test and checking, but in case this unblocks you earlier.) Does this happen if you ask for "MLP" and not "FusedMLP"?

@blefaudeux (Contributor)

> _init_from_params

Replying to myself: nope, this is not because of FusedMLP, I can repro indeed. Fixing that.

@jramapuram (Author)

You beat me to it :). Let me know if you need anything else.

@jramapuram (Author)

Here is my config if helpful (it is pretty vanilla):

reversible: False
block_type: "encoder"
num_layers: 12
dim_model: 768
layer_norm_style: "pre"

multi_head_config:
  num_heads: 12
  residual_dropout: 0.1
  use_rotary_embeddings: true

  attention:
    name: "scaled_dot_product"
    dropout: 0.0
    causal: False

feedforward_config:
  name: "MLP"
  dropout: 0.0
  activation: "gelu"
  hidden_layer_multiplier: 4
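For reference, the same config expressed as a Python dict, as a sketch of how it could be handed to the factory (the xformers call is commented out so the snippet stands alone; whether `from_config` accepts exactly this shape depends on the xformers version):

```python
# Python-dict form of the YAML encoder config above.
model_config = {
    "reversible": False,
    "block_type": "encoder",
    "num_layers": 12,
    "dim_model": 768,
    "layer_norm_style": "pre",
    "multi_head_config": {
        "num_heads": 12,
        "residual_dropout": 0.1,
        "use_rotary_embeddings": True,
        "attention": {
            "name": "scaled_dot_product",
            "dropout": 0.0,
            "causal": False,
        },
    },
    "feedforward_config": {
        "name": "MLP",
        "dropout": 0.0,
        "activation": "gelu",
        "hidden_layer_multiplier": 4,
    },
}

# Building the model would then look like (requires xformers installed):
# from xformers.factory import xFormer, xFormerConfig
# model = xFormer.from_config(xFormerConfig([model_config]))
```

Note that `feedforward_config.name` is "MLP" here, not "FusedMLP", so the Triton path is not involved.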

@blefaudeux (Contributor)

@jramapuram the above should work with https://github.com/facebookresearch/xformers/tree/pickling_tests, but pickling still dies on the Triton parts, which are JIT-compiled, so if you switch MLP to FusedMLP it will crash on that :( Trying to find a global solution.

@jramapuram (Author)

Got it; thanks for the quick turnaround! This should be okay to get going in the meantime at least :)

@blefaudeux (Contributor)

> Got it; thanks for the quick turnaround! This should be okay to get going in the meantime at least :)

I just updated the branch; make sure that you force-pull, it was subtly buggy at some point (the init when k/q/v had different settings). The Triton pickling part should be fixable, but I'm a bit short on time right now; checking back in a bit later. Putting up a draft PR.

xwhan pushed a commit to xwhan/xformers that referenced this issue Feb 8, 2022
* Adding checkerboard pattern

* Adding documentation

* black reformatting

* Revert "black reformatting"

This reverts commit 032cf331495f63e1439c8334625220afae2d29ea.

* Revert changes to notebook

* Refactor implementation to handle some corner cases

Also re-uses already existing methods to simplify things

* Add tests and bugfix

* Bugfix

Co-authored-by: Marta Gazulla <martatintore@devfair0121.h2.fair>
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
@blefaudeux (Contributor)

@jramapuram late thought, I've been meaning to write back but time flew. It's a bit strange to me that you have to pickle the model; it's not typically considered good practice. Each of the processes that DDP spawns should be able to rebuild the whole context by itself (you just need to be careful about the seeds, which need to be set on each process, but that's typically a good idea anyway). Something to take care of also is that the dataloaders need to be properly initialized, but you're probably on top of that already.
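A minimal sketch of the per-process seeding described above (names are illustrative; a real DDP setup would also seed torch and numpy with the same derived seed, and launchers such as torchrun expose the rank via the RANK environment variable):

```python
import os
import random

def seed_worker(base_seed: int, rank: int) -> None:
    """Give each DDP process a deterministic but distinct seed."""
    worker_seed = base_seed + rank
    random.seed(worker_seed)
    # In a real setup you would also do, e.g.:
    # torch.manual_seed(worker_seed)
    # numpy.random.seed(worker_seed)

# Each spawned process derives its seed from its own rank.
rank = int(os.environ.get("RANK", "0"))
seed_worker(base_seed=1234, rank=rank)
```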

@jramapuram (Author)

@blefaudeux: this was with some prototype pytorch-lightning code. I'm guessing they use the old mp.spawn method instead of something like torchrun (which we typically use 😄) to spin up K processes per GPU.

Good shout on the dataloaders; already seeding those with the replica_id from the world pool 👍

@blefaudeux (Contributor)

Closing this for now; I don't think it's actually possible to pickle the Triton parts, since they are JIT-compiled and that is beyond the pickling mechanism. A fallback is to not use Triton if pickling is required.
