Can't pickle with xformers after 81bc427 #203
Comments
oh, thanks for the report, this is unexpected ! (and the repro will make a perfect unit test so that this does not appear again) |
ah, could be related to the FusedMLP which pulls in Triton, this would at least make sense with respect to the linked commit. Just in case, is triton installed on your machines @jramapuram ? (still writing down a test and checking that, but in case this can unblock you earlier) Does this happen if you ask for "MLP" and not "FusedMLP" ? |
replying to myself: nope, this is not because of fusedMLP, I can repro indeed. Fixing that |
You beat me to it :). Let me know if you need anything else. |
Here is my config if helpful (it is pretty vanilla):
reversible: False
block_type: "encoder"
num_layers: 12
dim_model: 768
layer_norm_style: "pre"
multi_head_config:
  num_heads: 12
  residual_dropout: 0.1
  use_rotary_embeddings: true
  attention:
    name: "scaled_dot_product"
    dropout: 0.0
    causal: False
feedforward_config:
  name: "MLP"
  dropout: 0.0
  activation: "gelu"
  hidden_layer_multiplier: 4
|
@jramapuram the above should work with https://github.com/facebookresearch/xformers/tree/pickling_tests, but pickling still dies on the triton parts which are JIT compiled, so if you switch MLP to FusedMLP it will crash on that :( Trying to find a global solution |
Got it; thanks for the quick turn around! This should be okay to get going in the meantime at least :) |
I just updated the branch, make sure that you force pull, it was subtly buggy at some point (the init when k/q/v had different settings). The triton pickling part should be fixable, but I'm a bit short on time right now, checking back in a bit later. Putting up a draft PR |
* Adding checkerboard pattern * Adding documentation * black reformatting * Revert "black reformatting" This reverts commit 032cf331495f63e1439c8334625220afae2d29ea. * Revert changes to notebook * Refactor implementation to handle some corner cases Also re-uses already existing methods to simplify things * Add tests and bugfix * Bugfix Co-authored-by: Marta Gazulla <martatintore@devfair0121.h2.fair> Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
@jramapuram late thought, I've been meaning to write back but time flew.. It's a bit strange to me that you have to pickle the model, it's not typically considered good practice. Each of the processes that DDP spawns should be able to rebuild the whole context by itself (you just need to be careful about the seeds, which need to be set on each process, but that's typically a good idea anyway). Something else to take care of is that the dataloaders need to be properly initialized, but you're probably on top of that already. |
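To make the "rebuild in each process" suggestion concrete, here is a minimal, framework-free sketch. `build_model` and `worker` are hypothetical names, and the list of floats is only a stand-in for real weights; with PyTorch one would seed via `torch.manual_seed` instead. It assumes the model init uses the same seed on every rank while the data stream is seeded per rank.

```python
import random


def build_model(config, seed):
    """Hypothetical model factory: returns seeded stand-in 'weights'."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(config["num_layers"])]


def worker(rank, config, base_seed=0):
    """What each spawned process would run: rebuild everything locally."""
    # Same seed on every rank, so all replicas start from identical weights
    # without the model ever being pickled and shipped across processes.
    model = build_model(config, base_seed)
    # Rank-dependent seed, so each replica draws a different data stream.
    data_rng = random.Random(base_seed + rank)
    first_batch = [data_rng.randrange(1000) for _ in range(4)]
    return model, first_batch


if __name__ == "__main__":
    cfg = {"num_layers": 12}
    for rank in range(2):
        model, batch = worker(rank, cfg)
        print(rank, model[:2], batch)
```

Only the small config and the rank need to reach each process; both are plain Python objects that pickle without trouble.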
@blefaudeux : this was with some prototype pytorch-lightning code. I'm guessing they use the old Good shout on the dataloaders, already seeding those with replica_id from the world pool 👍 |
Closing that for now, I don't think that it's actually possible to pickle the Triton parts, since they are JIT compiled and that is beyond what the pickling mechanism can handle. A fallback is to not use triton if pickling is required |
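The fallback mentioned above could be sketched at the config level like this, assuming the block config is held as a plain dict shaped like the one posted earlier in the thread. `without_triton` is a hypothetical helper, not part of xformers, and it only covers the "FusedMLP" feedforward named in this thread.

```python
import copy


def without_triton(config):
    """Return a copy of the config with "FusedMLP" swapped for plain "MLP"."""
    cfg = copy.deepcopy(config)  # leave the caller's config untouched
    feedforward = cfg.get("feedforward_config", {})
    if feedforward.get("name") == "FusedMLP":
        feedforward["name"] = "MLP"
    return cfg


if __name__ == "__main__":
    fused = {
        "num_layers": 12,
        "feedforward_config": {"name": "FusedMLP", "activation": "gelu"},
    }
    print(without_triton(fused)["feedforward_config"]["name"])
```

Other Triton-backed parts, if any are in use, would need the same treatment.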
🐛 Bug
I'm running into pickling errors after commit 81bc427
Pickling is needed for some DDP contexts.
Pseudocode
I haven't bisected, but on head (093224e at this time) this results in:
Expected behavior
Model pickles correctly (needed for certain DDP contexts).
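A round-trip check along these lines is one way to phrase that expectation as a test. This is a generic sketch using only the standard library: `pickles_ok` is a hypothetical helper, and the lambda is merely a stand-in for any callable generated at runtime, not a reproduction of how the Triton kernels actually fail.

```python
import pickle


def pickles_ok(obj):
    """Return True if obj survives a pickle dumps/loads round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    # Plain data pickles fine.
    print(pickles_ok({"weights": [0.0] * 4}))
    # A lambda cannot be looked up by name on unpickling, so dumps() fails.
    print(pickles_ok({"kernel": lambda x: x * 2}))
```

In a real test the object under check would be the model built from the config above.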
Environment
Install method (conda, pip, source): conda