Can't load models with a gamma or beta parameter #29554
Yes, that's correct; it's a bug I pointed out in my video series on contributing to Transformers. This is due to these lines: src/transformers/modeling_utils.py, lines 579 to 582 (at commit 0290ec1).
I assume they are there for backwards-compatibility reasons. If we knew which models require this exception, we could fix this.
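Roughly, the legacy rename amounts to the following (a paraphrase of the behaviour, not the exact source):

```python
# A paraphrase of the legacy key "fix-up" applied to every state_dict key on
# load: any occurrence of "gamma"/"beta" anywhere in a key is rewritten,
# including in user-defined parameter names.
def _fix_key(key: str) -> str:
    if "beta" in key:
        return key.replace("beta", "bias")
    if "gamma" in key:
        return key.replace("gamma", "weight")
    return key

# e.g. a custom parameter named "gamma" is silently remapped:
print(_fix_key("my_layer.gamma"))  # -> "my_layer.weight"
```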
I assumed the same, but it's a pretty annoying bug to have to find on your own. Would it be worth adding a warning to the init method when a parameter name contains one of these strings? It's further complicated by the fact that the match is on substrings, so any parameter name merely containing `gamma` or `beta` is affected.
Hi @malik-ali, thanks for raising this issue! Indeed, this isn't a desired behaviour.
I think this would be very hard to do. There are many saved checkpoints both on and off the hub, as well as all sorts of custom models which might rely on this behaviour.
Yes, I think a warning for a few release cycles is the best way to go. I would put this in the loading logic, where the renaming happens. It won't be possible to tell if the parameter is from an "old" state dict or a new model, but we can warn that the renaming is happening, that the behaviour will be removed in a future release, and that they should update the weights in their state dict to use "weight" or "bias" to be loaded properly. @malik-ali Would you like to open a PR to add this? This way you get the GitHub contribution for your suggested solution.
@amyeroberts I'd be happy to! Just one question: if we add this to the loading code, won't the warning only appear when a checkpoint is loaded? I ask because I ran into this issue after training a model for several days and later loading it. It would have been nice to see the warning before doing all the training, so that I could rename the parameters on the spot. Do you think a warning like that would be feasible? (My fix was to manually rename the keys of the saved state_dict and then rename the parameters in my model.)
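That manual workaround looks roughly like this (a sketch; the checkpoint path and replacement names are illustrative):

```python
import torch

# Load the saved state dict, rename the offending keys so they no longer
# contain "gamma"/"beta", and save it back.
sd = torch.load("checkpoint/pytorch_model.bin", map_location="cpu")
sd = {
    k.replace("gamma", "g_scale").replace("beta", "b_shift"): v
    for k, v in sd.items()
}
torch.save(sd, "checkpoint/pytorch_model.bin")
# ...then rename the matching parameters in the model class the same way.
```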
Good point! In this case, we'll need to add a warning in two places to make sure we catch both new model creations and old state dicts being loaded in.
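As a hypothetical sketch of the two warning sites (the helper name and wording are illustrative, not the actual implementation):

```python
import warnings

def warn_on_gamma_beta(param_names):
    # Hypothetical helper, called both when a model is initialized and when a
    # state dict is loaded, so users are warned before *and* after training.
    for name in param_names:
        for old, new in (("gamma", "weight"), ("beta", "bias")):
            if old in name:
                warnings.warn(
                    f"Parameter '{name}' contains '{old}', which is renamed to "
                    f"'{new}' on load for backwards compatibility. This "
                    "behaviour will be removed in a future release; please "
                    "rename the parameter.",
                    FutureWarning,
                )
```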
+1 Found this problem today...
@amyeroberts I might not have a chance to push a fix for this for at least a few weeks, so please feel free to make any changes as you (or anyone) wish!
@malik-ali OK - thanks for letting us know. I've added a 'Good Difficult Issue' label to flag this for anyone in the community who might want to tackle it in the meantime.
Good question. It took me two full days to track this down. At first, while troubleshooting, I kept thinking it was a problem with my training process. :( Why were warnings added only for the initialization process and not for the renaming during loading as well? The model I'm using is timm's ConvNeXt (timm being a companion library to Transformers), which has a parameter named `gamma`. When loading, it just tells me that `gamma` wasn't loaded successfully, without telling me why. I think the user should be informed when the state_dict keys are renamed; otherwise it causes unnecessary confusion.
Maybe we should revive this and fix the issue once and for all? This is the sort of legacy baggage that really should get cleaned up instead of being ignored for 'backwards compat'. For reference to others in here: I'm bringing it up because it required a special workaround for the TimmWrapper in the past, and it's still breaking timm models that have 'gamma' keys in the TimmBackbone module. It dates back to old BERT models ported to Transformers; I believe @thomwolf was overseeing that? There probably aren't actually that many weight instances in the wild which rely on this mechanism, and there's likely an identifiable signature (key names in the state_dict) of models that actually need the rename. EDIT: By a signature based on key names, I mean an absolute key name like `LayerNorm.gamma`. Also discussing in huggingface/pytorch-image-models#2324
The original BERT weights, including the safetensors versions, include the old keys that need renaming. If you look at the snippet below, covering BERT would be easy to do without impacting other models. Slightly more specific, but it still would make me a bit uneasy; I guess the big question is, are there any models besides BERT that needed this? T5 looks fine, and I couldn't find any other suspects, but I'm not intimately familiar with the old TF LM ports.
Something like this would narrow the scope considerably and probably catch the intended models:

```python
for key in list(state_dict.keys()):
    if key.endswith("LayerNorm.gamma"):
        new_key = key.replace("LayerNorm.gamma", "LayerNorm.weight")
        state_dict[new_key] = state_dict.pop(key)
    elif key.endswith("LayerNorm.beta"):
        new_key = key.replace("LayerNorm.beta", "LayerNorm.bias")
        state_dict[new_key] = state_dict.pop(key)
```
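A quick check of the narrowed rule on a couple of hypothetical keys, wrapping the same logic in a function:

```python
def rename_legacy_layernorm_keys(state_dict):
    # Same logic as the snippet above: rename only suffix-anchored
    # LayerNorm.gamma/LayerNorm.beta keys, leaving custom names alone.
    for key in list(state_dict.keys()):
        for old, new in (("LayerNorm.gamma", "LayerNorm.weight"),
                         ("LayerNorm.beta", "LayerNorm.bias")):
            if key.endswith(old):
                state_dict[key.replace(old, new)] = state_dict.pop(key)
    return state_dict

sd = rename_legacy_layernorm_keys({
    "encoder.layer.0.output.LayerNorm.gamma": 1,  # legacy BERT-style key
    "my_module.gamma": 2,                         # custom parameter
})
print(sorted(sd))  # ['encoder.layer.0.output.LayerNorm.weight', 'my_module.gamma']
```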
Include 'LayerNorm.' in gamma/beta rename scope, reduce number of characters searched on every load considerably.
Thanks so much for fixing this @rwightman! |
@NielsRogge thanks, if you happen to be aware of any other high-risk model weights that might have actually been relying on this rename (vs being constrained by it, heh), let me know. I could really only find BERT (which was what it was added for originally, as far as I could tell). Hopefully we're safe with respect to any lurking regressions...
Include 'LayerNorm.' in gamma/beta rename scope, optimize string search. (huggingface#35615)
* An attempt to fix huggingface#29554. Include 'LayerNorm.' in gamma/beta rename scope, reduce number of characters searched on every load considerably.
* Fix fix on load issue
* Fix gamma/beta warning test
* A style complaint
* Improve efficiency of weight norm key rename. Add better comments about weight norm and layer norm renaming.
* Habitual elif redundant with the return
It seems that you cannot create parameters with the string `gamma` or `beta` in any modules you write if you intend to save/load them with the transformers library. There is a small function called `_fix_key` implemented in the model loading (link). It renames all instances of `beta` or `gamma` in any substring of the state_dict keys to be `bias` and `weight`. This means if your modules actually have a parameter with these names, they won't be loaded when using a pretrained model.

As far as I can tell, it's completely undocumented that people shouldn't create any parameters with the string `gamma` or `beta` in them. Here is a minimal reproducible example:
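Something along these lines reproduces it (a sketch; the class names and save path are illustrative):

```python
import torch
from transformers import PretrainedConfig, PreTrainedModel

class TinyConfig(PretrainedConfig):
    model_type = "tiny"

class TinyModel(PreTrainedModel):
    config_class = TinyConfig

    def __init__(self, config):
        super().__init__(config)
        # The parameter name contains "gamma" -- this is what triggers the bug.
        self.gamma = torch.nn.Parameter(torch.ones(4))

    def forward(self, x):
        return x * self.gamma

model = TinyModel(TinyConfig())
model.save_pretrained("tiny-model")
# On affected versions, the saved "gamma" key is renamed to "weight" on load,
# so it no longer matches the module and "gamma" is freshly initialized.
reloaded = TinyModel.from_pretrained("tiny-model")
```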
When you run this code, you get an error because `gamma` cannot be loaded back from the saved checkpoint.