
Fix parametrization-based weight norm #33275

Merged

Conversation

@ylacombe (Contributor) commented Sep 3, 2024

What does this PR do?

Supersedes #32194 and fixes #31970 and #26796!

While #32194 was already great work, it wasn't compatible with versions of Torch that only have the legacy nn.utils.weight_norm (see the sketch below).

I've left a review to explain some choices and to highlight where I'm not quite sure of my solution!

cc @LysandreJik and @ArthurZucker !
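For context, here's a minimal sketch (not this PR's exact code) of the version dispatch the fix has to accommodate: torch >= 2.1 exposes the parametrization-based nn.utils.parametrizations.weight_norm, while older versions only ship the hook-based nn.utils.weight_norm, and the two serialize their parameters under different state-dict keys:

    import torch.nn as nn

    try:
        # torch >= 2.1: parametrization-based implementation; parameters are
        # saved as parametrizations.weight.original0 (g) / .original1 (v)
        weight_norm = nn.utils.parametrizations.weight_norm
    except AttributeError:
        # older torch: hook-based implementation; parameters are saved as
        # weight_g / weight_v
        weight_norm = nn.utils.weight_norm

    # Either way, callers apply it the same way:
    conv = weight_norm(nn.Conv1d(16, 16, kernel_size=3), name="weight", dim=2)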

@@ -818,7 +818,6 @@ def _move_model_to_meta(model, loaded_state_dict_keys, start_prefix):
def _load_state_dict_into_meta_model(
model,
state_dict,
loaded_state_dict_keys, # left for now but could be removed, see below
ylacombe (Contributor, Author) commented:

Here, I removed loaded_state_dict_keys from _load_state_dict_into_meta_model, because according to the following snippet, it was not actually used before:

        # First part of the test is always true as load_state_dict_keys always contains state_dict keys.
        if param_name not in loaded_state_dict_keys or param_name not in expected_keys:

I might have overlooked some side effects, especially with quantization and/or training frameworks. WDYT @ArthurZucker and @LysandreJik? Who should I tag for more info?

Also happy to revert to the original behaviour.
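For illustration, a sketch of the simplified guard once the redundant argument is gone (names follow the snippet above; not the PR's exact code):

    # loaded_state_dict_keys was built from state_dict itself, so the first
    # membership test above could never fail; only expected_keys matters.
    if param_name not in expected_keys:
        continue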

LysandreJik (Member) commented:

If it doesn't break any tests, let's remove it and keep an eye out for eventual breakage

@@ -818,7 +818,6 @@ def _move_model_to_meta(model, loaded_state_dict_keys, start_prefix):
def _load_state_dict_into_meta_model(
ylacombe (Contributor, Author) commented:

As explained here, the issue doesn't appear during regular loading of the state dict, only during meta-device loading! (See the example below.)
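For reproduction context, a hedged example of the two loading paths (WavLM is the model from the linked issue; the checkpoint name is illustrative):

    from transformers import WavLMModel

    # Regular loading: the state dict is materialized on CPU first; no issue.
    model = WavLMModel.from_pretrained("microsoft/wavlm-base")

    # Meta loading: passing a device_map takes the low_cpu_mem_usage path,
    # where weights are loaded directly onto the target device; this is where
    # the bug appeared.
    model = WavLMModel.from_pretrained("microsoft/wavlm-base", device_map="cuda:0")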

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@LysandreJik (Member) left a comment:

This looks good to me in practice for the affected models; @ArthurZucker, could you give it a second look to confirm or refute?


@amyeroberts (Collaborator) left a comment:

Thanks for fixing @ylacombe!

I'm a bit squeamish about adding remapping in the loading functions, as it has caused issues for "gamma" and "beta", but this seems pretty well controlled and is only likely to hit some weights very rarely.
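For context, a hedged sketch of the kind of key remapping being discussed (the helper name is illustrative, not the PR's actual code):

    def remap_weight_norm_key(key: str) -> str:
        # Legacy hook-based weight norm stores weight_g / weight_v; the
        # parametrization-based version stores them as original0 / original1
        # under a parametrizations.weight prefix.
        if key.endswith("weight_g"):
            return key.replace("weight_g", "parametrizations.weight.original0")
        if key.endswith("weight_v"):
            return key.replace("weight_v", "parametrizations.weight.original1")
        return key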

@ylacombe ylacombe merged commit 18e1a9c into huggingface:main Sep 17, 2024
23 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* refactor weight_norm + propose uniformed solution to reconcile meta load_state_dict with classic loading

* make style

* fix sew

* fix sew and sew_d tests

Successfully merging this pull request may close these issues.

WavLM returns empty hidden states when loaded directly to GPU