🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 #26761
Conversation
The documentation is not available anymore as the PR was closed or merged.
Let's make sure we prevent people from casting an already quantised model, WDYT? It should not be a recommended / desirable use case.
src/transformers/modeling_utils.py
Outdated
# once the weights have been quantized
# Note that once you have loaded a quantized model, you can't change its dtype so this will
# remain a single source of truth
config._quantization_original_dtype = torch_dtype
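For the "prevent casting an already quantized model" point above, a minimal, self-contained sketch of the kind of guard being discussed; the class name, the `is_quantized` flag, and the error message are illustrative, not the actual `modeling_utils.py` code:

```python
import torch
import torch.nn as nn


class QuantizedModelStub(nn.Module):
    """Toy stand-in for a quantized model; names are illustrative only."""

    def __init__(self, is_quantized: bool = True):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.is_quantized = is_quantized  # hypothetical flag set at load time

    def to(self, *args, **kwargs):
        # Refuse dtype casting once the weights have been quantized; device
        # placement alone would still be allowed for GPTQ-style backends.
        dtype_requested = any(isinstance(a, torch.dtype) for a in args) or "dtype" in kwargs
        if self.is_quantized and dtype_requested:
            raise ValueError(
                "Casting the dtype of an already-quantized model is not a supported use case."
            )
        return super().to(*args, **kwargs)


model = QuantizedModelStub()
model.to("cpu")            # device placement only: allowed
# model.to(torch.float16)  # would raise ValueError
```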
Suggested change:
- config._quantization_original_dtype = torch_dtype
+ config._pre_quantization_dtype = torch_dtype
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
In general, LGTM. Let's make sure we don't break the workflow for others, as this is a breaking change (not being able to cast to a dtype after init), and add a 🚨!
# pop the `_pre_quantization_dtype` as torch.dtypes are not serializable.
_ = serializable_config_dict.pop("_pre_quantization_dtype", None)
We pop it because it should not be saved, no?
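For context, a small sketch of why the pop is needed before JSON serialization; the dict here is hand-built for illustration rather than coming from the real `PretrainedConfig.to_dict()`:

```python
import json

import torch

config_dict = {
    "model_type": "llama",
    "hidden_size": 4096,
    "_pre_quantization_dtype": torch.float16,  # a torch.dtype, not JSON-serializable
}

# json.dumps(config_dict) would raise TypeError because of the torch.dtype value,
# so the private attribute is dropped before the config is written to disk.
serializable_config_dict = dict(config_dict)
_ = serializable_config_dict.pop("_pre_quantization_dtype", None)

print(json.dumps(serializable_config_dict, indent=2))
```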
# Note that once you have loaded a quantized model, you can't change its dtype so this will
# remain a single source of truth
Might be needed in the quantizer config?
I think it is ok since users can always load back quantized models with a new torch_dtype, making that `_pre_quantization_dtype` obsolete.
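A usage-style sketch of the point being made here; the checkpoint name is a placeholder, and only the interaction between the `torch_dtype` argument and the private config attribute is intended:

```python
import torch
from transformers import AutoModelForCausalLM

# "some-org/some-gptq-model" stands in for any quantized checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-gptq-model",
    torch_dtype=torch.float16,  # a fresh torch_dtype supplied at load time
)

# Because the attribute is popped before saving, it is set afresh on each load,
# so a stale value from an earlier run cannot leak back in.
print(getattr(model.config, "_pre_quantization_dtype", None))
```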
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Changed the title from "[Quantization] Store the original dtype in the config as a private attribute" to "🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨"
…ate attribute 🚨🚨🚨 (huggingface#26761)
* First step
* fix
* add adjustements for gptq
* change to `_pre_quantization_dtype`
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix serialization
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
First step of an alternative design of #26560
For quantized models, instead of introducing complex logic to retrieve the original weights dtype, I propose to simply add a private attribute `_quantization_original_dtype` to the config object.

The `to` method does not need to be touched here, as `to` cannot be called on quantized models (for GPTQ models you can call `to` to perform device placement only, not dtype casting).

That way we could adapt #26560 to simply check whether the config has the attribute `_quantization_original_dtype`, which is the case only for quantized models; otherwise, retrieve the dtype from the linear layer weights in the classic manner.

cc @LysandreJik
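A rough sketch of how the check described above could look on the #26560 side; the helper name and fallback order are assumptions, not code from either PR (and the attribute was later renamed `_pre_quantization_dtype` during review):

```python
import torch
import torch.nn as nn


def get_reference_dtype(model: nn.Module) -> torch.dtype:
    # Quantized model: trust the dtype stored on the config at load time.
    config = getattr(model, "config", None)
    original_dtype = getattr(config, "_quantization_original_dtype", None)
    if original_dtype is not None:
        return original_dtype
    # Non-quantized model: fall back to inspecting a linear layer's weights.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            return module.weight.dtype
    # Last resort: dtype of the first parameter.
    return next(model.parameters()).dtype
```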