
🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 #26761

Merged
merged 8 commits into huggingface:main from add-orig-dtype on Oct 16, 2023

Conversation

@younesbelkada (Contributor) commented Oct 12, 2023

What does this PR do?

First step of an alternative design to #26560.

For quantized models, instead of introducing complex logic to retrieve the original weights' dtype, I propose to simply add a private attribute `_quantization_original_dtype` to the config object.

The `to` method does not need to be touched here, as `to` cannot be called on quantized models (for GPTQ models you can call `to` for device placement only, not for dtype casting).

That way we could adapt #26560 to simply check whether the config has the attribute `_quantization_original_dtype`, which is the case only for quantized models, and otherwise retrieve the dtype of the linear layer weights in the classic manner.
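
A minimal sketch of that lookup, assuming a hypothetical helper name (the actual #26560 implementation may differ):

```python
import torch
from torch import nn


def infer_model_dtype(model) -> torch.dtype:
    # Hypothetical helper illustrating the lookup order described above.
    config = model.config
    # Quantized models carry the dtype the weights had before quantization.
    if hasattr(config, "_quantization_original_dtype"):
        return config._quantization_original_dtype
    # Otherwise retrieve the dtype of the linear layer weights in the
    # classic manner.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            return module.weight.dtype
    # Fallback: dtype of the first parameter.
    return next(model.parameters()).dtype
```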

cc @LysandreJik

@HuggingFaceDocBuilderDev commented Oct 12, 2023

The documentation is not available anymore as the PR was closed or merged.

@ArthurZucker (Collaborator) left a comment

Let's make sure we prevent people from casting an already quantised model, WDYT? It should not be a recommended / desirable use case.

# once the weights have been quantized
# Note that once you have loaded a quantized model, you can't change its dtype, so this will
# remain a single source of truth
config._quantization_original_dtype = torch_dtype
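
For illustration, a minimal sketch of such a guard, assuming a hypothetical `is_quantized` flag (not the library's actual implementation):

```python
import torch
from torch import nn


class GuardedQuantizedModel(nn.Module):
    is_quantized: bool = True  # hypothetical flag, for illustration only

    def to(self, *args, **kwargs):
        # Detect a dtype passed positionally or as a keyword argument.
        wants_dtype = "dtype" in kwargs or any(
            isinstance(arg, torch.dtype) for arg in args
        )
        if self.is_quantized and wants_dtype:
            raise ValueError(
                "Casting a quantized model to a new dtype is not supported; "
                "only device placement is allowed."
            )
        return super().to(*args, **kwargs)
```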
A collaborator suggested a change:

- config._quantization_original_dtype = torch_dtype
+ config._pre_quantization_dtype = torch_dtype

@ArthurZucker (Collaborator) left a comment

In general LGTM; let's make sure we don't break the workflow for others, since this is a breaking change (not being able to cast to a dtype after init), and add a 🚨!

Comment on lines +857 to +858
# pop the `_pre_quantization_dtype` as torch.dtypes are not serializable.
_ = serializable_config_dict.pop("_pre_quantization_dtype", None)
A collaborator commented:

We pop it because it should not be saved, no?
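
For context, `torch.dtype` objects are not JSON-serializable, so leaving the attribute in would break config saving; a quick illustrative check:

```python
import json

import torch

config_dict = {"model_type": "llama", "_pre_quantization_dtype": torch.float16}

try:
    json.dumps(config_dict)
except TypeError as err:
    print(err)  # e.g. "Object of type dtype is not JSON serializable"

# Popping the private attribute keeps the saved config serializable.
config_dict.pop("_pre_quantization_dtype", None)
print(json.dumps(config_dict))  # {"model_type": "llama"}
```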

Comment on lines +3189 to +3190
# Note that once you have loaded a quantized model, you can't change its dtype so this will
# remain a single source of truth
A collaborator commented:

Might be needed in the quantizer config?

@younesbelkada (Contributor, Author) replied:

I think it is OK, since users can always load quantized models back with a new torch_dtype, making the stored `_pre_quantization_dtype` obsolete.
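
A hedged sketch of that reload path (the checkpoint name is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Reloading a quantized checkpoint with a new torch_dtype refreshes the
# stored attribute, so the previously saved value does not linger.
model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-quantized-model",  # hypothetical quantized checkpoint
    torch_dtype=torch.bfloat16,
)
print(model.config._pre_quantization_dtype)  # torch.bfloat16
```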

@younesbelkada younesbelkada changed the title [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 Oct 13, 2023
@younesbelkada younesbelkada merged commit fd6a0ad into huggingface:main Oct 16, 2023
@younesbelkada younesbelkada deleted the add-orig-dtype branch October 16, 2023 17:56
@SunMarc SunMarc mentioned this pull request Oct 27, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 (huggingface#26761)

* First step

* fix

* add adjustements for gptq

* change to `_pre_quantization_dtype`

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix serialization

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>