🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 #26761
Conversation
The documentation is not available anymore as the PR was closed or merged.
Let's make sure we prevent people from casting an already quantised model, WDYT? It should not be a recommended / desirable use case.
src/transformers/modeling_utils.py
Outdated
# once the weights have been quantized
# Note that once you have loaded a quantized model, you can't change its dtype so this will
# remain a single source of truth
config._quantization_original_dtype = torch_dtype
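For the "prevent casting an already quantized model" point above, a minimal, self-contained sketch of the kind of guard being discussed; the class name, the `is_quantized` flag, and the error message are illustrative, not the actual `modeling_utils.py` code:

```python
import torch
import torch.nn as nn


class QuantizedModelStub(nn.Module):
    """Toy stand-in for a quantized model; names are illustrative only."""

    def __init__(self, is_quantized: bool = True):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.is_quantized = is_quantized  # hypothetical flag set at load time

    def to(self, *args, **kwargs):
        # Refuse dtype casting once the weights have been quantized; device
        # placement alone would still be allowed for GPTQ-style backends.
        dtype_requested = any(isinstance(a, torch.dtype) for a in args) or "dtype" in kwargs
        if self.is_quantized and dtype_requested:
            raise ValueError(
                "Casting the dtype of an already-quantized model is not a supported use case."
            )
        return super().to(*args, **kwargs)


model = QuantizedModelStub()
model.to("cpu")            # device placement only: allowed
# model.to(torch.float16)  # would raise ValueError
```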
Suggested change:
- config._quantization_original_dtype = torch_dtype
+ config._pre_quantization_dtype = torch_dtype
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
In general, LGTM. Let's make sure we don't break the workflow for others, as this is a breaking change (not being able to cast to a dtype after init), and add a 🚨!
# pop the `_pre_quantization_dtype` as torch.dtypes are not serializable.
_ = serializable_config_dict.pop("_pre_quantization_dtype", None)
We pop it because it should not be saved, no?
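For context, a small sketch of why the pop is needed before JSON serialization; the dict here is hand-built for illustration rather than coming from the real `PretrainedConfig.to_dict()`:

```python
import json

import torch

config_dict = {
    "model_type": "llama",
    "hidden_size": 4096,
    "_pre_quantization_dtype": torch.float16,  # a torch.dtype, not JSON-serializable
}

# json.dumps(config_dict) would raise TypeError because of the torch.dtype value,
# so the private attribute is dropped before the config is written to disk.
serializable_config_dict = dict(config_dict)
_ = serializable_config_dict.pop("_pre_quantization_dtype", None)

print(json.dumps(serializable_config_dict, indent=2))
```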
# Note that once you have loaded a quantized model, you can't change its dtype so this will
# remain a single source of truth
Might be needed in the quantizer config?
I think it is ok since users can always load back quantized models with a new torch_dtype, making that `_pre_quantization_dtype` obsolete.
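A usage-style sketch of the point being made here; the checkpoint name is a placeholder, and only the interaction between the `torch_dtype` argument and the private config attribute is intended:

```python
import torch
from transformers import AutoModelForCausalLM

# "some-org/some-gptq-model" stands in for any quantized checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-gptq-model",
    torch_dtype=torch.float16,  # a fresh torch_dtype supplied at load time
)

# Because the attribute is popped before saving, it is set afresh on each load,
# so a stale value from an earlier run cannot leak back in.
print(getattr(model.config, "_pre_quantization_dtype", None))
```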
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Changed the title from "[Quantization] Store the original dtype in the config as a private attribute" to "🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨"
…ate attribute 🚨🚨🚨 (huggingface#26761)
* First step
* fix
* add adjustements for gptq
* change to `_pre_quantization_dtype`
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix serialization
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
First step of an alternative design of #26560
For quantized models, instead of introducing complex logic to retrieve the original weights dtype, I propose to simply add a private attribute `_quantization_original_dtype` to the config object.

The `to` method does not need to be touched here, as `to` cannot be called on quantized models (for GPTQ models you can call `to` to perform device placement only, not dtype casting).

That way we could adapt #26560 to simply check whether the config has the attribute `_quantization_original_dtype`, which is the case only for quantized models; otherwise, retrieve the dtype from the linear layer weights in the classic manner.

cc @LysandreJik
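A rough sketch of how the check described above could look on the #26560 side; the helper name and fallback order are assumptions, not code from either PR (and the attribute was later renamed `_pre_quantization_dtype` during review):

```python
import torch
import torch.nn as nn


def get_reference_dtype(model: nn.Module) -> torch.dtype:
    # Quantized model: trust the dtype stored on the config at load time.
    config = getattr(model, "config", None)
    original_dtype = getattr(config, "_quantization_original_dtype", None)
    if original_dtype is not None:
        return original_dtype
    # Non-quantized model: fall back to inspecting a linear layer's weights.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            return module.weight.dtype
    # Last resort: dtype of the first parameter.
    return next(model.parameters()).dtype
```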