[Awq] Enable the possibility to skip quantization for some target modules #27950
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This PR enables compatibility with Mixtral AWQ!
Very nice! 🔥
I just have a quick q on the test. Otherwise LGTM
self.assertTrue(isinstance(quantized_model.model.decoder.layers[0].self_attn.k_proj, torch.nn.Linear))
self.assertFalse(isinstance(quantized_model.model.decoder.layers[0].self_attn.k_proj, torch.nn.Linear))
I might not be spotting the difference, but the conditions here look the same to me: how can they be both true and false?
Ah, nice catch, yes! Actually v_proj should be a quantized linear (hence not an nn.Linear); changed the test a bit!
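For reference, a minimal sketch of what the corrected assertions might look like (assumptions: k_proj is among the skipped modules and v_proj is quantized; this is not the exact diff from the PR):

```python
# Sketch only: k_proj was skipped from quantization, so it stays a plain
# nn.Linear, while v_proj was quantized and is therefore no longer an nn.Linear.
self.assertTrue(isinstance(quantized_model.model.decoder.layers[0].self_attn.k_proj, torch.nn.Linear))
self.assertFalse(isinstance(quantized_model.model.decoder.layers[0].self_attn.v_proj, torch.nn.Linear))
```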
Thanks for adding this capability!
Thanks a lot for the review @amyeroberts!
Release has been done! Merging!
[Awq] Enable the possibility to skip quantization for some target modules (huggingface#27950)
* v1
* add docstring
* add tests
* add awq 0.1.8
* oops
* fix test
What does this PR do?
Adds the possibility to load AWQ models when some modules of the model are skipped for quantization.
For example, for Whisper, Llava, and Mixtral, we respectively don't want to quantize the encoder, the vision encoder, and the gate layer, to ensure inference stability. A sketch of the user-facing API follows below.
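A minimal sketch of how this could look from the user side, assuming the new `modules_to_not_convert` argument on `AwqConfig` (the checkpoint name below is hypothetical):

```python
# Sketch: load a Mixtral AWQ checkpoint while keeping the MoE gate layers
# un-quantized. Both the checkpoint name and the module pattern are illustrative.
from transformers import AutoModelForCausalLM, AwqConfig

quantization_config = AwqConfig(
    bits=4,
    modules_to_not_convert=["gate"],  # modules matching "gate" stay as nn.Linear
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/Mixtral-8x7B-AWQ",  # hypothetical pre-quantized AWQ checkpoint
    quantization_config=quantization_config,
    device_map="auto",
)
```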
Let's merge it once AWQ makes the 0.1.8 release
cc @ArthurZucker @casper-hansen @TheBloke @SunMarc
casper-hansen/AutoAWQ#248
This PR also makes it possible to run multi-modal models with AWQ:
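For example, a rough sketch of running a multi-modal (Llava-style) AWQ checkpoint whose vision encoder was left un-quantized (the checkpoint name is hypothetical):

```python
# Sketch: generate with a Llava AWQ checkpoint. Requires a PIL image supplied
# by the user; the checkpoint name is illustrative, not an official one.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "some-org/llava-1.5-7b-awq"  # hypothetical AWQ checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

image = Image.open("example.jpg")
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```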