[Awq] Enable the possibility to skip quantization for some target modules #27950
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This PR enables compatibility with Mixtral AWQ!
Very nice! 🔥
I just have a quick q on the test. Otherwise LGTM
self.assertTrue(isinstance(quantized_model.model.decoder.layers[0].self_attn.k_proj, torch.nn.Linear))
self.assertFalse(isinstance(quantized_model.model.decoder.layers[0].self_attn.k_proj, torch.nn.Linear))
I might not be spotting the difference, but the conditions here look the same to me: how can they be both true and false?
Ah, nice catch, yes! Actually v_proj should be a quantized linear (hence not an nn.Linear); changed the test a bit!
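For reference, a minimal sketch of what the corrected assertions might look like (assumptions: k_proj is among the skipped modules and v_proj is quantized; this is not the exact diff from the PR):

```python
# Sketch only: k_proj was skipped from quantization, so it stays a plain
# nn.Linear, while v_proj was quantized and is therefore no longer an nn.Linear.
self.assertTrue(isinstance(quantized_model.model.decoder.layers[0].self_attn.k_proj, torch.nn.Linear))
self.assertFalse(isinstance(quantized_model.model.decoder.layers[0].self_attn.v_proj, torch.nn.Linear))
```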
Thanks for adding this capability!
Thanks a lot for the review @amyeroberts!
Release has been done! Merging!
[Awq] Enable the possibility to skip quantization for some target modules (huggingface#27950)
* v1
* add docstring
* add tests
* add awq 0.1.8
* oops
* fix test
What does this PR do?
Adds the possibility to load AWQ models when some modules of the model are skipped for quantization.
For example, for Whisper, Llava, and Mixtral, we respectively don't want to quantize the encoder, the vision encoder, and the gate layer, to ensure inference stability. A sketch of the user-facing API follows below.
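A minimal sketch of how this could look from the user side, assuming the new `modules_to_not_convert` argument on `AwqConfig` (the checkpoint name below is hypothetical):

```python
# Sketch: load a Mixtral AWQ checkpoint while keeping the MoE gate layers
# un-quantized. Both the checkpoint name and the module pattern are illustrative.
from transformers import AutoModelForCausalLM, AwqConfig

quantization_config = AwqConfig(
    bits=4,
    modules_to_not_convert=["gate"],  # modules matching "gate" stay as nn.Linear
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/Mixtral-8x7B-AWQ",  # hypothetical pre-quantized AWQ checkpoint
    quantization_config=quantization_config,
    device_map="auto",
)
```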
Let's merge it once AWQ makes the 0.1.8 release
cc @ArthurZucker @casper-hansen @TheBloke @SunMarc
casper-hansen/AutoAWQ#248
This PR also makes it possible to run multi-modal models with AWQ:
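For example, a rough sketch of running a multi-modal (Llava-style) AWQ checkpoint whose vision encoder was left un-quantized (the checkpoint name is hypothetical):

```python
# Sketch: generate with a Llava AWQ checkpoint. Requires a PIL image supplied
# by the user; the checkpoint name is illustrative, not an official one.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "some-org/llava-1.5-7b-awq"  # hypothetical AWQ checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

image = Image.open("example.jpg")
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```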