
[Awq] Enable the possibility to skip quantization for some target modules #27950

Merged: 11 commits into huggingface:main on Dec 25, 2023

Conversation

@younesbelkada (Contributor) commented Dec 11, 2023

What does this PR do?

Adds the possibility to load AWQ models in which some modules are skipped for quantization.
E.g., for Whisper, Llava, and Mixtral, we respectively don't want to quantize the encoder, the vision encoder, and the gate layer, in order to ensure inference stability.
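
For illustration, a minimal sketch of how the skipped modules could be declared (assuming the modules_to_not_convert argument this PR adds to AwqConfig; "gate" matches the Mixtral case above):

from transformers import AwqConfig

# Any linear layer whose name matches an entry in modules_to_not_convert
# is kept as a regular torch.nn.Linear instead of being replaced by a
# quantized linear layer.
quantization_config = AwqConfig(
    bits=4,
    modules_to_not_convert=["gate"],
)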

Let's merge it once AWQ makes the 0.1.8 release.

cc @ArthurZucker @casper-hansen @TheBloke @SunMarc

casper-hansen/AutoAWQ#248

This PR also makes it possible to run multi-modal models with AWQ:

from transformers import pipeline
from PIL import Image
import requests

model_id = "ybelkada/llava-1.5-7b-hf-awq"
pipe = pipeline("image-to-text", model=model_id, device=0)

# Fetch an example image and build a Llava-style prompt around the <image> token.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-neg.png"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nCan you please describe this image?\nASSISTANT:"

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 100})
print(outputs[0]["generated_text"])

Output:

USER: \nCan you please describe this image?\nASSISTANT: The image features a brown and white cat sitting on a green surface, possibly a carpet or a grassy area. The cat is holding a red ball in its paws, seemingly playing with it. The cat appears to be focused on the ball, possibly preparing to play or just enjoying the toy.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@younesbelkada younesbelkada marked this pull request as ready for review December 22, 2023 13:51
@younesbelkada (Contributor Author)

This PR enables compatibility with Mixtral AWQ!
With casper-hansen/AutoAWQ#251 merged, this PR is ready for review 🙏
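
For context, loading such a checkpoint should then look like any other from_pretrained call (a sketch; the model id below is a placeholder, and the skipped modules are assumed to be recorded in the checkpoint's quantization config):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "someuser/mixtral-8x7b-instruct-awq"  # placeholder checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))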

@amyeroberts (Collaborator) left a comment

Very nice! 🔥

I just have a quick question on the test. Otherwise LGTM

Comment on lines 235 to 236
self.assertTrue(isinstance(quantized_model.model.decoder.layers[0].self_attn.k_proj, torch.nn.Linear))
self.assertFalse(isinstance(quantized_model.model.decoder.layers[0].self_attn.k_proj, torch.nn.Linear))
@amyeroberts (Collaborator)

I might not be spotting the difference, but the conditions here look the same to me: how can the same check be both true and false?

@younesbelkada (Contributor Author)

Ah, nice catch, yes! Actually v_proj should be a quantized linear (hence not an nn.Linear); I changed the test a bit!
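
Presumably the corrected pair then reads along these lines (a sketch, assuming k_proj is among the skipped modules so it stays a plain nn.Linear, while v_proj is quantized and therefore no longer an nn.Linear):

self.assertTrue(isinstance(quantized_model.model.decoder.layers[0].self_attn.k_proj, torch.nn.Linear))
self.assertFalse(isinstance(quantized_model.model.decoder.layers[0].self_attn.v_proj, torch.nn.Linear))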

@amyeroberts (Collaborator) left a comment

Thanks for adding this capability!

@younesbelkada (Contributor Author) commented Dec 22, 2023

Thanks a lot for the review @amyeroberts !
I will merge this as soon as AutoAWQ makes the 0.1.8 Mixtral release. cc @casper-hansen, just for your information.

@younesbelkada (Contributor Author)

Release has been done! Merging!

@younesbelkada younesbelkada merged commit fa21ead into huggingface:main Dec 25, 2023
21 checks passed
@younesbelkada younesbelkada deleted the quant-add-skip-modules branch December 25, 2023 10:17
Saibo-creator pushed a commit to epfl-dlab/transformers-GCD-PR that referenced this pull request Jan 4, 2024:
[Awq] Enable the possibility to skip quantization for some target modules (huggingface#27950)

* v1

* add docstring

* add tests

* add awq 0.1.8

* oops

* fix test