
Implement Half-Quadratic Quantization (HQQ) #28328

Closed
michaelfeil opened this issue Jan 3, 2024 · 7 comments
Labels
Feature request Request for a new feature

Comments

michaelfeil (Contributor) commented Jan 3, 2024

Feature request

I would be curious whether https://github.com/mobiusml/hqq could be supported in a similar fashion to AutoGPTQ or AutoAWQ. HQQ is most similar to the bitsandbytes nf4/fp4 data types, but offers 2-, 3-, 4-, and 8-bit quantization.
CC: @mobicham

Motivation

HQQ performs 2/3/4-bit quantization and can act as a drop-in replacement. It is fast at quantizing in place (i.e., weights that are not pre-quantized) and, similar to bnb, expands the weights to fp16 (or similar) at runtime.

It would be cool to support models like Mixtral and cut down their VRAM requirement.
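To make the idea above concrete, here is a minimal sketch of group-wise low-bit quantization with runtime expansion back to fp16. Note this uses a plain min-max affine scheme for illustration; HQQ itself fits the zero-point with a half-quadratic solver, and none of the names below are part of HQQ's actual API:

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    """Group-wise affine 4-bit quantization: w ~= scale * (q - zero).

    Illustrative min-max scheme, not HQQ's half-quadratic solver.
    """
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 4 bits -> 16 levels (0..15)
    zero = -w_min / scale                     # zero-point per group
    q = np.clip(np.round(w / scale + zero), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    """Expand the packed integers back to fp16, as done on the fly at runtime."""
    return ((q.astype(np.float32) - zero) * scale).astype(np.float16)

np.random.seed(0)
w = np.random.randn(128, 64).astype(np.float32)
q, scale, zero = quantize_4bit(w)
w_hat = dequantize(q, scale, zero).reshape(128, 64)
max_err = np.abs(w - w_hat).max()             # bounded by ~scale / 2 per group
```

The per-element reconstruction error is bounded by half the group's step size (`scale / 2`), which is why smaller group sizes trade a little extra storage for better accuracy.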

Your contribution

I currently have no capacity to submit an integration myself, but I am happy to review or assist.

ArthurZucker added the Feature request label Jan 3, 2024
ArthurZucker (Collaborator) commented:

cc @younesbelkada

younesbelkada (Contributor) commented:

This is very cool! We are definitely interested in adding HQQ inference support in transformers. The nice thing is that, indeed, it seems you don't need to pre-quantize the weights in order to quantize the models. We'll explore a bit on our side and let you know how it goes.
cc @SunMarc @Titus-von-Koeller

mobicham (Contributor) commented Jan 8, 2024

Hi! I am the maintainer of the HQQ project, happy to assist with anything needed!

younesbelkada (Contributor) commented:

Very glad to e-meet you @mobicham! Do you have an email I can use so that we can contact you through Slack and iterate quickly?

mobicham (Contributor) commented Jan 9, 2024

Glad to e-meet you @younesbelkada as well! Sure: hicham@mobiuslabs.com

younesbelkada (Contributor) commented:

Thanks @mobicham, you should have received an invite by now!

younesbelkada (Contributor) commented:

Closing as HQQ is now part of the release!
