Implement Half-Quadratic Quantization (HQQ) #28328
Comments
This is very cool! We are definitely interested in adding HQQ inference support in transformers. The nice thing is that indeed it seems you don't need to pre-quantize the weights in order to quantize the models. We'll explore a bit on our side and let you know how it goes.
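For reference, here is a minimal sketch of what that on-the-fly quantization looks like with the standalone hqq library; the API names (`BaseQuantizeConfig`, `HQQLinear`) and their arguments are assumed from the mobiusml/hqq README, so treat this as illustrative rather than definitive:

```python
# Sketch of on-the-fly quantization with the standalone hqq library
# (BaseQuantizeConfig / HQQLinear assumed from the mobiusml/hqq README).
import torch
import torch.nn as nn
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

# A regular fp16 linear layer, e.g. taken from an already loaded model.
linear = nn.Linear(4096, 4096, bias=False).half()

# Quantize it directly from the fp16 tensor to 4-bit with group-wise
# scales/zero-points; no pre-quantized checkpoint is needed.
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
hqq_linear = HQQLinear(linear, quant_config=quant_config,
                       compute_dtype=torch.float16, device="cuda")

# The quantized layer is used as a drop-in replacement for nn.Linear.
x = torch.randn(1, 4096, dtype=torch.float16, device="cuda")
y = hqq_linear(x)
```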
Hi! I am the maintainer of the HQQ project, happy to assist with anything needed!
Very glad to e-meet you @mobicham! Do you have an email I can use so that we can contact you through Slack to iterate quickly?
Glad to e-meet you @younesbelkada as well! Sure: hicham@mobiuslabs.com
Thanks @mobicham, you should have received an invite by now!
Closing as HQQ is now part of the release!
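For anyone landing here later, a minimal sketch of what the released integration looks like on the user side, assuming the `HqqConfig` API described in the transformers quantization docs (the Mixtral model id is only an example):

```python
# Sketch of loading a model with HQQ quantization via transformers
# (HqqConfig parameters assumed from the transformers quantization docs).
import torch
from transformers import AutoModelForCausalLM, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",  # example model id
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,         # weights are quantized on load
)
```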
Feature request
I would be curious whether https://github.com/mobiusml/hqq can be supported in a similar fashion to `autogptq` or `autoawq`. hqq is most similar to the `bitsandbytes` nf4/fp4 data types, but offers 2/3/4/8-bit quantization.

CC: @mobicham
Motivation
HQQ performs 2/3/4-bit quantization and can act as a drop-in replacement. It is fast for in-place quantization of non-pre-quantized weights and, similarly to bnb, performs an expansion to fp16 at runtime (or similar); see the sketch below.
It would be cool to support this for models like Mixtral to cut down the VRAM requirement.
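To illustrate the runtime expansion mentioned above, here is a sketch of plain group-wise affine (de)quantization back to fp16. This shows only the general storage/expansion idea; HQQ's actual contribution, optimizing the zero-point/scale with a half-quadratic objective, is deliberately not reproduced here:

```python
# Illustrative group-wise affine (de)quantization only; HQQ's solver that
# optimizes the zero-point via a half-quadratic objective is not shown.
import torch

def quantize_group(w: torch.Tensor, nbits: int = 4):
    """Quantize one weight group to unsigned nbits ints plus a scale and zero-point."""
    qmax = 2 ** nbits - 1
    scale = (w.max() - w.min()) / qmax
    zero = -w.min() / scale
    w_q = torch.clamp(torch.round(w / scale + zero), 0, qmax).to(torch.uint8)
    return w_q, scale, zero

def dequantize_group(w_q, scale, zero):
    """Runtime expansion back to fp16: W ~= (W_q - zero) * scale."""
    return ((w_q.to(torch.float16) - zero) * scale).to(torch.float16)

w = torch.randn(64)                           # one group of weights
w_q, scale, zero = quantize_group(w)          # store low-bit ints + metadata
w_fp16 = dequantize_group(w_q, scale, zero)   # expanded on the fly at matmul time
```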
Your contribution
I currently have no capacity for submitting an integration, but I am happy to review or assist.