Feature Request: Support GPTQ (Quotes: GPTQModel 4bit can match BF16) #11024
Comments
Just like imatrix.
llama.cpp has the same technique and, to my understanding, in two different ways: either the imatrix quants, since they are generated based off a dataset, or AWQ-converted quants. Thankfully on llama.cpp's side it's optional, since that saves so much time and hassle when you just want to make a quick quant. It does mean that if you want this, you need to be looking at GGUFs with an imatrix. I know mradermacher uses i1 to indicate that; with other repackers you will have to check whether an imatrix.dat was bundled or whether it's mentioned in the description. You only need an imatrix.dat to make GGUFs, not to use them, so it's not always there, but if one is there it's a good indicator. A sketch of the workflow is below.
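For concreteness, here is a minimal sketch of the imatrix workflow using llama.cpp's own tools. The model and calibration file names are placeholders, and the binary names reflect recent llama.cpp builds (they were formerly named `imatrix` and `quantize`):

```bash
# Generate an importance matrix from a calibration text file
# (model-f16.gguf and calibration.txt are placeholder names).
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize using the importance matrix; Q4_K_M is one common target type.
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

The second step is the same as a plain quantize; the `--imatrix` flag is what makes the resulting GGUF dataset-calibrated.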
Alright... so can we just run the existing GPTQ models from Hugging Face, etc.?
No, because those are different formats for a different, older quant method. The same models almost always have imatrixed GGUFs available, and those are what you want to look for.
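If the description doesn't say, one way to check whether a downloaded GGUF was made with an imatrix is to inspect its metadata. To my understanding, recent versions of llama-quantize record the imatrix file and dataset under keys like `quantize.imatrix.file` (an assumption about current builds; older files won't have these keys):

```bash
# gguf-dump ships with the Python `gguf` package from llama.cpp's gguf-py.
pip install gguf

# Dump the key/value metadata and look for imatrix-related keys
# (model-q4_k_m.gguf is a placeholder name).
gguf-dump --no-tensors model-q4_k_m.gguf | grep -i imatrix
# On imatrix-aware files, recent llama.cpp builds record keys such as:
#   quantize.imatrix.file, quantize.imatrix.dataset
```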
Feature Description
Hello,
Great work here :D
Is it possible for llama.cpp to support GPTQ-quantized models?
A GPTQ-quantized model has the advantage that it was calibrated with a dataset, which I think is a good reason to support GPTQ:
Quotes from the GPTQModel repo:
https://github.com/ModelCloud/GPTQModel
Motivation
A GPTQ-quantized model has the advantage that it can be calibrated with a dataset, and it remains usable for further fine-tuning.
Possible Implementation
No response