Feature Request: Support GPTQ (Quotes: GPTQModel 4bit can match BF16) #11024

Open
BodhiHu opened this issue Dec 31, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@BodhiHu

BodhiHu commented Dec 31, 2024

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Hello,

Great work here :D

Would it be possible for llama.cpp to support GPTQ quantized models?
GPTQ quantized models have the advantage of being calibrated with a dataset, which I think is a good reason to support GPTQ:

Quotes from the GPTQModel repo:

Quality: GPTQModel 4bit can match BF16:

https://github.com/ModelCloud/GPTQModel

Motivation

GPTQ quantized models have the advantage that they are calibrated with a dataset, and they can also be used for further fine-tuning.

Possible Implementation

No response

@BodhiHu BodhiHu added the enhancement New feature or request label Dec 31, 2024
@BodhiHu BodhiHu changed the title Feature Request: Support GPTQ Feature Request: Support GPTQ (Quality: GPTQModel 4bit can match BF16) Dec 31, 2024
@BodhiHu BodhiHu changed the title Feature Request: Support GPTQ (Quality: GPTQModel 4bit can match BF16) Feature Request: Support GPTQ (Quotes: GPTQModel 4bit can match BF16) Dec 31, 2024
@BodhiHu
Author

BodhiHu commented Dec 31, 2024

Hi, I noticed there were some GPTQ converter PRs merged before:

#301
#423

But it seems they have been deleted from the master branch? 🤓

@ExtReMLapin
Contributor

> GPTQ quantized models have the advantage of being calibrated with a dataset, which I think is a good reason to support GPTQ:

Just like imatrix

@henk717

henk717 commented Dec 31, 2024

Llamacpp has the same techniques, and to my understanding in two different ways: either the imatrix quants, since they are generated from a dataset, or AWQ-converted quants.

Thankfully, on Llamacpp's side it's optional, since that saves so much time and hassle when you just want to make a quick quant. It does mean that if you want this, you need to be looking at GGUFs made with an imatrix. I know mrademacher uses i1 to indicate that; for other repackers you will have to check whether an imatrix.dat was bundled or whether it's mentioned in the description. You only need an imatrix.dat to make GGUFs, not to use them, so it's not always there, but if one is there it's a good indicator.
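For reference, a minimal sketch of the imatrix workflow described above, using the llama.cpp command-line tools llama-imatrix and llama-quantize (the file names and quant type here are placeholders, and exact flags may differ between builds):

```sh
# 1. Generate an importance matrix from a calibration text file
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize the FP16 GGUF using that importance matrix
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

As noted above, the imatrix.dat is only needed when producing the quant; the resulting GGUF runs in llama.cpp like any other.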

@BodhiHu
Author

BodhiHu commented Jan 2, 2025

Alright... so can we just run the existing GPTQ models from Hugging Face, etc.?

@henk717

henk717 commented Jan 2, 2025

No, because those are different formats for a different, older quant method. The same models almost always have imatrixed GGUFs available, and those are what you want to look for.

@sorasoras

> Hi, I noticed there were some GPTQ converter PRs merged before:
>
> #301 #423
>
> But it seems they have been deleted from the master branch? 🤓

It was replaced by imatrix, and that's why it was deleted.
