Feature Request: Support GPTQ (Quotes: GPTQModel 4bit can match BF16) #11024
Comments
Just like imatrix.
llama.cpp has the same technique and, to my understanding, in two different ways: either the imatrix quants, since they are generated based off a dataset, or AWQ-converted quants. Thankfully on llama.cpp's side it's optional, since that saves so much time and hassle when you just want to make a quick quant. It does mean that if you want this, you need to be looking at GGUFs with an imatrix. I know mradermacher uses i1 to indicate that; with other repackers you will have to check whether an imatrix.dat was bundled or whether it's mentioned in the description. You only need an imatrix.dat to make GGUFs, not to use them, so it's not always there, but if one is there it's a good indicator. A sketch of the workflow is below.
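For concreteness, here is a minimal sketch of the imatrix workflow using llama.cpp's own tools. The model and calibration file names are placeholders, and the binary names reflect recent llama.cpp builds (they were formerly named `imatrix` and `quantize`):

```bash
# Generate an importance matrix from a calibration text file
# (model-f16.gguf and calibration.txt are placeholder names).
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize using the importance matrix; Q4_K_M is one common target type.
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

The second step is the same as a plain quantize; the `--imatrix` flag is what makes the resulting GGUF dataset-calibrated.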
Alright... so can we just run the existing GPTQ models from Hugging Face, etc.?
No, because those are different formats for a different, older quant method. The same models almost always have imatrixed GGUFs available, and those are what you want to look for.
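If the description doesn't say, one way to check whether a downloaded GGUF was made with an imatrix is to inspect its metadata. To my understanding, recent versions of llama-quantize record the imatrix file and dataset under keys like `quantize.imatrix.file` (an assumption about current builds; older files won't have these keys):

```bash
# gguf-dump ships with the Python `gguf` package from llama.cpp's gguf-py.
pip install gguf

# Dump the key/value metadata and look for imatrix-related keys
# (model-q4_k_m.gguf is a placeholder name).
gguf-dump --no-tensors model-q4_k_m.gguf | grep -i imatrix
# On imatrix-aware files, recent llama.cpp builds record keys such as:
#   quantize.imatrix.file, quantize.imatrix.dataset
```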
Feature Description
Hello,
Great work here :D
Is it possible for llama.cpp to support GPTQ-quantized models?
A GPTQ-quantized model has the advantage that it was calibrated with a dataset, which I think is a good reason to support GPTQ:
Quotes from the GPTQModel repo:
https://github.com/ModelCloud/GPTQModel
Motivation
A GPTQ-quantized model has the advantage that it can be calibrated with a dataset, and it remains usable for further fine-tuning.
Possible Implementation
No response