Quality of 4-bit quantization #62
Comments
Do you have examples showing the poor quality here and higher quality with other quantization methods? Are you sure the hyperparameters are all the same?
I agree that this seems like the single biggest "bang for the buck" of quality improvement versus effort. Also, I wonder whether, using this technique and the 65B model, you could get down to 3 bits or even 2 bits (say, only for the last 25% of the layers?). On top of that, using some kind of high-speed streaming compression (zstd, for example), the quantized weights could perhaps be reduced even further, which might help with model load speed (assuming you are I/O bound rather than compute bound during loading). I wish I knew more C++ to help with this.
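For what it's worth, here is a minimal sketch of the zstd idea using libzstd's one-shot API. The function and variable names are purely illustrative and not part of llama.cpp; this assumes the weights have already been quantized and packed into a byte buffer.

```cpp
// Sketch: compress a buffer of already-quantized (packed) weights with zstd
// before writing it to disk. Requires linking against libzstd (-lzstd).
#include <zstd.h>

#include <cstdint>
#include <cstdio>
#include <vector>

// Returns the zstd-compressed bytes of `quantized`, or an empty vector on error.
std::vector<uint8_t> compress_weights(const std::vector<uint8_t> &quantized, int level = 3) {
    std::vector<uint8_t> out(ZSTD_compressBound(quantized.size()));
    const size_t n = ZSTD_compress(out.data(), out.size(),
                                   quantized.data(), quantized.size(), level);
    if (ZSTD_isError(n)) {
        std::fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(n));
        return {};
    }
    out.resize(n);  // shrink to the actual compressed size
    return out;
}
```

Loading would reverse this with `ZSTD_decompress`. Whether it pays off depends on the assumption in the comment above that loading is I/O bound: packed quantized weights are fairly high-entropy, so the size reduction (and therefore any load-time win) may be modest.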
No, unfortunately I didn't try to set the parameters to be exactly the same, but the output of the 4-bit quantized 65B model in llama.cpp was obviously rubbish and I wasn't able to get good results from it.
Better quantization will be added in the future (#9).
The quality of the 4-bit quantization is really abysmal compared to both non-quantized models and GPTQ quantization
(https://github.com/qwopqwop200/GPTQ-for-LLaMa). Wouldn't it make sense for llama.cpp to support loading pre-quantized LLaMA models?
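For context on what "the 4-bit quantization" refers to here, below is a rough, illustrative sketch of round-to-nearest block quantization with one scale per block of weights. It shows the general style of scheme in question, not llama.cpp's actual code; the block size, struct layout, and function name are assumptions.

```cpp
// Illustrative round-to-nearest 4-bit block quantization: each block of QK
// weights shares a single fp32 scale, and each weight is rounded independently
// to one of 16 levels. Assumed layout, not the actual llama.cpp format.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int QK = 32;               // assumed block size

struct BlockQ4 {
    float   scale;                   // per-block scale factor
    uint8_t nibbles[QK / 2];         // two 4-bit values packed per byte
};

// Quantize n floats (n must be a multiple of QK).
std::vector<BlockQ4> quantize_q4(const float *x, int n) {
    std::vector<BlockQ4> blocks(n / QK);
    for (int b = 0; b < n / QK; ++b) {
        const float *xb = x + b * QK;
        float amax = 0.0f;
        for (int i = 0; i < QK; ++i) amax = std::max(amax, std::fabs(xb[i]));
        const float scale = amax / 7.0f;              // map values into roughly [-7, 7]
        const float inv   = scale > 0.0f ? 1.0f / scale : 0.0f;
        blocks[b].scale = scale;
        for (int i = 0; i < QK; i += 2) {
            // round each weight to the nearest level, store as an unsigned nibble
            int q0 = std::clamp((int)std::lround(xb[i]     * inv) + 8, 0, 15);
            int q1 = std::clamp((int)std::lround(xb[i + 1] * inv) + 8, 0, 15);
            blocks[b].nibbles[i / 2] = (uint8_t)(q0 | (q1 << 4));
        }
    }
    return blocks;
}
```

In a scheme like this, every weight in a block is rounded independently against a single scale, so outliers inflate the scale and wash out the smaller weights. GPTQ instead chooses the quantized values to minimize the layer's output error using approximate second-order information, which is largely why it tends to hold up better at 4 bits.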