Regression in output of quantized Huginn-22b-Prototype #3040

Closed
cebtenzzre opened this issue Sep 6, 2023 · 6 comments
Labels: bug (Something isn't working), generation quality (Quality of model output)

Comments

cebtenzzre (Collaborator) commented Sep 6, 2023

Tested model is Huginn-22b-Prototype.

This is the output of a q4_0 model converted to GGJTv3 around two weeks ago. I believe it was converted and quantized on commit 1f0bccb.

$ ./main -ngl 100 -n 50 --ignore-eos -m huginn-22b-prototype.ggmlv3.q4_0.bin -p 'This is a story about a quick brown fox.'
<snip>
llama_model_load_internal: format     = ggjt v3 (latest)
<snip>
 This is a story about a quick brown fox.
The fox was not, in fact, named Brown. She was a young vixen and her fur was red, with the occasional white ear-tipper. She was small and lean, suited to life in the wilds of

This is the output of a q4_0 model converted to GGUF yesterday on commit 2ba85c8 and quantized today on commit 9912b9e.

$ ./main -ngl 100 -n 50 --ignore-eos -m huginn-22b-prototype.q4_0.gguf -p 'This is a story about a quick brown fox.'
<snip>
llm_load_print_meta: format         = GGUF V2 (latest)
<snip>
 This is a story about a quick brown fox. In case the title'' of the animal remfined't us' -here' the youngest of the sister' H'sing' with' the elder' the young' un' -' H'are'tudes' the

It's not total gibberish, but it's quite broken output compared to the original.

I never ran into #2982 until GGUF, so I'd guess something in the convert script is causing the output tensor to have blocks of zeros where there should be data?

edit: converting the q4_0 from GGJTv3 to GGUF with ./convert-llama-ggmlv3-to-gguf.py --eps 1e-5 -c 4096 --metadata-dir ... produces a functioning model.

cebtenzzre added the bug and generation quality labels on Sep 6, 2023
cebtenzzre (Collaborator, Author) commented Sep 6, 2023

Converting from GGJTv1 f16 to GGUF f16 before quantizing with the latest commit does not produce a functioning model. So there is most likely a regression in the quantization code.

edit: The model quantizes fine using only the latest convert.py and quantize if I use --leave-output-tensor.
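
For reference, a rough sketch of that working path (file names and paths here are illustrative, not the exact ones used above):

# convert the original model to an f16 GGUF with the latest convert.py
python convert.py /path/to/Huginn-22b-Prototype --outtype f16 --outfile huginn-22b-prototype.f16.gguf
# quantize to q4_0, but leave output.weight unquantized
./quantize --leave-output-tensor huginn-22b-prototype.f16.gguf huginn-22b-prototype.q4_0.gguf q4_0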

cebtenzzre (Collaborator, Author) commented:

@ikawrakow This appears to be a k-quants issue. If I modify the quantization strategy to use Q8_0 for the output tensor, the model produces correct output. There are no assertion failures, but something is still wrong.

KerfuffleV2 (Collaborator) commented:

converting the q4_0 from GGJTv3 to GGUF with ./convert-llama-ggmlv3-to-gguf.py --eps 1e-5 -c 4096 --metadata-dir ... produces a functioning model.

This should rule out metadata/vocab issues. (By the way, --eps and -c aren't used if you specify --metadata-dir.)

You mentioned the conversion that worked was q4_0; have you also tried quantizing to q4_0 from the f16 model?

ikawrakow (Contributor) commented:

@cebtenzzre There have been no changes to the k_quants quantization code other than adding guards against meaningless weights produced by buggy conversion scripts (#3010). So, if using Q6_K for the output.weight tensor does not work for this specific model, it is definitely not due to a regression in the k_quants quantization.

Did you try running the quantize-stats tool? If not, please use

./quantize-stats -m your_f16_model.gguf -t q6_K -p
./quantize-stats -m your_f16_model.gguf -t q8_0 -p

and post the output for q6_K::output.weight and q8_0::output.weight.

cebtenzzre (Collaborator, Author) commented:

For what it's worth, I do hit the assertion on dadbed9 (pre-GGUF); I just didn't notice it before because assertions are disabled by default:

quantize: k_quants.c:53: nearest_int: Assertion `fval <= 4194303.f' failed.
quantize: k_quants.c:53: nearest_int: Assertion `fval <= 4194303.f' failed.

cebtenzzre (Collaborator, Author) commented:

Fixed by 178b185.
