Regression in output of quantized Huginn-22b-Prototype #3040

Closed
cebtenzzre opened this issue Sep 6, 2023 · 6 comments
Labels: bug (Something isn't working), generation quality (Quality of model output)

Comments

cebtenzzre (Collaborator) commented Sep 6, 2023

Tested model is Huginn-22b-Prototype.

This is the output of a q4_0 model converted to GGJTv3 around two weeks ago. I believe it was converted and quantized on commit 1f0bccb.

$ ./main -ngl 100 -n 50 --ignore-eos -m huginn-22b-prototype.ggmlv3.q4_0.bin -p 'This is a story about a quick brown fox.'
<snip>
llama_model_load_internal: format     = ggjt v3 (latest)
<snip>
 This is a story about a quick brown fox.
The fox was not, in fact, named Brown. She was a young vixen and her fur was red, with the occasional white ear-tipper. She was small and lean, suited to life in the wilds of

This is the output of a q4_0 model converted to GGUF yesterday on commit 2ba85c8 and quantized today on commit 9912b9e.

$ ./main -ngl 100 -n 50 --ignore-eos -m huginn-22b-prototype.q4_0.gguf -p 'This is a story about a quick brown fox.'
<snip>
llm_load_print_meta: format         = GGUF V2 (latest)
<snip>
 This is a story about a quick brown fox. In case the title'' of the animal remfined't us' -here' the youngest of the sister' H'sing' with' the elder' the young' un' -' H'are'tudes' the

It's not total gibberish, but it's quite broken output compared to the original.

I never ran into #2982 until GGUF, so I'd guess something in the convert script is causing the output tensor to have blocks of zeros where there should be data?

edit: converting the q4_0 from GGJTv3 to GGUF with ./convert-llama-ggmlv3-to-gguf.py --eps 1e-5 -c 4096 --metadata-dir ... produces a functioning model.

cebtenzzre added the bug and generation quality labels on Sep 6, 2023
cebtenzzre (Collaborator, Author) commented Sep 6, 2023

Converting from GGJTv1 f16 to GGUF f16 before quantizing with the latest commit does not produce a functioning model. So there is most likely a regression in the quantization code.

edit: The model quantizes fine using only the latest convert.py and quantize if I use --leave-output-tensor.
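
For reference, a rough sketch of that working path (file names and paths here are illustrative, not the exact ones used above):

# convert the original model to an f16 GGUF with the latest convert.py
python convert.py /path/to/Huginn-22b-Prototype --outtype f16 --outfile huginn-22b-prototype.f16.gguf
# quantize to q4_0, but leave output.weight unquantized
./quantize --leave-output-tensor huginn-22b-prototype.f16.gguf huginn-22b-prototype.q4_0.gguf q4_0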

cebtenzzre (Collaborator, Author) commented:

@ikawrakow This appears to be a k-quants issue. If I modify the quantization strategy to use Q8_0 for the output tensor, the model produces correct output. There are no assertion failures, but something is still wrong.

KerfuffleV2 (Collaborator) commented:

converting the q4_0 from GGJTv3 to GGUF with ./convert-llama-ggmlv3-to-gguf.py --eps 1e-5 -c 4096 --metadata-dir ... produces a functioning model.

This should rule out metadata/vocab issues. (By the way, --eps and -c aren't used if you specify --metadata-dir.)

You mentioned the conversion that worked was q4_0; have you also tried quantizing to q4_0 from the f16 model?

ikawrakow (Contributor) commented:

@cebtenzzre There have been no changes to the k_quants quantization code other than adding guards against meaningless weights produced by buggy conversion scripts (#3010). So, if using Q6_K for the output.weight tensor does not work for this specific model, it is definitely not due to a regression in the k_quants quantization.

Did you try running the quantize-stats tool? If not, please use

./quantize-stats -m your_f16_model.gguf -t q6_K -p
./quantize-stats -m your_f16_model.gguf -t q8_0 -p

and post the output for q6_K::output.weight and q8_0::output.weight.

cebtenzzre (Collaborator, Author) commented:

For what it's worth, I do hit the assertion on dadbed9 (pre-GGUF); I just didn't notice it before because assertions are disabled by default:

quantize: k_quants.c:53: nearest_int: Assertion `fval <= 4194303.f' failed.
quantize: k_quants.c:53: nearest_int: Assertion `fval <= 4194303.f' failed.

cebtenzzre (Collaborator, Author) commented:

Fixed by 178b185.
