Regression in output of quantized Huginn-22b-Prototype #3040
Converting from GGJTv1 f16 to GGUF f16 before quantizing with the latest commit does not produce a functioning model. So there is most likely a regression in the quantization code. edit: The model quantizes fine using only the latest […]
@ikawrakow This appears to be a k-quants issue. If I modify the quantization strategy to use Q8_0 for the output tensor, the model produces correct output. There are no assertion failures, but something is still wrong.
This should rule out metadata/vocab issues. (By the way, you mentioned the conversion that worked is […])
@cebtenzzre There have been no changes to the k_quants quantization code other than adding guards against meaningless weights produced by buggy conversion scripts (#3010). So, if using […]

Did you try running the […] and post the output for […]?
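For illustration, a guard of this kind might look like the following Python sketch. This is hypothetical, not the actual llama.cpp check, and `check_weights` is an invented name; it only shows the sort of conditions (non-finite values, all-zero rows) that make quantization meaningless.

```python
import numpy as np

def check_weights(name, arr):
    """Flag tensors that would make quantization meaningless.

    Hypothetical illustration of a guard against weights produced by a
    buggy conversion script: NaN/Inf values or rows that are entirely
    zero. Not the actual llama.cpp implementation.
    """
    if not np.isfinite(arr).all():
        return f"{name}: contains NaN or Inf"
    if np.all(arr == 0, axis=-1).any():
        return f"{name}: has all-zero rows"
    return None

# A tensor with one zeroed-out row is flagged:
bad = np.ones((4, 8), dtype=np.float32)
bad[2] = 0.0
print(check_weights("output.weight", bad))
```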
For what it's worth, I do hit the assertion on dadbed9 (pre-GGUF); I just didn't notice it before because assertions are disabled by default:

[…]
Fixed by 178b185.
Tested model is Huginn-22b-Prototype.
This is the output of a q4_0 model converted to GGJTv3 around two weeks ago. I believe it was converted and quantized on commit 1f0bccb.
This is the output of a q4_0 model converted to GGUF yesterday on commit 2ba85c8 and quantized today on commit 9912b9e.
It's not total gibberish, but it's quite broken output compared to the original.
I never ran into #2982 until GGUF, so I'd guess something in the convert script is causing the output tensor to have blocks of zeros where there should be data?
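One way to test that hypothesis would be to scan the tensor's raw data for q4_0 blocks whose scale is zero. Below is a sketch, assuming the GGUF q4_0 block layout (an fp16 scale followed by 16 bytes of packed 4-bit weights, i.e. 32 weights per 18-byte block); `count_zero_q4_0_blocks` is an invented helper, not part of any convert script.

```python
import numpy as np

def count_zero_q4_0_blocks(raw: bytes):
    """Count q4_0 blocks with a zero scale in raw tensor data.

    Assumes the q4_0 layout: each 18-byte block is an fp16 scale
    followed by 16 bytes of packed 4-bit quants (32 weights).
    Returns (zero_scale_blocks, total_blocks). Illustrative only.
    """
    BLOCK = 18  # bytes per q4_0 block
    n = len(raw) // BLOCK
    blocks = np.frombuffer(raw[: n * BLOCK], dtype=np.uint8).reshape(n, BLOCK)
    # First two bytes of each block are the fp16 scale.
    scales = blocks[:, :2].copy().view(np.float16).ravel()
    return int((scales == 0).sum()), n
```

A run of blocks with zero scales in the output tensor would support the "blocks of zeros where there should be data" theory.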
edit: converting the q4_0 from GGJTv3 to GGUF with
./convert-llama-ggmlv3-to-gguf.py --eps 1e-5 -c 4096 --metadata-dir ...
produces a functioning model.