
quantize: k_quants.c:73: nearest_int: Assertion `fval <= 4194303.f' failed. #2982

Closed
cebtenzzre opened this issue Sep 3, 2023 · 3 comments
Labels: bug (Something isn't working)

@cebtenzzre (Collaborator):

While trying to quantize Huginn-22b-Prototype to Q5_0, I ran into this assertion failure while quantizing the output tensor:

[ 331/ 363]                        output.weight - [ 6656, 32000,     1,     1], type =    f16, quantizing to q6_K .. quantize: k_quants.c:73: nearest_int: Assertion `fval <= 4194303.f' failed.
quantize: k_quants.c:73: nearest_int: Assertion `fval <= 4194303.f' failed.

It happens here:

#5  0x00007fed68032d26 in __assert_fail (assertion=0x557583a83303 "fval <= 4194303.f", file=0x557583a832f8 "k_quants.c", 
    line=73, function=0x557583a83338 <__PRETTY_FUNCTION__.31> "nearest_int") at assert.c:101
#6  0x0000557583a6c991 in nearest_int (fval=-nan(0x400000)) at k_quants.c:73
#7  0x0000557583a7142c in quantize_row_q6_K_reference (x=0x7fed18577010, y=0x7fecdd38c210, k=16384) at k_quants.c:1092
#8  0x0000557583a71cad in ggml_quantize_q6_K (src=0x7fed18577010, dst=0x7fecdd38c210, n=16384, k=16384, hist=0x7feccc000b70)
    at k_quants.c:1200
#9  0x00005575839dad38 in ggml_quantize_chunk (type=GGML_TYPE_Q6_K, src=0x7fed18537010, dst=0x7fecdd37f010, start=65536, 
    n=16384, hist=0x7feccc000b70) at ggml.c:19527
@cebtenzzre added the bug (Something isn't working) label on Sep 3, 2023
@KerfuffleV2 (Collaborator):

#2434 might fix this if implemented?

@ikawrakow (Contributor):

Does #3010 solve it?

To trigger this assertion, all weights in a block of 256 must be zero. This has never happened before, so I wonder how meaningful this model is. I am somewhat surprised, though, to see NaN rather than Inf as the argument triggering the assert in the nearest_int() function. Any chance there are already NaNs in the fp16 model?

@KerfuffleV2 No, #2434 will not solve it. The zeros (or NaNs) will remain zeros (or NaNs) after normalization, so the unforeseen situation will still arise and the assert will still be triggered.

@KerfuffleV2 (Collaborator):

> so the unforeseen situation will still arise and the assert will still be triggered.

My mistake. I was thinking it was a model that just had particularly large values in the weights.

jart added a commit to Mozilla-Ocho/llamafile that referenced this issue Apr 1, 2024
This assertion fails when quantizing Mixtral 8x7b as Q5_K_M, because I
used `convert.py --outtype f32` and the Mixtral weights use bf16 which
has a much larger exponent range than the K quantizer is expecting. If
--outtype f16 is used then the assert doesn't fail.

See ggerganov/llama.cpp#2982
cc: @JohannesGaessler
jart added a commit to jart/llama.cpp that referenced this issue Apr 25, 2024
This assertion fails when quantizing Mixtral 8x7b as Q5_K_M, because I
used `convert.py --outtype f32` and the Mixtral weights use bf16 which
has a much larger exponent range than the K quantizer is expecting. If
--outtype f16 is used then the assert doesn't fail.

See ggerganov#2982
3 participants