Clamp out of range values in K quantizer
The `nearest_int` assertion fails when quantizing Mixtral 8x7b as Q5_K_M,
because I used `convert.py --outtype f32` and the Mixtral weights use bf16,
which has a much larger exponent range than the K quantizer expects. If
`--outtype f16` is used, the assert doesn't fail.

See ggerganov/llama.cpp#2982
cc: @JohannesGaessler
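
The range mismatch described above can be illustrated with a small sketch. It relies only on the fact that bf16 is the upper 16 bits of an IEEE-754 binary32 value, so it keeps the full 8-bit float exponent, while f16 tops out at 65504. The helper names `bf16_to_f32` and `exceeds_f16_range` are hypothetical and are not part of llama.cpp.

```c
#include <stdint.h>
#include <string.h>

/* Largest finite IEEE-754 half-precision (f16) value. */
#define F16_MAX 65504.0f

/* Hypothetical helper: widen a bf16 bit pattern to float.
 * bf16 is the top 16 bits of a binary32, so a shift is enough. */
static float bf16_to_f32(uint16_t bits) {
    uint32_t u = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical helper: returns 1 if the magnitude of the bf16 value
 * is too large to be stored as a finite f16, 0 otherwise. */
static int exceeds_f16_range(uint16_t bits) {
    float f = bf16_to_f32(bits);
    return f > F16_MAX || f < -F16_MAX;
}
```

A weight stored as bf16 `0x4780` (65536.0) already falls outside the finite f16 range, yet it passes through an f32 conversion unchanged, which is how values this large can reach the quantizer.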
jart committed Apr 1, 2024
1 parent a8b0b15 commit ef0307e
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion llama.cpp/ggml-quants.c
@@ -1314,7 +1314,11 @@ void dequantize_row_q8_0(const block_q8_0 * restrict x, float * restrict y, int
 // ===================== Helper functions
 //
 static inline int nearest_int(float fval) {
-    assert(fval <= 4194303.f);
+
+    // [jart] https://github.com/ggerganov/llama.cpp/issues/2982
+    // assert(fval <= 4194303.f);
+    fval = fminf(fval, 4194303.f);
+
     float val = fval + 12582912.f;
     int i; memcpy(&i, &val, sizeof(int));
     return (i & 0x007fffff) - 0x00400000;

1 comment on commit ef0307e

@JohannesGaessler
Contributor

Alright, it seems my assumptions about model weight ranges were incorrect. I really did not expect individual weights to be this large.
