ggml : fix quants nans when all the group weights are very close to zero #7313
There is no way the change in threshold has any significant effects on the results. Even a threshold of
I tried progressively higher values and found that some quants still fail with
While increasing
To clarify, I'm not sure how generalizable my results are to other models. I think the model that needs the fix should at least be checked as well, since that particular model seems to have some blocks with only very small values.
I have tried to find the lowest possible eps for the quants that require lower than
I don't really like this solution. I think the best way to handle this would be to check for zero before doing the division, but that would require deeper changes, the code is not very easy to follow, and I don't want to risk introducing bugs that may cause models with bad quants to be distributed.
When the group abs max value is very close to zero but not zero, it may still result in a division by zero when computing the scale, which ends with a `nan` scale. To avoid this, we check the max value against an epsilon instead of zero. With the IQ quants, this could also result in an `Oops: found point %u not on grid` error.

While doing this, I noticed that there was already a similar check with a `1e-30` epsilon in `make_qx_quants`. However, values this small can still result in `nan`, so I bumped it to `1e-20` and extended it to all the cases that I could find. I used the commented code in `test-backend-ops` to find these cases. It is possible that an even higher epsilon may be necessary.

I don't expect this to result in lower precision in the quants since the epsilon is so small, but it may be worth checking.
Fixes #7311.