ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot #11917
Conversation
Please improve the formatting of the code to be more consistent with the rest of the codebase. I've given a few hints below.
Thank you. I've improved the code formatting for consistency.
I haven't run any tests myself, so I'm taking a small leap of faith here, assuming you've done all the necessary testing for this change.
Thank you! We've done all the necessary tests for this change.
This PR introduces SVE (Scalable Vector Extension) kernels for the q3_K_q8_K vector dot product on the Arm architecture. Similar SVE support was proposed in PRs #7433 and #11227.
This PR contains the SVE implementation of the vector dot product used to compute the Q3_K quantization.
Accuracy and performance were measured by running a Q3_K-quantized mistral-7b-v01 model on Graviton3 (Perf 01 XL).
Performance
With this PR, the SVE implementation is approximately 1.02x to 1.15x faster than the NEON implementation.
The command used to measure the performance is:
Perplexity
I also verified that perplexity matches between the NEON and SVE implementations.