llama.cpp sync for SVE support for Q4_K_Ms #109

Closed
a-ghorbani opened this issue Jan 16, 2025 · 4 comments · Fixed by #110

Comments

@a-ghorbani
Contributor

Apologies, I know you just synced a few days ago, but the numbers for this PR look amazing:
ggerganov/llama.cpp#11227 (comment)

@Vali-98
Contributor

Vali-98 commented Jan 17, 2025

Hey there, wanted to ask if you actually tested this on device?

As far as I know, SVE isn't actually implemented by most Android mobile SoCs, and the few that do implement it have limited compatibility (Pixel devices are the biggest offender).

Most SVE implementations seem to be for server-grade ARM, like Graviton.

@a-ghorbani
Contributor Author

Good point. I'll give it a try, and if I see any improvement on the devices I have, I'll report here.

@a-ghorbani
Contributor Author

@Vali-98 Yup, no improvement in TG (text generation) or PP (prompt processing) on Pixel 9.
I am no expert in this, but seeing the sve and sve2 features in the CPU, I was hoping it would be supported.

Pixel 9 features:

 Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti ecv afp wfxt
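
For anyone who wants to repeat the check on their own device, here is a minimal sketch of how the `Features` line from `/proc/cpuinfo` could be parsed. The function names and the shortened sample line are my own, for illustration only. As the result above shows, a flag being present only means the kernel advertises the instruction set; it says nothing about whether the SVE code paths in llama.cpp will be any faster on that chip.

```python
def parse_cpu_features(cpuinfo_text):
    """Return the flags on the first 'Features' line as a set of strings."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()  # no Features line found (e.g. not an ARM Linux system)


def report_flags(cpuinfo_text, wanted=("sve", "sve2", "i8mm", "asimddp")):
    """Map each flag in `wanted` to True/False depending on whether it is listed."""
    features = parse_cpu_features(cpuinfo_text)
    return {name: name in features for name in wanted}


# Shortened excerpt of the Pixel 9 line above (illustrative, not the full list).
SAMPLE = " Features\t: fp asimd asimddp sve sve2 svei8mm svebf16 i8mm bf16"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when not running on Linux
    for name, present in report_flags(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

Splitting the line into whole tokens instead of using a substring search matters here: `svei8mm` contains `i8mm`, so a naive `"i8mm" in line` test would report the flag even on a CPU that only lists `svei8mm`.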

@Vali-98
Contributor

Vali-98 commented Jan 23, 2025

> features in the CPU was hoping it would support.

Though I don't own a Pixel device, I've read elsewhere that its SVE support is spotty and incomplete. I don't think there is much left to be gained from CPU acceleration on Android.

Our best bet would be for someone to add support for Qualcomm's Hexagon NPU APIs to llama.cpp, similar to what has been done in PowerServe: https://github.com/powerserve-project/PowerServe
