Use vdotq_s32 to improve performance #67
Conversation
I observe a 10% performance improvement on M1 Pro with 8 threads.
However, it seems to cause an illegal instruction on M1 Air:
https://twitter.com/miolini/status/1635055060316200960
Need to figure out why. Would be nice to confirm.

Data point: no fault on M1 Ultra (16 threads, 65B), same bump.

Can confirm that there are no errors on M1 Max either.

Cannot run ./main on MacBook Air M1 anymore, but it works on Raspberry Pi 4.

```
(llama.cpp) @mio: llama.cpp $ ./main -m ./models/7B/ggml-model-q4_0.bin
```

@miolini Just in case, can you do

@ggerganov I did it many times. No luck. But I guess I found the problem: after git pull, it now builds an x86_64 binary.

```
(llama.cpp) @mio: llama.cpp $ file ./main
```

Build log:

```
(llama.cpp) @mio: llama.cpp $ make
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mf16c -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize -framework Accelerate
```

Source of the problem: pipenv spawns a shell with the wrong environment. I will try to fix it on my side.

@miolini Could you please tell how you fixed it? I am having the same issue on a MacBook with conda, but it works on a Mac mini.
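For reference, the intrinsic named in the title, `vdotq_s32`, is the ARMv8.2-A dot-product instruction: each int32 lane accumulates the products of four adjacent int8 pairs, collapsing the widen-multiply-accumulate sequence the kernel otherwise needs. The sketch below is illustrative, not the actual ggml kernel from this PR; `dot_i8` and the assumption that `n` is a multiple of 16 are mine. It also shows the `__ARM_FEATURE_DOTPROD` guard: running a binary that uses `vdotq_s32` on a core without the extension raises SIGILL, one generic cause of an "illegal instruction" crash (although the M1 Air case above turned out to be an accidental x86_64 build).

```c
// Illustrative sketch (not the PR's actual kernel): int8 dot product with a
// vdotq_s32 fast path and a pre-ARMv8.2 NEON fallback. Assumes n % 16 == 0.
#include <arm_neon.h>
#include <stdint.h>

int32_t dot_i8(const int8_t *a, const int8_t *b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < n; i += 16) {
        const int8x16_t va = vld1q_s8(a + i);
        const int8x16_t vb = vld1q_s8(b + i);
#if defined(__ARM_FEATURE_DOTPROD)
        // One instruction: 16 int8 multiplies, accumulated into 4 int32 lanes.
        acc = vdotq_s32(acc, va, vb);
#else
        // Fallback: widen int8 products to int16, then pairwise-add into int32.
        const int16x8_t lo = vmull_s8(vget_low_s8(va),  vget_low_s8(vb));
        const int16x8_t hi = vmull_s8(vget_high_s8(va), vget_high_s8(vb));
        acc = vpadalq_s16(vpadalq_s16(acc, lo), hi);
#endif
    }
    return vaddvq_s32(acc); // horizontal sum of the accumulator lanes
}
```

Whether `__ARM_FEATURE_DOTPROD` is defined depends on the compile target (e.g. `-march=armv8.2-a+dotprod`; toolchains targeting Apple Silicon enable it by default), which is also why checking the produced binary with `file` matters when the build environment is suspect.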