Quantization of Q4_K and Q5_K fails with "illegal hardware instruction" #3279
FWIW, seems to work for me on Windows 11, Intel Core i7 on latest
In case you haven't tried it, remove the llama.cpp directory and just download / git pull again from scratch (see the sketch below). The build scripts don't really handle it well if you change compile flags or update without cleaning the previous build; sometimes it gets borked like this and produces invalid binaries. If you try with a fresh llama.cpp and still have that problem, post the exact steps and commands you used to build it; it's kind of hard to guess otherwise.
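For reference, a clean from-scratch rebuild along those lines might look like this (a minimal sketch, assuming the default Makefile build):

```sh
# Start from a completely clean checkout so no stale object files survive
rm -rf llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make   # or, if you keep an existing checkout: make clean && make
```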
Did you build with […]?
Thank you all for the helpful responses. Yes, I've already tried to use a clean repository, and I used […]. Here are the steps to recreate:

```sh
$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp
$ make
$ python convert.py ../llama/llama-2-7b-chat
$ ./quantize ../llama/llama-2-7b-chat/ggml-model-f16.gguf models/llama-2-7b-chat.Q5_K.gguf Q5_K
```

So, as there seems to be no one else having this problem, I think this error is specific to my setup. I have a 2021 16-inch MacBook Pro with an M1 Max and 32 GB of RAM, but I really doubt that this is the reason for my error. The only thing that is individual to my use case might be the f16 gguf file I use to quantize, which I created with the `convert.py` step above. So my next steps would be to download a float16 gguf from somewhere and quantize that. I will also look into #3273, even though I didn't use cmake (although building it with cmake, again from a clean repository, yields the same results). I would be really happy about any other ideas! Thanks!
Oh, very interesting! I checked out PR #3273 and compiled using […]. So, it seems that […]. Should I mark this as "resolved"?
@thilomichael […]
I'm seeing a similar issue (macOS 14.0, Apple clang 15.0.0) with an illegal instruction deep in […]. This seems like a toolchain bug. Compiling this file with […]
Besides #3273 waiting for merge, […]. Since […], the most reasonable course of action is still simply trying make vs cmake, and perhaps something as simple as a system/toolchain update might help too. If you encounter such a problem and find a combination that works, just stick with it. As a side note, the optimization level shouldn't influence the linking stage AFAIK, so that […]
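For anyone who wants to try the cmake route mentioned above, a minimal sketch (assuming a reasonably recent CMake and default build options):

```sh
# Out-of-tree configure and build; Release enables optimizations
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
# The resulting quantize binary is typically under build/bin/ (exact path may vary by version)
```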
I have the same problem on my M1 Max Mac Studio with macOS Sonoma and clang version 15.0.0 (clang-1500.0.40.1). Also, changing -O3 to -O2 removes the error.
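If someone wants to reproduce that workaround with the Makefile build, a hypothetical edit (assuming the Makefile sets -O3 and you are on macOS, where `sed -i` takes an empty suffix argument):

```sh
# Replace every -O3 with -O2 in the Makefile, then rebuild from clean
sed -i '' 's/-O3/-O2/g' Makefile
make clean && make
```

On Linux, the equivalent would be `sed -i 's/-O3/-O2/g' Makefile`; either way, verify the substitution with `grep -n 'O2' Makefile` before rebuilding.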
This issue was closed because it has been inactive for 14 days since being marked as stale. |
When I try to quantize any model (e.g., llama-2-7b-chat) with the command `./quantize ../llama/llama-2-7b-chat/ggml-model-f16.gguf models/llama-2-7b-chat.Q5_K.gguf Q5_K`, I receive an "illegal hardware instruction" error.
I'm using a MacBook Pro M1 with 32 GB of RAM. I tried compiling llama.cpp with METAL support and without, but I get this error with both configurations. I already checked that my quantize binary is built for the correct architecture: […]
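For anyone reproducing that check, something along these lines works on macOS (a sketch; the binary path assumes the repository root, and the expected outputs are assumptions for an Apple Silicon build):

```sh
# Show the binary's format and architecture
file ./quantize          # expect something like: Mach-O 64-bit executable arm64
lipo -archs ./quantize   # expect: arm64
```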
I tried to find out which instruction is not supported, used lldb, and got the following output: […]
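A minimal sketch of such an lldb session, assuming the same model paths as in the reproduction steps above:

```sh
lldb ./quantize
# Inside lldb:
#   (lldb) run ../llama/llama-2-7b-chat/ggml-model-f16.gguf out.gguf Q5_K
#   (lldb) bt                 # backtrace at the EXC_BAD_INSTRUCTION stop
#   (lldb) disassemble --pc   # show the instruction the process stopped on
```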
Does anyone else have this problem? I've already tried redownloading llama and converting the weights (through `convert.py`), but I always run into that problem. I get the error for both `Q4_K` (which should be the same as `Q4_K_M`) and `Q5_K`. Interestingly, for `Q6_K`, `Q8_0`, `Q4_0`, `Q4_1`, etc., everything runs fine.