Quantization of Q4_K and Q5_K fail with "illegal hardware instruction" #3279

Closed
thilomichael opened this issue Sep 20, 2023 · 10 comments

thilomichael commented Sep 20, 2023

When I try to quantize any model (e.g., llama-2-7b-chat) with the command

$ ./quantize models/llama-2-7b-chat-f16.gguf models/test.gguf Q5_K

I receive an "illegal hardware instruction" error:

[   1/ 291]                    token_embd.weight - [ 4096, 32000,     1,     1], type =    f16, quantizing to q5_K .. [1]    92878 illegal hardware instruction  ./quantize models/llama-2-7b-chat-f16.gguf  Q5_K

I'm using an M1 MacBook Pro with 32 GB of RAM. I tried compiling llama.cpp both with and without Metal support, but I get this error with both configurations. I already checked that the quantize binary is built for the correct architecture:

$ file quantize
quantize: Mach-O 64-bit executable arm64

To find out which instruction is not supported, I ran the binary under lldb and got the following output:

* thread #2, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
    frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
quantize`quantize_row_q5_K_reference:
->  0x10006de40 <+924>: .long  0x1e011a40                ; unknown opcode
    0x10006de44 <+928>: fcsel  s2, s18, s2, gt
    0x10006de48 <+932>: fcmp   s20, s1
    0x10006de4c <+936>: fcsel  s1, s20, s1, mi
  thread #3, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
    frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
quantize`quantize_row_q5_K_reference:
->  0x10006de40 <+924>: .long  0x1e011a40                ; unknown opcode
    0x10006de44 <+928>: fcsel  s2, s18, s2, gt
    0x10006de48 <+932>: fcmp   s20, s1
    0x10006de4c <+936>: fcsel  s1, s20, s1, mi
Target 0: (quantize) stopped.
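
For reference, this is roughly how I got the disassembly above (a sketch; the exact lldb commands are from memory):

$ lldb -- ./quantize models/llama-2-7b-chat-f16.gguf models/test.gguf Q5_K
(lldb) run
# the process stops with EXC_BAD_INSTRUCTION in quantize_row_q5_K_reference
(lldb) disassemble --frame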

Does anyone else have this problem? I've already tried redownloading llama.cpp and reconverting the weights (through convert.py), but I always run into the same error. It happens for both Q4_K (which should be the same as Q4_K_M) and Q5_K. Interestingly, Q6_K, Q8_0, Q4_0, Q4_1, etc. all run fine.

goerch (Collaborator) commented Sep 20, 2023

FWIW, this seems to work for me on Windows 11 (Intel Core i7) on latest main, CPU only, no special BLAS, with:

python.exe convert.py models\llama-2\llama-2-7B
.\build\bin\Release\quantize models\llama-2\llama-2-7B\ggml-model-f16.gguf models\llama-2\llama-2-7B\ggml-model-q5_K.gguf Q5_K

staviq (Contributor) commented Sep 20, 2023

In case you haven't tried it, remove the llama.cpp directory and clone / git pull it again from scratch.

The build scripts don't handle it well if you change compile flags or update without cleaning the previous build; sometimes the build gets borked like this and produces invalid binaries.

If you try with a fresh llama.cpp and still have the problem, post the exact steps and commands you used to build it; it's hard to guess otherwise.
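
For example, something like this (a rough sketch of a from-scratch rebuild; adjust paths to your setup):

$ rm -rf llama.cpp
$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp
$ make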

alonfaraj (Contributor) commented

Did you build with CMake?
Just wondering if #3273 might be related?

thilomichael (Author) commented

Thank you all for the helpful responses. Yes, I've already tried a clean repository, and I used make to build llama.cpp.

Here are the steps to reproduce:

$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp
$ make
$ python convert.py ../llama/llama-2-7b-chat
$ ./quantize ../llama/llama-2-7b-chat/ggml-model-f16.gguf models/llama-2-7b-chat.Q5_K.gguf Q5_K

Since no one else seems to have this problem, I think the error is specific to my setup. I have a 2021 16-inch MacBook Pro with an M1 Max and 32 GB of RAM, but I really doubt the hardware is the reason. The only thing specific to my use case might be the f16 gguf file I quantize: I created it with convert.py from the llama-2 weights, and it works without any errors (e.g., when running it with ./main).

So my next step would be to download a float16 gguf from somewhere and quantize that... I will also look into #3273, even though I didn't use cmake (building with cmake, again from a clean repository, yields the same results).

I would be really happy about any other ideas! Thanks!

thilomichael (Author) commented

Oh, very interesting! I checked out PR #3273, compiled using cmake, and now it works!
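
For reference, the cmake build I used looked roughly like this (a sketch of the standard cmake steps with the PR branch checked out; the binary location may differ depending on the generator):

$ mkdir build && cd build
$ cmake ..
$ cmake --build . --config Release
$ cd ..
$ ./build/bin/quantize ../llama/llama-2-7b-chat/ggml-model-f16.gguf models/llama-2-7b-chat.Q5_K.gguf Q5_K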

So, it seems that make is also using the "wrong" flags, but I know too little about that to be sure.

Should I mark this as "resolved"?

alonfaraj (Contributor) commented

@thilomichael
Glad it's solved! 🙂
It might be worth mentioning the make problem in the PR.

duskwuff commented

I'm seeing a similar issue (macOS 14.0, Apple clang 15.0.0) with an illegal instruction deep in quantize_row_q4_K_reference. The instruction in question is fine in the .o file (fcsel s2, s23, s2, gt = 1e22cee2), but for some reason ends up getting mangled to the invalid 1e011ee2 during linking (!!!).

This seems like a toolchain bug. Compiling this file with -O2 seems to mitigate the issue, though.
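
If anyone wants to check their own build, one way to compare the compiler output against the linked binary is roughly this (a sketch; the object file name and the exact disassembler output format may differ):

$ objdump -d k_quants.o | grep fcsel | head        # encodings as emitted by the compiler
$ objdump -d quantize | grep -i unknown | head     # words the disassembler can no longer decode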

staviq (Contributor) commented Sep 27, 2023

Besides #3273 waiting for merge:

Since native compilation does provide meaningful performance gains, and toolchain bugs aren't something that can reasonably be ruled out "for sure", the most reasonable course of action is still simply trying make vs cmake, and perhaps something as simple as a system/toolchain update might help too.

If you encounter such a problem and find a combination that works, just stick with it.

As a side note, the optimization level shouldn't influence the linking stage AFAIK, so the -O2 thing might just be a happy accident: that bit of code gets compiled into a different arrangement of instructions, which doesn't trigger whatever bug happens during linking.

beebopkim commented Nov 11, 2023

I have the same problem on my M1 Max Mac Studio with macOS Sonoma and clang version 15.0.0 (clang-1500.0.40.1). Changing -O3 to -O2 also removes the error.
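
For reference, the workaround on my side was roughly this (a sketch; it just lowers the optimization level wherever -O3 appears in the Makefile, so the exact edit may differ for your version):

$ sed -i '' 's/-O3/-O2/g' Makefile
$ make clean && make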

github-actions bot added the stale label Mar 20, 2024

github-actions bot commented Apr 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 3, 2024