Quantization of Q4_K and Q5_K fail with "illegal hardware instruction" #3279

Closed
thilomichael opened this issue Sep 20, 2023 · 10 comments

thilomichael commented Sep 20, 2023

When I try to quantize any model (e.g., llama-2-7b-chat) with the command

$ ./quantize models/llama-2-7b-chat-f16.gguf models/test.gguf Q5_K

I receive an "illegal hardware instruction" error:

[   1/ 291]                    token_embd.weight - [ 4096, 32000,     1,     1], type =    f16, quantizing to q5_K .. [1]    92878 illegal hardware instruction  ./quantize models/llama-2-7b-chat-f16.gguf  Q5_K

I'm using an M1 MacBook Pro with 32 GB of RAM. I tried compiling llama.cpp both with and without Metal support, but I get this error with both configurations. I already checked that the quantize binary is built for the correct architecture:

$ file quantize
quantize: Mach-O 64-bit executable arm64

To find out which instruction is not supported, I ran the binary under lldb and got the following output:

* thread #2, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
    frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
quantize`quantize_row_q5_K_reference:
->  0x10006de40 <+924>: .long  0x1e011a40                ; unknown opcode
    0x10006de44 <+928>: fcsel  s2, s18, s2, gt
    0x10006de48 <+932>: fcmp   s20, s1
    0x10006de4c <+936>: fcsel  s1, s20, s1, mi
  thread #3, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
    frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
quantize`quantize_row_q5_K_reference:
->  0x10006de40 <+924>: .long  0x1e011a40                ; unknown opcode
    0x10006de44 <+928>: fcsel  s2, s18, s2, gt
    0x10006de48 <+932>: fcmp   s20, s1
    0x10006de4c <+936>: fcsel  s1, s20, s1, mi
Target 0: (quantize) stopped.
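
For reference, this is roughly how I got the disassembly above (a sketch; the exact lldb commands are from memory):

$ lldb -- ./quantize models/llama-2-7b-chat-f16.gguf models/test.gguf Q5_K
(lldb) run
# the process stops with EXC_BAD_INSTRUCTION in quantize_row_q5_K_reference
(lldb) disassemble --frame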

Does anyone else have this problem? I've already tried redownloading llama.cpp and reconverting the weights (through convert.py), but I always run into the same error. It happens for both Q4_K (which should be the same as Q4_K_M) and Q5_K. Interestingly, Q6_K, Q8_0, Q4_0, Q4_1, etc. all run fine.

goerch (Collaborator) commented Sep 20, 2023

FWIW, this seems to work for me on Windows 11 (Intel Core i7) on latest main, CPU only, no special BLAS, with:

python.exe convert.py models\llama-2\llama-2-7B
.\build\bin\Release\quantize models\llama-2\llama-2-7B\ggml-model-f16.gguf models\llama-2\llama-2-7B\ggml-model-q5_K.gguf Q5_K

staviq (Contributor) commented Sep 20, 2023

In case you haven't tried it, remove the llama.cpp directory and clone / git pull it again from scratch.

The build scripts don't handle it well if you change compile flags or update without cleaning the previous build; sometimes the build gets borked like this and produces invalid binaries.

If you try with a fresh llama.cpp and still have the problem, post the exact steps and commands you used to build it; it's hard to guess otherwise.
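
For example, something like this (a rough sketch of a from-scratch rebuild; adjust paths to your setup):

$ rm -rf llama.cpp
$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp
$ make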

alonfaraj (Contributor) commented

Did you build with CMake?
Just wondering if #3273 might be related?

thilomichael (Author) commented

Thank you all for the helpful responses. Yes, I've already tried a clean repository, and I used make to build llama.cpp.

Here are the steps to reproduce:

$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp
$ make
$ python convert.py ../llama/llama-2-7b-chat
$ ./quantize ../llama/llama-2-7b-chat/ggml-model-f16.gguf models/llama-2-7b-chat.Q5_K.gguf Q5_K

Since no one else seems to have this problem, I think the error is specific to my setup. I have a 2021 16-inch MacBook Pro with an M1 Max and 32 GB of RAM, but I really doubt the hardware is the reason. The only thing specific to my use case might be the f16 gguf file I quantize: I created it with convert.py from the llama-2 weights, and it works without any errors (e.g., when running it with ./main).

So my next step would be to download a float16 gguf from somewhere and quantize that... I will also look into #3273, even though I didn't use cmake (building with cmake, again from a clean repository, yields the same results).

I would be really happy about any other ideas! Thanks!

thilomichael (Author) commented

Oh, very interesting! I checked out PR #3273, compiled using cmake, and now it works!
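
For reference, the cmake build I used looked roughly like this (a sketch of the standard cmake steps with the PR branch checked out; the binary location may differ depending on the generator):

$ mkdir build && cd build
$ cmake ..
$ cmake --build . --config Release
$ cd ..
$ ./build/bin/quantize ../llama/llama-2-7b-chat/ggml-model-f16.gguf models/llama-2-7b-chat.Q5_K.gguf Q5_K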

So, it seems that make is also using the "wrong" flags, but I know too little about that to be sure.

Should I mark this as "resolved"?

alonfaraj (Contributor) commented

@thilomichael
Glad it's solved! 🙂
It might be worth mentioning the make problem in the PR.

duskwuff commented

I'm seeing a similar issue (macOS 14.0, Apple clang 15.0.0) with an illegal instruction deep in quantize_row_q4_K_reference. The instruction in question is fine in the .o file (fcsel s2, s23, s2, gt = 1e22cee2), but for some reason ends up getting mangled to the invalid 1e011ee2 during linking (!!!).

This seems like a toolchain bug. Compiling this file with -O2 seems to mitigate the issue, though.
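
If anyone wants to check their own build, one way to compare the compiler output against the linked binary is roughly this (a sketch; the object file name and the exact disassembler output format may differ):

$ objdump -d k_quants.o | grep fcsel | head        # encodings as emitted by the compiler
$ objdump -d quantize | grep -i unknown | head     # words the disassembler can no longer decode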

staviq (Contributor) commented Sep 27, 2023

Besides #3273 waiting for merge:

Since native compilation does provide meaningful performance gains, and toolchain bugs aren't something that can reasonably be ruled out "for sure", the most reasonable course of action is still simply trying make vs cmake, and perhaps something as simple as a system/toolchain update might help too.

If you encounter such a problem and find a combination that works, just stick with it.

As a side note, the optimization level shouldn't influence the linking stage AFAIK, so the -O2 thing might just be a happy accident: that bit of code gets compiled into a different arrangement of instructions, which doesn't trigger whatever bug happens during linking.

beebopkim commented Nov 11, 2023

I have the same problem on my M1 Max Mac Studio with macOS Sonoma and clang version 15.0.0 (clang-1500.0.40.1). Changing -O3 to -O2 also removes the error.
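
For reference, the workaround on my side was roughly this (a sketch; it just lowers the optimization level wherever -O3 appears in the Makefile, so the exact edit may differ for your version):

$ sed -i '' 's/-O3/-O2/g' Makefile
$ make clean && make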

github-actions bot added the stale label Mar 20, 2024

github-actions bot commented Apr 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 3, 2024