
ggml_metal_init failure: loading kernel function on Intel based mac #1292

Closed
nchudleigh opened this issue Sep 15, 2023 · 10 comments

Labels
bug (Something isn't working) · solution (This issue contains a potential solution)

Comments

nchudleigh (Contributor) commented Sep 15, 2023

Using latest master (951a119) with Metal:

load pipeline error: Error Domain=CompilerError Code=2 "SC compilation failure

❯ ./main -m models/ggml-tiny.en.bin -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.62 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB
whisper_init_state: compute buffer (conv)   =   11.17 MB
whisper_init_state: compute buffer (encode) =   61.76 MB
whisper_init_state: compute buffer (cross)  =    3.67 MB
whisper_init_state: compute buffer (decode) =   18.82 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Intel Iris Pro Graphics
ggml_metal_init: found device: AMD Radeon R9 M370X
ggml_metal_init: picking default device: AMD Radeon R9 M370X
ggml_metal_init: loading '/Users/neil/Development/whisper.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                         0x7fa14b207ab0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_add_row                     0x7fa14b208200 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul                         0x7fa14b208950 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_row                     0x7fa14b2090a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_scale                       0x7fa14b2097f0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_silu                        0x7fa14b209f40 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_relu                        0x7fa14b20a690 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_gelu                        0x7fa14b20ade0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_soft_max                    0x7fa14b20b530 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_soft_max_4                  0x7fa14b20bc80 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_diag_mask_inf               0x7fa14b20c3d0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_diag_mask_inf_8             0x7fa14b20cc90 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_f32                0x7fa14b20d3e0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_f16                0x7fa14b20db30 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_0               0x7fa14b20e280 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_1               0x7fa14b20e9d0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q8_0               0x7fa14b20f120 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q2_K               0x7fa14b20f870 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q3_K               0x7fa14b20ffc0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_K               0x7fa14b2109a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q5_K               0x7fa14b210f90 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q6_K               0x7fa14b2116e0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_rms_norm                    0x7fa14b211fb0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_norm                        0x7fa14b212700 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f32_f32             0x7fa14b212e50 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32             0x7fa14b2135a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32_1row        0x7fa14b213cf0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32_l4          0x7fa14b214440 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32            0x7fa14b214b90 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32            0x7fa14b2152e0 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q8_0_f32            0x7fa14b215bb0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32            0x7fa14b216300 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32            0x7fa14b2168b0 | th_max =  512 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32            0x7fa14b217000 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32            0x7fa14b217750 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32            0x7fa14b217ea0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mm_f32_f32                         0x0 | th_max =    0 | th_width =    0
ggml_metal_init: load pipeline error: Error Domain=CompilerError Code=2 "SC compilation failure
There is a call to an undefined label" UserInfo={NSLocalizedDescription=SC compilation failure
There is a call to an undefined label}
whisper_init_state: ggml_metal_init() failed
error: failed to initialize whisper context

nchudleigh (Contributor, Author) commented Sep 15, 2023

Issue persists if I force GPU device to be integrated Intel

❯ ./main -m models/ggml-tiny.en.bin -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.62 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB
whisper_init_state: compute buffer (conv)   =   11.17 MB
whisper_init_state: compute buffer (encode) =   61.76 MB
whisper_init_state: compute buffer (cross)  =    3.67 MB
whisper_init_state: compute buffer (decode) =   18.82 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Intel Iris Pro Graphics
ggml_metal_init: found device: AMD Radeon R9 M370X
ggml_metal_init: picking default device: Intel Iris Pro Graphics
ggml_metal_init: loading '/Users/neil/Development/whisper.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                         0x6000025f8380 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_add_row                     0x6000025f4300 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul                         0x6000025ec180 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_row                     0x6000025ec500 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_scale                       0x6000025ec680 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_silu                        0x6000025f9280 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_relu                        0x6000025f8800 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_gelu                        0x6000025ec800 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max                               0x0 | th_max =    0 | th_width =    0
ggml_metal_init: load pipeline error: Error Domain=CompilerError Code=2 "Compiler encountered an internal error" UserInfo={NSLocalizedDescription=Compiler encountered an internal error}
whisper_init_state: ggml_metal_init() failed
error: failed to initialize whisper context

@nchudleigh nchudleigh changed the title Intel based mac SC compilation failure ggml_metal_init failure: loading kernel function on Intel based mac Sep 15, 2023
nchudleigh (Contributor, Author) commented Sep 15, 2023

Commenting out these kernel-function registrations allows initialization to complete.

Performance is the same as before Metal support was merged (80c1512).

❯ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_init_state: compute buffer (conv)   =   14.10 MB
whisper_init_state: compute buffer (encode) =   81.85 MB
whisper_init_state: compute buffer (cross)  =    4.40 MB
whisper_init_state: compute buffer (decode) =   24.61 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Intel Iris Pro Graphics
ggml_metal_init: found device: AMD Radeon R9 M370X
ggml_metal_init: picking default device: AMD Radeon R9 M370X
ggml_metal_init: loading '/Users/neil/Development/whisper.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                         0x7fb22bf07070 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_add_row                     0x7fb22bf077c0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul                         0x7fb22bf07f10 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_row                     0x7fb22bf08660 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_scale                       0x7fb22bf08db0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_silu                        0x7fb22bf09500 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_relu                        0x7fb22bf09c50 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_gelu                        0x7fb22bf0a3a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_soft_max                    0x7fb22bf0aaf0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_soft_max_4                  0x7fb22bf0b240 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_diag_mask_inf               0x7fb22bf0b990 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_diag_mask_inf_8             0x7fb22bf0c250 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_f32                0x7fb22bf0c9a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_f16                0x7fb22bf0d0f0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_0               0x7fb22e708b40 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_1               0x7fb22e709290 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q8_0               0x7fb22e7099e0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q2_K               0x7fb22e70a130 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q3_K               0x7fb22e70a880 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_K               0x7fb22e70b260 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q5_K               0x7fb22e70b9b0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q6_K               0x7fb22e70c100 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_rms_norm                    0x7fb22e70c9d0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_norm                        0x7fb22e70d120 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f32_f32             0x7fb22e70d870 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32             0x7fb22e70dfc0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32_1row        0x7fb22e70e710 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32_l4          0x7fb22e70ee60 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32            0x7fb22e70f5b0 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32            0x7fb22e70fd00 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q8_0_f32            0x7fb22e7105d0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32            0x7fb22e710d20 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32            0x7fb22e7112d0 | th_max =  512 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32            0x7fb22e711a20 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32            0x7fb22e712170 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32            0x7fb22e7128c0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_rope                        0x7fb22e713190 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_alibi_f32                   0x7fb22e7138e0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_cpy_f32_f16                 0x7fb22e714030 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_cpy_f32_f32                 0x7fb22e714780 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_cpy_f16_f16                 0x7fb22e714ed0 | th_max = 1024 | th_width =   64
ggml_metal_init: hasUnifiedMemory              = false
ggml_metal_init: recommendedMaxWorkingSetSize  =  2048.00 MB
ggml_metal_init: maxTransferRate               = built-in GPU
whisper_init_state: Metal context initialized
whisper_init_state: max tensor size =    50.65 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =   142.00 MB, (  142.27 /  2048.00)
ggml_metal_add_buffer: allocated 'meta_conv       ' buffer, size =     1.47 MB, (  143.74 /  2048.00)
ggml_metal_add_buffer: allocated 'meta_encode     ' buffer, size =     1.47 MB, (  145.21 /  2048.00)
ggml_metal_add_buffer: allocated 'meta_cross      ' buffer, size =     1.47 MB, (  146.69 /  2048.00)
ggml_metal_add_buffer: allocated 'meta_decode     ' buffer, size =     1.47 MB, (  148.16 /  2048.00)
ggml_metal_add_buffer: allocated 'data_conv       ' buffer, size =    12.64 MB, (  160.80 /  2048.00)
ggml_metal_add_buffer: allocated 'data_encode     ' buffer, size =    80.39 MB, (  241.18 /  2048.00)
ggml_metal_add_buffer: allocated 'data_cross      ' buffer, size =     2.93 MB, (  244.12 /  2048.00)
ggml_metal_add_buffer: allocated 'data_decode     ' buffer, size =    23.14 MB, (  267.25 /  2048.00)
ggml_metal_add_buffer: allocated 'kv_cross        ' buffer, size =    17.58 MB, (  284.84 /  2048.00)
ggml_metal_add_buffer: allocated 'kv_self_0       ' buffer, size =     5.25 MB, (  290.09 /  2048.00)

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 1 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

ggml_metal_add_buffer: allocated 'kv_self_1       ' buffer, size =     5.25 MB, (  295.34 /  2048.00)

[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     load time =   215.61 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    18.76 ms
whisper_print_timings:   sample time =    18.54 ms /    27 runs (    0.69 ms per run)
whisper_print_timings:   encode time = 31450.64 ms /     1 runs (31450.64 ms per run)
whisper_print_timings:   decode time =   787.62 ms /    27 runs (   29.17 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 32935.19 ms
ggml_metal_free: deallocating

@bobqianic bobqianic added bug Something isn't working solution This issue contains a potential solution labels Sep 17, 2023
haraldrudell commented
The actual diff to run whisper.cpp on a legacy Intel (amd64) MacBook Pro:

git diff ggml-metal.m
diff --git a/ggml-metal.m b/ggml-metal.m
index 1139ee3..01e189e 100644
--- a/ggml-metal.m
+++ b/ggml-metal.m
@@ -251,16 +251,16 @@ struct ggml_metal_context * ggml_metal_init(int n_cb) {
         GGML_METAL_ADD_KERNEL(mul_mat_q4_K_f32);
         GGML_METAL_ADD_KERNEL(mul_mat_q5_K_f32);
         GGML_METAL_ADD_KERNEL(mul_mat_q6_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_f32_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_f16_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q4_0_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q8_0_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q4_1_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q2_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q3_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q4_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q5_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q6_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_f32_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_f16_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q4_0_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q8_0_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q4_1_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q2_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q3_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q4_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q5_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q6_K_f32);
         GGML_METAL_ADD_KERNEL(rope);
         GGML_METAL_ADD_KERNEL(alibi_f32);
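The same edit can be scripted instead of made by hand. Below is a small self-contained sketch: it applies the comment-out with `sed` to a stand-in file so the result can be inspected. In a real checkout you would point the `sed` line at `ggml-metal.m` and then rebuild with `make clean && make` (verify with `git diff` before building).

```shell
# Stand-in file with a few registration lines, mimicking ggml-metal.m
cat > /tmp/ggml-metal-snippet.m <<'EOF'
        GGML_METAL_ADD_KERNEL(mul_mat_q6_K_f32);
        GGML_METAL_ADD_KERNEL(mul_mm_f32_f32);
        GGML_METAL_ADD_KERNEL(mul_mm_f16_f32);
        GGML_METAL_ADD_KERNEL(rope);
EOF

# Prefix every mul_mm registration with `//`, leaving other kernels untouched
sed -i.bak 's|^\( *\)\(GGML_METAL_ADD_KERNEL(mul_mm_\)|\1// \2|' /tmp/ggml-metal-snippet.m

cat /tmp/ggml-metal-snippet.m
```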

haraldrudell commented
Although it runs, it crashes further down the line:

The same code runs fine on any Apple Silicon Mac and on Linux amd64; it just doesn't like the Intels. Those last Intel Macs had 64 GiB RAM and 8 cores, which is still acceptable performance in 2023 if you can stand the fan noise.

ggml_metal_add_buffer: allocated 'kv_self_1       ' buffer, size =    15.75 MB, (  738.86 /  1536.00)
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: ggml-metal.m:1290: false
230918 06:14:09-07 whisper.Wait error: signal: "abort trap" ‘signal: abort trap’

nchudleigh (Contributor, Author) commented

I got it to run on my 2015 MBP with no crashing with this PR: #1294

haraldrudell commented
Thanks.

I run small.en on a 2020 MacBook Pro 13" (4-core hyper-threaded, 32 GiB, macOS 13.5); just now the machine rebooted without a word. My Apple Silicon Mac also rebooted when I ran too many instances with slab concurrency issues.

I'll try with easier data to see if it completes and stays up.

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 |
METAL = 1 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
COREML = 0 | OPENVINO = 0 | 

ggml_metal_init: found device: Intel(R) Iris(TM) Plus Graphics

haraldrudell commented Sep 18, 2023

Doesn't work on this Intel; it crashes toward what I believe to be the end.

Could be a different issue, since it never ran successfully.

fredrik-smedberg commented Sep 19, 2023

I think the issue might extend beyond x86/amd64. Running a whisper.cpp checkout from 2023-09-18, compiled with just `make`, on my MacBook Pro 13 M1 gives the error below.

Recompiling whisper.cpp with `make clean && WHISPER_NO_METAL=true make` disables Metal support and makes Whisper run properly on my M1.

System info output from working Whisper:
system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 0 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

Error when Metal is enabled:

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1
ggml_metal_init: picking default device: Apple M1
ggml_metal_init: loading '/Users/fredriksmedberg/Code/whisper-cpp/whisper.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x13d707fc0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_add_row                        0x13d708790 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul                            0x13d708cd0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_row                        0x13d6064a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_scale                          0x13d606b00 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_silu                           0x13d606f20 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_relu                           0x13d7090f0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_gelu                           0x13d709630 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max                       0x13e804fd0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max_4                     0x13d607620 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf                  0x13d607de0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf_8                0x13d709e70 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f32                   0x13d70a6f0 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f16                   0x13d608640 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x13d608ec0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_1                  0x13d6095b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q8_0                  0x13d609ca0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q2_K                  0x13d70acc0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q3_K                  0x13e805730 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_K                  0x13d60a0c0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q5_K                  0x13d60a940 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q6_K                  0x13d60b030 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_rms_norm                       0x13d70b3c0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_norm                           0x13d60ba00 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f32_f32                0x13d60c5e0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f16_f32                0x13d70bde0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f16_f32_1row           0x13d60cd60 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f16_f32_l4             0x13d60d920 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32               0x13d60dfa0 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32               0x13d60e740 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q8_0_f32               0x13d70c600 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32               0x13d60edc0 | th_max =  640 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32               0x13d70ce20 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32               0x13d60f560 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32               0x13d60fe20 | th_max =  640 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32               0x13d70d5c0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f32_f32                 0x13d610670 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f16_f32                 0x13d610fe0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32                0x13d70d360 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32                0x13d611710 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32                0x13d612080 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32                0x13d70e700 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32                0x13d6127b0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32                0x13e805b50 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32                0x13d613000 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32                0x13d70ef50 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_rope                           0x13d613540 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_alibi_f32                      0x13d70fbf0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f16                    0x13d710670 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f32                    0x13d613e10 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f16_f16                    0x13d614770 | th_max = 1024 | th_width =   32
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 10922.67 MB
ggml_metal_init: maxTransferRate               = built-in GPU
whisper_init_state: Metal context initialized
whisper_init_state: max tensor size =    43.53 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  1034.00 MB, ( 1034.50 / 10922.67)
ggml_metal_add_buffer: allocated 'meta_conv       ' buffer, size =     1.48 MB, ( 1035.98 / 10922.67)
ggml_metal_add_buffer: allocated 'meta_encode     ' buffer, size =     1.48 MB, ( 1037.47 / 10922.67)
ggml_metal_add_buffer: allocated 'meta_cross      ' buffer, size =     1.48 MB, ( 1038.95 / 10922.67)
ggml_metal_add_buffer: allocated 'meta_decode     ' buffer, size =     1.48 MB, ( 1040.44 / 10922.67)
ggml_metal_add_buffer: allocated 'data_conv       ' buffer, size =    30.22 MB, ( 1070.66 / 10922.67)
ggml_metal_add_buffer: allocated 'data_encode     ' buffer, size =   200.97 MB, ( 1271.62 / 10922.67)
ggml_metal_add_buffer: allocated 'data_cross      ' buffer, size =     7.33 MB, ( 1278.95 / 10922.67)
ggml_metal_add_buffer: allocated 'data_decode     ' buffer, size =    57.84 MB, ( 1336.80 / 10922.67)
ggml_metal_add_buffer: allocated 'kv_cross        ' buffer, size =   234.39 MB, ( 1571.19 / 10922.67)
ggml_metal_add_buffer: allocated 'kv_self_0       ' buffer, size =    70.02 MB, ( 1641.20 / 10922.67)

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing '../scripts/only_speech/0.wav' (1792448 samples, 112.0 sec), 4 threads, 1 processors, lang = auto, task = translate, timestamps = 1 ...

GGML_ASSERT: ggml-metal.m:904: false && "MUL MAT-MAT not implemented"
./whisper.sh: line 34: 26276 Abort trap: 6           ./main -m models/ggml-$1-q5_0.bin -f $2 -tr --best-of $BEST_OF --beam-size $BEAM_SIZE --entropy-thold $ENTROPY_THRESHOLD --max-context $MAX_CONTEXT_SIZE -l auto --output-$FORMAT -of $2

z11h commented Oct 26, 2023

Was having this same issue, and running `make clean && WHISPER_NO_METAL=true make` fixed the problem :)
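For anyone curious what the switch does: the Makefile only adds the Metal compile define and frameworks when `WHISPER_NO_METAL` is unset, so setting it makes the build fall back to CPU (plus Accelerate/BLAS). Here is a minimal stand-in Makefile illustrating the gating pattern — the `WHISPER_NO_METAL` and `GGML_USE_METAL` names come from whisper.cpp, but this demo file itself is hypothetical:

```shell
# Tiny stand-in Makefile mimicking whisper.cpp's WHISPER_NO_METAL gate
cat > /tmp/Makefile.metal-demo <<'EOF'
CFLAGS =
ifndef WHISPER_NO_METAL
CFLAGS += -DGGML_USE_METAL
endif
$(info CFLAGS=$(CFLAGS))
all: ;
EOF

make -f /tmp/Makefile.metal-demo                         # CFLAGS includes GGML_USE_METAL
WHISPER_NO_METAL=true make -f /tmp/Makefile.metal-demo   # GGML_USE_METAL is absent
```

Environment variables become make variables, which is why prefixing the command with `WHISPER_NO_METAL=true` is enough to satisfy the `ifndef`.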

bobqianic (Collaborator) commented

This issue should have been fixed in the most recent GGML sync. #1422
