
ggml_metal_init failure: loading kernel function on Intel based mac #1292

Closed
nchudleigh opened this issue Sep 15, 2023 · 10 comments

Labels
bug (Something isn't working) · solution (This issue contains a potential solution)

Comments

nchudleigh (Contributor) commented Sep 15, 2023

Using latest master (951a119) with Metal:

load pipeline error: Error Domain=CompilerError Code=2 "SC compilation failure

❯ ./main -m models/ggml-tiny.en.bin -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.62 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB
whisper_init_state: compute buffer (conv)   =   11.17 MB
whisper_init_state: compute buffer (encode) =   61.76 MB
whisper_init_state: compute buffer (cross)  =    3.67 MB
whisper_init_state: compute buffer (decode) =   18.82 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Intel Iris Pro Graphics
ggml_metal_init: found device: AMD Radeon R9 M370X
ggml_metal_init: picking default device: AMD Radeon R9 M370X
ggml_metal_init: loading '/Users/neil/Development/whisper.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                         0x7fa14b207ab0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_add_row                     0x7fa14b208200 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul                         0x7fa14b208950 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_row                     0x7fa14b2090a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_scale                       0x7fa14b2097f0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_silu                        0x7fa14b209f40 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_relu                        0x7fa14b20a690 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_gelu                        0x7fa14b20ade0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_soft_max                    0x7fa14b20b530 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_soft_max_4                  0x7fa14b20bc80 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_diag_mask_inf               0x7fa14b20c3d0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_diag_mask_inf_8             0x7fa14b20cc90 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_f32                0x7fa14b20d3e0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_f16                0x7fa14b20db30 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_0               0x7fa14b20e280 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_1               0x7fa14b20e9d0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q8_0               0x7fa14b20f120 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q2_K               0x7fa14b20f870 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q3_K               0x7fa14b20ffc0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_K               0x7fa14b2109a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q5_K               0x7fa14b210f90 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q6_K               0x7fa14b2116e0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_rms_norm                    0x7fa14b211fb0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_norm                        0x7fa14b212700 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f32_f32             0x7fa14b212e50 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32             0x7fa14b2135a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32_1row        0x7fa14b213cf0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32_l4          0x7fa14b214440 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32            0x7fa14b214b90 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32            0x7fa14b2152e0 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q8_0_f32            0x7fa14b215bb0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32            0x7fa14b216300 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32            0x7fa14b2168b0 | th_max =  512 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32            0x7fa14b217000 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32            0x7fa14b217750 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32            0x7fa14b217ea0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mm_f32_f32                         0x0 | th_max =    0 | th_width =    0
ggml_metal_init: load pipeline error: Error Domain=CompilerError Code=2 "SC compilation failure
There is a call to an undefined label" UserInfo={NSLocalizedDescription=SC compilation failure
There is a call to an undefined label}
whisper_init_state: ggml_metal_init() failed
error: failed to initialize whisper context

nchudleigh (Contributor, Author) commented Sep 15, 2023

Issue persists if I force GPU device to be integrated Intel

❯ ./main -m models/ggml-tiny.en.bin -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.62 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB
whisper_init_state: compute buffer (conv)   =   11.17 MB
whisper_init_state: compute buffer (encode) =   61.76 MB
whisper_init_state: compute buffer (cross)  =    3.67 MB
whisper_init_state: compute buffer (decode) =   18.82 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Intel Iris Pro Graphics
ggml_metal_init: found device: AMD Radeon R9 M370X
ggml_metal_init: picking default device: Intel Iris Pro Graphics
ggml_metal_init: loading '/Users/neil/Development/whisper.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                         0x6000025f8380 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_add_row                     0x6000025f4300 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul                         0x6000025ec180 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_row                     0x6000025ec500 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_scale                       0x6000025ec680 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_silu                        0x6000025f9280 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_relu                        0x6000025f8800 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_gelu                        0x6000025ec800 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max                               0x0 | th_max =    0 | th_width =    0
ggml_metal_init: load pipeline error: Error Domain=CompilerError Code=2 "Compiler encountered an internal error" UserInfo={NSLocalizedDescription=Compiler encountered an internal error}
whisper_init_state: ggml_metal_init() failed
error: failed to initialize whisper context

@nchudleigh nchudleigh changed the title Intel based mac SC compilation failure ggml_metal_init failure: loading kernel function on Intel based mac Sep 15, 2023
nchudleigh (Contributor, Author) commented Sep 15, 2023

Commenting out these kernel-function registrations allows initialization to complete.

Performance is the same as before Metal support was merged (80c1512).

❯ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_init_state: compute buffer (conv)   =   14.10 MB
whisper_init_state: compute buffer (encode) =   81.85 MB
whisper_init_state: compute buffer (cross)  =    4.40 MB
whisper_init_state: compute buffer (decode) =   24.61 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Intel Iris Pro Graphics
ggml_metal_init: found device: AMD Radeon R9 M370X
ggml_metal_init: picking default device: AMD Radeon R9 M370X
ggml_metal_init: loading '/Users/neil/Development/whisper.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                         0x7fb22bf07070 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_add_row                     0x7fb22bf077c0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul                         0x7fb22bf07f10 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_row                     0x7fb22bf08660 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_scale                       0x7fb22bf08db0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_silu                        0x7fb22bf09500 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_relu                        0x7fb22bf09c50 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_gelu                        0x7fb22bf0a3a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_soft_max                    0x7fb22bf0aaf0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_soft_max_4                  0x7fb22bf0b240 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_diag_mask_inf               0x7fb22bf0b990 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_diag_mask_inf_8             0x7fb22bf0c250 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_f32                0x7fb22bf0c9a0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_f16                0x7fb22bf0d0f0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_0               0x7fb22e708b40 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_1               0x7fb22e709290 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q8_0               0x7fb22e7099e0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q2_K               0x7fb22e70a130 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q3_K               0x7fb22e70a880 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q4_K               0x7fb22e70b260 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q5_K               0x7fb22e70b9b0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_get_rows_q6_K               0x7fb22e70c100 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_rms_norm                    0x7fb22e70c9d0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_norm                        0x7fb22e70d120 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f32_f32             0x7fb22e70d870 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32             0x7fb22e70dfc0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32_1row        0x7fb22e70e710 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_f16_f32_l4          0x7fb22e70ee60 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32            0x7fb22e70f5b0 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32            0x7fb22e70fd00 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q8_0_f32            0x7fb22e7105d0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32            0x7fb22e710d20 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32            0x7fb22e7112d0 | th_max =  512 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32            0x7fb22e711a20 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32            0x7fb22e712170 | th_max =  768 | th_width =   64
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32            0x7fb22e7128c0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_rope                        0x7fb22e713190 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_alibi_f32                   0x7fb22e7138e0 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_cpy_f32_f16                 0x7fb22e714030 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_cpy_f32_f32                 0x7fb22e714780 | th_max = 1024 | th_width =   64
ggml_metal_init: loaded kernel_cpy_f16_f16                 0x7fb22e714ed0 | th_max = 1024 | th_width =   64
ggml_metal_init: hasUnifiedMemory              = false
ggml_metal_init: recommendedMaxWorkingSetSize  =  2048.00 MB
ggml_metal_init: maxTransferRate               = built-in GPU
whisper_init_state: Metal context initialized
whisper_init_state: max tensor size =    50.65 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =   142.00 MB, (  142.27 /  2048.00)
ggml_metal_add_buffer: allocated 'meta_conv       ' buffer, size =     1.47 MB, (  143.74 /  2048.00)
ggml_metal_add_buffer: allocated 'meta_encode     ' buffer, size =     1.47 MB, (  145.21 /  2048.00)
ggml_metal_add_buffer: allocated 'meta_cross      ' buffer, size =     1.47 MB, (  146.69 /  2048.00)
ggml_metal_add_buffer: allocated 'meta_decode     ' buffer, size =     1.47 MB, (  148.16 /  2048.00)
ggml_metal_add_buffer: allocated 'data_conv       ' buffer, size =    12.64 MB, (  160.80 /  2048.00)
ggml_metal_add_buffer: allocated 'data_encode     ' buffer, size =    80.39 MB, (  241.18 /  2048.00)
ggml_metal_add_buffer: allocated 'data_cross      ' buffer, size =     2.93 MB, (  244.12 /  2048.00)
ggml_metal_add_buffer: allocated 'data_decode     ' buffer, size =    23.14 MB, (  267.25 /  2048.00)
ggml_metal_add_buffer: allocated 'kv_cross        ' buffer, size =    17.58 MB, (  284.84 /  2048.00)
ggml_metal_add_buffer: allocated 'kv_self_0       ' buffer, size =     5.25 MB, (  290.09 /  2048.00)

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 1 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

ggml_metal_add_buffer: allocated 'kv_self_1       ' buffer, size =     5.25 MB, (  295.34 /  2048.00)

[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     load time =   215.61 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    18.76 ms
whisper_print_timings:   sample time =    18.54 ms /    27 runs (    0.69 ms per run)
whisper_print_timings:   encode time = 31450.64 ms /     1 runs (31450.64 ms per run)
whisper_print_timings:   decode time =   787.62 ms /    27 runs (   29.17 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 32935.19 ms
ggml_metal_free: deallocating

@bobqianic bobqianic added bug Something isn't working solution This issue contains a potential solution labels Sep 17, 2023
haraldrudell commented
The actual diff to run whisper.cpp on a legacy Intel (amd64) MacBook Pro:

git diff ggml-metal.m
diff --git a/ggml-metal.m b/ggml-metal.m
index 1139ee3..01e189e 100644
--- a/ggml-metal.m
+++ b/ggml-metal.m
@@ -251,16 +251,16 @@ struct ggml_metal_context * ggml_metal_init(int n_cb) {
         GGML_METAL_ADD_KERNEL(mul_mat_q4_K_f32);
         GGML_METAL_ADD_KERNEL(mul_mat_q5_K_f32);
         GGML_METAL_ADD_KERNEL(mul_mat_q6_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_f32_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_f16_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q4_0_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q8_0_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q4_1_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q2_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q3_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q4_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q5_K_f32);
-        GGML_METAL_ADD_KERNEL(mul_mm_q6_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_f32_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_f16_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q4_0_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q8_0_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q4_1_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q2_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q3_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q4_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q5_K_f32);
+//        GGML_METAL_ADD_KERNEL(mul_mm_q6_K_f32);
         GGML_METAL_ADD_KERNEL(rope);
         GGML_METAL_ADD_KERNEL(alibi_f32);
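The same edit can be scripted instead of made by hand. Below is a small self-contained sketch: it applies the comment-out with `sed` to a stand-in file so the result can be inspected. In a real checkout you would point the `sed` line at `ggml-metal.m` and then rebuild with `make clean && make` (verify with `git diff` before building).

```shell
# Stand-in file with a few registration lines, mimicking ggml-metal.m
cat > /tmp/ggml-metal-snippet.m <<'EOF'
        GGML_METAL_ADD_KERNEL(mul_mat_q6_K_f32);
        GGML_METAL_ADD_KERNEL(mul_mm_f32_f32);
        GGML_METAL_ADD_KERNEL(mul_mm_f16_f32);
        GGML_METAL_ADD_KERNEL(rope);
EOF

# Prefix every mul_mm registration with `//`, leaving other kernels untouched
sed -i.bak 's|^\( *\)\(GGML_METAL_ADD_KERNEL(mul_mm_\)|\1// \2|' /tmp/ggml-metal-snippet.m

cat /tmp/ggml-metal-snippet.m
```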

haraldrudell commented
Although it runs, it crashes further down the line:

The same code runs fine on any Apple Silicon Mac and on Linux amd64; it just doesn't like the Intels. Those last Intel Macs had 64 GiB RAM and 8 cores, which is still acceptable performance in 2023 if you can stand the fan noise.

ggml_metal_add_buffer: allocated 'kv_self_1       ' buffer, size =    15.75 MB, (  738.86 /  1536.00)
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: ggml-metal.m:1290: false
230918 06:14:09-07 whisper.Wait error: signal: "abort trap" ‘signal: abort trap’

nchudleigh (Contributor, Author) commented

I got it to run on my 2015 MBP with no crashing with this PR: #1294

haraldrudell commented
Thanks.

I run small.en on a 2020 MacBook Pro 13" (4-core hyper-threaded, 32 GiB, macOS 13.5); just now the machine rebooted without a word. My Apple Silicon Mac also rebooted when I ran too many instances with slab concurrency issues.

I'll try with easier data to see if it completes and stays up.

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 |
METAL = 1 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
COREML = 0 | OPENVINO = 0 | 

ggml_metal_init: found device: Intel(R) Iris(TM) Plus Graphics

haraldrudell commented Sep 18, 2023

Doesn't work on this Intel; it crashes toward what I believe to be the end.

Could be a different issue, since it never ran successfully.

fredrik-smedberg commented Sep 19, 2023

I think the issue might extend beyond x86/amd64. Running a whisper.cpp checkout from 2023-09-18, compiled with just `make`, on my MacBook Pro 13 M1 gives the error below.

Recompiling whisper.cpp with `make clean && WHISPER_NO_METAL=true make` disables Metal support and makes Whisper run properly on my M1.

System info output from working Whisper:
system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 0 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

Error when Metal is enabled:

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1
ggml_metal_init: picking default device: Apple M1
ggml_metal_init: loading '/Users/fredriksmedberg/Code/whisper-cpp/whisper.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x13d707fc0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_add_row                        0x13d708790 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul                            0x13d708cd0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_row                        0x13d6064a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_scale                          0x13d606b00 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_silu                           0x13d606f20 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_relu                           0x13d7090f0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_gelu                           0x13d709630 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max                       0x13e804fd0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max_4                     0x13d607620 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf                  0x13d607de0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf_8                0x13d709e70 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f32                   0x13d70a6f0 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f16                   0x13d608640 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x13d608ec0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_1                  0x13d6095b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q8_0                  0x13d609ca0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q2_K                  0x13d70acc0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q3_K                  0x13e805730 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_K                  0x13d60a0c0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q5_K                  0x13d60a940 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q6_K                  0x13d60b030 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_rms_norm                       0x13d70b3c0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_norm                           0x13d60ba00 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f32_f32                0x13d60c5e0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f16_f32                0x13d70bde0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f16_f32_1row           0x13d60cd60 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_f16_f32_l4             0x13d60d920 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32               0x13d60dfa0 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32               0x13d60e740 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q8_0_f32               0x13d70c600 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32               0x13d60edc0 | th_max =  640 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32               0x13d70ce20 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32               0x13d60f560 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32               0x13d60fe20 | th_max =  640 | th_width =   32
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32               0x13d70d5c0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f32_f32                 0x13d610670 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f16_f32                 0x13d610fe0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32                0x13d70d360 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32                0x13d611710 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32                0x13d612080 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32                0x13d70e700 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32                0x13d6127b0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32                0x13e805b50 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32                0x13d613000 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32                0x13d70ef50 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_rope                           0x13d613540 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_alibi_f32                      0x13d70fbf0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f16                    0x13d710670 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f32                    0x13d613e10 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f16_f16                    0x13d614770 | th_max = 1024 | th_width =   32
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 10922.67 MB
ggml_metal_init: maxTransferRate               = built-in GPU
whisper_init_state: Metal context initialized
whisper_init_state: max tensor size =    43.53 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  1034.00 MB, ( 1034.50 / 10922.67)
ggml_metal_add_buffer: allocated 'meta_conv       ' buffer, size =     1.48 MB, ( 1035.98 / 10922.67)
ggml_metal_add_buffer: allocated 'meta_encode     ' buffer, size =     1.48 MB, ( 1037.47 / 10922.67)
ggml_metal_add_buffer: allocated 'meta_cross      ' buffer, size =     1.48 MB, ( 1038.95 / 10922.67)
ggml_metal_add_buffer: allocated 'meta_decode     ' buffer, size =     1.48 MB, ( 1040.44 / 10922.67)
ggml_metal_add_buffer: allocated 'data_conv       ' buffer, size =    30.22 MB, ( 1070.66 / 10922.67)
ggml_metal_add_buffer: allocated 'data_encode     ' buffer, size =   200.97 MB, ( 1271.62 / 10922.67)
ggml_metal_add_buffer: allocated 'data_cross      ' buffer, size =     7.33 MB, ( 1278.95 / 10922.67)
ggml_metal_add_buffer: allocated 'data_decode     ' buffer, size =    57.84 MB, ( 1336.80 / 10922.67)
ggml_metal_add_buffer: allocated 'kv_cross        ' buffer, size =   234.39 MB, ( 1571.19 / 10922.67)
ggml_metal_add_buffer: allocated 'kv_self_0       ' buffer, size =    70.02 MB, ( 1641.20 / 10922.67)

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing '../scripts/only_speech/0.wav' (1792448 samples, 112.0 sec), 4 threads, 1 processors, lang = auto, task = translate, timestamps = 1 ...

GGML_ASSERT: ggml-metal.m:904: false && "MUL MAT-MAT not implemented"
./whisper.sh: line 34: 26276 Abort trap: 6           ./main -m models/ggml-$1-q5_0.bin -f $2 -tr --best-of $BEST_OF --beam-size $BEAM_SIZE --entropy-thold $ENTROPY_THRESHOLD --max-context $MAX_CONTEXT_SIZE -l auto --output-$FORMAT -of $2

z11h commented Oct 26, 2023

Was having this same issue, and running `make clean && WHISPER_NO_METAL=true make` fixed the problem :)
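For anyone curious what the switch does: the Makefile only adds the Metal compile define and frameworks when `WHISPER_NO_METAL` is unset, so setting it makes the build fall back to CPU (plus Accelerate/BLAS). Here is a minimal stand-in Makefile illustrating the gating pattern — the `WHISPER_NO_METAL` and `GGML_USE_METAL` names come from whisper.cpp, but this demo file itself is hypothetical:

```shell
# Tiny stand-in Makefile mimicking whisper.cpp's WHISPER_NO_METAL gate
cat > /tmp/Makefile.metal-demo <<'EOF'
CFLAGS =
ifndef WHISPER_NO_METAL
CFLAGS += -DGGML_USE_METAL
endif
$(info CFLAGS=$(CFLAGS))
all: ;
EOF

make -f /tmp/Makefile.metal-demo                         # CFLAGS includes GGML_USE_METAL
WHISPER_NO_METAL=true make -f /tmp/Makefile.metal-demo   # GGML_USE_METAL is absent
```

Environment variables become make variables, which is why prefixing the command with `WHISPER_NO_METAL=true` is enough to satisfy the `ifndef`.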

bobqianic (Collaborator) commented

This issue should have been fixed in the most recent GGML sync. #1422
