Every run with CoreML "first run on a device may take a while ..." #2126
I'm playing with whisper.cpp today and was impressed by the results, so I wanted to compare the large model with and without CoreML. I tried the same command 5 times in a row, and every time the "first run on a device may take a while" message appeared:
Here are the results:
Maybe this is the correct behaviour, and thus using the CoreML version is not worth it on small samples, or maybe I'm simply doing something wrong.
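For anyone trying to reproduce this, the usual CoreML workflow from the whisper.cpp README looks roughly like the following; this is a sketch assuming the Makefile-based build, and the model name and paths are illustrative rather than the exact commands I ran:

```sh
# Python dependencies used by the CoreML conversion script (per the whisper.cpp README)
pip install ane_transformers openai-whisper coremltools

# Download the ggml model and generate the CoreML encoder from it
./models/download-ggml-model.sh large-v3
./models/generate-coreml-model.sh large-v3   # typically produces models/ggml-large-v3-encoder.mlmodelc

# Build with CoreML support; the CoreML encoder is picked up automatically at load time
make clean
WHISPER_COREML=1 make -j

# Transcribe a sample; only the first run should trigger the long CoreML specialization step
./main -m models/ggml-large-v3.bin -f samples/jfk.wav
```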
I'm on macOS 14.2.1 and I've run into the same problem.
I'm on 14.4.1 and don't observe this issue: the first run is slow, but subsequent runs load fast.
The Metal backend was incorrectly initialized 2 times, which I suspect might have caused this problem. Should be fixed on latest master.
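For reference, a minimal update-and-rebuild sketch, assuming the Makefile-based CoreML build (not necessarily the exact commands used by the reporters in this thread):

```sh
# Pull the latest master and rebuild with CoreML support
git pull origin master
make clean
WHISPER_COREML=1 make -j
```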
M1 Pro, macOS 14.4.1, latest master pull and make with CoreML:
Edit: tried on macOS 14.5 too.
I pulled the master branch again on the same machine (after upgrading the OS to 14.5) and tried to use CoreML. The problem is still there for the large-v3 model; below are the results (I used the same commands and sample as in my previous post).
However, reading the discussion made me curious, so I decided to test the base model (not the base.en model), and it works fine! Here are the results:
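For comparison, the base-model test is the same workflow with the model name swapped; a sketch with illustrative paths:

```sh
# Same steps as for large-v3, but with the base model
./models/download-ggml-model.sh base
./models/generate-coreml-model.sh base
./main -m models/ggml-base.bin -f samples/jfk.wav
```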
Not sure; maybe the OS somehow determines automatically whether to cache based on the size of the model.
I encountered this issue today using the latest commit on the master branch. After retrying the build process from scratch, the problem was resolved. |
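I can't say exactly which step fixed it, but a full from-scratch retry would look roughly like this; a sketch that assumes the default output path of generate-coreml-model.sh:

```sh
make clean
rm -rf models/ggml-large-v3-encoder.mlmodelc   # assumed path; adjust to wherever your encoder was generated
./models/generate-coreml-model.sh large-v3
WHISPER_COREML=1 make -j
```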
Same here: the large-v3 model is not cached.
I'm on macOS 14.4.1, and testing out whisper.cpp.
I have compiled binaries with both CoreML and just Metal.
But every single time I run the CoreML version, it spends an age loading the CoreML model and doesn't seem to cache it for the next run.
Runs 1-3 with the command:
Not sure what I'm doing wrong, but I was expecting it to cache whatever it was doing so that subsequent runs would be quicker.
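A simple way to check whether the compiled CoreML model is being cached is to time back-to-back runs (paths are illustrative):

```sh
# If caching works, the second run should skip the long CoreML specialization step
time ./main -m models/ggml-large-v3.bin -f samples/jfk.wav
time ./main -m models/ggml-large-v3.bin -f samples/jfk.wav
```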