coreml medium.en model takes a very long time to run every time #937
Comments
Try killing ANECompilerService.
Killing ANECompilerService works. Is there something in the way it's being called that makes it churn before it realizes that the model is already generated?
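A minimal sketch of that workaround, as a shell function. The function name and the grace period are assumptions for illustration, not anything from whisper.cpp itself: let the compiler churn for a while, then signal it if it is still running.

```shell
# Hypothetical watchdog for the workaround discussed above: give
# ANECompilerService a grace period, then kill it if it still lingers.
kill_lingering_anec() {
  grace="${1:-30}"              # seconds to let the compile churn (assumed default)
  sleep "$grace"
  # pkill -f matches against the full command line; it returns 0 only
  # when at least one process was actually signalled.
  if pkill -f ANECompilerService 2>/dev/null; then
    echo "killed ANECompilerService after ${grace}s"
  else
    echo "no ANECompilerService running"
  fi
}

# Example: no grace period at all.
kill_lingering_anec 0
```

Run alongside the transcription, this mirrors what the commenters above describe doing by hand; whether killing the compiler is safe mid-compile is exactly the open question in this thread.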
As part of running the model inference, I have another script that starts running in the background that waits for the ANECompilerService process. The call path is:

Calling client (cpp, nodejs, bash script, etc.) --> Swift wrapper --> whisper.cpp CoreML
Well, I packaged whisper.cpp and OpenAI Whisper inside a Perl script in order to call either on a per-file basis. So I'll try scanning for ANECompilerService and then killing it inside that Perl wrapper… but how would we know if it NEEDED to run vs. be killed — i.e., that it wasn't the first run for that language model? Would running (for instance) ./models/generate-coreml-model.sh modelname on each model once (and every time a new model was released) ensure we didn't need the first-run compile? @ggerganov Just wondering. Thanks!
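One way a wrapper could approach the "needed vs. killed" question is to check whether the compiled CoreML encoder directory already exists before regenerating it. A sketch, assuming the models/ paths used elsewhere in this thread (prewarm_model is a made-up helper name, and whether this also avoids the per-run ANECompilerService churn depends on how CoreML caches compiled models, which is the unresolved question here):

```shell
# Hypothetical pre-warm check: only run the conversion script when the
# compiled CoreML encoder directory is missing. The path layout follows
# the ggml-<name>-encoder.mlmodelc naming seen in the logs below.
prewarm_model() {
  name="$1"
  compiled="models/ggml-${name}-encoder.mlmodelc"
  if [ -d "$compiled" ]; then
    echo "cached: $compiled"
  else
    echo "generating: $name"
    ./models/generate-coreml-model.sh "$name"
  fi
}
```

Usage would be, e.g., prewarm_model medium.en from the whisper.cpp root, once per model and again whenever a new model is released.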
I thought about this too, but I couldn't reproduce it locally because I don't know where the model cache is, so I can't delete it and test. Presumably it doesn't just modify the file in place?
This (plus the hallucinations on long files, which force a re-run) negates any benefit of CoreML over the normal non-CoreML versions of whisper.cpp until it is addressed.
@janngobble I see this behavior from non-CoreML builds of whisper.cpp too…
```
./main -m models/ggml-medium.en.bin -f samples/jfk.wav
...
whisper_init_state: loading Core ML model from 'models/ggml-medium.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
```
I ran the above command 3 times and these are the results.
```
whisper_print_timings: total time = 11890580.00 ms
whisper_print_timings: total time = 11944257.00 ms
whisper_print_timings: total time = 11783808.00 ms
```
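For scale, those totals convert from milliseconds to hours with plain awk arithmetic (nothing whisper-specific):

```shell
# Convert the reported whisper_print_timings totals from ms to hours.
for ms in 11890580 11944257 11783808; do
  awk -v ms="$ms" 'BEGIN { printf "%d ms = %.2f h\n", ms, ms / 3600000 }'
done
# → each run takes roughly 3.3 hours
```

That is, every run — not just the first — is spending hours in what the log labels a first-run compile step.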
I'm running this on an M1 Max with 64 GB RAM, macOS Ventura 13.3.1 (a), and Python 3.10 via conda.