Make whisperfile 12% faster in GPU mode
whisper_process_logits() was computing softmax on its own, many times over, so I changed it to call my vectorized expf() function ggml_vec_soft_max_f32(), which was upstreamed to llama.cpp a few months ago. Since this is pretty much the only CPU operation that happens in GPU mode, the change has a much bigger impact on performance here than it does on llama.cpp's large language model inference.
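For illustration, here's a minimal sketch of the shape of the change. The ggml_vec_soft_max_f32() signature below (take (n, y, x, max), write y[i] = expf(x[i] - max), return the sum) matches my reading of ggml.c but should be treated as an assumption, and the scalar body here is just a stand-in for the real SIMD implementation; the softmax_naive() loop is a hypothetical reconstruction of the old per-element expf() code, not the literal whisper.cpp source.

```c
#include <math.h>
#include <stdio.h>

/* Stand-in for the vectorized helper in ggml.c (assumed contract):
 * writes y[i] = expf(x[i] - max) and returns the sum of the y[i].
 * The real version computes expf() with SIMD instead of this scalar loop. */
static float ggml_vec_soft_max_f32(const int n, float * y, const float * x, float max) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        y[i] = expf(x[i] - max);
        sum += y[i];
    }
    return sum;
}

/* Before: softmax computed by hand, one scalar expf() call per logit
 * (hypothetical reconstruction of the old whisper_process_logits() loop). */
static void softmax_naive(const int n, float * probs, const float * logits, float max) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        probs[i] = expf(logits[i] - max);
        sum += probs[i];
    }
    for (int i = 0; i < n; i++) probs[i] /= sum;
}

/* After: one call into the vectorized helper, then a single normalization pass. */
static void softmax_vectorized(const int n, float * probs, const float * logits, float max) {
    const float sum = ggml_vec_soft_max_f32(n, probs, logits, max);
    for (int i = 0; i < n; i++) probs[i] /= sum;
}

int main(void) {
    const float logits[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float probs[4];
    softmax_vectorized(4, probs, logits, 4.0f /* max logit, for numerical stability */);
    for (int i = 0; i < 4; i++) printf("%f\n", probs[i]);
    (void) softmax_naive; /* kept only for comparison */
    return 0;
}
```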