Make whisperfile 12% faster in GPU mode
whisper_process_logits() was computing softmax on its own, many times over, so I changed it to call my vectorized expf() function ggml_vec_soft_max_f32(), which was upstreamed to llama.cpp a few months ago. Since this is pretty much the only CPU operation that happens in GPU mode, the change has a much bigger impact on performance here than it does on llama.cpp's large language model inference.
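For illustration, here's a minimal sketch of the shape of the change. The ggml_vec_soft_max_f32() signature below (take (n, y, x, max), write y[i] = expf(x[i] - max), return the sum) matches my reading of ggml.c but should be treated as an assumption, and the scalar body here is just a stand-in for the real SIMD implementation; the softmax_naive() loop is a hypothetical reconstruction of the old per-element expf() code, not the literal whisper.cpp source.

```c
#include <math.h>
#include <stdio.h>

/* Stand-in for the vectorized helper in ggml.c (assumed contract):
 * writes y[i] = expf(x[i] - max) and returns the sum of the y[i].
 * The real version computes expf() with SIMD instead of this scalar loop. */
static float ggml_vec_soft_max_f32(const int n, float * y, const float * x, float max) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        y[i] = expf(x[i] - max);
        sum += y[i];
    }
    return sum;
}

/* Before: softmax computed by hand, one scalar expf() call per logit
 * (hypothetical reconstruction of the old whisper_process_logits() loop). */
static void softmax_naive(const int n, float * probs, const float * logits, float max) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        probs[i] = expf(logits[i] - max);
        sum += probs[i];
    }
    for (int i = 0; i < n; i++) probs[i] /= sum;
}

/* After: one call into the vectorized helper, then a single normalization pass. */
static void softmax_vectorized(const int n, float * probs, const float * logits, float max) {
    const float sum = ggml_vec_soft_max_f32(n, probs, logits, max);
    for (int i = 0; i < n; i++) probs[i] /= sum;
}

int main(void) {
    const float logits[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float probs[4];
    softmax_vectorized(4, probs, logits, 4.0f /* max logit, for numerical stability */);
    for (int i = 0; i < 4; i++) printf("%f\n", probs[i]);
    (void) softmax_naive; /* kept only for comparison */
    return 0;
}
```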