Make whisperfile 12% faster in GPU mode
whisper_process_logits() was computing the softmax on its own many times, so
I changed it to call my vectorized expf() function ggml_vec_soft_max_f32,
which was upstreamed to llama.cpp a few months ago. Since this is pretty
much the only CPU operation that happens in GPU mode, it has a huge impact
on performance compared to llama's large language model inference.
jart committed Jul 31, 2024
1 parent 0849f32 commit b3bdc62
Showing 4 changed files with 101 additions and 122 deletions.
