Bug: Flash attention reduces vulkan performance by ~50% #9572
Labels
bug-unconfirmed
medium severity
What happened?
Enabling flash attention reduces performance on the Vulkan backend by roughly 50%, far more than expected.
Performance does vary between hardware, but a drop of that size looks like a bug.
Hardware: AMD RX 6800 XT.
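To make the "~50%" figure concrete, here is a minimal sketch of how the slowdown could be computed from two throughput measurements (tokens per second with flash attention off and on). The function name `relative_drop` and the numbers passed to it are placeholders chosen for illustration, not measurements from this report.

```python
def relative_drop(baseline_tps: float, flash_attn_tps: float) -> float:
    """Return the fractional slowdown of flash_attn_tps relative to baseline_tps."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_tps - flash_attn_tps) / baseline_tps


# Placeholder throughputs (tokens/s), NOT real measurements from this issue:
# 40.0 stands in for flash attention off, 20.0 for flash attention on.
print(f"{relative_drop(40.0, 20.0):.0%}")  # prints "50%"
```

Comparing both runs on the same model, prompt length, and build keeps the ratio meaningful.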
Name and Version
version: 3772 (23e0d70)
built with MSVC 19.29.30154.0 for x64
What operating system are you seeing the problem on?
Windows
Relevant log output