kompute : llama-bench support and ggml_cpu_has_kompute() #5226
Conversation
It doesn't really need to be added to ggml.c; eventually all of the backend code will be removed from there. The llama.cpp change also does nothing, since the CPU backend no longer runs at the same time as the GPU backends; it's just a leftover from the pre-ggml-backend implementation that I forgot to remove. Anyway, it doesn't really matter: the changes to llama-bench and common are good.
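For reference, the ggml_cpu_has_* probes in ggml.c are one-line compile-time checks. Here is a minimal sketch of the Kompute variant in that style, assuming the GGML_USE_KOMPUTE build flag; it mirrors the existing probes rather than quoting the PR's exact code:

```c
// Sketch of the ggml.c feature-probe pattern under discussion. It assumes
// the GGML_USE_KOMPUTE compile-time flag used by Kompute-enabled builds.
int ggml_cpu_has_kompute(void) {
#if defined(GGML_USE_KOMPUTE)
    return 1;
#else
    return 0;
#endif
}
```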
@slaren I removed that code from llama.cpp; does that seem right?
Somehow this PR hurts performance for the other Vulkan backend? This doesn't make sense when I look at the changes, but the difference from the previous commit on master is very significant. (It's not just with llama-bench.)

Before merge:
ggml_vulkan: Using AMD Radeon RX 5700 XT | uma: 0 | fp16: 1 | warp size: 64
build: e0085fd (2026)

After merge:
ggml_vulkan: Using AMD Radeon RX 5700 XT | uma: 0 | fp16: 1 | warp size: 64
build: e8dc55d (2027)

EDIT: reverting 3536cf6 fixes it.
The only way I can imagine this could make a difference is if there is a large overhead for launching the extra threads for the get_rows operation that still runs on the CPU. Are you on Windows?
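To put a rough number on that hypothesis, here is a small stand-alone micro-benchmark (not llama.cpp code; the thread count and iteration count are arbitrary stand-ins) that measures the cost of spawning and joining a worker pool once per simulated op, which is where per-op thread-launch overhead would show up:

```cpp
// Hypothetical micro-benchmark for the thread-launch-overhead theory:
// time how much spawning and joining a pool of worker threads per "op"
// costs. Results will vary by OS and scheduler.
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int n_threads = 8;    // assumed per-op thread count
    const int n_iters   = 1000; // one spawn/join cycle per simulated op

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n_iters; ++i) {
        std::vector<std::thread> workers;
        for (int t = 0; t < n_threads; ++t) {
            workers.emplace_back([]{ /* trivial work, overhead only */ });
        }
        for (auto & w : workers) {
            w.join();
        }
    }
    auto t1 = std::chrono::steady_clock::now();

    double us = std::chrono::duration<double, std::micro>(t1 - t0).count() / n_iters;
    std::printf("avg spawn+join cost per op: %.1f us\n", us);
    return 0;
}
```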
I observe a similar drop in performance with the Kompute backend.

Latest master:

With 3536cf6 reverted:
Yes, on Windows 10.
Setting
I didn't realize that the Kompute backend should have been added in these places.
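For anyone following along, "these places" are the backend-reporting helpers that llama-bench and common use to label results. Below is a self-contained sketch of the kind of change involved; the helper name, backend strings, and stubbed probes are illustrative, not the exact code in the repository:

```cpp
#include <cstdio>
#include <string>

// Stand-ins for the ggml feature probes (in the real tree these live in
// ggml.c and reflect compile-time flags such as GGML_USE_KOMPUTE).
static int ggml_cpu_has_cublas(void)  { return 0; }
static int ggml_cpu_has_vulkan(void)  { return 0; }
static int ggml_cpu_has_kompute(void) { return 1; } // pretend a Kompute build

// Sketch of a llama-bench-style backend-name helper; the real helper and
// its ordering may differ.
static std::string get_backend_name() {
    if (ggml_cpu_has_cublas())  return "CUDA";
    if (ggml_cpu_has_vulkan())  return "Vulkan";
    if (ggml_cpu_has_kompute()) return "Kompute"; // the branch this PR adds
    return "CPU";
}

int main() {
    std::printf("backend: %s\n", get_backend_name().c_str());
    return 0;
}
```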