Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats #10721
Conversation
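For context on the PR title: VK_EXT_subgroup_size_control lets a compute pipeline be created with a flag requiring every subgroup in the workgroup to be fully populated, which cooperative matrix (coopmat) operations need to be well defined. The sketch below is illustrative, not the actual ggml-vulkan code; the two flag constants match the values in the Vulkan specification, but the helper function and its parameters are assumptions.

```cpp
#include <cstdint>

// Flag values from the Vulkan spec (VkPipelineShaderStageCreateFlagBits):
constexpr uint32_t ALLOW_VARYING_SUBGROUP_SIZE_BIT = 0x00000001; // VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT
constexpr uint32_t REQUIRE_FULL_SUBGROUPS_BIT      = 0x00000002; // VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT

// Hypothetical helper: choose the stage-create flags for a compute pipeline.
// Coopmat shaders get REQUIRE_FULL_SUBGROUPS so the driver never launches a
// partially filled subgroup, which would break coopmat loads/stores.
uint32_t stage_create_flags(bool has_subgroup_size_control, bool uses_coopmat) {
    uint32_t flags = 0;
    if (has_subgroup_size_control && uses_coopmat) {
        flags |= REQUIRE_FULL_SUBGROUPS_BIT;
    }
    return flags;
}
```

In real code these flags would go into `VkPipelineShaderStageCreateInfo::flags` at pipeline-creation time, gated on the extension being supported by the device.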
Quick check showed that the latest Mesa fixed the low performance I saw on my Intel A770, regardless of this subgroup change. Edit: However, newer Mesa also reduces tg performance by 25%.
Intel's Mesa drivers are really a complete mess! With the A770:
- Master + Mesa 24.0.9-0ubuntu0.2:
- PR + Mesa 24.0.9-0ubuntu0.2:
- PR + Mesa kisak/kisak-mesa (24.3.1):
- PR + Mesa oibaf/graphics-drivers (git 25.0):

and, for comparison, the results with SYCL (F16 ON):
Yeah, that's close to what I saw as well. Thank you for testing it.
Hi @0cc4m, is it possible to set the warp size to 16 for Intel GPUs? Not sure why it hangs the GPU and crashes here:
Yeah, I think that might be possible, but the crash is some other problem. Even m=1 works; the coopmat simply gets zero-padded in that case.
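A note on why "warp size 16" might be possible here: VK_EXT_subgroup_size_control lets a pipeline pin `requiredSubgroupSize`, but only to a power of two within the `[minSubgroupSize, maxSubgroupSize]` range the device reports (Intel GPUs typically report 8..32). A minimal validity check, with an illustrative helper name:

```cpp
#include <cstdint>

// Sketch: is `requested` a legal requiredSubgroupSize for this device?
// The Vulkan spec requires a power of two within the device's reported
// [minSubgroupSize, maxSubgroupSize] range.
bool can_require_subgroup_size(uint32_t requested, uint32_t min_sg, uint32_t max_sg) {
    bool is_pow2 = requested != 0 && (requested & (requested - 1)) == 0;
    return is_pow2 && requested >= min_sg && requested <= max_sg;
}
```

So a size of 16 would be acceptable on a device reporting 8..32, but rejected on one reporting 32..64.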
Hmm. It's failing on n = 2 and passing on n = 1. Edit: It is also crashing at model warmup!
n = 2 uses coopmats, n = 1 does not. Whether intel_gpu_top puts it in the compute or 3D shader bar doesn't matter; don't trust it, it's the same hardware. Model warmup uses n = 2, so it's the same kind of crash.
Force-pushed from 595c1a7 to 239927a.
Commits: …oups for coopmats; Add accf32 and accf16 checks for coopmats.
Force-pushed from 239927a to 9131c59.
I added a few more fixes for coopmat support and disabled it again for now on Intel and non-Mesa AMD. I've made some progress on supporting them, but it's not yet ready and I don't want to delay #10665 further.
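The policy described in the comment above ("disabled again for now on Intel and non-Mesa AMD") can be sketched as a small predicate. The enum and function names here are illustrative stand-ins, not ggml-vulkan's actual device-detection code:

```cpp
// Hypothetical mirror of the PR's current gating policy:
// coopmats stay off on Intel and on AMD drivers other than Mesa/RADV
// (i.e. the proprietary driver and amdvlk) until they work reliably there.
enum class Vendor { Amd, Intel, Nvidia, Other };
enum class Driver { Mesa, AmdProprietary, Amdvlk, Other };

bool coopmat_allowed(Vendor vendor, Driver driver) {
    if (vendor == Vendor::Intel) {
        return false;                       // disabled for now on Intel
    }
    if (vendor == Vendor::Amd && driver != Driver::Mesa) {
        return false;                       // disabled on non-Mesa AMD
    }
    return true;
}
```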
LGTM. I won't have a chance to test it today, but please go ahead.
* double the number of rows per workgroup
* Update ggml-vulkan.cpp
* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats
* only increase the number of rows for amd and subgroup size 64
* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested
* use subgroup min and max to check for gcn (requires #10721)
* manual merge ggml-vulkan.cpp
* set min and max subgroup size in any case
* Also double the number of rows for Intel GPUs
…oups for coopmats (ggerganov#10721)
* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats
* Fix subgroup size control extension support check (add accf32 and accf16 checks for coopmats)
* Also disable coopmats on amdvlk
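Two of the commit messages above ("use subgroup min and max to check for gcn" and "only increase the number of rows for amd and subgroup size 64") can be sketched together. The heuristic shown is an assumption based on typical driver behavior (RDNA reports min=32/max=64, while GCN is wave64-only, so min=max=64); the function names and the base-rows parameter are illustrative, and the Intel row-doubling mentioned in the last bullet is omitted for brevity:

```cpp
#include <cstdint>

// Assumed heuristic: a device whose subgroup size range collapses to
// exactly 64 looks like wave64-only GCN hardware.
bool looks_like_gcn(uint32_t min_subgroup, uint32_t max_subgroup) {
    return min_subgroup == 64 && max_subgroup == 64;
}

// Only widen the workgroup (double the rows it handles) on AMD devices
// whose subgroup size is pinned at 64, as the commit messages describe.
uint32_t rows_per_workgroup(bool is_amd, uint32_t min_sg, uint32_t max_sg,
                            uint32_t base_rows) {
    if (is_amd && looks_like_gcn(min_sg, max_sg)) {
        return base_rows * 2;
    }
    return base_rows;
}
```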
At least on the latest Mesa ANV (24.3.1) this improves pp speed on Intel, so it seems the XMX engines are working in some way. Performance is still lower than expected.
Maybe this fixes the coopmat issue with AMD on Windows? I'll leave it as a draft while we figure out how all the hardware/driver/OS combinations react to this.