
Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats #10721

Merged
merged 3 commits into master from 0cc4m/vulkan-subgroup-size-control
Dec 12, 2024

Conversation

0cc4m
Collaborator

@0cc4m 0cc4m commented Dec 8, 2024

At least on the latest Mesa ANV (24.3.1) this improves pp speed on Intel, so the XMX engines seem to be doing something. Performance is still lower than expected.

Maybe this fixes the coopmat issue with AMD on Windows? I'll leave it as a draft while we figure out how all the hardware/driver/OS combinations react to it.
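
For reference, this is roughly how the extension is used with the raw Vulkan C API: chain a required-subgroup-size struct into the compute stage and request full subgroups. This is only a hedged sketch, not the code in this PR (ggml-vulkan.cpp goes through vulkan.hpp), and it assumes the `subgroupSizeControl` and `computeFullSubgroups` features were enabled at device creation:

```cpp
#include <vulkan/vulkan.h>

// Sketch: build a compute pipeline that runs with full subgroups of a fixed size
// (e.g. the size the coopmat shaders were written for).
static VkPipeline create_pipeline_full_subgroups(VkDevice device, VkPipelineLayout layout,
                                                 VkShaderModule shader, uint32_t subgroup_size) {
    VkPipelineShaderStageRequiredSubgroupSizeCreateInfoEXT subgroup_info = {};
    subgroup_info.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_REQUIRED_SUBGROUP_SIZE_CREATE_INFO_EXT;
    subgroup_info.requiredSubgroupSize = subgroup_size;  // pin the subgroup size

    VkPipelineShaderStageCreateInfo stage = {};
    stage.sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    stage.pNext  = &subgroup_info;
    // The coopmat shaders assume the workgroup consists of full subgroups only.
    stage.flags  = VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT;
    stage.stage  = VK_SHADER_STAGE_COMPUTE_BIT;
    stage.module = shader;
    stage.pName  = "main";

    VkComputePipelineCreateInfo pipeline_info = {};
    pipeline_info.sType  = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
    pipeline_info.stage  = stage;
    pipeline_info.layout = layout;

    VkPipeline pipeline = VK_NULL_HANDLE;
    vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &pipeline_info, nullptr, &pipeline);
    return pipeline;
}
```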

@0cc4m
Collaborator Author

0cc4m commented Dec 8, 2024

A quick check showed that the latest Mesa fixed the low performance I saw on my Intel A770, regardless of this subgroup change.

Edit: However, newer Mesa also reduces tg performance by 25%.

@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Dec 8, 2024
@easyfab

easyfab commented Dec 8, 2024

Intel's Mesa drivers are really a complete mess!

With A770:

Master + mesa (24.0.9-0ubuntu0.2):

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | pp512 | 142.42 ± 0.15 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 54.51 ± 0.04 |

PR + mesa (24.0.9-0ubuntu0.2):

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | pp512 | 70.36 ± 0.05 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 54.57 ± 0.08 |

PR + mesa kisak/kisak-mesa (24.3.1):

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | pp512 | 373.20 ± 1.03 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 37.26 ± 0.02 |

PR + mesa oibaf/graphics-drivers (git 25.0):

ggml_vulkan: Compiling shaders..........................Done!
terminate called after throwing an instance of 'vk::DeviceLostError'
  what():  vk::Device::waitForFences: ErrorDeviceLost

And for comparison, the results with SYCL (F16 on):

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | SYCL | 99 | pp512 | 926.04 ± 0.27 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | SYCL | 99 | tg128 | 40.28 ± 0.03 |

@0cc4m
Collaborator Author

0cc4m commented Dec 8, 2024

Yeah, that's close to what I saw as well. Thank you for testing it.

@qnixsynapse
Contributor

qnixsynapse commented Dec 9, 2024

Hi @0cc4m, is it possible to set the warp size to 16 for Intel GPUs?

Not sure why it hangs the GPU and crashes here:

ggml_vulkan: 0 = Intel(R) Arc(tm) A750 Graphics (DG2) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | warp size: 32 | matrix cores: KHR_coopmat
[....]
MUL_MAT(type_a=f16,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): terminate called after throwing an instance of 'vk::DeviceLostError'
  what():  vk::Device::waitForFences: ErrorDeviceLost
Aborted (core dumped)

@0cc4m
Collaborator Author

0cc4m commented Dec 9, 2024

Yeah, I think that might be possible, but the crash is some other problem. Even m=1 works; the coopmat simply gets zero-padded in that case.
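
For illustration, a hedged sketch of what picking a smaller subgroup size would involve; the names below are not the actual ggml-vulkan.cpp code. The VK_EXT_subgroup_size_control properties report the supported range, and 16 could only be requested if it falls inside it (Mesa ANV typically advertises 8-32 for compute):

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Sketch: query the subgroup size range reported via VK_EXT_subgroup_size_control
// and return the preferred size (e.g. 16) only if the driver supports it.
static uint32_t pick_subgroup_size(VkPhysicalDevice dev, uint32_t preferred) {
    VkPhysicalDeviceSubgroupSizeControlPropertiesEXT subgroup_props = {};
    subgroup_props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_SIZE_CONTROL_PROPERTIES_EXT;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &subgroup_props;
    vkGetPhysicalDeviceProperties2(dev, &props2);

    if (preferred >= subgroup_props.minSubgroupSize &&
        preferred <= subgroup_props.maxSubgroupSize) {
        return preferred;                    // e.g. 16 on Intel ANV
    }
    return subgroup_props.maxSubgroupSize;   // fall back to the largest supported size
}
```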

@qnixsynapse
Contributor

qnixsynapse commented Dec 9, 2024

Hmm. It's failing on n = 2 and passing on n = 1.
Also, I did not see any activity on the compute engine; the 3D shader engine fills up to 100% before the crash.

Edit: It is also crashing at model warmup!

llm_load_print_meta: EOG token        = 107 '<end_of_turn>'
llm_load_print_meta: max token length = 48
ggml_vulkan: Compiling shaders..........................Done!
llm_load_tensors: offloading 26 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 27/27 layers to GPU
llm_load_tensors:      Vulkan0 model buffer size =  4986.92 MiB
llm_load_tensors:  Vulkan_Host model buffer size =  1125.00 MiB
.................................................................
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 8192
llama_new_context_with_model: n_ctx_per_seq = 8192
llama_new_context_with_model: n_batch       = 2048
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 10000.0
llama_new_context_with_model: freq_scale    = 1
llama_kv_cache_init:    Vulkan0 KV buffer size =   832.00 MiB
llama_new_context_with_model: KV self size  =  832.00 MiB, K (f16):  416.00 MiB, V (f16):  416.00 MiB
llama_new_context_with_model: Vulkan_Host  output buffer size =     0.98 MiB
llama_new_context_with_model:    Vulkan0 compute buffer size =   504.50 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size =    36.51 MiB
llama_new_context_with_model: graph nodes  = 1050
llama_new_context_with_model: graph splits = 2
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
terminate called after throwing an instance of 'vk::DeviceLostError'
  what():  vk::Device::waitForFences: ErrorDeviceLost
Aborted (core dumped)

@0cc4m
Collaborator Author

0cc4m commented Dec 9, 2024

> Hmm. It's failing on n = 2 and passing on n = 1. Also, I did not see any activity on the compute engine; the 3D shader engine fills up to 100% before the crash.
>
> Edit: It is also crashing at model warmup!

n = 2 uses coopmats, n = 1 does not. Whether intel_gpu_top shows the load under the compute bar or the 3D shader bar doesn't matter; don't trust it, it's the same hardware.

Model warmup uses n = 2, so it's the same kind of crash.
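
To make that split concrete, a toy sketch (not the backend's actual dispatch code) of the distinction described above:

```cpp
#include <cstdint>

enum class mul_mat_path { mat_vec, mat_mat };

// n_cols corresponds to src1->ne[1] in ggml terms, the column count of the
// second matmul operand. The real backend has more conditions; this only
// captures the n == 1 vs n >= 2 split discussed here.
static mul_mat_path choose_mul_mat_path(int64_t n_cols) {
    return n_cols == 1 ? mul_mat_path::mat_vec   // matrix-vector shader, no coopmat
                       : mul_mat_path::mat_mat;  // matrix-matrix shader, uses KHR_coopmat
}
```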

@0cc4m 0cc4m force-pushed the 0cc4m/vulkan-subgroup-size-control branch from 595c1a7 to 239927a on December 12, 2024 07:41
@0cc4m 0cc4m force-pushed the 0cc4m/vulkan-subgroup-size-control branch from 239927a to 9131c59 on December 12, 2024 07:46
@0cc4m
Collaborator Author

0cc4m commented Dec 12, 2024

I added a few more fixes for coopmat support and disabled it again for now on Intel and non-Mesa AMD. I've made some progress on supporting them, but it's not yet ready and I don't want to delay #10665 further.
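
For reference, a hedged sketch of what gating coopmat support by driver can look like, using VkPhysicalDeviceDriverProperties; the exact conditions in this PR may differ:

```cpp
#include <vulkan/vulkan.h>

// Sketch: decide whether to enable the KHR_coopmat path based on which driver
// is running. Mirrors the idea of "disable on Intel and non-Mesa AMD"; the
// actual backend logic may use different or additional checks.
static bool coopmat_allowed_for_device(VkPhysicalDevice dev) {
    VkPhysicalDeviceDriverProperties driver_props = {};
    driver_props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DRIVER_PROPERTIES;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &driver_props;
    vkGetPhysicalDeviceProperties2(dev, &props2);

    switch (driver_props.driverID) {
        case VK_DRIVER_ID_MESA_RADV:                 // AMD via Mesa RADV stays enabled
            return true;
        case VK_DRIVER_ID_AMD_PROPRIETARY:           // Windows proprietary driver
        case VK_DRIVER_ID_AMD_OPEN_SOURCE:           // amdvlk
        case VK_DRIVER_ID_INTEL_OPEN_SOURCE_MESA:    // ANV
        case VK_DRIVER_ID_INTEL_PROPRIETARY_WINDOWS:
            return false;                            // disabled for now
        default:
            return true;                             // other vendors keep coopmat
    }
}
```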

@0cc4m 0cc4m marked this pull request as ready for review December 12, 2024 07:57
@0cc4m 0cc4m requested a review from jeffbolznv December 12, 2024 07:57
Collaborator

@jeffbolznv jeffbolznv left a comment

LGTM. I won't have a chance to test it today, but please go ahead.

@0cc4m 0cc4m merged commit dc5301d into master Dec 12, 2024
47 checks passed
@0cc4m 0cc4m deleted the 0cc4m/vulkan-subgroup-size-control branch December 12, 2024 17:35
0cc4m pushed a commit that referenced this pull request Dec 13, 2024
* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for amd and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for gcn (requires #10721)

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs
ggerganov pushed a commit to ggerganov/ggml that referenced this pull request Dec 17, 2024
* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for amd and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for gcn (requires ggerganov/llama.cpp#10721)

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs
ggerganov pushed a commit to ggerganov/whisper.cpp that referenced this pull request Dec 17, 2024
* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for amd and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for gcn (requires ggerganov/llama.cpp#10721)

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs
ggerganov pushed a commit to ggerganov/whisper.cpp that referenced this pull request Dec 18, 2024
* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for amd and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for gcn (requires ggerganov/llama.cpp#10721)

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (ggerganov#10721)

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* Fix subgroup size control extension support check

Add accf32 and accf16 checks for coopmats

* Also disable coopmats on amdvlk
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for amd and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for gcn (requires ggerganov#10721)

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs
github-actions bot pushed a commit to martin-steinegger/ProstT5-llama that referenced this pull request Dec 30, 2024
* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for amd and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for gcn (requires ggerganov/llama.cpp#10721)

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs