vulkan: optimize coopmat2 q2_k dequant function #11130

jeffbolznv · 2025-01-07T20:58:00Z

Same kind of optimization as in #10855, just for Q2_K. I happened to be trying out a Q2_K model for stable-diffusion/flux and this makes it about 5% faster.

stable-diffusion:
sd --diffusion-model  models\flux\flux1-dev-Q2_K.gguf --vae models\flux\ae.safetensors --clip_l models\flux\clip_l.safetensors --t5xxl models\flux\t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --diffusion-fa
coopmat2 before:
  |==================================================| 20/20 - 1.45it/s
coopmat2 after:
  |==================================================| 20/20 - 1.53it/s
coopmat2 after without --diffusion-fa:
  |==================================================| 20/20 - 1.20it/s
coopmat1 without --diffusion-fa:
  |==================================================| 20/20 - 1.08it/s
no coopmat without --diffusion-fa:
  |==================================================| 20/20 - 1.83s/it ( == 0.55it/s)
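As a quick sanity check on the "about 5%" figure, the iteration rates above can be compared directly. This is a minimal sketch that only uses the numbers from the log lines; the variable names are illustrative and not part of stable-diffusion.cpp.

```python
# Iteration rates (it/s) copied from the stable-diffusion log lines above.
coopmat2_before = 1.45
coopmat2_after = 1.53

# The "no coopmat" run is reported in seconds per iteration, so invert it.
no_coopmat_s_per_it = 1.83
no_coopmat_rate = 1.0 / no_coopmat_s_per_it

# Relative end-to-end gain from this change with --diffusion-fa enabled.
gain = coopmat2_after / coopmat2_before - 1.0

print(f"coopmat2 gain: {gain * 100:.1f}%")
print(f"no coopmat: {no_coopmat_rate:.2f} it/s")
```

The ratio comes out slightly above 5%, which matches the rough figure quoted in the description.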

test-backend-ops:
before
  MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 2880 runs -  1736.50 us/run -  60.13 GFLOP/run -  34.63 TFLOPS
after
  MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 3094 runs -  1616.07 us/run -  60.13 GFLOP/run -  37.21 TFLOPS
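The GFLOP/run and TFLOPS columns can be reproduced from the matrix shape and the measured time per run. The sketch below assumes the usual count of 2·m·n·k floating-point operations for a matrix multiply (one multiply and one add per term); that assumption is not stated in the output above, but it matches the reported 60.13 GFLOP/run. The helper names are hypothetical.

```python
# Shape and timings copied from the MUL_MAT lines above.
m, n, k = 4096, 512, 14336
us_before = 1736.50
us_after = 1616.07


def gflop_per_run(m: int, n: int, k: int) -> float:
    # Assumption: one multiply and one add per inner-product term.
    return 2 * m * n * k / 1e9


def tflops(gflop: float, us_per_run: float) -> float:
    # GFLOP -> FLOP, microseconds -> seconds, then FLOP/s -> TFLOPS.
    return (gflop * 1e9) / (us_per_run * 1e-6) / 1e12


gflop = gflop_per_run(m, n, k)
tflops_before = tflops(gflop, us_before)
tflops_after = tflops(gflop, us_after)
speedup = us_before / us_after

print(f"{gflop:.2f} GFLOP/run")
print(f"before: {tflops_before:.2f} TFLOPS, after: {tflops_after:.2f} TFLOPS")
print(f"speedup: {(speedup - 1) * 100:.1f}%")
```

On this isolated MUL_MAT case the gain is somewhat larger than the end-to-end figure from the stable-diffusion run, which is expected since the matrix multiply is only part of each sampling step.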

0cc4m

Works fine. I see a small improvement on RTX 3090, not as big as it was for you.

vulkan: optimize coopmat2 q2_k dequant function

30645aa

jeffbolznv requested a review from 0cc4m January 7, 2025 20:58

github-actions bot added the Vulkan and ggml labels Jan 7, 2025

0cc4m approved these changes Jan 16, 2025

0cc4m merged commit 206bc53 into ggerganov:master Jan 16, 2025
2 checks passed
