vulkan: optimize coopmat2 q2_k dequant function #11130

Merged: 1 commit, Jan 16, 2025

jeffbolznv
Collaborator

Same kind of optimization as in #10855, just for Q2_K. I happened to be trying out a Q2_K model for stable-diffusion/flux and this makes it about 5% faster.

stable-diffusion:
sd --diffusion-model  models\flux\flux1-dev-Q2_K.gguf --vae models\flux\ae.safetensors --clip_l models\flux\clip_l.safetensors --t5xxl models\flux\t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --diffusion-fa
coopmat2 before:
  |==================================================| 20/20 - 1.45it/s
coopmat2 after:
  |==================================================| 20/20 - 1.53it/s
coopmat2 after without --diffusion-fa:
  |==================================================| 20/20 - 1.20it/s
coopmat1 without --diffusion-fa:
  |==================================================| 20/20 - 1.08it/s
no coopmat without --diffusion-fa:
  |==================================================| 20/20 - 1.83s/it ( == 0.55it/s)

test-backend-ops:
before
  MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 2880 runs -  1736.50 us/run -  60.13 GFLOP/run -  34.63 TFLOPS
after
  MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 3094 runs -  1616.07 us/run -  60.13 GFLOP/run -  37.21 TFLOPS
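As a sanity check on the numbers above, the TFLOPS column can be recomputed from the reported GFLOP/run and us/run, and the relative gains derived from there. This is only a small sketch: the helper names `tflops` and `pct_gain` are made up for illustration and are not part of test-backend-ops or sd. With the figures quoted here it comes out at roughly 7% for the MUL_MAT case and about 5.5% for the flux it/s run.

```python
# Figures copied from the benchmark output quoted in this PR description.
GFLOP_PER_RUN = 60.13      # q2_K MUL_MAT, m=4096, n=512, k=14336
US_BEFORE = 1736.50        # us/run before the change
US_AFTER = 1616.07         # us/run after the change
ITS_BEFORE = 1.45          # flux it/s, coopmat2 before
ITS_AFTER = 1.53           # flux it/s, coopmat2 after


def tflops(gflop_per_run, us_per_run):
    """Convert GFLOP per run and microseconds per run into TFLOPS."""
    # 1 GFLOP per 1 us = 1e9 / 1e-6 FLOP/s = 1e15 FLOP/s = 1e3 TFLOPS
    return gflop_per_run / us_per_run * 1e3


def pct_gain(before, after):
    """Percentage increase of a higher-is-better rate."""
    return (after / before - 1.0) * 100.0


if __name__ == "__main__":
    t_before = tflops(GFLOP_PER_RUN, US_BEFORE)
    t_after = tflops(GFLOP_PER_RUN, US_AFTER)
    print(f"MUL_MAT: {t_before:.2f} -> {t_after:.2f} TFLOPS "
          f"(+{pct_gain(t_before, t_after):.1f}%)")
    print(f"flux:    {ITS_BEFORE:.2f} -> {ITS_AFTER:.2f} it/s "
          f"(+{pct_gain(ITS_BEFORE, ITS_AFTER):.1f}%)")
```

The end-to-end gain is smaller than the isolated MUL_MAT gain, which is what one would expect if only part of the diffusion step is spent in Q2_K matrix multiplies.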

jeffbolznv requested a review from 0cc4m on January 7, 2025
github-actions bot added the labels Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Jan 7, 2025
Collaborator

0cc4m left a comment

Works fine. I see a small improvement on RTX 3090, not as big as it was for you.

0cc4m merged commit 206bc53 into ggerganov:master on Jan 16, 2025
2 checks passed