
vulkan: add environment variable to avoid VRAM allocation #11592

Merged: 1 commit into ggml-org:master on Feb 10, 2025

Conversation

@wbruna (Contributor) commented on Feb 2, 2025

With Vulkan on my PC (Ryzen 5 3400G APU, DDR4-3000, Debian 12), I noticed big performance drops (~2x or ~3x) associated with buffer allocations in VRAM.

It's easier to test with stable-diffusion.cpp: the VAE step on a 512x512 sd1.5 generation usually takes around 40 seconds with the default 2G dedicated VRAM. But if I restrict VRAM to a very small value (64M-80M), that timing drops to around 13 seconds.

I noticed a similar performance drop on LLMs, but it's harder to pinpoint: for instance, prompt processing on smaller models running nearly twice as slow as on larger ones, performance changing right after a koboldcpp restart, or inconsistent results between benchmarks and generation.

Checking with GGML_VULKAN_MEMORY_DEBUG, the slower behavior always seems to be associated with allocations in device memory, so I added this environment variable to confirm. Forcing host memory allocations does seem to fix the performance drop.
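
A minimal sketch of what such an opt-in gate could look like in the backend; the variable name (GGML_VK_PREFER_HOST_MEMORY) and both helpers are assumptions for illustration, not the exact diff:

```cpp
// Sketch only: assumed variable name and helpers, not this PR's actual diff.
#include <cstdlib>

#include <vulkan/vulkan.h>

// Read the environment once; any non-empty value enables the override.
static bool ggml_vk_prefer_host_memory() {
    static const bool prefer_host = [] {
        const char * env = std::getenv("GGML_VK_PREFER_HOST_MEMORY");
        return env != nullptr && env[0] != '\0';
    }();
    return prefer_host;
}

// When the override is active, request host-visible memory for buffers
// instead of device-local VRAM.
static VkMemoryPropertyFlags ggml_vk_buffer_memory_flags() {
    return ggml_vk_prefer_host_memory()
        ? (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)
        : VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
}
```

With a gate like that, the two behaviors can be compared by running the same workload with the variable unset and set, e.g. alongside GGML_VULKAN_MEMORY_DEBUG=1 to confirm where the buffers land.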

OTOH, I don't see the original performance issue on a 4500U laptop (Ubuntu 24.04, DDR4-3200), so this would benefit from testing on different iGPU+OS combinations.

@github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Feb 2, 2025
@0cc4m self-requested a review on Feb 3, 2025
@0cc4m (Collaborator) left a comment:


Looks good to me. Thank you for the contribution!

@0cc4m merged commit b044a0f into ggml-org:master on Feb 10, 2025
46 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
ubergarm pushed a commit to ubergarm/llama.cpp that referenced this pull request Mar 1, 2025