
vulkan: add environment variable to avoid VRAM allocation #11592

Merged: 1 commit into ggml-org:master on Feb 10, 2025

Conversation

@wbruna (Contributor) commented on Feb 2, 2025

With Vulkan on my PC (Ryzen 5 3400G APU, DDR4-3000, Debian 12), I noticed big performance drops (~2x or ~3x) associated with buffer allocations in VRAM.

It's easier to test with stable-diffusion.cpp: the VAE step on a 512x512 sd1.5 generation usually takes around 40 seconds with the default 2G dedicated VRAM. But if I restrict VRAM to a very small value (64M-80M), that timing drops to around 13 seconds.

I noticed a similar performance drop on LLMs, but it's harder to pinpoint: for instance, prompt processing on smaller models running nearly twice as slow as on larger ones, performance changing right after a koboldcpp restart, or inconsistent results between benchmarks and generation.

Checking with GGML_VULKAN_MEMORY_DEBUG, the slower behavior always seems to be associated with allocations in device memory, so I added this environment variable to confirm. Forcing host memory allocations does seem to fix the performance drop.
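
A minimal sketch of what such an opt-in gate could look like in the backend; the variable name (GGML_VK_PREFER_HOST_MEMORY) and both helpers are assumptions for illustration, not the exact diff:

```cpp
// Sketch only: assumed variable name and helpers, not this PR's actual diff.
#include <cstdlib>

#include <vulkan/vulkan.h>

// Read the environment once; any non-empty value enables the override.
static bool ggml_vk_prefer_host_memory() {
    static const bool prefer_host = [] {
        const char * env = std::getenv("GGML_VK_PREFER_HOST_MEMORY");
        return env != nullptr && env[0] != '\0';
    }();
    return prefer_host;
}

// When the override is active, request host-visible memory for buffers
// instead of device-local VRAM.
static VkMemoryPropertyFlags ggml_vk_buffer_memory_flags() {
    return ggml_vk_prefer_host_memory()
        ? (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)
        : VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
}
```

With a gate like that, the two behaviors can be compared by running the same workload with the variable unset and set, e.g. alongside GGML_VULKAN_MEMORY_DEBUG=1 to confirm where the buffers land.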

OTOH, I don't see the original performance issue on a 4500U laptop (Ubuntu 24.04, DDR4-3200), so this would benefit from testing on different iGPU+OS combinations.

@github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Feb 2, 2025
@0cc4m self-requested a review on Feb 3, 2025
@0cc4m (Collaborator) left a comment:


Looks good to me. Thank you for the contribution!

@0cc4m merged commit b044a0f into ggml-org:master on Feb 10, 2025
46 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
ubergarm pushed a commit to ubergarm/llama.cpp that referenced this pull request Mar 1, 2025