Fails with ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory #5319

alex4o · 2024-02-04T08:37:09Z

I am using a Google Pixel 6 Pro with Vulkan, built with make and clang (clang version 17.0.6, target: aarch64-unknown-linux-android24).
I am on commit 277fad3 (b2059).

Here is a link to the output of vulkaninfo: https://gist.github.com/alex4o/20f949910574295c22f951f64e1d421d
Here is a link to the output of main: https://gist.github.com/alex4o/7809ed6597cb88c4f44fcbab03475d9e

I have not looked too deeply into this, but it can be seen that llama.cpp tries to allocate a bigger chunk of memory than it needs for some reason.

alex4o · 2024-02-04T09:42:19Z

After doing `:%s/ | vk::MemoryPropertyFlagBits::eHostCached//` on the ggml-vulkan.cpp file, it seems to compile and run. Running the tests seems to provide reasonable results.

Here is a gist of the results: https://gist.github.com/alex4o/12a29218a9860a8f1dad7c087adfa6b4
My phone fell asleep while running the tests, so they did not complete.

alex4o · 2024-02-04T10:20:56Z

OK, got it to work like that, and the results it outputs look fine: https://gist.github.com/alex4o/2879f3997e72fc5cc44d78ee54333113

0cc4m · 2024-02-04T13:00:26Z

> OK, got it to work like that, and the results it outputs look fine: https://gist.github.com/alex4o/2879f3997e72fc5cc44d78ee54333113

That looks like it was CPU-only. It would mention the Vulkan device otherwise.

> After doing `:%s/ | vk::MemoryPropertyFlagBits::eHostCached//` on the ggml-vulkan.cpp file, it seems to compile and run. Running the tests seems to provide reasonable results.
>
> Here is a gist of the results: https://gist.github.com/alex4o/12a29218a9860a8f1dad7c087adfa6b4
> My phone fell asleep while running the tests, so they did not complete.

The error values in the matrix multiplications are pretty high, so it doesn't really work with your GPU yet. Warp size 16 isn't something I have dealt with yet; that's the most likely cause. Nvidia, AMD and Intel use 32 and 64.

pure-water · 2024-02-05T00:29:31Z

What about the other way around: removing the HostCoherent flag instead (leaving HostCached)?

pure-water · 2024-02-05T00:36:37Z

> > OK, got it to work like that, and the results it outputs look fine: https://gist.github.com/alex4o/2879f3997e72fc5cc44d78ee54333113
>
> That looks like it was CPU-only. It would mention the Vulkan device otherwise.
>
> > After doing `:%s/ | vk::MemoryPropertyFlagBits::eHostCached//` on the ggml-vulkan.cpp file, it seems to compile and run. Running the tests seems to provide reasonable results.
> > Here is a gist of the results: https://gist.github.com/alex4o/12a29218a9860a8f1dad7c087adfa6b4
> > My phone fell asleep while running the tests, so they did not complete.
>
> The error values in the matrix multiplications are pretty high, so it doesn't really work with your GPU yet. Warp size 16 isn't something I have dealt with yet; that's the most likely cause. Nvidia, AMD and Intel use 32 and 64.

Is there anything that explicitly shows it is actually running on the Mali-G78?

pure-water · 2024-02-05T04:25:13Z

> After doing `:%s/ | vk::MemoryPropertyFlagBits::eHostCached//` on the ggml-vulkan.cpp file, it seems to compile and run. Running the tests seems to provide reasonable results.
>
> Here is a gist of the results: https://gist.github.com/alex4o/12a29218a9860a8f1dad7c087adfa6b4
> My phone fell asleep while running the tests, so they did not complete.

What is the build command to get this far? Is it at least saying "offload tensors to GPU"?

alex4o · 2024-02-05T09:19:34Z

> What about the other way around: removing the HostCoherent flag instead (leaving HostCached)?

Do you think that will have better performance?

alex4o · 2024-02-05T09:20:24Z

OK, so I got the direct output of main. It now shows which GPU is used.

https://gist.github.com/alex4o/702fa06fdcc716234002459dbf6c3270

pure-water · 2024-02-05T14:42:11Z

> > What about the other way around: removing the HostCoherent flag instead (leaving HostCached)?
>
> Do you think that will have better performance?

No, I had the same issue and just used the approach I mentioned above. I have no comparison numbers.

pure-water · 2024-02-05T14:43:41Z

> OK, so I got the direct output of main. It now shows which GPU is used.
>
> https://gist.github.com/alex4o/702fa06fdcc716234002459dbf6c3270

Are you just using the head of the git repo, or is there quite a lot of your own code apart from the memory type change?

alex4o · 2024-02-05T14:53:31Z

Yes, I use the head of the git repo with nothing else changed apart from the removal of the memory type requirement.

pure-water · 2024-02-06T00:32:16Z

Many thanks. Could you also show the tokens/s? It would be very interesting to know the G78 numbers.

luciferous · 2024-02-06T06:21:04Z

@pure-water @alex4o Adding a data point for the Pixel 7 (Mali-G710): keeping eHostCached (dropping eHostCoherent) makes the output return random characters; keeping eHostCoherent works for me. See: https://gist.github.com/luciferous/66fbefd24ed321cd79bfd8940e043860

0cc4m · 2024-02-06T17:04:39Z

I suppose I should check at device creation whether memory that is HostVisible, HostCoherent and HostCached is available and, if not, fall back to HostVisible and HostCoherent. HostVisible and HostCached would require me to manually manage the synchronization between CPU and GPU, which is currently not implemented. That's why you get bad results that way.
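
For context on the synchronization mentioned above: when a memory type is HostVisible but not HostCoherent, Vulkan requires the application to call `vkFlushMappedMemoryRanges` after CPU writes and `vkInvalidateMappedMemoryRanges` before CPU reads. Each `VkMappedMemoryRange` offset must be a multiple of `VkPhysicalDeviceLimits::nonCoherentAtomSize`, and its size must also be a multiple unless the range ends exactly at the end of the allocation. The C++ sketch below only computes such an aligned range and does not call Vulkan; `FlushRange` and `align_flush_range` are hypothetical names, not code from ggml-vulkan.cpp, and it assumes the whole allocation is mapped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from ggml-vulkan.cpp): widens a dirty byte range so it
// can be used as a VkMappedMemoryRange on non-coherent memory.
struct FlushRange {
    uint64_t offset;
    uint64_t size;
};

// offset/size: bytes the CPU wrote (or wants to read) inside the allocation.
// atom:        VkPhysicalDeviceLimits::nonCoherentAtomSize of the device.
// alloc_size:  size of the VkDeviceMemory allocation (assumed fully mapped).
FlushRange align_flush_range(uint64_t offset, uint64_t size, uint64_t atom, uint64_t alloc_size) {
    assert(atom > 0 && offset + size <= alloc_size);
    // Round the start down to a multiple of the atom size.
    uint64_t begin = (offset / atom) * atom;
    // Round the end up to a multiple of the atom size...
    uint64_t end = ((offset + size + atom - 1) / atom) * atom;
    // ...but never past the allocation: a range ending exactly at the end of the
    // allocation is allowed even if its size is not a multiple of the atom size.
    end = std::min(end, alloc_size);
    return {begin, end - begin};
}
```

The resulting offset and size would be filled into a `VkMappedMemoryRange` and passed to `vkFlushMappedMemoryRanges` after writing (or `vkInvalidateMappedMemoryRanges` before reading). With HostCoherent memory these calls are not needed, which is why falling back to HostVisible + HostCoherent sidesteps the problem.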

luciferous · 2024-02-06T19:31:30Z

I could give it a shot writing a check/fallback for available memory types.

0cc4m · 2024-02-06T19:44:33Z

> I could give it a shot writing a check/fallback for available memory types.

Sure, go ahead. Just be aware of #5321 which touches a lot of the code and will get merged soon. Maybe wait until it's done or start building on top of it.

Some memory properties are nice to have, but not critical. `eHostCached`, for instance, isn't essential, and yet we fail on devices where this memory property isn't available:

`ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory`

This change differentiates between those properties that are critical and those that are just nice-to-have, and will fail only when critical properties aren't available. Fixes ggerganov#5319.
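
The split between critical and nice-to-have properties can be sketched as a required mask plus a preferred mask. The C++ below is a hypothetical illustration, not the code merged for this issue: the flag values mirror `VkMemoryPropertyFlagBits` from the Vulkan spec, `MemoryType` stands in for `VkMemoryType`, and `type_bits` plays the role of `VkMemoryRequirements::memoryTypeBits`. A type is eligible only if it has every required flag; among eligible types, the one with the most preferred flags wins, and ties go to the lowest index.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Flag values mirror VkMemoryPropertyFlagBits in the Vulkan specification.
enum : uint32_t {
    MEM_DEVICE_LOCAL  = 0x1,
    MEM_HOST_VISIBLE  = 0x2,
    MEM_HOST_COHERENT = 0x4,
    MEM_HOST_CACHED   = 0x8,
};

// Hypothetical stand-in for VkMemoryType; only the property flags matter here.
struct MemoryType {
    uint32_t property_flags;
};

// Hypothetical helper: returns the index of the best memory type, or
// std::nullopt when no allowed type has all of the `required` flags.
std::optional<uint32_t> find_memory_type(const std::vector<MemoryType>& types,
                                         uint32_t type_bits,
                                         uint32_t required,
                                         uint32_t preferred) {
    // memoryTypeBits is a 32-bit mask, so at most 32 types can be addressed.
    assert(types.size() <= 32);
    std::optional<uint32_t> best;
    std::size_t best_score = 0;
    for (uint32_t i = 0; i < types.size(); ++i) {
        if ((type_bits & (1u << i)) == 0) continue;    // not allowed for this resource
        const uint32_t flags = types[i].property_flags;
        if ((flags & required) != required) continue;   // missing a critical property
        // Count how many nice-to-have properties this type offers.
        const std::size_t score = std::bitset<32>(flags & preferred).count();
        if (!best || score > best_score) {              // strict '>' keeps the lowest index on ties
            best = i;
            best_score = score;
        }
    }
    return best;
}
```

For the case in this thread, a host buffer would request `required = MEM_HOST_VISIBLE | MEM_HOST_COHERENT` and `preferred = MEM_HOST_CACHED`. On a GPU that exposes a cached, coherent, host-visible type, that type is picked; on one that does not (as the error above suggests for these Mali GPUs), the uncached coherent type is used instead of failing.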

pure-water · 2024-02-07T09:10:28Z

Could you please provide more data points on the performance difference between GPU and CPU runs on your devices?

luciferous · 2024-02-07T10:41:33Z

@pure-water Good idea to collect more performance data. What do you think if we open another ticket for that? Let this thread focus on the memory type failure.

pure-water · 2024-02-07T11:17:11Z

Yes, please go ahead

luciferous · 2024-02-07T23:52:32Z

@pure-water I'll let you open the ticket since you're the one requesting and can provide more context.

pure-water · 2024-02-08T11:04:08Z

Hi, please look here: #5410

alex4o added the bug-unconfirmed label Feb 4, 2024

0cc4m added the Vulkan label (Issues specific to the Vulkan backend) Feb 4, 2024

gurgalof mentioned this issue Feb 4, 2024 in #5237, "Issues with running Llama.cpp on Raspberry Pi 5 with Vulkan." (Closed)

luciferous mentioned this issue Feb 7, 2024 in #5381, "vulkan: Find optimal memory type but with fallback" (Merged)

0cc4m closed this as completed in #5381 Feb 15, 2024
