Fails with ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory #5319

Closed
alex4o opened this issue Feb 4, 2024 · 21 comments · Fixed by #5381
Labels
bug-unconfirmed Vulkan Issues specific to the Vulkan backend

Comments

@alex4o

alex4o commented Feb 4, 2024

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.

  1. I am using a Google Pixel 6 Pro with Vulkan, built with make and clang (clang version 17.0.6, Target: aarch64-unknown-linux-android24).
  2. I am on 277fad3 (b2059).

Here is a link to the output of vulkaninfo: https://gist.github.com/alex4o/20f949910574295c22f951f64e1d421d
Here is a link to the output of main: https://gist.github.com/alex4o/7809ed6597cb88c4f44fcbab03475d9e

I have not looked too deeply into this, but it appears that llama.cpp tries to allocate a bigger chunk of memory than it needs for some reason.

@alex4o
Author

alex4o commented Feb 4, 2024

After running `:%s/ | vk::MemoryPropertyFlagBits::eHostCached//` on the ggml-vulkan.cpp file, it seems to compile and run. Running the tests seems to give reasonable results.

Here is a gist of the results: https://gist.github.com/alex4o/12a29218a9860a8f1dad7c087adfa6b4
My phone fell asleep while running the tests, so they did not complete.

@alex4o
Author

alex4o commented Feb 4, 2024

OK, I got it to work like that, and the results it outputs seem to look fine: https://gist.github.com/alex4o/2879f3997e72fc5cc44d78ee54333113

@0cc4m
Collaborator

0cc4m commented Feb 4, 2024

> OK, I got it to work like that, and the results it outputs seem to look fine: https://gist.github.com/alex4o/2879f3997e72fc5cc44d78ee54333113

That looks like it was CPU-only. It would mention the Vulkan device otherwise.

> After running `:%s/ | vk::MemoryPropertyFlagBits::eHostCached//` on the ggml-vulkan.cpp file, it seems to compile and run. Running the tests seems to give reasonable results.
>
> Here is a gist of the results: https://gist.github.com/alex4o/12a29218a9860a8f1dad7c087adfa6b4 My phone fell asleep while running the tests, so they did not complete.

Pretty high error values in the matrix multiplications, so it doesn't really work with your GPU yet. Warp size 16 isn't really something I dealt with yet, that's the most likely cause. Nvidia, AMD and Intel use 32 and 64.
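
For readers unfamiliar with such error values: a figure of this kind is commonly obtained by running the same multiplication on a trusted CPU path and averaging the element-wise difference against the GPU result. The following is a minimal sketch of that idea only. It is not the actual llama.cpp test code, and `matmul_reference` and `avg_abs_error` are hypothetical names introduced for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Naive row-major reference: C (m x n) = A (m x k) * B (k x n).
// Hypothetical helper, not part of llama.cpp.
std::vector<float> matmul_reference(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}

// Mean absolute difference between the reference and the result under test.
// Returns -1.0 when the sizes differ or the inputs are empty.
double avg_abs_error(const std::vector<float>& reference,
                     const std::vector<float>& result) {
    if (reference.size() != result.size() || reference.empty()) return -1.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i)
        sum += std::fabs(static_cast<double>(reference[i]) -
                         static_cast<double>(result[i]));
    return sum / static_cast<double>(reference.size());
}
```

A value near zero suggests the GPU path agrees with the reference; what counts as "high" depends on the data types and matrix sizes involved.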

@0cc4m 0cc4m added the Vulkan Issues specific to the Vulkan backend label Feb 4, 2024
@pure-water

pure-water commented Feb 5, 2024

What about the other way around: removing the HostCoherent flag instead (and leaving HostCached)?

@pure-water

> > OK, I got it to work like that, and the results it outputs seem to look fine: https://gist.github.com/alex4o/2879f3997e72fc5cc44d78ee54333113
>
> That looks like it was CPU-only. It would mention the Vulkan device otherwise.
>
> > After running `:%s/ | vk::MemoryPropertyFlagBits::eHostCached//` on the ggml-vulkan.cpp file, it seems to compile and run. Running the tests seems to give reasonable results.
> > Here is a gist of the results: https://gist.github.com/alex4o/12a29218a9860a8f1dad7c087adfa6b4 My phone fell asleep while running the tests, so they did not complete.
>
> Pretty high error values in the matrix multiplications, so it doesn't really work with your GPU yet. Warp size 16 isn't really something I dealt with yet, that's the most likely cause. Nvidia, AMD and Intel use 32 and 64.

Is there anything that explicitly indicates whether it is actually running on the Mali G78?

@pure-water

> After running `:%s/ | vk::MemoryPropertyFlagBits::eHostCached//` on the ggml-vulkan.cpp file, it seems to compile and run. Running the tests seems to give reasonable results.
>
> Here is a gist of the results: https://gist.github.com/alex4o/12a29218a9860a8f1dad7c087adfa6b4 My phone fell asleep while running the tests, so they did not complete.

What build command did you use to get this far? Does it at least say "offload tensors to GPU"?

@alex4o
Author

alex4o commented Feb 5, 2024

> What about the other way around: removing the HostCoherent flag instead (and leaving HostCached)?

Do you think that will have better performance?

@alex4o
Author

alex4o commented Feb 5, 2024

OK, so I got the direct output of main. It now shows which GPU is used.

https://gist.github.com/alex4o/702fa06fdcc716234002459dbf6c3270

@pure-water

> > What about the other way around: removing the HostCoherent flag instead (and leaving HostCached)?
>
> Do you think that will have better performance?

No, I have the same issue and simply used the approach I mentioned above. I have no comparison numbers.

@pure-water

> OK, so I got the direct output of main. It now shows which GPU is used.
>
> https://gist.github.com/alex4o/702fa06fdcc716234002459dbf6c3270

Did you just use the head of the git repo, or is there a lot of your own code in there apart from the memory type change?

@alex4o
Author

alex4o commented Feb 5, 2024

Yes, I use the head of the git repo with no changes other than the removal of the memory type requirement.

@pure-water

Many thanks. Would it be possible to show the tokens/s as well? It would be very interesting to know the G78 number.

@luciferous
Contributor

@pure-water @alex4o Adding a data point for Pixel 7 (Mali-G710): eHostCached makes the output return random characters; eHostCoherent works for me. See: https://gist.github.com/luciferous/66fbefd24ed321cd79bfd8940e043860

@0cc4m
Collaborator

0cc4m commented Feb 6, 2024

I suppose I should do a check on device creation whether HostVisible, HostCoherent and HostCached memory is available and if not fall back to HostVisible and HostCoherent. HostVisible and HostCached would require me to manually manage the synchronization between CPU and GPU, which currently is not implemented. That's why you get bad results that way.
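
To illustrate why HostCached memory without HostCoherent gives bad results unless synchronization is managed by hand, here is a toy model in plain C++. It does not call Vulkan: `flush()` and `invalidate()` merely stand in for the role that `vkFlushMappedMemoryRanges` and `vkInvalidateMappedMemoryRanges` play in the real API, and the struct, its members, and the helper functions are hypothetical names chosen for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a mapped allocation that is host-visible but NOT host-coherent.
// The CPU and the GPU each see their own copy until one is explicitly synced.
struct NonCoherentBuffer {
    std::vector<float> host_view;    // what the CPU reads and writes
    std::vector<float> device_view;  // what the GPU reads and writes

    explicit NonCoherentBuffer(std::size_t n)
        : host_view(n, 0.0f), device_view(n, 0.0f) {}

    // Make CPU writes visible to the GPU (role of vkFlushMappedMemoryRanges).
    void flush() { device_view = host_view; }

    // Make GPU writes visible to the CPU (role of vkInvalidateMappedMemoryRanges).
    void invalidate() { host_view = device_view; }
};

// Stand-in for a compute shader: doubles every element the GPU can see.
void fake_gpu_double(NonCoherentBuffer& buf) {
    for (float& x : buf.device_view) x *= 2.0f;
}

// Upload, run, read back. With sync == false the GPU works on data the CPU
// never published, and the CPU reads back its own unchanged copy.
std::vector<float> run_double(const std::vector<float>& input, bool sync) {
    NonCoherentBuffer buf(input.size());
    buf.host_view = input;       // CPU writes the input
    if (sync) buf.flush();
    fake_gpu_double(buf);
    if (sync) buf.invalidate();
    return buf.host_view;        // CPU reads the result
}
```

In this model the unsynchronized result is merely stale; on real hardware the contents can be arbitrary. Host-coherent memory removes the need for the two explicit calls, which is why requesting it as the fallback avoids the problem.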

@luciferous
Contributor

I could give it a shot writing a check/fallback for available memory types.
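
A check/fallback of this kind could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not code from llama.cpp: the memory types are modeled as a plain list of flag words, the constants are stand-ins that carry the same bit values as Vulkan's `VkMemoryPropertyFlagBits`, and `has_memory_type` and `choose_host_flags` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Stand-ins with the same bit values as VkMemoryPropertyFlagBits.
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;
constexpr std::uint32_t HOST_VISIBLE  = 0x2;
constexpr std::uint32_t HOST_COHERENT = 0x4;
constexpr std::uint32_t HOST_CACHED   = 0x8;

// True if any memory type offers every bit in `wanted`.
bool has_memory_type(const std::vector<std::uint32_t>& type_flags,
                     std::uint32_t wanted) {
    for (std::uint32_t flags : type_flags)
        if ((flags & wanted) == wanted) return true;
    return false;
}

// Decide once, at device creation, which host-memory flags to request.
// Tries the full combination first, then falls back to visible + coherent.
// Returns 0 if neither combination is available.
std::uint32_t choose_host_flags(const std::vector<std::uint32_t>& type_flags) {
    const std::uint32_t candidates[] = {
        HOST_VISIBLE | HOST_COHERENT | HOST_CACHED,
        HOST_VISIBLE | HOST_COHERENT,
    };
    for (std::uint32_t candidate : candidates)
        if (has_memory_type(type_flags, candidate)) return candidate;
    return 0;
}
```

HostVisible plus HostCached alone is deliberately left out of the candidate list, since that combination would need manual CPU/GPU synchronization.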

@0cc4m
Collaborator

0cc4m commented Feb 6, 2024

> I could give it a shot writing a check/fallback for available memory types.

Sure, go ahead. Just be aware of #5321 which touches a lot of the code and will get merged soon. Maybe wait until it's done or start building on top of it.

luciferous added a commit to luciferous/llama.cpp that referenced this issue Feb 7, 2024
Some memory properties are nice to have, but not critical.
`eHostCached`, for instance, isn't essential, and yet we fail on devices
where this memory property isn't available.

    ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory

This change differentiates between those properties that are critical
and those that are just nice-to-have, and will fail only when critical
properties aren't available.

Fixes ggerganov#5319.
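
The split between critical and nice-to-have properties described in this commit message can be illustrated with a small two-pass search. This is a sketch of the general technique, not the code from the referenced commit: `find_type` and `find_memory_type` are hypothetical names, the flag constants mirror the bit values of `VkMemoryPropertyFlagBits`, and `type_bits` plays the role of `VkMemoryRequirements::memoryTypeBits`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same bit values as the corresponding VkMemoryPropertyFlagBits entries.
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;
constexpr std::uint32_t kHostCached   = 0x8;

// Index of the first memory type that is allowed by `type_bits` and whose
// flags contain every bit in `flags`, or -1 if there is none.
int find_type(const std::vector<std::uint32_t>& type_flags,
              std::uint32_t type_bits, std::uint32_t flags) {
    for (std::size_t i = 0; i < type_flags.size() && i < 32; ++i) {
        const bool allowed = ((type_bits >> i) & 1u) != 0;
        if (allowed && (type_flags[i] & flags) == flags)
            return static_cast<int>(i);
    }
    return -1;
}

// First pass: required and preferred properties together.
// Second pass: required properties only.
// Fails (returns -1) only when the required properties are unavailable.
int find_memory_type(const std::vector<std::uint32_t>& type_flags,
                     std::uint32_t type_bits,
                     std::uint32_t required, std::uint32_t preferred) {
    const int best = find_type(type_flags, type_bits, required | preferred);
    if (best >= 0) return best;
    return find_type(type_flags, type_bits, required);
}
```

With `required = kHostVisible | kHostCoherent` and `preferred = kHostCached`, a device that lacks a cached and coherent type still gets a usable index instead of an out-of-device-memory error.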
@pure-water

Could you please provide more data points on the performance difference between GPU and CPU runs on your devices?

@luciferous
Contributor

@pure-water Good idea to collect more performance data. What do you think if we open another ticket for that? Let this thread focus on the memory type failure.

@pure-water

Yes, please go ahead

@luciferous
Contributor

@pure-water I'll let you open the ticket since you're the one requesting and can provide more context.

@pure-water

Hi, please look here: #5410

4 participants