Fails with ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory #5319
Comments
After doing Here is a gist of the results: https://gist.github.com/alex4o/12a29218a9860a8f1dad7c087adfa6b4 |
Ok got it to work like that and the results it output seem to look fine: https://gist.github.com/alex4o/2879f3997e72fc5cc44d78ee54333113 |
That looks like it was CPU-only. It would mention the Vulkan device otherwise.
Pretty high error values in the matrix multiplications, so it doesn't really work with your GPU yet. Warp size 16 isn't really something I dealt with yet, that's the most likely cause. Nvidia, AMD and Intel use 32 and 64. |
What about the other way around: removing the HostCoherent flag instead and keeping HostCached? |
Does anything explicitly mention whether it is actually running on the Mali G78? |
What is the command to build to get this far? Does it at least say "offload tensors to GPU"? |
Do you think that will have better performance? |
Ok so I got the direct output of https://gist.github.com/alex4o/702fa06fdcc716234002459dbf6c3270 |
No, I have the same issue; I just used the approach I mentioned above. I have no comparison numbers. |
Did you just use the head of the git repo, or is there a lot of your own code apart from the memory type change? |
Yes, I use the head of the git repo with nothing changed other than the removal of the memory type requirement. |
Many thanks. Would it be possible to show the tokens/s as well? It would be very interesting to know the G78 number. |
@pure-water @alex4o Adding a data point for Pixel 7 (Mali-G710): |
I suppose I should do a check on device creation whether HostVisible, HostCoherent and HostCached memory is available and if not fall back to HostVisible and HostCoherent. HostVisible and HostCached would require me to manually manage the synchronization between CPU and GPU, which currently is not implemented. That's why you get bad results that way. |
I could give it a shot writing a check/fallback for available memory types. |
Sure, go ahead. Just be aware of #5321 which touches a lot of the code and will get merged soon. Maybe wait until it's done or start building on top of it. |
Some memory properties are nice to have, but not critical. `eHostCached`, for instance, isn't essential, and yet we fail on devices where this memory property isn't available, with the error `ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory`. This change differentiates between those properties that are critical and those that are just nice-to-have, and will fail only when critical properties aren't available. Fixes ggerganov#5319.
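To illustrate the required versus nice-to-have split described above, here is a minimal sketch. It is not the actual ggml-vulkan code: `find_memory_type` and its parameters are hypothetical names, and the memory type tables are passed in as plain flag vectors rather than queried from a device. Only the flag bit values are taken from the Vulkan specification (`VkMemoryPropertyFlagBits`).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit values mirror VkMemoryPropertyFlagBits from the Vulkan specification.
constexpr uint32_t DEVICE_LOCAL  = 0x1;
constexpr uint32_t HOST_VISIBLE  = 0x2;
constexpr uint32_t HOST_COHERENT = 0x4;
constexpr uint32_t HOST_CACHED   = 0x8;

// Hypothetical helper (an assumption for illustration, not a ggml-vulkan API).
// memory_types[i] holds the property flags of memory type i, and type_bits is
// the memoryTypeBits mask a buffer would report in its memory requirements.
// Returns the index of the first allowed type that has every required flag
// and every preferred flag. If no such type exists, returns the first allowed
// type that has every required flag. Returns -1 only when no allowed type
// satisfies the required flags.
int find_memory_type(const std::vector<uint32_t>& memory_types,
                     uint32_t type_bits,
                     uint32_t required,
                     uint32_t preferred) {
    int fallback = -1;
    for (std::size_t i = 0; i < memory_types.size() && i < 32; ++i) {
        if ((type_bits & (1u << i)) == 0) {
            continue;  // this memory type is not allowed for the resource
        }
        const uint32_t flags = memory_types[i];
        if ((flags & required) != required) {
            continue;  // a critical property is missing
        }
        if ((flags & preferred) == preferred) {
            return static_cast<int>(i);  // best case: everything is available
        }
        if (fallback < 0) {
            fallback = static_cast<int>(i);  // remember the first acceptable type
        }
    }
    return fallback;
}
```

Called with `required = HOST_VISIBLE | HOST_COHERENT` and `preferred = HOST_CACHED`, a device that exposes no type with all three flags still gets a usable index instead of an error. HostCoherent stays in the required set because, as noted earlier in the thread, dropping it would require manual CPU/GPU synchronization that is not implemented.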
Could you please provide more data points on the performance difference between GPU and CPU runs on your devices? |
@pure-water Good idea to collect more performance data. What do you think about opening another ticket for that? Let this thread focus on the memory type failure. |
Yes, please go ahead |
@pure-water I'll let you open the ticket since you're the one requesting and can provide more context. |
Hi, Please look here #5410 |
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
clang version 17.0.6 Target: aarch64-unknown-linux-android24
b2059
Here is a link for the output of `vulkaninfo`: https://gist.github.com/alex4o/20f949910574295c22f951f64e1d421d
Here is a link for the output of `main`: https://gist.github.com/alex4o/7809ed6597cb88c4f44fcbab03475d9e
I have not looked too deeply into this, but it can be seen that llama.cpp tries to allocate a bigger chunk of memory than it needs for some reason.