ggml : do not abort when ggml_aligned_malloc fails #10130
Should we add temporary GGML_ASSERTs in ggml_threadpool_new_impl where we use ggml_aligned_malloc, until we start handling the failures?
Yes, although if such a small malloc fails, there isn't much you can do at that point anyway, so crashing the application in that case is fine. I am not sure why the threadpool needs an aligned malloc in any case; I will replace it with a standard malloc and add a check.
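A minimal sketch of what "standard malloc plus a check" could look like for a small bookkeeping allocation. The struct and function names here are hypothetical stand-ins, not the actual threadpool code in ggml:

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for a small bookkeeping struct such as a threadpool.
struct example_pool {
    int n_threads;
};

// Plain malloc with an explicit check. For an allocation this small,
// aborting with a message is treated as acceptable.
static struct example_pool * example_pool_new(int n_threads) {
    struct example_pool * pool = malloc(sizeof(*pool));
    if (pool == NULL) {
        fprintf(stderr, "%s: failed to allocate pool\n", __func__);
        abort();
    }
    pool->n_threads = n_threads;
    return pool;
}
```

The caller owns the returned pointer and releases it with free().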
Would it be possible for llama.cpp, when the kv initialization fails, to automatically reduce the ctx size (in steps of 2048, for example) until the initialization succeeds within the same loading process, crashing only when no kv cache can be allocated at all? Or does such a failure technically demand a crash?
To clarify:
The llama.cpp library, however, should not do that automatically; that's entirely up to the application. So with that out of the way, the question is whether the llama.cpp examples should do that. I expect that would make about as many people angry as it would make happy, so I would say no.
I also think it is better not to do it for the examples.
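For an application that does want this behavior, the retry could live entirely on the application side. The following is only a sketch under stated assumptions: try_init_kv_cache is a hypothetical stand-in that simulates a memory budget, not a llama.cpp function, and the step of 2048 is the value suggested in the question above.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical stand-in for whatever the application uses to create a context
// with a KV cache of n_ctx tokens. It pretends that only `budget` tokens fit
// in memory; it is not a llama.cpp function.
static bool try_init_kv_cache(unsigned n_ctx, unsigned budget) {
    return n_ctx <= budget;
}

// Try the requested context size first, then shrink it by `step` until
// initialization succeeds. Returns the size that worked, or 0 if none did.
static unsigned init_with_fallback(unsigned n_ctx, unsigned step, unsigned budget) {
    while (n_ctx > 0) {
        if (try_init_kv_cache(n_ctx, budget)) {
            return n_ctx;
        }
        fprintf(stderr, "KV cache init failed for n_ctx = %u\n", n_ctx);
        if (step == 0 || n_ctx <= step) {
            return 0; // nothing smaller left to try
        }
        n_ctx -= step;
    }
    return 0;
}
```

For example, init_with_fallback(8192, 2048, 5000) tries 8192 and 6144, then succeeds at 4096. A return value of 0 tells the application that no size worked, so it can report the error itself.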
The change to use ggml_aligned_malloc in ggml-backend also caused it to crash the application when the memory allocation fails, which is not intended.
Additionally, added a suggestion to reduce the ctx size when kv initialization fails in llama.cpp.
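The general pattern described here, an aligned allocation that returns NULL on failure so the caller can report the error, might be sketched as follows. All names and the alignment value are illustrative assumptions rather than ggml's actual code, and the sketch assumes a platform that provides C11 aligned_alloc:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define EXAMPLE_ALIGNMENT 64 // illustrative value, not taken from ggml

// Hypothetical helper: returns NULL on failure instead of aborting,
// so the caller decides how to react.
static void * example_aligned_malloc(size_t size) {
    if (size == 0 || size > SIZE_MAX - (EXAMPLE_ALIGNMENT - 1)) {
        return NULL;
    }
    // C11 aligned_alloc expects the size to be a multiple of the alignment.
    size_t padded = (size + EXAMPLE_ALIGNMENT - 1) / EXAMPLE_ALIGNMENT * EXAMPLE_ALIGNMENT;
    void * ptr = aligned_alloc(EXAMPLE_ALIGNMENT, padded);
    if (ptr == NULL) {
        fprintf(stderr, "%s: failed to allocate %zu bytes\n", __func__, size);
    }
    return ptr;
}

// Caller side: pass the failure upward rather than crashing.
static int example_buffer_init(void ** out, size_t size) {
    *out = example_aligned_malloc(size);
    return *out != NULL ? 0 : -1;
}
```

A caller that receives -1 can then print a hint, such as suggesting a smaller ctx size, before returning its own error.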