
ggml : do not abort when ggml_aligned_malloc fails #10130

Closed
slaren wants to merge 2 commits

Conversation

slaren (Collaborator) commented Nov 1, 2024

The change to use ggml_aligned_malloc in ggml-backend also caused it to crash the application when the memory allocation fails, which is not intended.

Additionally, added a suggestion to reduce the ctx size when kv initialization fails in llama.cpp.
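
For illustration, here is a minimal sketch of the behavior this change aims for, written against a hypothetical example_aligned_malloc rather than the actual ggml source: on allocation failure, log an error and return NULL so the caller can propagate the failure, instead of aborting the process. The 64-byte alignment is just an example value.

```c
#include <stdio.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>
#endif

// Hypothetical stand-in for ggml_aligned_malloc, for illustration only.
// On failure: log an error and return NULL so the caller can handle it,
// instead of aborting the whole application.
static void * example_aligned_malloc(size_t size) {
    void * ptr = NULL;
#if defined(_WIN32)
    ptr = _aligned_malloc(size, 64);              // 64-byte alignment, example value
#else
    if (posix_memalign(&ptr, 64, size) != 0) {    // returns non-zero on failure
        ptr = NULL;
    }
#endif
    if (ptr == NULL) {
        fprintf(stderr, "%s: failed to allocate %zu bytes\n", __func__, size);
        // no abort() here: the caller decides how to handle the failure
    }
    return ptr;
}
```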

The github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Nov 2, 2024

ggerganov (Owner) left a comment

Should we add temporary GGML_ASSERTs in ggml_threadpool_new_impl where we use ggml_aligned_malloc, until we start handling the failures?

slaren (Collaborator, Author) commented Nov 2, 2024

Should we add temporary GGML_ASSERTs in ggml_threadpool_new_impl where we use ggml_aligned_malloc, until we start handling the failures?

Yes, although if such a small malloc fails, there isn't much that can be done at that point anyway, so crashing the application in that case is fine. I am not sure why the threadpool needs an aligned malloc in any case; I will replace it with a standard malloc and add a check.
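
As a rough illustration of the change described above (the names below are hypothetical; the actual code in ggml_threadpool_new_impl differs), the threadpool allocation could use a plain malloc with an explicit check:

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for ggml's threadpool struct; the real one lives in ggml.
struct example_threadpool {
    int n_threads;
};

// Sketch of the pattern described above: plain malloc plus an explicit check,
// instead of an aligned allocation whose failure would abort the process.
static struct example_threadpool * example_threadpool_new(int n_threads) {
    struct example_threadpool * tp = malloc(sizeof(*tp));
    if (tp == NULL) {
        fprintf(stderr, "%s: failed to allocate threadpool\n", __func__);
        return NULL; // in ggml this could instead be a GGML_ASSERT while failures are unhandled
    }
    tp->n_threads = n_threads;
    return tp;
}
```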

Nexesenex (Contributor) commented

Additionally, added a suggestion to reduce the ctx size when kv initialization fails in llama.cpp.

Would it be possible, in case the KV initialization fails, for llama.cpp to automatically reduce the ctx size by itself (in steps of 2048, for example) until the initialization succeeds during the same loading process, crashing only when no KV cache can be allocated at all? Or does such a failure technically demand a crash?

slaren (Collaborator, Author) commented Nov 2, 2024

To clarify:

  • llama.cpp does not crash when there is insufficient memory to allocate the KV cache; it returns an error (the crash was a bug)
  • Applications are free to handle the error in any way they want, including trying to allocate a smaller KV cache (see the sketch at the end of this comment)

The llama.cpp library, however, should not do that automatically; that's entirely up to the application.

So with that out of the way, the question is whether the llama.cpp examples should do that. I expect that would make about as many people angry as it would make happy, so I would say no.
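
For illustration, an application could implement such a fallback itself along these lines. This is a sketch only: it assumes llama_new_context_with_model returns NULL on failure, and the step size and lower bound are arbitrary application choices, not anything llama.cpp prescribes.

```c
#include <stdint.h>
#include <stdio.h>

#include "llama.h"

// Application-side fallback (not part of llama.cpp): retry context creation
// with progressively smaller n_ctx values until it succeeds or gives up.
static struct llama_context * create_context_with_fallback(struct llama_model * model, uint32_t n_ctx) {
    struct llama_context_params params = llama_context_default_params();
    while (n_ctx >= 2048) {
        params.n_ctx = n_ctx;
        struct llama_context * ctx = llama_new_context_with_model(model, params);
        if (ctx != NULL) {
            return ctx;
        }
        fprintf(stderr, "context creation failed with n_ctx = %u, retrying with a smaller context\n", n_ctx);
        n_ctx -= 2048; // shrink in steps of 2048, as suggested in the question above
    }
    return NULL; // even the smallest attempted context could not be allocated
}
```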

ggerganov (Owner) commented

So with that out of the way, the question is whether the llama.cpp examples should do that. I expect that would make about as many people angry as it would make happy, so I would say no.

I also think it is better not to do it for the examples.

slaren (Collaborator, Author) commented Nov 4, 2024

The change to ggml_aligned_malloc was included in #10144

slaren closed this on Nov 4, 2024