
bug: Free the allocated tokens in the batch #5252

Merged (1 commit, Feb 2, 2024)

Conversation

@irbull (Contributor) commented on Feb 1, 2024

The `llama_batch_init` function allocates memory for a fixed number of tokens. However, `llama_batch_free` only frees memory for the number of tokens that were actually added to the batch.

This change-set tracks the allocated size of the batch structure and frees all of the allocated memory.
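
For illustration, here is a minimal sketch of the mismatch. The struct and function bodies below are simplified stand-ins, not the actual llama.cpp code, and `batch_init`/`batch_free` are hypothetical names:

```cpp
#include <cstdint>
#include <cstdlib>

typedef int32_t llama_seq_id;

// Simplified stand-in for the relevant part of llama_batch.
struct batch {
    int32_t        n_tokens; // tokens actually added by the caller
    llama_seq_id **seq_id;   // one heap block per *allocated* token slot
};

// init allocates a seq_id block for every one of n_tokens_alloc slots...
batch batch_init(int32_t n_tokens_alloc, int32_t n_seq_max) {
    batch b;
    b.n_tokens = 0;
    b.seq_id   = (llama_seq_id **) malloc(sizeof(llama_seq_id *) * n_tokens_alloc);
    for (int32_t i = 0; i < n_tokens_alloc; ++i) {
        b.seq_id[i] = (llama_seq_id *) malloc(sizeof(llama_seq_id) * n_seq_max);
    }
    return b;
}

// ...but free only walks n_tokens, so every slot past the number of
// tokens the caller added leaks.
void batch_free(batch b) {
    for (int32_t i = 0; i < b.n_tokens; ++i) {
        free(b.seq_id[i]);
    }
    free(b.seq_id);
}
```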

@ggerganov (Member)

Thanks for catching that.

Instead of adding `n_alloc_tokens`, should we make `seq_id` null-terminated? This way we'd avoid changing the public struct and introducing what is essentially private data.

In the long run, llama_batch would probably have to be just forward declared and accessed via an interface.
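
(For illustration, a header along those lines might look like the sketch below. These names are hypothetical, not llama.cpp's actual API; the point is that an opaque handle keeps the layout, including any allocation bookkeeping, private to the library.)

```cpp
#include <cstdint>

struct llama_batch;  // forward declaration only; definition stays in the implementation file

llama_batch * llama_batch_new   (int32_t n_tokens_alloc, int32_t embd, int32_t n_seq_max);
int32_t       llama_batch_count (const llama_batch * batch);  // tokens added so far
void          llama_batch_delete(llama_batch * batch);        // frees every allocated slot
```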

@irbull (Contributor, Author) commented on Feb 1, 2024

I've updated the commit to use a null-terminated array. I added an extra element to the `seq_id` array and assigned the last value to `nullptr`. During the free, I loop through the elements until the `nullptr` is reached.

I agree; keeping the public API unchanged is a +1. I did rename the first parameter of `llama_batch_init` to `n_alloc_tokens`, since `n_tokens` is a very overloaded term in llama.cpp, and I thought this made it clear that this isn't the number of tokens in the batch.

@ggerganov (Member)

Ok, makes sense. Just a tiny nit: change the name to `n_tokens_alloc`.

@irbull force-pushed the free-batch-fix branch 2 times, most recently from d32b7de to 483af49 on February 1, 2024 at 20:36
@irbull (Contributor, Author) commented on Feb 1, 2024

> Ok, makes sense. Just a tiny nit: change the name to `n_tokens_alloc`.

+1, sounds good. Done!
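
A sketch of the resulting allocate/free pairing, as described above (a paraphrase of the approach, not the verbatim merged diff; unrelated fields are elided):

```cpp
// Allocation: one extra seq_id slot is reserved and set to nullptr, so
// llama_batch_free can find the end without knowing how many tokens
// were allocated.
struct llama_batch llama_batch_init(int32_t n_tokens_alloc, int32_t embd, int32_t n_seq_max) {
    struct llama_batch batch = { /* ... */ };
    batch.seq_id = (llama_seq_id **) malloc(sizeof(llama_seq_id *) * (n_tokens_alloc + 1));
    for (int i = 0; i < n_tokens_alloc; ++i) {
        batch.seq_id[i] = (llama_seq_id *) malloc(sizeof(llama_seq_id) * n_seq_max);
    }
    batch.seq_id[n_tokens_alloc] = nullptr;  // sentinel marking the end
    /* ... */
    return batch;
}

// Free: walk until the sentinel, so every allocated slot is released
// regardless of how many tokens the caller actually added.
void llama_batch_free(struct llama_batch batch) {
    /* ... */
    if (batch.seq_id) {
        for (int i = 0; batch.seq_id[i] != nullptr; ++i) {
            free(batch.seq_id[i]);
        }
        free(batch.seq_id);
    }
    /* ... */
}
```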

The llama_batch_init allocates memory for a fixed number of tokens.
However, the llama_batch_free only frees memory for the number of
tokens that were added to the batch.

This change-set uses a null-terminated array for the batch seq_id, and
frees all the elements until the nullptr is reached. This change-set
also changes the name of the first parameter from `n_tokens` to
`n_tokens_alloc` to more clearly indicate that this value is the number
of tokens allocated to the batch, not the number of tokens in the batch.
@irbull (Contributor, Author) commented on Feb 2, 2024

Are there typically build errors? This looks like a timeout of some sort (after 5 h) on the Mac build runner. CMake on my M1 works, but I'm sure the server setup is different. I'm not sure if this is a transient error or an indication of a real problem.

@ggerganov merged commit e1e7210 into ggml-org:master on Feb 2, 2024
52 of 53 checks passed
@ggerganov (Member)

I think it's GitHub Actions acting up; it should be ok.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request on Feb 3, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 1, 2024