
bug: Free the allocated tokens in the batch #5252

Merged (1 commit, Feb 2, 2024)

Conversation

@irbull (Contributor) commented on Feb 1, 2024

The `llama_batch_init` function allocates memory for a fixed number of tokens. However, `llama_batch_free` only frees memory for the number of tokens that were actually added to the batch.

This change-set tracks the allocated size of the batch structure and frees all of the allocated memory.
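
For illustration, here is a minimal sketch of the mismatch. The struct and function bodies below are simplified stand-ins, not the actual llama.cpp code, and `batch_init`/`batch_free` are hypothetical names:

```cpp
#include <cstdint>
#include <cstdlib>

typedef int32_t llama_seq_id;

// Simplified stand-in for the relevant part of llama_batch.
struct batch {
    int32_t        n_tokens; // tokens actually added by the caller
    llama_seq_id **seq_id;   // one heap block per *allocated* token slot
};

// init allocates a seq_id block for every one of n_tokens_alloc slots...
batch batch_init(int32_t n_tokens_alloc, int32_t n_seq_max) {
    batch b;
    b.n_tokens = 0;
    b.seq_id   = (llama_seq_id **) malloc(sizeof(llama_seq_id *) * n_tokens_alloc);
    for (int32_t i = 0; i < n_tokens_alloc; ++i) {
        b.seq_id[i] = (llama_seq_id *) malloc(sizeof(llama_seq_id) * n_seq_max);
    }
    return b;
}

// ...but free only walks n_tokens, so every slot past the number of
// tokens the caller added leaks.
void batch_free(batch b) {
    for (int32_t i = 0; i < b.n_tokens; ++i) {
        free(b.seq_id[i]);
    }
    free(b.seq_id);
}
```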

@ggerganov (Member)

Thanks for catching that.

Instead of adding `n_alloc_tokens`, should we make `seq_id` null-terminated? This way we'd avoid changing the public struct and introducing what is essentially private data.

In the long run, llama_batch would probably have to be just forward declared and accessed via an interface.
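
(For illustration, a header along those lines might look like the sketch below. These names are hypothetical, not llama.cpp's actual API; the point is that an opaque handle keeps the layout, including any allocation bookkeeping, private to the library.)

```cpp
#include <cstdint>

struct llama_batch;  // forward declaration only; definition stays in the implementation file

llama_batch * llama_batch_new   (int32_t n_tokens_alloc, int32_t embd, int32_t n_seq_max);
int32_t       llama_batch_count (const llama_batch * batch);  // tokens added so far
void          llama_batch_delete(llama_batch * batch);        // frees every allocated slot
```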

@irbull (Contributor, Author) commented on Feb 1, 2024

I've updated the commit to use a null-terminated array. I added an extra element to the `seq_id` array and assigned the last value to `nullptr`. During the free, I loop through the elements until the `nullptr` is reached.

I agree; keeping the public API unchanged is a +1. I did rename the first parameter of `llama_batch_init` to `n_alloc_tokens`, since `n_tokens` is a very overloaded term in llama.cpp, and I thought this made it clear that this isn't the number of tokens in the batch.

@ggerganov (Member)

Ok, makes sense. Just a tiny nit: change the name to `n_tokens_alloc`.

@irbull force-pushed the free-batch-fix branch 2 times, most recently from d32b7de to 483af49 on February 1, 2024 at 20:36
@irbull (Contributor, Author) commented on Feb 1, 2024

> Ok, makes sense. Just a tiny nit: change the name to `n_tokens_alloc`.

+1, sounds good. Done!
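
A sketch of the resulting allocate/free pairing, as described above (a paraphrase of the approach, not the verbatim merged diff; unrelated fields are elided):

```cpp
// Allocation: one extra seq_id slot is reserved and set to nullptr, so
// llama_batch_free can find the end without knowing how many tokens
// were allocated.
struct llama_batch llama_batch_init(int32_t n_tokens_alloc, int32_t embd, int32_t n_seq_max) {
    struct llama_batch batch = { /* ... */ };
    batch.seq_id = (llama_seq_id **) malloc(sizeof(llama_seq_id *) * (n_tokens_alloc + 1));
    for (int i = 0; i < n_tokens_alloc; ++i) {
        batch.seq_id[i] = (llama_seq_id *) malloc(sizeof(llama_seq_id) * n_seq_max);
    }
    batch.seq_id[n_tokens_alloc] = nullptr;  // sentinel marking the end
    /* ... */
    return batch;
}

// Free: walk until the sentinel, so every allocated slot is released
// regardless of how many tokens the caller actually added.
void llama_batch_free(struct llama_batch batch) {
    /* ... */
    if (batch.seq_id) {
        for (int i = 0; batch.seq_id[i] != nullptr; ++i) {
            free(batch.seq_id[i]);
        }
        free(batch.seq_id);
    }
    /* ... */
}
```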

The llama_batch_init allocates memory for a fixed number of tokens.
However, the llama_batch_free only frees memory for the number of
tokens that were added to the batch.

This change-set uses a null-terminated array for the batch seq_id, and
frees all the elements until the nullptr is reached. This change-set
also changes the name of the first parameter from `n_tokens` to
`n_tokens_alloc` to more clearly indicate that this value is the number
of tokens allocated to the batch, not the number of tokens in the batch.
@irbull (Contributor, Author) commented on Feb 2, 2024

Are there typically build errors? This looks like a timeout of some sort (after 5 h) on the Mac build runner. CMake on my M1 works, but I'm sure the server setup is different. I'm not sure if this is a transient error or an indication of a real problem.

@ggerganov merged commit e1e7210 into ggml-org:master on Feb 2, 2024
52 of 53 checks passed
@ggerganov (Member)

I think it's GitHub Actions acting up; it should be ok.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request on Feb 3, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 1, 2024