llama: refactor llama_decode_impl #11381

JohannesGaessler · 2025-01-23T22:53:04Z

This PR refactors llama_decode_impl by moving part of its code into the functions llama_prepare_sbatch and llama_prepare_ubatch. There should be no change in functionality. The motivation is that this allows the code to be reused for training in #10544.

ggerganov

In the #11213 refactoring, llama_prepare_sbatch and llama_prepare_ubatch will be replaced with something like:

// internal llama_context logic will be implemented
// for example, the llama_prepare_sbatch logic will go here
llama_batch_manager_i bm = lctx.prepare_batch(batch, logits_all);

while (!bm->done()) {
    // the llama_prepare_ubatch() will be implemented here
    llama_ubatch ubatch = bm->next();

    ...
}

Feel free to merge. After merging, I will rebase the #11213 PR to illustrate this.
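
For readers unfamiliar with the pattern, the prepare-then-iterate loop sketched above can be illustrated with a self-contained toy. This is only a sketch under stated assumptions: toy_ubatch, toy_batch_manager and toy_decode are hypothetical names invented for illustration and are not llama.cpp APIs, and the real llama_prepare_sbatch / llama_prepare_ubatch logic is not reproduced here. The constructor stands in for the one-time batch preparation step and next() stands in for the per-micro-batch preparation step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for llama_ubatch: a contiguous slice of token ids.
struct toy_ubatch {
    std::vector<int> tokens;
};

// Hypothetical stand-in for the batch manager in the comment above.
// Construction plays the role of the one-time "prepare sbatch" step;
// next() plays the role of the per-iteration "prepare ubatch" step.
class toy_batch_manager {
public:
    toy_batch_manager(std::vector<int> tokens, std::size_t n_ubatch)
        : tokens_(std::move(tokens)), n_ubatch_(n_ubatch) {
        assert(n_ubatch_ > 0 && "micro-batch size must be positive");
    }

    // True once every token has been handed out in some micro-batch.
    bool done() const { return pos_ >= tokens_.size(); }

    // Returns the next micro-batch of at most n_ubatch_ tokens.
    // Returns an empty micro-batch if called after done() is true.
    toy_ubatch next() {
        const std::size_t n = std::min(n_ubatch_, tokens_.size() - pos_);
        toy_ubatch ub;
        ub.tokens.assign(tokens_.begin() + pos_, tokens_.begin() + pos_ + n);
        pos_ += n;
        return ub;
    }

private:
    std::vector<int> tokens_;
    std::size_t      n_ubatch_;
    std::size_t      pos_ = 0;
};

// Drives the loop in the same shape as the sketch above and
// returns the size of each micro-batch that was produced.
std::vector<std::size_t> toy_decode(const std::vector<int> & tokens, std::size_t n_ubatch) {
    toy_batch_manager bm(tokens, n_ubatch);

    std::vector<std::size_t> sizes;
    while (!bm.done()) {
        toy_ubatch ub = bm.next();
        // A real implementation would build and evaluate the graph here.
        sizes.push_back(ub.tokens.size());
    }
    return sizes;
}
```

The point of the design is that the caller only sees done() and next(), so the same loop could in principle be shared by decoding and training while the splitting logic stays inside the manager.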

llama: refactor llama_decode_impl

cd0aee8

ggerganov mentioned this pull request Jan 24, 2025

llama : refactor llama_kv_cache, llama_context and llm_build_context #11213

Draft

13 tasks

ggerganov approved these changes Jan 27, 2025

JohannesGaessler merged commit df984e0 into ggerganov:master Jan 27, 2025
45 checks passed

JamePeng mentioned this pull request Jan 28, 2025

Update llama_cpp: Sync LLAMA_API names with llama.cpp mainline. Needs more testing abetlen/llama-cpp-python#1901

Closed
