-
When running aider's benchmark, CoT models tend to get stuck at 100% load indefinitely. I assume these models are caught in some infinite reasoning loop, but I am not entirely sure that is the case. Perhaps they are not stuck at all, just taking long enough to hit the hardcoded 1-hour timeout. Is there some way to see what llama-server is generating? Aider's benchmark runs with streaming disabled since it doesn't show much to the user, so I cannot inspect the output from that side. I was hoping there is a way to see what is being generated in llama-server directly.
Replies: 1 comment 1 reply
-
Add `-lv 1`. Most likely you are using a small context size (the default is `-c 4096`). Try increasing it.
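Putting the two suggestions together, an invocation might look like the sketch below. The `-lv` (log verbosity) and `-c` (context size) flags come from the reply above; the model path and the specific context value of 16384 are placeholders chosen for illustration, not recommendations from this thread.

```shell
# Sketch only: model path and context value are illustrative placeholders.
# -lv 1    raises log verbosity so the server log shows more of what is
#          happening during generation, even with streaming disabled client-side.
# -c 16384 raises the context window well above the 4096-token default,
#          giving long chain-of-thought outputs room before truncation.
llama-server -m ./model.gguf -c 16384 -lv 1
```

With higher verbosity, you can watch the server log to tell whether the model is still producing tokens (just slowly) or has genuinely stalled.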