-
When running aider's benchmark, CoT models tend to get stuck at 100% load indefinitely. I assume these models are caught in some infinite reasoning loop, but I am not entirely sure that is the case. Perhaps they are not stuck at all, just taking long enough to hit the hardcoded 1-hour timeout. Is there some way to see what llama-server is generating? Aider's benchmark runs with streaming disabled since it doesn't show much to the user, so I cannot inspect the output from that side. I was hoping there is a way to see what is being generated in llama-server directly.
Replies: 1 comment 1 reply
-
Add `-lv 1`. Most likely you are using a small context size (the default is `-c 4096`). Try increasing it.
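Putting the two suggestions together, an invocation might look like the sketch below. The `-lv` (log verbosity) and `-c` (context size) flags come from the reply above; the model path and the specific context value of 16384 are placeholders chosen for illustration, not recommendations from this thread.

```shell
# Sketch only: model path and context value are illustrative placeholders.
# -lv 1    raises log verbosity so the server log shows more of what is
#          happening during generation, even with streaming disabled client-side.
# -c 16384 raises the context window well above the 4096-token default,
#          giving long chain-of-thought outputs room before truncation.
llama-server -m ./model.gguf -c 16384 -lv 1
```

With higher verbosity, you can watch the server log to tell whether the model is still producing tokens (just slowly) or has genuinely stalled.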