Bug: llama.cpp server reports inaccurate n_ctx_per_seq? #10186
Comments
It's using the correct context: 8192. The message is incorrect due to the hack that you noticed. The hack is no longer necessary; it was used for the old system prompt functionality, which was removed in #9811. I thought about keeping the extra sequence, since I had some ideas for utilizing it, but we should just remove the hack now.
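For readers unfamiliar with the hack, it followed roughly the pattern sketched below. This is a simplified, self-contained illustration rather than the actual server code; the gpt_params struct and init_context function are hypothetical stand-ins for llama.cpp's real types.

```cpp
#include <cstdio>

// Hypothetical stand-ins for llama.cpp's real parameter struct and init path;
// only the shape of the removed hack is illustrated here.
struct gpt_params {
    int n_ctx      = 8192;
    int n_parallel = 1;
};

static void init_context(const gpt_params & params) {
    // The init path logs the per-sequence context, derived from the total
    // context split across all reserved sequences.
    printf("n_ctx_per_seq = %d\n", params.n_ctx / params.n_parallel);
}

int main() {
    gpt_params params;

    params.n_parallel += 1;   // reserve an extra sequence (old system prompt)
    init_context(params);     // logs 4096, even though each slot still gets 8192
    params.n_parallel -= 1;   // restore the value for the server's slot accounting

    return 0;
}
```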
Could you check if #10187 works as expected?
Before:
After:
Interesting that the CPU output buffer is slightly smaller. Is that expected?
I get the following message when I load llama.cpp:
It looks like this shouldn't have been closed: @horenbergerb's post above gives the same error message both before and after the change. I'm getting the same issue as well.
Same issue over here.
What happened?
Running a model and specifying 8192 context like so:
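(The exact command block from the report wasn't preserved; a typical llama-server invocation requesting an 8192-token context looks like the following, with the model path as a placeholder.)

```sh
./llama-server -m /path/to/model.gguf -c 8192
```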
During initialization, this logs n_ctx_per_seq = 4096. This freaked me out, because based on this discussion, the message implies that I'm actually only getting 4096 context due to parallelization. On the other hand, I also see the total context reported as 8192, which is what I would expect.
This discrepancy seems to be because the llama.cpp server temporarily increments n_parallel when loading the model (for a reason relating to Mamba? I'm not sure why we do this).
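To make the arithmetic concrete, here is a minimal sketch, assuming the per-sequence value is derived as the total context divided by the maximum number of sequences (which matches the numbers in the log):

```cpp
#include <cstdio>

int main() {
    const int n_ctx      = 8192; // context requested with -c 8192
    const int n_parallel = 1;    // slots requested by the user

    // What the log would report without the temporary increment:
    printf("n_ctx_per_seq = %d\n", n_ctx / n_parallel);       // 8192

    // What it reports with the extra sequence reserved at load time:
    printf("n_ctx_per_seq = %d\n", n_ctx / (n_parallel + 1)); // 4096

    return 0;
}
```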
My main concern is whether I'm actually getting the full 8192 context per slot, or whether the log message is simply misleading.
Please let me know if any other information is needed, but this should be easy to replicate. Thanks!
Name and Version
What operating system are you seeing the problem on?
Linux
Relevant log output
No response