Llama.cpp server doesn't return grammar error messages when in streaming mode #7391
@richardanaya Sorry for my delay in seeing this! Great catch, and thank you for the excellent writeup in the ollama repo! Fantastic reporting, thank you! I particularly appreciated the annotated screenshots of what you're seeing. I have not yet dug through that whole thread in detail, but plan to read more later. Over there, you wrote:
I've spent a bit of time (#5948, #5950, #6004) improving the GBNF grammar validation within llama.cpp, and I would hate to see this work be re-implemented. Building a separate GBNF validator is non-ideal, since any changes to the base functionality (such as the grammar extensions from #6640 or #6467) would need to be duplicated in every other implementation. That's the path I originally started down, but I ultimately chose the approach in #5948 so that we would have a validation program that always stays in lockstep with master.

That said, I would love to better support the server. As noted in this comment, at the time I started working on this I wasn't entirely sure how best to provide this error-checking to the CLI or other applications (such as the server). In an ideal world, what would the behavior of llama.cpp (for the server and/or CLI) be in the case of an invalid grammar? I think I was the last one to touch the grammar validation code, and I'm happy to implement the changes you need; knowing what you would like to see would help me implement this.

I think that adding grammar support to ollama would be a huge quality-of-life improvement for a lot of people -- I want to do whatever I can to make this happen!
@HanClinto hey! Thanks for the response. I 100% agree with you -- I would love to just use llama.cpp too. The biggest issue seems to be short-circuiting execution of tasks (without ejecting the in-memory model) during "streaming=true" mode. Any kind of 400 error would do :) If that's not easy, having some kind of …
Sounds pretty reasonable. Looks like 422 might be an appropriate error to return in the case of an invalid grammar?
Could definitely add something like that as well. It might be nice to expose the gbnf-validator functionality via such an endpoint. I haven't messed around with the streaming mode of the server very much -- do you happen to have a sample script handy that I can use for testing? If not, I'm happy to figure it out and write something up in Python or whatever, but I figured it was worth asking. If it's not self-contained or not easy to share, don't worry about it.

Out of curiosity, do you happen to know what the behavior is like when streaming is set to false -- is that more sane? If so, that might be able to function as your …
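For reference, a rough test script along these lines might look like the following sketch in Python. It assumes a llama.cpp server running locally on the default `http://localhost:8080` and uses the `/completion` endpoint; the prompt, token count, and grammar are placeholders. It sends the same intentionally broken grammar once with `"stream": false` and once with `"stream": true` so the two responses can be compared:

```python
import requests

URL = "http://localhost:8080/completion"  # assumed default llama.cpp server address

# GBNF with a deliberate mistake: rule "answr" is referenced but never defined,
# so the grammar should fail validation.
BAD_GRAMMAR = 'root ::= answr\nanswer ::= "yes" | "no"\n'


def run(stream: bool) -> None:
    payload = {
        "prompt": "Answer yes or no:",
        "n_predict": 16,
        "grammar": BAD_GRAMMAR,
        "stream": stream,
    }
    resp = requests.post(URL, json=payload, stream=stream)
    print(f"--- stream={stream} -> HTTP {resp.status_code}")
    if stream:
        # Streamed responses arrive as "data: {...}" server-sent-event lines.
        for line in resp.iter_lines():
            if line:
                print(line.decode())
    else:
        print(resp.text)


if __name__ == "__main__":
    run(stream=False)
    run(stream=True)
```

This is a sketch, not a definitive reproduction; the interesting part is whether the second call surfaces any grammar error in its status code or event stream.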
Hmm, reading through the server documentation it appears that 400 is already reserved for an invalid grammar error:
I'm reading through the server code and I'm not entirely sure why this 400 response code wouldn't be returned when streaming = true -- I still need an easier test setup, but I'd like to get to the bottom of this.
This issue was closed because it has been inactive for 14 days since being marked as stale.
When you run a streaming request against the llama.cpp server, there appears to be no way to see grammar-related error messages in the HTTP response.
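A minimal way to reproduce this, sketched in Python (assuming a local server on the default `http://localhost:8080` and the `/completion` endpoint; the grammar is deliberately invalid because it references an undefined rule):

```python
import requests

payload = {
    "prompt": "Answer yes or no:",
    "n_predict": 16,
    "stream": True,
    # "answr" is never defined, so this grammar should fail validation.
    "grammar": 'root ::= answr\nanswer ::= "yes" | "no"\n',
}

resp = requests.post("http://localhost:8080/completion", json=payload, stream=True)
print("HTTP", resp.status_code)
for line in resp.iter_lines():
    if line:
        # The behavior reported here: the grammar error is not surfaced
        # anywhere in these streamed lines or in the status code.
        print(line.decode())
```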