Hi there,

Love this server: it's super fast and really one of the few that utilises my GPUs to their full capacity.
The one problem I am having is that I rely on the grammar support in llama-cpp-python to constrain the LLM's output to a JSON format I can parse.
I have tried without it, but the formatting is so poor that the remedial work required makes any time saved by the faster server a wash, bearing in mind I am dealing with thousands of requests, not just one or two.
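For context, this is the kind of workflow I mean. In llama-cpp-python you can load a GBNF grammar (via `LlamaGrammar.from_string` or `from_file`) and pass it as the `grammar` parameter on generation, and the sampler is then forced to emit only strings the grammar accepts. A minimal sketch of such a grammar, constraining output to a flat JSON object of string fields (field names and structure here are just illustrative, not my actual grammar):

```gbnf
# Illustrative GBNF: a JSON object with string keys and string values only.
root   ::= "{" ws pair (ws "," ws pair)* ws "}"
pair   ::= string ws ":" ws string
string ::= "\"" [^"\\]* "\""
ws     ::= [ \t\n]*
```

Every completion sampled under a grammar like this parses with a stock JSON parser, which is what makes the downstream processing cheap.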
It would be great if we could use grammars with vLLM and get back the responses we need.
Appreciate the consideration.