Add Support for Llama-cpp-python Grammar #2548

Closed
Kaotic3 opened this issue Jan 22, 2024 · 2 comments


Kaotic3 commented Jan 22, 2024

Hi There,

Love this server; it is super fast and really one of the few that utilises the GPUs I am using to their full capacity.

The one problem I am having is that I use Grammar from Llama-cpp-python to control the output from the LLM and force it into a JSON format that I can parse.
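For reference, a minimal sketch of that workflow with llama-cpp-python's `LlamaGrammar` API (the GBNF grammar, model path, and prompt here are illustrative placeholders, not the actual ones in use):

```python
from llama_cpp import Llama, LlamaGrammar

# Deliberately tiny GBNF grammar: a flat JSON object with string keys and
# string values. Illustrative only; real use would embed llama.cpp's json.gbnf.
GBNF = r'''
root   ::= "{" ws pair (ws "," ws pair)* ws "}"
pair   ::= string ws ":" ws string
string ::= "\"" [a-zA-Z0-9 _-]* "\""
ws     ::= [ \t\n]*
'''

grammar = LlamaGrammar.from_string(GBNF)

# model_path is a placeholder; point it at any local GGUF model.
llm = Llama(model_path="./model.gguf", n_gpu_layers=-1)

out = llm(
    "Return the report summary as JSON with keys 'title' and 'status':",
    grammar=grammar,  # sampling is constrained to strings the grammar accepts
    max_tokens=256,
)
print(out["choices"][0]["text"])  # output is guaranteed to match the grammar
```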

I have tried without it, and the formatting is so poor that the remedial work required makes any time saving from the faster server a wash, bearing in mind I am dealing with thousands of requests, not just one or two.

It would be great if we could use grammar with vLLM and get back the responses we need.

Appreciate the consideration.


lzl12051 commented Feb 1, 2024

I second this; grammar is super useful when you need to process an LLM's results in a deterministic way. Appreciate it.

hmellor (Collaborator) commented Apr 4, 2024

vLLM supports guided decoding via outlines #2819
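For anyone finding this later, a minimal sketch of guided JSON decoding against vLLM's OpenAI-compatible server (the base URL, model name, and schema are placeholders; `guided_json` is the vLLM-specific extra request field that the outlines backend enforces):

```python
from openai import OpenAI

# Placeholders: point these at your own vLLM deployment and model.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# JSON Schema describing the exact shape the response must take.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "status": {"type": "string", "enum": ["open", "closed"]},
    },
    "required": ["title", "status"],
}

resp = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[{"role": "user", "content": "Summarise the report as JSON."}],
    extra_body={"guided_json": schema},  # vLLM extra param; not part of the OpenAI spec
)
print(resp.choices[0].message.content)  # parses as JSON matching the schema
```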

hmellor closed this as completed Apr 4, 2024