Add guided decoding for OpenAI API server #2819
Conversation
@felixzhu555 can you please add an example of what the guidedJson or guidedRegex should look like?
Hi @jalotra, this feature is still in development, but once it's added I'd imagine you would define the JSON or regex as a Python dictionary, string, or pydantic BaseModel class. You can see a simple example that uses pydantic here: https://github.com/outlines-dev/outlines/blob/main/examples/vllm_integration.py. For your example, you would pass that JSON into a request to the vLLM OpenAI server through the extra_body parameter.
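To make the above concrete, here is a hedged sketch of what such a request could look like. It assumes the parameter name guided_json accepted through the OpenAI client's extra_body argument; the schema contents, model name, and messages are illustrative only, and the actual client call is shown as a comment because it needs a running vLLM server.

```python
import json

# The schema itself is a plain JSON-schema dict (illustrative example).
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature": {"type": "number"},
    },
    "required": ["city"],
}

# With the openai Python client, the request would look roughly like this
# (commented out here, since it requires a running vLLM server):
#
#   client.chat.completions.create(
#       model="...",
#       messages=[{"role": "user", "content": "Weather in Boston as JSON"}],
#       extra_body={"guided_json": schema},
#   )
extra_body = {"guided_json": schema}
print(json.dumps(extra_body["guided_json"]["required"]))
```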
@felixzhu555 great to see this being picked up so quickly! Don't you think the guided decoding in the OpenAI API server should mimic the OpenAI way of having guided decoding? Now that outlines is integrated, it becomes easier to support e.g. the tools parameter. Take this example request from the OpenAI API reference docs (see https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools):

curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "What is the weather like in Boston?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}'

We could extract the JSON schema from the tools definition and use it for guided decoding.
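The extraction step suggested above could be sketched as follows. This is a hypothetical helper, not vLLM code: schema_from_tools is an invented name, and it simply walks an OpenAI-style request dict and returns the parameters schema of the named function tool.

```python
# Hypothetical helper: pull the JSON schema out of an OpenAI-style
# "tools" request so it could drive guided decoding.
def schema_from_tools(request: dict, tool_name: str) -> dict:
    for tool in request.get("tools", []):
        fn = tool.get("function", {})
        if tool.get("type") == "function" and fn.get("name") == tool_name:
            return fn.get("parameters", {})
    raise KeyError(f"no tool named {tool_name!r}")

# Trimmed-down version of the curl example above.
request = {
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
    "tool_choice": "auto",
}

schema = schema_from_tools(request, "get_current_weather")
print(schema["required"])
```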
Supporting the tools parameter would be great.
Hey @br3no, thanks for the suggestion, I'll definitely look into supporting the tools parameter.
I gave this a try and it worked. Thank you! There's a bug in the …
@br3no, are you suggesting we remove the proposed guided_json parameter? It seems to me that it doesn't have to be one or the other. The OpenAI API doesn't allow a custom JSON schema for constrained decoding, so this is a new capability that doesn't fit the existing API. You are right that the guided JSON can be used to power tools.
@ibeltagy vLLM offers two server implementations, one of which mimics the OpenAI API. The nice thing about offering an OpenAI-compatible server is that vLLM works as a drop-in replacement for OpenAI: everything that works with OpenAI simply works with vLLM. As an example, take this project: https://github.com/jxnl/instructor (I'm not affiliated, nor do I recommend or know much about this project; it's just an example). If you start diverging from the OpenAI API, this can break.

At the same time, if users want features the OpenAI API doesn't offer, they are free to use vLLM's own server implementation, where the vLLM maintainers and community are free to go beyond what OpenAI offers. While the OpenAI API doesn't offer an explicit JSON-schema option, the tools functionality offers a way to do exactly that. So yes, I don't see a good argument for adding these parameters to the OpenAI-compatible server.
@br3no, makes sense.
A biiiig problem here is that we are creating a new logits processor per request. A common scenario will be handling requests with common schemas/constraints. Can you think of a good (and semantically correct) way to cache the logits processors or the FSM in outlines?
> On tests/entrypoints/test_openai_server_guided_decoding.py (https://github.com/vllm-project/vllm/pull/2819#discussion_r1488918490), @felixzhu555 wrote: Ok, should the scope for fixtures be reverted to "session"?

I think module is actually the right one. Otherwise it needs to be in conftest for uniqueness.
Co-authored-by: br3no <breno@veltefaria.de>
Co-authored-by: simon-mo <simon.mo@hey.com>
Where can I find documentation for using guided decoding?
Hi @arshadshk, sorry, we don't have written documentation for guided decoding just yet; I'll try to add that soon. If you have a specific use case I can try to explain how to use it, otherwise you can check out the guided decoding tests here for some examples.
Is this strictly an OpenAI-compatible server feature? I don't see any mention of having this available as part of the simple offline interface.

Currently yes. But it would be valuable to have a similar API there as well.
Where can I find documentation for this feature?
I don't think there is lengthy documentation on it, but it is briefly mentioned here: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-completions-api
That same page states the parameters but doesn't provide any actual samples of a JSON schema. I have found a few others, but I can't find anything that shows how to return an array of JSON objects. Example: for "Find me 5 companies that sell cars" I'd like to get back {"Results": [{"name": "Ford"}, {"name": "Toyota"}, {"name": "BMW"}]}.
@ProVega Tools like pydantic or zod-to-json-schema are helpful here. I think the schema you want, for just an array of names, is: {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" }
}
}
}

Or with a top-level Results key:

{
"type": "object",
"properties": {
"Results": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" }
}
}
}
}
}
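As a quick way to convince yourself this second schema describes the desired output, here is a small stdlib-only sanity check. The matches helper is a hypothetical, deliberately minimal shape check, not a real JSON-Schema validator (it ignores required, additionalProperties, and most other keywords).

```python
# The nested schema from above, as a Python dict.
schema = {
    "type": "object",
    "properties": {
        "Results": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
            },
        }
    },
}

# The output the questioner wanted.
sample = {"Results": [{"name": "Ford"}, {"name": "Toyota"}, {"name": "BMW"}]}

def matches(value, spec) -> bool:
    """Minimal shape check: handles object, array, and string only."""
    t = spec.get("type")
    if t == "object":
        return isinstance(value, dict) and all(
            k in spec.get("properties", {}) and matches(v, spec["properties"][k])
            for k, v in value.items()
        )
    if t == "array":
        return isinstance(value, list) and all(
            matches(v, spec.get("items", {})) for v in value
        )
    if t == "string":
        return isinstance(value, str)
    return True  # unhandled types pass through

assert matches(sample, schema)
```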
This took me some tinkering to figure out, given all the different methods and tools that have been discussed in this issue. Here is a simple example that consistently produces guided JSON output for me:
Output:
The only issue is the order: the output keys appear in alphabetical order regardless of how I write the code. This matters for tree-of-thought-style prompting.
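Whether the decoding backend honors property order is implementation-dependent, but the order can at least be expressed in the schema itself, since Python dicts and json round-tripping preserve insertion order. This is a hedged note with illustrative field names, not a statement about how outlines orders keys.

```python
import json

# Write the properties in the order you want them generated.
schema = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},  # intended first
        "answer": {"type": "string"},     # intended second
    },
    "required": ["reasoning", "answer"],
}

# Property order survives JSON serialization and parsing (Python 3.7+).
ordered_keys = list(json.loads(json.dumps(schema))["properties"])
assert ordered_keys == ["reasoning", "answer"]
```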
Support guided decoding (JSON, regex, choice) using outlines for the completion and chat completion OpenAI endpoints.
This is a continuation of @br3no's work in #2815.
relevant: #288