[Bug]: V1 engine ignores guided json #12692
Comments
We are facing similar issues with the V1 engine.
I'm seeing almost-JSON output. This is vLLM 0.7.1.
vLLM V1 dropped logits_processors support, so guided decoding doesn't work currently (but vLLM seems to be making an effort to refactor and bring it back? See #12388). I have a legacy-style logits processors implementation for V1 (#12688, not accepted by vLLM though). You can try it with my branch code and this offline inference example:

import os
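# Select the V1 engine and disable V1 multiprocessing (set before vLLM is initialized).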
os.environ['VLLM_USE_V1'] = '1'
os.environ['VLLM_ENABLE_V1_MULTIPROCESSING'] = '0'
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams
from vllm.transformers_utils.tokenizer import get_cached_tokenizer
from vllm.model_executor.guided_decoding.outlines_decoding import get_local_outlines_guided_decoding_logits_processor
# or
# from vllm.model_executor.guided_decoding.xgrammar_decoding import get_local_xgrammar_guided_decoding_logits_processor
MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
llm = LLM(
    model=MODEL,
    max_model_len=1024,
    gpu_memory_utilization=0.9,
    enforce_eager=True,
)
tokenizer = get_cached_tokenizer(AutoTokenizer.from_pretrained(MODEL))
prompts = [
    "Classify this sentiment: vLLM is wonderful!",
]
guided_decoding_params = GuidedDecodingParams(choice=["Positive", "Negative"])
logits_processor = get_local_outlines_guided_decoding_logits_processor(
    guided_decoding_params,
    tokenizer,
)
sampling_params = SamplingParams(
    temperature=0,
    max_tokens=16,
    logits_processors=[logits_processor],
)
outputs = llm.generate(prompts, [sampling_params])
for output in outputs:
    prompt = output.prompt
    output = output.outputs[0]
    generated_text = output.text
    print(f"Prompt: {prompt!r}")
    print(f"Generated text: {generated_text!r}")

It outputs:
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
When making a request to the OpenAI-compatible API with the extra fields for guided_json generation like so:
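For illustration, a minimal request of this shape against the vLLM OpenAI-compatible server looks like the sketch below; the model name, prompt, and JSON schema are placeholder assumptions, not the payload from this report.

from openai import OpenAI

# Point the client at a locally running vLLM server (placeholder URL and key).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Placeholder schema, used only to illustrate the guided_json extra field.
json_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["Positive", "Negative"]},
    },
    "required": ["sentiment"],
}

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model name
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"},
    ],
    extra_body={"guided_json": json_schema},
)
print(completion.choices[0].message.content)

On V0 the response conforms to the supplied schema; under V1, as reported here, the constraint is ignored.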
The output simply ignores the guided decoding parameter. When switching back to V0 it works fine.
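As a stopgap, a minimal sketch of that V0 fallback, using the same VLLM_USE_V1 environment variable the offline example above sets (it must be set before the server starts or the LLM is created):

import os

# Fall back to the V0 engine, where guided decoding still works.
os.environ["VLLM_USE_V1"] = "0"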
Here are the logs from the vLLM server: