[Bug]: openai.serving_chat tries to call _create_chat_logprobs when the output.text is empty #8988
Comments
This is an interesting bug, but I don't think adding that check is the right fix. Did you check what's in output.token_ids?
For the chunks with empty text, the token_ids list is quite long: it has 13834 elements. I checked that output.token_ids here is the same as prompt_token_ids.
Did a bit of a deep dive: in streaming, for the first few chunks with empty generation, it executes this line: https://github.com/vllm-project/vllm/blob/main/vllm/sequence.py#L531. It returns the cached_token_ids, and with zero new tokens the negative slice covers the whole cached list instead of an empty delta.
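In Python, a slice with a negative offset of zero returns the whole sequence rather than an empty one, which would explain why an empty chunk's token_ids match the prompt token IDs. A minimal standalone illustration (the names below are illustrative, not the actual sequence.py code):

```python
# Mimic a cache holding the 13834 prompt token IDs and an "empty" streaming chunk.
cached_token_ids = list(range(13834))
num_new_tokens = 0

# A negative offset of zero does not yield an empty slice: [-0:] is the same as [0:].
delta = cached_token_ids[-num_new_tokens:]
assert len(delta) == 13834  # the entire cache, i.e. the prompt token IDs
```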
@njhill I tried to change https://github.com/vllm-project/vllm/blob/main/vllm/sequence.py#L531 to:
It fixed both the logprobs issue and the completion_tokens usage info issue I mentioned. But I don't think it is the correct fix, since the logic there seems tricky. Kindly asking for your input. Thanks!
@CatherineSue sorry for the delay, I will look at this today.
@CatherineSue actually this is the correct fix I think! This is why negative indexing is a bit precarious. Would you like to open another PR with this? It would be great to have a unit test for this case too. Thanks for finding the bug.
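A sketch of the slice pattern that avoids the zero-offset pitfall, shown here only as an illustration of the idea rather than the exact change from the PR:

```python
# Compute the delta without a negative offset, so a chunk with zero new tokens
# yields an empty delta instead of the whole cached list.
cached_token_ids = list(range(13834))
num_new_tokens = 0

delta = cached_token_ids[len(cached_token_ids) - num_new_tokens:]
assert delta == []  # empty chunk -> empty delta
```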
@njhill thank you for verifying. Opened a PR. |
Your current environment
The output of `python send_llama3_128k.py`
Model Input Dumps
server start command
The input prompt is too long, and GitHub doesn't support attaching .py files, so I had to rename it to .txt:
send_llama3_128k.py.txt
🐛 Describe the bug
When I try to run the above script, send_llama3_128k.py, the server raises an error.

Upon debugging, I think the issue here is that for the first few chunks, output.text is empty, i.e. '', and output.logprobs is [], so essentially there are no logprobs available. I am uncertain why output.text is empty, but we should add a check here: if output.text is empty, we can skip creating the logprobs.
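A minimal sketch of such a check, using simplified, hypothetical names rather than the actual vllm/entrypoints/openai/serving_chat.py code (the logprobs builder is passed in as a callable purely for illustration):

```python
from typing import Any, Callable, List, Optional

def build_chunk_logprobs(output_text: str,
                         output_logprobs: List[Any],
                         create_chat_logprobs: Callable[[List[Any]], Any]) -> Optional[Any]:
    """Return None for empty streaming chunks instead of building chat logprobs."""
    if not output_text:
        # First few chunks: output.text == '' and output.logprobs == [],
        # so there is nothing to convert into chat logprobs.
        return None
    return create_chat_logprobs(output_logprobs)
```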
After adding that check, the new output of python send_llama3_128k.py is:

Before submitting a new issue...