VLLM output not complete #1095
Unanswered · RickyGunawan09 asked this question in Q&A
Replies: 2 comments · 1 reply
-
+1, the answers I get are always incomplete, sometimes even shorter than a sentence.
-
Hi, this might be helpful for you: you can set the output length to get complete answers. See line 61 in ee8217e.
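For context, here is a minimal sketch of where that length setting lives when using vLLM's offline Python API; the model name and prompt below are placeholders, not from the thread. The key point is that `max_tokens` in `SamplingParams` caps the number of generated tokens, and its default is small (16), so answers come back truncated unless you raise it.

```python
from vllm import LLM, SamplingParams

# Placeholder model name and prompt, for illustration only.
llm = LLM(model="lmsys/vicuna-13b-v1.5", gpu_memory_utilization=0.8)

# max_tokens caps the number of *generated* tokens; the default (16)
# is why answers are cut off if this is left unset.
params = SamplingParams(temperature=0, max_tokens=1024)

outputs = llm.generate(["Summarize the following document: ..."], params)
print(outputs[0].outputs[0].text)
```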
-
Hi guys,
Thank you for making this super library.
I have a question about the output of vLLM.
I'm using an RTX A6000 GPU (50 GB, CUDA 12) with the Vicuna-13B-v1.5-4k model from lmsys.
vLLM is served with gpu_memory_utilization=0.8.
The parameters I change per request are:
max_tokens: 4096
temperature: 0
I build a custom prompt with context from a text/document.
Why is the output sometimes not complete?
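One thing worth checking in this setup: with a 4k-context model, the prompt tokens and the generated tokens share the same context window, so a long document prompt plus max_tokens=4096 cannot both fit, and generation gets cut short. The response's finish_reason field tells you whether the model stopped naturally ("stop") or hit the length cap ("length"). Below is a hedged sketch against vLLM's OpenAI-compatible completions endpoint; the host, port, model name, and prompt are assumptions, not values from the thread.

```python
import requests

# Hypothetical local endpoint; adjust host, port, and model to your deployment.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "lmsys/vicuna-13b-v1.5",
        "prompt": "...your document context and question...",
        # prompt tokens + max_tokens must fit within the model's 4k context
        "max_tokens": 1024,
        "temperature": 0,
    },
)
choice = resp.json()["choices"][0]
print(choice["text"])
# "stop" = model finished naturally; "length" = hit the token cap (truncated)
print(choice["finish_reason"])
```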