VLLM output not complete #1095
Unanswered · RickyGunawan09 asked this question in Q&A
Replies: 2 comments · 1 reply
-
+1, the answers I get are always incomplete, sometimes even shorter than a sentence.
-
Hi, this might be helpful for you: you can set the output length to get complete answers. See line 61 in ee8217e.
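For context, here is a minimal sketch of where that length setting lives when using vLLM's offline Python API; the model name and prompt below are placeholders, not from the thread. The key point is that `max_tokens` in `SamplingParams` caps the number of generated tokens, and its default is small (16), so answers come back truncated unless you raise it.

```python
from vllm import LLM, SamplingParams

# Placeholder model name and prompt, for illustration only.
llm = LLM(model="lmsys/vicuna-13b-v1.5", gpu_memory_utilization=0.8)

# max_tokens caps the number of *generated* tokens; the default (16)
# is why answers are cut off if this is left unset.
params = SamplingParams(temperature=0, max_tokens=1024)

outputs = llm.generate(["Summarize the following document: ..."], params)
print(outputs[0].outputs[0].text)
```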
-
Hi guys,
Thank you for making this super library.
I have a question about the output of vLLM.
I'm using an RTX A6000 GPU (50 GB, CUDA 12) with the Vicuna-13B-v1.5-4k model from lmsys.
vLLM is served with gpu_memory_utilization=0.8.
The parameters I change per request are:
max_tokens: 4096
temperature: 0
I build a custom prompt with context from a text/document.
Why is the output sometimes not complete?
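One thing worth checking in this setup: with a 4k-context model, the prompt tokens and the generated tokens share the same context window, so a long document prompt plus max_tokens=4096 cannot both fit, and generation gets cut short. The response's finish_reason field tells you whether the model stopped naturally ("stop") or hit the length cap ("length"). Below is a hedged sketch against vLLM's OpenAI-compatible completions endpoint; the host, port, model name, and prompt are assumptions, not values from the thread.

```python
import requests

# Hypothetical local endpoint; adjust host, port, and model to your deployment.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "lmsys/vicuna-13b-v1.5",
        "prompt": "...your document context and question...",
        # prompt tokens + max_tokens must fit within the model's 4k context
        "max_tokens": 1024,
        "temperature": 0,
    },
)
choice = resp.json()["choices"][0]
print(choice["text"])
# "stop" = model finished naturally; "length" = hit the token cap (truncated)
print(choice["finish_reason"])
```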