
[Bug]: Metrics time_to_first_token_seconds, time_per_output_token_seconds not working correctly #6337

Closed
thies1006 opened this issue Jul 11, 2024 · 7 comments · Fixed by #6686
Labels
bug Something isn't working

Comments

@thies1006

Your current environment

vllm==0.5.1

🐛 Describe the bug

python -m vllm.entrypoints.openai.api_server --model /secondary/thies/Hermes-2-Theta-Llama-3-70B/ --tensor-parallel-size 8 --max-num-batched-tokens 8192

The entries of the histograms

  • time_to_first_token_seconds
  • time_per_output_token_seconds

are identical across all buckets, so the recorded time values are apparently always 0 seconds.

# HELP vllm:time_to_first_token_seconds Histogram of time to first token in seconds.
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_bucket{le="0.001",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.005",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.01",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.02",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.04",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.06",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.08",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.1",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.25",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.75",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="1.0",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="2.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="5.0",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="7.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="10.0",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="+Inf",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_count{model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_sum{model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 0.011674165725708008
# HELP vllm:time_per_output_token_seconds Histogram of time per output token in seconds.
# TYPE vllm:time_per_output_token_seconds histogram
vllm:time_per_output_token_seconds_bucket{le="0.01",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.025",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.05",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.075",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.1",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.15",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.2",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.3",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.4",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.75",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="1.0",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="2.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="+Inf",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_count{model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
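For reference, Prometheus histogram buckets are cumulative, so an observation of ~0 s increments every le bucket, which is exactly the pattern above. The reported sum also implies a mean TTFT of 0.0117 s / 208 ≈ 56 µs, far too fast for a 70B model, so the recorded values themselves must be near zero. A minimal sketch (plain prometheus_client, not vLLM's metrics code) that reproduces the pattern:

# Minimal sketch (plain prometheus_client, not vLLM code): feeding a
# histogram ~0 s observations reproduces the dump above, where every
# cumulative 'le' bucket holds the same count.
from prometheus_client import CollectorRegistry, Histogram, generate_latest

registry = CollectorRegistry()
ttft = Histogram(
    "demo_time_to_first_token_seconds",
    "Histogram of time to first token in seconds.",
    buckets=[0.001, 0.005, 0.01, 0.1, 1.0, 10.0],
    registry=registry,
)
for _ in range(208):
    ttft.observe(0.0)  # a ~0 s observation lands in every bucket

print(generate_latest(registry).decode())  # every bucket shows 208.0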
thies1006 added the bug label on Jul 11, 2024
@AllenDou
Contributor

@thies1006 Consider increasing the prompt length. You may see a difference in the results.
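For example (a sketch; port 8000 is vLLM's default, and the model path just mirrors the launch command above):

# Sketch: send a deliberately long prompt to the OpenAI-compatible
# completions endpoint, then watch /metrics for a non-zero TTFT.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "/secondary/thies/Hermes-2-Theta-Llama-3-70B/",
        "prompt": "word " * 4000,  # long prompt -> measurable prefill time
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["text"])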

@ashgold

ashgold commented Jul 23, 2024

@AllenDou
Same for me.

This symptom has occurred since v0.5.x. The average request has about 1.5K input tokens and about 0.2K output tokens.

@AllenDou
Contributor

> @AllenDou Same for me.
>
> This symptom has occurred since v0.5.x. The average request has about 1.5K input tokens and about 0.2K output tokens.

Could you share your model (a public model is better), your test data, and your method (chat or completion)?

@AllenDou
Contributor

#6686 should fix this. @ashgold @thies1006

@yejingfu

yejingfu commented Aug 7, 2024

Duplicate of #6507.

@AllenDou
Contributor

AllenDou commented Aug 7, 2024

> Duplicate of #6507.

Fixed by #6686.

@thies1006
Author

Just checked: the metrics are now correct for me in v0.5.4. (Unrelated: I need to use --disable-frontend-multiprocessing in this version.)
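A quick way to re-check (a sketch; the URL assumes the default local server):

# Scrape /metrics and confirm the TTFT bucket counts now differ
# across 'le' thresholds instead of being one repeated value.
import re
import requests

text = requests.get("http://localhost:8000/metrics").text
pairs = re.findall(
    r'vllm:time_to_first_token_seconds_bucket\{le="([^"]+)"[^}]*\}\s+(\S+)', text
)
counts = {le: float(v) for le, v in pairs}
print(counts)
assert len(set(counts.values())) > 1, "all buckets identical -> still broken"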
