
[Bug]: Metrics time_to_first_token_seconds, time_per_output_token_seconds not working correctly #6337

Closed
thies1006 opened this issue Jul 11, 2024 · 7 comments · Fixed by #6686
Labels
bug Something isn't working

Comments

@thies1006

Your current environment

vllm==0.5.1

🐛 Describe the bug

python -m vllm.entrypoints.openai.api_server --model /secondary/thies/Hermes-2-Theta-Llama-3-70B/ --tensor-parallel-size 8 --max-num-batched-tokens 8192

The entries of the histograms

  • time_to_first_token_seconds
  • time_per_output_token_seconds

are identical across all buckets, so the recorded time values are apparently always 0 seconds.

# HELP vllm:time_to_first_token_seconds Histogram of time to first token in seconds.
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_bucket{le="0.001",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.005",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.01",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.02",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.04",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.06",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.08",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.1",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.25",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="0.75",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="1.0",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="2.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="5.0",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="7.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="10.0",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_bucket{le="+Inf",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_count{model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 208.0
vllm:time_to_first_token_seconds_sum{model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 0.011674165725708008
# HELP vllm:time_per_output_token_seconds Histogram of time per output token in seconds.
# TYPE vllm:time_per_output_token_seconds histogram
vllm:time_per_output_token_seconds_bucket{le="0.01",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.025",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.05",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.075",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.1",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.15",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.2",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.3",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.4",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="0.75",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="1.0",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="2.5",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_bucket{le="+Inf",model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
vllm:time_per_output_token_seconds_count{model_name="/secondary/thies/Hermes-2-Theta-Llama-3-70B/"} 2493.0
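For reference, Prometheus histogram buckets are cumulative, so an observation of ~0 s increments every le bucket, which is exactly the pattern above. The reported sum also implies a mean TTFT of 0.0117 s / 208 ≈ 56 µs, far too fast for a 70B model, so the recorded values themselves must be near zero. A minimal sketch (plain prometheus_client, not vLLM's metrics code) that reproduces the pattern:

# Minimal sketch (plain prometheus_client, not vLLM code): feeding a
# histogram ~0 s observations reproduces the dump above, where every
# cumulative 'le' bucket holds the same count.
from prometheus_client import CollectorRegistry, Histogram, generate_latest

registry = CollectorRegistry()
ttft = Histogram(
    "demo_time_to_first_token_seconds",
    "Histogram of time to first token in seconds.",
    buckets=[0.001, 0.005, 0.01, 0.1, 1.0, 10.0],
    registry=registry,
)
for _ in range(208):
    ttft.observe(0.0)  # a ~0 s observation lands in every bucket

print(generate_latest(registry).decode())  # every bucket shows 208.0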
thies1006 added the bug label on Jul 11, 2024
@AllenDou
Contributor

@thies1006 Consider increasing the prompt length. You may see a difference in the results.
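For example (a sketch; port 8000 is vLLM's default, and the model path just mirrors the launch command above):

# Sketch: send a deliberately long prompt to the OpenAI-compatible
# completions endpoint, then watch /metrics for a non-zero TTFT.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "/secondary/thies/Hermes-2-Theta-Llama-3-70B/",
        "prompt": "word " * 4000,  # long prompt -> measurable prefill time
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["text"])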

@ashgold

ashgold commented Jul 23, 2024

@AllenDou
Same for me.

This symptom has occurred since v0.5.x. The average request has about 1.5K input tokens and about 0.2K output tokens.

@AllenDou
Contributor

> @AllenDou Same for me.
>
> This symptom has occurred since v0.5.x. The average request has about 1.5K input tokens and about 0.2K output tokens.

Could you share your model (a public model is better), your test data, and your method (chat or completion)?

@AllenDou
Contributor

#6686 should fix this. @ashgold @thies1006

@yejingfu

yejingfu commented Aug 7, 2024

Duplicate of #6507.

@AllenDou
Contributor

AllenDou commented Aug 7, 2024

> Duplicate of #6507.

Fixed by #6686.

@thies1006
Author

Just checked: the metrics are now correct for me in v0.5.4. (Unrelated: I need to use --disable-frontend-multiprocessing in this version.)
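A quick way to re-check (a sketch; the URL assumes the default local server):

# Scrape /metrics and confirm the TTFT bucket counts now differ
# across 'le' thresholds instead of being one repeated value.
import re
import requests

text = requests.get("http://localhost:8000/metrics").text
pairs = re.findall(
    r'vllm:time_to_first_token_seconds_bucket\{le="([^"]+)"[^}]*\}\s+(\S+)', text
)
counts = {le: float(v) for le, v in pairs}
print(counts)
assert len(set(counts.values())) > 1, "all buckets identical -> still broken"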
