
Massive /metrics logs are generated due to short metric scrape interval #419

Closed
Jeffwan opened this issue Nov 20, 2024 · 4 comments · Fixed by #418
Assignees
Labels
area/gateway area/inference-engine kind/enhancement New feature or request priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

Jeffwan (Collaborator) commented Nov 20, 2024

🚀 Feature Description and Motivation

[screenshot: repeated /metrics scrape log lines]

I think the 50ms duration is too short (the change was introduced in https://github.com/aibrix/aibrix/pull/343/files), and I see massive logs generated as a result. This not only puts pressure on the logging system but also makes debugging very complex, because users commonly check the logs.

Use Case

No response

Proposed Solution

  • Update the interval to a longer value
  • Suppress /metrics logs in the vLLM inference engine and the mocked app
  • A disable-scrape annotation could be added to the workload, but it's better not to disable the gateway feature.
@Jeffwan Jeffwan added kind/enhancement New feature or request area/gateway priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/inference-engine labels Nov 20, 2024
@Jeffwan Jeffwan added this to the v0.2.0 milestone Nov 20, 2024
@Jeffwan Jeffwan self-assigned this Nov 20, 2024
Jeffwan (Collaborator, Author) commented Nov 20, 2024

#383 introduces AIBRIX_POD_METRIC_REFRESH_INTERVAL_MS to override the plugin scrape interval.
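A minimal sketch of how such an environment-variable override might behave, written in Python for illustration (the variable name comes from the comment above; the 50ms default, parsing, and function name are assumptions, and the actual AIBrix implementation may differ):

```python
import os

# Hypothetical default, matching the 50ms interval discussed in this issue.
DEFAULT_REFRESH_INTERVAL_MS = 50


def pod_metric_refresh_interval_ms() -> int:
    """Read the scrape interval from AIBRIX_POD_METRIC_REFRESH_INTERVAL_MS,
    falling back to the default when the variable is unset or malformed."""
    raw = os.environ.get("AIBRIX_POD_METRIC_REFRESH_INTERVAL_MS")
    if raw is None:
        return DEFAULT_REFRESH_INTERVAL_MS
    try:
        return int(raw)
    except ValueError:
        return DEFAULT_REFRESH_INTERVAL_MS
```

With this, an operator could set e.g. `AIBRIX_POD_METRIC_REFRESH_INTERVAL_MS=1000` on the deployment to scrape once per second instead of every 50ms.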

Jeffwan (Collaborator, Author) commented Nov 20, 2024

vLLM's --disable-log-requests and --disable-log-stats don't help in this case; they remove other log lines, not the /metrics access logs:

  • --disable-log-requests: the per-request logs (input-ids etc.) are removed.
  • --disable-log-stats: the stats logs (running requests, cache utilization) are removed.


log-requests

INFO 11-20 22:14:55 logger.py:37] Received request cmpl-5d12af172b7942af8c2ca8fd006b6f29-0: prompt: 'San Francisco is a', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=7, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: [23729, 12879, 374, 264], lora_request: None, prompt_adapter_request: None.
INFO 11-20 22:14:55 engine.py:267] Added request cmpl-5d12af172b7942af8c2ca8fd006b6f29-0.

log-stats

INFO 11-20 22:12:01 metrics.py:449] Avg prompt throughput: 0.8 tokens/s, Avg generation throughput: 0.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.

Jeffwan (Collaborator, Author) commented Nov 20, 2024

vLLM also provides --uvicorn-log-level. Setting --uvicorn-log-level warning is only a partial workaround: it stops printing the /metrics access logs, but it also stops printing /v1/chat/completions requests, which makes debugging a little harder.
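A more surgical alternative to lowering the log level wholesale is a logging filter that drops only the /metrics access-log lines while keeping /v1/chat/completions visible. This is a generic Python logging pattern, not something vLLM ships; the `EndpointFilter` helper name is hypothetical:

```python
import logging


class EndpointFilter(logging.Filter):
    """Drop access-log records that mention a given endpoint path.

    Hypothetical helper: vLLM does not provide this out of the box; it
    would need to be installed from custom startup code.
    """

    def __init__(self, path: str):
        super().__init__()
        self._path = path

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False suppresses records whose message contains the path.
        return self._path not in record.getMessage()


# Suppress only GET /metrics lines; other request logs still appear.
logging.getLogger("uvicorn.access").addFilter(EndpointFilter("/metrics"))
```

The same filter could be attached to the "werkzeug" logger for a werkzeug-based mock app.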

Jeffwan (Collaborator, Author) commented Nov 20, 2024

I adopted a similar approach to the mocked app, but it appears to use werkzeug, so the startup is a little different. We can use the workaround in the short term and add disable_endpoint_log support later when we have time.

I created #420 to track that separately, so this issue can be closed now.
