
Massive /metrics logs are generated due to short metric scrape interval #419

Closed
Jeffwan opened this issue Nov 20, 2024 · 4 comments · Fixed by #418
Assignees
Labels
area/gateway area/inference-engine kind/enhancement New feature or request priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

Jeffwan (Collaborator) commented Nov 20, 2024

🚀 Feature Description and Motivation

[screenshot: repeated /metrics scrape log lines]

I think the 50ms duration is too short (the change was introduced in https://github.com/aibrix/aibrix/pull/343/files), and I see massive logs generated as a result. This not only puts pressure on the logging system but also makes debugging very complex, because users commonly check the logs.

Use Case

No response

Proposed Solution

  • Update the interval to a longer value
  • Suppress /metrics logs in the vLLM inference engine and the mocked app
  • A disable-scrape annotation could be added to the workload, but it's better not to disable the gateway feature.
@Jeffwan Jeffwan added kind/enhancement New feature or request area/gateway priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/inference-engine labels Nov 20, 2024
@Jeffwan Jeffwan added this to the v0.2.0 milestone Nov 20, 2024
@Jeffwan Jeffwan self-assigned this Nov 20, 2024
Jeffwan (Collaborator, Author) commented Nov 20, 2024

#383 introduces AIBRIX_POD_METRIC_REFRESH_INTERVAL_MS to override the plugin scrape interval.
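A minimal sketch of how such an environment-variable override might behave, written in Python for illustration (the variable name comes from the comment above; the 50ms default, parsing, and function name are assumptions, and the actual AIBrix implementation may differ):

```python
import os

# Hypothetical default, matching the 50ms interval discussed in this issue.
DEFAULT_REFRESH_INTERVAL_MS = 50


def pod_metric_refresh_interval_ms() -> int:
    """Read the scrape interval from AIBRIX_POD_METRIC_REFRESH_INTERVAL_MS,
    falling back to the default when the variable is unset or malformed."""
    raw = os.environ.get("AIBRIX_POD_METRIC_REFRESH_INTERVAL_MS")
    if raw is None:
        return DEFAULT_REFRESH_INTERVAL_MS
    try:
        return int(raw)
    except ValueError:
        return DEFAULT_REFRESH_INTERVAL_MS
```

With this, an operator could set e.g. `AIBRIX_POD_METRIC_REFRESH_INTERVAL_MS=1000` on the deployment to scrape once per second instead of every 50ms.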

Jeffwan (Collaborator, Author) commented Nov 20, 2024

vLLM's --disable-log-requests and --disable-log-stats don't help in this case; they remove other log lines, not the /metrics access logs:

  • --disable-log-requests: the per-request logs (input-ids etc.) are removed.
  • --disable-log-stats: the stats logs (running requests, cache utilization) are removed.


log-requests

INFO 11-20 22:14:55 logger.py:37] Received request cmpl-5d12af172b7942af8c2ca8fd006b6f29-0: prompt: 'San Francisco is a', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=7, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: [23729, 12879, 374, 264], lora_request: None, prompt_adapter_request: None.
INFO 11-20 22:14:55 engine.py:267] Added request cmpl-5d12af172b7942af8c2ca8fd006b6f29-0.

log-stats

INFO 11-20 22:12:01 metrics.py:449] Avg prompt throughput: 0.8 tokens/s, Avg generation throughput: 0.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.

Jeffwan (Collaborator, Author) commented Nov 20, 2024

vLLM also provides --uvicorn-log-level. Setting --uvicorn-log-level warning is only a partial workaround: it stops printing the /metrics access logs, but it also stops printing /v1/chat/completions requests, which makes debugging a little harder.
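A more surgical alternative to lowering the log level wholesale is a logging filter that drops only the /metrics access-log lines while keeping /v1/chat/completions visible. This is a generic Python logging pattern, not something vLLM ships; the `EndpointFilter` helper name is hypothetical:

```python
import logging


class EndpointFilter(logging.Filter):
    """Drop access-log records that mention a given endpoint path.

    Hypothetical helper: vLLM does not provide this out of the box; it
    would need to be installed from custom startup code.
    """

    def __init__(self, path: str):
        super().__init__()
        self._path = path

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False suppresses records whose message contains the path.
        return self._path not in record.getMessage()


# Suppress only GET /metrics lines; other request logs still appear.
logging.getLogger("uvicorn.access").addFilter(EndpointFilter("/metrics"))
```

The same filter could be attached to the "werkzeug" logger for a werkzeug-based mock app.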

Jeffwan (Collaborator, Author) commented Nov 20, 2024

I adopted a similar approach to the mocked app, but it appears to use werkzeug, so the startup is a little different. We can use the workaround in the short term and add disable_endpoint_log support later when we have time.

I created #420 to track that separately, so this issue can be closed now.
