From 0c0c2015c526f1fe6f86fdd8d6bd99a935d2d275 Mon Sep 17 00:00:00 2001
From: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Date: Thu, 26 Dec 2024 19:26:18 -0500
Subject: [PATCH] Update openai_compatible_server.md (#11536)

Co-authored-by: Simon Mo
---
 docs/source/serving/openai_compatible_server.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/source/serving/openai_compatible_server.md b/docs/source/serving/openai_compatible_server.md
index 23c66f72162d2..caf5e8cafd9aa 100644
--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -112,7 +112,13 @@ completion = client.chat.completions.create(
 
 ## Extra HTTP Headers
 
-Only `X-Request-Id` HTTP request header is supported for now.
+Only `X-Request-Id` HTTP request header is supported for now. It can be enabled
+with `--enable-request-id-headers`.
+
+> Note that enablement of the headers can impact performance significantly at high QPS
+> rates. We recommend implementing HTTP headers at the router level (e.g. via Istio),
+> rather than within the vLLM layer for this reason.
+> See https://github.com/vllm-project/vllm/pull/11529 for more details.
 
 ```python
 completion = client.chat.completions.create(
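
For context, a minimal sketch of how a client might attach the `X-Request-Id` header described in this patch. This is an illustration, not part of the patch: the `openai` package's `extra_headers` keyword is assumed from the OpenAI Python client, and the client call itself is shown commented out since it needs a running vLLM server started with `--enable-request-id-headers`.

```python
import uuid

# Build a per-request tracing header. The header name "X-Request-Id" comes
# from the docs change above; the hex UUID is just one reasonable ID scheme.
request_id = uuid.uuid4().hex
headers = {"X-Request-Id": request_id}

# With the `openai` Python client (not imported here, to stay self-contained),
# the header would be passed per call via the `extra_headers` keyword:
#
#   completion = client.chat.completions.create(
#       model="...",  # placeholder model name
#       messages=[{"role": "user", "content": "Hello!"}],
#       extra_headers=headers,
#   )

print(sorted(headers))  # ['X-Request-Id']
```

Generating the ID client-side, rather than relying on the server, means the same value can be logged on both ends and correlated later.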