From ec719193abbc34c19adcb38d127fb4d01bac2d76 Mon Sep 17 00:00:00 2001
From: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Date: Thu, 26 Dec 2024 19:24:22 -0500
Subject: [PATCH 1/2] Update openai_compatible_server.md

---
 docs/source/serving/openai_compatible_server.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/source/serving/openai_compatible_server.md b/docs/source/serving/openai_compatible_server.md
index 23c66f72162d2..e517bb393c81c 100644
--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -112,7 +112,13 @@ completion = client.chat.completions.create(
 
 ## Extra HTTP Headers
 
-Only `X-Request-Id` HTTP request header is supported for now.
+Only `X-Request-Id` HTTP request header is supported for now. It can be enabled
+with `--enable-request-id-headers`.
+
+> Note that enablement of the headers can impact performance significantly at high QPS
+> rates. We recommend implementing HTTP headers at the router level (e.g. via Istio),
+> rather than within the VLLM layer for this reason.
+> See https://github.com/vllm-project/vllm/pull/11529 for more details.
 
 ```python
 completion = client.chat.completions.create(

From c725395c0f7f91dd70cb646ce66f65eba20957dc Mon Sep 17 00:00:00 2001
From: Simon Mo
Date: Thu, 26 Dec 2024 16:25:54 -0800
Subject: [PATCH 2/2] Apply suggestions from code review

---
 docs/source/serving/openai_compatible_server.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/serving/openai_compatible_server.md b/docs/source/serving/openai_compatible_server.md
index e517bb393c81c..caf5e8cafd9aa 100644
--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -117,7 +117,7 @@ with `--enable-request-id-headers`.
 
 > Note that enablement of the headers can impact performance significantly at high QPS
 > rates. We recommend implementing HTTP headers at the router level (e.g. via Istio),
-> rather than within the VLLM layer for this reason.
+> rather than within the vLLM layer for this reason.
 > See https://github.com/vllm-project/vllm/pull/11529 for more details.
 
 ```python
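
For context on the documented feature, here is a minimal sketch of how a client might exercise it once a vLLM OpenAI-compatible server has been started with `--enable-request-id-headers`. The base URL, API key, model name, and request-ID value are placeholders, not values taken from the patched documentation; the `extra_headers` parameter is the standard OpenAI Python SDK mechanism for attaching additional HTTP headers to a request.

```python
from openai import OpenAI

# Hypothetical local vLLM endpoint; adjust base_url and api_key for your deployment.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

# Attach an X-Request-Id header via the SDK's extra_headers parameter.
# The server only honors this header when launched with --enable-request-id-headers.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={"X-Request-Id": "example-request-0001"},
)

print(completion.id)
```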