Commit 7e30b5b

Update multi-node inference examples

Jeffwan committed Feb 17, 2025
1 parent 61fc34a commit 7e30b5b

Showing 3 changed files with 18 additions and 18 deletions.
8 changes: 2 additions & 6 deletions docs/source/features/gateway-plugins.rst
@@ -102,7 +102,7 @@ In most Kubernetes setups, ``LoadBalancer`` is supported by default. You can retrieve
LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
ENDPOINT="${LB_IP}:80"
The model name, such as ``deepseek-r1-distill-llama-8b``, must match the label ``model.aibrix.ai/name`` in your deployment.

.. code-block:: bash
@@ -120,7 +120,7 @@ In most Kubernetes setups, ``LoadBalancer`` is supported by default. You can retrieve
If you use vLLM, you can pass the argument ``--api-key`` or the environment variable ``VLLM_API_KEY`` to make the server check for an API key in the request header.
Check `vLLM OpenAI-Compatible Server <https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server>`_ for more details.

After you enable authentication, you can query the model with ``-H "Authorization: Bearer your_key"`` in this way:

.. code-block:: bash
:emphasize-lines: 3
@@ -156,10 +156,6 @@ Below are the routing strategies the gateway supports
"temperature": 0.7
}'
If you use the OpenAI SDK client, you can query the model in the same way.



Rate Limiting
-------------
26 changes: 15 additions & 11 deletions docs/source/features/multi-node-inference.rst
@@ -45,15 +45,25 @@ It's similar to the Kubernetes core concepts ``ReplicaSet`` and ``Deployment``. Mos
Workloads Examples
------------------

.. attention::

Starting from v0.6.6, we've added essential packages to run distributed inference with vLLM official container image distribution out of the box.
If you use earlier versions, you can follow guidance below to build your own image compatible with multi-node inference.


This is the ``RayClusterFleet`` example; you can apply this YAML in your cluster.

.. literalinclude:: ../../../samples/distributed/fleet.yaml
:language: yaml


vLLM Version
^^^^^^^^^^^^

Starting from v0.6.6, we've added essential packages to run distributed inference with vLLM official container image distribution out of the box.

If you are using an earlier vLLM version, you have two options.

* Use our built image ``aibrix/vllm-openai:v0.6.1.post2-distributed``.
* Build your own image following the steps below.

.. code-block:: Dockerfile

    RUN pip3 install ray[default]  # important for future healthcheck
    ENTRYPOINT [""]

.. code-block:: bash

    docker build -t aibrix/vllm-openai:v0.6.1.post2-distributed .
RayClusterReplicaSet
^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: ../../../samples/distributed/multi-host.yaml
:language: yaml
@@ -30,7 +30,7 @@ spec:
spec:
containers:
- name: ray-head
image: vllm/vllm-openai:v0.7.1
ports:
- containerPort: 6379
name: gcs-server
