Commit 7e30b5b

Update multi-node inference examples

Jeffwan committed Feb 17, 2025
1 parent 61fc34a commit 7e30b5b

Showing 3 changed files with 18 additions and 18 deletions.
8 changes: 2 additions & 6 deletions docs/source/features/gateway-plugins.rst
@@ -102,7 +102,7 @@ In most Kubernetes setups, ``LoadBalancer`` is supported by default. You can retrieve
LB_IP=$(kubectl get svc/envoy-aibrix-system-aibrix-eg-903790dc -n envoy-gateway-system -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
ENDPOINT="${LB_IP}:80"
The model name, such as ``deepseek-r1-distill-llama-8b``, must match the label ``model.aibrix.ai/name`` in your deployment.

.. code-block:: bash
@@ -120,7 +120,7 @@ In most Kubernetes setups, ``LoadBalancer`` is supported by default. You can retrieve
If you use vLLM, you can pass the argument ``--api-key`` or the environment variable ``VLLM_API_KEY`` to make the server check for an API key in the request header.
Check `vLLM OpenAI-Compatible Server <https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server>`_ for more details.

After you enable authentication, you can query the model with ``-H "Authorization: Bearer your_key"`` in this way:

.. code-block:: bash
:emphasize-lines: 3
@@ -156,10 +156,6 @@ Below are the routing strategies the gateway supports
"temperature": 0.7
}'
If you use the OpenAI SDK client, you can query the model in the same way.



Rate Limiting
-------------
26 changes: 15 additions & 11 deletions docs/source/features/multi-node-inference.rst
@@ -45,15 +45,25 @@ It's similar to the Kubernetes core concepts ``ReplicaSet`` and ``Deployment``. Mos
Workloads Examples
------------------

.. attention::

Starting from v0.6.6, we've added essential packages to run distributed inference with vLLM official container image distribution out of the box.
If you use earlier versions, you can follow guidance below to build your own image compatible with multi-node inference.


This is the ``RayClusterFleet`` example; you can apply this YAML in your cluster.

.. literalinclude:: ../../../samples/distributed/fleet.yaml
:language: yaml


vLLM Version
^^^^^^^^^^^^

Starting from v0.6.6, we've added essential packages to run distributed inference with vLLM official container image distribution out of the box.

If you are using an earlier vLLM version, you have two options.

* Use our built image ``aibrix/vllm-openai:v0.6.1.post2-distributed``.
* Build your own image following the steps below.

.. code-block:: Dockerfile

    RUN pip3 install ray[default]  # important for future healthcheck
    ENTRYPOINT [""]

.. code-block:: bash

    docker build -t aibrix/vllm-openai:v0.6.1.post2-distributed .
RayClusterReplicaSet
^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: ../../../samples/distributed/multi-host.yaml
:language: yaml
@@ -30,7 +30,7 @@ spec:
spec:
containers:
- name: ray-head
image: vllm/vllm-openai:v0.7.1
ports:
- containerPort: 6379
name: gcs-server
