Add an example for running llama2 70B on GPUs using LWS; we can use the model server that comes with llama itself (see meta-llama/llama#594).
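Roughly, the example could look something like the sketch below. This assumes the leaderworkerset.x-k8s.io/v1 API; the image name, the torchrun rendezvous flags, and reliance on the injected LWS_LEADER_ADDRESS variable are illustrative assumptions, not a tested manifest:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llama2-70b
spec:
  replicas: 1                  # one serving group
  leaderWorkerTemplate:
    size: 2                    # leader + 1 worker, 4 GPUs each -> MP=8 for 70B
    leaderTemplate:
      spec:
        containers:
        - name: leader
          image: llama2-70b:latest   # hypothetical image with meta-llama/llama baked in
          command:
          - sh
          - -c
          # rendezvous on the leader pod; flags are illustrative
          - >-
            torchrun --nnodes=2 --nproc_per_node=4
            --rdzv_backend=c10d --rdzv_endpoint=$(LWS_LEADER_ADDRESS):29500
            example_chat_completion.py
            --ckpt_dir llama-2-70b/ --tokenizer_path tokenizer.model
          resources:
            limits:
              nvidia.com/gpu: "4"
    workerTemplate:
      spec:
        containers:
        - name: worker
          image: llama2-70b:latest
          command:
          - sh
          - -c
          - >-
            torchrun --nnodes=2 --nproc_per_node=4
            --rdzv_backend=c10d --rdzv_endpoint=$(LWS_LEADER_ADDRESS):29500
            example_chat_completion.py
            --ckpt_dir llama-2-70b/ --tokenizer_path tokenizer.model
          resources:
            limits:
              nvidia.com/gpu: "4"
```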
/assign
Another option, from vLLM: https://docs.vllm.ai/en/latest/serving/distributed_serving.html
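If we go the vLLM route, the leader/worker split maps naturally onto vLLM's Ray-based multi-node serving: the workers join a Ray cluster rooted at the leader, and the leader runs the OpenAI-compatible API server across all GPUs in the group. A sketch of just the pod templates, where the image tag is assumed and the wait-for-workers step is omitted for brevity:

```yaml
    leaderTemplate:
      spec:
        containers:
        - name: vllm-leader
          image: vllm/vllm-openai:latest   # assumed image tag
          command:
          - sh
          - -c
          # start the Ray head, then serve 70B over all 8 GPUs in the group;
          # a real manifest would wait for workers to join before serving
          - >-
            ray start --head --port=6379 &&
            python -m vllm.entrypoints.openai.api_server
            --model meta-llama/Llama-2-70b-hf --tensor-parallel-size 8
          ports:
          - containerPort: 8000            # vLLM's default API port
          resources:
            limits:
              nvidia.com/gpu: "4"
    workerTemplate:
      spec:
        containers:
        - name: vllm-worker
          image: vllm/vllm-openai:latest
          command:
          - sh
          - -c
          - ray start --address=$(LWS_LEADER_ADDRESS):6379 --block
          resources:
            limits:
              nvidia.com/gpu: "4"
```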