[Core] Use zeromq to put request output tokens back to the api server #28
Conversation
ZeldaHuang reviewed Sep 4, 2024
ZeldaHuang reviewed Sep 4, 2024
zhypku reviewed Sep 4, 2024
KuilongCui reviewed Sep 5, 2024
ZeldaHuang approved these changes Sep 10, 2024
KuilongCui reviewed Sep 10, 2024
zhypku approved these changes Sep 10, 2024
Previously, we used ray.queue as the request output queue that stores the streaming request output tokens returned by the LLM engine. However, we found that the cost of Ray RPC becomes very high when remote calls are frequent.
Because the request output queue of an API server can be written by multiple LLM engines due to Llumnix's dynamic serving, the frequency of streaming output RPCs can be very high. We therefore make ZeroMQ the default transport for putting streaming request output tokens back into the request output queue.
Besides, the TCP transport (tcp://...) shows no more noticeable performance interference with the engine step than the IPC transport (ipc://...).
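For illustration, here is a minimal sketch of the PUSH/PULL pattern described above, not the actual Llumnix implementation: several engine-side PUSH sockets connect to one PULL socket owned by the API server, which drains incoming outputs into a local queue. The class names, endpoint, and port are hypothetical.

```python
# Sketch only: multiple engines push streaming request outputs to one
# API-server-side PULL socket. Names and endpoints are illustrative.
import asyncio

import zmq
import zmq.asyncio


class RequestOutputQueueServer:
    """API-server side: receives streaming request outputs over ZeroMQ."""

    def __init__(self, endpoint: str = "tcp://0.0.0.0:5555"):
        self.context = zmq.asyncio.Context()
        self.socket = self.context.socket(zmq.PULL)
        self.socket.bind(endpoint)  # one PULL socket, many PUSH writers
        self.queue: asyncio.Queue = asyncio.Queue()

    async def run(self) -> None:
        while True:
            # recv_pyobj unpickles whatever the engine side sent,
            # e.g. the new token ids / text delta for a request.
            request_output = await self.socket.recv_pyobj()
            await self.queue.put(request_output)  # hand off to the HTTP layer


class RequestOutputQueueClient:
    """Engine side: pushes each step's outputs back to the API server."""

    def __init__(self, endpoint: str = "tcp://api-server-host:5555"):
        self.context = zmq.Context()
        self.socket = self.context.socket(zmq.PUSH)
        self.socket.connect(endpoint)

    def put(self, request_output) -> None:
        # send_pyobj pickles the object; the send does not block the
        # engine step unless the socket's high-water mark is reached.
        self.socket.send_pyobj(request_output)
```

When the engines and the API server share a host, the tcp://... endpoint above could be swapped for an ipc://... one; per the observation above, the two transports show similarly small interference with the engine step.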