[BugFix] Overhaul async request cancellation #7111
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
…borts # Conflicts: # vllm/entrypoints/api_server.py
LGTM, sorry for keeping you waiting.
vllm-project#7111 made a change to the merge_async_iterators utils function to add an is_cancelled arg. It would be good for this new arg to be optional to retain backwards compatibility for other server front-ends that might already be using this utility function.
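As a rough illustration of the backwards-compatible signature being suggested, here is a minimal sketch of a merge utility with an optional `is_cancelled` argument. This is not vLLM's actual implementation; the queue-based merging and the 0.5s polling interval are assumptions for illustration only.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


async def merge_async_iterators(
    *iterators: AsyncIterator[T],
    is_cancelled: Optional[Callable[[], Awaitable[bool]]] = None,
) -> AsyncIterator[Tuple[int, T]]:
    """Merge iterators, tagging each item with its source index.

    `is_cancelled` defaults to None, so existing callers that do not
    pass it continue to work unchanged.
    """
    queue: asyncio.Queue = asyncio.Queue()
    FINISHED = object()  # sentinel marking an exhausted producer

    async def produce(idx: int, it: AsyncIterator[T]) -> None:
        try:
            async for item in it:
                await queue.put((idx, item))
        finally:
            await queue.put((idx, FINISHED))

    tasks = [asyncio.ensure_future(produce(i, it))
             for i, it in enumerate(iterators)]
    remaining = len(tasks)
    try:
        while remaining:
            # Poll for disconnection even while no results are ready.
            if is_cancelled is not None and await is_cancelled():
                raise asyncio.CancelledError("client disconnected")
            try:
                idx, item = await asyncio.wait_for(queue.get(), timeout=0.5)
            except asyncio.TimeoutError:
                continue  # nothing yet; re-check cancellation
            if item is FINISHED:
                remaining -= 1
            else:
                yield idx, item
    finally:
        for task in tasks:
            task.cancel()
```

Callers that never pass `is_cancelled` get the old behavior, which is the backwards-compatibility point being made above.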
Follow-on from vllm-project#7111: avoid unnecessarily enqueuing a final message after an exception, and avoid aborting requests in the engine that were never started.
Signed-off-by: Alvant <alvasian@yandex.ru>
There are a number of problems currently with how request cancellation works upon client disconnection in the OpenAI API server front-end and `AsyncLLMEngine`. This is a problem for production resilience and has a compounding effect when the server is overloaded and client requests time out, with the server continuing to do useless work.
This PR reworks how the cancellation is propagated to make it more robust and consistent:
- Propagate cancellation via explicit generator close (`aclose()`) rather than async iterators in most cases
- Update `merge_async_iterators` to encapsulate polling for disconnection, even before any results have been produced (while the request is queued)
- Add an `iterate_with_cancellation` function used for the single-prompt request cases
- Have `AsyncLLMEngine` differentiate between cancelled requests and those that finish normally (the generator returned from `generate()` will now raise a `CancelledError` in the former case)

I also plan to add some new tests to cover these various cases.
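To make the single-prompt path concrete, here is a minimal sketch of how an `iterate_with_cancellation`-style helper could combine the pieces described above: polling for disconnection while waiting for results, closing the generator explicitly, and raising `CancelledError` so callers can distinguish cancellation from normal completion. The function name matches the PR, but the body and the `poll_interval` parameter are illustrative assumptions, not vLLM's actual code.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, TypeVar

T = TypeVar("T")


async def iterate_with_cancellation(
    iterator: AsyncIterator[T],
    is_cancelled: Callable[[], Awaitable[bool]],
    poll_interval: float = 0.5,  # assumed; for illustration only
) -> AsyncIterator[T]:
    """Yield from `iterator`, checking for client disconnection while
    waiting for each result (so even a still-queued request with no
    output yet can be cancelled).

    On disconnection, the underlying generator is closed explicitly
    via `aclose()` and `CancelledError` is raised, letting the caller
    tell a cancelled request apart from one that finished normally.
    """
    iterator = iterator.__aiter__()
    while True:
        next_task = asyncio.ensure_future(iterator.__anext__())
        while not next_task.done():
            # Wait for the next result, waking up periodically to poll.
            await asyncio.wait({next_task}, timeout=poll_interval)
            if not next_task.done() and await is_cancelled():
                next_task.cancel()
                try:
                    await next_task
                except asyncio.CancelledError:
                    pass
                if hasattr(iterator, "aclose"):
                    await iterator.aclose()  # explicit generator close
                raise asyncio.CancelledError("client disconnected")
        try:
            item = next_task.result()
        except StopAsyncIteration:
            return  # normal completion
        yield item
```

A server handler consuming this generator can then treat `CancelledError` as "client went away" (skip the final message, abort the engine request) while letting normal exhaustion finish the response as usual.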