[V1] Remove constraints on partial requests #12674
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
@comaniac The PR is ready for review, except for tests. Please feel free to take a look.
Overall LGTM. Should be good to go once unit tests are in place.
    if num_tokens_scheduled == 0:
        # The request was not scheduled in this step.
        new_running.append(request)
        continue
Note: Later, when making the scheduler support PP, I plan to introduce another queue, such as self.scheduled (or some other name), for the list of requests actually being scheduled. The logic here can then be simplified a bit accordingly.
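For readers following the snippet above, here is a minimal, self-contained sketch of the kind of update loop it appears to sit in. Only `num_tokens_scheduled`, `new_running`, and `request` come from the diff; the `Request` class, the `update_running` function, and the `num_scheduled_tokens` mapping are hypothetical names used for illustration and are not taken from vLLM.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical, simplified request; not vLLM's Request class.
    request_id: str
    num_tokens: int               # total tokens that need to be processed
    num_computed_tokens: int = 0  # tokens processed so far


def update_running(running, num_scheduled_tokens):
    """Advance the requests scheduled in this step; keep the others as-is.

    `num_scheduled_tokens` maps request_id -> tokens scheduled this step.
    A request missing from the mapping was not scheduled.
    """
    new_running = []
    finished = []
    for request in running:
        num_tokens_scheduled = num_scheduled_tokens.get(request.request_id, 0)
        if num_tokens_scheduled == 0:
            # The request was not scheduled in this step.
            new_running.append(request)
            continue
        request.num_computed_tokens += num_tokens_scheduled
        if request.num_computed_tokens >= request.num_tokens:
            finished.append(request)
        else:
            # Still partial: stays in the running list for a later step.
            new_running.append(request)
    return new_running, finished
```

The point of the early `continue` is that a running request may receive zero tokens in a given step without being dropped or treated as an error; it simply carries over unchanged.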
@comaniac I've added some tests. PTAL.
LGTM!
…2674) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Felix Marty <felmarty@amd.com>
…2674) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
This PR enhances the scheduler's flexibility by removing constraints on partial request handling.
Key Changes
Performance Considerations
While the scheduler now supports more flexible request handling, frequent changes to the set of scheduled requests can impact performance. The model runner removes unscheduled requests from InputBatch and adds them back when they are scheduled again. Therefore, highly dynamic scheduling patterns may reduce the effectiveness of persistent batching and increase input preparation overhead.
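To make the cost concrete, the following toy model counts how many add and remove operations a persistent batch would perform under two scheduling patterns. It is only a sketch: `ToyInputBatch`, `sync`, and `churn` are hypothetical names, and the class does not reproduce the real InputBatch, which holds per-request state that makes each add or remove more expensive than a set operation.

```python
class ToyInputBatch:
    """Toy stand-in for a persistent batch; NOT vLLM's InputBatch."""

    def __init__(self):
        self.req_ids = set()
        self.num_adds = 0
        self.num_removes = 0

    def sync(self, scheduled_ids):
        """Make the batch contain exactly the requests scheduled this step."""
        scheduled = set(scheduled_ids)
        # Requests in the batch that were not scheduled get removed.
        self.num_removes += len(self.req_ids - scheduled)
        # Requests scheduled but not yet in the batch get (re-)added.
        self.num_adds += len(scheduled - self.req_ids)
        self.req_ids = scheduled


def churn(schedule):
    """Total add + remove operations over a list of per-step schedules."""
    batch = ToyInputBatch()
    for step in schedule:
        batch.sync(step)
    return batch.num_adds + batch.num_removes


# Same two requests every step: only the initial adds are paid.
stable = [["a", "b"]] * 4
# Request "b" is scheduled only every other step: it is removed and re-added.
alternating = [["a", "b"], ["a"], ["a", "b"], ["a"]]

print(churn(stable), churn(alternating))
```

In this toy model the stable schedule costs 2 operations and the alternating one costs 5, which illustrates why a scheduler that keeps the scheduled set steady lets persistent batching do less work per step.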