[WIP] V1 LoRA support #10579

varun-sundar-rabindranath · 2024-11-22T17:36:33Z

V1 LoRA support

TODOs:

Cleanup
Unit Tests for Request Batch
Account for LoRA in profile_run and cuda graphs
Profile + Optimize (V1 LoRA is slow)
LoRA Mixin adapters for add/remove/pin methods
Changes for Prefix Caching

Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

github-actions · 2024-11-22T17:36:47Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

mergify · 2024-11-22T17:37:09Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @varun-sundar-rabindranath.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

WoosukKwon

Thanks for the PR! Will review this.

varun-sundar-rabindranath · 2024-11-22T18:13:28Z

benchmarks/benchmark_lora_throughput.py

@@ -0,0 +1,506 @@
+"""Benchmark offline inference throughput."""


I'm using this for testing purposes. I don't intend to land it.

varun-sundar-rabindranath · 2024-11-22T18:15:35Z

Added a design image for the changes in the PR and some TODOs. Please consider this an initial design.

varun-sundar-rabindranath · 2024-12-06T17:42:28Z

Closing in favor of #10957.

V1 lora support

191afc8

Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

varun-sundar-rabindranath requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners November 22, 2024 17:36

varun-sundar-rabindranath marked this pull request as draft November 22, 2024 17:36

mergify bot added the needs-rebase label Nov 22, 2024

revert changes to v0 model runner

3cdc02d

WoosukKwon requested changes Nov 22, 2024

varun-sundar-rabindranath commented Nov 22, 2024

Varun Sundar Rabindranath added 7 commits November 22, 2024 14:26

remove request_id_to_index()

947b035

get the correct request batch

5b89e78

update from resumed request data

be834a3

remove comments

1e8510a

remove unused batchinputs

a51254d

maintain lora requests in LoRARequestBatch

1fdcc72

remove batchinputs

0a70083

varun-sundar-rabindranath closed this Dec 6, 2024
