

Implement Core functionality for Lora Adapters #679

Closed
wants to merge 72 commits

Conversation

@yuslepukhin (Member) commented Jul 5, 2024

Implement LoRA adapter management.
Add C and C++ APIs.

Introduce a flatbuffer format for LoRA parameter exchange.
Add helper scripts to convert .npz files to flatbuffers.
The same scripts modify genai_config.json to add an adapters section.
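The conversion flow can be sketched as follows. The `np.savez`/`np.load` calls are real NumPy APIs, but the array names, file names, and the shape of the `adapters` config section are placeholders for illustration, not the PR's actual script or schema; the real script serializes the arrays into the flatbuffer format introduced here.

```python
import json
import numpy as np

# Create a toy .npz of LoRA parameters (stand-ins for real adapter weights;
# the names and shapes are made up for this sketch).
np.savez("adapter.npz",
         attn_lora_A=np.zeros((8, 64), dtype=np.float32),
         attn_lora_B=np.zeros((64, 8), dtype=np.float32))

# The conversion script walks every named array; here we just list what
# would be written into the flatbuffer parameter table.
params = np.load("adapter.npz")
for name in params.files:
    print(name, params[name].shape)

# The same script then adds an "adapters" section to genai_config.json;
# the structure below is a guess for illustration, not the real schema.
config = {"adapters": {"my_adapter": "adapter.fb"}}
print(json.dumps(config))
```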

When the model is loaded, the adapters found in the configuration are loaded automatically.
This is because a model modified to consume LoRA parameters must always have the corresponding inputs fed into it; this is not a user's choice.

Add a GeneratorParameters public API, SetActiveAdapters, to specify the active adapters for a given generator instance.
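The activation scheme described above can be modeled with a small mock. This is not the PR's C/C++ API; every class and method name below is illustrative. The point it demonstrates: all configured adapters load with the model, the generator parameters select the active subset, and inactive adapters still produce (empty) inputs so the modified model always receives its expected LoRA tensors.

```python
class Model:
    """Toy stand-in: loads every adapter named in the configuration."""
    def __init__(self, configured_adapters):
        self.adapters = {name: f"<params:{name}>" for name in configured_adapters}

class GeneratorParams:
    """Toy stand-in for the generator parameters holding active adapters."""
    def __init__(self, model):
        self.model = model
        self.active = set()

    def set_active_adapters(self, names):
        # Only adapters known from the configuration can be activated.
        unknown = set(names) - set(self.model.adapters)
        if unknown:
            raise ValueError(f"unknown adapters: {sorted(unknown)}")
        self.active = set(names)

    def inputs(self):
        # Inactive adapters still feed inputs, just empty ones, so the
        # modified model always sees its LoRA input tensors.
        return {name: (params if name in self.active else "<empty>")
                for name, params in self.model.adapters.items()}

model = Model(["guanaco", "alpaca"])
gp = GeneratorParams(model)
gp.set_active_adapters(["guanaco"])
print(gp.inputs())
```

The "empty input" behavior for inactive adapters is also why the dim == 0 MatMul issue noted in the TODOs matters for CUDA and DML.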

Add unit tests.

TODO:

  • Other language bindings are pending verification of the initial API.
  • On device cache, memory mapping and other memory optimizations.
  • The feature does not work in Windows ARM64 builds, although it does work in Linux ARM64 builds. This may have to do with the lack of a well-tested ORT ARM64 release.
  • The feature currently does not work with the CUDA and DML providers. This has to do with the MatMul kernel implementation in ORT; see [CUDA, DML] MatMul does not properly handle matrices with inner dim == 0 (onnxruntime#21483). For this reason, no testing is performed for CUDA and DML. CIs are also affected because they do not run against the latest ORT release.

@yuslepukhin yuslepukhin marked this pull request as ready for review July 5, 2024 21:37
@pranavsharma (Contributor)

Is there anything blocking this PR from getting approved?

@baijumeswani (Collaborator)

> Is there anything blocking this PR from getting approved?

CI pipelines are failing as a result of a recent ORT change. We have a fix here: #724. It needs a quick internal sync and then can be merged. Otherwise, this PR is good to go.

@skyline75489 (Contributor)

@yuslepukhin The CUDA version was upgraded to v12 in #734. Could you please rebase and see if there's any CI breakage? Thanks.


8 participants