Implement Core functionality for Lora Adapters #679
Closed
…com/microsoft/onnxruntime-genai into yuslepukhin/implement_lora_adapters
- Make the returned inputs ordered.
- Create StaticBuffers inside adapters so they are shared.
…com/microsoft/onnxruntime-genai into yuslepukhin/implement_lora_adapters
yufenglee
reviewed
Jul 8, 2024
Is there anything blocking this PR from getting approved?
CI pipelines are failing as a result of a recent ORT change. We have a fix here: #724. It needs a quick internal sync and can then be merged. Otherwise, this PR is good to go.
baijumeswani
approved these changes
Jul 25, 2024
@yuslepukhin The CUDA version was upgraded to v12 in #734. Could you please rebase and see if there's CI breakage? Thanks
Implement Lora Adapters management.
Add C and C++ APIs
Introduce a flatbuffer format for exchanging LoRA parameters.
Add helper scripts to convert .npz files to flatbuffers.
The same scripts modify genai_config.json and add an adapters section.
When the model is loaded, the adapters found within the configuration are automatically loaded.
This is done because a model that has been modified to consume LoRA parameters must always have the corresponding inputs fed into it. This is not the user's choice.
Add GeneratorParameters public API SetActiveAdapters to specify active adapters for a given instance of the generator.
Add unit tests.
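To illustrate the flow described above (adapters loaded along with the model, then a subset activated per generator), here is a minimal Python sketch. It is not the API added by this PR: the layout of the adapters section, the file names, and the names `AdapterRegistry`, `GeneratorParams`, `set_active_adapters` and `adapter_inputs` are all hypothetical, and sorting by adapter name is only an assumed way of keeping the returned inputs ordered.

```python
import json

# Hypothetical layout of the "adapters" section; the real schema written by the
# helper scripts is not shown in this PR.
CONFIG_TEXT = """
{
  "adapters": {
    "math": {"weights": "math_adapter.fb"},
    "code": {"weights": "code_adapter.fb"}
  }
}
"""


class AdapterRegistry:
    """Stand-in for the model-side container of loaded adapters."""

    def __init__(self, config):
        # Every adapter listed in the config is registered up front,
        # mirroring "automatically loaded when the model is loaded".
        self._adapters = {
            name: entry["weights"]
            for name, entry in config.get("adapters", {}).items()
        }

    def names(self):
        return sorted(self._adapters)

    def resolve(self, names):
        # Reject unknown names early instead of failing at inference time.
        unknown = sorted(set(names) - set(self._adapters))
        if unknown:
            raise KeyError("unknown adapters: " + ", ".join(unknown))
        # De-duplicate and sort so the result has a deterministic order
        # (an assumption made for this sketch).
        return [(name, self._adapters[name]) for name in sorted(set(names))]


class GeneratorParams:
    """Stand-in for per-generator parameters holding the active adapter set."""

    def __init__(self, registry):
        self._registry = registry
        self._active = []

    def set_active_adapters(self, names):
        self._active = self._registry.resolve(names)

    def adapter_inputs(self):
        # Return a copy so callers cannot mutate the stored selection.
        return list(self._active)


registry = AdapterRegistry(json.loads(CONFIG_TEXT))
params = GeneratorParams(registry)
params.set_active_adapters(["math", "code"])
print(params.adapter_inputs())
```

Keeping the registry separate from the per-generator parameters reflects the split in the description: loading happens once per model, while the active set is chosen per generator instance.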
TODO: