

Implement Core functionality for Lora Adapters #679

Closed
wants to merge 72 commits

Conversation

@yuslepukhin (Member) commented Jul 5, 2024

Implement LoRA adapter management.
Add C and C++ APIs.

Introduce a flatbuffer format for LoRA parameter exchange.
Add helper scripts to convert .npz files to flatbuffers.
The same scripts modify genai_config.json to add an adapters section.
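The conversion flow can be sketched as follows. The `np.savez`/`np.load` calls are real NumPy APIs, but the array names, file names, and the shape of the `adapters` config section are placeholders for illustration, not the PR's actual script or schema; the real script serializes the arrays into the flatbuffer format introduced here.

```python
import json
import numpy as np

# Create a toy .npz of LoRA parameters (stand-ins for real adapter weights;
# the names and shapes are made up for this sketch).
np.savez("adapter.npz",
         attn_lora_A=np.zeros((8, 64), dtype=np.float32),
         attn_lora_B=np.zeros((64, 8), dtype=np.float32))

# The conversion script walks every named array; here we just list what
# would be written into the flatbuffer parameter table.
params = np.load("adapter.npz")
for name in params.files:
    print(name, params[name].shape)

# The same script then adds an "adapters" section to genai_config.json;
# the structure below is a guess for illustration, not the real schema.
config = {"adapters": {"my_adapter": "adapter.fb"}}
print(json.dumps(config))
```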

When the model is loaded, the adapters found in the configuration are loaded automatically.
This is because a model modified to consume LoRA parameters must always have the corresponding inputs fed into it; this is not a user's choice.

Add a GeneratorParameters public API, SetActiveAdapters, to specify the active adapters for a given generator instance.
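The activation scheme described above can be modeled with a small mock. This is not the PR's C/C++ API; every class and method name below is illustrative. The point it demonstrates: all configured adapters load with the model, the generator parameters select the active subset, and inactive adapters still produce (empty) inputs so the modified model always receives its expected LoRA tensors.

```python
class Model:
    """Toy stand-in: loads every adapter named in the configuration."""
    def __init__(self, configured_adapters):
        self.adapters = {name: f"<params:{name}>" for name in configured_adapters}

class GeneratorParams:
    """Toy stand-in for the generator parameters holding active adapters."""
    def __init__(self, model):
        self.model = model
        self.active = set()

    def set_active_adapters(self, names):
        # Only adapters known from the configuration can be activated.
        unknown = set(names) - set(self.model.adapters)
        if unknown:
            raise ValueError(f"unknown adapters: {sorted(unknown)}")
        self.active = set(names)

    def inputs(self):
        # Inactive adapters still feed inputs, just empty ones, so the
        # modified model always sees its LoRA input tensors.
        return {name: (params if name in self.active else "<empty>")
                for name, params in self.model.adapters.items()}

model = Model(["guanaco", "alpaca"])
gp = GeneratorParams(model)
gp.set_active_adapters(["guanaco"])
print(gp.inputs())
```

The "empty input" behavior for inactive adapters is also why the dim == 0 MatMul issue noted in the TODOs matters for CUDA and DML.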

Add unit tests.

TODO:

  • Other language bindings are pending verification of the initial API.
  • On device cache, memory mapping and other memory optimizations.
  • The feature does not work in Windows ARM64 builds, although it does work in Linux ARM64 builds. This may have to do with the lack of a well-tested ORT ARM64 release.
  • The feature currently does not work with the CUDA and DML providers. This has to do with the MatMul kernel implementation in ORT; see [CUDA, DML] MatMul does not properly handle matrices with inner dim == 0 (onnxruntime#21483). For this reason, no testing is performed for CUDA and DML. CIs are also affected because they do not run against the latest ORT release.

@yuslepukhin yuslepukhin marked this pull request as ready for review July 5, 2024 21:37
@pranavsharma (Contributor)

Is there anything blocking this PR from getting approved?

@baijumeswani (Collaborator)

> Is there anything blocking this PR from getting approved?

CI pipelines are failing as a result of a recent ORT change. We have a fix here: #724. It needs a quick internal sync and then can be merged. Otherwise, this PR is good to go.

@skyline75489 (Contributor)

@yuslepukhin The CUDA version was upgraded to v12 in #734. Could you please rebase and see if there's any CI breakage? Thanks.


8 participants