Skip to content

EricLBuehler/candle-lora

Repository files navigation

candle-lora

MIT License Continuous integration Documentation

LoRA (low rank adaptation) implemented in Rust for use with Candle. This technique interchanges the fully-trainable layers of the model with new, LoRA layers. These LoRA layers act as a wrapper over the original layers, but freeze the original layers. Because they contain fewer trainable parameters, LoRA allows for more efficient fine-tuning.

However, using a fine-tuned LoRA model for inference will have a negative impact on performance. This is because the original layer must still be used to calculate the outputs. However, for a LoRA model, an algorithm known as weight merging nullifies the added cost of using the fine-tuned LoRA model by merging the LoRA and original weights. Weights may also be unmerged.

Please see our recent paper X-LoRA. We introduce a MoE inspired method to densely gate LoRA adapters powered by a model self-reflection forward pass. For inference, we have created mistral.rs, which is written in Rust and enables inference of X-LoRA and other models including quantized.

Get started

  1. To install, run the following:
cargo add --git https://github.com/EricLBuehler/candle-lora.git candle-lora candle-lora-macro
  1. To allow candle-lora to swap layers, do the following for each model struct
    • Derive AutoLoraConvert from candle-lora-macro
    • Add the replace_layer_fields attribute macro.
  2. During instantiation of each model struct, call get_lora_model with the appropriate parameters to convert.

Features

  • Convert Linear, Conv1d, Conv2d, Embedding layers into LoRA layers
    • All conversions are implemented in accordance with HuggingFace's official LoRA implementation
  • Weight merging is implemented to improve inference performance
  • Weight unmerging
  • Easy-to-use APIs
  • Extensible trait-based layer swapping mechanism

Conversion Ergonomics

candle-lora-macro makes using candle-lora as simple as adding 2 macros to your model structs and calling a method!

It is inspired by the simplicity of the Python peft library's get_peft_model method. Together, these macros mean that candle-lora can be added to any candle model with minimal code changes!

LoRA transformers

See transformers from Candle which have LoRA integrated here. Currently, the following transformers have been converted:

  • llama
  • mistral
  • falcon
  • bert
  • stable_lm
  • t5
  • dinov2
  • resnet
  • mpt
  • blip
  • starcoder

To use a LoRA transformer, simply replace the model from candle-transformers with its counterpart in candle-lora-transformers!

Saving and loading

candle_lora supports retrieving weights for LoRA adapters via the get_tensors method, defined automatically in #[auto_layer_convert]. This function is meant to be used with candle_core::safetensors::save(). To load, simply load the VarBuilder and pass that to get_lora_model.

candle_lora's weight naming is not compatible with peft yet.

Resources

candle-lora's LoRA conversion implementations are based on HuggingFace's peft library. See the original paper here, as well as Microsoft's implementation.

About

Low rank adaptation (LoRA) for Candle.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages