In this notebook, we will fine-tune Llama-3.1-8B on the Alpaca Financial dataset.
LoRA is a technique designed to make fine-tuning more efficient by reducing the number of trainable parameters. This approach speeds up the fine-tuning process and produces much smaller fine-tuned checkpoints.
Rather than adjusting all of the model's weights during fine-tuning, LoRA freezes the base model and selectively trains only a few layers, typically within the attention mechanisms. Instead of directly modifying the weights of these layers, LoRA introduces two smaller low-rank matrices whose product is added to the original weights. These smaller matrices are the only parts updated during fine-tuning and are saved separately. This method preserves the model's original parameters, allowing the LoRA weights to be merged seamlessly into the base model later. Unloading the LoRA adapter and reverting to the original base model is also possible.
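To make the idea concrete, here is a minimal NumPy sketch of the low-rank update (the dimensions and scaling constant are illustrative toy values, far smaller than a real Llama-3.1-8B projection, and are not taken from any particular library):

```python
import numpy as np

# Toy dimensions: one frozen weight matrix (hypothetical sizes).
d_out, d_in, r = 64, 64, 4   # r is the LoRA rank
alpha = 8                    # LoRA scaling factor

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))   # frozen base weight, never updated

# The two small trainable matrices. B starts at zero so the adapter
# initially contributes nothing and training begins from the base model.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))

def lora_forward(x):
    # Base path plus scaled low-rank update: (W + (alpha/r) * B @ A) @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

# Merging the adapter into W reproduces the same output; "unloading"
# just means dropping the B @ A term to recover the original W.
W_merged = W + (alpha / r) * (B @ A)

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W_merged @ x)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs full fine-tuning: {full_params}")
```

Note how the adapter trains only `A.size + B.size = r * (d_in + d_out)` parameters instead of `d_in * d_out`, which is where the checkpoint-size savings come from.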
For more information about LoRA, please refer to the original paper, "LoRA: Low-Rank Adaptation of Large Language Models" (Hu et al., 2021).