This repository contains resources and guides from the blog series "Natural Language to SQL: Fine-Tuning CodeLLama with Amazon SageMaker". The first part introduces the concept of NL2SQL, the role of large language models (LLMs) such as CodeLlama in this domain, and practical strategies for fine-tuning these models using Quantized Low-Rank Adaptation (QLoRA) on Amazon SageMaker.
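As a rough illustration of what QLoRA configuration looks like in practice, the sketch below uses Hugging Face's `transformers` and `peft` libraries: a 4-bit quantization config for the frozen base model and a LoRA adapter config for the trainable low-rank layers. The model name, rank, and target modules are illustrative assumptions, not the exact settings from the series.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# The "Q" in QLoRA: load the frozen base model in 4-bit NF4 precision.
# This config would be passed to AutoModelForCausalLM.from_pretrained(...)
# via the quantization_config argument.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The "LoRA" part: small trainable adapter matrices attached to the
# attention projections. Rank and alpha here are illustrative defaults,
# applied with peft.get_peft_model(model, lora_config).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Only the adapter weights are updated during training, which is what makes fine-tuning a 7B+ model feasible on a single GPU.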
Part 1: A comprehensive guide covering:
- Setting up the development environment
- Loading and preparing the dataset
- Fine-tuning CodeLlama with QLoRA
- Deployment strategies on Amazon SageMaker
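The dataset-preparation step above amounts to turning each NL2SQL record into a single training prompt. A minimal sketch follows; the prompt template and the field names (`context`, `question`, `answer`) are illustrative assumptions, not the series' exact format:

```python
# Sketch of preparing a Spider-style record for instruction fine-tuning.
# The section markers and field names are illustrative assumptions.

def format_example(record: dict) -> str:
    """Turn one NL2SQL record into a single training prompt string."""
    return (
        "### Task\n"
        "Generate a SQL query that answers the question below.\n\n"
        f"### Schema\n{record['context']}\n\n"
        f"### Question\n{record['question']}\n\n"
        f"### Answer\n{record['answer']}"
    )

sample = {
    "context": "CREATE TABLE singer (name TEXT, age INTEGER)",
    "question": "How many singers are older than 30?",
    "answer": "SELECT COUNT(*) FROM singer WHERE age > 30;",
}
print(format_example(sample))
```

In a real pipeline this function would be mapped over the full dataset (e.g. with `datasets.Dataset.map`) before tokenization.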
Part 2: Fine-tuning Llama 3 with Unsloth on Google Colab
Parts 3 & 4: (Upcoming) Advanced topics in fine-tuning, deployment, and synthetic data generation.
Contributions, ideas, and discussions are welcome. For major changes, please open an issue first to discuss what you would like to change.
Code and notebooks are licensed under the MIT License.
All derived work based on the Spider dataset falls under the CC BY-SA 4.0 license.
Stay tuned for the upcoming parts of the series, where we will explore more on the optimization and deployment of NL2SQL models.