Updates to the Project for Support of sqlcoder-7b and sqlcoder2-15b #222

Merged
merged 6 commits on Jan 27, 2024
23 changes: 20 additions & 3 deletions README.md
@@ -290,14 +290,15 @@

## Contents
- [DB-GPT-Hub: Text-to-SQL parsing with LLMs](#db-gpt-hub-text-to-sql-parsing-with-llms)
- [Baseline](#baseline)
- [Contents](#contents)
- [1. What is DB-GPT-Hub](#1-what-is-db-gpt-hub)
- [2. Fine-tuning Text-to-SQL](#2-fine-tuning-text-to-sql)
- [2.1. Dataset](#21-dataset)
- [2.2. Model](#22-model)
- [3. Usage](#3-usage)
- [3.1. Environment preparation](#31-environment-preparation)
- [3.2. Quick Start](#32-quick-start)
- [3.2 Quick Start](#32-quick-start)
- [3.3. Data preparation](#33-data-preparation)
- [3.4. Model fine-tuning](#34-model-fine-tuning)
- [3.5. Model Predict](#35-model-predict)
@@ -354,6 +355,9 @@ DB-GPT-Hub currently supports the following base models:
- [x] ChatGLM2
- [x] ChatGLM3
- [x] internlm
- [x] sqlcoder-7b(mistral)
- [x] sqlcoder2-15b(starcoder)




@@ -522,6 +526,14 @@ deepspeed --num_gpus 2 dbgpt_hub/train/sft_train.py \
--deepspeed dbgpt_hub/configs/ds_config.json \
--quantization_bit 4 \
...
```

If you need to specify particular GPU ids:
```
deepspeed --include localhost:0,1 dbgpt_hub/train/sft_train.py \
--deepspeed dbgpt_hub/configs/ds_config.json \
--quantization_bit 4 \
...
```

The other parts that are omitted (…) can be kept consistent. If you want to change the default deepspeed configuration, go into the `dbgpt_hub/configs` directory and edit ds_config.json as needed; the default is stage 2.
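The repository's own ds_config.json is the source of truth; as a rough sketch, a ZeRO stage-2 configuration of the kind referenced here typically looks like the following (values are illustrative, not the project's actual settings):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```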
@@ -533,17 +545,20 @@ In the script, during fine-tuning, different models correspond to key parameters
| [LLaMA-2](https://huggingface.co/meta-llama) | q_proj,v_proj | llama2 |
| [CodeLlama-2](https://huggingface.co/codellama/) | q_proj,v_proj | llama2 |
| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | W_pack | baichuan2 |
| [InternLM](https://github.com/InternLM/InternLM) | q_proj,v_proj | intern |
| [Qwen](https://github.com/QwenLM/Qwen-7B) | c_attn | chatml |
| [sqlcoder-7b](https://huggingface.co/defog/sqlcoder-7b) | q_proj,v_proj | mistral |
| [sqlcoder2-15b](https://huggingface.co/defog/sqlcoder2) | c_attn | default |
| [InternLM](https://github.com/InternLM/InternLM) | q_proj,v_proj | intern |
| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | q_proj,v_proj | xverse |
| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | query_key_value | chatglm2 |
| [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B) | query_key_value | chatglm3 |
| [LLaMA](https://github.com/facebookresearch/llama) | q_proj,v_proj | - |
| [BLOOM](https://huggingface.co/bigscience/bloom) | query_key_value | - |
| [BLOOMZ](https://huggingface.co/bigscience/bloomz) | query_key_value | - |
| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | W_pack | baichuan |
| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | query_key_value | - |
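The `lora_target` column is a comma-separated list of module names, e.g. `q_proj,v_proj`. Such a value is typically split into a Python list before being handed to the LoRA configuration; a minimal sketch of that parsing (the helper name is hypothetical, not a function from this repository):

```python
def parse_lora_target(lora_target: str) -> list:
    # Hypothetical helper: split a comma-separated lora_target value
    # (e.g. "q_proj,v_proj") into a list of module names.
    return [name.strip() for name in lora_target.split(",") if name.strip()]

print(parse_lora_target("q_proj,v_proj"))  # ['q_proj', 'v_proj']
```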



In `train_sft.sh` , other key parameters are as follows:

> quantization_bit: Indicates whether quantization is applied, with valid values being [4 or 8].
@@ -609,6 +624,8 @@ We will divide the whole process into three phases:
- [x] ChatGLM2
- [x] ChatGLM3
- [x] internlm
- [x] sqlcoder-7b(mistral)
- [x] sqlcoder2-15b(starcoder)

* Stage 2:
- [x] Optimize model performance, and support fine-tuning more models in various ways before `20231010`
33 changes: 25 additions & 8 deletions README.zh.md
@@ -289,20 +289,21 @@

## Contents
- [DB-GPT-Hub: Text-to-SQL with LLMs](#db-gpt-hub利用llms实现text-to-sql)
- [Baseline](#baseline)
- [Contents](#contents)
- [1. Introduction](#一简介)
- [2. Text-to-SQL Fine-tuning](#二text-to-sql微调)
- [2.1. Dataset](#21数据集)
- [2.2. Base Model](#22基座模型)
- [3. Usage](#三使用方法)
- [3.1. Environment Preparation](#31环境准备)
- [3.2. Quick Start](#32快速开始)
- [3.3. Data Preparation](#33数据准备)
- [3.4. Model Fine-tuning](#34模型微调)
- [3.5. Model Predict](#35模型预测)
- [3.6. Model Weights](#36模型权重)
- [3.6.1 Merging Base Model and Fine-tuned Weights](#361-模型和微调权重合并)
- [3.7. Model Evaluation](#37模型评估)
- [3.2. Data Preparation](#32数据准备)
- [3.2 Quick Start](#32-快速开始)
- [3.3. Model Fine-tuning](#33模型微调)
- [3.4. Model Predict](#34模型预测)
- [3.5. Model Weights](#35模型权重)
- [3.5.1 Merging Base Model and Fine-tuned Weights](#351-模型和微调权重合并)
- [3.6. Model Evaluation](#36模型评估)
- [4. Roadmap](#四发展路线)
- [5. Contributions](#五贡献)
- [6. Acknowledgements](#六感谢)
@@ -350,6 +351,9 @@ The base models currently supported by DB-GPT-Hub are:
- [x] ChatGLM3
- [x] internlm
- [x] Falcon
- [x] sqlcoder-7b(mistral)
- [x] sqlcoder2-15b(starcoder)



For the minimum hardware resources required for quantized fine-tuning (QLoRA) with quantization_bit set to 4, refer to the following:
@@ -513,6 +517,14 @@ deepspeed --num_gpus 2 dbgpt_hub/train/sft_train.py \
--quantization_bit 4 \
...
```
If you need to specify particular GPU ids (e.g. 3 and 4) instead of the default first two, do the following:
```
deepspeed --include localhost:3,4 dbgpt_hub/train/sft_train.py \
--deepspeed dbgpt_hub/configs/ds_config.json \
--quantization_bit 4 \
...
```

The other parts that are omitted (…) can be kept consistent. If you want to change the default deepspeed configuration, go into the `dbgpt_hub/configs` directory and edit ds_config.json as needed; the default is the stage-2 strategy.

The key parameters lora_target and template for the different models during fine-tuning are shown in the table below:
@@ -522,8 +534,10 @@ deepspeed --num_gpus 2 dbgpt_hub/train/sft_train.py \
| [LLaMA-2](https://huggingface.co/meta-llama) | q_proj,v_proj | llama2 |
| [CodeLlama-2](https://huggingface.co/codellama/) | q_proj,v_proj | llama2 |
| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | W_pack | baichuan2 |
| [InternLM](https://github.com/InternLM/InternLM) | q_proj,v_proj | intern |
| [Qwen](https://github.com/QwenLM/Qwen-7B) | c_attn | chatml |
| [sqlcoder-7b](https://huggingface.co/defog/sqlcoder-7b) | q_proj,v_proj | mistral |
| [sqlcoder2-15b](https://huggingface.co/defog/sqlcoder2) | c_attn | default |
| [InternLM](https://github.com/InternLM/InternLM) | q_proj,v_proj | intern |
| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | q_proj,v_proj | xverse |
| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | query_key_value | chatglm2 |
| [LLaMA](https://github.com/facebookresearch/llama) | q_proj,v_proj | - |
Expand All @@ -532,6 +546,7 @@ deepspeed --num_gpus 2 dbgpt_hub/train/sft_train.py \
| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | W_pack | baichuan |
| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | query_key_value | - |


Other key parameters in `train_sft.sh`:
> quantization_bit: whether quantization is applied, with valid values [4 or 8]
> model_name_or_path: path to the LLM model
@@ -593,6 +608,8 @@ poetry run python dbgpt_hub/eval/evaluation.py --plug_value --input Your_model_
- [x] ChatGLM2
- [x] ChatGLM3
- [x] internlm
- [x] sqlcoder-7b(mistral)
- [x] sqlcoder2-15b(starcoder)

* Stage 2:
- [x] Optimize model performance and support fine-tuning more models in different ways. As of `20231010`, we have finished refactoring the project code and support more models.
Binary file modified assets/wechat.JPG
11 changes: 11 additions & 0 deletions dbgpt_hub/data_process/data_utils.py
@@ -223,6 +223,17 @@ def register_template(
use_history=False,
)

r"""
Template for the mistral-based sqlcoder-7b model.
"""
register_template(
name="mistral",
prefix=["{{system}}"],
prompt=["[INST] {{query}} [/INST]"],
system="",
sep=[],
)
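Given the fields registered above, rendering a single-turn prompt with this template amounts to substituting the query into `[INST] … [/INST]` with an empty system prefix. A minimal sketch (the real rendering is done by the project's template machinery, not this standalone function):

```python
def render_mistral_prompt(query: str, system: str = "") -> str:
    # Mirrors the registered "mistral" template:
    # prefix = "{{system}}", prompt = "[INST] {{query}} [/INST]", system = ""
    return f"{system}[INST] {query} [/INST]"

print(render_mistral_prompt("How many users signed up last week?"))
# [INST] How many users signed up last week? [/INST]
```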


r"""
Default template.
6 changes: 3 additions & 3 deletions dbgpt_hub/scripts/train_sft.sh
@@ -5,11 +5,11 @@ train_log="dbgpt_hub/output/logs/train_sft_test_${current_date}.log"
start_time=$(date +%s)
echo " Train Start time: $(date -d @$start_time +'%Y-%m-%d %H:%M:%S')" >>${train_log}

# # zero-shot
# num_shot=0
# default train, zero-shot
num_shot=0

# one-shot train
num_shot=1
# num_shot=1

dataset="example_text2sql_train"
if [ "$num_shot" -eq 1 ]; then
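The tail of this script is truncated in the diff, but the dataset-selection logic it sets up can be sketched as follows (the one-shot dataset name is an assumption, not taken from the repository):

```shell
# Sketch of train_sft.sh's num_shot handling; defaults to zero-shot.
num_shot=0
dataset="example_text2sql_train"
if [ "$num_shot" -eq 1 ]; then
    dataset="example_text2sql_train_one_shot"  # hypothetical one-shot dataset name
fi
echo "selected dataset: $dataset"
# prints: selected dataset: example_text2sql_train
```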