Merge commit '6542c5455424f25e1d443682d78b1f1d51201001' into fix/app-ui-1226

* commit '6542c5455424f25e1d443682d78b1f1d51201001':
  fix alpaca (modelscope#2771)
  support modern_bert & support bert deploy (modelscope#2767)
  fix app-ui (modelscope#2765)
  fix shell (modelscope#2764)
  fix bugs (modelscope#2761)
  fix web-ui (modelscope#2758)
  support SequenceClassification & update QVQ-72B-Preview (modelscope#2747)
  fix docs multimodal; fix pretrain mllm (modelscope#2742)
  Fix windows encoding gbk (modelscope#2741)
  support AI-ModelScope/Skywork-o1-Open-Llama-3.1-8B (modelscope#2739)
  support mm llamapro (modelscope#2738)
  fix windows (modelscope#2733)
  support paligemma2 (modelscope#2735)

# Conflicts:
#	swift/ui/llm_infer/llm_infer.py
tastelikefeet committed Dec 26, 2024
2 parents d89afe1 + 6542c54 commit 8444cac
Showing 156 changed files with 1,099 additions and 584 deletions.
26 changes: 16 additions & 10 deletions README.md
@@ -55,13 +55,13 @@ You can contact us and communicate with us by adding our group:


## 📝 Introduction
🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of over 400 large models and 100+ multi-modal large models. These large language models (LLMs) include models such as Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, DeepSeek, Baichuan2, Gemma2, and TeleChat2. The multi-modal LLMs include models such as Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.
🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 400+ large models and 150+ multi-modal large models. These large language models (LLMs) include models such as Qwen2.5, Llama3.3, GLM4, Internlm2.5, Yi1.5, Mistral, DeepSeek2.5, Baichuan2, Gemma2, and TeleChat2. The multi-modal LLMs include models such as Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.

🍔 In addition, ms-swift gathers the latest training technologies, including LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger. ms-swift supports accelerating the inference, evaluation, and deployment modules using vLLM and LMDeploy. To help researchers and developers fine-tune and apply large models more easily, ms-swift also provides a Gradio-based Web-UI interface and a wealth of best practices.
🍔 In addition, ms-swift gathers the latest training technologies, including LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger. ms-swift supports acceleration of inference, evaluation, and deployment modules using vLLM and LMDeploy, and supports the quantization of large models and multi-modal large models using technologies such as GPTQ, AWQ, and BNB. To help researchers and developers fine-tune and apply large models more easily, ms-swift also provides a Gradio-based Web-UI interface and a wealth of best practices.

**Why choose ms-swift?**

- 🍎 **Model Types**: Supports 400+ large language models and **100+ multi-modal large models** and all-to-all models, **providing a comprehensive solution from training to deployment**.
- 🍎 **Model Types**: Supports 400+ large language models and **150+ multi-modal large models** and all-to-all models, **providing a comprehensive solution from training to deployment**.
- **Dataset Types**: Comes with 150+ pre-training, fine-tuning, human alignment, multi-modal datasets, and supports custom datasets.
- **Hardware Support**: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, etc.
- 🍊 **Lightweight Training**: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, Liger-Kernel.
@@ -114,9 +114,9 @@ CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--train_type lora \
--dataset AI-ModelScope/alpaca-gpt4-data-zh#500 \
AI-ModelScope/alpaca-gpt4-data-en#500 \
swift/self-cognition#500 \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
'AI-ModelScope/alpaca-gpt4-data-en#500' \
'swift/self-cognition#500' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
@@ -146,7 +146,9 @@ After training is complete, use the following command to perform inference with
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--adapters output/vx-xxx/checkpoint-xxx \
--stream true
--stream true \
--temperature 0 \
--max_new_tokens 2048

# merge-lora and use vLLM for inference acceleration
CUDA_VISIBLE_DEVICES=0 \
@@ -155,7 +157,9 @@ swift infer \
--stream true \
--merge_lora true \
--infer_backend vllm \
--max_model_len 8192
--max_model_len 8192 \
--temperature 0 \
--max_new_tokens 2048
```

### Web-UI
@@ -262,15 +266,17 @@ CUDA_VISIBLE_DEVICES=0 swift rlhf \
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--stream true \
--infer_backend pt
--infer_backend pt \
--max_new_tokens 2048

# LoRA
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--adapters swift/test_lora \
--stream true \
--infer_backend pt \
--temperature 0
--temperature 0 \
--max_new_tokens 2048
```

### Deployment
26 changes: 16 additions & 10 deletions README_CN.md
@@ -53,12 +53,12 @@
<img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200">

## 📝 Introduction
🍲 ms-swift is the fine-tuning and deployment framework for large models and multi-modal large models provided by the ModelScope community. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 400+ large models and 100+ multi-modal large models. The LLMs include models such as Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, DeepSeek, Baichuan2, Gemma2, and TeleChat2; the multi-modal LLMs include models such as Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.
🍲 ms-swift is the fine-tuning and deployment framework for large models and multi-modal large models provided by the ModelScope community. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 450+ large models and 150+ multi-modal large models. The large models include Qwen2.5, Llama3.3, GLM4, Internlm2.5, Yi1.5, Mistral, DeepSeek2.5, Baichuan2, Gemma2, TeleChat2, and others; the multi-modal large models include Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, and others.

🍔 In addition, ms-swift gathers the latest training technologies, including LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger. ms-swift supports accelerating the inference, evaluation, and deployment modules with vLLM and LMDeploy. To help researchers and developers fine-tune and apply large models more easily, ms-swift also provides a Gradio-based Web-UI and a wealth of best practices.
🍔 In addition, ms-swift gathers the latest training technologies, including LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger. ms-swift supports accelerating the inference, evaluation, and deployment modules with vLLM and LMDeploy, and supports quantizing large models and multi-modal large models with technologies such as GPTQ, AWQ, and BNB. To help researchers and developers fine-tune and apply large models more easily, ms-swift also provides a Gradio-based Web-UI and a wealth of best practices.

**Why choose ms-swift?**
- 🍎 **Model types**: Supports 400+ plain-text large models, **100+ multi-modal large models**, and All-to-All omni-modal models, with a **full pipeline from training to deployment**.
- 🍎 **Model types**: Supports 400+ plain-text large models, **150+ multi-modal large models**, and All-to-All omni-modal models, with a **full pipeline from training to deployment**.
- **Dataset types**: Comes with 150+ datasets for pre-training, fine-tuning, human alignment, multi-modal training, and more, and supports custom datasets.
- **Hardware support**: CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, etc.
- 🍊 **Lightweight training**: Supports lightweight fine-tuning methods such as LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, and Liger-Kernel.
@@ -107,9 +107,9 @@ CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--train_type lora \
--dataset AI-ModelScope/alpaca-gpt4-data-zh#500 \
AI-ModelScope/alpaca-gpt4-data-en#500 \
swift/self-cognition#500 \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
'AI-ModelScope/alpaca-gpt4-data-en#500' \
'swift/self-cognition#500' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
@@ -139,7 +139,9 @@ swift sft \
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--adapters output/vx-xxx/checkpoint-xxx \
--stream true
--stream true \
--temperature 0 \
--max_new_tokens 2048

# merge-lora and use vLLM for inference acceleration
CUDA_VISIBLE_DEVICES=0 \
@@ -148,7 +150,9 @@ swift infer \
--stream true \
--merge_lora true \
--infer_backend vllm \
--max_model_len 8192
--max_model_len 8192 \
--temperature 0 \
--max_new_tokens 2048
```

### Web-UI
@@ -254,15 +258,17 @@ CUDA_VISIBLE_DEVICES=0 swift rlhf \
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--stream true \
--infer_backend pt
--infer_backend pt \
--max_new_tokens 2048

# LoRA
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--adapters swift/test_lora \
--stream true \
--infer_backend pt \
--temperature 0
--temperature 0 \
--max_new_tokens 2048
```

### Deployment
5 changes: 0 additions & 5 deletions docs/source/Customization/插件化.md
@@ -8,11 +8,6 @@ The example is [here](https://github.com/modelscope/swift/blob/main/swift/plugin/ca

Callbacks are registered into the trainer before the trainer is constructed; the example provides a simple version of an EarlyStop scheme.

## Customizing the trainer

The example is [here](https://github.com/modelscope/swift/blob/main/swift/plugin/custom_trainer.py).

Here users can inherit from an existing trainer and implement their own training logic, such as a custom data_loader or a custom compute_loss. The example provides a trainer for a text-classification task.

## Customizing the loss

21 changes: 17 additions & 4 deletions docs/source/Customization/自定义数据集.md
@@ -34,7 +34,7 @@ query-response format:

## Recommended dataset format

The recommended dataset format for ms-swift is given below
The recommended dataset format for ms-swift is given below; the system field is optional and defaults to the `default_system` defined in the template

### Pre-training

@@ -69,11 +69,24 @@

### Multi-modal

For multi-modal datasets, the format is the same as for the tasks above. The difference is the addition of the `images`, `videos`, and `audios` keys, which hold the multi-modal resources:
For multi-modal datasets, the format is the same as for the tasks above. The difference is the addition of the `images`, `videos`, and `audios` keys, which hold the multi-modal resources; the `<image>`, `<video>`, and `<audio>` tags mark where images/videos/audio are inserted. The four examples below show the data format for plain text, as well as for data containing images, video, and audio.

Pre-training:
```
{"messages": [{"role": "assistant", "content": "预训练的文本在这里"}]}
{"messages": [{"role": "assistant", "content": "<image>是一只小狗,<image>是一只小猫"}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
{"messages": [{"role": "assistant", "content": "<audio>描述了今天天气真不错"}], "audios": ["/xxx/x.wav"]}
{"messages": [{"role": "assistant", "content": "<image>是一个大象,<video>是一只狮子在跑步"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
```

Fine-tuning:
```jsonl
{"messages": [{"role": "system", "content": "你是个有用无害的助手"}, {"role": "user", "content": "<image>图片中是什么,<video>视频中是什么"}, {"role": "assistant", "content": "一个大象,一个狮子"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
{"messages": [{"role": "user", "content": "浙江的省会在哪?"}, {"role": "assistant", "content": "浙江的省会在杭州。"}]}
{"messages": [{"role": "user", "content": "<image><image>两张图片有什么区别"}, {"role": "assistant", "content": "前一张是小猫,后一张是小狗"}], "images": ["/xxx/x.jpg", "xxx/x.png"]}
{"messages": [{"role": "user", "content": "<audio>语音说了什么"}, {"role": "assistant", "content": "今天天气真好呀"}], "audios": ["/xxx/x.mp3"]}
{"messages": [{"role": "system", "content": "你是个有用无害的助手"}, {"role": "user", "content": "<image>图片中是什么,<video>视频中是什么"}, {"role": "assistant", "content": "图片中是一个大象,视频中是一只小狗在草地上奔跑"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
```
The `<image>`, `<video>`, and `<audio>` tags mark where images/videos/audio are inserted
The RLHF data format can follow the format used for plain-text large models

#### grounding

18 changes: 11 additions & 7 deletions docs/source/GetStarted/快速开始.md
@@ -1,8 +1,8 @@
# Quick Start

ms-swift is the training and deployment framework for large models and multi-modal large models provided by the ModelScope community. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 400+ large models and 100+ multi-modal large models. Model developers can handle all of their large-model needs in one stop within the ms-swift framework. The main capabilities of ms-swift currently include:
ms-swift is the training and deployment framework for large models and multi-modal large models provided by the ModelScope community. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 400+ large models and 150+ multi-modal large models. Model developers can handle all of their large-model needs in one stop within the ms-swift framework. The main capabilities of ms-swift currently include:

- 🍎 Model types: supports 400+ plain-text large models, 100+ multi-modal large models, and All-to-All omni-modal models, with a full pipeline from training to deployment.
- 🍎 Model types: supports 400+ plain-text large models, 150+ multi-modal large models, and All-to-All omni-modal models, with a full pipeline from training to deployment.
- Dataset types: comes with 150+ datasets for pre-training, fine-tuning, human alignment, multi-modal training, and more, and supports custom datasets.
- Hardware support: CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, etc.
- 🍊 Lightweight training: supports lightweight fine-tuning methods such as LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, and Liger-Kernel.
@@ -31,9 +31,9 @@ CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--train_type lora \
--dataset AI-ModelScope/alpaca-gpt4-data-zh#500 \
AI-ModelScope/alpaca-gpt4-data-en#500 \
swift/self-cognition#500 \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
'AI-ModelScope/alpaca-gpt4-data-en#500' \
'swift/self-cognition#500' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
@@ -63,7 +63,9 @@ swift sft \
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--adapters output/vx-xxx/checkpoint-xxx \
--stream true
--stream true \
--temperature 0 \
--max_new_tokens 2048

# merge-lora and use vLLM for inference acceleration
CUDA_VISIBLE_DEVICES=0 \
@@ -72,7 +74,9 @@ swift infer \
--stream true \
--merge_lora true \
--infer_backend vllm \
--max_model_len 8192
--max_model_len 8192 \
--temperature 0 \
--max_new_tokens 2048
```

> [!TIP]
1 change: 0 additions & 1 deletion docs/source/Instruction/ReleaseNote3.0.md
@@ -18,7 +18,6 @@
- Uses the messages format as the input interface
4. Added a plugin mechanism for customizing the training process; the currently supported plugins are:
- callback: customize training callbacks
- custom_trainer: customize the trainer
- loss: customize the loss function
- loss_scale: customize the weight of each token
- metric: customize the cross-validation metrics
6 changes: 3 additions & 3 deletions docs/source/Instruction/命令行参数.md
@@ -16,12 +16,13 @@
- custom_register_path: path to the `.py` file where custom models, chat templates, and datasets are registered

### Model parameters

- 🔥model: model ID or local model path. For custom models, use together with `model_type` and `template`; see [Custom Model](../Customization/自定义模型.md) for details
- model_type: model type. The same model architecture, template, and model loading procedure are defined as one model_type
- model_revision: model revision
- task_type: defaults to 'causal_lm'. Options are 'causal_lm' and 'seq_cls'. Examples can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls) and in the sketch after this list.
- 🔥torch_dtype: data type of the model weights; supports `float16`, `bfloat16`, and `float32`, read from the config file by default
- attn_impl: attention implementation; supports `flash_attn`, `sdpa`, and `eager`, with sdpa used by default
- num_labels: must be specified for classification models; the number of labels, defaults to None
- rope_scaling: RoPE scaling type; supports `linear` and `dynamic`, use together with `max_length`
- device_map: device_map configuration used by the model, e.g. 'auto', 'cpu', a JSON string, or the path to a JSON file
- local_repo_path: some models depend on a GitHub repo when loading. To avoid network problems during `git clone`, a local repo can be used directly; this parameter takes the path to the local repo and defaults to `None`
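As a rough illustration of how `task_type` and `num_labels` combine with the usual training flags, here is a minimal sketch; the dataset path and label count are placeholders rather than values from the original docs, and the exact flag set may differ between versions:

```shell
# Hypothetical sequence-classification fine-tune sketch based on the parameters above.
# '<your-classification-dataset>' and --num_labels 2 are placeholders.
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --task_type seq_cls \
    --num_labels 2 \
    --train_type lora \
    --torch_dtype bfloat16 \
    --dataset '<your-classification-dataset>'
```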
@@ -290,7 +291,6 @@ Vera uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`.
- resume_only_model: if resume_from_checkpoint is set, resume only the model weights; defaults to False
- check_model: check whether local model files are corrupted or modified and give a prompt; defaults to True. Set this to False in offline environments
- loss_type: loss type; by default the model's built-in loss function is used
- num_labels: must be specified for classification models; the number of labels, defaults to None

- packing: whether to use packing; defaults to False
- 🔥lazy_tokenize: whether to use lazy_tokenize; defaults to False for LLM training and True for MLLM training
@@ -365,7 +365,7 @@ RLHF parameters inherit from the [training parameters](#训练参数)
- 🔥output_dir: storage path for the export results; defaults to None

- 🔥quant_method: options are 'gptq' and 'awq'; defaults to None (see the sketch after this list)
- quant_n_samples: number of samples drawn from the calibration set for gptq/awq; defaults to 256
- quant_n_samples: number of samples drawn from the calibration set for gptq/awq; defaults to 128
- max_length: max_length of the calibration set; defaults to 2048
- quant_batch_size: quantization batch size; defaults to 1
- group_size: quantization group size; defaults to 128
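To show how these export parameters fit together, here is a minimal quantization sketch; the model, dataset, and output path are placeholders, and the exact `swift export` flags should be checked against your installed version:

```shell
# Hypothetical GPTQ export sketch built from the parameters listed above;
# model, dataset, and --output_dir values are placeholders.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
    --quant_method gptq \
    --quant_bits 4 \
    --quant_n_samples 128 \
    --max_length 2048 \
    --quant_batch_size 1 \
    --output_dir output/qwen2_5-7b-instruct-gptq-int4
```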
6 changes: 3 additions & 3 deletions docs/source/Instruction/导出.md
@@ -77,7 +77,7 @@ CUDA_VISIBLE_DEVICES=0 swift infer \
CUDA_VISIBLE_DEVICES=0 swift sft \
--model Qwen/Qwen2-7B-Instruct \
--train_type lora \
--dataset AI-ModelScope/alpaca-gpt4-data-zh#5000 \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#5000' \
--quant_method bnb \
--quant_bits 4 \
--torch_dtype bfloat16
@@ -86,15 +86,15 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
CUDA_VISIBLE_DEVICES=0 swift sft \
--model Qwen/Qwen2-7B-Instruct \
--train_type lora \
--dataset AI-ModelScope/alpaca-gpt4-data-zh#5000 \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#5000' \
--quant_method hqq \
--quant_bits 4

# eetq
CUDA_VISIBLE_DEVICES=0 swift sft \
--model Qwen/Qwen2-7B-Instruct \
--train_type lora \
--dataset AI-ModelScope/alpaca-gpt4-data-zh#5000 \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#5000' \
--quant_method eetq \
--torch_dtype float16
```