add finetune tutorial #124

Merged
merged 1 commit into from
Jun 6, 2023

@yuedongli1 yuedongli1 commented May 30, 2023

Thank you for your contribution to the MindYOLO repo.
Before submitting this PR, please make sure:

Motivation

(Write your motivation for proposed changes here.)

Test Plan

(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)

Related Issues and PRs

(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)

@yuedongli1 yuedongli1 added the documentation (Improvements or additions to documentation), inside-test (issue raised by an internal developer), and rfc (feature-request issue) labels May 30, 2023
@yuedongli1 yuedongli1 added this to the mindyolo-0.1 milestone May 30, 2023
@yuedongli1 yuedongli1 self-assigned this May 30, 2023
@yuedongli1 yuedongli1 linked an issue May 30, 2023 that may be closed by this pull request
@@ -0,0 +1,122 @@
# Getting Started with Finetuning on a Custom Dataset

file_name -> custom_dataset.md

Comment on lines 227 to 244
def modify_dataset_columns(image, labels, img_files):
    return image, labels

loader = self.dataloader.map(
    modify_dataset_columns,
    input_columns=["image", "labels", "img_files"],
    output_columns=["image", "labels"],
    column_order=["image", "labels"],
)

This has already been fixed in the 2.0 adaptation PR.

@@ -0,0 +1,242 @@
import os

@zhanghuiyao zhanghuiyao Jun 2, 2023

  1. Move this to the outer level: mindyolo/examples/finetune_SHWD/convert_shwd2yolo.py
  2. Write a mindyolo/examples/finetune_SHWD/finetune_shwd.py and a README.md

Comment on lines 80 to 82
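### Example

The following uses the Safety Helmet Wearing Dataset (SHWD) as an example to walk through the main steps of finetuning on a custom dataset with MindYOLO.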

Link to an examples/finetune_SHWD/README.md

@yuedongli1 yuedongli1 closed this Jun 6, 2023
@yuedongli1 yuedongli1 reopened this Jun 6, 2023
@yuedongli1 yuedongli1 changed the title add finetune tutorial;revise trainer_factory(project) add finetune tutorial Jun 6, 2023
@yuedongli1 yuedongli1 requested a review from zhanghuiyao June 6, 2023 01:35

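This document introduces the dataset format used by the MindYOLO toolkit.

MindYOLO uses the YOLO data format for model training and the COCO data format, via the COCO API, for model validation. Therefore, to read a custom dataset with the API provided by MindYOLO, the training set must be converted to YOLO format and the validation set to COCO format.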

  1. Just state that it is the YOLO format, then give the concrete layout as a whole below;
  2. This file is a tutorial for custom datasets, so dataset_format.md -> custom_dataset.md

Comment on lines 39 to 70
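#### Model Training

Since the SHWD dataset contains only 7000+ images, yolov7-tiny is chosen for training on it. You can download the [model file](https://github.com/mindspore-lab/mindyolo/blob/master/MODEL_ZOO.md) that MindYOLO provides, trained on the COCO dataset, as the pretrained model. Because the COCO dataset has 80 object categories while SHWD has only two, the final head layer must be removed from the model file. For the detailed training procedure, see [GETTING_STARTED.md](https://github.com/mindspore-lab/mindyolo/blob/master/GETTING_STARTED.md).

Training yolov7-tiny on the SHWD dataset with MindYOLO's default parameters reaches an AP50 of 87.0; changing the lr_init parameter from 0.01 to 0.001 reaches an AP50 of 89.2, higher than the best AP50 of 88.5 reported by the official SHWD repository.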

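  1. This should use finetune_shwd.py as the entry point; describe the exact execution flow and commands clearly;
  2. What does "remove the last head layer from the model" mean? I don't see a corresponding change in the model definition file.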

Comment on lines 1 to 13
data:
  dataset_name: shwd

  train_set: ./SHWD/train.txt
  val_set: ./SHWD/val.txt

  nc: 2

  # class names
  names: [ 'person', 'hat' ]

  train_transforms: []
  test_transforms: []

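  1. Could this inherit directly from the outer ./configs/yolov7/yolov7-tiny.yaml and override only the changed parameters in shwd.yaml? Then delete "hyp/yolov7-tiny.yaml" from this directory.
  2. Add a blank line at the end of the file.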
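
The inheritance suggested here can be pictured as a recursive merge in which the child file lists only what changes. The following is a minimal sketch in plain Python, not MindYOLO's actual config loader: `merge_config` is a hypothetical helper, and the `optimizer` section and the COCO values in `base_cfg` are illustrative stand-ins rather than the contents of the real yolov7-tiny.yaml.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and lists in the child replace the base value outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for the base config (illustrative values, assumed for this sketch).
base_cfg = {
    "data": {"dataset_name": "coco", "nc": 80},
    "optimizer": {"lr_init": 0.01},
}

# Stand-in for shwd.yaml: only the parameters that differ from the base.
shwd_cfg = {
    "data": {"dataset_name": "shwd", "nc": 2, "names": ["person", "hat"]},
}

cfg = merge_config(base_cfg, shwd_cfg)
print(cfg["data"]["nc"], cfg["optimizer"]["lr_init"])
```

With this shape, the dataset-specific file stays short and any later change to the base hyperparameters is picked up automatically.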

Comment on lines 33 to 37
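Because MindYOLO uses the image name as the image_id during validation, image names must be numeric rather than arbitrary strings, so the images also need to be renamed. Converting the SHWD dataset format involves the following steps; for the detailed implementation, see the [code](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/examples/finetune_SHWD/convert_shwd2yolo.py).
* Copy each image to the corresponding path and rename it
* Write the image's relative path into the corresponding txt file in the root directory
* Parse the xml file and generate the matching txt annotation file under the corresponding path
* For the validation set, also generate the final json file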
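
The xml-to-txt step above can be sketched as follows. This is a minimal illustration, not the code in convert_shwd2yolo.py: it assumes Pascal VOC style xml (a `size` element and `object/bndbox` entries), assumes the xml class names match the `names` list in the config, and uses the hypothetical helpers `voc_to_yolo_lines` and `numeric_name`.

```python
import xml.etree.ElementTree as ET
from typing import List

CLASS_NAMES = ["person", "hat"]  # assumed to match `names` in the dataset config

SAMPLE_XML = """
<annotation>
  <size><width>200</width><height>100</height></size>
  <object>
    <name>hat</name>
    <bndbox><xmin>50</xmin><ymin>20</ymin><xmax>150</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>
""".strip()


def numeric_name(index: int, ext: str = ".jpg") -> str:
    """Build a purely numeric, zero-padded file name so it can serve as an image_id."""
    return f"{index:08d}{ext}"


def voc_to_yolo_lines(xml_text: str) -> List[str]:
    """Convert VOC-style boxes to YOLO lines: class x_center y_center width height."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in CLASS_NAMES:
            continue  # skip classes that are not part of the target dataset
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # YOLO stores the box center and size, normalized by the image size.
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{CLASS_NAMES.index(name)} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


lines = voc_to_yolo_lines(SAMPLE_XML)
print(numeric_name(1), lines)
```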

  1. link -> convert_shwd2yolo.py
  2. Describe the conversion flow clearly, including how to run convert_shwd2yolo.py and what form of dataset it produces

@yuedongli1 yuedongli1 requested a review from zhanghuiyao June 6, 2023 02:56
@@ -0,0 +1,123 @@
__BASE__: [

file name -> yolov7-tiny_shwd.yaml

@yuedongli1 yuedongli1 requested a review from zhanghuiyao June 6, 2023 03:08
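* Parse the xml file and generate the matching txt annotation file under the corresponding path
* For the validation set, also generate the final json file

For the detailed implementation, see the [code](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/examples/finetune_SHWD/convert_shwd2yolo.py). Run it as follows: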

link -> convert_shwd2yolo.py

For the detailed implementation, see the [code](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/examples/finetune_SHWD/convert_shwd2yolo.py). Run it as follows:

```shell
cd mindyolo

Remove "cd mindyolo"; assume by default that we are already in the mindyolo project directory


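#### Pretrained Model File Conversion

Since the SHWD dataset contains only 7000+ images, yolov7-tiny is chosen for training on it. You can download the [model file](https://github.com/mindspore-lab/mindyolo/blob/master/MODEL_ZOO.md) that MindYOLO provides, trained on the COCO dataset, as the pretrained model. Because the COCO dataset has 80 object categories while SHWD has only two, the final head layer must be removed from the pretrained model file; see the [code](https://github.com/mindspore-lab/mindyolo/blob/master/examples/finetune_SHWD/convert_yolov7_headless.py) for reference.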

  1. link -> convert_yolov7_headless.py
  2. Give a run command
  3. Add to the explanation that the head depends on nc


```shell
cd mindyolo
python examples/finetune_SHWD/convert_shwd2yolo.py --root_dir ROOT_DIR

Replace ROOT_DIR with a concrete path, e.g. /path_to_hswd/HSWD

* Distributed model training on multiple NPUs/GPUs, using 8 devices as an example:

```shell
cd mindyolo

delete "cd mindyolo"

* Train the model on a single NPU/GPU/CPU:

```shell
cd mindyolo

delete "cd mindyolo"


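Since the SHWD dataset contains only 7000+ images, yolov7-tiny is chosen for training on it. You can download the [model file](https://github.com/mindspore-lab/mindyolo/blob/master/MODEL_ZOO.md) that MindYOLO provides, trained on the COCO dataset, as the pretrained model. Because the COCO dataset has 80 object categories while SHWD has only two, the final head layer must be removed from the pretrained model file; see the [code](https://github.com/mindspore-lab/mindyolo/blob/master/examples/finetune_SHWD/convert_yolov7_headless.py) for reference.

#### Model Training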

Model Training -> Model Finetuning (Finetune): would that be a bit better?

python examples/finetune_SHWD/finetune_shwd.py --config ./examples/finetune_SHWD/yolov7-tiny_shwd.yaml
```

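Training yolov7-tiny on the SHWD dataset with MindYOLO's default parameters reaches an AP50 of 87.0; changing the lr_init parameter from 0.01 to 0.001 reaches an AP50 of 89.2, higher than the best AP50 of 88.5 reported by the official SHWD repository.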

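  1. Add a "Note" at the beginning
  2. "Training yolov7-tiny on the SHWD dataset with MindYOLO's default parameters reaches xxx"
    changed to
    "Training directly on the SHWD dataset with the default yolov7-tiny COCO parameters achieves an AP50 of 87.0"
    would that be better?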

@@ -0,0 +1,15 @@
import mindspore as ms

file name -> conver_yolov7-tiny_pretrain_ckpt.py; would that be clearer?

@@ -0,0 +1,76 @@
# Introduction to Dataset Formats

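Giving a single overall directory description of the YOLO format is enough; then explain below that a custom dataset is converted to YOLO format for training, with hswd as the reference example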

@yuedongli1 yuedongli1 requested a review from zhanghuiyao June 6, 2023 06:09
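* Parse the xml file and generate the matching txt annotation file under the corresponding path
* For the validation set, also generate the final json file

For the detailed implementation, see [convert_shwd2yolo.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/examples/finetune_SHWD/convert_shwd2yolo.py). Run it as follows: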

The link has not been updated

python examples/finetune_SHWD/finetune_shwd.py --config ./examples/finetune_SHWD/yolov7-tiny_shwd.yaml
```

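*Note: Training directly on the SHWD dataset with the default yolov7-tiny COCO parameters achieves an AP50 of 87.0. Changing the lr_init parameter from 0.01 to 0.001 reaches an AP50 of 89.2, higher than the best AP50 of 88.5 reported by the official SHWD repository.*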

Drop the trailing "higher than xxx" sentence;


#### Model Finetuning (Finetune)

For a brief version of the training flow, see [finetune_shwd.py](https://github.com/mindspore-lab/mindyolo/blob/master/examples/finetune_SHWD/finetune_shwd.py)

The link has not been updated

Comment on lines +3 to +21
The dataset format suitable for MindYOLO has the following layout:
```
ROOT_DIR
├── val.txt
├── train.txt
├── annotations
│   └── instances_val2017.json
├── images
│   ├── train
│   │   ├── 00000001.jpg
│   │   └── 00000002.jpg
│   └── val
│       ├── 00006563.jpg
│       └── 00006564.jpg
└── labels
    └── train
        ├── 00000001.txt
        └── 00000002.txt
```

After this, you could walk through this directory with concrete files and explain the annotation format and meaning of each file;
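
As one possible illustration of the label files: in the common YOLO convention, each line of a `labels/.../*.txt` file describes one object as `class_id x_center y_center width height`, with the four coordinates normalized to the range 0 to 1 by the image size. The sketch below assumes that convention applies here; `parse_yolo_label` is a hypothetical helper and the sample values are made up.

```python
from typing import List, Tuple

Box = Tuple[int, float, float, float, float]  # (class_id, xmin, ymin, xmax, ymax)


def parse_yolo_label(text: str, img_w: int, img_h: int) -> List[Box]:
    """Turn normalized YOLO label lines back into pixel-space corner boxes."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # ignore blank or malformed lines
        class_id = int(parts[0])
        xc, yc, w, h = (float(p) for p in parts[1:])
        xmin = (xc - w / 2) * img_w
        ymin = (yc - h / 2) * img_h
        xmax = (xc + w / 2) * img_w
        ymax = (yc + h / 2) * img_h
        boxes.append((class_id, xmin, ymin, xmax, ymax))
    return boxes


# Two objects in a 200x100 image: one "hat" (class 1) and one "person" (class 0).
sample_label = "1 0.5 0.4 0.5 0.4\n0 0.25 0.5 0.1 0.2\n"
boxes = parse_yolo_label(sample_label, img_w=200, img_h=100)
for box in boxes:
    print(box)
```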

@yuedongli1 yuedongli1 requested a review from zhanghuiyao June 6, 2023 07:10
zhanghuiyao
zhanghuiyao previously approved these changes Jun 6, 2023
val_txt_yolo.close()

json_file = os.path.join(new_dir, 'annotations', 'instances_val2017.json')
# Write the COCO-format validation annotations; the context manager closes the file.
with open(json_file, 'w') as f:
    json.dump(coco, f)

The tutorial does not explain the key code, so comments need to be added to the key parts of the code
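
As a rough picture of what the validation json holds, here is a minimal sketch of a COCO-style annotation dict with the image_id derived from the numeric file name. It is not the code from convert_shwd2yolo.py: `image_id_from_name` and `build_coco` are hypothetical helpers, and the zero-based category ids are an assumption made for this sketch.

```python
import json
import os


def image_id_from_name(file_name: str) -> int:
    """Derive the image_id from a numeric file name such as 00006563.jpg."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    return int(stem)  # raises ValueError if the name is not purely numeric


def build_coco(images, class_names):
    """Build a COCO-style dict.

    `images` is a list of (file_name, width, height, boxes) tuples, where each
    box is (class_id, xmin, ymin, xmax, ymax) in pixels.
    """
    coco = {"images": [], "annotations": [], "categories": []}
    for class_id, name in enumerate(class_names):
        coco["categories"].append({"id": class_id, "name": name})
    ann_id = 0
    for file_name, width, height, boxes in images:
        img_id = image_id_from_name(file_name)
        coco["images"].append(
            {"id": img_id, "file_name": file_name, "width": width, "height": height}
        )
        for class_id, xmin, ymin, xmax, ymax in boxes:
            w, h = xmax - xmin, ymax - ymin
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": class_id,
                "bbox": [xmin, ymin, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


coco = build_coco(
    [("00006563.jpg", 200, 100, [(1, 50, 20, 150, 60)])],
    ["person", "hat"],
)
print(json.dumps(coco["annotations"][0]))
```

A non-numeric file name fails at `int(stem)`, which is the reason the conversion renames every image first.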

new_ckpt = []
param_dict = ms.load_checkpoint(ori_weight)
for k, v in param_dict.items():
    if '77' in k:

Why does 77 need to be dropped?
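
One reading of this filter, going by the tutorial text (the final head layer is removed because COCO has 80 classes and SHWD has two): parameters belonging to the head are skipped so that the remaining weights can be loaded into a model built with a different nc. That "77" is the index of the head layer is an assumption drawn from the snippet, not something confirmed in this thread. The sketch below uses a plain dict in place of the mapping returned by `ms.load_checkpoint`, so it runs without MindSpore; `drop_params` and the parameter names are made up for illustration.

```python
def drop_params(param_dict, marker):
    """Split a name-to-value mapping into kept entries and dropped names.

    Any parameter whose name contains `marker` is dropped.
    """
    kept, dropped = {}, []
    for name, value in param_dict.items():
        if marker in name:
            dropped.append(name)  # head parameters: shapes depend on nc
            continue
        kept[name] = value
    return kept, dropped


# Hypothetical parameter names standing in for a real checkpoint.
fake_ckpt = {
    "model.model.0.conv.weight": [0.1, 0.2],
    "model.model.76.conv.weight": [0.3],
    "model.model.77.m.0.weight": [0.4],
    "model.model.77.m.0.bias": [0.5],
}

kept, dropped = drop_params(fake_ckpt, ".77.")
print(sorted(kept))
print(dropped)
```

Matching on ".77." rather than the bare substring "77" is a deliberate choice in this sketch: it avoids accidentally dropping an unrelated parameter whose name merely contains those digits.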

if __name__ == "__main__":
    parser = get_parser_train()
    args = parse_args(parser)
    train_shwd(args)

Why add a separate file? It looks a lot like the training script and seems somewhat redundant

@zhanghuiyao zhanghuiyao merged commit a2b2129 into mindspore-lab:master Jun 6, 2023
Labels
documentation (Improvements or additions to documentation), inside-test (issue raised by an internal developer), rfc (feature-request issue)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[New Feature] mindyolo public-facing documentation
3 participants