docs: change mpirun with msrun and add other notices (merge into main) #805
README.md
Outdated
|
||
```shell | ||
# distributed training | ||
# assume you have 4 GPUs/NPUs | ||
mpirun -n 4 python train.py --distribute \ | ||
msrun --worker_num 4 python train.py --distribute \ |
Please add core binding for msrun as well.
README.md
Outdated
--model=densenet121 --dataset=imagenet --data_dir=/path/to/imagenet | ||
``` | ||
> Notes: If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. | ||
Notice that if you are using mpirun startup with 2 devices, please add `--bind-to numa` to avoid known performance error. For example: |
Suggested wording: using msrun xxxx `--bind_core=True` to improve performance.
README.md
Outdated
Notice that if you are using mpirun startup with 2 devices, please add `--bind-to numa` to avoid known performance error. For example: | ||
|
||
```shell | ||
mpirun --allow-run-as-root --merge-stderr-to-stdout --output-filename ./output_bind --bind-to numa -n 2 \ |
msrun --bind_core=True --worker_num=2 --local_worker_num=2 --master_port=8118 \
--log_dir=msrun_log --join=True --cluster_time_out=300 \
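Combined with the training arguments shown in the README diff, the suggested launch line might look like the sketch below. It assumes a single node with two devices. `launch_two_device` and the `DRY_RUN` switch are hypothetical names introduced here for illustration, so the command can be inspected on a machine without MindSpore installed; the msrun options are the ones quoted in the comment above.

```shell
#!/bin/sh
# Sketch: two-device launch with msrun and core binding enabled.
# launch_two_device and DRY_RUN are illustrative names, not part of MindCV.
launch_two_device() {
    data_dir="${1:-/path/to/imagenet}"
    # Assemble the command as positional parameters so quoting is preserved.
    set -- msrun --bind_core=True --worker_num=2 --local_worker_num=2 \
        --master_port=8118 --log_dir=msrun_log --join=True --cluster_time_out=300 \
        python train.py --distribute \
        --model=densenet121 --dataset=imagenet --data_dir="$data_dir"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}
```

Calling `launch_two_device /path/to/imagenet` prints the command; setting `DRY_RUN=0` beforehand would execute it instead.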
README_CN.md
Outdated
--model densenet121 --dataset imagenet --data_dir ./datasets/imagenet | ||
``` | ||
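Note: if you launch with mpirun in a two-device environment, add the `--bind-to numa` option to enable core binding and avoid the performance problem in the two-device scenario. Example code: |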
If msrun is used as the launcher in a two-device environment, add `--bind_core=True` to enable core binding and improve two-device performance.
README_CN.md
Outdated
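Note: if you launch with mpirun in a two-device environment, add the `--bind-to numa` option to enable core binding and avoid the performance problem in the two-device scenario. Example code: | ||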
|
||
```shell | ||
mpirun --allow-run-as-root --merge-stderr-to-stdout --output-filename ./output_bind --bind-to numa -n 2 \ |
Same as above.
configs/README.md
Outdated
|
||
```shell | ||
# standalone training on a gpu or ascend device | ||
python train.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/dataset --distribute False | ||
|
||
# distributed training on gpu or ascend devices | ||
mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet | ||
msrun --worker_num 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet |
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./device
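As a rough illustration of how this suggestion would combine with the config-based command from the diff, the sketch below builds the full line for a single node, where `--local_worker_num` equals `--worker_num`. `msrun_config_cmd` is a hypothetical helper introduced here; it only prints the command and does not run msrun.

```shell
#!/bin/sh
# Sketch: build a single-node msrun command around a config file.
# msrun_config_cmd is an illustrative helper, not part of MindCV.
msrun_config_cmd() {
    workers="$1"
    config="$2"
    data_dir="$3"
    # Reject anything that is not a whole number.
    case "$workers" in
        ''|*[!0-9]*)
            echo "worker count must be a whole number" >&2
            return 1
            ;;
    esac
    # Single node: every worker is local, so both counts match.
    echo "msrun --worker_num=${workers} --local_worker_num=${workers}" \
         "--bind_core=True --log_dir=./device" \
         "python train.py --config ${config} --data_dir ${data_dir}"
}

# Print the 8-device command from the review comment.
msrun_config_cmd 8 configs/densenet/densenet_121_ascend.yaml /path/to/imagenet
```

Printing rather than executing keeps the sketch usable for checking the flags before a real run.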
docs/zh/index.md
Outdated
--model densenet121 --dataset imagenet --data_dir ./datasets/imagenet | ||
``` | ||
|
||
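Note: if you launch with mpirun in a two-device environment, add the `--bind-to numa` option to enable core binding and avoid the performance problem in the two-device scenario. Example code: |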
Same as above.
docs/zh/index.md
Outdated
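Note: if you launch with mpirun in a two-device environment, add the `--bind-to numa` option to enable core binding and avoid the performance problem in the two-device scenario. Example code: | ||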
|
||
```shell | ||
mpirun --allow-run-as-root --merge-stderr-to-stdout --output-filename ./output_bind --bind-to numa -n 2 \ |
Same as above.
88a3eee to 1812092
README.md
Outdated
--model=densenet121 --dataset=imagenet --data_dir=/path/to/imagenet | ||
``` | ||
> Notes: If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. | ||
|
||
Notice that if you are using msrun startup with 2 devices, please add `--bind_core=True` to avoid known performance error. For example: |
to improve performance.
Same for the occurrences below.
1812092 to 8758ec9
Thank you for your contribution to the MindCV repo.
Before submitting this PR, please make sure:
Motivation
(Write your motivation for proposed changes here.)
Test Plan
(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)
Related Issues and PRs
(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)