support online & offline distributed inference #143

Merged
merged 1 commit into mindspore-lab:master on Jun 16, 2023

Conversation

Mark-ZhouWX (Collaborator)

Thank you for your contribution to the MindYOLO repo.
Before submitting this PR, please make sure:

Motivation

  1. Support online & offline distributed inference to increase inference speed.

Test Plan

(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)

Related Issues and PRs

closes #142
(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)

@Mark-ZhouWX added the inside-test (issue filed by an internal developer) and rfc (requirement issue) labels on Jun 15, 2023
@Mark-ZhouWX added this to the mindyolo-0.1 milestone on Jun 15, 2023
@Mark-ZhouWX self-assigned this on Jun 15, 2023
@@ -37,6 +37,19 @@ def __init__(self, logger_name="MindYOLO"):
         self.device_per_servers = 8
         self.formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
 
+    def write(self, msg):
Collaborator:

What does this method do? How is it different from info?

Collaborator (Author):

These two methods (write, flush) redirect print output from third-party libraries (such as the COCO API) into the mindyolo logger system.
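
A minimal sketch of that redirection idea (using a hypothetical RedirectableLogger class, not the repo's actual logger): any object exposing write and flush is file-like, so it can be assigned to sys.stdout to capture plain print output.

import logging
import sys

class RedirectableLogger:
    def __init__(self, logger_name="MindYOLO"):
        self.logger = logging.getLogger(logger_name)
        # Emit to the real stdout so redirected messages still reach the console.
        self.logger.addHandler(logging.StreamHandler(sys.__stdout__))
        self.logger.setLevel(logging.INFO)

    def info(self, msg):
        self.logger.info(msg)

    # write/flush make this object file-like, so it can stand in for sys.stdout
    def write(self, msg):
        msg = msg.strip()
        if msg:
            self.info(msg)

    def flush(self):
        # Nothing is buffered; present only to satisfy the file interface.
        pass

logger = RedirectableLogger()
sys.stdout = logger              # print() from third-party code now goes through the logger
print("captured by the logger")  # handled by write()
sys.stdout = sys.__stdout__      # restore the original stdout afterwards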

@@ -31,7 +31,8 @@ def set_default(args):
     if args.is_parallel:
         init()
         args.rank, args.rank_size, parallel_mode = get_rank(), get_group_size(), ParallelMode.DATA_PARALLEL
-        context.set_auto_parallel_context(device_num=args.rank_size, parallel_mode=parallel_mode, gradients_mean=True)
+        context.set_auto_parallel_context(device_num=args.rank_size, parallel_mode=parallel_mode, gradients_mean=True,
+                                          parameter_broadcast=True)
Collaborator:

Why add this? On the CV side, accuracy dropped after this was added.

Mark-ZhouWX (Collaborator, Author) commented on Jun 16, 2023:

I wanted a hard guarantee here. I ran two simple experiments: 1) compared whether the network's initial parameters are identical with and without this option; 2) compared whether the eval accuracy on yolox-tiny after 25, 50, and 100 training epochs is close with and without it. Both answers were yes, so I will remove it.

@@ -55,7 +56,9 @@ def set_default(args):
         args.config,
     )
     # Directories and Save run settings
-    args.save_dir = os.path.join(args.save_dir, datetime.now().strftime("%Y.%m.%d-%H_%M_%S"))
+    time = get_broadcast_datetime(rank_size=args.rank_size)
Collaborator:

A check is needed here; calling Broadcast on a single card raises an error.

Mark-ZhouWX (Collaborator, Author) commented on Jun 16, 2023:

The check is in place; the single-card path does not go through broadcast. I also tested that initializing the Broadcast operator on a single card is fine; it just cannot be called.
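
A minimal sketch of such a guarded helper, assuming MindSpore's ops.Broadcast (the PR's actual get_broadcast_datetime may differ):

from datetime import datetime

import mindspore as ms
from mindspore import Tensor, ops

def get_broadcast_datetime(rank_size=1, root_rank=0):
    # Return the current time as a list of ints that is identical on every rank,
    # so all ranks build the same save_dir name.
    time = datetime.now()
    time_list = [time.year, time.month, time.day, time.hour,
                 time.minute, time.second, time.microsecond]
    if rank_size <= 1:
        # Single card: never call Broadcast, avoiding the error noted above.
        return time_list

    # Multi-card: broadcast rank 0's timestamp to every other rank.
    broadcast = ops.Broadcast(root_rank)
    shared = broadcast((Tensor(time_list, dtype=ms.int32),))
    return shared[0].asnumpy().tolist()

The save_dir timestamp string can then be formatted from this shared list so every rank writes to the same run directory.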

@@ -79,7 +90,9 @@ def set_default_test(args):
         args.config,
     )
     # Directories and Save run settings
-    args.save_dir = os.path.join(args.save_dir, datetime.now().strftime("%Y.%m.%d-%H:%M:%S"))
+    time = get_broadcast_datetime(rank_size=args.rank_size)
Collaborator:

Same as above.

Collaborator (Author):

Ditto.

class Synchronizer:
    def __init__(self, rank_size=1):
        # this init method should be run only once
        self.all_reduce = AllReduce()
Collaborator:

Initializing the AllReduce operator on a single card may cause problems; the rank_size check could be moved above this.

Mark-ZhouWX (Collaborator, Author) commented on Jun 16, 2023:

I have tested this: initializing the AllReduce operator on a single card is not a problem. In addition, Synchronizer is never constructed on a single card, so allreduce is never called. I have also run the single-card and multi-card cases several times and the program runs correctly.

Collaborator:

Once this is initialized, rank_size is fixed, so the check can be done directly in __init__; then both single-card and multi-card callers can use it without an external check.
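
A minimal sketch of that suggestion, assuming MindSpore's ops.AllReduce (illustrative only, not the merged implementation):

import mindspore as ms
from mindspore import Tensor, ops

class Synchronizer:
    def __init__(self, rank_size=1):
        # this init method should be run only once
        self.rank_size = rank_size
        # Create the communication op only when more than one card is used,
        # so single-card runs never touch AllReduce.
        self.all_reduce = ops.AllReduce() if rank_size > 1 else None

    def __call__(self):
        # Block until every rank reaches this point; a no-op on a single card.
        if self.rank_size <= 1:
            return
        self.all_reduce(Tensor([1], dtype=ms.int32))

With the check inside the class, callers can construct Synchronizer(args.rank_size) and invoke it unconditionally in both single-card and multi-card runs.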

@CaitinZhao merged commit 4680b63 into mindspore-lab:master on Jun 16, 2023
Labels
inside-test (issue filed by an internal developer), rfc (requirement issue)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[New Feature] support distributed inference offline and online (eval while train)
3 participants