support online & offline distributed inference #143
@@ -37,6 +37,19 @@ def __init__(self, logger_name="MindYOLO"):
        self.device_per_servers = 8
        self.formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")

+    def write(self, msg):
What is this method for? How is it different from `info`?
These two methods (`write`, `flush`) redirect the `print` output of third-party libraries (e.g. the COCO API) into MindYOLO's logger system.
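A minimal sketch of the idea, assuming a wrapper around Python's standard `logging` module; the class `StreamToLogger` and the function `third_party_report` are hypothetical names, not MindYOLO's actual code. Any object with `write` and `flush` can stand in for `sys.stdout`, so `contextlib.redirect_stdout` sends a library's `print` calls through the logger instead of the console.

```python
import contextlib
import logging


class StreamToLogger:
    """Hypothetical file-like adapter: forwards text written to it into a logger."""

    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, msg):
        # print() may call write() several times per line (args, separators, "\n"),
        # so accumulate text and emit one log record per completed line.
        self._buffer += msg
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if line.strip():
                self.logger.log(self.level, line.rstrip())
        return len(msg)

    def flush(self):
        # Emit any trailing text that did not end with a newline.
        if self._buffer.strip():
            self.logger.log(self.level, self._buffer.rstrip())
        self._buffer = ""


def third_party_report():
    # Stand-in for a library (such as the COCO API) that reports via print().
    print("evaluation", "finished")


if __name__ == "__main__":
    logging.basicConfig(format="%(asctime)s [%(levelname)s] %(message)s", level=logging.INFO)
    log = logging.getLogger("MindYOLO")
    with contextlib.redirect_stdout(StreamToLogger(log)):
        third_party_report()  # goes to the logger instead of plain stdout
```

This differs from `info` in who calls it: `info` is called by MindYOLO's own code, while `write`/`flush` exist so that code you don't control can treat the logger as a stream.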
mindyolo/utils/utils.py (outdated)
@@ -31,7 +31,8 @@ def set_default(args):
    if args.is_parallel:
        init()
        args.rank, args.rank_size, parallel_mode = get_rank(), get_group_size(), ParallelMode.DATA_PARALLEL
-        context.set_auto_parallel_context(device_num=args.rank_size, parallel_mode=parallel_mode, gradients_mean=True)
+        context.set_auto_parallel_context(device_num=args.rank_size, parallel_mode=parallel_mode, gradients_mean=True,
+                                          parameter_broadcast=True)
Why add this? On the CV side, accuracy dropped after adding it.
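I wanted it as a hard guarantee. I ran two simple experiments comparing runs with and without it: (1) whether the network's initialized parameters are identical; (2) whether the eval accuracy of yolox-tiny after 25, 50, and 100 training epochs is close. The answer to both was yes. I'll remove it.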
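Experiment (1) amounts to checking that every rank starts from identical weights. The sketch below simulates that in plain Python rather than MindSpore: each simulated rank initializes parameters from its own seed, and a broadcast from rank 0 (the guarantee a parameter-broadcast option is meant to provide) makes them agree. All helper names here are illustrative assumptions, not MindSpore APIs.

```python
import copy
import random


def init_params(seed, shapes):
    """Hypothetical initializer: each rank draws weights from its own RNG."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 0.02) for _ in range(size)] for name, size in shapes.items()}


def broadcast_from_root(per_rank_params, root=0):
    """Simulated broadcast: every rank receives its own copy of the root's parameters."""
    root_params = per_rank_params[root]
    return [copy.deepcopy(root_params) for _ in per_rank_params]


def params_consistent(per_rank_params):
    """True if all ranks hold exactly the same parameter values."""
    first = per_rank_params[0]
    return all(p == first for p in per_rank_params[1:])


if __name__ == "__main__":
    shapes = {"conv.weight": 8, "conv.bias": 2}
    ranks = [init_params(seed=rank, shapes=shapes) for rank in range(4)]
    print("before broadcast:", params_consistent(ranks))
    print("after broadcast:", params_consistent(broadcast_from_root(ranks)))
```

In this simulation, giving every rank the same seed also makes the check pass without any broadcast, so an identical-parameters result on its own does not show that the broadcast was what produced it.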
@@ -55,7 +56,9 @@ def set_default(args):
        args.config,
    )
    # Directories and Save run settings
-    args.save_dir = os.path.join(args.save_dir, datetime.now().strftime("%Y.%m.%d-%H_%M_%S"))
+    time = get_broadcast_datetime(rank_size=args.rank_size)
A check is needed here: using Broadcast on a single card raises an error.
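There is a check: single-card runs never go through broadcast. I also tested it: instantiating the Broadcast operator on a single card works; it just cannot be called.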
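The guard discussed here can be sketched as follows. `get_broadcast_datetime` is the PR's function name, but this body is hypothetical: the collective is passed in as a plain callable (`broadcast_fn`, an assumption) so the sketch runs without MindSpore. The goal is that all ranks build the same timestamped `save_dir` from rank 0's clock, while a single-card run never touches the collective.

```python
from datetime import datetime


def get_broadcast_datetime(rank_size=1, root_time=None, broadcast_fn=None):
    """Hypothetical sketch: return one timestamp that all ranks agree on.

    rank_size    -- number of devices; 1 means single-card.
    root_time    -- local time on this rank (defaults to now).
    broadcast_fn -- callable that returns the root rank's values on every rank;
                    a stand-in for a real collective Broadcast operator.
    """
    now = datetime.now() if root_time is None else root_time
    # Encode as integers, since collectives operate on numeric tensors, not datetime objects.
    fields = [now.year, now.month, now.day, now.hour, now.minute, now.second]
    if rank_size > 1:
        # Only multi-card runs call the collective; a single card skips it entirely.
        fields = list(broadcast_fn(fields))
    return datetime(*fields)
```

Each rank would then call `time.strftime("%Y.%m.%d-%H_%M_%S")` on the result and join it onto `args.save_dir`, so every process writes to the same directory even if their local clocks differ by a second.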
@@ -79,7 +90,9 @@ def set_default_test(args):
        args.config,
    )
    # Directories and Save run settings
-    args.save_dir = os.path.join(args.save_dir, datetime.now().strftime("%Y.%m.%d-%H:%M:%S"))
+    time = get_broadcast_datetime(rank_size=args.rank_size)
Same as above.
Ditto.
class Synchronizer:
    def __init__(self, rank_size=1):
        # this init method should be run only once
        self.all_reduce = AllReduce()
Initializing the AllReduce operator on a single card may cause problems; you could move the `rank_size` check above it.
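I tested this: initializing the AllReduce operator on a single card causes no problem.
Moreover, a single-card run never instantiates `Synchronizer`, so AllReduce is never called. I have also tested both single-card and multi-card runs many times, and the program runs normally.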
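`rank_size` is fixed once this object is initialized, so the check can be done directly in `__init__`. Then the class can be used for both single- and multi-card runs without any check outside it.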
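A minimal sketch of the reviewer's suggestion. To keep it runnable without MindSpore, the all-reduce is injected as a callable; the `all_reduce_fn` parameter and the `__call__` interface are hypothetical, not the PR's code. Because `rank_size` is fixed at construction, `__init__` decides once whether a collective is needed, and callers use the same object on one card or many.

```python
class Synchronizer:
    """Hypothetical sketch: reduce a value across ranks, or pass it through on one card."""

    def __init__(self, rank_size=1, all_reduce_fn=None):
        # rank_size is fixed here, so the single/multi-card decision is made once.
        self.rank_size = rank_size
        if rank_size > 1:
            if all_reduce_fn is None:
                raise ValueError("all_reduce_fn is required when rank_size > 1")
            self.all_reduce = all_reduce_fn  # a collective AllReduce op in real code
        else:
            self.all_reduce = None  # single card: no collective is created or called

    def __call__(self, value):
        # Callers never need their own rank_size check.
        if self.all_reduce is None:
            return value
        return self.all_reduce(value)
```

With this shape, `sync = Synchronizer(rank_size=args.rank_size)` can be constructed unconditionally, which addresses both the single-card initialization concern and the duplicated checks at call sites.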
Thank you for your contribution to the MindYOLO repo.
Before submitting this PR, please make sure:
Motivation
(Write your motivation for proposed changes here.)
Test Plan
(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)
Related Issues and PRs
closes #142
(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)