fix QAT ema issue and tensor type error #3219
Conversation
@@ -61,8 +61,7 @@ def update_ema(biased_ema, value, decay, step):
        float, float
        """
        biased_ema = biased_ema * decay + (1 - decay) * value
-       unbiased_ema = biased_ema / (1 - decay ** step)  # Bias correction
It seems we no longer need step; should we remove it from the parameters?
And just curious: compared with before, will not using unbiased_ema cause anything to behave differently?
Check out this post.
Good point. I have deleted step.
unbiased_ema is used to correct the deviation of the EMA at initialization. However, this approach performs badly on some networks designed for mobile devices, such as MobileNetV2; in that case MobileNetV2 cannot converge at all. We added the code below to resolve this problem:
module.tracked_min_biased, module.tracked_max_biased = torch.min(output), torch.max(output)
and with it we no longer need unbiased_ema. I have tested the accuracy on MobileNetV2, LeNet, and ShuffleNetV2, and all of them reach the same or better performance.
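For reference, a minimal sketch of the two EMA variants being discussed, with the min/max initialization quoted above applied on the first observed batch. This is not the exact NNI code; the helper names, shapes, and the decay value are illustrative only.

```python
import torch

def update_ema(biased_ema, value, decay):
    """Plain (biased) EMA update, roughly what remains after this PR."""
    return biased_ema * decay + (1 - decay) * value

def update_ema_bias_corrected(biased_ema, value, decay, step):
    """EMA with bias correction, roughly the pre-PR behaviour."""
    biased_ema = biased_ema * decay + (1 - decay) * value
    unbiased_ema = biased_ema / (1 - decay ** step)  # bias correction
    return biased_ema, unbiased_ema

# On the first quantized batch, start the trackers from the real output
# range instead of zero (the fix quoted above).
output = torch.randn(8, 16)
tracked_min_biased, tracked_max_biased = torch.min(output), torch.max(output)

# Later batches just fold new statistics into the biased EMA,
# with no bias-correction step.
decay = 0.99
new_output = torch.randn(8, 16)
tracked_max_biased = update_ema(tracked_max_biased, torch.max(new_output), decay)
tracked_min_biased = update_ema(tracked_min_biased, torch.min(new_output), decay)
```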
Or did you do any experiment to prove that, with the same initialization method, removing the unbiased ema helps the model converge faster and better?
Yeah, you mention the key point. The difference made by removing unbiased_ema shows up when quant_start_step is small. I ran an experiment setting quant_start_step to 5 on MobileNetV2: if we keep unbiased_ema, the model cannot converge, and when we remove it, the model converges normally.
Um, I think I didn't make myself clear.
You made 2 changes here:
- Initializing tracked min and tracked max to the min and max of the tensor, which is good.
- Removing the unbiased ema, which I am still confused about.
From my point of view, your experiments above prove your first modification, which I think is reasonable. THUMBS UP!
But the experiments above don't give us any clue that the unbiased ema hurts convergence. I think 4 experiments should be done:
(1) With and without modification 1, which I think you have already done. GOOD.
(2) With and without the unbiased ema, but BOTH WITH MODIFICATION 1.
The quant_start_step = 5 experiment mentioned above was run with modification 1. In that case, if we keep unbiased_ema the model cannot converge, and when we remove it the model converges normally. I also tested quant_start_step of 3 and 4, and the results are similar.
If we set quant_start_step to a large number like 1000, there is little difference between deleting and keeping unbiased_ema: both converge and the accuracy is similar.
Based on these experiments, deleting unbiased_ema performs well when quant_start_step is small, so I chose to delete it.
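One possible arithmetic illustration of the interaction (my reading, not something stated in the PR): once the tracker already starts at the true max (modification 1), applying bias correction at a very early step inflates the range estimate. The decay and activation values below are made up.

```python
# Hypothetical numbers showing how bias correction over-scales the range
# when quantization starts at a very early step.
decay = 0.99
true_max = 6.0            # assumed activation max
biased_ema = true_max     # tracker initialized to the real max, not to 0

step = 1                  # quant_start_step is small, so step is still tiny
biased_ema = biased_ema * decay + (1 - decay) * true_max   # still ~6.0
unbiased_ema = biased_ema / (1 - decay ** step)            # ~600.0, a 100x over-correction
print(biased_ema, unbiased_ema)
```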
Oh yeah, that is what I mean. Good!
This PR fixes the problem reported in issue #3204 and a bug in the previous PR #3160.