This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

A possible solution to the inability to quantize the output with QAT #3204

Closed
Lycan1003 opened this issue Dec 17, 2020 · 1 comment

@Lycan1003

In issue #3084, I encountered a situation where quantizing the output caused the loss to become NaN.

Your team's reply:

I have tested MobileNetV2 with QAT. If only the weights are quantized, the training process is stable and the model converges normally. If the activations are quantized, the loss becomes NaN. I think this happens because folding of batch norm is not supported yet, and the problem will always exist in a QAT method without folding.

Then I implemented batch-norm fusing myself, but the problem still existed.
So I checked the code again and found a possible cause.
Near line 233 of quantizers.py, in the quantize_output function:

    def quantize_output(self, output, wrapper, **kwargs):
        ...
        # Track the observed output range with an exponential moving average.
        current_min, current_max = torch.min(output), torch.max(output)
        module.tracked_min_biased, module.tracked_min = update_ema(module.tracked_min_biased, current_min, module.ema_decay, self.steps)
        module.tracked_max_biased, module.tracked_max = update_ema(module.tracked_max_biased, current_max, module.ema_decay, self.steps)
        # Derive scale and zero_point from the bias-corrected tracked range.
        module.scale, module.zero_point = update_quantization_param(output_bits, module.tracked_min, module.tracked_max)

In my opinion, using tracked_min and tracked_max to update the quantization parameters is meant to smooth out drastic changes in the data (just my understanding). The code is shown below:

def update_ema(biased_ema, value, decay, step):
    biased_ema = biased_ema * decay + (1 - decay) * value
    unbiased_ema = biased_ema / (1 - decay ** step)  # Bias correction
    return biased_ema, unbiased_ema

However, when quantizing for the first time, the default value of tracked_max_biased (i.e., biased_ema in the update_ema function) is zero, which makes tracked_max (i.e., unbiased_ema) much smaller than current_max; the same holds for tracked_min. So the QAT result of the activation layer is incorrect, which causes the weights of the Conv layer to become extremely large. After several epochs, the weights keep growing until they become NaN.
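
To make the failure mode concrete, here is a minimal sketch of how asymmetric quantization parameters are typically derived from a tracked range, and what happens when that range underestimates the real activations. The helper names (asymmetric_qparams, fake_quantize) and the [0, 2^bits - 1] integer range are assumptions for illustration only; NNI's update_quantization_param may differ in its details.

    import torch

    def asymmetric_qparams(low, high, bits=8):
        # Map the float range [low, high] onto the integer range [qmin, qmax].
        qmin, qmax = 0, 2 ** bits - 1
        # Include zero in the range, as is common for activation quantization.
        low, high = min(low, 0.0), max(high, 0.0)
        scale = (high - low) / (qmax - qmin)
        zero_point = int(qmin - round(low / scale))
        return scale, zero_point

    def fake_quantize(x, scale, zero_point, bits=8):
        # Quantize then dequantize, clamping to the representable integer range.
        qmin, qmax = 0, 2 ** bits - 1
        q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
        return (q - zero_point) * scale

    # If the tracked max is far below the real activation max, large activations
    # saturate at qmax and are badly distorted after dequantization.
    x = torch.tensor([0.0, 1.0, 5.0, 10.0])
    scale, zp = asymmetric_qparams(0.0, 1.0)   # tracked range far too small
    print(fake_quantize(x, scale, zp))         # values above 1.0 are clipped to ~1.0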

This problem does not appear in the example (QAT_torch_quantizer.py); I guess this may be because the network structure and dataset are simple?

So I changed the code as follows, using current_min and current_max in place of tracked_min and tracked_max, and the problem was fixed.

    module.scale, module.zero_point = update_quantization_param(output_bits, current_min, current_max)
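
For context, this is roughly how the modified quantize_output looks with that change; the surrounding lines are reconstructed from the snippet above and may not match the NNI source exactly:

    def quantize_output(self, output, wrapper, **kwargs):
        ...
        current_min, current_max = torch.min(output), torch.max(output)
        # The EMA statistics are still tracked as before.
        module.tracked_min_biased, module.tracked_min = update_ema(module.tracked_min_biased, current_min, module.ema_decay, self.steps)
        module.tracked_max_biased, module.tracked_max = update_ema(module.tracked_max_biased, current_max, module.ema_decay, self.steps)
        # Compute scale and zero_point from the current batch statistics instead
        # of the (zero-initialized) tracked range.
        module.scale, module.zero_point = update_quantization_param(output_bits, current_min, current_max)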

The accuracy of the models is acceptable:
ShuffleNetV2 (net_size=0.5) on CIFAR-10 with BN fusing: accuracy = 74.6%
ShuffleNetV2 (net_size=0.5) on CIFAR-10 without BN fusing: accuracy = 56.0%

I hope this can be helpful! ^_^

@linbinskn
Contributor

linbinskn commented Dec 19, 2020

Great! Thank you for your issue! I have submitted a PR to fix this problem.
