In issue #3084, I encountered a situation where the quantized output caused the loss to become NaN.
Your team replied:

> I have tested mobilenetv2 in QAT. If only the weights are quantized, the training process is stable and the model converges normally. If the activations are quantized, the loss becomes NaN. I think this happens because folding of batchnorm is not supported yet, and it will always exist in a QAT method without folding.
I then implemented batchnorm fusion myself, but the problem still persisted.
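For context, what I mean by fusing batchnorm is the standard BN folding into the preceding convolution. A minimal sketch of the idea (the helper name `fuse_conv_bn` is just illustrative, not the exact code I used, and it assumes an affine BatchNorm2d with tracked running stats):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d into the preceding Conv2d using the standard formula:
    W_fold = W * gamma / sqrt(var + eps),  b_fold = (b - mean) * gamma / sqrt(var + eps) + beta
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    w = conv.weight.detach().clone()
    b = conv.bias.detach().clone() if conv.bias is not None else torch.zeros(conv.out_channels)
    std = torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = w * (bn.weight / std).reshape(-1, 1, 1, 1)
    fused.bias.data = (b - bn.running_mean) / std * bn.weight + bn.bias
    return fused
```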
So I checked the code again and found a possible solution: near line 233 of `quantizers.py`, in the `quantize_output` function.
In my opinion, using tracked_min and tracked_max to update the quantization parameters is meant to smooth out drastic changes in the data (just my understanding). A sketch of that logic is shown below.
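(Since the original snippet did not survive here, this is a paraphrase of the flow rather than the exact source; `track_output_range` is my own wrapper name, and `module` stands for the wrapped layer that holds the tracked statistics.)

```python
import torch

def update_ema(biased_ema, value, decay, step):
    # Running average ("biased_ema") plus a bias-corrected value ("unbiased_ema").
    biased_ema = decay * biased_ema + (1 - decay) * value
    unbiased_ema = biased_ema / (1 - decay ** step)  # bias correction
    return biased_ema, unbiased_ema

def track_output_range(output, module, decay, step):
    # Paraphrase of the relevant part of quantize_output: the EMA-tracked
    # min/max (not the current batch min/max) are what feed the
    # scale / zero-point computation.
    current_min, current_max = torch.min(output), torch.max(output)
    module.tracked_min_biased, module.tracked_min = update_ema(
        module.tracked_min_biased, current_min, decay, step)
    module.tracked_max_biased, module.tracked_max = update_ema(
        module.tracked_max_biased, current_max, decay, step)
    return module.tracked_min, module.tracked_max
```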
However, when quantizing for the first time, the default value of tracked_max_biased (i.e. biased_ema in the update_ema function) is zero, which makes tracked_max (i.e. unbiased_ema) much smaller than current_max; the same holds for tracked_min. So the QAT result of the activation layer is incorrect, which makes the weights of the Conv layer extremely large. Over several epochs the weights grow larger and larger until they become NaN.
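A quick illustration of why the zero initialization matters (hypothetical numbers, just to show the scale of the under-estimate):

```python
decay = 0.99
tracked_max_biased = 0.0   # default initial value
current_max = 6.0          # first observed activation max (hypothetical)

tracked_max_biased = decay * tracked_max_biased + (1 - decay) * current_max
print(tracked_max_biased)  # 0.06

# Unless the bias correction (division by 1 - decay**step) fully compensates,
# which it only does when `step` is really 1 at this point, tracked_max stays
# far below current_max and the activation range is badly under-estimated.
```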
This problem does not appear in the example (QAT_torch_quantizer.py); I guess that may be because the network structure and dataset there are simple.
So I changed the code as shown below, using current_min and current_max in place of tracked_min and tracked_max, and the problem was fixed.
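A sketch of the change (paraphrased again; the scale/zero-point formula here is the standard asymmetric scheme for illustration and may differ in detail from the helper in `quantizers.py`):

```python
import torch

def quantize_output_range(output, output_bits=8):
    # Use the current batch's min/max directly instead of tracked_min/tracked_max.
    current_min, current_max = torch.min(output), torch.max(output)
    qmin, qmax = 0, 2 ** output_bits - 1
    scale = (current_max - current_min) / float(qmax - qmin)
    zero_point = torch.clamp(qmin - torch.round(current_min / scale), qmin, qmax)
    return scale, zero_point
```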
The accuracy of the resulting models is acceptable:
- ShufflenetV2 (net_size=0.5), CIFAR10, with BN fused: accuracy = 74.6%
- ShufflenetV2 (net_size=0.5), CIFAR10, without BN fusion: accuracy = 56.0%
I hope this can be helpful! ^_^