[QUANTIZE] Improve explicitness of rules during annotation/realization #3828

ZihengJiang · 2019-08-23T23:23:52Z

In the previous version, realization carried too many responsibilities: it transformed the simulated graph, decided the data type cast after simulated_quantize, and, for add operations, also decided the output scale and unified the operands. With so many implicit rules, the procedure became quite complicated. This PR moves those extra responsibilities out of the realization procedure.

Insert cast_hint explicitly during annotation. The dtype field in QRealizeExpr has been removed.
Currently, cast_hint is used in two ways: 1. It is inserted during Partition, and a simulated_quantize of kind INPUT is inserted before it during annotation. This stores the low-precision output of a residual block. 2. It is inserted during Annotation and is transformed into a cast during realization. Before realization, it simply passes its input through like identity, so it has no effect on the output.
Modify annotate/realize rule for addition.
Previously, we quantized the operands of addition separately during annotation, then unified their scales during realization. This was a heavy burden because there are many combinations of operand kinds. Now, we quantize both operands with two simulated_quantize (sq) ops that share the same dom_scale parameter, so realization only needs an identity transform.
This PR also refactors the calibration part: thanks to @vinx13's evaluation script for KL, the statistics collection has been moved into an internal collect_stats. A new config option, calibration_mode, has been added.
Minor improvement: saturation for left_shift inside QuantizeRealize.
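
To illustrate the new addition rule, here is a minimal pure-Python sketch, not the TVM implementation. The helpers `quantize` and `quantized_add` are hypothetical names, and the sketch assumes, as a simplification, that dom_scale is the real value of one integer step. Because both operands are quantized with the same dom_scale, the integer results can be added directly and no scale unification is needed afterwards.

```python
def quantize(x, dom_scale, nbit=8):
    """Map a real value to a signed nbit integer with step size dom_scale.

    Hypothetical helper for illustration; it saturates to the symmetric
    range [-(2**(nbit-1) - 1), 2**(nbit-1) - 1].
    """
    qmax = 2 ** (nbit - 1) - 1
    q = round(x / dom_scale)
    return max(-qmax, min(qmax, q))


def quantized_add(a, b, dom_scale, nbit=8):
    """Quantize both operands with the same dom_scale, then add the integers.

    The sum keeps the shared dom_scale, so the later step only has to pass
    the result through unchanged.
    """
    qa = quantize(a, dom_scale, nbit)
    qb = quantize(b, dom_scale, nbit)
    return qa + qb, dom_scale


q_sum, scale = quantized_add(0.5, 0.25, dom_scale=1 / 64)
print(q_sum, q_sum * scale)  # 48 0.75
```

With separately chosen scales, the two integers would first have to be rescaled to a common scale, which is the case analysis this PR removes from realization.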

ZihengJiang · 2019-08-24T03:40:03Z

python/tvm/relay/quantize/calibrate.py

+
+def _find_scale_by_kl(arr,
+                      quantized_dtype='int8',
+                      num_bins=8001,


How do we choose the parameter? @vinx13

vinx13, Aug 24, 2019: This parameter is a tradeoff between the precision of the computed KL divergence (KLD) and the speed of calibration. The default worked well in my experiments.
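
For context on that tradeoff, below is a simplified pure-Python sketch of a KL-based threshold search. It is not the `_find_scale_by_kl` code from this PR: the function names, the use of absolute values, the smaller defaults, and the `1e-12` floor are all assumptions made for illustration. It shows why num_bins matters: a finer histogram gives more candidate thresholds and a more precise KL estimate, but every extra bin adds another candidate to evaluate.

```python
import math
import random


def kl_divergence(p, q):
    """KL(p || q) for two unnormalized histograms of equal length."""
    p_sum, q_sum = sum(p), sum(q)
    if p_sum == 0 or q_sum == 0:
        return float("inf")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            # A tiny floor avoids log(0) when q has an empty bin.
            total += (pi / p_sum) * math.log((pi / p_sum) / max(qi / q_sum, 1e-12))
    return total


def find_threshold_by_kl(values, num_bins=2048, num_quantized_bins=128):
    """Return the clipping threshold whose quantized histogram has the lowest KL."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0.0
    width = max_abs / num_bins
    hist = [0] * num_bins
    for v in values:
        hist[min(int(abs(v) / width), num_bins - 1)] += 1

    best_kl, best_threshold = float("inf"), max_abs
    for i in range(num_quantized_bins, num_bins + 1):
        # Reference distribution: clip at bin i, folding outliers into the last bin.
        p = hist[:i]
        p[-1] += sum(hist[i:])
        # Candidate distribution: merge hist[:i] into num_quantized_bins groups,
        # then spread each group's mass evenly over its non-empty source bins.
        q = [0.0] * i
        for j in range(num_quantized_bins):
            start = j * i // num_quantized_bins
            stop = (j + 1) * i // num_quantized_bins
            chunk = hist[start:stop]
            nonzero = sum(1 for c in chunk if c > 0)
            if nonzero:
                share = sum(chunk) / nonzero
                for k in range(start, stop):
                    if hist[k] > 0:
                        q[k] = share
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_kl, best_threshold = kl, i * width
    return best_threshold


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]
threshold = find_threshold_by_kl(samples, num_bins=256, num_quantized_bins=16)
print(threshold)
```

The outer loop runs roughly num_bins times and each iteration touches up to num_bins entries, so the cost of this sketch grows about quadratically with num_bins.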

ZihengJiang · 2019-08-27T08:24:59Z

tests/python/nightly/quantization/test_quantization_accuracy.py

-        # TODO: need to fix accuracy
-        # Config('mobilenetv2_1.0', nbit_input=8, dtype_input='int8', nbit_output=16, dtype_output='int16', global_scale=4.0),
+        # resnet18_v1 best configuration
+        Config('resnet18_v1', nbit_input=8, dtype_input='int8', nbit_output=16, dtype_output='int16', global_scale=8.0, expected_acc=0.675),


It seems this method causes some accuracy drop. I will hold this PR until I find a workaround.

ZihengJiang added 2 commits August 16, 2019 15:43

Init

f665aac

Update

5427c86

ZihengJiang added the status: WIP label Aug 23, 2019

ZihengJiang commented Aug 24, 2019

ZihengJiang added 2 commits August 27, 2019 01:22

Update test

9af9da3

Update test

8dab80c

ZihengJiang commented Aug 27, 2019

vinx13 mentioned this pull request Nov 10, 2019

[Relay][Quantize] Integrate data-aware calibration into quantization #4295

Merged

tqchen added the status: inactive label Nov 15, 2019

tqchen closed this Nov 15, 2019

ZihengJiang deleted the quantize branch August 28, 2020 20:47
