
[QUANTIZE] Improve explicitness of rules during annotation/realization #3828

Closed
wants to merge 4 commits

Conversation

ZihengJiang (Contributor)

In the previous version, we put too many duties on realization: it transforms the simulated graph, decides data-type casting after simulated_quantize, and, for the add operation, it also decides the output scale and unifies the operands. Things get quite complicated with so many implicit rules. This PR moves those extra functions out of the realization procedure.

  • Insert cast_hint explicitly during annotation. The dtype field in QRealizeExpr has been removed.
    Currently, cast_hint serves two purposes: 1. It is inserted during Partition, and a simulated_quantize to INPUT is inserted before it during annotation; this is for storing the low-precision output of a residual block. 2. It is inserted during Annotation and is transformed into a cast during realization. Before realization it simply passes its input through like an identity, so it has no effect on the output until the graph is realized (a minimal sketch follows this list).

  • Modify the annotate/realize rule for addition.
    Previously, we quantized the operands of an addition separately during annotation and then unified their scales during realization. This carried a lot of burden, since there are many possible combinations of operand kinds. Now we quantize both operands to the same dom_scale with two simulated_quantize ops that share the same dom_scale parameter, so realization only needs to perform an identity transform (see the second sketch below).

  • This PR also refactors the calibration part: thanks to @vinx13's evaluation script for KL divergence, the statistics-collection logic has been moved into an internal collect_stats, and a new config option calibration_mode has been added (a hedged usage sketch follows this list).

  • Minor improvement: add saturation for left_shift inside QuantizeRealize.
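
Below is a minimal, framework-free Python sketch of the cast_hint behaviour described in the first bullet: a pass-through identity before realization, a real dtype cast afterwards. The cast_hint function here is written purely for illustration and is not the TVM operator itself.

import numpy as np

def cast_hint(x, dtype, realized=False):
    # Before realization the hint is an identity and cannot change any output;
    # during realization it is rewritten into an actual cast to `dtype`.
    if not realized:
        return x
    return x.astype(dtype)

x = np.array([1.7, -3.2, 100.4], dtype="float32")
print(cast_hint(x, "int8"))                  # unchanged float32 values
print(cast_hint(x, "int8", realized=True))   # cast applied after realization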

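The next sketch models the new addition rule with numpy: both operands go through a simulated quantize with the same dom_scale, so the realized integer add needs no rescaling and realization reduces to an identity transform. The simulated_quantize helper here mirrors the op named above but is re-implemented for illustration only.

import numpy as np

def simulated_quantize(x, dom_scale, nbit=8):
    qmin, qmax = -(2 ** (nbit - 1)), 2 ** (nbit - 1) - 1
    q = np.clip(np.round(x / dom_scale), qmin, qmax)
    return q * dom_scale  # simulated: still float, but on the integer grid

lhs = np.random.uniform(-1, 1, 8).astype("float32")
rhs = np.random.uniform(-1, 1, 8).astype("float32")

dom_scale = 1 / 64.0  # one dom_scale shared by both operands
sq_lhs = simulated_quantize(lhs, dom_scale)
sq_rhs = simulated_quantize(rhs, dom_scale)

# Because the scales match, adding the integer representations is exact on the
# shared grid, so no scale unification is needed at realization time.
int_add = np.round(sq_lhs / dom_scale) + np.round(sq_rhs / dom_scale)
assert np.allclose(int_add * dom_scale, sq_lhs + sq_rhs)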

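For the calibration refactor, a hedged usage sketch: the config key calibration_mode and the value "kl_divergence" are taken from this PR's description and may differ in the merged code; statistics collection is assumed to happen internally (collect_stats) when a dataset is passed to quantize().

from tvm import relay

def quantize_with_kl(mod, params, dataset):
    with relay.quantize.qconfig(calibration_mode="kl_divergence",
                                nbit_input=8,
                                nbit_weight=8,
                                global_scale=8.0):
        # With the refactor, per-layer statistics are gathered internally from
        # `dataset`; callers no longer collect them by hand.
        return relay.quantize.quantize(mod, params, dataset=dataset)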
def _find_scale_by_kl(arr,
                      quantized_dtype='int8',
                      num_bins=8001,
ZihengJiang (Contributor, Author):

How do we choose the parameter? @vinx13

vinx13 (Member):

This parameter is a tradeoff between the precision of the computed KL divergence and the speed of calibration; the default value was good in my experiments.
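
As an illustration of that tradeoff, a small example using the signature visible in the diff (quantized_dtype and num_bins); the import path is an assumption, since the helper is internal to the calibration code:

import numpy as np
from tvm.relay.quantize.kl_divergence import _find_scale_by_kl  # assumed path

activations = np.random.normal(0, 1, size=(10000,)).astype("float32")

# More bins -> finer histogram and a more precise KL estimate, but slower
# calibration; fewer bins run faster at the cost of precision.
coarse = _find_scale_by_kl(activations, quantized_dtype='int8', num_bins=1001)
fine = _find_scale_by_kl(activations, quantized_dtype='int8', num_bins=8001)
print(coarse, fine)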

# TODO: need to fix accuracy
# Config('mobilenetv2_1.0', nbit_input=8, dtype_input='int8', nbit_output=16, dtype_output='int16', global_scale=4.0),
# resnet18_v1 best configuration
Config('resnet18_v1', nbit_input=8, dtype_input='int8', nbit_output=16, dtype_output='int16', global_scale=8.0, expected_acc=0.675),
ZihengJiang (Contributor, Author):

It seems this method brings some accuracy drop; I will hold this PR until I find a workaround.
