[QUANTIZE] Improve explicitness of rules during annotation/realization #3828
In the previous version, we put too many duties on realization: transforming the simulated graph; deciding the data type cast after `simulated_quantize`; and, for the `add` operation, also deciding the output scale and unifying its operands. This makes things quite complicated, with many implicit rules. This PR moves those extra functions out of the realization procedure.

- Insert `cast_hint` explicitly during annotation. The `dtype` field in `QRealizeExpr` has been removed. Currently, `cast_hint` does two things:
  1. It is inserted during `Partition`, and a `simulated_quantize` to INPUT is inserted before it during annotation. This is for storing the low-precision output of a residual block.
  2. It is inserted during `Annotation` and is transformed to `cast` during realization. Before realization it just passes its input through, like `identity`, so it has no effect on the output before realization.
- Modify the annotate/realize rule for `addition`. Previously, we quantized the operands of addition separately during annotation, then unified their scales during realization. That approach carried a lot of burden, since there are many combinations of operand kinds. Now we quantize both operands into the same `dom_scale`, using two `simulated_quantize` (sq) ops that share the same `dom_scale` parameter, so realization only needs an identity transform.
- This PR also refactors the calibration part: thanks to @vinx13's evaluation script for KL, the stats-collection part has been moved into the internal `collect_stats`. A new config `calibration_mode` has been added.
- Minor improvement: saturation for `left_shift` inside `QuantizeRealize`.
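The two-phase behaviour of `cast_hint` described above (a pass-through before realization, a real `cast` after it) can be sketched with a toy expression tree. This is not TVM's IR: `Var`, `CastHint`, `Cast`, `evaluate`, `realize` and `cast_value` are hypothetical names, and the narrowing cast assumes two's-complement wrap-around.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Var:
    name: str


@dataclass
class CastHint:
    # Hypothetical stand-in for the `cast_hint` node inserted during annotation.
    data: "Expr"
    dtype: str


@dataclass
class Cast:
    # A real dtype conversion, as produced by realization.
    data: "Expr"
    dtype: str


Expr = Union[Var, CastHint, Cast]

_BITS = {"int8": 8, "int16": 16, "int32": 32}


def cast_value(x: int, dtype: str) -> int:
    """Narrow an integer to a signed dtype, assuming two's-complement wrap-around."""
    mod = 1 << _BITS[dtype]
    x = int(x) & (mod - 1)
    return x - mod if x >= mod >> 1 else x


def evaluate(expr: Expr, env: Dict[str, int]) -> int:
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, CastHint):
        # Before realization the hint is a pass-through, like `identity`.
        return evaluate(expr.data, env)
    if isinstance(expr, Cast):
        return cast_value(evaluate(expr.data, env), expr.dtype)
    raise TypeError(f"unknown node: {expr!r}")


def realize(expr: Expr) -> Expr:
    # Realization turns every hint into an actual cast; variables are returned unchanged.
    if isinstance(expr, (CastHint, Cast)):
        return Cast(realize(expr.data), expr.dtype)
    return expr


graph = CastHint(Var("x"), "int8")
print(evaluate(graph, {"x": 300}))           # 300: the hint has no effect yet
print(evaluate(realize(graph), {"x": 300}))  # 44: 300 wrapped into int8
```

In a real quantized graph the value reaching the cast would normally already be clipped by a preceding `simulated_quantize`, so the wrap-around path is only shown here to make the difference between the two phases visible.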
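Why a shared `dom_scale` reduces the `add` realization to an identity transform can be checked numerically: if both operands are snapped onto the same grid `k * dom_scale`, the simulated float sum equals the integer sum scaled by that same `dom_scale`, so no operand needs rescaling. The sketch below is a simplified model, not TVM's operator; the helper names, the clip range of -127..127, and Python's round-half-to-even rounding are all assumptions.

```python
from typing import Tuple

# Assumed signed 8-bit clipping range; the real qconfig may use different bounds.
CLIP_MIN, CLIP_MAX = -127, 127


def quantize_int(x: float, dom_scale: float) -> int:
    # Snap x onto the integer grid defined by dom_scale and clip it.
    # Python's round() is round-half-to-even; the real operator's rounding may differ.
    return max(CLIP_MIN, min(CLIP_MAX, round(x / dom_scale)))


def simulated_quantize(x: float, dom_scale: float) -> float:
    # Simulation stays in the real domain: the integer is scaled back by dom_scale.
    return quantize_int(x, dom_scale) * dom_scale


def simulated_add(a: float, b: float, dom_scale: float) -> float:
    # Two sq ops that share one dom_scale parameter, followed by a float add.
    return simulated_quantize(a, dom_scale) + simulated_quantize(b, dom_scale)


def realized_add(a: float, b: float, dom_scale: float) -> Tuple[int, float]:
    # Operands already share dom_scale, so the add is a plain integer add
    # and the output keeps the same scale: no rescaling is needed.
    return quantize_int(a, dom_scale) + quantize_int(b, dom_scale), dom_scale


q, scale = realized_add(2.0, 0.5, 0.25)
print(q, scale)                       # 10 0.25
print(simulated_add(2.0, 0.5, 0.25))  # 2.5
print(q * scale)                      # 2.5
```

With separately chosen scales, `realized_add` would instead have to shift or multiply one operand onto the other's scale, and the right choice depends on the operand kinds; that case analysis is the burden the new rule avoids.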
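The PR names the saturation fix for `left_shift` in `QuantizeRealize` but does not show it. As a generic illustration only, not TVM's code, a saturating left shift on a signed fixed-width integer can be written as below; `saturating_left_shift` is a hypothetical helper and the 32-bit default width is an assumption.

```python
def saturating_left_shift(x: int, shift: int, bits: int = 32) -> int:
    """Compute x << shift, clamped to the signed `bits`-wide range instead of overflowing."""
    if not 0 <= shift < bits:
        # Shifting by the full width or more is not meaningful for fixed-width ints.
        raise ValueError("shift must satisfy 0 <= shift < bits")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Compare before shifting, as fixed-width code must: a value above
    # hi >> shift (or below lo >> shift) would overflow once shifted.
    if x > (hi >> shift):
        return hi
    if x < (lo >> shift):
        return lo
    return x << shift


print(saturating_left_shift(3, 4))            # 48
print(saturating_left_shift(1 << 30, 2))      # 2147483647 (saturated)
print(saturating_left_shift(100, 4, bits=8))  # 127 (saturated)
```

Checking against `hi >> shift` and `lo >> shift` first keeps every intermediate value inside the target width, which is what a C++ or generated-code version would need; in Python alone, shifting first and clamping afterwards would give the same results.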