Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recompose QLinearMatMul and remove Quantize-Dequantize pairs #2875

Merged
merged 10 commits into from
Jul 16, 2024

Conversation

tungld
Copy link
Collaborator

@tungld tungld commented Jul 12, 2024

Add two patterns found when doing static quantization for bert-base models using onnxruntime.

  • 1st pattern: recompose DequantizeLinear - MatMul - QuantizeLinear back to QLinearMatmul
  • 2nd pattern: remove QuantizeLinear-DequantizeLinear pairs

tungld added 8 commits July 11, 2024 02:49
Signed-off-by: Tung D. Le <tung@jp.ibm.com>
Signed-off-by: Tung D. Le <tung@jp.ibm.com>
Signed-off-by: Tung D. Le <tung@jp.ibm.com>
Signed-off-by: Tung D. Le <tung@jp.ibm.com>
Signed-off-by: Tung D. Le <tung@jp.ibm.com>
Signed-off-by: Tung D. Le <tung@jp.ibm.com>
@@ -1,4 +1,4 @@
// RUN: onnx-mlir %s | FileCheck %s
// RUN: onnx-mlir %s -o %t| FileCheck %s && rm %t.so
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the generated .so during make check-onnx-lit.

Copy link
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, worthwhile opt, thanks

@tungld tungld merged commit 4a241ef into onnx:main Jul 16, 2024
7 checks passed
@jenkins-droid
Copy link
Collaborator

Jenkins Linux s390x Build #15146 [push] Recompose QLinearMatMul ... started at 01:12

@jenkins-droid
Copy link
Collaborator

Jenkins Linux amd64 Build #15141 [push] Recompose QLinearMatMul ... started at 00:12

@jenkins-droid
Copy link
Collaborator

Jenkins Linux ppc64le Build #14171 [push] Recompose QLinearMatMul ... started at 01:24

@jenkins-droid
Copy link
Collaborator

Jenkins Linux amd64 Build #15141 [push] Recompose QLinearMatMul ... failed after 24 min

@jenkins-droid
Copy link
Collaborator

Jenkins Linux s390x Build #15146 [push] Recompose QLinearMatMul ... passed after 1 hr 29 min

@jenkins-droid
Copy link
Collaborator

Jenkins Linux ppc64le Build #14171 [push] Recompose QLinearMatMul ... passed after 2 hr 0 min

@mgehre-amd
Copy link
Contributor

mgehre-amd commented Sep 24, 2024

The QuantizeDequantizePattern has broken out downstream use case.
It's wrong to fold DequantizeLinear(QuantizeLinear(x)) to identity because the rounding then doesn't happen anymore.
(and already ignoring that they might have different scales & zero-point cannot be right).

Example assuming scale = 1 and zero_point = 0:
DequantizeLinear(QuantizeLinear(3.5)) = DequantizeLinear(4) = 4.0

Can you please revert this?

@AlexandreEichenberger
Copy link
Collaborator

@tungld Can you look into @mgehre-amd comment and fix appropriately? Thanks

@tungld
Copy link
Collaborator Author

tungld commented Sep 25, 2024

@mgehre-amd thanks for pointing it out! I created a PR to revert that pattern: #2952. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants