Recompose QLinearMatMul and remove Quantize-Dequantize pairs #2875

tungld · 2024-07-12T06:41:55Z

Add two patterns found when doing static quantization for bert-base models using onnxruntime.

1st pattern: recompose DequantizeLinear - MatMul - QuantizeLinear back to QLinearMatMul
2nd pattern: remove QuantizeLinear-DequantizeLinear pairs
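
To illustrate why the first pattern is a plausible rewrite, here is a minimal numeric sketch in pure Python. It is not the onnx-mlir implementation. It assumes uint8 per-tensor quantization, and the helper names (`unfused`, `fused`, `saturate`) are hypothetical. The `fused` function is one reading of the QLinearMatMul semantics (integer accumulation followed by a single rescale), with scales chosen as powers of two so the floating-point arithmetic is exact.

```python
def saturate(v, lo=0, hi=255):
    # Clamp to the uint8 range (assumed output type for this sketch).
    return max(lo, min(hi, v))

def quantize_linear(x, scale, zero_point):
    # Python's round() rounds half to even, like ONNX QuantizeLinear.
    return saturate(round(x / scale) + zero_point)

def dequantize_linear(q, scale, zero_point):
    return (q - zero_point) * scale

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def unfused(a_q, a_s, a_zp, b_q, b_s, b_zp, y_s, y_zp):
    # DequantizeLinear -> MatMul -> QuantizeLinear, as three separate steps.
    a = [[dequantize_linear(v, a_s, a_zp) for v in row] for row in a_q]
    b = [[dequantize_linear(v, b_s, b_zp) for v in row] for row in b_q]
    return [[quantize_linear(v, y_s, y_zp) for v in row] for row in matmul(a, b)]

def fused(a_q, a_s, a_zp, b_q, b_s, b_zp, y_s, y_zp):
    # Sketch of a QLinearMatMul-style computation: accumulate in integers,
    # then rescale once and requantize.
    a = [[v - a_zp for v in row] for row in a_q]
    b = [[v - b_zp for v in row] for row in b_q]
    m = a_s * b_s / y_s
    return [[saturate(round(v * m) + y_zp) for v in row] for row in matmul(a, b)]

# Illustrative values only; not taken from the bert-base models.
args = ([[130, 126], [128, 132]], 0.5, 128,   # a, a_scale, a_zero_point
        [[4, 8], [12, 0]], 0.25, 0,           # b, b_scale, b_zero_point
        0.5, 10)                              # y_scale, y_zero_point
print(unfused(*args))
print(fused(*args))
```

With these inputs both paths produce the same quantized matrix. For arbitrary scales the two paths can round differently in floating point, so this is an illustration rather than a proof of equivalence.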

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

tungld · 2024-07-12T07:29:58Z

test/mlir/driver/compile_phases.mlir

@@ -1,4 +1,4 @@
-// RUN: onnx-mlir %s | FileCheck %s
+// RUN: onnx-mlir %s -o %t| FileCheck %s && rm %t.so


Remove the generated .so during make check-onnx-lit.

AlexandreEichenberger

LGTM, worthwhile opt, thanks

jenkins-droid · 2024-07-16T05:12:58Z

Jenkins Linux s390x Build #15146 [push] Recompose QLinearMatMul ... started at 01:12

jenkins-droid · 2024-07-16T05:13:00Z

Jenkins Linux amd64 Build #15141 [push] Recompose QLinearMatMul ... started at 00:12

jenkins-droid · 2024-07-16T05:13:01Z

Jenkins Linux ppc64le Build #14171 [push] Recompose QLinearMatMul ... started at 01:24

jenkins-droid · 2024-07-16T05:37:21Z

Jenkins Linux amd64 Build #15141 [push] Recompose QLinearMatMul ... failed after 24 min

jenkins-droid · 2024-07-16T06:41:58Z

Jenkins Linux s390x Build #15146 [push] Recompose QLinearMatMul ... passed after 1 hr 29 min

jenkins-droid · 2024-07-16T07:13:17Z

Jenkins Linux ppc64le Build #14171 [push] Recompose QLinearMatMul ... passed after 2 hr 0 min

mgehre-amd · 2024-09-24T06:55:09Z

The QuantizeDequantizePattern has broken our downstream use case.
It's wrong to fold DequantizeLinear(QuantizeLinear(x)) to identity because the rounding then no longer happens.
(Ignoring that the two ops might have different scales and zero points cannot be right either.)

Example assuming scale = 1 and zero_point = 0:
DequantizeLinear(QuantizeLinear(3.5)) = DequantizeLinear(4) = 4.0, whereas folding the pair to identity would yield 3.5.
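
The same point can be checked with a small Python sketch. It assumes a uint8 output type and per-tensor parameters; the function names are illustrative and are not onnx-mlir or onnxruntime APIs. Python's built-in `round` rounds half to even, which matches the rounding mode of ONNX QuantizeLinear.

```python
def quantize_linear(x, scale=1.0, zero_point=0, lo=0, hi=255):
    # y = saturate(round(x / scale) + zero_point), uint8 range assumed.
    return max(lo, min(hi, round(x / scale) + zero_point))

def dequantize_linear(q, scale=1.0, zero_point=0):
    return (q - zero_point) * scale

def round_trip(x, q_scale=1.0, q_zp=0, dq_scale=1.0, dq_zp=0):
    # DequantizeLinear(QuantizeLinear(x)) with possibly different parameters.
    return dequantize_linear(quantize_linear(x, q_scale, q_zp), dq_scale, dq_zp)

print(round_trip(3.5))                # 4.0: rounding changes the value
print(round_trip(2.5))                # 2.0: ties round to even
print(round_trip(300.0))              # 255.0: saturation to the uint8 range
print(round_trip(3.5, dq_scale=2.0))  # 8.0: mismatched scales rescale the value
```

None of these calls returns its input, so replacing the pair with the identity changes the numerical result in each case.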

Can you please revert this?

AlexandreEichenberger · 2024-09-24T13:17:50Z

@tungld Can you look into @mgehre-amd's comment and fix appropriately? Thanks

tungld · 2024-09-25T00:30:03Z

@mgehre-amd thanks for pointing it out! I created a PR to revert that pattern: #2952. Thanks!

tungld added 8 commits July 11, 2024 02:49

init

2f1ad20

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

revise

d82a0aa

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

revise

e16a1e8

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

remove per-tensor check and add a lit test

a6b9b67

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

add a lit test for dequantizelinear canonicalization

f0edbc9

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

Merge branch 'main' into dequantize-matmul-quantize

8ccb577

Merge branch 'main' into dequantize-matmul-quantize

edc18a7

DialectBuilder for QLinearMatMul

aa29016

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

tungld requested a review from AlexandreEichenberger July 12, 2024 07:37

AlexandreEichenberger approved these changes Jul 12, 2024

tungld added 2 commits July 16, 2024 11:22

Merge branch 'main' into dequantize-matmul-quantize

978bb8b

Merge branch 'main' into dequantize-matmul-quantize

ec5e20a

tungld merged commit 4a241ef into onnx:main Jul 16, 2024
7 checks passed
