Recompose QLinearMatMul and remove Quantize-Dequantize pairs #2875
Signed-off-by: Tung D. Le <tung@jp.ibm.com>
@@ -1,4 +1,4 @@
-// RUN: onnx-mlir %s | FileCheck %s
+// RUN: onnx-mlir %s -o %t| FileCheck %s && rm %t.so
Remove the generated .so during make check-onnx-lit.
LGTM, worthwhile opt, thanks
Jenkins Linux s390x Build #15146 [push] Recompose QLinearMatMul ... started at 01:12
Jenkins Linux amd64 Build #15141 [push] Recompose QLinearMatMul ... started at 00:12
Jenkins Linux ppc64le Build #14171 [push] Recompose QLinearMatMul ... started at 01:24
Jenkins Linux amd64 Build #15141 [push] Recompose QLinearMatMul ... failed after 24 min
Jenkins Linux s390x Build #15146 [push] Recompose QLinearMatMul ... passed after 1 hr 29 min
Jenkins Linux ppc64le Build #14171 [push] Recompose QLinearMatMul ... passed after 2 hr 0 min
The QuantizeDequantizePattern has broken our downstream use case. Example assuming scale = 1 and zero_point = 0: Can you please revert this?
@tungld Can you look into @mgehre-amd's comment and fix appropriately? Thanks
@mgehre-amd thanks for pointing it out! I created a PR to revert that pattern: #2952. Thanks!
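The example from the comment above is not reproduced in this thread. As a rough illustration of why a QuantizeLinear-DequantizeLinear pair is generally not a no-op, here is a minimal Python sketch. It assumes int8 quantization with round-half-to-even and saturation, as in the ONNX operator definitions; the helper names `quantize` and `dequantize` are hypothetical and are not onnx-mlir APIs.

```python
def saturate(v, lo=-128, hi=127):
    # Clamp to the int8 range (assumed element type for this sketch).
    return max(lo, min(hi, v))

def quantize(x, scale, zero_point):
    # QuantizeLinear-style: divide by scale, round half to even,
    # add the zero point, then saturate.
    return saturate(round(x / scale) + zero_point)

def dequantize(q, scale, zero_point):
    # DequantizeLinear-style: subtract the zero point, multiply by scale.
    return (q - zero_point) * scale

scale, zero_point = 1.0, 0
inputs = [0.4, 3.7, 300.0, -200.5]

# Round trip through the Quantize-Dequantize pair.
round_trip = [dequantize(quantize(x, scale, zero_point), scale, zero_point)
              for x in inputs]

# Even with scale = 1 and zero_point = 0 the pair rounds and clamps,
# so replacing it with the identity changes the values.
changed = [x for x, y in zip(inputs, round_trip) if x != y]
print(round_trip)
print(changed)
```

Under these assumptions every input in the list is altered by the pair, so a rewrite that simply deletes the pair changes the numerical result for models that rely on that rounding or clamping.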
Add two patterns found when doing static quantization for bert-base models using onnxruntime:
- Recompose DequantizeLinear - MatMul - QuantizeLinear back to QLinearMatMul
- Remove QuantizeLinear-DequantizeLinear pairs
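To make the first pattern concrete, the following Python sketch compares the three-op chain with a fused computation on a small example. It is only an illustration under stated assumptions: int8 tensors, per-tensor scales and zero points, and power-of-two scales chosen so the floating-point arithmetic is exact. The function names are hypothetical and do not correspond to onnx-mlir code.

```python
def saturate(v, lo=-128, hi=127):
    # Clamp to the int8 range (assumed element type).
    return max(lo, min(hi, v))

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dq_matmul_q(a, a_s, a_zp, b, b_s, b_zp, y_s, y_zp):
    # DequantizeLinear on both inputs, float MatMul, then QuantizeLinear.
    a_f = [[(v - a_zp) * a_s for v in row] for row in a]
    b_f = [[(v - b_zp) * b_s for v in row] for row in b]
    y_f = matmul(a_f, b_f)
    return [[saturate(round(v / y_s) + y_zp) for v in row] for row in y_f]

def qlinear_matmul(a, a_s, a_zp, b, b_s, b_zp, y_s, y_zp):
    # Fused form: integer accumulation, one combined rescale, then saturate.
    acc = matmul([[v - a_zp for v in row] for row in a],
                 [[v - b_zp for v in row] for row in b])
    m = a_s * b_s / y_s
    return [[saturate(round(v * m) + y_zp) for v in row] for row in acc]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
args = (a, 0.5, 0, b, 0.25, 1, 0.5, 2)

chain = dq_matmul_q(*args)
fused = qlinear_matmul(*args)
print(chain, fused)
```

With these exactly representable scales the two computations agree. For arbitrary scales the float rounding order differs between the two forms, so results may differ in the last quantized step; the sketch does not claim bit-exact equivalence in general.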