
Quantize Bias for Conv/Gemm on Quantized Model #22889

Merged: 4 commits from weicwang/bias_quantization into main, Nov 28, 2024

Conversation

@centwang (Contributor) commented Nov 19, 2024

Some quantized models leave the bias input of Conv/Gemm nodes in float instead of quantizing it. This PR creates a sub-graph that quantizes the bias for Conv/Gemm nodes with scale = scale_input_0 * scale_input_1 and zero point = 0. We do this only for bias initializers, so that ConstantFolding will fold the sub-graph into a real quantized int32 bias initializer during the next round of graph optimization.
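The arithmetic behind the sub-graph can be illustrated with a minimal sketch. This is not the actual ONNX Runtime implementation (which builds graph nodes rather than computing values directly); it only shows how a float bias maps to int32 with bias scale = scale_input_0 * scale_input_1 and zero point 0. The function name `quantize_bias` is illustrative.

```python
def quantize_bias(bias, scale_input_0, scale_input_1):
    """Quantize a float bias to int32 using scale = s0 * s1 and zero point = 0."""
    bias_scale = scale_input_0 * scale_input_1
    int32_min, int32_max = -(2**31), 2**31 - 1
    quantized = []
    for b in bias:
        q = round(b / bias_scale)              # zero point is 0, so no offset term
        q = max(int32_min, min(int32_max, q))  # saturate to the int32 range
        quantized.append(q)
    return quantized

# Example: input scale 0.02, weight scale 0.005 -> bias scale 0.0001
print(quantize_bias([0.5, -0.25, 0.0], 0.02, 0.005))  # [5000, -2500, 0]
```

Because the bias scale is fixed to the product of the two input scales, the int32 bias can be added directly to the int32 accumulator of the quantized Conv/Gemm without any rescaling.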

@centwang force-pushed the weicwang/bias_quantization branch from 60f58bc to 4860244 on November 26, 2024 04:43
@centwang marked this pull request as ready for review on November 26, 2024 04:47
@adrianlizarraga (Contributor) left a comment:

Thank you!

@centwang merged commit 1128882 into main on Nov 28, 2024
95 checks passed
@centwang deleted the weicwang/bias_quantization branch on November 28, 2024 02:10
guschmue pushed a commit that referenced this pull request Dec 2, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024