
Quantize Bias for Conv/Gemm on Quantized Model #22889

Merged: 4 commits from weicwang/bias_quantization into main, Nov 28, 2024

Conversation

@centwang (Contributor) commented Nov 19, 2024

Some quantized models leave the bias input of Conv/Gemm nodes in float instead of quantizing it. This PR creates a sub-graph that quantizes the bias for Conv/Gemm nodes with scale = scale_input_0 * scale_input_1 and zero point = 0. We do this only for bias initializers, so that ConstantFolding will fold the sub-graph into a real quantized int32 bias initializer during the next round of graph optimization.
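The arithmetic behind the sub-graph can be illustrated with a minimal sketch. This is not the actual ONNX Runtime implementation (which builds graph nodes rather than computing values directly); it only shows how a float bias maps to int32 with bias scale = scale_input_0 * scale_input_1 and zero point 0. The function name `quantize_bias` is illustrative.

```python
def quantize_bias(bias, scale_input_0, scale_input_1):
    """Quantize a float bias to int32 using scale = s0 * s1 and zero point = 0."""
    bias_scale = scale_input_0 * scale_input_1
    int32_min, int32_max = -(2**31), 2**31 - 1
    quantized = []
    for b in bias:
        q = round(b / bias_scale)              # zero point is 0, so no offset term
        q = max(int32_min, min(int32_max, q))  # saturate to the int32 range
        quantized.append(q)
    return quantized

# Example: input scale 0.02, weight scale 0.005 -> bias scale 0.0001
print(quantize_bias([0.5, -0.25, 0.0], 0.02, 0.005))  # [5000, -2500, 0]
```

Because the bias scale is fixed to the product of the two input scales, the int32 bias can be added directly to the int32 accumulator of the quantized Conv/Gemm without any rescaling.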

@centwang force-pushed the weicwang/bias_quantization branch from 60f58bc to 4860244 on November 26, 2024 04:43
@centwang marked this pull request as ready for review on November 26, 2024 04:47
@adrianlizarraga (Contributor) left a comment:

Thank you!

@centwang merged commit 1128882 into main on Nov 28, 2024
95 checks passed
@centwang deleted the weicwang/bias_quantization branch on November 28, 2024 02:10
guschmue pushed a commit that referenced this pull request Dec 2, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024