Excessive bloating of ONNX files due to over-efficient conversion of "Tile" to constants (Protocol Buffers .onnx > 2GB) #178
Comments
Many thanks for the excellent analysis! It is absolutely a problem.
I can add such a flag soon.
I disabled the constantization of Tile OPs internally (instead of providing a flag). Could you please try the latest 0.3.9 version?
@daquexian Yes, the model size bloat has indeed been resolved. However, the structure of the model appears to be in much the same state as before optimization. Have you added an implementation that minimizes structural optimization when the constantization of Tile is disabled? I have not yet done enough testing on other models.
hitnet_xl_sf_finalpass_from_tf_720x1280_disabel_tile_opt.onnx.zip
I see that […]
@PINTO0309 Thanks for trying it.
Will the problem be alleviated if onnxsim fuses all ConstantOfShape whose size < 1M?
:) I will do it tomorrow (if I have some spare time)
I temporarily enabled the constant folding of ConstantOfShape in version 0.3.10. It is a bit hard in the current onnxsim to fold only the ConstantOfShape whose shape is smaller than some threshold. I'll implement it in the next version -- 0.4.0.
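For reference, a minimal sketch of the size check discussed above, assuming the ConstantOfShape node's shape input is available as an initializer (the threshold constant and helper name are illustrative, not onnxsim's actual code):

```python
import numpy as np
import onnx
from onnx import numpy_helper

SIZE_THRESHOLD = 1_000_000  # illustrative 1M-element threshold

def foldable_constant_of_shape(node: onnx.NodeProto, model: onnx.ModelProto) -> bool:
    """Return True if this ConstantOfShape output is small enough to fold."""
    if node.op_type != "ConstantOfShape":
        return False
    # Look up the node's shape input among the graph initializers.
    for init in model.graph.initializer:
        if init.name == node.input[0]:
            shape = numpy_helper.to_array(init)
            return int(np.prod(shape)) < SIZE_THRESHOLD
    return False  # shape is not a known constant, so do not fold
```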
Thank you! I will close this issue, as the discussion has diverged from the original issue. I appreciate your efforts very much and use onnx-simplifier every day. 😃
@PINTO0309 Thanks! I'm very happy that you like it. :)
1. Description
The tool optimizes the model structure to a very high degree, and in most situations this aggressive optimization is effective. In some patterns, however, the optimization bloats the final model beyond Protocol Buffers' upper file size limit of 2 GB, and the optimization fails. The situation is as follows.
The above is a pattern that reproduces the symptom frequently. Specifically, an 8 MB ONNX file can exceed 2 GB and abort when optimized.
In the pre-optimization model, we found the following two Tile OPs where the problem occurs. These two Tiles generate, respectively, the Float32 input values and the INT64 indices passed to the GatherElements connected next.
This figure shows how my workaround avoids generating a large number of INT64 constants; if I use onnx-simplifier without taking any action, the constants at the points marked with arrows balloon to 2.0 GB of constant values. The file then exceeds Protocol Buffers' 2 GB size limit, and onnx aborts.
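To make the scale concrete, a back-of-the-envelope calculation (the tiled shape below is hypothetical, chosen only to illustrate how an INT64 tensor at this model's 720x1280 resolution reaches the limit):

```python
import numpy as np

# Hypothetical shape: an INT64 index tensor tiled out to
# [288, 720, 1280] before GatherElements (illustrative numbers only).
elements = 288 * 720 * 1280                      # ~265 million values
size_bytes = elements * np.dtype(np.int64).itemsize
print(f"{size_bytes / 2**30:.2f} GiB")           # -> 1.98 GiB, near the 2 GB cap
```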
2. My workaround
Before optimizing the model with onnx-simplifier, I inserted an operation that downcasts from INT64 to INT32 just before the Tile OP, using the model-processing utilities that onnx provides by default. I implemented this knowing it is not the best approach, since the current onnx-simplifier offers no optional flag to disable the full constantization of the Tile OP. However, it works well.
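A minimal sketch of that kind of workaround with the stock onnx Python API, assuming a hypothetical target node name (the real model's node names differ):

```python
import onnx
from onnx import helper, TensorProto

# Hypothetical target: the Tile OP whose output feeds GatherElements' indices.
TARGET_TILE = "Tile__1234"  # illustrative name; look it up in the real graph

model = onnx.load("hitnet_xl_sf_finalpass_from_tf_720x1280.onnx")
graph = model.graph

new_nodes = []
for node in list(graph.node):
    if node.op_type == "Tile" and node.name == TARGET_TILE:
        src = node.input[0]
        # Downcast the Tile's data input from INT64 to INT32 just before it.
        cast = helper.make_node(
            "Cast", inputs=[src], outputs=[src + "_int32"],
            to=TensorProto.INT32, name=src + "_cast",
        )
        new_nodes.append(cast)
        node.input[0] = src + "_int32"
    new_nodes.append(node)

del graph.node[:]
graph.node.extend(new_nodes)
onnx.save(model, "hitnet_xl_sf_finalpass_from_tf_720x1280_cast.onnx")
```

GatherElements accepts both INT32 and INT64 indices per the ONNX spec, so the downcast is legal at that position.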
3. Feature Request
Therefore, it would be great if you could add options that reduce the overall model size, so that onnx-simplifier's advanced optimization can be applied to more kinds of models. In particular, I would very much welcome a flag to disable the constantization of Tile OPs, as discussed above, and an option to downcast INT64 to INT32 for selected OPs.
I know that some OPs accept only INT64 inputs, but I am convinced this tool would be even better if, excepting those OPs, constants could use the smallest arithmetic precision possible, such as Float32 or INT32, while checking for overflow caused by the downcast.
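As an illustration of the overflow check suggested above, a sketch that downcasts an INT64 initializer only when every value fits into INT32 (the helper name is mine, not from any library):

```python
import numpy as np
from onnx import TensorProto, numpy_helper

INT32_MIN = np.iinfo(np.int32).min
INT32_MAX = np.iinfo(np.int32).max

def downcast_if_safe(init: TensorProto) -> TensorProto:
    """Return an INT32 copy of an INT64 initializer when no value overflows."""
    if init.data_type != TensorProto.INT64:
        return init
    values = numpy_helper.to_array(init)
    if values.size == 0 or values.min() < INT32_MIN or values.max() > INT32_MAX:
        return init  # empty or would overflow; keep the original INT64 tensor
    return numpy_helper.from_array(values.astype(np.int32), name=init.name)
```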
I personally investigated the logic of onnx-simplifier, onnx-optimizer, and onnx with the aim of submitting a pull request, but there was so much to investigate that it was very difficult to understand.
4. Remarks
I have begun building my own model compression tool to test this concept. Since I started it only two days ago, it has not been validated thoroughly and likely still contains many bugs. The tool is intended to further compress the overall size of a model after it has been optimized with onnx-simplifier, although I originally wanted this behavior incorporated into onnx-simplifier itself.
"A very simple tool that compresses the overall size of the ONNX model by aggregating duplicate constant values as much as possible. Added an option to downcast from Float64 to Float32 and INT64 to INT32 to attempt size compression. Simple Constant value Shrink for ONNX."
https://github.com/PINTO0309/scs4onnx
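The core idea of aggregating duplicate constants can be sketched as follows; this is a simplified illustration of the approach, not scs4onnx's actual implementation:

```python
import onnx
from onnx import numpy_helper

def deduplicate_initializers(model: onnx.ModelProto) -> None:
    """Rewire references to byte-identical initializers to a single copy."""
    seen = {}    # (dtype, dims, raw bytes) -> canonical tensor name
    rename = {}  # duplicate name -> canonical name
    for init in model.graph.initializer:
        key = (init.data_type, tuple(init.dims),
               numpy_helper.to_array(init).tobytes())
        if key in seen:
            rename[init.name] = seen[key]
        else:
            seen[key] = init.name
    # Drop the duplicate copies (in reverse so indices stay valid).
    for idx in reversed(range(len(model.graph.initializer))):
        if model.graph.initializer[idx].name in rename:
            del model.graph.initializer[idx]
    # Point node inputs at the surviving canonical constants.
    for node in model.graph.node:
        for idx, name in enumerate(node.input):
            if name in rename:
                node.input[idx] = rename[name]
```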
5. Sample Model
hitnet_xl_sf_finalpass_from_tf_720x1280.onnx.zip
hitnet_xl_sf_finalpass_from_tf_720x1280_cast.onnx.zip
hitnet_xl_sf_finalpass_from_tf_720x1280_cast_opt.onnx.zip