Add optimization and quantization options to optimum.exporters.onnx
#566
Comments
Yes I think it's a neat idea! Maybe it would be better to have a
Dynamic quantization:
ORT optimizations:
And we could actually support the same for OpenVINO, Intel Neural Compressor, etc.
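For reference, the Python API that such commands would wrap already exists in optimum.onnxruntime. A minimal sketch follows; the checkpoint name and save directories are placeholders, and the export argument to from_pretrained may differ between Optimum versions:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer, ORTQuantizer
from optimum.onnxruntime.configuration import AutoOptimizationConfig, AutoQuantizationConfig

# Export a Transformers checkpoint to ONNX (placeholder model name)
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)

# Dynamic quantization targeting AVX512-VNNI kernels
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="distilbert_quantized", quantization_config=qconfig)

# ONNX Runtime graph optimizations at level O2
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    save_dir="distilbert_optimized",
    optimization_config=AutoOptimizationConfig.O2(),
)
```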
This is even better indeed! I will update here once I have some time to start working on it :)
Hi @jplu, I think there are two things here:
If you want to contribute, focusing on the first one would be great!
Hi @michaelbenayoun
Personal opinion, but I don't think introducing quantization / optimization in
Ok then, let me know if I can help; I'll let you two discuss and decide what should be done :)
Hey @jplu, I was thinking of creating the backbone of a
Once this is done, would you be interested in adding, for example,
Hi @fxmarty, sounds perfect to me! Let's talk more about the design once the
Hi @jplu, hope you are doing well! A preliminary support for an
can be used, which is nothing really new ;-) The code for it is in https://github.com/huggingface/optimum/tree/main/optimum/commands, so if you would like to add options in there for quantization or optimization, it would be really cool! To me, the idea would be to obtain quantized/optimized models with zero code. I can help you out if you want. For the design, I was thinking something like this could be nice:
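The design snippet originally posted here is not preserved in this thread. Purely as an illustrative guess at the kind of zero-code workflow being discussed, with the subcommand and flag names below treated as assumptions rather than the actual proposal:

```bash
# Hypothetical sketch of a zero-code export -> optimize -> quantize flow
optimum-cli export onnx --model distilbert-base-uncased distilbert_onnx/
optimum-cli onnxruntime optimize -O2 --onnx_model distilbert_onnx/ -o distilbert_optimized/
optimum-cli onnxruntime quantize --avx512_vnni --onnx_model distilbert_onnx/ -o distilbert_quantized/
```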
Hi @fxmarty! I'm doing well, I hope you are too! Thanks a lot for sharing this; I will read it carefully once I'm back from vacation and will let you know here if I have questions. What you suggest as usage is fine for me 👌
I was checking how to properly implement the
Does it seem OK to you?
It's a good question - ultimately we would want to obtain the best possible model as a single command / one-liner, so it could make sense to have it directly in
For now, I would say it is reasonable to keep
I agree, and we can always link
About the options, I would not make them explicit, what about:
WDYT?
I'm not against having them explicit. I don't get what is the
It's an error!
Hi @jplu, just to let you know, ONNX Runtime optimization was added to the ONNX export in #807, because in the next release the ONNX exporter automatically fuses the subgraphs of the decoder into a single ONNX model, and ONNX Runtime optimizations cannot be applied to a model with subgraphs. I haven't tried, but I suspect there will be the same issue with quantization.
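For context, a sketch of requesting ONNX Runtime optimization directly at export time after that change; the model name and output directory are placeholders, and the exact option set may vary by version (see optimum-cli export onnx --help):

```bash
# Apply ONNX Runtime graph optimizations (level O2) as part of the export
optimum-cli export onnx --model distilbert-base-uncased --optimize O2 distilbert_onnx/
```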
No problem! It makes sense :-)
Feature request
It would be nice to have two more arguments in optimum.exporters.onnx in order to get the optimized and quantized versions of the exported models alongside the "normal" ones. I can imagine something like the invocation sketched after this list, where:

- -OX corresponds to the already available O1, O2, O3 and O4 optimization possibilities.
- -quantized-arch can take values such as arm64, avx2, avx512, avx512_vnni and tensorrt.
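The example snippet from the original issue is not preserved above. As a hypothetical sketch of the kind of invocation being requested (neither the -O2 nor the --quantized-arch flag exists in optimum.exporters.onnx; the spellings are assumptions for illustration only):

```bash
# Hypothetical flags illustrating the request, not existing exporter options
python -m optimum.exporters.onnx \
    --model distilbert-base-uncased \
    -O2 \
    --quantized-arch avx512_vnni \
    distilbert_onnx/
```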
Motivation
This will make it very easy to create optimized/quantized versions of the models we need.
Your contribution
I might help by submitting a PR for it, but I'm not able to give a "when" for now.