
Add optimization and quantization options to optimum.exporters.onnx #566

Closed
jplu opened this issue Dec 8, 2022 · 17 comments

Comments

jplu commented Dec 8, 2022

Feature request

It would be nice to have two more arguments in optimum.exporters.onnx in order to produce optimized and quantized versions of the exported models alongside the "normal" ones. I can imagine something like:

python -m optimum.exporters.onnx --model <model-name> -OX -quantized-arch <arch> output

Where:

  • -OX corresponds to the already available O1, O2, O3, and O4 optimization levels.
  • -quantized-arch can take values such as arm64, avx2, avx512, avx512_vnni, and tensorrt.
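
For reference, roughly the same result can already be obtained programmatically with the optimum.onnxruntime Python API. Below is a minimal sketch only: the directory names, the model.onnx file name, and the chosen level/arch are placeholder assumptions, and the exact API may vary between optimum versions.

from optimum.onnxruntime import ORTOptimizer, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig

# Folder produced by: python -m optimum.exporters.onnx --model <model-name> onnx_output/
onnx_dir = "onnx_output"  # placeholder path

# Graph optimization, roughly what a -O2 flag would map to
optimizer = ORTOptimizer.from_pretrained(onnx_dir)
optimizer.optimize(
    save_dir="onnx_optimized",
    optimization_config=OptimizationConfig(optimization_level=2),
)

# Dynamic quantization, roughly what --quantized-arch avx2 would map to
# (file_name assumes the exporter's default single-model output name)
quantizer = ORTQuantizer.from_pretrained(onnx_dir, file_name="model.onnx")
quantizer.quantize(
    save_dir="onnx_quantized",
    quantization_config=AutoQuantizationConfig.avx2(is_static=False, per_channel=False),
)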

Motivation

This would make it very easy to create optimized/quantized versions of the models we need.

Your contribution

I might help by submitting a PR for it, but I'm not able to give a "when" for now.

fxmarty commented Dec 9, 2022

Yes, I think it's a neat idea! Maybe it would be better to have an optimum-cli (#188), in the same fashion as transformers, with commands such as:

optimum-cli export --onnx --model <model-name> onnx_output/

Dynamic quantization:

optimum-cli quantize --onnxruntime --arch avx2 --path onnx_output/

ORT optimizations:

optimum-cli optimize --onnxruntime -O2 --path onnx_output/

And we could actually support the same for OpenVINO, Intel Neural Compressor, etc.

jplu commented Dec 9, 2022

This is even better indeed! I will post an update here once I have some time to start working on it :)

michaelbenayoun (Member) commented:

Hi @jplu,
I actually had that in mind as well!

I think there are two things here:

  • The options to be added to the optimum.exporters.onnx CLI you mentioned.
  • The general optimum-cli tool, which would require more work and thought in terms of what we allow and so on.

If you want to contribute, focusing on the first one would be great!

jplu commented Dec 9, 2022

Hi @michaelbenayoun
Yes, I meant working on the exporters as they are right now, not on the future CLI tool. Sorry, my bad, I should have been more precise.

fxmarty commented Dec 9, 2022

Personal opinion, but I don't think introducing quantization / optimization in optimum.exporters.onnx is a really good design, as those are ONNX Runtime features and not related to ONNX itself. A step-by-step conversion / optimization design sounds better imho.

jplu commented Dec 9, 2022

OK then, let me know if I can help. I'll let you discuss together and decide what should be done :)

fxmarty commented Dec 12, 2022

Hey @jplu, I was thinking of creating the backbone of an optimum-cli that, for now, only supports the ONNX export (e.g. by simply mapping to python -m optimum.exporters.onnx).

Once this is done, would you be interested in adding, for example, optimum-cli --onnxruntime --optimize? We can discuss the design further if you want :)

jplu commented Dec 12, 2022

Hi @fxmarty, sounds perfect to me! Let's talk more about the design once the optimum-cli is created.

fxmarty commented Dec 26, 2022

Hi @jplu, hope you are doing well!

A preliminary support for an optimum-cli has been introduced in the latest release. For now, things like

optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/

can be used, which is nothing really new ;-)

The code for it is in https://github.com/huggingface/optimum/tree/main/optimum/commands, so if you would like to add options in there for quantization or optimization, it would be really cool! To me, the idea would be to obtain quantized/optimized models with zero code. I can help you out if you want.

For the design, I was thinking something like this could be nice:

optimum-cli onnxruntime ...
optimum-cli onnxruntime --help
optimum-cli onnxruntime quantize ...
optimum-cli onnxruntime quantize --help
optimum-cli onnxruntime optimize ...
optimum-cli onnxruntime optimize --help
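
To help the discussion, here is a rough, purely hypothetical sketch of how such nested subcommands could be wired with argparse; this is not the actual optimum/commands implementation, just an illustration of the proposed layout:

import argparse

# Hypothetical skeleton of the proposed command layout; not the actual optimum code.
parser = argparse.ArgumentParser(prog="optimum-cli")
commands = parser.add_subparsers(dest="command", required=True)

onnxruntime = commands.add_parser("onnxruntime", help="ONNX Runtime related commands")
actions = onnxruntime.add_subparsers(dest="action", required=True)

actions.add_parser("optimize", help="Apply ONNX Runtime graph optimizations to exported model(s)")
actions.add_parser("quantize", help="Dynamically quantize exported ONNX model(s)")

# e.g. `optimum-cli onnxruntime quantize --help` would then show the quantize-specific options
print(parser.parse_args(["onnxruntime", "optimize"]))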

jplu commented Dec 27, 2022

Hi @fxmarty! I'm doing well, I hope you too!

Thanks a lot for sharing this. I will carefully read what you shared once I'm back from vacation and will let you know here if I have questions.

What you suggest as usage is fine for me 👌

jplu commented Jan 6, 2023

I was checking how to properly implement the optimize and quantize commands. I have two questions:

  • No need to perform an export when running these two new commands?
  • What do you think about this list of options for each of them:
    • for onnxruntime optimize:
      • --onnx_model: the folder where the ONNX model files are (there might be several, for a decoder model for example)
      • --level: the level of optimization (an int from 1 to 4)
    • for onnxruntime quantize:
      • --onnx_model: the folder where the ONNX model files are (there might be several, for a decoder model for example)
      • --arch: an enum from arm64, avx2, avx512, avx512_vnni and tensorrt

Does that seem OK to you?
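
Just to make the mapping concrete, here is a sketch of how those option values could translate to the existing optimum.onnxruntime configuration objects. The level semantics and the is_static defaults below are assumptions, not a settled design:

from optimum.onnxruntime.configuration import AutoQuantizationConfig, OptimizationConfig

# --arch: map the CLI value to a dynamic-quantization config (is_static=False).
# tensorrt is left out of this sketch: it needs static quantization with calibration data.
ARCH_TO_QUANTIZATION_CONFIG = {
    "arm64": AutoQuantizationConfig.arm64(is_static=False),
    "avx2": AutoQuantizationConfig.avx2(is_static=False),
    "avx512": AutoQuantizationConfig.avx512(is_static=False),
    "avx512_vnni": AutoQuantizationConfig.avx512_vnni(is_static=False),
}

# --level: map the CLI integer to an OptimizationConfig; the exact semantics of 1-4
# would have to mirror the existing O1-O4 presets, this is only an illustration.
LEVEL_TO_OPTIMIZATION_CONFIG = {
    1: OptimizationConfig(optimization_level=1),
    2: OptimizationConfig(optimization_level=2),
    3: OptimizationConfig(optimization_level=2, enable_transformers_specific_optimizations=True),
    4: OptimizationConfig(optimization_level=2, enable_transformers_specific_optimizations=True, fp16=True),
}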

fxmarty commented Jan 6, 2023

No need to perform an export when running these two new commands?

It's a good question - ultimately we would want to obtain the best possible model with a single command / one-liner, so it could make sense to have it directly in optimum-cli export onnx. The same question arises for merging the decoder without/with past @JingyaHuang.

For now, I would say it is reasonable to keep optimum-cli export onnx untouched, and just add optimum-cli onnxruntime optimize and optimum-cli onnxruntime quantize, which would already expect a folder with ONNX model(s). I think what you suggest is great!
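
For the folder-with-several-files case (a decoder is typically exported as decoder_model.onnx and decoder_with_past_model.onnx), here is a rough sketch of what the quantize command could do internally, assuming the optimum.onnxruntime API and placeholder paths:

from pathlib import Path

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

onnx_dir = Path("onnx_output")      # folder produced by the ONNX export (placeholder)
save_dir = Path("onnx_quantized")   # where the quantized model(s) get written (placeholder)
qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)

# Quantize every ONNX file found in the export folder, one by one.
for onnx_file in sorted(onnx_dir.glob("*.onnx")):
    quantizer = ORTQuantizer.from_pretrained(onnx_dir, file_name=onnx_file.name)
    quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)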

michaelbenayoun (Member) commented:

I agree, and we can always link optimum-cli export onnx ... --optimize to optimum-cli onnxruntime optimize later.

About the options, I would not make them explicit. What about:

  • optimum-cli onnxruntime optimize -O3 optimize path_to_my_model path_to_optimized_model
  • optimum-cli onnxruntime quantize path_quantization_config_or_arch_name path_to_my_model path_to_quantized_model

WDYT?

jplu commented Jan 13, 2023

I'm not against having them explicit. What is the optimize after -O3 for?

michaelbenayoun (Member) commented:

It's an error!

fxmarty commented Feb 27, 2023

Hi @jplu, just to let you know: ONNX Runtime optimization was added to the ONNX export in #807, because in the next release the ONNX exporter automatically fuses the subgraphs of the decoder into a single ONNX file, and ONNX Runtime optimizations cannot be applied to a model with subgraphs.

I haven't tried it, but I suspect there will be the same issue with quantization.

jplu commented Feb 27, 2023

No problem! It makes sense :-)

jplu closed this as completed Apr 11, 2023