Add optimization and quantization options to optimum.exporters.onnx
#566
Comments
Yes I think it's a neat idea! Maybe it would be better to have a
Dynamic quantization:
ORT optimizations:
And we could actually support the same for OpenVINO, Intel Neural Compressor, etc.
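For reference, the Python API that such commands would wrap already exists in optimum.onnxruntime. A minimal sketch follows; the checkpoint name and save directories are placeholders, and the export argument to from_pretrained may differ between Optimum versions:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer, ORTQuantizer
from optimum.onnxruntime.configuration import AutoOptimizationConfig, AutoQuantizationConfig

# Export a Transformers checkpoint to ONNX (placeholder model name)
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)

# Dynamic quantization targeting AVX512-VNNI kernels
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="distilbert_quantized", quantization_config=qconfig)

# ONNX Runtime graph optimizations at level O2
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    save_dir="distilbert_optimized",
    optimization_config=AutoOptimizationConfig.O2(),
)
```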
This is even better indeed! I will update here once I have some time to start working on it :)
Hi @jplu, I think there are two things here:
If you want to contribute, focusing on the first one would be great!
Hi @michaelbenayoun
Personal opinion, but I don't think introducing quantization / optimization in
Ok then, let me know if I can help; I'll let you two discuss and decide what should be done :)
Hey @jplu, I was thinking of creating the backbone of a
Once this is done, would you be interested in adding, for example,
Hi @fxmarty, sounds perfect to me! Let's talk more about the design once the
Hi @jplu, hope you are doing well! A preliminary support for an
can be used, which is nothing really new ;-) The code for it is in https://github.com/huggingface/optimum/tree/main/optimum/commands, so if you would like to add options in there for quantization or optimization, it would be really cool! To me, the idea would be to obtain quantized/optimized models with zero code. I can help you out if you want. For the design, I was thinking something like this could be nice:
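The design snippet originally posted here is not preserved in this thread. Purely as an illustrative guess at the kind of zero-code workflow being discussed, with the subcommand and flag names below treated as assumptions rather than the actual proposal:

```bash
# Hypothetical sketch of a zero-code export -> optimize -> quantize flow
optimum-cli export onnx --model distilbert-base-uncased distilbert_onnx/
optimum-cli onnxruntime optimize -O2 --onnx_model distilbert_onnx/ -o distilbert_optimized/
optimum-cli onnxruntime quantize --avx512_vnni --onnx_model distilbert_onnx/ -o distilbert_quantized/
```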
Hi @fxmarty! I'm doing well, I hope you are too! Thanks a lot for sharing this; I will read it carefully once I'm back from vacation and will let you know here if I have questions. What you suggest as usage is fine for me 👌
I was checking how to properly implement the
Does it seem OK to you?
It's a good question - ultimately we would want to obtain the best possible model as a single command / one-liner, so it could make sense to have it directly in
For now, I would say it is reasonable to keep
I agree, and we can always link
About the options, I would not make them explicit, what about:
WDYT?
I'm not against having them explicit. I don't get what is the
It's an error!
Hi @jplu, just to let you know, ONNX Runtime optimization was added to the ONNX export in #807, because in the next release the ONNX exporter automatically fuses the subgraphs of the decoder into a single ONNX model, and ONNX Runtime optimizations cannot be applied to a model with subgraphs. I haven't tried, but I suspect there will be the same issue with quantization.
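For context, a sketch of requesting ONNX Runtime optimization directly at export time after that change; the model name and output directory are placeholders, and the exact option set may vary by version (see optimum-cli export onnx --help):

```bash
# Apply ONNX Runtime graph optimizations (level O2) as part of the export
optimum-cli export onnx --model distilbert-base-uncased --optimize O2 distilbert_onnx/
```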
No problem! It makes sense :-)
Feature request
It would be nice to have two more arguments in optimum.exporters.onnx in order to get the optimized and quantized versions of the exported models alongside the "normal" ones. I can imagine something like the invocation sketched after this list, where:

- -OX corresponds to the already available O1, O2, O3 and O4 optimization possibilities.
- -quantized-arch can take values such as arm64, avx2, avx512, avx512_vnni and tensorrt.
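The example snippet from the original issue is not preserved above. As a hypothetical sketch of the kind of invocation being requested (neither the -O2 nor the --quantized-arch flag exists in optimum.exporters.onnx; the spellings are assumptions for illustration only):

```bash
# Hypothetical flags illustrating the request, not existing exporter options
python -m optimum.exporters.onnx \
    --model distilbert-base-uncased \
    -O2 \
    --quantized-arch avx512_vnni \
    distilbert_onnx/
```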
Motivation
This will make it very easy to create optimized/quantized versions of the models we need.
Your contribution
I might help by submitting a PR for it, but I'm not able to give a "when" for now.