Dipoorlet is an offline quantization tool that quantizes ONNX models using a given calibration dataset:
- Support several Activation Calibration algorithms: Mse, Minmax, Hist, etc.
- Support Weight Transformation to achieve better quantization results: BiasCorrection, WeightEqualization, etc.
- Support SOTA offline finetune algorithms to improve quantization performance: Adaround, Brecq, Qdrop.
- Generate quantization parameters required for several platforms: SNPE, TensorRT, STPU, ATLAS, etc.
- Provide detailed quantization analysis to help identify accuracy bottlenecks in model quantization.
git clone https://github.com/ModelTC/Dipoorlet.git
cd Dipoorlet
python setup.py install
The project uses ONNXRuntime as its inference runtime and Pytorch as its training tool, so users must choose CUDA and CUDNN versions that work with both.
For example, ONNXRuntime==1.10.0 and Pytorch==1.10.0-1.13.0 can run under CUDA==11.4 and CUDNN==8.2.4.
For other version combinations, please visit the ONNXRuntime and Pytorch documentation.
ONNXRuntime has a bug when running in Docker with cpu-sets set. Please check the related issue.
The preprocessed calibration data must be prepared and stored in a specific directory layout. For example, if the model has two input tensors called "input_0" and "input_1", the file structure is as follows:
cali_data_dir
|
├──input_0
│ ├──0.bin
│ ├──1.bin
│ ├──...
│ └──N-1.bin
└──input_1
  ├──0.bin
  ├──1.bin
  ├──...
  └──N-1.bin
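
The layout above can be produced with a short script. The following is a minimal sketch, not Dipoorlet's own tooling: the helper name `write_calibration_data` is hypothetical, and it assumes each .bin file holds the raw float32 values of one preprocessed sample with no header, which should be checked against how your Dipoorlet version reads calibration data.

```python
from array import array
from pathlib import Path


def write_calibration_data(root, samples_by_input):
    """Write root/<input_name>/<index>.bin for every sample.

    samples_by_input maps an input tensor name to a list of samples,
    where each sample is a flat sequence of floats.
    Assumption: one file = raw float32 values, no header.
    """
    root = Path(root)
    for input_name, samples in samples_by_input.items():
        input_dir = root / input_name
        input_dir.mkdir(parents=True, exist_ok=True)
        for index, values in enumerate(samples):
            # array("f", ...) stores single-precision floats in native byte order.
            with open(input_dir / f"{index}.bin", "wb") as f:
                array("f", values).tofile(f)
```

Calling `write_calibration_data("cali_data_dir", {"input_0": [...], "input_1": [...]})` with N samples per input yields files 0.bin through N-1.bin in each subdirectory. If the samples are NumPy arrays, `sample.astype("float32").tofile(path)` writes the same raw bytes.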
python -m torch.distributed.launch --use_env -m dipoorlet -M MODEL_PATH -I INPUT_PATH -N PIC_NUM -A [mse, hist, minmax] -D [trt, snpe, rv, atlas, ti, stpu] [--bc] [--adaround] [--brecq] [--drop]
python -m dipoorlet -M MODEL_PATH -I INPUT_PATH -N PIC_NUM -A [mse, hist, minmax] -D [trt, snpe, rv, atlas, ti, stpu] [--bc] [--adaround] [--brecq] [--drop] [--slurm | --mpirun]
- Use -M to specify the ONNX model path.
- Use -A to select the activation calibration algorithm: minmax, hist, or mse.
- Use -D to select the deployment platform: trt, snpe, rv, ti...
- Use -N to specify the number of calibration pictures.
- Use -I to specify the path to the calibration pictures.
- Use -O to specify the output path.
- For hist and kl:
  --bins specifies the histogram bins.
  --threshold specifies the histogram threshold for the hist algorithm.
- Use --bc to apply Bias Correction.
- Use --we to apply weight equalization.
- Use --adaround to run offline finetuning with Adaround.
- Use --brecq to run offline finetuning with Brecq.
- Use --brecq --drop to run offline finetuning with Qdrop.
- Use --skip_layers to skip quantization of some layers.
- Use --slurm to launch the task from Slurm.
- Other options are listed by "python -m dipoorlet -h/--help".
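
When runs are scripted, the options above can be assembled programmatically. The sketch below is only an illustration: `build_dipoorlet_command` is a hypothetical helper, not part of Dipoorlet, and it uses only the flags and values listed in this document. It rejects --drop without --brecq because the text only describes --drop as part of the Qdrop combination.

```python
import sys

ALGORITHMS = ("mse", "hist", "minmax")
PLATFORMS = ("trt", "snpe", "rv", "atlas", "ti", "stpu")


def build_dipoorlet_command(model, input_dir, num, algo, platform,
                            bc=False, we=False, adaround=False,
                            brecq=False, drop=False, output=None):
    """Return the argv list for a single-process `python -m dipoorlet` run."""
    if algo not in ALGORITHMS:
        raise ValueError(f"unknown calibration algorithm: {algo}")
    if platform not in PLATFORMS:
        raise ValueError(f"unknown deploy platform: {platform}")
    if drop and not brecq:
        raise ValueError("--drop is only documented together with --brecq (Qdrop)")

    cmd = [sys.executable, "-m", "dipoorlet",
           "-M", model, "-I", input_dir, "-N", str(num),
           "-A", algo, "-D", platform]
    if output is not None:
        cmd += ["-O", output]
    # Append the optional switches in a fixed order.
    for enabled, flag in ((bc, "--bc"), (we, "--we"), (adaround, "--adaround"),
                          (brecq, "--brecq"), (drop, "--drop")):
        if enabled:
            cmd.append(flag)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.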
Quantize an ONNX model model.onnx using 100 calibration samples saved in workdir/data/, where "data" is the name of the model's input tensor. Use the "minmax" activation calibration algorithm, use "Qdrop" to perform label-free fine-tuning of the weights, and finally generate the TensorRT quantization configuration:
workdir
|
├──data
  ├──0.bin
  ├──1.bin
  ├──...
  └──99.bin
python -m torch.distributed.launch --use_env -m dipoorlet -M model.onnx -I workdir/ -N 100 -A minmax -D trt --brecq --drop
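
Before launching a run like this one, it can help to confirm that the calibration directory really contains files 0.bin through N-1.bin for every input. The following is a small sketch of such a check; `missing_calibration_files` is a hypothetical helper, not a Dipoorlet function.

```python
from pathlib import Path


def missing_calibration_files(root, input_names, count):
    """Return the expected <root>/<input>/<i>.bin paths that do not exist."""
    root = Path(root)
    missing = []
    for name in input_names:
        for index in range(count):
            path = root / name / f"{index}.bin"
            if not path.is_file():
                missing.append(path)
    return missing
```

For the example above, `missing_calibration_files("workdir", ["data"], 100)` should return an empty list when the data is complete.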