This tutorial provides step-by-step instructions on how to integrate models with Intel® Low Precision Optimization Tool.
Intel® Low Precision Optimization Tool supports three usage modes:
- Fully yaml configuration: The user specifies all information through yaml, including the dataloaders used in the calibration and evaluation phases and the quantization tuning settings.
  For this usage, only the model parameter is mandatory.
- Partial yaml configuration: The user specifies the dataloaders used in the calibration and evaluation phases by code. The tool provides built-in dataloaders and evaluators; the user just needs to provide a dataset implementing the `__iter__` or `__getitem__` method and invoke `dataloader()` with the dataset as the input parameter to create an lpot dataloader before calling `quantizer()`.
  After that, the user specifies the fp32 "model", the calibration dataset "q_dataloader", and the evaluation dataset "eval_dataloader". The calibrated and quantized model is evaluated with "eval_dataloader" using the evaluation metrics specified in the configuration file. The evaluation tells the tuner whether the quantized model meets the accuracy criteria. If not, the tuner starts a new calibration and tuning flow.
  For this usage, the model, q_dataloader, and eval_dataloader parameters are mandatory.
- Partial yaml configuration: The user specifies the dataloader used in the calibration phase by code. This usage is quite similar to the second one, except that the user specifies a custom "eval_func" which encapsulates the evaluation dataset by itself. The calibrated and quantized model is evaluated with "eval_func". The "eval_func" tells the tuner whether the quantized model meets the accuracy criteria. If not, the tuner starts a new calibration and tuning flow.
  For this usage, the model, q_dataloader, and eval_func parameters are mandatory.
The user should choose the usage that matches the code. For example, if minimal code changes are desired, the first usage is recommended. If the user wants to leverage an existing evaluation function, the third usage is recommended. If the user has no existing evaluation function and the metric used is supported by lpot, the second usage is recommended. A sketch of the second and third usages is shown below.
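The following is a minimal sketch of the second and third usages. The `Quantization` entry point is taken from the lpot Python package; the yaml file name, model path, dataset contents, and `eval_func` body are placeholders to adapt to your own project.

```python
from lpot import Quantization

class Dataset:
    """Placeholder dataset implementing __getitem__;
    each item must be an (input, label) tuple."""
    def __getitem__(self, index):
        ...
    def __len__(self):
        ...

quantizer = Quantization('./conf.yaml')  # yaml file name is a placeholder

# Second usage: create an lpot dataloader from the dataset and let the
# built-in metric specified in the yaml drive evaluation.
dataloader = quantizer.dataloader(Dataset())
q_model = quantizer('./fp32_model.pb',   # model path is a placeholder
                    q_dataloader=dataloader,
                    eval_dataloader=dataloader)

# Third usage: a custom evaluation function replaces eval_dataloader.
def eval_func(model):
    """Evaluate the model and return a scalar accuracy (placeholder body)."""
    ...

q_model = quantizer('./fp32_model.pb',
                    q_dataloader=dataloader,
                    eval_func=eval_func)
```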
Copy ptq.yaml, qat.yaml, or pruning.yaml to the working directory and modify it accordingly.
Below is an example for beginners, followed by a sketch of how to launch it.
```yaml
model:                                          # mandatory. lpot uses this model name and framework name to decide where to save the tuning history and deploy yaml.
  name: ssd_mobilenet_v1
  framework: tensorflow
  inputs: image_tensor
  outputs: num_detections,detection_boxes,detection_scores,detection_classes

tuning:
  accuracy_criterion:
    relative: 0.01
  exit_policy:
    timeout: 0
    max_trials: 300
  random_seed: 9527
```
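With such a fully specified yaml, only the model parameter is passed in code. A minimal sketch, assuming the yaml above is saved as ptq.yaml; the frozen graph path is a placeholder:

```python
from lpot import Quantization

# First usage: dataloaders, metric, and tuning settings all come from the yaml.
quantizer = Quantization('./ptq.yaml')
q_model = quantizer('./ssd_mobilenet_v1.pb')  # model path is a placeholder
```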
Below is an example for advanced users, which constrains the tuning space by specifying the calibration, quantization, and op-wise tuning fields accordingly.
```yaml
model:
  name: ssd_mobilenet_v1
  framework: tensorflow
  inputs: image_tensor
  outputs: num_detections,detection_boxes,detection_scores,detection_classes

quantization:
  calibration:
    sampling_size: 10, 50
  model_wise:
    weight:
      granularity: per_channel
      scheme: asym
      dtype: int8
      algorithm: minmax
    activation:
      granularity: per_tensor
      scheme: asym
      dtype: int8
      algorithm: minmax
  op_wise: {
    'conv1': {
      'activation': {'dtype': ['uint8', 'fp32'], 'algorithm': ['minmax', 'kl'], 'scheme': ['sym']},
      'weight': {'dtype': ['int8', 'fp32'], 'algorithm': ['kl']}
    }
  }

tuning:
  accuracy_criterion:
    relative: 0.01
  objective: performance
  exit_policy:
    timeout: 36000
    max_trials: 1000
  workspace:
    path: /path/to/saving/directory
    resume: /path/to/a/specified/snapshot/file
```
a. Check whether the calibration or evaluation dataloader in the user code meets the Intel® Low Precision Optimization Tool requirement, that is, whether it returns a tuple of (input, label). In classification networks, the dataloader usually yields output in this form. As the calibration dataset does not need labels, the user needs to wrap the loader to return a tuple of (input, _) for Intel® Low Precision Optimization Tool in this case (see the sketch after this list). In object detection, NLP, or recommendation networks, the dataloader usually does not yield output in this form, so the user needs to wrap the loader to return a tuple of (input, label), in which "input" may be an object, a tuple, or a dict.
b. Check whether the model in the user code can directly accept the "input" obtained in #a. If not, the user needs to wrap the model to take "input" as its input.
c. If the user chooses the first use case, that is, using Intel® Low Precision Optimization Tool built-in metrics, the user needs to ensure that the metric built into Intel® Low Precision Optimization Tool can take the output of the model and the label of eval_dataloader as its inputs.
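The wrapping described in steps a and b might look like the sketch below; the wrapped loader, the wrapped model, and the dict key are hypothetical stand-ins for the user's own objects:

```python
class CalibDataLoaderWrapper:
    """Step a: wrap a loader that yields bare inputs so that each item
    becomes an (input, _) tuple; calibration ignores the label."""
    def __init__(self, loader):
        self.loader = loader

    def __iter__(self):
        for data in self.loader:
            yield data, None  # label is unused during calibration

class ModelWrapper:
    """Step b: adapt a model whose call signature does not match the
    "input" yielded by the dataloader (here, input packed in a dict)."""
    def __init__(self, model):
        self.model = model

    def __call__(self, input):
        # 'image_tensor' is a hypothetical key; unpack before the real call.
        return self.model(input['image_tensor'])
```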
Supported models, grouped by framework:

| Framework  | Models |
| ---------- | ------ |
| tensorflow | resnet50v1.0, resnet50v1.5, resnet101, inception_v1, inception_v2, inception_v3, inception_v4, inception_resnet_v2, mobilenetv1, ssd_resnet50_v1, mask_rcnn_inception_v2, wide_deep_large_ds, vgg16, vgg19, resnetv2_50, resnetv2_101, resnetv2_152, densenet121, densenet161, densenet169, style_transfer, retinanet, googlenet-v3, faster_rcnn_resnet101_kitti, faster_rcnn_resnet101_ava_v2.1, faster_rcnn_resnet101_coco, vgg19-oob, faster_rcnn_resnet101_lowproposals_coco, faster_rcnn_resnet50_coco, vgg16-oob, faster_rcnn_resnet50_lowproposals_coco, rfcn-resnet101-coco, openpose-pose, googlenet-v1, resnet-50, googlenet-v2, ssd-resnet34_300x300, ssd_resnet50_v1_fpn_coco, RetinaNet50, googlenet-v4, faster_rcnn_inception_v2_coco, yolo-v2-ava-sparse-35-0001, yolo-v2-ava-sparse-70-0001, resnet-152, resnet-v2-152, resnet-101, person-vehicle-bike-detection-crossroad-yolov3-1020, squeezenet-1.1, yolo-v3, resnet-v2-101, darknet53 |
| pytorch    | resnet18, resnet50, resnext101_32x8d, bert_base_MRPC, bert_base_CoLA, bert_base_STS-B, bert_base_SST-2, bert_base_RTE, bert_large_MRPC, bert_large_SQuAD, bert_large_QNLI, bert_large_RTE, bert_large_CoLA, dlrm, resnet18_qat, resnet50_qat, inception_v3, peleenet, yolo_v3, se_resnext50_32x4d, mobilenet_v2, resnest50 |
| mxnet      | resnet50v1, inceptionv3, mobilenet1.0, mobilenetv2_1.0, resnet18_v1, squeezenet1.0, ssd-resnet50_v1, ssd-mobilenet1.0, resnet152_v1 |