YOLOv3-TensorRT-INT8-KCF is a TensorRT INT8 quantization implementation of YOLOv3 (and YOLOv3-tiny) for the NVIDIA Jetson Xavier NX board. The dataset we provide contains a red ball, so we use the detector, together with KCF, a traditional object tracking method, to drive a car that catches the red ball.
- TensorRT >= 7.0.0 (Pre-installed on NX.)
- OpenCV and opencv_contrib == 3.4.0 (See here for installation help.)
Run the following steps on a GPU server, not on the NX.
git clone https://github.com/lingffff/YOLOv3-TensorRT-INT8-KCF.git
cd YOLOv3-TensorRT-INT8-KCF
cd yolov3
# Download official pre-trained COCO darknet weights
sh weights/download_yolov3_weights.sh
Download the redball dataset here, unzip it, and replace the redball folder with it. Then start training. Remove '--tiny' to train the full YOLOv3 model.
python train.py --device 0 --tiny
Training leaves the YOLOv3(-tiny) weights in weights/best.pt. Convert them to a redball(-tiny).wts file, the weights format that the TensorRT code reads when building the inference engine.
python gen_wts.py --tiny
Then copy ./redball(-tiny).wts to the NX board.
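For readers curious about what a .wts file might contain, here is a minimal sketch. It assumes the plain-text layout used by wang-xinyu/tensorrtx, which this project credits: a first line with the tensor count, then one line per tensor with its name, its element count, and each value as the hex of a big-endian float32. The repository's own gen_wts.py may differ, and `write_wts` and `read_wts` are hypothetical helper names, not functions from this project.

```python
import struct


def write_wts(tensors, path):
    """Write {name: list of floats} in the assumed tensorrtx-style text layout."""
    with open(path, "w") as f:
        f.write(f"{len(tensors)}\n")
        for name, values in tensors.items():
            # Each value becomes the hex string of its big-endian float32 bytes.
            words = [struct.pack(">f", float(v)).hex() for v in values]
            f.write(f"{name} {len(values)} " + " ".join(words) + "\n")


def read_wts(path):
    """Read the file back into {name: list of floats}."""
    tensors = {}
    with open(path) as f:
        count = int(f.readline())
        for _ in range(count):
            name, size, *words = f.readline().split()
            if int(size) != len(words):
                raise ValueError(f"size mismatch for tensor {name}")
            tensors[name] = [struct.unpack(">f", bytes.fromhex(w))[0] for w in words]
    return tensors
```

A text format like this needs no PyTorch on the NX side: the C++ code can parse it with plain file I/O and hand the raw floats to the TensorRT network builder.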
Run the following steps on the NX.
git clone https://github.com/lingffff/YOLOv3-TensorRT-INT8-KCF.git
cd YOLOv3-TensorRT-INT8-KCF
# Put redball(-tiny).wts in YOLOv3-TensorRT-INT8-KCF
Build the project.
mkdir build
cd build
# YOLOv3: -DTINY=OFF, tiny: -DTINY=ON
cmake -DTINY=ON ..
make -j$(nproc)
The build produces two executables, build_engine and detect.
Run build_engine to build the inference engine. Use the -s argument to select the precision: int8, fp16, or fp32 (default).
./build_engine -s int8
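To give a feel for what the int8 option trades away, here is a minimal sketch of symmetric INT8 quantization. It maps floats to integers in [-127, 127] with a single scale, then maps them back. This is an illustration only: it picks the scale from the maximum absolute value, whereas TensorRT's calibrators choose the range from calibration data, and the function names are hypothetical.

```python
def int8_scale(values):
    """Scale that maps the largest magnitude onto 127 (simple max calibration)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round each value to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map INT8 steps back to approximate float values."""
    return [q * scale for q in quantized]


activations = [-1.27, 0.0, 0.5, 1.27]
scale = int8_scale(activations)
restored = dequantize(quantize(activations, scale), scale)
# For values inside the calibrated range, the round-trip error is at most scale / 2.
worst_error = max(abs(a - r) for a, r in zip(activations, restored))
```

The smaller the calibrated range, the finer the steps, which is why the choice of range matters for accuracy.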
Run detect on pictures or on the camera video stream. The other options below also let you try the KCF tracking method.
./detect -d ../samples
Options:
Argument | Description |
---|---|
-d <folder> | Detect pictures in the folder. |
-v | Detect camera video stream. |
-t | Detect video along with KCF tracking method. |
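One common way to combine a detector with a tracker such as KCF is to detect once, track for a few frames, and fall back to detection when the tracker loses the target. The sketch below shows that scheduling logic only. It is not taken from this repository's detect program: `detect_and_track`, `detect`, `make_tracker`, and `max_tracked_frames` are hypothetical names, and the tracker is assumed to expose an `update(frame)` method returning `(ok, box)`, the shape OpenCV's tracker classes use.

```python
def detect_and_track(frames, detect, make_tracker, max_tracked_frames=10):
    """Yield (frame_index, box, source); source is 'detect', 'track' or 'none'."""
    tracker = None
    tracked = 0  # consecutive frames handled by the tracker since the last detection
    for index, frame in enumerate(frames):
        box, source = None, "none"
        if tracker is not None and tracked < max_tracked_frames:
            ok, tracked_box = tracker.update(frame)
            if ok:
                box, source = tracked_box, "track"
                tracked += 1
            else:
                tracker = None  # target lost: fall through to detection
        if source == "none":
            # Either there is no tracker, it failed, or it is time to re-detect.
            tracker = None
            tracked = 0
            box = detect(frame)
            if box is not None:
                tracker = make_tracker(frame, box)
                source = "detect"
        yield index, box, source
```

Re-detecting periodically limits tracker drift, while tracking in between keeps the per-frame cost below that of running the network on every frame.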
Models | Device | BatchSize | Mode | Input Size | Speed |
---|---|---|---|---|---|
YOLOv3 | NX | 1 | FP32 | 416x416 | 85ms |
YOLOv3 | NX | 1 | FP16 | 416x416 | 30ms |
YOLOv3 | NX | 1 | INT8 | 416x416 | 26ms |
YOLOv3-tiny | NX | 1 | FP32 | 416x416 | 26ms |
YOLOv3-tiny | NX | 1 | FP16 | 416x416 | 19ms |
YOLOv3-tiny | NX | 1 | INT8 | 416x416 | 20ms |
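Wow! FP16 is amazing: it cuts YOLOv3 from 85ms to 30ms, close to INT8 (26ms), and on YOLOv3-tiny it is even slightly faster than INT8 (19ms vs. 20ms).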
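The table is easier to compare as frames per second and speedup over FP32. The sketch below does that arithmetic, assuming the Speed column is the per-frame latency at batch size 1, so FPS is simply 1000 divided by the milliseconds. The helper names are hypothetical.

```python
# Latencies in milliseconds, copied from the table above.
latency_ms = {
    ("YOLOv3", "FP32"): 85,
    ("YOLOv3", "FP16"): 30,
    ("YOLOv3", "INT8"): 26,
    ("YOLOv3-tiny", "FP32"): 26,
    ("YOLOv3-tiny", "FP16"): 19,
    ("YOLOv3-tiny", "INT8"): 20,
}


def fps(model, mode):
    """Frames per second, assuming one frame per measured latency."""
    return 1000.0 / latency_ms[(model, mode)]


def speedup(model, mode, baseline="FP32"):
    """How many times faster a mode is than the baseline for the same model."""
    return latency_ms[(model, baseline)] / latency_ms[(model, mode)]


for (model, mode) in latency_ms:
    print(f"{model:12s} {mode}: {fps(model, mode):5.1f} FPS, "
          f"{speedup(model, mode):.2f}x vs FP32")
```

On these numbers the full YOLOv3 model gains far more from reduced precision than YOLOv3-tiny does, and for the tiny model INT8 gives no gain over FP16.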
- Convert weights to TensorRT in a more common way, such as through ONNX.
- Run detection and tracking in separate threads.
- Implement a quantization and inference framework myself.
YOLOv3 PyTorch implementation from ultralytics/yolov3.
YOLOv3 TensorRT implementation from wang-xinyu/tensorrtx.
TensorRT INT8 implementation from NVIDIA/TensorRT/samples/sampleINT8.
With my sincere appreciation!
Just call me Al (not ai but al. LOL.) / Albert / lingff.
E-mail: ling@stu.pku.edu.cn
Gitee: https://gitee.com/lingff
CSDN: https://blog.csdn.net/weixin_43214408