👋 Hi, I’m ThomasVonWu. I'd like to introduce a simple and practical repository that uses an end-to-end model with a sparse transformer to perceive 3D obstacles. This repo has no complex dependencies for Training | Inference | Deployment (which means there is no need to install MMDetection3D, mmcv, mmcv-full, mmdeploy, etc.), so it is easy to install on your local workstation or on supercomputing GPU clusters. This repository also provides x86 (NVIDIA RTX-series GPU) | ARM (NVIDIA Orin) deployment solutions. Finally, you can happily deploy your e2e model on board through this repo.
👀 I guess you are interested in:
1. how to define the PyTorch custom operation DeformableFeatureAggregation and register the corresponding ONNX custom operator?
2. how to build the DeformableFeatureAggregation custom plugin for a TensorRT engine?
3. how to export ONNX models with custom operations to a TensorRT engine?
4. how to validate inference-result consistency: PyTorch vs. ONNX vs. TensorRT?
5. how to deploy a temporal fusion transformer head successfully?
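For the first question, the general pattern looks like the sketch below: a `torch.autograd.Function` whose `forward` gives a pure-PyTorch reference implementation, plus a `symbolic` method that emits a custom ONNX node for the TensorRT plugin to match by name. This is a simplified single-view, single-scale stand-in with a hypothetical signature; the repo's real operator samples multi-view, multi-scale features.

```python
import torch
import torch.nn.functional as F
from torch.autograd import Function


class DeformableFeatureAggregationOp(Function):
    """Simplified sketch of a deformable feature aggregation custom op.

    forward() is the PyTorch reference used for inference and as the
    numerical baseline when checking the TensorRT plugin; symbolic()
    controls what torch.onnx.export emits for this op.
    """

    @staticmethod
    def forward(ctx, features, points, weights):
        # features: (B, C, H, W) feature map
        # points:   (B, N, 2) sampling locations, normalized to [-1, 1]
        # weights:  (B, N) per-point aggregation weights
        sampled = F.grid_sample(
            features, points.unsqueeze(2), align_corners=False
        )                              # (B, C, N, 1) bilinearly sampled
        sampled = sampled.squeeze(-1)  # (B, C, N)
        # Weighted sum over the sampled keypoints -> one feature per batch.
        return (sampled * weights.unsqueeze(1)).sum(-1)  # (B, C)

    @staticmethod
    def symbolic(g, features, points, weights):
        # Emit a node in a custom ONNX domain; the TensorRT plugin is
        # later matched to this node by its name.
        return g.op("custom::DeformableFeatureAggregation",
                    features, points, weights)
```

Calling `DeformableFeatureAggregationOp.apply(...)` inside the model means `torch.onnx.export` traces through `symbolic` and writes the custom node into the graph instead of decomposing the op.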
Model | ImgSize | Backbone | Framework | Precision | mAP | NDS | FPS | GPU | config | ckpt | onnx | engine |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sparse4Dv3 | 256x704 | ResNet50 | PyTorch | FP32 | 56.37 | 70.97 | 19.8 | NVIDIA GeForce RTX 3090 | config | ckpt | -- | -- |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | FP32 | TBD | TBD | TBD | NVIDIA GeForce RTX 3090 | config | ckpt | TBD | TBD |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | FP16 | TBD | TBD | TBD | NVIDIA GeForce RTX 3090 | config | ckpt | TBD | TBD |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | INT8+FP16 | TBD | TBD | TBD | NVIDIA GeForce RTX 3090 | config | ckpt | TBD | TBD |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | FP32 | TBD | TBD | TBD | NVIDIA ORIN | config | ckpt | TBD | TBD |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | FP16 | TBD | TBD | TBD | NVIDIA ORIN | config | ckpt | TBD | TBD |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | INT8+FP16 | TBD | TBD | TBD | NVIDIA ORIN | config | ckpt | TBD | TBD |
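The FP32, FP16, and INT8+FP16 rows in the table correspond to engines built with different precision flags. A minimal sketch of assembling the `trtexec` command line for each mode, including loading the custom-plugin library (file names and the helper itself are hypothetical; the repo's actual export scripts may differ):

```python
def build_trtexec_cmd(onnx_path, engine_path, precision="fp32", plugin_lib=None):
    """Assemble a trtexec invocation for the given precision mode.

    trtexec ships with TensorRT; --fp16/--int8 enable reduced-precision
    kernels, and --plugins loads a shared library containing custom
    plugins (e.g., the DeformableFeatureAggregation plugin).
    """
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if precision == "fp16":
        cmd.append("--fp16")
    elif precision == "int8+fp16":
        cmd += ["--int8", "--fp16"]  # mixed INT8+FP16, as in the table above
    if plugin_lib is not None:
        cmd.append(f"--plugins={plugin_lib}")
    return cmd
```

For example, `subprocess.run(build_trtexec_cmd("sparse4dv3.onnx", "sparse4dv3_fp16.engine", "fp16", "libdfa_plugin.so"))` would build the FP16 engine on a machine with TensorRT installed.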
25 Aug, 2024
: I released the repo SparseEnd2End. The complete deployment solution will be released as soon as possible. Please stay tuned!

- Register the custom operator DeformableFeatureAggregation and export the ONNX model and TensorRT engine. (25 Aug, 2024)
- Verify inference-result consistency: DeformableFeatureAggregation PyTorch implementation vs. TensorRT plugin implementation. (25 Aug, 2024)
- Accelerate inference using CUDA shared memory and CUDA FP16 in the DeformableFeatureAggregation plugin implementation. (8 Sep, 2024)
- Export the SparseTransFormer backbone ONNX model and TensorRT engine. (8 Sep, 2024)
- Verify inference-result consistency: SparseTransFormer backbone PyTorch implementation vs. ONNX Runtime vs. TensorRT engine. (8 Sep, 2024)
- Export the SparseTransFormer head ONNX model and TensorRT engine.
- Verify inference-result consistency: SparseTransFormer head PyTorch implementation vs. TensorRT engine.
- Accelerate inference using FlashAttention in place of MultiheadAttention.
- Accelerate inference using FP16/INT8 in place of FP32 in the TensorRT engine.
- Implement image pre-processing, InstanceBank caching, and model post-processing in C++.
- Accelerate inference: implement image pre-processing, InstanceBank caching, and model post-processing in CUDA.
- Onboard: full-pipeline inference using CUDA, TensorRT, and C++.
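Several of the milestones above verify inference-result consistency across PyTorch, ONNX Runtime, and TensorRT. One common way to do this is to dump each backend's output to an array and compare maximum absolute error and cosine similarity against the PyTorch baseline. A minimal NumPy sketch (the tolerances and metric choices here are assumptions, not the repo's actual thresholds):

```python
import numpy as np


def check_consistency(ref, other, atol=1e-3, rtol=1e-3):
    """Compare a baseline output (e.g., PyTorch) against another backend's
    output (e.g., ONNX Runtime or a TensorRT engine).

    Returns (passed, max_abs_err, cosine_similarity).
    """
    ref = np.asarray(ref, dtype=np.float64).ravel()
    other = np.asarray(other, dtype=np.float64).ravel()
    max_abs_err = float(np.max(np.abs(ref - other)))
    cos = float(ref @ other /
                (np.linalg.norm(ref) * np.linalg.norm(other) + 1e-12))
    passed = bool(np.allclose(ref, other, atol=atol, rtol=rtol))
    return passed, max_abs_err, cos
```

FP32 engines are typically expected to match the PyTorch baseline within a tight tolerance, while FP16 and INT8 engines need looser thresholds and are often judged by cosine similarity instead.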
SparseEnd2End is a Sparse-Centric paradigm for end-to-end autonomous driving perception.
If you find SparseEnd2End useful in your research or applications, please consider giving it a star 🌟
08/25/2024: [v1.0.0] This repo now supports Training | Inference with NuscenesDataset. It includes: data dumping in JSON, Training | Inference log caching, TensorBoard hooks, and more.