TensorRT Execution Provider for ONNX Runtime v1.0 Release, based on the TensorRT 10.0 Release and ONNX Runtime 1.19.2 Release
This repository ships prebuilt ONNX Runtime binaries in multiple configurations targeting NVIDIA products. For now, the following configs are provided:
- Windows
- DirectML+TensorRT+CUDA minimal
- DirectML+TensorRT+CUDA
- Linux
- TensorRT+CUDA minimal
The CUDA minimal build features a minimized CUDA build that makes the CUDA EP merely a utility provider for the TensorRT EP, handling functionality such as memory allocation and device management. For users who want TensorRT as the main EP and do not need CUDA EP fallback, this config shrinks the CUDA EP by roughly 6x, because:
- GPU kernels in the CUDA EP that run ONNX ops do not have to be compiled
- Dependencies of the CUDA EP such as cuFFT, cuRAND, cuBLAS, and cuDNN are dropped
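With this setup, the TensorRT EP is listed first so it handles the ONNX ops, while the CUDA EP serves only as a utility provider. A minimal sketch of the provider configuration in Python is below; the model path and cache directory are placeholder assumptions, and the session creation is commented out since it requires a GPU with the TensorRT libraries installed:

```python
# Sketch: TensorRT as the main EP with the CUDA-minimal build.
# Provider names are the standard ONNX Runtime identifiers;
# "model.onnx" and "./trt_cache" are placeholders.

providers = [
    ("TensorrtExecutionProvider", {
        "trt_engine_cache_enable": True,     # cache built engines on disk
        "trt_engine_cache_path": "./trt_cache",
    }),
    # With the CUDA-minimal build, the CUDA EP only supplies utilities
    # (memory allocation, device management); it cannot run ONNX kernels,
    # so there is no CUDA EP fallback for unsupported ops.
    "CUDAExecutionProvider",
]

# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)

print([p[0] if isinstance(p, tuple) else p for p in providers])
```

Listing the CUDA EP after the TensorRT EP keeps the provider priority explicit even though, in this build, it cannot take over any kernels.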
The only dependencies for CUDA minimal + TRT are nvinfer, nvonnxparser, nvinfer_builder_resource, and cudart. If a further decrease in shipping size is needed, the nvinfer_builder_resource library (over 1 GB) can be omitted by using ONNX embedded engines or ONNX embedded weightless engines. Without this library, the TensorRT EP can no longer build new engines, only load existing ones.