TensorRT Execution Provider for ONNX Runtime v1.0 Release, based on the TensorRT 10.0 Release and ONNX Runtime 1.19.2 Release
This repository ships prebuilt ONNX Runtime binaries in multiple configurations targeting NVIDIA products. For now, the following configs are provided:
- Windows
- DirectML+TensorRT+CUDA minimal
- DirectML+TensorRT+CUDA
- Linux
- TensorRT+CUDA minimal
The CUDA minimal build features a minimized CUDA build that makes the CUDA EP merely a utility provider for the TensorRT EP, handling functionality such as memory allocation and device management. For users who want TensorRT as the main EP and do not need CUDA EP fallback, this config shrinks the CUDA EP by roughly 6x, because:
- GPU kernels in the CUDA EP that run ONNX ops do not have to be compiled
- Dependencies of the CUDA EP such as cuFFT, cuRAND, cuBLAS, and cuDNN are dropped
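With this setup, the TensorRT EP is listed first so it handles the ONNX ops, while the CUDA EP serves only as a utility provider. A minimal sketch of the provider configuration in Python is below; the model path and cache directory are placeholder assumptions, and the session creation is commented out since it requires a GPU with the TensorRT libraries installed:

```python
# Sketch: TensorRT as the main EP with the CUDA-minimal build.
# Provider names are the standard ONNX Runtime identifiers;
# "model.onnx" and "./trt_cache" are placeholders.

providers = [
    ("TensorrtExecutionProvider", {
        "trt_engine_cache_enable": True,     # cache built engines on disk
        "trt_engine_cache_path": "./trt_cache",
    }),
    # With the CUDA-minimal build, the CUDA EP only supplies utilities
    # (memory allocation, device management); it cannot run ONNX kernels,
    # so there is no CUDA EP fallback for unsupported ops.
    "CUDAExecutionProvider",
]

# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)

print([p[0] if isinstance(p, tuple) else p for p in providers])
```

Listing the CUDA EP after the TensorRT EP keeps the provider priority explicit even though, in this build, it cannot take over any kernels.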
The only dependencies for CUDA minimal + TRT are nvinfer, nvonnxparser, nvinfer_builder_resource, and cudart. If a further decrease in shipping size is needed, the nvinfer_builder_resource library (over 1 GB) can be omitted by using ONNX embedded engines or ONNX embedded weightless engines. Without this library, the TensorRT EP can no longer build new engines, only load existing ones.