[TensorRT EP] Add new provider option to exclude ops from running on TRT #23705

Merged (9 commits) on Feb 21, 2025

Contributor

@chilo-ms chilo-ms commented Feb 14, 2025

This PR removes the implicit filtering that kept DDS (data-dependent shape) ops from running on TRT. In other words, DDS nodes will now run on TRT by default if TRT supports them.

It also adds a new provider option, trt_op_types_to_exclude:

  • Users can provide a list of op types to exclude from running on TRT
  • e.g. trt_op_types_to_exclude="NonMaxSuppression,NonZero,RoiAlign"
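
To illustrate what excluding op types means, here is a minimal Python sketch of the idea. It is not the EP's actual implementation (which lives in the C++ provider code); the helper names `parse_excluded_op_types` and `split_nodes`, the node tuples, and the whitespace trimming are all assumptions made for illustration.

```python
def parse_excluded_op_types(option_value: str) -> set[str]:
    """Split a comma-separated op type list into a set.

    Trimming whitespace and skipping empty entries is a choice made for
    this sketch; the real provider option may parse differently.
    """
    return {name.strip() for name in option_value.split(",") if name.strip()}


def split_nodes(nodes, excluded):
    """Partition (node_name, op_type) pairs into TRT candidates and fallback nodes."""
    trt_candidates = [n for n in nodes if n[1] not in excluded]
    fallback = [n for n in nodes if n[1] in excluded]
    return trt_candidates, fallback


excluded = parse_excluded_op_types("NonMaxSuppression,NonZero,RoiAlign")

# Hypothetical graph nodes, given as (node_name, op_type).
nodes = [
    ("conv_0", "Conv"),
    ("nms_0", "NonMaxSuppression"),
    ("relu_0", "Relu"),
    ("nonzero_0", "NonZero"),
]

trt_candidates, fallback = split_nodes(nodes, excluded)
print("TRT candidates:", trt_candidates)
print("Fallback (other EPs):", fallback)
```

Nodes left in the candidate list are still subject to TRT's own support checks; the option only takes the listed op types out of consideration.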

(This PR essentially adds back a feature whose merge was previously put on hold.)

[Note]
There may be potential performance issues in TRT 10 when running models that contain DDS operations such as NonMaxSuppression, NonZero, and RoiAlign (e.g., Faster-RCNN).
If users encounter significant performance degradation, we suggest excluding those DDS ops from running on TRT, i.e. trt_op_types_to_exclude="NonMaxSuppression,NonZero,RoiAlign". The excluded DDS nodes will then be run by the CUDA EP or the CPU.
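
The workaround in the note could be set up from the ONNX Runtime Python API roughly as follows. This is a sketch under assumptions: it presumes an ONNX Runtime build with the TensorRT EP that includes this PR, the helper names `make_providers` and `create_session` are hypothetical, and the model path is whatever the caller supplies.

```python
def make_providers(excluded_op_types):
    """Build a provider list that keeps the given op types off TRT.

    The CUDA EP and CPU EP are listed after TRT so that excluded nodes
    have somewhere to fall back to.
    """
    trt_options = {"trt_op_types_to_exclude": ",".join(excluded_op_types)}
    return [
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]


def create_session(model_path, excluded_op_types):
    """Create an inference session with the exclusion list applied."""
    # Imported lazily so the provider list can be built and inspected
    # on machines where onnxruntime is not installed.
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path, providers=make_providers(excluded_op_types)
    )


providers = make_providers(["NonMaxSuppression", "NonZero", "RoiAlign"])
print(providers)
```

Comparing latency with and without the exclusion list on the same model is a reasonable way to decide whether the option is needed.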

@chilo-ms
Contributor Author

This PR also modifies the NonZero and NMS unit tests to accommodate TRT's output.

@yf711
Contributor

yf711 commented Feb 21, 2025

I've tested on Windows with FasterRCNN/MaskRCNN and saw no latency regression after enabling DDS.

@chilo-ms chilo-ms merged commit 23f787e into main Feb 21, 2025
96 of 98 checks passed
@chilo-ms chilo-ms deleted the chi/trt_ops_to_exclude branch February 21, 2025 18:24