feat: Cudagraphs integration for Torch-TRT + Non-default Stream Utilization #2881
Conversation
@@ -168,27 +210,66 @@ std::vector<at::Tensor> execute_engine(std::vector<at::Tensor> inputs, c10::intr
auto dims = core::util::toVec(out_shape);
auto type = util::TRTDataTypeToScalarType(compiled_engine->exec_ctx->getEngine().getTensorDataType(name.c_str()));
outputs[pyt_idx] = std::move(at::empty(dims, {at::kCUDA}).to(type).contiguous());

// In cudagraphs mode, the allocated output buffers are stored for reuse
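For context, a minimal Python sketch of the shape-keyed buffer reuse this comment refers to, combined with the cache invalidation mentioned in the PR description. The `OutputBufferCache` class and its `get` signature are hypothetical illustrations, not the actual Torch-TRT code:

```python
import torch

class OutputBufferCache:
    """Hypothetical illustration of shape-keyed output buffer reuse with
    cache invalidation; not the actual Torch-TRT implementation."""

    def __init__(self):
        self._shape_key = None
        self._buffers = []

    def get(self, shapes, dtypes):
        key = tuple(map(tuple, shapes))
        if key != self._shape_key:
            # A new shape key invalidates the cache: reallocate the output
            # buffers (in cudagraphs mode this would also force a re-capture).
            self._buffers = [
                torch.empty(s, dtype=d, device="cuda")
                for s, d in zip(shapes, dtypes)
            ]
            self._shape_key = key
        return self._buffers
```

On a cache hit the same device allocations are handed back, which is what lets a captured graph keep reading and writing fixed addresses.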
Is there a downside to always using the buffers?
Also, what is the required lifetime of these buffers? Can they be deallocated at any point, or do they need to be reserved for as long as the CUDA graph is active?
If not in cudagraphs mode, the main downside is that the buffers need to be resized/reallocated in dynamic shape use cases.
My understanding is that these I/O buffers need to be reserved for the duration that the CUDA graph is active.
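For readers unfamiliar with the constraint being discussed: a captured CUDA graph bakes the device addresses of its inputs and outputs into the recorded kernel launches, so those tensors must stay alive for every replay. A standalone sketch with plain `torch.cuda.CUDAGraph` (not Torch-TRT code; `f`, `static_in`, and `static_out` are illustrative names):

```python
import torch

def f(x):
    return x * 2 + 1

static_in = torch.zeros(8, device="cuda")  # must outlive the graph

# Warm up on a side stream before capture, per the PyTorch docs.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    f(static_in)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = f(static_in)

# Replay always reads/writes the same device addresses: copy new data
# into static_in, replay, then read results out of static_out.
static_in.copy_(torch.arange(8.0, device="cuda"))
g.replay()
print(static_out)  # freeing static_in/static_out would invalidate replays
```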
LGTM
LGTM, pending rebase
LGTM
- Add option to enable Cudagraphs in the runtime for additional acceleration of TRT engines
- Add C++ and Python toggles as well as full integration for C++ and Python runtimes
- Add support for dynamic shape cases via shape keys with cache invalidation
- Add test cases for cudagraphs support
- Add support for non-default streams (sketched below)
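A hedged usage sketch for the non-default stream item above: a compiled module can be launched on a side stream that the caller synchronizes with afterwards. Here `compiled_module` is a plain `nn.Linear` stand-in for a Torch-TRT compiled model, and the PR's actual cudagraphs toggle API is deliberately not shown:

```python
import torch

# Stand-in for a Torch-TRT compiled module (illustrative only).
compiled_module = torch.nn.Linear(16, 16).cuda()
inp = torch.randn(4, 16, device="cuda")

side_stream = torch.cuda.Stream()  # non-default stream
side_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side_stream):
    out = compiled_module(inp)
# Synchronize before the default stream consumes `out`.
torch.cuda.current_stream().wait_stream(side_stream)
print(out.shape)
```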
Description
Addresses #2736