feat(autoware_tensorrt_yolox): add GPU - CUDA device option #8245
Conversation
Thank you for contributing to the Autoware project! 🚧 If your pull request is in progress, switch it to draft mode.
Force-pushed the branch from 7572d4f to 70c43ba.
@ismetatabay Please make sure to submit the PR after all the CIs are passing (the JSON schema check is failing).
Codecov Report — Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #8245      +/-   ##
==========================================
- Coverage   24.11%   23.99%    -0.13%
==========================================
  Files        1399     1397        -2
  Lines      102423   102224      -199
  Branches    38926    38778      -148
==========================================
- Hits        24702    24529      -173
+ Misses      75223    75187       -36
- Partials     2498     2508       +10
```

*This pull request uses carry forward flags. ☔ View full report in Codecov by Sentry.*
Resolved review thread (outdated): perception/autoware_tensorrt_yolox/include/autoware/tensorrt_yolox/tensorrt_yolox_node.hpp
```diff
@@ -90,6 +91,10 @@ TrtYoloXNode::TrtYoloXNode(const rclcpp::NodeOptions & node_options)
   const tensorrt_common::BatchConfig batch_config{1, 1, 1};
   const size_t max_workspace_size = (1 << 30);

+  if (!setCudaDeviceId(gpu_id_)) {
```
@ismetatabay
We appreciate your contribution. I guess engine generation by TensorRT would better be performed on the target device (GPU) on which inference will run. From this perspective, I wonder whether we need to set the target device before engine generation; I would appreciate it if you could consider that order of operations.
@manato -san, thank you for your valuable comments. Since I am using three GPUs of the same model for our tests, the engine file created for GPU 0 (the default) is also valid for the remaining two GPUs. If there were a GPU of a different model, it would be a problem. Therefore, we could update the tensorrt_common package to handle different GPUs as well. What do you think about that?
@ismetatabay
Thank you for your consideration!
Yes, my main concern was exactly such environments, where GPUs of different models exist. Regarding building the engine on a specific device, it seems to be enough to call cudaSetDevice() before the engine builder starts working, according to the TensorRT FAQ ("Q: How do I use TensorRT on multiple GPUs?"). I believe you can confirm this behavior by monitoring which GPU is used with nvidia-smi during engine building. If this is the case, I think we don't need to modify tensorrt_common, because we just need to call cudaSetDevice before constructing tensorrt_common::TrtCommon.
Sorry for repeatedly asking, but could you consider moving this CUDA-related code into tensorrt_yolox.cpp for further encapsulation, so that other nodes that use YOLOX as a module can benefit from your improvement?
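For reference, a minimal sketch of the device-selection helper being discussed, assuming a signature like the setCudaDeviceId seen in the diff above (the validation logic is illustrative, not the PR's exact code):

```cpp
#include <cuda_runtime_api.h>

// Bind the calling thread to the requested GPU so that subsequent CUDA and
// TensorRT work on this thread (including engine building) targets that device.
bool setCudaDeviceId(const int gpu_id)
{
  int device_count = 0;
  if (cudaGetDeviceCount(&device_count) != cudaSuccess) {
    return false;  // CUDA runtime unavailable
  }
  if (gpu_id < 0 || gpu_id >= device_count) {
    return false;  // requested device does not exist on this machine
  }
  return cudaSetDevice(gpu_id) == cudaSuccess;
}
```

Calling this before constructing tensorrt_common::TrtCommon is what makes the engine build happen on the selected GPU, per the TensorRT FAQ cited above.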
Thank you @manato -san, you are right. I am okay with moving everything to tensorrt_yolox.cpp. With this update, traffic_light_fine_detector can also select the GPU, as you mentioned. However, if I move everything into the TrtYoloX class, I still get an illegal memory access error from CUDA unless I call cudaSetDevice before creating the trt_yolox_ object in the TrtYoloXNode constructor. Do you have any suggestions on that?
autoware.universe/perception/autoware_tensorrt_yolox/src/tensorrt_yolox_node.cpp
Lines 92 to 97 in 3e79b8f

```cpp
trt_yolox_ = std::make_unique<tensorrt_yolox::TrtYoloX>(
  model_path, precision, label_map_.size(), score_threshold, nms_threshold, build_config,
  preprocess_on_gpu, calibration_image_list_path, norm_factor, cache_dir, batch_config,
  max_workspace_size, color_map_path);
```
Thank you @ismetatabay -san for your thoughtful consideration!
I guess the cause of the issue you observed is this line: it attempts to create a cudaStream on the default device (0) during construction of the instance, and that stream is later used for memory copies before/after inference. Could you please try moving the makeCudaStream() call to after the device is set in the class constructor, and see if it makes a difference?
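A simplified ordering sketch of that suggestion (the class and member names are illustrative assumptions; only the set-device-before-creating-the-stream idea comes from the discussion):

```cpp
#include <cuda_runtime_api.h>

// Create the CUDA stream only after cudaSetDevice(); a stream created on the
// default device 0 and then used for copies on another GPU can surface as an
// illegal memory access error.
class TrtYoloXSketch
{
public:
  explicit TrtYoloXSketch(const int gpu_id)
  {
    cudaSetDevice(gpu_id);       // 1) bind this thread to the target GPU
    cudaStreamCreate(&stream_);  // 2) the stream is now created on that GPU
  }
  ~TrtYoloXSketch() { cudaStreamDestroy(stream_); }

private:
  cudaStream_t stream_{};
};
```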
Hello @manato -san, sorry for the late update. I was on leave and only had a chance to update it today. I have moved the changes into tensorrt_yolox. Could you please check it? Thank you 🙌
Hi @ismetatabay -san, thank you for your continuous updates. The modifications look good to me!
Force-pushed the branch from 136a5ff to e581f64.
Force-pushed the branch from 941c3f9 to 6e136f5.
/review
/improve
PR Reviewer Guide 🔍
Resolved review thread (outdated): perception/autoware_traffic_light_fine_detector/src/traffic_light_fine_detector_node.cpp
Resolved review threads (outdated, ×2): perception/autoware_tensorrt_yolox/include/autoware/tensorrt_yolox/tensorrt_yolox.hpp
Force-pushed the branch from 1e1688d to a8a998a.
Thank you for the changes.
Finally, could you please perform a functionality check with all the changes included and document the verification method and results?
Force-pushed the branch from e0537f1 to c29d6bc.
Hello @Shin-kyoto -san, @manato -san, @Owen-Liuyuxuan -san, thank you for your review and valuable comments. Today I tested the latest version of the PR on our test vehicle, and everything worked as expected. The vehicle has 8 cameras and 3 GPUs; I divided the 8 cameras between GPU 1 and GPU 2 (4 cameras each) for the tests. The nvidia-smi output is shown in the following image. I used the following launch files to launch the tensorrt_yolox package on our vehicle. If it is okay with you, I think we can merge this PR.
I have no further issues.
LGTM
- I confirmed that the content of this PR is enough to merge.
- I did NOT check this PR by running autoware.
@ismetatabay
Thank you for your continuous updates to address my concerns. LGTM!
I confirmed that this modification causes no problems, at least on an x86 laptop with 1 GPU.
Description
This PR adds a parameter to specify which GPU will be used for tensorrt_yolox inference on machines with multiple GPU devices.
Related links
Parent issue:
tensorrt_yolox node does not support GPU selection #8178
How was this PR tested?
This PR has been tested with 8 cameras on a computer with 3 GPUs. Since GPU 0 is used for the lidar centerpoint and other workloads, the 8 cameras are divided between GPU 1 and GPU 2. The nvidia-smi output can be found in the following image.
Notes for reviewers
None.
Interface changes
| Parameter | Type | Default value |
| --- | --- | --- |
| gpu_id | int | 0 |
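As a hedged illustration of the new interface (the parameter name, type, and default come from the table above; the node below is a sketch, not the PR's implementation):

```cpp
#include <memory>
#include <rclcpp/rclcpp.hpp>

// Minimal rclcpp node that declares and reads the gpu_id parameter.
class YoloXGpuDemo : public rclcpp::Node
{
public:
  YoloXGpuDemo() : Node("tensorrt_yolox_demo")
  {
    // "gpu_id" selects the CUDA device used for inference; defaults to 0.
    const int gpu_id = static_cast<int>(declare_parameter<int64_t>("gpu_id", 0));
    RCLCPP_INFO(get_logger(), "Running YOLOX inference on CUDA device %d", gpu_id);
  }
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<YoloXGpuDemo>());
  rclcpp::shutdown();
  return 0;
}
```

At launch time the device can then be overridden per node, e.g. `--ros-args -p gpu_id:=1`, which is how the 8-cameras/3-GPUs test above could split cameras across GPU 1 and GPU 2.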
Effects on system behavior
None.