diff --git a/.gitattributes b/.gitattributes index 5ca8b9a..5f9f974 100644 --- a/.gitattributes +++ b/.gitattributes @@ -1,3 +1,6 @@ +# Ignore Python files in linguist +*.py linguist-detectable=false + # Images *.gif filter=lfs diff=lfs merge=lfs -text *.jpg filter=lfs diff=lfs merge=lfs -text @@ -16,5 +19,10 @@ *.so filter=lfs diff=lfs merge=lfs -text *.so.* filter=lfs diff=lfs merge=lfs -text +# ROS Bags +**/resources/**/*.zstd filter=lfs diff=lfs merge=lfs -text +**/resources/**/*.db3 filter=lfs diff=lfs merge=lfs -text +**/resources/**/*.yaml filter=lfs diff=lfs merge=lfs -text + # Model files -*.onnx filter=lfs diff=lfs merge=lfs -text \ No newline at end of file +*.onnx filter=lfs diff=lfs merge=lfs -text diff --git a/README.md b/README.md index dfee85b..4b6e37e 100644 --- a/README.md +++ b/README.md @@ -1,24 +1,34 @@ # Isaac ROS DNN Inference -
Isaac ROS DNN Inference Sample Output (DOPE)Isaac ROS DNN Inference Sample Output (PeopleSemSegnet)
+
bounding box for people detection segmentation mask for people detection
--- + ## Webinar Available + Learn how to use this package by watching our on-demand webinar: [Accelerate YOLOv5 and Custom AI Models in ROS with NVIDIA Isaac](https://gateway.on24.com/wcc/experience/elitenvidiabrill/1407606/3998202/isaac-ros-webinar-series) --- ## Overview -This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom models. One node uses the TensorRT SDK, while the other uses the Triton SDK. This repository also contains a node to preprocess images, and convert them into tensors for use by TensorRT and Triton. +Isaac ROS DNN Inference contains ROS 2 packages for performing DNN inference, providing AI-based perception for robotics applications. DNN inference uses a pre-trained DNN model to ingest an input Tensor and output a prediction to an output Tensor. + +
graph of ROS nodes for DNN inference on images
+ +Above is a typical graph of nodes for DNN inference on image data. The input image is resized to match the input resolution of the DNN; the image resolution may be reduced to improve DNN inference performance, which typically scales directly with the number of pixels in the image. DNN inference requires input Tensors, so a DNN encoder node is used to convert from an input image to Tensors, including any data pre-processing that is required for the DNN model. Once DNN inference is performed, the DNN decoder node is used to convert the output Tensors to results that can be used by the application. + +TensorRT and Triton are two separate ROS nodes that perform DNN inference. The TensorRT node uses [TensorRT](https://developer.nvidia.com/tensorrt) to provide high-performance deep learning inference. TensorRT optimizes the DNN model for inference on the target hardware, including Jetson and discrete GPUs. It also supports specific operations that are commonly used by DNN models. For newer or bespoke DNN models, TensorRT may not support inference on the model. For these models, use the Triton node. -### TensorRT +The Triton node uses the [Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server), which provides a compatible frontend supporting a combination of different inference backends (e.g. ONNX Runtime, TensorRT Engine Plan, TensorFlow, PyTorch). In-house benchmark results measure little difference between using TensorRT directly or configuring Triton to use TensorRT as a backend. -TensorRT is a library that enables faster inference on NVIDIA GPUs; it provides an API for the user to load and execute inference with their own models. The TensorRT ROS2 node in this package integrates with the TensorRT API, so the user has no need to make any calls to or directly use TensorRT SDK. Instead, users simply configure the TensorRT node with their own custom models and parameters, and the node will make the necessary TensorRT API calls to load and execute the model. For further documentation on TensorRT, refer to the main page [here](https://developer.nvidia.com/tensorrt). +Some DNN models may require custom DNN encoders to convert the input data to the Tensor format needed for the model, and custom DNN decoders to convert from output Tensors into results that can be used in the application. Leverage the DNN encoder and DNN decoder node(s) for image bounding box detection and image segmentation, or your own custom node(s). -### Triton +> **Note**: DNN inference can be performed on different types of input data, including audio, video, text, and various sensor data, such as LIDAR, camera, and RADAR. This package provides implementations for DNN encode and DNN decode functions for images, which are commonly used for perception in robotics. The DNNs operate on Tensors for their input, output, and internal transformations, so the input image needs to be converted to a Tensor for DNN inferencing. -Triton is a framework that brings up a generic inference server that can be configured with a model repository, which is a collection of various types of models (e.g. ONNX Runtime, TensorRT Engine Plan, TensorFlow, PyTorch). A brief tutorial on how to set up a model repository is included below, and further documentation on Triton is also available [here](https://github.com/triton-inference-server/server). +### DNN Models + +To perform DNN inference, a DNN model is required. 
NGC provides [pre-trained models](https://catalog.ngc.nvidia.com/models) for use in your robotics application. Using [TAO](https://developer.nvidia.com/tao-toolkit), NGC pre-trained models can be fine-tuned for your application. You can also train your own DNN models, or download pre-trained models from one of the many model zoos available online, for use with the TensorRT and Triton ROS nodes. For more details about the setup of TensorRT and Triton, see [here](docs/tensorrt-and-triton-info.md). @@ -28,21 +38,22 @@ This package is powered by [NVIDIA Isaac Transport for ROS (NITROS)](https://dev ## Performance -The following are the benchmark performance results of the prepared pipelines in this package, by supported platform: - -| Pipeline | AGX Orin | Orin Nano | x86_64 w/ RTX 3060 Ti | -| ---------------------- | ------------------ | ------------------ | --------------------- | -| PeopleSemSegNet (544p) | 260 fps
3.7ms | 128 fps
6.7ms | 300 fps
2ms | +The following table summarizes the per-platform performance statistics of sample graphs that use this package, with links included to the full benchmark output. These benchmark configurations are taken from the [Isaac ROS Benchmark](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark#list-of-isaac-ros-benchmarks) collection, based on the [`ros2_benchmark`](https://github.com/NVIDIA-ISAAC-ROS/ros2_benchmark) framework. -These data have been collected per the methodology described [here](https://github.com/NVIDIA-ISAAC-ROS/.github/blob/main/profile/performance-summary.md#methodology). +| Sample Graph | Input Size | AGX Orin | Orin NX | Orin Nano 8GB | x86_64 w/ RTX 3060 Ti | +| ----------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [TensorRT Node
DOPE](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/scripts//isaac_ros_tensor_rt_dope_node.py) | VGA | [48.1 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_tensor_rt_dope_node-agx_orin.json)
22 ms | [17.2 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_tensor_rt_dope_node-orin_nx.json)
56 ms | [13.0 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_tensor_rt_dope_node-orin_nano_8gb.json)
79 ms | [94.9 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_tensor_rt_dope_node-x86_64_rtx_3060Ti.json)
10 ms | +| [Triton Node
DOPE](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/scripts//isaac_ros_triton_dope_node.py) | VGA | [48.0 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_triton_dope_node-agx_orin.json)
22 ms | [20.1 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_triton_dope_node-orin_nx.json)
540 ms | [14.5 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_triton_dope_node-orin_nano_8gb.json)
790 ms | [94.2 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_triton_dope_node-x86_64_rtx_3060Ti.json)
11 ms | +| [TensorRT Node
PeopleSemSegNet](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/scripts//isaac_ros_tensor_rt_ps_node.py) | 544p | [467 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_tensor_rt_ps_node-agx_orin.json)
2.3 ms | [270 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_tensor_rt_ps_node-orin_nx.json)
4.0 ms | [184 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_tensor_rt_ps_node-orin_nano_8gb.json)
9.0 ms | [1500 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_tensor_rt_ps_node-x86_64_rtx_3060Ti.json)
1.1 ms | +| [Triton Node
PeopleSemSegNet](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/scripts//isaac_ros_triton_ps_node.py) | 544p | [293 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_triton_ps_node-agx_orin.json)
3.7 ms | [191 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_triton_ps_node-orin_nx.json)
5.5 ms | -- | [512 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_triton_ps_node-x86_64_rtx_3060Ti.json)
2.1 ms | +| [DNN Image Encoder Node](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/scripts//isaac_ros_dnn_image_encoder_node.py) | VGA | [2230 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_dnn_image_encoder_node-agx_orin.json)
0.60 ms | [1560 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_dnn_image_encoder_node-orin_nx.json)
0.89 ms | -- | [5780 fps](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_dnn_image_encoder_node-x86_64_rtx_3060Ti.json)
0.45 ms | ## Table of Contents - [Isaac ROS DNN Inference](#isaac-ros-dnn-inference) - [Webinar Available](#webinar-available) - [Overview](#overview) - - [TensorRT](#tensorrt) - - [Triton](#triton) + - [DNN Models](#dnn-models) - [Isaac ROS NITROS Acceleration](#isaac-ros-nitros-acceleration) - [Performance](#performance) - [Table of Contents](#table-of-contents) @@ -76,18 +87,18 @@ These data have been collected per the methodology described [here](https://gith ## Latest Update -Update 2022-10-19: Updated OSS licensing +Update 2023-04-05: Source available GXF extensions ## Supported Platforms -This package is designed and tested to be compatible with ROS2 Humble running on [Jetson](https://developer.nvidia.com/embedded-computing) or an x86_64 system with an NVIDIA GPU. +This package is designed and tested to be compatible with ROS 2 Humble running on [Jetson](https://developer.nvidia.com/embedded-computing) or an x86_64 system with an NVIDIA GPU. -> **Note**: Versions of ROS2 earlier than Humble are **not** supported. This package depends on specific ROS2 implementation features that were only introduced beginning with the Humble release. +> **Note**: Versions of ROS 2 earlier than Humble are **not** supported. This package depends on specific ROS 2 implementation features that were only introduced beginning with the Humble release. -| Platform | Hardware | Software | Notes | -| -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Jetson | [Jetson Orin](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/)
[Jetson Xavier](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier/) | [JetPack 5.0.2](https://developer.nvidia.com/embedded/jetpack) | For best performance, ensure that [power settings](https://docs.nvidia.com/jetson/archives/r34.1/DeveloperGuide/text/SD/PlatformPowerAndPerformance.html) are configured appropriately. | -| x86_64 | NVIDIA GPU | [Ubuntu 20.04+](https://releases.ubuntu.com/20.04/)
[CUDA 11.6.1+](https://developer.nvidia.com/cuda-downloads) | +| Platform | Hardware | Software | Notes | +| -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Jetson | [Jetson Orin](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/)
[Jetson Xavier](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier/) | [JetPack 5.1.1](https://developer.nvidia.com/embedded/jetpack) | For best performance, ensure that [power settings](https://docs.nvidia.com/jetson/archives/r34.1/DeveloperGuide/text/SD/PlatformPowerAndPerformance.html) are configured appropriately. | +| x86_64 | NVIDIA GPU | [Ubuntu 20.04+](https://releases.ubuntu.com/20.04/)
[CUDA 11.8+](https://developer.nvidia.com/cuda-downloads) | ### Docker @@ -114,6 +125,10 @@ To simplify development, we strongly recommend leveraging the Isaac ROS Dev Dock git clone https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_nitros ``` + ```bash + git clone https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_image_pipeline + ``` + ```bash git clone https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_dnn_inference ``` @@ -338,12 +353,14 @@ To customize your development environment, reference [this guide](https://github #### ROS Parameters -| ROS Parameter | Type | Default | Description | -| ---------------------- | ------------- | ----------------- | ----------------------------------------------------------------------------------------------- | -| `network_image_width` | `uint16_t` | `0` | The image width that the network expects. This will be used to resize the input `image` width | -| `network_image_height` | `uint16_t` | `0` | The image height that the network expects. This will be used to resize the input `image` height | -| `image_mean` | `double list` | `[0.5, 0.5, 0.5]` | The mean of the images per channel that will be used for normalization | -| `image_stddev` | `double list` | `[0.5, 0.5, 0.5]` | The standard deviation of the images per channel that will be used for normalization | +| ROS Parameter | Type | Default | Description | +| ---------------------- | ------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------ | +| `network_image_width` | `uint16_t` | `0` | The image width that the network expects. This will be used to resize the input `image` width | +| `network_image_height` | `uint16_t` | `0` | The image height that the network expects. This will be used to resize the input `image` height | +| `image_mean` | `double list` | `[0.5, 0.5, 0.5]` | The mean of the images per channel that will be used for normalization | +| `image_stddev` | `double list` | `[0.5, 0.5, 0.5]` | The standard deviation of the images per channel that will be used for normalization | +| `resize_mode` | `int` | `0` | The mode to use when resizing an input image to match the required output dimensions
Supported values: Distort (`0`), Pad (`1`) | +| `num_blocks` | `int` | `40` | The number of pre-allocated memory blocks; should not be less than `40`. | > **Note**: the following parameters are no longer supported: > @@ -456,6 +473,7 @@ For solutions to problems with using DNN models, please check [here](docs/troubl | Date | Changes | | ---------- | ---------------------------------------------------------------------------------------------------------------------------- | +| 2023-04-05 | Source available GXF extensions | | 2022-10-19 | Updated OSS licensing | | 2022-08-31 | Update to be compatible with JetPack 5.0.2 | | 2022-06-30 | Added format string parameter in Triton/TensorRT, switched to NITROS implementation, removed parameters in DNN Image Encoder | diff --git a/docs/tensorrt-and-triton-info.md b/docs/tensorrt-and-triton-info.md index af171e3..4d2c449 100644 --- a/docs/tensorrt-and-triton-info.md +++ b/docs/tensorrt-and-triton-info.md @@ -28,13 +28,13 @@ Users can either prepare a custom model or download pre-trained models from NGC In order to be a useful component of a ROS graph, both Isaac ROS Triton and TensorRT inference nodes will require application-specific `pre-processor` (`encoder`) and `post-processor` (`decoder`) nodes to handle type conversion and other necessary steps. -A `pre-processor` node should take in a ROS2 message, perform the pre-processing steps dictated by the model, and then convert the data into an Isaac ROS Tensor List message. For example, a `pre-processor` node could resize an image, normalize it, and then convert it into a Tensor List. +A `pre-processor` node should take in a ROS 2 message, perform the pre-processing steps dictated by the model, and then convert the data into an Isaac ROS Tensor List message. For example, a `pre-processor` node could resize an image, normalize it, and then convert it into a Tensor List. -A `post-processor` node should be used to convert the Isaac ROS Tensor List output of the model inference into a useful ROS2 message. For example, a `post-processor` node may perform argmax to identify the class label from a classification problem. +A `post-processor` node should be used to convert the Isaac ROS Tensor List output of the model inference into a useful ROS 2 message. For example, a `post-processor` node may perform argmax to identify the class label from a classification problem.
-![Using TensorRT or Triton](../resources/pipeline.png "Using TensorRT or Triton") +![Using TensorRT or Triton](../resources/graph.png "Using TensorRT or Triton")
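As an illustrative sketch of how the encoder and inference nodes described above can be composed (not taken from this repository), the following ROS 2 launch file wires a `DnnImageEncoderNode` into a `TensorRTNode`. The component and parameter names come from this package, but the model paths, tensor/binding names, resolution, and the `tensor_pub` topic remapping are placeholder assumptions that must be adapted to your model.

```python
# Hypothetical launch sketch: DNN image encoder -> TensorRT inference.
# All file paths, binding names, and topic names below are placeholders.
import launch
from launch_ros.actions import ComposableNodeContainer
from launch_ros.descriptions import ComposableNode


def generate_launch_description():
    encoder_node = ComposableNode(
        package='isaac_ros_dnn_encoders',
        plugin='nvidia::isaac_ros::dnn_inference::DnnImageEncoderNode',
        name='dnn_image_encoder',
        parameters=[{
            'network_image_width': 640,
            'network_image_height': 480,
            'image_mean': [0.5, 0.5, 0.5],
            'image_stddev': [0.5, 0.5, 0.5],
            'resize_mode': 1,  # Pad instead of distorting the aspect ratio
        }],
        # Assumed input topic name of the TensorRT node
        remappings=[('encoded_tensor', 'tensor_pub')])

    tensor_rt_node = ComposableNode(
        package='isaac_ros_tensor_rt',
        plugin='nvidia::isaac_ros::dnn_inference::TensorRTNode',
        name='tensor_rt',
        parameters=[{
            'model_file_path': '/tmp/model.onnx',   # placeholder ONNX model
            'engine_file_path': '/tmp/model.plan',  # engine is generated here on first run
            'input_tensor_names': ['input_tensor'],
            'input_binding_names': ['input'],       # must match the model's input binding
            'output_tensor_names': ['output_tensor'],
            'output_binding_names': ['output'],     # must match the model's output binding
            'force_engine_update': False,
        }])

    container = ComposableNodeContainer(
        name='dnn_inference_container',
        namespace='',
        package='rclcpp_components',
        executable='component_container_mt',
        composable_node_descriptions=[encoder_node, tensor_rt_node],
        output='screen')

    return launch.LaunchDescription([container])
```

A model-specific `post-processor` (decoder) node would then subscribe to the inference node's output Tensor List and convert it into application-level messages, as described above.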
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 90e2d05..559ce63 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -21,3 +21,29 @@ One cause of this issue is when the GPU being used does not have enough memory t ### Solution Try using the Isaac ROS TensorRT node or the Isaac ROS Triton node with the TensorRT backend instead. Otherwise, a discrete GPU with more VRAM may be required. + +## Triton fails to create the TensorRT engine and load a model + +### Symptom + +```log +1: [component_container_mt-1] I0331 05:56:07.479791 11359 tensorrt.cc:5591] TRITONBACKEND_ModelInitialize: detectnet (version 1) +1: [component_container_mt-1] I0331 05:56:07.483989 11359 tensorrt.cc:5640] TRITONBACKEND_ModelInstanceInitialize: detectnet (GPU device 0) +1: [component_container_mt-1] I0331 05:56:08.169240 11359 logging.cc:49] Loaded engine size: 21 MiB +1: [component_container_mt-1] E0331 05:56:08.209208 11359 logging.cc:43] 1: [runtime.cpp::parsePlan::314] Error Code 1: Serialization (Serialization assertion plan->header.magicTag == rt::kPLAN_MAGIC_TAG failed.) +1: [component_container_mt-1] I0331 05:56:08.213483 11359 tensorrt.cc:5678] TRITONBACKEND_ModelInstanceFinalize: delete instance state +1: [component_container_mt-1] I0331 05:56:08.213525 11359 tensorrt.cc:5617] TRITONBACKEND_ModelFinalize: delete model state +1: [component_container_mt-1] E0331 05:56:08.214059 11359 model_lifecycle.cc:596] failed to load 'detectnet' version 1: Internal: unable to create TensorRT engine +1: [component_container_mt-1] ERROR: infer_trtis_server.cpp:1057 Triton: failed to load model detectnet, triton_err_str:Invalid argument, err_msg:load failed for model 'detectnet': version 1 is at UNAVAILABLE state: Internal: unable to create TensorRT engine; +1: [component_container_mt-1] +1: [component_container_mt-1] ERROR: infer_trtis_backend.cpp:54 failed to load model: detectnet, nvinfer error:NVDSINFER_TRITON_ERROR +1: [component_container_mt-1] ERROR: infer_simple_runtime.cpp:33 failed to initialize backend while ensuring model:detectnet ready, nvinfer error:NVDSINFER_TRITON_ERROR +1: [component_container_mt-1] ERROR: Error in createNNBackend() [UID = 16]: failed to initialize triton simple runtime for model:detectnet, nvinfer error:NVDSINFER_TRITON_ERROR +1: [component_container_mt-1] ERROR: Error in initialize() [UID = 16]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRITON_ERROR +``` + +### Solution + +This error can occur when TensorRT attempts to load an incompatible `model.plan` file. The incompatibility may arise due to a versioning or platform mismatch between the time of plan generation and the time of plan execution. + +Delete the `model.plan` file that is being passed in as an argument to the Triton node's `model_repository_paths` parameter, and then use the source package's instructions to regenerate the `model.plan` file from the original weights file (often a `.etlt` or `.onnx` file). 
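For the ONNX case, one way to regenerate the plan on the target device is with TensorRT's `trtexec` tool. This is only a sketch: the binary location and all paths are placeholders, and the Triton model-repository layout of `<model>/<version>/model.plan` is assumed. Models distributed as `.etlt` files instead require the TAO converter workflow described in the model's documentation.

```bash
# Rebuild the serialized engine on the same device/TensorRT version that will run inference.
# Paths below are placeholders; trtexec commonly ships with TensorRT under /usr/src/tensorrt/bin.
/usr/src/tensorrt/bin/trtexec \
  --onnx=/path/to/model.onnx \
  --saveEngine=/path/to/model_repository/detectnet/1/model.plan
```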
diff --git a/isaac_ros_dnn_encoders/CMakeLists.txt b/isaac_ros_dnn_encoders/CMakeLists.txt index a19e6be..0d2de17 100644 --- a/isaac_ros_dnn_encoders/CMakeLists.txt +++ b/isaac_ros_dnn_encoders/CMakeLists.txt @@ -15,56 +15,21 @@ # # SPDX-License-Identifier: Apache-2.0 -cmake_minimum_required(VERSION 3.8) +cmake_minimum_required(VERSION 3.23.2) project(isaac_ros_dnn_encoders LANGUAGES C CXX) -# Default to C++17 -if(NOT CMAKE_CXX_STANDARD) - set(CMAKE_CXX_STANDARD 17) -endif() - if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang") add_compile_options(-Wall -Wextra -Wpedantic) endif() -# Default to Release build -if(NOT CMAKE_BUILD_TYPE OR CMAKE_BUILD_TYPE STREQUAL "") - set(CMAKE_BUILD_TYPE "Release" CACHE STRING "" FORCE) -endif() -message( STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}" ) - -execute_process(COMMAND uname -m COMMAND tr -d '\n' - OUTPUT_VARIABLE ARCHITECTURE -) -message( STATUS "Architecture: ${ARCHITECTURE}" ) - find_package(ament_cmake_auto REQUIRED) ament_auto_find_build_dependencies() -# Find VPI dependency -find_package(vpi REQUIRED) - -# DNN Image Encoder Node +# DnnImageEncoderNode ament_auto_add_library(dnn_image_encoder_node SHARED src/dnn_image_encoder_node.cpp) -target_compile_definitions(dnn_image_encoder_node - PRIVATE "COMPOSITION_BUILDING_DLL" -) -target_link_libraries(dnn_image_encoder_node) rclcpp_components_register_nodes(dnn_image_encoder_node "nvidia::isaac_ros::dnn_inference::DnnImageEncoderNode") set(node_plugins "${node_plugins}nvidia::isaac_ros::dnn_inference::DnnImageEncoderNode;$\n") -# Install config directory -install( - DIRECTORY config - DESTINATION share/${PROJECT_NAME} -) - -install(TARGETS dnn_image_encoder_node - ARCHIVE DESTINATION lib - LIBRARY DESTINATION lib - RUNTIME DESTINATION bin -) - if(BUILD_TESTING) find_package(ament_lint_auto REQUIRED) ament_lint_auto_find_test_dependencies() @@ -73,7 +38,6 @@ if(BUILD_TESTING) add_launch_test(test/isaac_ros_dnn_image_encoder_test.py) add_launch_test(test/isaac_ros_dnn_image_encoder_image_norm_test.py) add_launch_test(test/isaac_ros_dnn_image_encoder_image_resize_test.py) - endif() -ament_auto_package(INSTALL_TO_SHARE) +ament_auto_package(INSTALL_TO_SHARE config) diff --git a/isaac_ros_dnn_encoders/include/isaac_ros_dnn_encoders/dnn_image_encoder_node.hpp b/isaac_ros_dnn_encoders/include/isaac_ros_dnn_encoders/dnn_image_encoder_node.hpp index 20efc22..47fbc73 100644 --- a/isaac_ros_dnn_encoders/include/isaac_ros_dnn_encoders/dnn_image_encoder_node.hpp +++ b/isaac_ros_dnn_encoders/include/isaac_ros_dnn_encoders/dnn_image_encoder_node.hpp @@ -31,6 +31,13 @@ namespace isaac_ros namespace dnn_inference { +enum class ResizeMode +{ + kDistort = 0, + kPad = 1, + kCrop = 2 +}; + class DnnImageEncoderNode : public nitros::NitrosNode { public: @@ -47,6 +54,8 @@ class DnnImageEncoderNode : public nitros::NitrosNode const uint16_t network_image_height_; const std::vector image_mean_; const std::vector image_stddev_; + int64_t num_blocks_; + const ResizeMode resize_mode_; }; } // namespace dnn_inference diff --git a/isaac_ros_dnn_encoders/package.xml b/isaac_ros_dnn_encoders/package.xml index 9a53bdf..49f4680 100644 --- a/isaac_ros_dnn_encoders/package.xml +++ b/isaac_ros_dnn_encoders/package.xml @@ -21,7 +21,7 @@ SPDX-License-Identifier: Apache-2.0 isaac_ros_dnn_encoders - 0.20.0 + 0.30.0 Encoders for preprocessing before running deep learning inference Hemal Shah Apache-2.0 @@ -29,13 +29,16 @@ SPDX-License-Identifier: Apache-2.0 Ethan Yu Kajanan Chinniah Swapnesh Wani - + rclcpp 
rclcpp_components + isaac_ros_image_proc isaac_ros_nitros isaac_ros_nitros_image_type isaac_ros_nitros_tensor_list_type + isaac_ros_common + ament_lint_auto ament_lint_common isaac_ros_test diff --git a/isaac_ros_dnn_encoders/src/dnn_image_encoder_node.cpp b/isaac_ros_dnn_encoders/src/dnn_image_encoder_node.cpp index 6600350..47a530c 100644 --- a/isaac_ros_dnn_encoders/src/dnn_image_encoder_node.cpp +++ b/isaac_ros_dnn_encoders/src/dnn_image_encoder_node.cpp @@ -59,11 +59,11 @@ constexpr char APP_YAML_FILENAME[] = "config/dnn_image_encoder_node.yaml"; constexpr char PACKAGE_NAME[] = "isaac_ros_dnn_encoders"; const std::vector> EXTENSIONS = { - {"isaac_ros_nitros", "gxf/std/libgxf_std.so"}, - {"isaac_ros_nitros", "gxf/cuda/libgxf_cuda.so"}, - {"isaac_ros_nitros", "gxf/serialization/libgxf_serialization.so"}, - {"isaac_ros_nitros", "gxf/tensorops/libgxf_tensorops.so"}, - {"isaac_ros_nitros", "gxf/libgxf_message_compositor.so"} + {"isaac_ros_gxf", "gxf/lib/std/libgxf_std.so"}, + {"isaac_ros_gxf", "gxf/lib/cuda/libgxf_cuda.so"}, + {"isaac_ros_gxf", "gxf/lib/serialization/libgxf_serialization.so"}, + {"isaac_ros_image_proc", "gxf/lib/image_proc/libgxf_tensorops.so"}, + {"isaac_ros_gxf", "gxf/lib/libgxf_message_compositor.so"} }; const std::vector PRESET_EXTENSION_SPEC_NAMES = { "isaac_ros_dnn_encoders", @@ -108,7 +108,10 @@ DnnImageEncoderNode::DnnImageEncoderNode(const rclcpp::NodeOptions options) network_image_width_(declare_parameter("network_image_width", 0)), network_image_height_(declare_parameter("network_image_height", 0)), image_mean_(declare_parameter>("image_mean", {0.5, 0.5, 0.5})), - image_stddev_(declare_parameter>("image_stddev", {0.5, 0.5, 0.5})) + image_stddev_(declare_parameter>("image_stddev", {0.5, 0.5, 0.5})), + num_blocks_(declare_parameter("num_blocks", 40)), + resize_mode_(static_cast( + declare_parameter("resize_mode", static_cast(ResizeMode::kDistort)))) { if (network_image_width_ == 0) { throw std::invalid_argument( @@ -168,6 +171,10 @@ void DnnImageEncoderNode::postLoadGraphCallback() getNitrosContext().setParameterUInt64( "resizer", "nvidia::cvcore::tensor_ops::Resize", "output_height", network_image_height_); + getNitrosContext().setParameterBool( + "resizer", "nvidia::cvcore::tensor_ops::Resize", "keep_aspect_ratio", + resize_mode_ != ResizeMode::kDistort); + const gxf::optimizer::ComponentInfo output_comp_info = { "nvidia::gxf::Vault", // component_type_name "vault", // component_name @@ -199,6 +206,19 @@ void DnnImageEncoderNode::postLoadGraphCallback() getNitrosContext().setParameterUInt64( "reshaper", "nvidia::gxf::BlockMemoryPool", "block_size", block_size * sizeof(float)); + // The minimum number of memory blocks is set based on the receiver queue capacity + uint64_t num_blocks = std::max(static_cast(num_blocks_), 40); + getNitrosContext().setParameterUInt64( + "resizer", "nvidia::gxf::BlockMemoryPool", "num_blocks", num_blocks); + getNitrosContext().setParameterUInt64( + "color_space_converter", "nvidia::gxf::BlockMemoryPool", "num_blocks", num_blocks); + getNitrosContext().setParameterUInt64( + "normalizer", "nvidia::gxf::BlockMemoryPool", "num_blocks", num_blocks); + getNitrosContext().setParameterUInt64( + "interleaved_to_planar", "nvidia::gxf::BlockMemoryPool", "num_blocks", num_blocks); + getNitrosContext().setParameterUInt64( + "reshaper", "nvidia::gxf::BlockMemoryPool", "num_blocks", num_blocks); + std::vector final_tensor_shape{1, static_cast(image_type_to_channel_size.at(image_type->second)), static_cast(network_image_height_), diff --git 
a/isaac_ros_dnn_encoders/test/isaac_ros_dnn_image_encoder_image_resize_test.py b/isaac_ros_dnn_encoders/test/isaac_ros_dnn_image_encoder_image_resize_test.py index 3a03c6f..f9d542e 100644 --- a/isaac_ros_dnn_encoders/test/isaac_ros_dnn_image_encoder_image_resize_test.py +++ b/isaac_ros_dnn_encoders/test/isaac_ros_dnn_image_encoder_image_resize_test.py @@ -54,7 +54,8 @@ def generate_test_description(): 'network_image_width': NETWORK_IMAGE_WIDTH, 'network_image_height': NETWORK_IMAGE_HEIGHT, 'image_mean': list(IMAGE_MEAN), - 'image_stddev': list(IMAGE_STDDEV) + 'image_stddev': list(IMAGE_STDDEV), + 'resize_mode': 1 # Pad mode }], remappings=[('encoded_tensor', 'tensors')]) diff --git a/isaac_ros_dnn_inference_test/CMakeLists.txt b/isaac_ros_dnn_inference_test/CMakeLists.txt index c071287..14a83e0 100644 --- a/isaac_ros_dnn_inference_test/CMakeLists.txt +++ b/isaac_ros_dnn_inference_test/CMakeLists.txt @@ -15,58 +15,28 @@ # # SPDX-License-Identifier: Apache-2.0 -cmake_minimum_required(VERSION 3.5) +cmake_minimum_required(VERSION 3.23.2) project(isaac_ros_dnn_inference_test LANGUAGES C CXX) -# Default to C++17 -if(NOT CMAKE_CXX_STANDARD) - set(CMAKE_CXX_STANDARD 17) -endif() - if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang") add_compile_options(-Wall -Wextra -Wpedantic) endif() -# Default to Release build -if(NOT CMAKE_BUILD_TYPE OR CMAKE_BUILD_TYPE STREQUAL "") - set(CMAKE_BUILD_TYPE "Release" CACHE STRING "" FORCE) -endif() -message( STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}" ) - -execute_process(COMMAND uname -m COMMAND tr -d '\n' - OUTPUT_VARIABLE ARCHITECTURE -) -message( STATUS "Architecture: ${ARCHITECTURE}" ) - find_package(ament_cmake_auto REQUIRED) ament_auto_find_build_dependencies() # test_tensor_publisher_node ament_auto_add_library(test_tensor_publisher_node SHARED src/test_tensor_publisher_node.cpp) -target_compile_definitions(test_tensor_publisher_node - PRIVATE "COMPOSITION_BUILDING_DLL" -) -target_link_libraries(test_tensor_publisher_node) rclcpp_components_register_nodes(test_tensor_publisher_node "nvidia::isaac_ros::dnn_inference::TestTensorPublisherNode") set(node_plugins "${node_plugins}nvidia::isaac_ros::dnn_inference::TestTensorPublisherNode;$\n") # run test tensor publisher executable -ament_auto_add_executable("run_test_publisher" - src/test_tensor_publisher_main.cpp -) - -target_link_libraries("run_test_publisher" test_tensor_publisher_node) - -install(TARGETS "run_test_publisher" - ARCHIVE DESTINATION lib - LIBRARY DESTINATION lib - RUNTIME DESTINATION bin -) +ament_auto_add_executable(run_test_publisher src/test_tensor_publisher_main.cpp) +target_link_libraries(run_test_publisher test_tensor_publisher_node) if(BUILD_TESTING) find_package(ament_lint_auto REQUIRED) ament_lint_auto_find_test_dependencies() - endif() ament_auto_package() diff --git a/isaac_ros_dnn_inference_test/package.xml b/isaac_ros_dnn_inference_test/package.xml index 6a2e2e0..5e82b44 100644 --- a/isaac_ros_dnn_inference_test/package.xml +++ b/isaac_ros_dnn_inference_test/package.xml @@ -21,7 +21,7 @@ SPDX-License-Identifier: Apache-2.0 isaac_ros_dnn_inference_test - 0.20.0 + 0.30.0 DNN Inference support for Isaac ROS Hemal Shah diff --git a/isaac_ros_tensor_rt/CMakeLists.txt b/isaac_ros_tensor_rt/CMakeLists.txt index 037e414..67eff32 100644 --- a/isaac_ros_tensor_rt/CMakeLists.txt +++ b/isaac_ros_tensor_rt/CMakeLists.txt @@ -15,46 +15,28 @@ # # SPDX-License-Identifier: Apache-2.0 -cmake_minimum_required(VERSION 3.8) +cmake_minimum_required(VERSION 3.23.2) 
project(isaac_ros_tensor_rt LANGUAGES C CXX) -# Default to C++17 -if(NOT CMAKE_CXX_STANDARD) - set(CMAKE_CXX_STANDARD 17) -endif() - if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang") add_compile_options(-Wall -Wextra -Wpedantic) endif() -# Default to Release build -if(NOT CMAKE_BUILD_TYPE OR CMAKE_BUILD_TYPE STREQUAL "") - set(CMAKE_BUILD_TYPE "Release" CACHE STRING "" FORCE) -endif() -message( STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}" ) - -execute_process(COMMAND uname -m COMMAND tr -d '\n' - OUTPUT_VARIABLE ARCHITECTURE -) -message( STATUS "Architecture: ${ARCHITECTURE}" ) - find_package(ament_cmake_auto REQUIRED) ament_auto_find_build_dependencies() -# tensor_rt_node +# TensorRTNode ament_auto_add_library(tensor_rt_node SHARED src/tensor_rt_node.cpp) -target_compile_definitions(tensor_rt_node - PRIVATE "COMPOSITION_BUILDING_DLL" -) -target_link_libraries(tensor_rt_node) rclcpp_components_register_nodes(tensor_rt_node "nvidia::isaac_ros::dnn_inference::TensorRTNode") set(node_plugins "${node_plugins}nvidia::isaac_ros::dnn_inference::TensorRTNode;$\n") -# Install config directory -install( - DIRECTORY config - DESTINATION share/${PROJECT_NAME} -) +### Install extensions built from source + +# TensorRT +add_subdirectory(gxf/tensor_rt) +install(TARGETS gxf_tensor_rt DESTINATION share/${PROJECT_NAME}/gxf/tensor_rt) + +### End extensions if(BUILD_TESTING) find_package(ament_lint_auto REQUIRED) @@ -62,7 +44,6 @@ if(BUILD_TESTING) find_package(launch_testing_ament_cmake REQUIRED) add_launch_test(test/isaac_ros_tensor_rt_test.py TIMEOUT "300") - endif() -ament_auto_package(INSTALL_TO_SHARE launch) +ament_auto_package(INSTALL_TO_SHARE config launch) \ No newline at end of file diff --git a/isaac_ros_tensor_rt/config/tensor_rt_inference.yaml b/isaac_ros_tensor_rt/config/tensor_rt_inference.yaml index f484589..f3ccce9 100644 --- a/isaac_ros_tensor_rt/config/tensor_rt_inference.yaml +++ b/isaac_ros_tensor_rt/config/tensor_rt_inference.yaml @@ -59,6 +59,31 @@ components: verbose: true clock: utils/clock --- +name: cuda_stream_sync +components: +- name: rx + type: nvidia::gxf::DoubleBufferReceiver + parameters: + capacity: 1 + policy: 0 +- name: tx + type: nvidia::gxf::DoubleBufferTransmitter + parameters: + capacity: 12 + policy: 0 +- type: nvidia::gxf::MessageAvailableSchedulingTerm + parameters: + receiver: rx + min_size: 1 +- type: nvidia::gxf::DownstreamReceptiveSchedulingTerm + parameters: + transmitter: tx + min_size: 1 +- type: nvidia::gxf::CudaStreamSync + parameters: + rx: rx + tx: tx +--- name: vault components: - name: signal @@ -82,6 +107,10 @@ components: - type: nvidia::gxf::Connection parameters: source: inference/tx + target: cuda_stream_sync/rx +- type: nvidia::gxf::Connection + parameters: + source: cuda_stream_sync/tx target: vault/signal --- name: utils diff --git a/isaac_ros_tensor_rt/gxf/AMENT_IGNORE b/isaac_ros_tensor_rt/gxf/AMENT_IGNORE new file mode 100644 index 0000000..e69de29 diff --git a/isaac_ros_tensor_rt/gxf/tensor_rt/CMakeLists.txt b/isaac_ros_tensor_rt/gxf/tensor_rt/CMakeLists.txt new file mode 100644 index 0000000..79e5505 --- /dev/null +++ b/isaac_ros_tensor_rt/gxf/tensor_rt/CMakeLists.txt @@ -0,0 +1,48 @@ +# SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +# Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# SPDX-License-Identifier: Apache-2.0 + +project(gxf_tensor_rt LANGUAGES C CXX) + +if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang") + add_compile_options(-fPIC -w) +endif() + +# Dependencies +find_package(CUDAToolkit) +find_package(GXF ${ISAAC_ROS_GXF_VERSION} MODULE REQUIRED + COMPONENTS + core + cuda + std +) +find_package(TENSORRT 8 MODULE REQUIRED) +include(YamlCpp) + +# TensorRT extension +add_library(gxf_tensor_rt SHARED + tensor_rt_extension.cpp + tensor_rt_inference.cpp + tensor_rt_inference.hpp +) +target_link_libraries(gxf_tensor_rt + PUBLIC + CUDA::cudart + GXF::std + GXF::cuda + TENSORRT::nvonnxparser + yaml-cpp +) diff --git a/isaac_ros_tensor_rt/gxf/tensor_rt/tensor_rt_extension.cpp b/isaac_ros_tensor_rt/gxf/tensor_rt/tensor_rt_extension.cpp new file mode 100644 index 0000000..3a980d5 --- /dev/null +++ b/isaac_ros_tensor_rt/gxf/tensor_rt/tensor_rt_extension.cpp @@ -0,0 +1,36 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 +#include + +#include "tensor_rt_inference.hpp" +#include "gxf/core/gxf.h" +#include "gxf/std/extension_factory_helper.hpp" + +extern "C" { + +GXF_EXT_FACTORY_BEGIN() + +GXF_EXT_FACTORY_SET_INFO(0xd43f23e4b9bf11eb, 0x9d182b7be630552b, "TensorRTExtension", "TensorRT", + "Nvidia", "2.2.0", "LICENSE"); + +GXF_EXT_FACTORY_ADD(0x06a7f0e0b9c011eb, 0x8cd623c9c2070107, nvidia::gxf::TensorRtInference, + nvidia::gxf::Codelet, + "Codelet taking input tensors and feed them into TensorRT for inference."); + +GXF_EXT_FACTORY_END() + +} // extern "C" diff --git a/isaac_ros_tensor_rt/gxf/tensor_rt/tensor_rt_inference.cpp b/isaac_ros_tensor_rt/gxf/tensor_rt/tensor_rt_inference.cpp new file mode 100644 index 0000000..743dcc6 --- /dev/null +++ b/isaac_ros_tensor_rt/gxf/tensor_rt/tensor_rt_inference.cpp @@ -0,0 +1,738 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 +#include "tensor_rt_inference.hpp" + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "NvInferPlugin.h" +#include "NvOnnxConfig.h" +#include "NvOnnxParser.h" + +#include "gxf/cuda/cuda_stream_id.hpp" +#include "gxf/std/tensor.hpp" +#include "gxf/std/timestamp.hpp" + +namespace nvidia { +namespace gxf { + +constexpr int32_t kDefaultDeviceId = 0; +namespace { +// Checks wether a string ends with a certain string +inline bool EndsWith(const std::string& str, const std::string& suffix) { + return str.size() >= suffix.size() && + str.compare(str.size() - suffix.size(), suffix.size(), suffix) == 0; +} + +bool IsValidFile(const std::string& path) { + struct stat st; + if (stat(path.c_str(), &st) != 0) { return false; } + return static_cast(st.st_mode & S_IFREG); +} + +bool ReadEntireBinaryFile(const std::string& file_path, std::vector& buffer) { + // Make sure we are opening a valid file. + if (!IsValidFile(file_path)) { return false; } + // Open the file in binary mode and seek to the end + std::ifstream file(file_path, std::ios::binary | std::ios::ate); + if (!file) { return false; } + // Get the size of the file and seek back to the beginning + const size_t size = file.tellg(); + file.seekg(0); + // Reserve enough space in the output buffer and read the file contents into it + buffer.resize(size); + const bool ret = static_cast(file.read(buffer.data(), size)); + file.close(); + return ret; +} + +// Formats gxf tensor shape specified by std::array for console spew +const std::string FormatDims(const std::array& dimensions, + const int32_t rank) { + std::stringbuf sbuf; + std::ostream stream(&sbuf); + stream << "["; + for (int i = 0; i < rank; ++i) { + if (i > 0) { stream << ", "; } + stream << dimensions[i]; + } + stream << "]"; + return sbuf.str(); +} + +// Formats gxf shape for console spew +const std::string FormatTensorShape(const gxf::Shape& shape) { + std::array dimensions; + for (uint32_t i = 0; i < shape.rank(); ++i) { dimensions[i] = shape.dimension(i); } + return FormatDims(dimensions, shape.rank()); +} + +// Converts TensorRT dimensions to Gxf Tensor dimensions (std::array) +std::array Dims2Dimensions(const nvinfer1::Dims& dims) { + std::array dimensions; + dimensions.fill(1); + for (int32_t i = 0; i < dims.nbDims; i++) { dimensions[i] = dims.d[i]; } + return dimensions; +} + +// Converts TensorRT data type to gxf::Tensor element type (gxf::PrimitiveType) +gxf::Expected NvInferDatatypeToTensorElementType(nvinfer1::DataType data_type) { + switch (data_type) { + case nvinfer1::DataType::kFLOAT: { + return gxf::PrimitiveType::kFloat32; + } + case nvinfer1::DataType::kINT8: { + return gxf::PrimitiveType::kInt8; + } + case nvinfer1::DataType::kINT32: { + return gxf::PrimitiveType::kInt32; + } +// case nvinfer1::DataType::kBOOL: + case nvinfer1::DataType::kHALF: + default: { + GXF_LOG_ERROR("Unsupported DataType %d", data_type); + return gxf::Unexpected{GXF_FAILURE}; + } + } +} + +// Writes engine plan to specified file path +gxf::Expected SerializeEnginePlan(const std::vector& plan, const std::string path) { + // Write Plan To Disk + std::ofstream out_stream(path.c_str(), std::ofstream::binary); + if (!out_stream.is_open()) { + GXF_LOG_ERROR("Failed to create engine file %s.", path.c_str()); + return gxf::Unexpected{GXF_FAILURE}; + } + out_stream.write(plan.data(), plan.size()); + if 
(out_stream.bad()) { + GXF_LOG_ERROR("Failed to writing to engine file %s.", path.c_str()); + return gxf::Unexpected{GXF_FAILURE}; + } + out_stream.close(); + GXF_LOG_INFO("TensorRT engine serialized at %s", path.c_str()); + return gxf::Success; +} + +} // namespace + +// Logging interface for the TensorRT builder, engine and runtime, to redirect logging into +// GXF-Issac. +void TensorRTInferenceLogger::log(ILogger::Severity severity, const char* msg) throw() { + switch (severity) { + case Severity::kINTERNAL_ERROR: { + GXF_LOG_ERROR("TRT INTERNAL_ERROR: %s", msg); + break; + } + case Severity::kERROR: { + GXF_LOG_ERROR("TRT ERROR: %s", msg); + break; + } + case Severity::kWARNING: { + GXF_LOG_WARNING("TRT WARNING: %s", msg); + break; + } + case Severity::kINFO: { + GXF_LOG_INFO("TRT INFO: %s", msg); + break; + } + case Severity::kVERBOSE: { + if (verbose_) { GXF_LOG_DEBUG("TRT VERBOSE: %s", msg); } + break; + } + default: { + GXF_LOG_ERROR("TRT UNKNOWN SEVERITY ERROR: %s", msg); + break; + } + } +} + +void TensorRTInferenceLogger::setVerbose(bool verbose) { + verbose_ = verbose; +} + +gxf_result_t TensorRtInference::registerInterface(gxf::Registrar* registrar) { + gxf::Expected result; + + result &= registrar->parameter(model_file_path_, "model_file_path", "Model File Path", + "Path to ONNX model to be loaded."); + result &= registrar->parameter(engine_file_path_, "engine_file_path", "Engine File Path", + "Path to the generated engine to be serialized and loaded from."); + result &= registrar->parameter(force_engine_update_, "force_engine_update", "Force Engine Update", + "Always update engine regard less of existing engine file. " + "Such conversion may take minutes. Default to false.", + false); + + result &= registrar->parameter(input_tensor_names_, "input_tensor_names", "Input Tensor Names", + "Names of input tensors in the order to be fed into the model."); + result &= registrar->parameter(input_binding_names_, "input_binding_names", "Input Binding Names", + "Names of input bindings as in the model in the same order of " + "what is provided in input_tensor_names."); + + result &= registrar->parameter(output_tensor_names_, "output_tensor_names", "Output Tensor Names", + "Names of output tensors in the order to be retrieved " + "from the model."); + result &= + registrar->parameter(output_binding_names_, "output_binding_names", "Output Binding Names", + "Names of output bindings in the model in the same " + "order of of what is provided in output_tensor_names."); + result &= registrar->parameter(pool_, "pool", "Pool", "Allocator instance for output tensors."); + result &= registrar->parameter(cuda_stream_pool_, "cuda_stream_pool", "Cuda Stream Pool", + "Instance of gxf::CudaStreamPool to allocate CUDA stream."); + + result &= registrar->parameter(max_workspace_size_, "max_workspace_size", "Max Workspace Size", + "Size of working space in bytes. Default to 64MB", 67108864l); + result &= registrar->parameter(dla_core_, "dla_core", "DLA Core", + "DLA Core to use. Fallback to GPU is always enabled. 
" + "Default to use GPU only.", + gxf::Registrar::NoDefaultParameter(), + GXF_PARAMETER_FLAGS_OPTIONAL); + result &= registrar->parameter(max_batch_size_, "max_batch_size", "Max Batch Size", + "Maximum possible batch size in case the first dimension is " + "dynamic and used as batch size.", + 1); + result &= registrar->parameter(enable_fp16_, "enable_fp16", "Enable FP16 Mode", + "Enable inference with FP16 and FP32 fallback.", false); + + result &= registrar->parameter(verbose_, "verbose", "Verbose", + "Enable verbose logging on console. Default to false.", false); + result &= registrar->parameter(relaxed_dimension_check_, "relaxed_dimension_check", + "Relaxed Dimension Check", + "Ignore dimensions of 1 for input tensor dimension check.", true); + result &= registrar->parameter(clock_, "clock", "Clock", "Instance of clock for publish time.", + gxf::Registrar::NoDefaultParameter(), + GXF_PARAMETER_FLAGS_OPTIONAL); + result &= registrar->parameter(device_id_, "dev_id", "Device Id", "Create CUDA Stream on " + "which device.", kDefaultDeviceId); + + result &= registrar->parameter(rx_, "rx", "RX", "List of receivers to take input tensors"); + result &= registrar->parameter(tx_, "tx", "TX", "Transmitter to publish output tensors"); + + return gxf::ToResultCode(result); +} + +gxf_result_t TensorRtInference::start() { + // Validates parameter + if (!EndsWith(model_file_path_.get(), ".onnx")) { + GXF_LOG_ERROR("Only supports ONNX model: %s.", model_file_path_.get().c_str()); + return GXF_FAILURE; + } + if (rx_.get().size() == 0) { + GXF_LOG_ERROR("At least one receiver is needed."); + return GXF_FAILURE; + } + if (input_tensor_names_.get().size() != input_binding_names_.get().size()) { + GXF_LOG_ERROR("Mismatching number of input tensor names and bindings: %lu vs %lu.", + input_tensor_names_.get().size(), input_binding_names_.get().size()); + return GXF_FAILURE; + } + if (output_tensor_names_.get().size() != output_binding_names_.get().size()) { + GXF_LOG_ERROR("Mismatching number of output tensor names and bindings: %lu vs %lu.", + output_tensor_names_.get().size(), output_binding_names_.get().size()); + return GXF_FAILURE; + } + + // Initializes TensorRT registered plugins + cuda_logger_.setVerbose(verbose_.get()); + const auto maybe_plugins_lib_namespace = plugins_lib_namespace_.try_get(); + std::string plugin_namespace = maybe_plugins_lib_namespace ? maybe_plugins_lib_namespace.value() + : std::string(""); + if (!initLibNvInferPlugins(&cuda_logger_, plugin_namespace.c_str())) { + // Tries to proceed to see if the model would work + GXF_LOG_WARNING("Could not initialize LibNvInferPlugins."); + } + + std::vector plan; + if (force_engine_update_) { + // Deletes engine plan file if exists for forced update + std::remove(engine_file_path_.get().c_str()); + if (std::ifstream(engine_file_path_.get().c_str()).good()) { + GXF_LOG_ERROR("Failed to remove engine plan file %s for forced engine update.", + engine_file_path_.get().c_str()); + return GXF_FAILURE; + } + } + + // Loads Cuda engine into std::vector plan or creates it if needed. + if (force_engine_update_ || !ReadEntireBinaryFile(engine_file_path_, plan)) { + GXF_LOG_WARNING( + "Rebuilding CUDA engine %s (forced by config). 
" + "Note: this process may take up to several minutes.", + engine_file_path_.get().c_str()); + auto result = convertModelToEngine(); + if (!result) { + GXF_LOG_ERROR("Failed to create engine plan for model %s.", model_file_path_.get().c_str()); + return gxf::ToResultCode(result); + } + + // Skips loading file and uses in-memory engine plan directly. + plan = std::move(result.value()); + + // Tries to serializes the plan and proceeds anyway + if (!SerializeEnginePlan(plan, engine_file_path_.get())) { + GXF_LOG_ERROR( + "Engine plan serialization failed. Proceeds with in-memory engine plan anyway."); + } + } + + // Creates inference runtime for the plan + NvInferHandle infer_runtime(nvinfer1::createInferRuntime(cuda_logger_)); + + // Deserialize the CUDA engine + if (verbose_.get()) { GXF_LOG_INFO("Creating inference runtime."); } + cuda_engine_.reset(infer_runtime->deserializeCudaEngine(plan.data(), plan.size(), NULL)); + + // Debug spews + if (verbose_.get()) { + GXF_LOG_INFO("Number of CUDA bindings: %d", cuda_engine_->getNbBindings()); + for (int i = 0; i < cuda_engine_->getNbBindings(); ++i) { + GXF_LOG_INFO("CUDA binding No.%d: name %s Format %s", i, cuda_engine_->getBindingName(i), + cuda_engine_->getBindingFormatDesc(i)); + } + } + + // Checks binding numbers against parameter + const uint64_t input_number = input_tensor_names_.get().size(); + const uint64_t output_number = output_tensor_names_.get().size(); + const int64_t total_bindings_number = input_number + output_number; + if (cuda_engine_->getNbBindings() != static_cast(total_bindings_number)) { + GXF_LOG_ERROR( + "Numbers of CUDA bindings mismatch: configured for %lu vs model requires %d. " + "Please check TensorRTInference codelet configuration.\n", + total_bindings_number, cuda_engine_->getNbBindings()); + return GXF_ARGUMENT_INVALID; + } + + // Creates cuda execution context + cuda_execution_ctx_.reset(cuda_engine_->createExecutionContext()); + + // Allocates CUDA buffer pointers for binding to be populated in tick() + cuda_buffers_.resize(input_tensor_names_.get().size() + output_tensor_names_.get().size(), + nullptr); + + // Keeps record of input bindings + binding_infos_.clear(); + for (uint64_t j = 0; j < input_number; ++j) { + const std::string& tensor_name = input_tensor_names_.get()[j]; + const std::string& binding_name = input_binding_names_.get()[j]; + + const int32_t binding_index = cuda_engine_->getBindingIndex(binding_name.c_str()); + if (binding_index == -1) { + GXF_LOG_ERROR("Failed to get binding index for input %s in model %s", binding_name.c_str(), + engine_file_path_.get().c_str()); + return GXF_FAILURE; + } + + if (binding_index >= static_cast(cuda_buffers_.size())) { + GXF_LOG_ERROR("Binding index for input %s is out of range in model %s.", binding_name.c_str(), + engine_file_path_.get().c_str()); + return GXF_FAILURE; + } + + // Checks element type + const auto maybe_element_type = + NvInferDatatypeToTensorElementType(cuda_engine_->getBindingDataType(binding_index)); + if (!maybe_element_type) { + GXF_LOG_ERROR("Unsupported element type for binding input %s on index %d. 
", + binding_name.c_str(), binding_index); + return maybe_element_type.error(); + } + + // Keeps binding info + const auto& dims = cuda_engine_->getBindingDimensions(binding_index); + binding_infos_[tensor_name] = + BindingInfo{binding_index, static_cast(dims.nbDims), binding_name, + maybe_element_type.value(), Dims2Dimensions(dims)}; + + // Debug spew + if (verbose_.get()) { + GXF_LOG_INFO( + "Input Tensor %s:%s index %d Dimensions %s.", tensor_name.c_str(), binding_name.c_str(), + binding_index, + FormatDims(binding_infos_[tensor_name].dimensions, binding_infos_[tensor_name].rank) + .c_str()); + } + } + + // Keeps record of output bindings + for (uint64_t j = 0; j < output_number; ++j) { + const std::string& tensor_name = output_tensor_names_.get()[j]; + const std::string& binding_name = output_binding_names_.get()[j]; + + const int32_t binding_index = cuda_engine_->getBindingIndex(binding_name.c_str()); + if (binding_index == -1) { + GXF_LOG_ERROR("Failed to get binding index for output %s", binding_name.c_str()); + return GXF_FAILURE; + } + if (binding_index >= static_cast(cuda_buffers_.size())) { + GXF_LOG_ERROR("Binding index for output %s is out of range.", binding_name.c_str()); + return GXF_FAILURE; + } + + // Checks element type + const auto maybe_element_type = + NvInferDatatypeToTensorElementType(cuda_engine_->getBindingDataType(binding_index)); + if (!maybe_element_type) { + GXF_LOG_ERROR("Unsupported element type for binding output %s on index %d. ", + binding_name.c_str(), binding_index); + return maybe_element_type.error(); + } + + // Keeps binding info + const auto& dims = cuda_engine_->getBindingDimensions(binding_index); + binding_infos_[tensor_name] = + BindingInfo{binding_index, static_cast(dims.nbDims), binding_name, + maybe_element_type.value(), Dims2Dimensions(dims)}; + cuda_buffers_[binding_index] = nullptr; // populate cuda_buffers dynamically, in tick() + + if (verbose_.get()) { + GXF_LOG_INFO( + "Output Tensor %s:%s (%d), Dimensions: %s.", tensor_name.c_str(), binding_name.c_str(), + binding_index, + FormatDims(binding_infos_[tensor_name].dimensions, binding_infos_[tensor_name].rank) + .c_str()); + } + } + + // Grabs CUDA stream and creates CUDA event for synchronization + if (!cuda_stream_) { + auto maybe_stream = cuda_stream_pool_.get()->allocateStream(); + if (!maybe_stream) { + GXF_LOG_ERROR("Failed to allocate CUDA stream"); + return maybe_stream.error(); + } + cuda_stream_ = std::move(maybe_stream.value()); + } + auto stream = cuda_stream_.get()->stream(); + if (!stream) { + GXF_LOG_ERROR("Failed to grab CUDA stream"); + return stream.error(); + } + cached_cuda_stream_ = stream.value(); + auto result = cudaEventCreate(&cuda_event_consumed_); + if (cudaSuccess != result) { + GXF_LOG_ERROR("Failed to create consumed CUDA event: %s", cudaGetErrorString(result)); + return GXF_FAILURE; + } + return GXF_SUCCESS; +} + +gxf::Expected> TensorRtInference::convertModelToEngine() { + // Creates the engine Builder + NvInferHandle builder(nvinfer1::createInferBuilder(cuda_logger_)); + + // Builder Config provides options to the Builder + NvInferHandle builderConfig(builder->createBuilderConfig()); + builderConfig->setMaxWorkspaceSize(max_workspace_size_); + + // Sets DLA core if provided and always fall back to GPU + auto dla_core = dla_core_.try_get(); + if (dla_core) { + builderConfig->setDefaultDeviceType(nvinfer1::DeviceType::kDLA); + builderConfig->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK); + builderConfig->setDLACore(dla_core.value()); + } + if 
(enable_fp16_.get()) { builderConfig->setFlag(nvinfer1::BuilderFlag::kFP16); } + + // Parses ONNX with explicit batch size for support of dynamic shapes/batch + NvInferHandle<nvinfer1::INetworkDefinition> network(builder->createNetworkV2( + 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH))); + + NvInferHandle<nvonnxparser::IParser> onnx_parser( + nvonnxparser::createParser(*network, cuda_logger_)); + if (!onnx_parser->parseFromFile(model_file_path_.get().c_str(), + static_cast<int32_t>(nvinfer1::ILogger::Severity::kWARNING))) { + GXF_LOG_ERROR("Failed to parse ONNX file %s", model_file_path_.get().c_str()); + return gxf::Unexpected{GXF_FAILURE}; + } + + // Provides optimization profile for dynamic size input bindings + nvinfer1::IOptimizationProfile* optimization_profile = builder->createOptimizationProfile(); + // Checks input dimensions and adds to optimization profile if needed + const int number_inputs = network->getNbInputs(); + for (int i = 0; i < number_inputs; ++i) { + auto* bind_tensor = network->getInput(i); + const char* bind_name = bind_tensor->getName(); + nvinfer1::Dims dims = bind_tensor->getDimensions(); + + // Validates binding info + if (dims.nbDims <= 0) { + GXF_LOG_ERROR("Invalid input tensor dimensions for binding %s", bind_name); + return gxf::Unexpected{GXF_ARGUMENT_INVALID}; + } + for (int j = 1; j < dims.nbDims; ++j) { + if (dims.d[j] <= 0) { + GXF_LOG_ERROR( + "Input binding %s requires dynamic size on dimension No.%d which is not supported", + bind_tensor->getName(), j); + return gxf::Unexpected{GXF_ARGUMENT_OUT_OF_RANGE}; + } + } + if (dims.d[0] == -1) { + // Only case with first dynamic dimension is supported and assumed to be batch size. + // Always optimizes for 1-batch. + dims.d[0] = 1; + optimization_profile->setDimensions(bind_name, nvinfer1::OptProfileSelector::kMIN, dims); + optimization_profile->setDimensions(bind_name, nvinfer1::OptProfileSelector::kOPT, dims); + dims.d[0] = max_batch_size_.get(); + if (max_batch_size_.get() <= 0) { + GXF_LOG_ERROR("Maximum batch size %d is invalid. Using 1 instead.", max_batch_size_.get()); + dims.d[0] = 1; + } + optimization_profile->setDimensions(bind_name, nvinfer1::OptProfileSelector::kMAX, dims); + } + } + builderConfig->addOptimizationProfile(optimization_profile); + + // Creates TensorRT Engine Plan + NvInferHandle<nvinfer1::ICudaEngine> engine( + builder->buildEngineWithConfig(*network, *builderConfig)); + if (!engine) { + GXF_LOG_ERROR("Failed to build TensorRT engine from model %s.", model_file_path_.get().c_str()); + return gxf::Unexpected{GXF_FAILURE}; + } + + NvInferHandle<nvinfer1::IHostMemory> model_stream(engine->serialize()); + if (!model_stream || model_stream->size() == 0 || model_stream->data() == nullptr) { + GXF_LOG_ERROR("Failed to serialize TensorRT Engine."); + return gxf::Unexpected{GXF_FAILURE}; + } + + // Prepares return value + std::vector<char> result; + const char* data = static_cast<const char*>(model_stream->data()); + result.resize(model_stream->size()); + std::copy(data, data + model_stream->size(), result.data()); + return result; +} + +gxf_result_t TensorRtInference::stop() { + cuda_execution_ctx_ = nullptr; + cuda_engine_ = nullptr; + cuda_buffers_.clear(); + + auto result = cudaEventDestroy(cuda_event_consumed_); + if (cudaSuccess != result) { + GXF_LOG_ERROR("Failed to destroy consumed CUDA event: %s", cudaGetErrorString(result)); + return GXF_FAILURE; + } + + return GXF_SUCCESS; +} + +gxf_result_t TensorRtInference::tick() { + // Grabs latest messages from all receivers + std::vector<gxf::Entity> messages; + messages.reserve(rx_.get().size()); + for (auto& rx : rx_.get()) { + gxf::Expected<gxf::Entity> maybe_message = rx->receive(); + if (maybe_message) { messages.push_back(std::move(maybe_message.value())); } + } + if (messages.empty()) { + GXF_LOG_ERROR("No message available."); + return GXF_CONTRACT_MESSAGE_NOT_AVAILABLE; + } + // Tries to retrieve timestamp if clock present + gxf::Expected<gxf::Handle<gxf::Timestamp>> maybe_input_timestamp = gxf::Unexpected{GXF_FAILURE}; + for (auto& msg : messages) { + maybe_input_timestamp = msg.get<gxf::Timestamp>("timestamp"); + if (maybe_input_timestamp) { break; } + } + // Populates input tensors + for (const auto& tensor_name : input_tensor_names_.get()) { + gxf::Expected<gxf::Handle<gxf::Tensor>> maybe_tensor = gxf::Unexpected{GXF_UNINITIALIZED_VALUE}; + for (auto& msg : messages) { + maybe_tensor = msg.get<gxf::Tensor>(tensor_name.c_str()); + if (maybe_tensor) { break; } + } + if (!maybe_tensor) { + GXF_LOG_ERROR("Failed to retrieve Tensor %s", tensor_name.c_str()); + return GXF_FAILURE; + } + + // Validates input tensor against model bindings then binds and populates buffers + const auto& shape = maybe_tensor.value()->shape(); + const auto& binding_info = binding_infos_[tensor_name]; + nvinfer1::Dims dims; + dims.nbDims = binding_info.rank; + for (int32_t i = 0; i < dims.nbDims; ++i) { dims.d[i] = binding_info.dimensions[i]; } + + // Checks input tensor element type + if (maybe_tensor.value()->element_type() != binding_info.element_type) { + GXF_LOG_ERROR("Mismatching tensor element type required %d vs provided %d", + binding_info.element_type, maybe_tensor.value()->element_type()); + return GXF_FAILURE; + } + + if (relaxed_dimension_check_.get()) { + // Relaxed dimension match. Ignore all 1s. Binding of -1 is considered as match.
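The relaxed check that follows can be read as a simple rule: size-1 dimensions on either side are skipped, a binding dimension of -1 adopts whatever size the incoming tensor provides, every other dimension must match exactly, and both dimension lists must be fully consumed at the end. The sketch below restates that rule in isolation, with plain STL containers standing in for gxf::Shape and the cached BindingInfo; it is illustrative only, not the codelet code.

```cpp
// Illustrative restatement of the relaxed dimension check (not the codelet code).
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when `shape` satisfies `binding` under relaxed matching.
// A binding dimension of -1 is dynamic and adopts the tensor's size.
bool relaxedDimensionMatch(const std::vector<int64_t>& shape, std::vector<int64_t>& binding) {
  std::size_t s = 0;  // position in the incoming tensor shape
  std::size_t b = 0;  // position in the model binding dimensions
  while (s < shape.size() && b < binding.size()) {
    if (shape[s] == 1) { ++s; continue; }      // skip size-1 dims on the tensor side
    if (binding[b] == 1) { ++b; continue; }    // skip size-1 dims on the binding side
    if (binding[b] == -1) {                    // dynamic dim: adopt the tensor's size
      binding[b] = shape[s];
      ++s; ++b;
      continue;
    }
    if (shape[s] != binding[b]) { return false; }  // fixed dims must agree
    ++s; ++b;
  }
  // Both lists must be fully consumed, mirroring the final check in the codelet.
  return s == shape.size() && b == binding.size();
}

// Examples: shape {1, 3, 224, 224} vs binding {3, 224, 224} matches (the size-1
// dim is skipped); shape {2, 3, 224, 224} vs binding {-1, 3, 224, 224} matches
// and resolves the dynamic dimension to 2.
```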
+ const uint32_t shape_rank = shape.rank(); + uint32_t shape_rank_matched = 0; + uint32_t binding_rank_matched = 0; + bool matched = true; + for (uint32_t i = 0; i < gxf::Shape::kMaxRank * 2; ++i) { + if (shape_rank_matched >= shape_rank || binding_rank_matched >= binding_info.rank) { + break; + } + if (shape.dimension(shape_rank_matched) == 1) { + shape_rank_matched++; + continue; + } + if (binding_info.dimensions[binding_rank_matched] == 1) { + binding_rank_matched++; + continue; + } + if (binding_info.dimensions[binding_rank_matched] == -1) { + // Matches dimension + dims.d[binding_rank_matched] = shape.dimension(shape_rank_matched); + shape_rank_matched++; + binding_rank_matched++; + continue; + } + if (shape.dimension(shape_rank_matched) != binding_info.dimensions[binding_rank_matched]) { + matched = false; + break; + } + shape_rank_matched++; + binding_rank_matched++; + } + if (!matched || shape_rank_matched != shape_rank || + binding_rank_matched != binding_info.rank) { + GXF_LOG_ERROR( + "Input Tensor %s bound to %s:" + " dimensions does not meet model spec with relaxed matching. Expected: %s Real: %s", + tensor_name.c_str(), binding_info.binding_name.c_str(), + FormatDims(binding_info.dimensions, binding_info.rank).c_str(), + FormatTensorShape(shape).c_str()); + return GXF_FAILURE; + } + } else { + // Strict dimension match. All dimensions must match. Binding of -1 is considered as match. + if (shape.rank() != binding_info.rank) { + GXF_LOG_ERROR("Tensor %s bound to %s has mismatching rank %d (%d required)", + tensor_name.c_str(), binding_info.binding_name.c_str(), shape.rank(), + binding_info.rank); + return GXF_FAILURE; + } + for (uint32_t i = 0; i < binding_info.rank; i++) { + if (binding_info.dimensions[i] == -1) { dims.d[i] = shape.dimension(i); } + if (shape.dimension(i) != binding_info.dimensions[i] && binding_info.dimensions[i] != -1) { + GXF_LOG_ERROR("Tensor %s bound to %s has mismatching dimension %d:%d (%d required)", + tensor_name.c_str(), binding_info.binding_name.c_str(), i, + shape.dimension(i), binding_info.dimensions[i]); + return GXF_FAILURE; + } + } + } + + // Updates the latest dimension of input tensor + if (!cuda_execution_ctx_->setBindingDimensions(binding_info.index, dims)) { + GXF_LOG_ERROR("Failed to update input binding %s dimensions.", + binding_info.binding_name.c_str()); + return GXF_FAILURE; + } + + // Binds input tensor buffer + cuda_buffers_[binding_info.index] = maybe_tensor.value()->pointer(); + } + + // Creates result message entity + gxf::Expected maybe_result_message = gxf::Entity::New(context()); + if (!maybe_result_message) { return gxf::ToResultCode(maybe_result_message); } + + auto result_message = maybe_result_message.value(); + // Creates tensors for output + for (const auto& tensor_name : output_tensor_names_.get()) { + auto maybe_result_tensor = result_message.add(tensor_name.c_str()); + if (!maybe_result_tensor) { + GXF_LOG_ERROR("Failed to create output tensor %s", tensor_name.c_str()); + return gxf::ToResultCode(maybe_result_tensor); + } + + // Queries binding dimension from context and allocates tensor accordingly + const auto& binding_info = binding_infos_[tensor_name]; + const auto binding_dims = cuda_engine_->getBindingDimensions(binding_info.index); + gxf::Shape shape{Dims2Dimensions(binding_dims), binding_info.rank}; + + auto result = maybe_result_tensor.value()->reshapeCustom( + shape, binding_info.element_type, gxf::PrimitiveTypeSize(binding_info.element_type), + gxf::Unexpected{GXF_UNINITIALIZED_VALUE}, 
gxf::MemoryStorageType::kDevice, pool_); + if (!result) { + GXF_LOG_ERROR("Failed to allocate for output tensor %s", tensor_name.c_str()); + return gxf::ToResultCode(result); + } + + // Allocates gpu buffer for output tensors + cuda_buffers_[binding_info.index] = maybe_result_tensor.value()->pointer(); + } + + auto maybe_stream_id = result_message.add("TensorRTCuStream"); + if (!maybe_stream_id) { + GXF_LOG_ERROR("failed to add TensorRTCuStream"); + return gxf::ToResultCode(maybe_stream_id); + } + maybe_stream_id.value()->stream_cid = cuda_stream_.cid(); + GXF_ASSERT(maybe_stream_id.value()->stream_cid != kNullUid, "Internal error: stream_cid is null"); + + // Runs inference on specified CUDA stream + if (!cuda_execution_ctx_->enqueueV2(cuda_buffers_.data(), cached_cuda_stream_, + &cuda_event_consumed_)) { + GXF_LOG_ERROR("TensorRT task enqueue for engine %s failed.", engine_file_path_.get().c_str()); + return GXF_FAILURE; + } + + // Create new cuda event + cudaEvent_t cuda_event_done; + auto result2 = cudaEventCreate(&cuda_event_done); + if (cudaSuccess != result2) { + GXF_LOG_ERROR("Failed to create done CUDA event: %s", cudaGetErrorString(result2)); + return GXF_FAILURE; + } + + // Record the job and proceed with execution + auto maybe_event = result_message.add("Infer complete event"); + auto& event = maybe_event.value(); + auto ret = event->initWithEvent(cuda_event_done, device_id_); + if (!ret) { + GXF_LOG_ERROR("failed to init with cuda event into message"); + return ToResultCode(ret); + } + + auto result = cuda_stream_->record(event, result_message, + []() { GXF_LOG_DEBUG("Infer complete event event synced"); }); + if (!result) { + GXF_LOG_ERROR("Infer complete event record failed for entity %zu", result_message.eid()); + return ToResultCode(result); + } + + // Publishes result with acqtime + if (maybe_input_timestamp) { // if input timestamp is present, use it's acqtime + return gxf::ToResultCode( + tx_->publish(result_message, maybe_input_timestamp.value()->acqtime)); + } else { // else simply use 0 as acqtime + return gxf::ToResultCode(tx_->publish(result_message, 0)); + } +} + +} // namespace gxf +} // namespace nvidia diff --git a/isaac_ros_tensor_rt/gxf/tensor_rt/tensor_rt_inference.hpp b/isaac_ros_tensor_rt/gxf/tensor_rt/tensor_rt_inference.hpp new file mode 100644 index 0000000..b12c1cb --- /dev/null +++ b/isaac_ros_tensor_rt/gxf/tensor_rt/tensor_rt_inference.hpp @@ -0,0 +1,128 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
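The tick() above signals completion by recording a CUDA event on the stream used for enqueueV2, so a downstream consumer can wait on that single event instead of synchronizing the whole device. Below is a minimal, self-contained sketch of the same record-then-wait pattern using only the CUDA runtime API; the async memcpy is just a stand-in for the enqueued inference work, and error checking is omitted for brevity.

```cpp
// Illustrative record-then-wait pattern with the CUDA runtime API.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  cudaStream_t stream;
  cudaEvent_t done;
  cudaStreamCreate(&stream);
  cudaEventCreate(&done);

  // Stand-in for asynchronous work (e.g. an inference enqueue) on the stream.
  float host_src[4] = {1.f, 2.f, 3.f, 4.f};
  float* device_dst = nullptr;
  cudaMalloc(&device_dst, sizeof(host_src));
  cudaMemcpyAsync(device_dst, host_src, sizeof(host_src), cudaMemcpyHostToDevice, stream);

  // Record completion on the same stream; a consumer waits on the event rather
  // than on the whole device (or makes another stream wait with cudaStreamWaitEvent).
  cudaEventRecord(done, stream);
  cudaEventSynchronize(done);

  std::printf("work on stream completed\n");
  cudaFree(device_dst);
  cudaEventDestroy(done);
  cudaStreamDestroy(stream);
  return 0;
}
```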
+// +// SPDX-License-Identifier: Apache-2.0 +#ifndef NVIDIA_GXF_EXTENSIONS_TENSOR_RT_TENSOR_RT_INFERENCE_HPP_ +#define NVIDIA_GXF_EXTENSIONS_TENSOR_RT_TENSOR_RT_INFERENCE_HPP_ + +#include +#include +#include +#include +#include + +#include "cuda_runtime.h" + +#include "NvInfer.h" + +#include "gxf/core/entity.hpp" +#include "gxf/core/gxf.h" +#include "gxf/core/parameter.hpp" +#include "gxf/cuda/cuda_stream.hpp" +#include "gxf/cuda/cuda_stream_pool.hpp" +#include "gxf/std/allocator.hpp" +#include "gxf/std/clock.hpp" +#include "gxf/std/codelet.hpp" +#include "gxf/std/parameter_parser_std.hpp" +#include "gxf/std/receiver.hpp" +#include "gxf/std/tensor.hpp" +#include "gxf/std/transmitter.hpp" + +namespace nvidia { +namespace gxf { + +// Logger for TensorRT to redirect logging into gxf console spew. +class TensorRTInferenceLogger : public nvinfer1::ILogger { + public: + void log(ILogger::Severity severity, const char* msg) throw() override; + // Sets verbose flag for logging + void setVerbose(bool verbose); + + private: + bool verbose_; +}; + +// Loads ONNX model, takes input tensors and run inference against them with TensorRT. +// It takes input from all receivers provided and try to locate Tensor component with specified name +// on them one by one. The first occurence would be used. Only takes gpu memory tensor. +// Supports dynamic batch as first dimension. +// Requires gxf::CudaStream to run load on specific CUDA stream. +class TensorRtInference : public gxf::Codelet { + public: + gxf_result_t start() override; + gxf_result_t tick() override; + gxf_result_t stop() override; + gxf_result_t registerInterface(gxf::Registrar* registrar) override; + + private: + // Helper deleter to call destroy while destroying the cuda objects + template + struct DeleteFunctor { + inline void operator()(void* ptr) { reinterpret_cast(ptr)->destroy(); } + }; + // unique_ptr using custom Delete Functor above + template + using NvInferHandle = std::unique_ptr>; + + // To cache binding info for tensors + typedef struct { + int32_t index; + uint32_t rank; + std::string binding_name; + gxf::PrimitiveType element_type; + std::array dimensions; + } BindingInfo; + std::unordered_map binding_infos_; + + // Converts loaded model to engine plan + gxf::Expected> convertModelToEngine(); + + gxf::Parameter model_file_path_; + gxf::Parameter engine_file_path_; + gxf::Parameter plugins_lib_namespace_; + gxf::Parameter force_engine_update_; + gxf::Parameter> input_tensor_names_; + gxf::Parameter> input_binding_names_; + gxf::Parameter> output_tensor_names_; + gxf::Parameter> output_binding_names_; + gxf::Parameter> pool_; + gxf::Parameter> cuda_stream_pool_; + gxf::Parameter max_workspace_size_; + gxf::Parameter dla_core_; + gxf::Parameter max_batch_size_; + gxf::Parameter enable_fp16_; + gxf::Parameter relaxed_dimension_check_; + gxf::Parameter verbose_; + gxf::Parameter> clock_; + gxf::Parameter device_id_; + gxf::Parameter>> rx_; + gxf::Parameter> tx_; + + // Logger instance for TensorRT + TensorRTInferenceLogger cuda_logger_; + + NvInferHandle cuda_execution_ctx_; + NvInferHandle cuda_engine_; + + gxf::Handle cuda_stream_; + std::vector cuda_buffers_; + cudaStream_t cached_cuda_stream_; + cudaEvent_t cuda_event_consumed_; +}; + +} // namespace gxf +} // namespace nvidia + +#endif // NVIDIA_GXF_EXTENSIONS_TENSOR_RT_TENSOR_RT_INFERENCE_HPP_ diff --git a/isaac_ros_tensor_rt/launch/isaac_ros_tensor_rt.launch.py b/isaac_ros_tensor_rt/launch/isaac_ros_tensor_rt.launch.py index 4933b20..7e33aa2 100644 --- 
a/isaac_ros_tensor_rt/launch/isaac_ros_tensor_rt.launch.py +++ b/isaac_ros_tensor_rt/launch/isaac_ros_tensor_rt.launch.py @@ -23,7 +23,7 @@ def generate_launch_description(): - """Generate launch description for TensorRT ROS2 node.""" + """Generate launch description for TensorRT ROS 2 node.""" # By default loads and runs mobilenetv2-1.0 included in isaac_ros_dnn_inference/models launch_args = [ DeclareLaunchArgument( diff --git a/isaac_ros_tensor_rt/package.xml b/isaac_ros_tensor_rt/package.xml index 05ff5ba..0057c75 100644 --- a/isaac_ros_tensor_rt/package.xml +++ b/isaac_ros_tensor_rt/package.xml @@ -21,7 +21,7 @@ SPDX-License-Identifier: Apache-2.0 isaac_ros_tensor_rt - 0.20.0 + 0.30.0 DNN Inference support for Isaac ROS CY Chen @@ -40,6 +40,8 @@ SPDX-License-Identifier: Apache-2.0 isaac_ros_nitros isaac_ros_nitros_tensor_list_type + isaac_ros_common + ament_lint_auto ament_lint_common isaac_ros_test diff --git a/isaac_ros_tensor_rt/src/tensor_rt_node.cpp b/isaac_ros_tensor_rt/src/tensor_rt_node.cpp index cc12b29..c83ac4d 100644 --- a/isaac_ros_tensor_rt/src/tensor_rt_node.cpp +++ b/isaac_ros_tensor_rt/src/tensor_rt_node.cpp @@ -50,10 +50,10 @@ constexpr char APP_YAML_FILENAME[] = "config/tensor_rt_inference.yaml"; constexpr char PACKAGE_NAME[] = "isaac_ros_tensor_rt"; const std::vector> EXTENSIONS = { - {"isaac_ros_nitros", "gxf/std/libgxf_std.so"}, - {"isaac_ros_nitros", "gxf/cuda/libgxf_cuda.so"}, - {"isaac_ros_nitros", "gxf/serialization/libgxf_serialization.so"}, - {"isaac_ros_nitros", "gxf/tensor_rt/libgxf_tensor_rt.so"} + {"isaac_ros_gxf", "gxf/lib/std/libgxf_std.so"}, + {"isaac_ros_gxf", "gxf/lib/cuda/libgxf_cuda.so"}, + {"isaac_ros_gxf", "gxf/lib/serialization/libgxf_serialization.so"}, + {"isaac_ros_tensor_rt", "gxf/tensor_rt/libgxf_tensor_rt.so"} }; const std::vector PRESET_EXTENSION_SPEC_NAMES = { "isaac_ros_tensor_rt", diff --git a/isaac_ros_tensor_rt/test/isaac_ros_tensor_rt_test.py b/isaac_ros_tensor_rt/test/isaac_ros_tensor_rt_test.py index 5e25471..52df228 100644 --- a/isaac_ros_tensor_rt/test/isaac_ros_tensor_rt_test.py +++ b/isaac_ros_tensor_rt/test/isaac_ros_tensor_rt_test.py @@ -32,7 +32,7 @@ @pytest.mark.rostest def generate_test_description(): - """Generate launch description with all TensorRT ROS2 nodes for testing.""" + """Generate launch description with all TensorRT ROS 2 nodes for testing.""" # By default loads and runs mobilenetv2-1.0 dir_path = os.path.dirname(os.path.realpath(__file__)) model_file_path = dir_path + '/../../test/models/mobilenetv2-1.0.onnx' diff --git a/isaac_ros_triton/CMakeLists.txt b/isaac_ros_triton/CMakeLists.txt index 86985b3..8cb89e8 100644 --- a/isaac_ros_triton/CMakeLists.txt +++ b/isaac_ros_triton/CMakeLists.txt @@ -15,56 +15,36 @@ # # SPDX-License-Identifier: Apache-2.0 -cmake_minimum_required(VERSION 3.13) +cmake_minimum_required(VERSION 3.23.2) project(isaac_ros_triton LANGUAGES C CXX) - -# Default to C++17 -if(NOT CMAKE_CXX_STANDARD) - set(CMAKE_CXX_STANDARD 17) -endif() - if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang") add_compile_options(-Wall -Wextra -Wpedantic) endif() -# Default to Release build -if(NOT CMAKE_BUILD_TYPE OR CMAKE_BUILD_TYPE STREQUAL "") - set(CMAKE_BUILD_TYPE "Release" CACHE STRING "" FORCE) -endif() -message( STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}" ) - -execute_process(COMMAND uname -m COMMAND tr -d '\n' - OUTPUT_VARIABLE ARCHITECTURE -) +execute_process(COMMAND uname -m COMMAND tr -d '\n' OUTPUT_VARIABLE ARCHITECTURE) message( STATUS "Architecture: ${ARCHITECTURE}" ) 
find_package(ament_cmake_auto REQUIRED) ament_auto_find_build_dependencies() +# TritonNode +ament_auto_add_library(triton_node SHARED src/triton_node.cpp) +rclcpp_components_register_nodes(triton_node "nvidia::isaac_ros::dnn_inference::TritonNode") +set(node_plugins "${node_plugins}nvidia::isaac_ros::dnn_inference::TritonNode;$\n") -# isaac_ros_triton_node -ament_auto_add_library(isaac_ros_triton_node SHARED src/triton_node.cpp) -target_compile_definitions(isaac_ros_triton_node - PRIVATE "COMPOSITION_BUILDING_DLL" -) -target_link_libraries(isaac_ros_triton_node) -rclcpp_components_register_node(isaac_ros_triton_node - PLUGIN "nvidia::isaac_ros::dnn_inference::TritonNode" - EXECUTABLE isaac_ros_triton) - -# Install config directory -install( - DIRECTORY config - DESTINATION share/${PROJECT_NAME} -) +### Install extensions built from source -# Install package executable -install(TARGETS isaac_ros_triton_node - ARCHIVE DESTINATION lib - LIBRARY DESTINATION lib - RUNTIME DESTINATION bin -) +# Triton + dependencies +add_subdirectory(gxf/triton) +install(TARGETS gxf_triton_ext DESTINATION share/${PROJECT_NAME}/gxf/triton/) +if( ${ARCHITECTURE} STREQUAL "x86_64" ) + set(ARCH_GXF_PATH "gxf_x86_64_cuda_11_8") + elseif( ${ARCHITECTURE} STREQUAL "aarch64" ) + set(ARCH_GXF_PATH "gxf_jetpack502") +endif() +install(DIRECTORY gxf/triton/nvds/lib/${ARCH_GXF_PATH}/ + DESTINATION share/${PROJECT_NAME}/gxf/triton) if(BUILD_TESTING) find_package(ament_lint_auto REQUIRED) @@ -75,4 +55,4 @@ if(BUILD_TESTING) add_launch_test(test/isaac_ros_triton_test_tf.py TIMEOUT "300") endif() -ament_auto_package(INSTALL_TO_SHARE launch) +ament_auto_package(INSTALL_TO_SHARE config launch) diff --git a/isaac_ros_triton/config/triton_node.yaml b/isaac_ros_triton/config/triton_node.yaml index 0086474..a0b12fa 100644 --- a/isaac_ros_triton/config/triton_node.yaml +++ b/isaac_ros_triton/config/triton_node.yaml @@ -46,6 +46,8 @@ components: max_batch_size: 0 async_scheduling_term: triton_response/async_st inference_mode: Direct + output_storage_type: 0 + use_sequence_data: true - name: requester type: nvidia::triton::TritonInferenceRequest parameters: diff --git a/isaac_ros_triton/gxf/AMENT_IGNORE b/isaac_ros_triton/gxf/AMENT_IGNORE new file mode 100644 index 0000000..e69de29 diff --git a/isaac_ros_triton/gxf/triton/CMakeLists.txt b/isaac_ros_triton/gxf/triton/CMakeLists.txt new file mode 100644 index 0000000..3a56073 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/CMakeLists.txt @@ -0,0 +1,109 @@ +# SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +# Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# SPDX-License-Identifier: Apache-2.0 + +project(gxf_triton_ext LANGUAGES C CXX) + +if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang") + add_compile_options(-fPIC -w) +endif() + +# Dependencies +include(FetchContent) +include(YamlCpp) +find_package(GXF ${ISAAC_ROS_GXF_VERSION} MODULE REQUIRED + COMPONENTS + core + std +) +# Lock version of Protocol buffers for compatibility with pre-built NVDS +set(CMAKE_POLICY_DEFAULT_CMP0077 NEW) +set(protobuf_BUILD_TESTS OFF) +set(protobuf_BUILD_EXPORT OFF) +set(protobuf_MSVC_STATIC_RUNTIME OFF) +set(Protobuf_USE_STATIC_LIBS ON) +set(Protobuf_BUILD_SHARED_LIBS OFF) +fetchcontent_declare( + protobuf + GIT_REPOSITORY https://github.com/protocolbuffers/protobuf.git + GIT_TAG v3.8.0 + SOURCE_SUBDIR cmake +) +fetchcontent_makeavailable(protobuf) +find_package(Protobuf QUIET) + +# Compile protocol buffers +file(GLOB ProtoFiles "${CMAKE_CURRENT_SOURCE_DIR}/nvds/include/*.proto") +PROTOBUF_GENERATE_CPP(ProtoSources ProtoHeaders ${ProtoFiles}) +add_library(libgxf_triton_proto STATIC ${ProtoSources} ${ProtoHeaders}) +target_link_libraries(libgxf_triton_proto PUBLIC protobuf::libprotobuf) + +# NVDS pre-built +add_library(libs_triton::libnvbuf_fdmap SHARED IMPORTED) +add_library(libs_triton::libnvbufsurface SHARED IMPORTED) +add_library(libs_triton::libnvbufsurftransform SHARED IMPORTED) +add_library(libs_triton::libnvds_infer_server SHARED IMPORTED) +add_library(libs_triton::libnvds_inferlogger SHARED IMPORTED) +add_library(libs_triton::libnvds_inferutils SHARED IMPORTED) +add_library(libs_triton::libs_triton INTERFACE IMPORTED) +set_property(TARGET libs_triton::libs_triton PROPERTY + INTERFACE_LINK_LIBRARIES + libs_triton::libnvbuf_fdmap + libs_triton::libnvbufsurface + libs_triton::libnvbufsurftransform + libs_triton::libnvds_infer_server + libs_triton::libnvds_inferlogger + libs_triton::libnvds_inferutils +) + +execute_process(COMMAND uname -m COMMAND tr -d '\n' OUTPUT_VARIABLE ARCHITECTURE) +message( STATUS "Architecture: ${ARCHITECTURE}" ) +if( ${ARCHITECTURE} STREQUAL "x86_64" ) + set(ARCH_GXF_PATH "gxf_x86_64_cuda_11_8") + elseif( ${ARCHITECTURE} STREQUAL "aarch64" ) + set(ARCH_GXF_PATH "gxf_jetpack502") +endif() +set_property(TARGET libs_triton::libnvbuf_fdmap PROPERTY IMPORTED_LOCATION ${CMAKE_CURRENT_SOURCE_DIR}/nvds/lib/${ARCH_GXF_PATH}/libnvbuf_fdmap.so) +set_property(TARGET libs_triton::libnvbufsurface PROPERTY IMPORTED_LOCATION ${CMAKE_CURRENT_SOURCE_DIR}/nvds/lib/${ARCH_GXF_PATH}/libnvbufsurface.so) +set_property(TARGET libs_triton::libnvbufsurftransform PROPERTY IMPORTED_LOCATION ${CMAKE_CURRENT_SOURCE_DIR}/nvds/lib/${ARCH_GXF_PATH}/libnvbufsurftransform.so) +set_property(TARGET libs_triton::libnvds_infer_server PROPERTY IMPORTED_LOCATION ${CMAKE_CURRENT_SOURCE_DIR}/nvds/lib/${ARCH_GXF_PATH}/libnvds_infer_server.so) +set_property(TARGET libs_triton::libnvds_inferlogger PROPERTY IMPORTED_LOCATION ${CMAKE_CURRENT_SOURCE_DIR}/nvds/lib/${ARCH_GXF_PATH}/libnvds_inferlogger.so) +set_property(TARGET libs_triton::libnvds_inferutils PROPERTY IMPORTED_LOCATION ${CMAKE_CURRENT_SOURCE_DIR}/nvds/lib/${ARCH_GXF_PATH}/libnvds_inferutils.so) + +# Triton extension +add_library(gxf_triton_ext SHARED + extensions/triton/triton_server.cpp + inferencers/triton_inferencer_impl.cpp + triton_ext.cpp + triton_inference_request.cpp + triton_inference_response.cpp + triton_scheduling_terms.cpp +) +set(CMAKE_INCLUDE_CURRENT_DIR TRUE) +target_include_directories(gxf_triton_ext PRIVATE + ${CMAKE_CURRENT_SOURCE_DIR}/nvds/include + 
${CMAKE_CURRENT_SOURCE_DIR}/extensions/triton +) +target_link_libraries(gxf_triton_ext + PUBLIC + GXF::std + libgxf_triton_proto + libs_triton::libs_triton + yaml-cpp +) +set_target_properties(gxf_triton_ext PROPERTIES BUILD_WITH_INSTALL_RPATH TRUE) +set_target_properties(gxf_triton_ext PROPERTIES INSTALL_RPATH "$ORIGIN") + diff --git a/isaac_ros_triton/gxf/triton/extensions/triton/triton_options.hpp b/isaac_ros_triton/gxf/triton/extensions/triton/triton_options.hpp new file mode 100644 index 0000000..b310695 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/extensions/triton/triton_options.hpp @@ -0,0 +1,39 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef NVIDIA_TRITON_TRITON_OPTIONS_HPP +#define NVIDIA_TRITON_TRITON_OPTIONS_HPP + +namespace nvidia { +namespace triton { + +/** + * @brief Triton Inference Options for model control and sequence control + * + */ +struct TritonOptions { + uint64_t sequence_id; // Should be non-zero because zero is reserved for non-sequence requests. + bool start; + bool end; + uint64_t priority; + uint64_t timeout; +}; + +} // namespace triton +} // namespace nvidia + +#endif diff --git a/isaac_ros_triton/gxf/triton/extensions/triton/triton_server.cpp b/isaac_ros_triton/gxf/triton/extensions/triton/triton_server.cpp new file mode 100644 index 0000000..2a794cd --- /dev/null +++ b/isaac_ros_triton/gxf/triton/extensions/triton/triton_server.cpp @@ -0,0 +1,120 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
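For stateful (sequence) models, the TritonOptions struct above carries the per-request controls: a non-zero sequence_id plus start/end markers for the first and last request of the sequence, along with an optional priority and timeout. The helper below is a hypothetical sketch of filling those fields for one step of a sequence; how the resulting component is attached to the request entity is application-specific and not shown here.

```cpp
// Illustrative only: populating TritonOptions for one step of a stateful sequence.
#include <cstddef>
#include <cstdint>

#include "extensions/triton/triton_options.hpp"

nvidia::triton::TritonOptions makeSequenceStep(uint64_t sequence_id,
                                               std::size_t step,
                                               std::size_t total_steps) {
  nvidia::triton::TritonOptions options{};
  options.sequence_id = sequence_id;        // must be non-zero for sequence requests
  options.start = (step == 0);              // mark the first request of the sequence
  options.end = (step + 1 == total_steps);  // mark the last request of the sequence
  options.priority = 0;                     // leave priority at the server default
  options.timeout = 0;                      // no per-request timeout
  return options;
}
```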
+// +// SPDX-License-Identifier: Apache-2.0 + +#include +#include +#include +#include +#include + +#include "common/logger.hpp" +#include "nvdsinferserver_config.pb.h" + +#include "triton_server.hpp" + +namespace nvidia { +namespace triton { + +gxf_result_t TritonServer::initialize() { + GXF_LOG_DEBUG("Initializing Triton Server..."); + + nvdsinferserver::config::TritonModelRepo model_repo_config; + + model_repo_config.set_log_level(log_level_.get()); + model_repo_config.set_strict_model_config(enable_strict_model_config_.get()); + model_repo_config.set_min_compute_capacity(static_cast(min_compute_capability_.get())); + for (const auto& model_repository_path : model_repository_paths_.get()) { + model_repo_config.add_root(model_repository_path); + } + model_repo_config.set_tf_gpu_memory_fraction(static_cast(tf_gpu_memory_fraction_.get())); + model_repo_config.set_tf_disable_soft_placement(tf_disable_soft_placement_.get()); + model_repo_config.set_backend_dir(backend_directory_path_.get()); + model_repo_config.set_model_control_mode(model_control_mode_.get()); + + size_t num_backend_config = 0; + const char delim_setting = ','; + const char delim_value = '='; + + auto maybe_backend_configs = backend_configs_.try_get(); + if (maybe_backend_configs) { + for (const auto& config : maybe_backend_configs.value()) { + model_repo_config.add_backend_configs(); + auto proto_config = model_repo_config.mutable_backend_configs(num_backend_config++); + + size_t delim_setting_pos = config.find(delim_setting); + if (delim_setting_pos == std::string::npos) { + GXF_LOG_ERROR("Unable to find '%c' in backend config: %s", delim_setting, config.c_str()); + return GXF_FAILURE; + } + size_t delim_value_pos = config.find(delim_value, delim_setting_pos); + if (delim_value_pos == std::string::npos) { + GXF_LOG_ERROR("Unable to find '%c' in backend config: %s", delim_value, config.c_str()); + return GXF_FAILURE; + } + if (delim_setting_pos >= delim_value_pos) { + GXF_LOG_ERROR("Delimeter '%c' must come before '%c' in backend config: %s", + delim_setting, delim_value, config.c_str()); + return GXF_FAILURE; + } + + const std::string backend_name = config.substr(0, delim_setting_pos); + const std::string backend_setting = config.substr(delim_setting_pos + 1, + delim_value_pos - delim_setting_pos - 1); + const std::string backend_value = config.substr(delim_value_pos + 1); + + proto_config->set_backend(backend_name); + proto_config->set_setting(backend_setting); + proto_config->set_value(backend_value); + } + } + + tritonRepoConfig_ = std::make_shared(model_repo_config); + + nvdsinferserver::ITritonServerInstance* server_ptr = nullptr; + auto result = NvDsTritonServerInit(&server_ptr, model_repo_config.DebugString().c_str(), + model_repo_config.DebugString().size()); + if (result != NvDsInferStatus::NVDSINFER_SUCCESS) { + GXF_LOG_ERROR("Error in NvDsTritonServerInit"); + return GXF_FAILURE; + } + + std::shared_ptr server( + server_ptr, NvDsTritonServerDeinit); + tritonInstance_ = std::move(server); + GXF_LOG_DEBUG("Successfully initialized Triton Server..."); + return GXF_SUCCESS; +} + +nvidia::gxf::Expected> +TritonServer::getServer() { + if (!tritonInstance_) { + return nvidia::gxf::Unexpected{GXF_NULL_POINTER}; + } + return tritonInstance_; +} + +nvidia::gxf::Expected> +TritonServer::getModelRepoConfig() { + if (!tritonRepoConfig_) { + return nvidia::gxf::Unexpected{GXF_NULL_POINTER}; + } + return tritonRepoConfig_; +} + + +} // namespace triton +} // namespace nvidia diff --git 
a/isaac_ros_triton/gxf/triton/extensions/triton/triton_server.hpp b/isaac_ros_triton/gxf/triton/extensions/triton/triton_server.hpp new file mode 100644 index 0000000..550c0e8 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/extensions/triton/triton_server.hpp @@ -0,0 +1,158 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef NVIDIA_TRITON_TRITON_SERVER_HPP +#define NVIDIA_TRITON_TRITON_SERVER_HPP + +#include +#include +#include + +#include "gxf/core/component.hpp" +#include "gxf/core/entity.hpp" +#include "gxf/core/expected.hpp" +#include "gxf/core/handle.hpp" + +#include "infer_icontext.h" +#include "nvdsinferserver_config.pb.h" + +namespace nvidia { +namespace triton { + +class TritonServer : public nvidia::gxf::Component { + public: + gxf_result_t registerInterface(nvidia::gxf::Registrar* registrar) override { + nvidia::gxf::Expected result; + + result &= registrar->parameter(log_level_, "log_level", + "Triton Logging Level", + "Set verbose logging level. 0 = Error, 1 = Warn, 2 = Info, 3+ = Verbose", 1U); + + result &= registrar->parameter(enable_strict_model_config_, + "enable_strict_model_config", + "Enable Strict Model Configuration", + "Enable Strict Model Configuration to enforce presence of config. " + "If disabled, TensorRT, TensorFlow saved-model, and ONNX models do " + "not require a model configuration file because Triton can derive " + "all the required settings automatically", true); + + result &= registrar->parameter(min_compute_capability_, + "min_compute_capability", + "Minimum Compute Capability", + "Minimum Compute Capability for GPU. " + "Refer to https://developer.nvidia.com/cuda-gpus", 6.0); + + result &= registrar->parameter(model_repository_paths_, + "model_repository_paths", + "List of Triton Model Repository Paths", + "List of Triton Model Repository Paths. Refer to " + "https://github.com/bytedance/triton-inference-server/blob/master/docs/" + "model_repository.md"); + + result &= registrar->parameter(tf_gpu_memory_fraction_, + "tf_gpu_memory_fraction", + "Tensorflow GPU Memory Fraction", + "The portion of GPU memory to be reserved for TensorFlow Models.", 0.0); + + result &= registrar->parameter(tf_disable_soft_placement_, + "tf_disable_soft_placement", + "Tensorflow will use CPU operation when GPU implementation is not available", + "Tensorflow will use CPU operation when GPU implementation is not available", true); + + result &= registrar->parameter(backend_directory_path_, + "backend_directory_path", + "Path to Triton Backend Directory", + "Path to Triton Backend Directory", std::string("")); + + result &= registrar->parameter(model_control_mode_, + "model_control_mode", + "Triton Model Control Mode", + "Triton Model Control Mode. 'none' will load all models at startup. 'explicit' " + "will allow load of models when needed. 
Unloading is unsupported", std::string("explicit")); + + result &= registrar->parameter(backend_configs_, + "backend_configs", + "Triton Backend Configurations", + "Triton Backend Configurations in format: 'backend,setting=value'. " + "Refer to Backend specific documentation: " + "https://github.com/triton-inference-server/tensorflow_backend#command-line-options, " + "https://github.com/triton-inference-server/python_backend#managing-shared-memory", + nvidia::gxf::Registrar::NoDefaultParameter(), GXF_PARAMETER_FLAGS_OPTIONAL); + + return nvidia::gxf::ToResultCode(result); + } + + /** + * @brief Create Triton Server via nvdsinferserver::ITritonServerInstance with parameters. + * + * @details Create Shared instance with destructor. + * + * @return gxf_result_t + */ + gxf_result_t initialize() override; + + + /** + * @brief Get the shared instance of nvdsinferserver::ITritonServerInstance. + * + * @details Shared ownership is necessary for proper deinitialization of the underlying Triton + * server since GXF lacks guarantees on deinitialize() ordering across multiple entities. + * + * @return nvidia::gxf::Expected> + */ + nvidia::gxf::Expected> getServer(); + + /** + * @brief Get the shared instance of config::TritonModelRepo. + * + * @details Shared ownership is necessary for proper deinitialization of the underlying Triton + * server since GXF lacks guarantees on deinitialize() ordering across multiple entities. + * + * @return nvidia::gxf::Expected> + */ + nvidia::gxf::Expected> + getModelRepoConfig(); + + private: + // Parameters supported by nvdsinferserver::config::TritonModelRepo + nvidia::gxf::Parameter log_level_; + nvidia::gxf::Parameter enable_strict_model_config_; + nvidia::gxf::Parameter min_compute_capability_; + nvidia::gxf::Parameter> model_repository_paths_; + + nvidia::gxf::Parameter tf_gpu_memory_fraction_; + nvidia::gxf::Parameter tf_disable_soft_placement_; + + nvidia::gxf::Parameter backend_directory_path_; + nvidia::gxf::Parameter model_control_mode_; + + nvidia::gxf::Parameter> backend_configs_; + + + // Shared instance is needed for proper deinitialize since this will be needed for each inference + // request. + std::shared_ptr tritonInstance_; + + // Shared instance is needed for constructing the inference config + std::shared_ptr tritonRepoConfig_; +}; + + +} // namespace triton +} // namespace nvidia + +#endif diff --git a/isaac_ros_triton/gxf/triton/inferencers/triton_inferencer_impl.cpp b/isaac_ros_triton/gxf/triton/inferencers/triton_inferencer_impl.cpp new file mode 100644 index 0000000..ea13766 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/inferencers/triton_inferencer_impl.cpp @@ -0,0 +1,781 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
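The backend_configs strings registered above follow a 'backend,setting=value' convention; the server component splits each string at the first ',' and at the '=' that follows it. A small standalone illustration of that split is shown below; the example string is hypothetical, so consult the linked backend documentation for real setting names.

```cpp
// Illustrative parser for the 'backend,setting=value' convention.
#include <cstdio>
#include <string>

bool parseBackendConfig(const std::string& config,
                        std::string& backend, std::string& setting, std::string& value) {
  const std::size_t comma = config.find(',');
  if (comma == std::string::npos) { return false; }          // ',' must be present
  const std::size_t equals = config.find('=', comma);
  if (equals == std::string::npos) { return false; }         // '=' must follow the ','
  backend = config.substr(0, comma);
  setting = config.substr(comma + 1, equals - comma - 1);
  value = config.substr(equals + 1);
  return true;
}

int main() {
  std::string backend, setting, value;
  // Hypothetical config string used only to demonstrate the format.
  if (parseBackendConfig("tensorflow,version=2", backend, setting, value)) {
    std::printf("%s: %s = %s\n", backend.c_str(), setting.c_str(), value.c_str());
  }
  return 0;
}
```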
+// +// SPDX-License-Identifier: Apache-2.0 + +#include "triton_inferencer_impl.hpp" + +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "gxf/std/tensor.hpp" +#include "gxf/std/timestamp.hpp" + +#include "extensions/triton/triton_options.hpp" + +#include "infer_icontext.h" +#include "infer_options.h" +#include "nvdsinferserver_config.pb.h" + +namespace nvidia { +namespace triton { + +struct Inference { + // Entity that will preserve lifetime for input tensors for the Inference Request + std::vector preserved_input; + + // Raw NvDs Ouptuts of the Inference Request + // WAR: The ownership should ideally be handed to a GXF Tensor: GXF-204 + nvdsinferserver::SharedIBatchArray raw_output; + + // Inference Status to be modified by Inference Request completion + NvDsInferStatus infer_status { NVDSINFER_SUCCESS }; + + // Future event indicates completion via callback + std::promise response_promise; + std::future response_completion; + + // Indicates whether this inference is currently active; this can be asynchronously accessed + // via isAcceptingRequest() + std::atomic is_active = { false }; + + // Indicates whether this inference is complete + bool is_complete = { false }; +}; + +/** + * @brief Helper function for translating TRITONSERVER_DataType to nvidia::gxf::PrimitiveType + * + * + * @param datatype + * @return nvidia::gxf::PrimitiveType + */ +static nvidia::gxf::PrimitiveType NvDsToGxfDataType(nvdsinferserver::InferDataType datatype) { + // Unsupported: + // nvdsinferserver::InferDataType::kFp16 + // nvdsinferserver::InferDataType::kString + // nvdsinferserver::InferDataType::kBool + static const std::unordered_map + sNvDsToGxf{ + {nvdsinferserver::InferDataType::kUint8, nvidia::gxf::PrimitiveType::kUnsigned8}, + {nvdsinferserver::InferDataType::kUint16, nvidia::gxf::PrimitiveType::kUnsigned16}, + {nvdsinferserver::InferDataType::kUint32, nvidia::gxf::PrimitiveType::kUnsigned32}, + {nvdsinferserver::InferDataType::kUint64, nvidia::gxf::PrimitiveType::kUnsigned64}, + {nvdsinferserver::InferDataType::kInt8, nvidia::gxf::PrimitiveType::kInt8}, + {nvdsinferserver::InferDataType::kInt16, nvidia::gxf::PrimitiveType::kInt16}, + {nvdsinferserver::InferDataType::kInt32, nvidia::gxf::PrimitiveType::kInt32}, + {nvdsinferserver::InferDataType::kInt64, nvidia::gxf::PrimitiveType::kInt64}, + {nvdsinferserver::InferDataType::kFp32, nvidia::gxf::PrimitiveType::kFloat32}, + {nvdsinferserver::InferDataType::kFp64, nvidia::gxf::PrimitiveType::kFloat64}, + {nvdsinferserver::InferDataType::kString, nvidia::gxf::PrimitiveType::kCustom}, + }; + auto const i = sNvDsToGxf.find(datatype); + if (i == sNvDsToGxf.end()) { + GXF_LOG_WARNING("Unsupported NvDs data type: %d", datatype); + return nvidia::gxf::PrimitiveType::kCustom; + } + return i->second; +} + +/** + * @brief Helper function for translating nvidia::gxf::PrimitiveType to + * nvdsinferserver::InferDataType + * + * @param datatype + * @return nvdsinferserver::InferDataType + */ +static nvdsinferserver::InferDataType GxfToNvDsDataType( + nvidia::gxf::PrimitiveType datatype) { + static const std::unordered_map + sGxfToNvDsData{ + {nvidia::gxf::PrimitiveType::kUnsigned8, nvdsinferserver::InferDataType::kUint8}, + {nvidia::gxf::PrimitiveType::kUnsigned16, nvdsinferserver::InferDataType::kUint16}, + {nvidia::gxf::PrimitiveType::kUnsigned32, nvdsinferserver::InferDataType::kUint32}, + {nvidia::gxf::PrimitiveType::kUnsigned64, nvdsinferserver::InferDataType::kUint64}, + {nvidia::gxf::PrimitiveType::kInt8, 
nvdsinferserver::InferDataType::kUint8}, + {nvidia::gxf::PrimitiveType::kInt16, nvdsinferserver::InferDataType::kInt16}, + {nvidia::gxf::PrimitiveType::kInt32, nvdsinferserver::InferDataType::kInt32}, + {nvidia::gxf::PrimitiveType::kInt64, nvdsinferserver::InferDataType::kInt64}, + {nvidia::gxf::PrimitiveType::kFloat32, nvdsinferserver::InferDataType::kFp32}, + {nvidia::gxf::PrimitiveType::kFloat64, nvdsinferserver::InferDataType::kFp64}, + {nvidia::gxf::PrimitiveType::kCustom, nvdsinferserver::InferDataType::kString}, + }; + // NOTE: Unsupported nvdsinferserver::InferDataType are: + // - kFp16 + // - kBool + // - kNone + auto const i = sGxfToNvDsData.find(datatype); + if (i == sGxfToNvDsData.end()) { + GXF_LOG_WARNING("Unsupported GXF data type: %d", datatype); + return nvdsinferserver::InferDataType::kNone; + } + return i->second; +} + +/** + * @brief Helper function for translating nvidia::gxf::MemoryStorageType to + * nvdsinferserver::InferMemType + * + * @param memory_type + * @return nvdsinferserver::InferMemType + */ +static nvdsinferserver::InferMemType GxfMemTypeToNvDsMemType( + nvidia::gxf::MemoryStorageType memory_type) { + static const std::unordered_map + sGxfToNvDsMem{ + {nvidia::gxf::MemoryStorageType::kHost, nvdsinferserver::InferMemType::kCpu}, + {nvidia::gxf::MemoryStorageType::kSystem, nvdsinferserver::InferMemType::kCpuCuda}, + {nvidia::gxf::MemoryStorageType::kDevice, nvdsinferserver::InferMemType::kGpuCuda}, + }; + auto const i = sGxfToNvDsMem.find(memory_type); + if (i == sGxfToNvDsMem.end()) { + GXF_LOG_WARNING("Unsupported GXF data type: %d", memory_type); + return nvdsinferserver::InferMemType::kNone; + } + return i->second; +} + +static nvdsinferserver::config::MemoryType GxfMemTypeToNvDsConfMemType( + nvidia::gxf::MemoryStorageType memory_type) { + static const std::unordered_map + sGxfToNvDsMem{ + {nvidia::gxf::MemoryStorageType::kHost, + nvdsinferserver::config::MemoryType::MEMORY_TYPE_CPU}, + {nvidia::gxf::MemoryStorageType::kSystem, + nvdsinferserver::config::MemoryType::MEMORY_TYPE_CPU}, + {nvidia::gxf::MemoryStorageType::kDevice, + nvdsinferserver::config::MemoryType::MEMORY_TYPE_GPU}, + }; + auto const i = sGxfToNvDsMem.find(memory_type); + if (i == sGxfToNvDsMem.end()) { + GXF_LOG_WARNING("Unsupported GXF data type: %d", memory_type); + return nvdsinferserver::config::MemoryType::MEMORY_TYPE_DEFAULT; + } + return i->second; +} + +static nvidia::gxf::MemoryStorageType NvDsMemTypeToGxfMemType( + nvdsinferserver::InferMemType memory_type) { + static const std::unordered_map + sNvDsMemToGxf{ + {nvdsinferserver::InferMemType::kCpu, nvidia::gxf::MemoryStorageType::kHost}, + {nvdsinferserver::InferMemType::kCpuCuda, nvidia::gxf::MemoryStorageType::kSystem}, + {nvdsinferserver::InferMemType::kGpuCuda, nvidia::gxf::MemoryStorageType::kDevice}, + }; + auto const i = sNvDsMemToGxf.find(memory_type); + GXF_ASSERT(i != sNvDsMemToGxf.end(), "Unsupported conversion from NvDs data type: %d", + memory_type); + return i->second; +} + +static gxf_result_t setTritonOptions( + nvdsinferserver::SharedIBatchArray& batchArray, const TritonOptions& triton_options) { + nvdsinferserver::SharedBufOptions options = std::make_shared(); + if (!options) { + GXF_LOG_ERROR("Unable to create Triton Options: SharedBufOptions"); + return GXF_NULL_POINTER; + } + options->setValue(OPTION_SEQUENCE_ID, triton_options.sequence_id); + options->setValue(OPTION_SEQUENCE_START, triton_options.start); + options->setValue(OPTION_SEQUENCE_END, triton_options.end); + options->setValue(OPTION_PRIORITY, 
triton_options.priority); + options->setValue(OPTION_TIMEOUT, triton_options.timeout); + + if (!batchArray) { + GXF_LOG_ERROR("Input batch array for setting Triton Options is null"); + return GXF_NULL_POINTER; + } + batchArray->setIOptions(std::move(options)); + if (!batchArray->getOptions()) { + GXF_LOG_ERROR("Batch array unable to getOptions()"); + return GXF_NULL_POINTER; + } + return GXF_SUCCESS; +} + +gxf_result_t TritonInferencerImpl::initialize() { + scheduling_term_->setEventState(nvidia::gxf::AsynchronousEventState::WAIT); + + // Initialize pool of Inference Requests. This is needed to occur before start() for proper + // behavior with Scheduling Terms dependent upon isAcceptingRequest(). + inference_pool_.resize(num_concurrent_requests_.get()); + for (size_t num = 0; num < num_concurrent_requests_.get(); num++) { + inference_pool_[num] = new Inference(); + } + if (!inference_pool_.size()) { + GXF_LOG_ERROR("Inference Pool Empty"); + return GXF_FAILURE; + } + return GXF_SUCCESS; +} + +gxf_result_t TritonInferencerImpl::construct() { + if (!inference_pool_.size()) { + GXF_LOG_ERROR("Inference Pool Empty"); + return GXF_FAILURE; + } + + if (nvidia::gxf::Shape::kMaxRank != NVDSINFER_MAX_DIMS) { + GXF_LOG_WARNING("GXF and NvDs Max Rank are mistmatched, which may cause problems."); + } + + nvdsinferserver::config::InferenceConfig inference_config; + inference_config.set_unique_id(static_cast(eid())); + inference_config.set_max_batch_size(max_batch_size_.get()); + inference_config.mutable_backend()->mutable_triton()->set_model_name(model_name_.get()); + inference_config.mutable_backend()->mutable_triton()->set_version(model_version_.get()); + inference_config.add_gpu_ids(0); + + // ensure no pre or post processing is attached + inference_config.clear_preprocess(); + inference_config.clear_postprocess(); + + if (inference_mode_.get() == TritonInferenceMode::kDirect) { + if (!server_handle_.try_get()) { + GXF_LOG_ERROR("Triton Server Handle is null with Direct inference mode"); + return GXF_ARGUMENT_INVALID; + } + auto maybe_server = server_handle_.try_get().value()->getServer(); + if (!maybe_server) { + GXF_LOG_ERROR("Triton Server is unexpected"); + return nvidia::gxf::ToResultCode(maybe_server); + } + server_ = maybe_server.value(); + + auto maybe_model_repo = server_handle_.try_get().value()->getServer(); + if (!maybe_model_repo) { + GXF_LOG_ERROR("Triton Server is unexpected"); + return nvidia::gxf::ToResultCode(maybe_model_repo); + } + auto model_repo = server_handle_.try_get().value()->getModelRepoConfig().value(); + + inference_config.mutable_backend()->mutable_triton()-> + mutable_model_repo()->CopyFrom(*model_repo); + + // suggests memory output storage location to Triton + if (output_storage_type_.try_get()) { + auto output_mem_type = GxfMemTypeToNvDsConfMemType( + nvidia::gxf::MemoryStorageType(output_storage_type_.try_get().value())); + inference_config.mutable_backend()->set_output_mem_type(output_mem_type); + } + + if (use_string_data_.get() || use_sequence_data_.get()) { + infer_context_.reset(createInferTritonSimpleContext()); + } else { + std::string configStr = inference_config.DebugString(); + infer_context_.reset(createInferTrtISContext(configStr.c_str(), configStr.size())); + } + + } else if (inference_mode_.get() == TritonInferenceMode::kRemoteGrpc) { + if (!server_endpoint_.try_get()) { + GXF_LOG_ERROR("Remote endpoint is not set with RemoteGrpc inference mode"); + return GXF_ARGUMENT_INVALID; + } + inference_config.mutable_backend()->mutable_triton()-> + 
mutable_grpc()->set_url(server_endpoint_.try_get().value()); + + // suggests memory output storage location to Triton + if (output_storage_type_.try_get()) { + auto output_mem_type = GxfMemTypeToNvDsConfMemType( + nvidia::gxf::MemoryStorageType(output_storage_type_.try_get().value())); + inference_config.mutable_backend()->set_output_mem_type(output_mem_type); + } + + // consider adding -> enable_cuda_buffer_sharing, when the DS team enables it + + if (use_string_data_.get() || use_sequence_data_.get()) { + infer_context_.reset(createInferTritonSimpleContext()); + } else { + std::string configStr = inference_config.DebugString(); + infer_context_.reset(createInferTritonGrpcContext(configStr.c_str(), configStr.size())); + } + } else { + GXF_LOG_ERROR("Invalid inference mode"); + return GXF_ARGUMENT_INVALID; + } + + if (!infer_context_) { + GXF_LOG_ERROR("Failure to create Inference Context for '%s'", model_name_.get().c_str()); + return GXF_FAILURE; + } + + NvDsInferStatus status = NVDSINFER_SUCCESS; + status = infer_context_->initialize(inference_config.DebugString(), nullptr); + + if (status != NVDSINFER_SUCCESS) { + GXF_LOG_ERROR("Failure to initialize Inference Context for '%s'", model_name_.get().c_str()); + return GXF_FAILURE; + } + return GXF_SUCCESS; +} + +gxf_result_t TritonInferencerImpl::destruct() { + for (auto& inference_ptr : inference_pool_) { + delete inference_ptr; + } + return GXF_SUCCESS; +} + +gxf_result_t TritonInferencerImpl::inferAsync( + const std::vector input_entities, + const std::vector input_names) { + // Reset scheduling term event to wait for next response. + scheduling_term_->setEventState(nvidia::gxf::AsynchronousEventState::EVENT_WAITING); + + // Use the current inference index to modify the inference object in the inference callback. + size_t current_inference_index = next_inference_index_.load(); + auto& inference = inference_pool_[current_inference_index]; + if (inference->is_active) { + GXF_LOG_ERROR("Next available Inference Context for '%s' is active. " + "Increase num_concurrent_requests.", model_name_.get().c_str()); + return GXF_EXCEEDING_PREALLOCATED_SIZE; + } + + next_inference_index_ = (next_inference_index_ + 1) % num_concurrent_requests_.get(); + + if (!infer_context_) { + GXF_LOG_ERROR("Inference Context not initialized"); + return GXF_FAILURE; + } + + // NOTE: Triton will own the input entity data until inference is complete. + // This is released during getResponse(). 
+ inference->preserved_input = input_entities; + + nvdsinferserver::SharedIBatchArray input = NvDsInferServerCreateBatchArray(); + if (!input) { + GXF_LOG_ERROR("Unable to create nvds input tensors"); + return GXF_FAILURE; + } + + if (input_entities.size() != input_names.size()) { + GXF_LOG_ERROR("Mismatch in number of input_entities and input_names"); + return GXF_FAILURE; + } + + for (size_t i = 0; i < input_entities.size(); i++) { + auto input_entity = input_entities[i]; + auto input_name = input_names[i]; + auto maybe_entity_clone = input_entity.clone(); + if (!maybe_entity_clone) { + GXF_LOG_ERROR("Unable to clone input entity"); + return nvidia::gxf::ToResultCode(maybe_entity_clone); + } + auto entity_clone = maybe_entity_clone.value(); + inference->preserved_input.push_back(entity_clone); + auto input_tensors = entity_clone.findAll().value(); + for (auto input_tensor : input_tensors) { + const auto tensor = input_tensor.value(); + const auto& name = input_name.c_str(); + GXF_LOG_DEBUG("input tensor name = %s", name); + + if (tensor->rank() > NVDSINFER_MAX_DIMS) { + GXF_LOG_ERROR("Tensor rank '%u' is larger than NVDSINFER_MAX_DIMS"); + return GXF_FAILURE; + } + + // Input tensor needs to be fully batched + uint32_t batch_size = 0; // Offload "batch" to the fully specified InferDims instead + nvdsinferserver::InferDims dims; + + auto dataType = GxfToNvDsDataType(tensor->element_type()); + if (dataType == nvdsinferserver::InferDataType::kString) { + auto maybe_string_shape = input_entity.get(); + if (!maybe_string_shape) { + GXF_LOG_ERROR("Found Tensor with String Datatype "\ + "but no accompanying shape specification: "\ + "%s", name); + return GXF_FAILURE; + } + const auto shape = *maybe_string_shape.value(); + dims.numDims = shape.rank(); + dims.numElements = shape.size(); + for (size_t index = 0; index < shape.rank(); index++) { + if (shape.dimension(index) <= 0) { + GXF_LOG_ERROR("Tensor Dimension <= 0 not allowed"); + return GXF_FAILURE; + } + dims.d[index] = shape.dimension(index); + } + } else { + dims.numDims = tensor->rank(); + dims.numElements = static_cast(tensor->element_count()); + for (size_t index = 0; index < tensor->rank(); index++) { + if (tensor->shape().dimension(index) <= 0) { + GXF_LOG_ERROR("Tensor Dimension <= 0 not allowed"); + return GXF_FAILURE; + } + dims.d[index] = tensor->shape().dimension(index); + } + } + + nvdsinferserver::InferBufferDescription description { + memType : GxfMemTypeToNvDsMemType(tensor->storage_type()), + devId : 0, /* NOTE: GXF Allocator does not have concept of device ID for kGPU */ + dataType : dataType, + dims : dims, + elementSize : static_cast(tensor->bytes_per_element()), + name : name, + isInput : true + }; + + auto buffer = NvDsInferServerWrapBuf( + tensor->pointer(), tensor->size(), description, batch_size, [](void* data) {}); + input->appendIBatchBuf(buffer); + } + } + + auto maybe_triton_option = inference-> + preserved_input.front().get(); + if (maybe_triton_option) { + auto result = setTritonOptions(input, *maybe_triton_option.value()); + if (result != GXF_SUCCESS) { + return result; + } + } + + // Create a promise to be used in the inference callback + inference->response_promise = std::move(std::promise()); + + // Create a future object to be set in the inference callback + inference->response_completion = std::move(std::future( + inference->response_promise.get_future())); + + inference->infer_status = NVDSINFER_SUCCESS; + inference->is_active = true; + + // Increase count to be decremented once inference response is 
received. + incomplete_inference_count_++; + + NvDsInferStatus runStatus = infer_context_->run( + input, [current_inference_index, this]( + NvDsInferStatus s, nvdsinferserver::SharedIBatchArray out) { + auto& inference = inference_pool_[current_inference_index]; + inference->infer_status = s; + inference->raw_output = std::move(out); + inference->is_complete = true; + + if (scheduling_term_->getEventState() == nvidia::gxf::AsynchronousEventState::EVENT_DONE) { + GXF_LOG_DEBUG("Triton Async Event is unexpectedly already marked DONE"); + } + scheduling_term_->setEventState(nvidia::gxf::AsynchronousEventState::EVENT_DONE); + GXF_LOG_DEBUG("Triton Async Event DONE for index = %zu", current_inference_index); + + // NOTE: Set response promise last so that the EVENT_DONE notification occurs before the + // response wait() unblocks. This is to prevent a situation where EVENT_DONE is triggered + // after the entire response has already been processed. + inference->response_promise.set_value(); + }); + + if (runStatus != NVDSINFER_SUCCESS) { + GXF_LOG_ERROR("Unable to run Inference Context"); + return GXF_FAILURE; + } + return GXF_SUCCESS; +} + + +nvidia::gxf::Expected<nvidia::gxf::Entity> TritonInferencerImpl::getResponse() { + if (!infer_context_) { + GXF_LOG_ERROR("InferenceContext not initialized"); + return nvidia::gxf::Unexpected{GXF_FAILURE}; + } + + auto& inference = inference_pool_[active_inference_index_]; + + if (!inference->is_complete) { + GXF_LOG_WARNING("Incomplete inference; response appeared out of order; invalid: index = %zu", + active_inference_index_); + GXF_LOG_WARNING("Inference appeared out of order"); + } + GXF_LOG_DEBUG("Trying to load inference for index: %zu", active_inference_index_); + + // Wait for the inference to complete. This normally will not block since the shared state + // should already be ready if this tick has been scheduled; however, if the inference response is + // received out of order from the inference request, we will wait to enforce FIFO. + inference->response_completion.wait(); + if (!inference->response_completion.valid() || !inference->is_active.load()) { + GXF_LOG_ERROR("Unexpectedly incomplete response for index: %zu", active_inference_index_); + return nvidia::gxf::Unexpected{GXF_FAILURE}; + } + GXF_LOG_DEBUG("Successfully loaded inference for: %zu", active_inference_index_); + + // Increment the active index for the next response.
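The handshake above relies on one std::promise per in-flight inference: the completion callback stores the raw output, marks the scheduling term EVENT_DONE, and only then calls set_value(), while getResponse() blocks on the matching future so responses are consumed in FIFO order. The stripped-down sketch below shows just that callback-completes-future pattern, with a plain worker thread standing in for the Triton callback.

```cpp
// Standalone sketch of completing a future from an asynchronous callback.
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>

int main() {
  std::promise<void> response_promise;
  std::future<void> response_completion = response_promise.get_future();

  // Stand-in for the inference callback: do the "work", publish any state the
  // waiter needs, then release the future last.
  auto callback = [&response_promise]() {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // pretend inference latency
    std::puts("callback: result stored, signalling completion");
    response_promise.set_value();  // unblocks the waiter only after state is ready
  };

  std::thread worker(callback);

  // Stand-in for getResponse(): block until the callback has signalled.
  response_completion.wait();
  std::puts("waiter: response is ready to be consumed");

  worker.join();
  return 0;
}
```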
+ active_inference_index_ = (active_inference_index_ + 1) % num_concurrent_requests_.get(); + + if (inference->infer_status != NVDSINFER_SUCCESS) { + GXF_LOG_ERROR("Error with NvDs Async Infer: %d", inference->infer_status); + return nvidia::gxf::Unexpected{GXF_FAILURE}; + } + + if (!inference->raw_output) { + GXF_LOG_ERROR("Unable to get valid outputs from Inference"); + return nvidia::gxf::Unexpected{GXF_FAILURE}; + } + + auto maybe_output_entity = nvidia::gxf::Entity::New(context_); + if (!maybe_output_entity) { + GXF_LOG_ERROR("Unable to create maybe_output_entity"); + return maybe_output_entity; + } + + auto maybe_input_timestamp = inference-> + preserved_input.front().get("timestamp"); + if (maybe_input_timestamp) { + auto maybe_output_timestamp = + maybe_output_entity.value().add("timestamp"); + if (!maybe_output_timestamp) { + GXF_LOG_ERROR("Unable to create maybe_output_timestamp"); + return nvidia::gxf::Unexpected{GXF_FAILURE}; + } + + *(maybe_output_timestamp.value().get()) = *(maybe_input_timestamp.value().get()); + } + + auto maybe_input_triton_option = inference-> + preserved_input.front().get(); + if (maybe_input_triton_option) { + auto maybe_output_triton_option = + maybe_output_entity.value().add(); + if (!maybe_output_triton_option) { + GXF_LOG_ERROR("Unable to create maybe_output_triton_option"); + return nvidia::gxf::Unexpected{GXF_FAILURE}; + } + + *(maybe_output_triton_option.value().get()) = *(maybe_input_triton_option.value().get()); + } + + // Release the ref-counted input entity + inference->preserved_input.clear(); + auto& nvds_output = inference->raw_output; + + GXF_LOG_DEBUG("Raw Outputs size = %u", inference->raw_output->getSize()); + + for (uint32_t index = 0; index < inference->raw_output->getSize(); index++) { + const nvdsinferserver::IBatchBuffer* output_buf = inference->raw_output->getBuffer(index); + nvdsinferserver::SharedIBatchBuffer output_safe_buf = inference->raw_output->getSafeBuf(index); + if (!output_buf || output_safe_buf.get() != output_buf) { + GXF_LOG_ERROR("Mismatch between safe buffer and regular buffer of NvDs"); + return nvidia::gxf::Unexpected{GXF_FAILURE}; + } + + const nvdsinferserver::InferBufferDescription& description = output_buf->getBufDesc(); + GXF_LOG_DEBUG("Batch size output '%s' = %u", + description.name.c_str(), output_buf->getBatchSize()); + GXF_LOG_DEBUG("Raw Outputs MemType type = %u", output_buf->getBufDesc().memType); + + auto tensor = maybe_output_entity.value().add(description.name.c_str()); + if (!tensor) { + GXF_LOG_ERROR("Unable to add tensor '%s' to output", description.name.c_str()); + return nvidia::gxf::Unexpected{tensor.error()}; + } + std::array dims; + dims[0] = output_buf->getBatchSize(); + size_t gxf_dims_index = 1; + uint32_t rank = 1; + + // If the model is non-dynamic, then batch size = 0. In this case, we need to ignore + // that dimension and override it with a meaningful dimension from DS's dimension. Reset rank + // to 0 as well to ignore previously set dimension. 
+    if (dims[0] == 0) {
+      gxf_dims_index = 0;
+      rank = 0;
+    }
+
+    // Batch will be the first index in the outgoing GXF Tensor.
+    GXF_ASSERT_LE(description.dims.numDims + 1, nvidia::gxf::Shape::kMaxRank);
+    for (size_t nvds_dim_index = 0; nvds_dim_index < description.dims.numDims &&
+      gxf_dims_index < nvidia::gxf::Shape::kMaxRank; nvds_dim_index++, gxf_dims_index++) {
+      dims[gxf_dims_index] = static_cast<int32_t>(description.dims.d[nvds_dim_index]);
+      rank++;
+    }
+
+    nvidia::gxf::Shape ds_tensor_shape {dims, rank};
+    uint64_t bytes_per_element = description.elementSize;
+
+    if (description.dataType == nvdsinferserver::InferDataType::kString) {
+      GXF_LOG_DEBUG("Found output of data type String");
+      // The shape returned by the inference for a string is the shape of the unserialized
+      // data, which we preserve by adding another component to the published entity.
+      auto maybe_output_shape = maybe_output_entity.value().add<nvidia::gxf::Shape>(
+        description.name.c_str());
+      if (!maybe_output_shape) {
+        GXF_LOG_ERROR("Unable to add Shape '%s' to output", description.name.c_str());
+        return nvidia::gxf::Unexpected{maybe_output_shape.error()};
+      }
+      *(maybe_output_shape.value().get()) = ds_tensor_shape;
+
+      // The batch dimension does not matter here since the serialization of strings is unique;
+      // it should only be interpreted with helper functions. We override the shape that is used
+      // for the outgoing tensor since we need to represent the fully serialized byte size.
+      ds_tensor_shape = nvidia::gxf::Shape{
+        static_cast<int32_t>(output_buf->getTotalBytes())};
+      bytes_per_element = 1;  // sizeof(char)
+    }
+
+    nvidia::gxf::MemoryStorageType target_storage_type =
+      NvDsMemTypeToGxfMemType(description.memType);
+    void* buffer_pointer = output_buf->getBufPtr(0);
+
+    // Convert the output tensor to the requested storage type. If no storage type is specified
+    // in the config, default to no copy (i.e. take whatever Triton gives as output).
+    bool needs_memory = false;
+    auto memcpy_kind = cudaMemcpyDefault;
+    if (output_storage_type_.try_get()) {
+      target_storage_type = nvidia::gxf::MemoryStorageType(output_storage_type_.try_get().value());
+      auto current_storage_type = NvDsMemTypeToGxfMemType(description.memType);
+      if (target_storage_type != current_storage_type) {
+        switch (current_storage_type) {
+          case nvidia::gxf::MemoryStorageType::kHost: {
+            switch (target_storage_type) {
+              case nvidia::gxf::MemoryStorageType::kDevice: {
+                needs_memory = true;
+                memcpy_kind = cudaMemcpyHostToDevice;
+              } break;
+              case nvidia::gxf::MemoryStorageType::kSystem: {
+                needs_memory = true;
+                memcpy_kind = cudaMemcpyHostToHost;
+              } break;
+              default:
+                GXF_LOG_ERROR("Unknown target storage type %d for copy",
+                  static_cast<int>(target_storage_type));
+                return nvidia::gxf::Unexpected{GXF_FAILURE};
+            }
+          } break;
+          case nvidia::gxf::MemoryStorageType::kDevice: {
+            switch (target_storage_type) {
+              case nvidia::gxf::MemoryStorageType::kHost: {
+                needs_memory = true;
+                memcpy_kind = cudaMemcpyDeviceToHost;
+              } break;
+              case nvidia::gxf::MemoryStorageType::kSystem: {
+                needs_memory = true;
+                memcpy_kind = cudaMemcpyDeviceToHost;
+              } break;
+              default:
+                GXF_LOG_ERROR("Unknown target storage type %d for copy",
+                  static_cast<int>(target_storage_type));
+                return nvidia::gxf::Unexpected{GXF_FAILURE};
+            }
+          } break;
+          case nvidia::gxf::MemoryStorageType::kSystem: {
+            switch (target_storage_type) {
+              case nvidia::gxf::MemoryStorageType::kHost: {
+                memcpy_kind = cudaMemcpyHostToHost;
+              } break;
+              case nvidia::gxf::MemoryStorageType::kDevice: {
+                memcpy_kind = cudaMemcpyHostToDevice;
+              } break;
+              default:
+                GXF_LOG_ERROR("Unknown target storage type %d for copy",
+                  static_cast<int>(target_storage_type));
+                return nvidia::gxf::Unexpected{GXF_FAILURE};
+            }
+          } break;
+          default:
+            GXF_LOG_ERROR("Unknown current storage type %d for copy",
+              static_cast<int>(current_storage_type));
+            return nvidia::gxf::Unexpected{GXF_FAILURE};
+        }
+      }
+
+      // Allocate new memory if the requested storage type differs from what Triton provided.
+      if (needs_memory) {
+        auto pool = allocator_.try_get();
+        if (!pool) {
+          GXF_LOG_ERROR("Allocator must be set when the requested output storage type"
+            " does not match the Triton output");
+          return nvidia::gxf::Unexpected{GXF_ENTITY_COMPONENT_NOT_FOUND};
+        }
+
+        auto result = tensor.value()->reshapeCustom(ds_tensor_shape,
+          NvDsToGxfDataType(description.dataType),
+          bytes_per_element,
+          nvidia::gxf::Unexpected{GXF_UNINITIALIZED_VALUE},
+          target_storage_type, pool.value());
+        if (!result) { return nvidia::gxf::ForwardError(result); }
+        buffer_pointer = static_cast<void*>(tensor.value()->pointer());
+      }
+
+      // Perform the data copy.
+      const cudaError_t copy_error = cudaMemcpy(buffer_pointer,
+        output_buf->getBufPtr(0),
+        output_buf->getTotalBytes(),
+        memcpy_kind);
+      if (copy_error != cudaSuccess) {
+        GXF_LOG_ERROR("cudaMemcpy error: %s", cudaGetErrorString(copy_error));
+        return nvidia::gxf::Unexpected{GXF_FAILURE};
+      }
+    }
+
+    // If memory was not allocated by this inferencer, then wrap the memory coming from the
+    // Triton context.
+    if (!needs_memory) {
+      // Pass the NvDs output by value to copy the underlying shared_ptr. Once all of the output
+      // GXF tensors reach a ref count of 0, the underlying NvDs pointer will also be decremented.
+      auto result = tensor.value()->wrapMemory(
+        ds_tensor_shape, NvDsToGxfDataType(description.dataType), bytes_per_element,
+        nvidia::gxf::Unexpected{GXF_UNINITIALIZED_VALUE},
+        target_storage_type,
+        buffer_pointer,
+        [nvds_output](void*) {
+          // The captured shared_ptr keeps the NvDs output alive until the GXF tensor is released.
+          return nvidia::gxf::Success;
+        });
+
+      if (!result) {
+        GXF_LOG_ERROR("Unable to wrap memory of tensor '%s' for output", description.name.c_str());
+        return nvidia::gxf::Unexpected{result.error()};
+      }
+    }
+
+    GXF_ASSERT(output_buf->getTotalBytes() == tensor.value()->size(),
+      "Mismatch in expected GXF Tensor byte size: %" PRIu64 " != %zu",
+      output_buf->getTotalBytes(), tensor.value()->size());
+  }
+
+  // Shared instances of the NvDs output will be managed through the wrapMemory callbacks.
+  inference->raw_output = nvdsinferserver::SharedIBatchArray();
+  // Reset for the next inference.
+  inference->is_complete = false;
+  inference->is_active = false;
+
+  // We have processed this inference, so decrement the count and check whether all responses
+  // have been processed.
+  incomplete_inference_count_--;
+  GXF_LOG_DEBUG("incomplete_inference_count_ = %zu", incomplete_inference_count_.load());
+  if (!incomplete_inference_count_.load()) {
+    GXF_LOG_DEBUG("Last inference reached; setting Async state to WAIT");
+    scheduling_term_->setEventState(nvidia::gxf::AsynchronousEventState::WAIT);
+  }
+
+  return maybe_output_entity;
+}
+
+nvidia::gxf::Expected<bool> TritonInferencerImpl::isAcceptingRequest() {
+  if (!inference_pool_.size()) { return false; }
+
+  // If the next inference slot in the pool is not active, then another inference request can be
+  // stored there.
+  const auto& inference = inference_pool_[next_inference_index_.load()];
+  return !inference->is_active;
+}
+
+}  // namespace triton
+}  // namespace nvidia
diff --git a/isaac_ros_triton/gxf/triton/inferencers/triton_inferencer_impl.hpp b/isaac_ros_triton/gxf/triton/inferencers/triton_inferencer_impl.hpp
new file mode 100644
index 0000000..af6e07d
--- /dev/null
+++ b/isaac_ros_triton/gxf/triton/inferencers/triton_inferencer_impl.hpp
@@ -0,0 +1,291 @@
+// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES
+// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+// SPDX-License-Identifier: Apache-2.0
+
+#ifndef NVIDIA_TRITON_INFERENCERS_TRITON_INFERENCER_IMPL_HPP
+#define NVIDIA_TRITON_INFERENCERS_TRITON_INFERENCER_IMPL_HPP
+
+#include <atomic>
+#include <future>
+#include <memory>
+#include <mutex>
+#include <string>
+#include <vector>
+
+#include "gxf/core/component.hpp"
+#include "gxf/core/entity.hpp"
+#include "gxf/std/allocator.hpp"
+#include "gxf/std/scheduling_terms.hpp"
+#include "gxf/std/tensor.hpp"
+
+#include "triton_inferencer_interface.hpp"
+
+#include "extensions/triton/triton_server.hpp"
+
+#include "cuda_runtime.h"
+
+namespace nvidia {
+namespace triton {
+
+/**
+ * @brief Struct to maintain members for Inference, such as input, outputs, and status
+ *
+ * @details This maintains input Entity and raw output from the inference response.
This class + * is intended to ease the use of an inference request pool. + * + */ +struct Inference; + +/** + * @brief Enumeration for inference mode + * + */ +enum struct TritonInferenceMode { + kDirect = 0, + kRemoteGrpc = 1, +}; + +/** + * @brief Triton Direct C API Implementation for inferencing. + * + */ +class TritonInferencerImpl : public nvidia::triton::TritonInferencerInterface { + public: + /** + * @brief Register parameters for usage with this component. + * + * @param registrar + * @return gxf_result_t + */ + gxf_result_t registerInterface(nvidia::gxf::Registrar* registrar) override { + nvidia::gxf::Expected result; + + result &= registrar->parameter(server_handle_, "server", + "Triton Server", + "Triton Server Handle", + nvidia::gxf::Registrar::NoDefaultParameter(), + GXF_PARAMETER_FLAGS_OPTIONAL); + + result &= registrar->parameter(model_name_, "model_name", + "Triton Model Name", + "Triton Model Name. Refer to Triton Model Repository."); + + result &= registrar->parameter(model_version_, "model_version", + "Triton Model Version", + "Triton Model Version. Refer to Triton Model Repository."); + + result &= registrar->parameter(max_batch_size_, "max_batch_size", + "Triton Max Batch Size for Model", + "Triton Max Batch Size for Model, which should match Triton Model Repository."); + + result &= registrar->parameter(num_concurrent_requests_, + "num_concurrent_requests", + "Maximum Number of concurrent Inference Requests", + "Maximum Number of concurrent Inference Requests, which defines the pool.", + 1U); + + result &= registrar->parameter(allocator_, + "allocator", + "Allocator", + "Allocator instance for output tensors.", + nvidia::gxf::Registrar::NoDefaultParameter(), GXF_PARAMETER_FLAGS_OPTIONAL); + + result &= registrar->parameter(output_storage_type_, + "output_storage_type", + "Specified output memory location: kHost, kDevice, kSystem" \ + "The memory storage type used by this allocator. ", + "Can be kHost (0), kDevice (1) or kSystem (2)", + nvidia::gxf::Registrar::NoDefaultParameter(), GXF_PARAMETER_FLAGS_OPTIONAL); + + result &= registrar->parameter(use_string_data_, + "use_string_data", + "Specify whether string data is being sent to Triton", + "Specify whether string data is being sent to Triton", + false); + + result &= registrar->parameter(use_sequence_data_, + "use_sequence_data", + "Specify whether sequence data is being sent to Triton", + "Specify whether sequence data is being sent to Triton", + false); + + result &= registrar->parameter(scheduling_term_, "async_scheduling_term", + "Asynchronous Scheduling Term", "Asynchronous Scheduling Term"); + + result &= registrar->parameter(inference_mode_, + "inference_mode", + "Triton Inference mode: Direct, RemoteGrpc", + "Triton Inference mode: Direct, RemoteGrpc"); + + result &= registrar->parameter(server_endpoint_, + "server_endpoint", + "Triton Server Endpoint for GRPC or HTTP", + "Triton Server Endpoint for GRPC or HTTP", + nvidia::gxf::Registrar::NoDefaultParameter(), + GXF_PARAMETER_FLAGS_OPTIONAL); + + return nvidia::gxf::ToResultCode(result); + } + + gxf_result_t initialize() override; + + /** + * @brief Allocate Triton ResponseAllocator. Reserve space for Inference Pool. + * Create Inference Request Pool + * + * @return gxf_result_t + */ + gxf_result_t construct() override; + + /** + * @brief Deallocate Triton ResponseAllocator. Clear Inference Pool. + * + * @return gxf_result_t + */ + gxf_result_t destruct() override; + + /** + * @brief Dispatch Triton inference request asynchronously. 
+ * + * @param[in] input_entities Vector of input entities that contain the tensor data that + * correspond to Triton model inputs + * + * @param[in] input_names Vector of name strings for the tensors that + * correspond to Triton model inputs + * + * @return gxf_result_t + */ + gxf_result_t inferAsync(const std::vector input_entities, + const std::vector input_names) + override; + + /** + * @brief Get the Triton Response after an inference completes. + * + * @return nvidia::gxf::Expected + */ + nvidia::gxf::Expected getResponse() override; + + /** + * @brief Checks if inferencer can accept a new inference request. + * + * @return nvidia::gxf::Expected + */ + nvidia::gxf::Expected isAcceptingRequest() override; + + private: + nvidia::gxf::Parameter> server_handle_; + nvidia::gxf::Parameter model_name_; + nvidia::gxf::Parameter model_version_; + nvidia::gxf::Parameter max_batch_size_; + nvidia::gxf::Parameter num_concurrent_requests_; + nvidia::gxf::Parameter inference_mode_; + nvidia::gxf::Parameter server_endpoint_; + + // Special cases that aren't fully supported by the TRTIS backend yet + // These will need to use the simple case + nvidia::gxf::Parameter use_string_data_; + nvidia::gxf::Parameter use_sequence_data_; + + // Specify the output storage type + nvidia::gxf::Parameter output_storage_type_; + // Specify allocator incase memory needs to be allocated when the requested output storage type + // does not match the output memory type of the triton context + gxf::Parameter> allocator_; + + // Async Scheduling Term required to get/set event state. + nvidia::gxf::Parameter> + scheduling_term_; + + // Use a shared pointer to the server due to lack of guarantees on deinitialization order with + // Server component. + std::shared_ptr server_; + + // Instance of IInferContext can be used across multiple inferences of the same model. + std::shared_ptr infer_context_; + + // Set up a pool of inferences that manage Tensor inputs and response promises. The size of the + // pool must be large enough to accomodate multiple asynchronous requests, and it is controlled + // via parameter interface. + std::vector inference_pool_; + + // Mutex to protect counting in inference pool + std::mutex mutex_; + + // This represents the currently active index for the inference pool. Responses will use this + // index for the promises and future events. + size_t active_inference_index_ { 0 }; + + // This represents the next available index in the inference pool. This can be asynchronously + // accessed via isAcceptingRequest(). + std::atomic next_inference_index_ { 0 }; + + // This represents the number of incompleted inferences for the async + // continuation/termination conditions. This is incremented in a callback, and decremented when + // the inference response is received. 
+ std::atomic incomplete_inference_count_ { 0 }; +}; + +} // namespace triton +} // namespace nvidia + +namespace nvidia { +namespace gxf { + +/** + * @brief Custom parameter parser for TritonInferenceMode + * + */ +template <> +struct ParameterParser<::nvidia::triton::TritonInferenceMode> { + static Expected<::nvidia::triton::TritonInferenceMode> Parse( + gxf_context_t context, gxf_uid_t component_uid, + const char* key, const YAML::Node& node, + const std::string& prefix) { + const std::string value = node.as(); + if (strcmp(value.c_str(), "Direct") == 0) { + return ::nvidia::triton::TritonInferenceMode::kDirect; + } + if (strcmp(value.c_str(), "RemoteGrpc") == 0) { + return ::nvidia::triton::TritonInferenceMode::kRemoteGrpc; + } + return ::nvidia::gxf::Unexpected{GXF_ARGUMENT_OUT_OF_RANGE}; + } +}; + +template<> +struct ParameterWrapper<::nvidia::triton::TritonInferenceMode> { + static Expected Wrap( + gxf_context_t context, + const ::nvidia::triton::TritonInferenceMode& value) { + std::string string_value; + if (value == ::nvidia::triton::TritonInferenceMode::kDirect) { + string_value = "Direct"; + } else if (value == ::nvidia::triton::TritonInferenceMode::kRemoteGrpc) { + string_value = "RemoteGrpc"; + } else { + return ::nvidia::gxf::Unexpected{GXF_ARGUMENT_OUT_OF_RANGE}; + } + return ParameterWrapper::Wrap(context, string_value); + } +}; + +} // namespace gxf +} // namespace nvidia + +#endif diff --git a/isaac_ros_triton/gxf/triton/inferencers/triton_inferencer_interface.hpp b/isaac_ros_triton/gxf/triton/inferencers/triton_inferencer_interface.hpp new file mode 100644 index 0000000..c351a75 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/inferencers/triton_inferencer_interface.hpp @@ -0,0 +1,91 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef NVIDIA_TRITON_INFERENCERS_TRITON_INFERENCER_INTERFACE_HPP +#define NVIDIA_TRITON_INFERENCERS_TRITON_INFERENCER_INTERFACE_HPP + +#include +#include + +#include "gxf/core/component.hpp" +#include "gxf/core/entity.hpp" + +namespace nvidia { +namespace triton { + +/** + * @brief Interface to wrap implementation of Triton inferencing. + * + */ +class TritonInferencerInterface : public nvidia::gxf::Component { + public: + /** + * @brief Prepare and set up any members specific to implementation. + * + * @details Derived component may prepare any implementation specific + * members/details here. We cannot leverage initialize() due to lack + * of guarantees on other component initializations. + * + * @return gxf_result_t + */ + virtual gxf_result_t construct() = 0; + + /** + * @brief Destroy any members specific to implementation. + * + * @details Derived component may prepare any implementation specific + * members/details here. We cannot leverage deinitialize() due to lack + * of guarantees on other component initializations. 
+ * + * @return gxf_result_t + */ + virtual gxf_result_t destruct() = 0; + + /** + * @brief Dispatch Triton inference request asynchronously. + * + * @param[in] tensors Entity that contains a tensor map with names + * corresponding to Triton model inputs + * + * @return gxf_result_t + */ + virtual gxf_result_t inferAsync(const std::vector input_entities, + const std::vector input_names) = 0; + + /** + * @brief Get the Triton Response after an inference completes. + * + * @return nvidia::gxf::Expected + */ + virtual nvidia::gxf::Expected getResponse() = 0; + + /** + * @brief Checks if inferencer can accept a new inference request. + * + * @details This will be leveraged by scheduling term that decides to + * schedule the request codelet. + * This allows for inferSync behavior depending upon inferencer's implementation. + * + * @return nvidia::gxf::Expected + */ + virtual nvidia::gxf::Expected isAcceptingRequest() = 0; +}; + +} // namespace triton +} // namespace nvidia + +#endif diff --git a/isaac_ros_triton/gxf/triton/nvds/include/infer_custom_process.h b/isaac_ros_triton/gxf/triton/nvds/include/infer_custom_process.h new file mode 100644 index 0000000..6b25598 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/infer_custom_process.h @@ -0,0 +1,122 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef __NVDSINFERSERVER_CUSTOM_PROCESSOR_H__ +#define __NVDSINFERSERVER_CUSTOM_PROCESSOR_H__ + +#include + +#include +#include +#include +#include +#include + +namespace nvdsinferserver { + +/** + * @brief Interface of Custom processor which is created and loaded at runtime + * through CreateCustomProcessorFunc. + * Note: Full dimensions are used for all the inputs and output tensors. then + * IBatchBuffer::getBatchSize() usually return 0. This is matched with + * Triton-V2 shapes for public. to get buf_ptr, user always use + * IBatchBuffer::getBufPtr(idx=0). + */ +class IInferCustomProcessor { +public: + /** @brief IInferCustomProcessor will be deleted by nvdsinferserver lib. + */ + virtual ~IInferCustomProcessor() = default; + + /** @brief Query the memory type, extraInputProcess() implementation supports. + * Memory will be allocated based on the return type and passed to + * extraInputProcess(). + * + * @param type, [output], must be chosen from InferMemType::kCpu or + * InferMemType::kGpuCuda, + */ + virtual void supportInputMemType(InferMemType& type) { type = InferMemType::kCpu; } + + /** + * @brief Indicate whether this custom processor requires inference loop, + * in which nvdsinferserver lib guarantees extraInputProcess() and + * InferenceDone() running in order per each stream id. User can process last + * frame's output tensor from inferenceDone() and feed into next frame's + * inference input tensor in extraInputProcess() + * @return true if need loop(e.g. 
LSTM based processing); Else @return false. + */ + virtual bool requireInferLoop() const { return false; } + + /** + * @brief Custom processor for extra input data. + * + * @param primaryInputs, [input], the primary image input + * @param extraInputs [input/output], custom processing to generate extra tensor + * input data. The memory is pre-allocated. memory type is same as + * supportInputMemType returned types. + * @param options, [input]. Associated options along with the input buffers. + * It has most of the common Deepstream metadata along with primary data. + * e.g. NvDsBatchMeta, NvDsObjectMeta, NvDsFrameMeta, stream ids and so on. + * See infer_ioptions.h to get all the potential key name and structures + * in the key-value table. + * @return NvDsInferStatus, if successful implementation must return NVDSINFER_SUCCESS + * or an error value in case of error. + */ + virtual NvDsInferStatus extraInputProcess( + const std::vector& primaryInputs, std::vector& extraInputs, + const IOptions* options) = 0; + + /** + * @brief Inference done callback for custom postpocessing. + * + * @param outputs, [input], the inference output tensors. the tensor + * memory type could be controled by + * infer_config{ backend{ output_mem_type: MEMORY_TYPE_DEFAULT } }, + * The default output tensor memory type is decided by triton model. + * User can set other values from MEMORY_TYPE_CPU, MEMORY_TYPE_GPU. + * @param inOptions, [input], corresponding options from input tensors. It is + * same as options in extraInputProcess(). + * @return NvDsInferStatus, if successful implementation must return NVDSINFER_SUCCESS + * or an error value in case of error. + */ + virtual NvDsInferStatus inferenceDone( + const IBatchArray* outputs, const IOptions* inOptions) = 0; + + /** + * @brief Notification of an error to the interface implementation. + * + * @param status, [input], error code + */ + virtual void notifyError(NvDsInferStatus status) = 0; +}; + +} // namespace nvdsinferserver + +extern "C" { + +/** + * Custom processor context is created and loaded in runtime. + * + * @param[in] config Contents of prototxt configuration file serialized as a string. + * @param[in] configLen use for string length of \a config + * @return new instance of IInferCustomProcessor. If failed, return nullptr + */ +typedef nvdsinferserver::IInferCustomProcessor* (*CreateCustomProcessorFunc)( + const char* config, uint32_t configLen); +} + +#endif \ No newline at end of file diff --git a/isaac_ros_triton/gxf/triton/nvds/include/infer_datatypes.h b/isaac_ros_triton/gxf/triton/nvds/include/infer_datatypes.h new file mode 100644 index 0000000..5dbc031 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/infer_datatypes.h @@ -0,0 +1,176 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2020-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef __NVDSINFERSERVER_DATA_TYPES_H__ +#define __NVDSINFERSERVER_DATA_TYPES_H__ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +namespace nvdsinferserver { + +enum class InferTensorOrder : int { + kNone = 0, + kLinear = 1, + kNHWC = 2, +}; + +enum class InferMemType : int { + kNone = 0, + kGpuCuda = 1, + kCpu = 2, + kCpuCuda = 3, + kNvSurface = 5, + kNvSurfaceArray = 6, +}; + +enum class InferDataType : int { + kFp32 = FLOAT, // 0 + kFp16 = HALF, // 1 + kInt8 = INT8, // 2 + kInt32 = INT32, // 3 + kInt16 = 7, + kUint8, + kUint16, + kUint32, + // New + kFp64, + kInt64, + kUint64, + kString, // for text/bytes => str_len(4byte) + str('a\0') + kBool, + kNone = -1, +}; + +enum class InferPostprocessType : int { + kDetector = 0, + kClassifier = 1, + kSegmentation = 2, + kTrtIsClassifier = 3, + kOther = 100, +}; + +enum class InferMediaFormat : int { + /** 24-bit interleaved R-G-B */ + kRGB = 0, + /** 24-bit interleaved B-G-R */ + kBGR, + /** 8-bit Luma */ + kGRAY, + /** 32-bit interleaved R-G-B-A */ + kRGBA, + /** 32-bit interleaved B-G-R-x */ + kBGRx, + kUnknown = -1, +}; + +// typedef NvDsInferDims InferDims; + +struct InferDims +{ + /** Number of dimesions of the layer.*/ + unsigned int numDims = 0; + /** Size of the layer in each dimension. */ + int d[NVDSINFER_MAX_DIMS] = {0}; + /** Number of elements in the layer including all dimensions.*/ + unsigned int numElements = 0; +}; + +/** + * Holds full dimensions (including batch size) for a layer. + */ +struct InferBatchDims +{ + int batchSize = 0; + InferDims dims; +}; + +struct InferBufferDescription { + InferMemType memType; + long int devId; + InferDataType dataType; + InferDims dims; + uint32_t elementSize; // per element bytes, except kString(with elementSize + // is 0) + std::string name; + bool isInput; +}; + +// Common buffer interface [external] +class IBatchBuffer; +class IBatchArray; +class IOptions; + +using SharedIBatchBuffer = std::shared_ptr; +using SharedIBatchArray = std::shared_ptr; +using SharedIOptions = std::shared_ptr; + +class IBatchBuffer { +public: + IBatchBuffer() = default; + virtual ~IBatchBuffer() = default; + virtual const InferBufferDescription& getBufDesc() const = 0; + virtual void* getBufPtr(uint32_t batchIdx) const = 0; + virtual uint32_t getBatchSize() const = 0; + virtual uint64_t getTotalBytes() const = 0; + +private: + DISABLE_CLASS_COPY(IBatchBuffer); +}; + +class IBatchArray { +public: + IBatchArray() = default; + virtual ~IBatchArray() = default; + virtual uint32_t getSize() const = 0; + virtual const IBatchBuffer* getBuffer(uint32_t arrayIdx) const = 0; + virtual const IOptions* getOptions() const = 0; + + virtual SharedIBatchBuffer getSafeBuf(uint32_t arrayIdx) const = 0; + + // add values + virtual void appendIBatchBuf(SharedIBatchBuffer buf) = 0; + virtual void setIOptions(SharedIOptions o) = 0; + +private: + DISABLE_CLASS_COPY(IBatchArray); +}; + +struct LayerInfo { + InferDataType dataType = InferDataType::kFp32; + InferDims inferDims; + int bindingIndex = 0; + bool isInput = 0; + std::string name; + // New + int maxBatchSize; // 0=> nonBatching +}; + +} // namespace nvdsinferserver + +#endif diff --git a/isaac_ros_triton/gxf/triton/nvds/include/infer_defines.h b/isaac_ros_triton/gxf/triton/nvds/include/infer_defines.h new file mode 100644 index 0000000..b5cea28 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/infer_defines.h @@ -0,0 +1,132 @@ +// 
SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2020-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef __NVDSINFERSERVER_DEFINES_H__ +#define __NVDSINFERSERVER_DEFINES_H__ + +#include +#include +#include +#include +#include +#include +#include +#include + +#define DISABLE_CLASS_COPY(NoCopyClass) \ + NoCopyClass(const NoCopyClass&) = delete; \ + void operator=(const NoCopyClass&) = delete + +#define SIMPLE_MOVE_COPY(Cls) \ + Cls& operator=(Cls&& o) { \ + move_copy(std::move(o)); \ + return *this; \ + } \ + Cls(Cls&& o) { move_copy(std::move(o)); } + +#define INFER_UNUSED(a) (void)(a) + +#if defined(NDEBUG) +#define INFER_LOG_FORMAT_(fmt) fmt +#else +#define INFER_LOG_FORMAT_(fmt) "%s:%d " fmt, __FILE__, __LINE__ +#endif + +#define INFER_EXPORT_API __attribute__((__visibility__("default"))) + +#define InferError(fmt, ...) \ + do { \ + dsInferLogPrint__( \ + NVDSINFER_LOG_ERROR, INFER_LOG_FORMAT_(fmt), ##__VA_ARGS__); \ + } while (0) + +#define InferWarning(fmt, ...) \ + do { \ + dsInferLogPrint__( \ + NVDSINFER_LOG_WARNING, INFER_LOG_FORMAT_(fmt), ##__VA_ARGS__); \ + } while (0) + +#define InferInfo(fmt, ...) \ + do { \ + dsInferLogPrint__( \ + NVDSINFER_LOG_INFO, INFER_LOG_FORMAT_(fmt), ##__VA_ARGS__); \ + } while (0) + +#define InferDebug(fmt, ...) \ + do { \ + dsInferLogPrint__( \ + NVDSINFER_LOG_DEBUG, INFER_LOG_FORMAT_(fmt), ##__VA_ARGS__); \ + } while (0) + +#define RETURN_IF_FAILED(condition, ret, fmt, ...) \ + do { \ + if (!(condition)) { \ + InferError(fmt, ##__VA_ARGS__); \ + return ret; \ + } \ + } while (0) + +#define CHECK_NVINFER_ERROR_PRINT(err, action, logPrint, fmt, ...) \ + do { \ + NvDsInferStatus ifStatus = (err); \ + if (ifStatus != NVDSINFER_SUCCESS) { \ + auto errStr = NvDsInferStatus2Str(ifStatus); \ + logPrint(fmt ", nvinfer error:%s", ##__VA_ARGS__, errStr); \ + action; \ + } \ + } while (0) + +#define CHECK_NVINFER_ERROR(err, action, fmt, ...) \ + CHECK_NVINFER_ERROR_PRINT(err, action, InferError, fmt, ##__VA_ARGS__) + +#define RETURN_NVINFER_ERROR(err, fmt, ...) \ + CHECK_NVINFER_ERROR(err, return ifStatus, fmt, ##__VA_ARGS__) + +#define CONTINUE_NVINFER_ERROR(err, fmt, ...) \ + CHECK_NVINFER_ERROR(err, , fmt, ##__VA_ARGS__) + + +#define CHECK_CUDA_ERR_W_ACTION(err, action, logPrint, fmt, ...) \ + do { \ + cudaError_t errnum = (err); \ + if (errnum != cudaSuccess) { \ + logPrint(fmt ", cuda err_no:%d, err_str:%s", ##__VA_ARGS__, \ + (int)errnum, cudaGetErrorName(errnum)); \ + action; \ + } \ + } while (0) + +#define CHECK_CUDA_ERR_NO_ACTION(err, fmt, ...) \ + CHECK_CUDA_ERR_W_ACTION(err, , InferError, fmt, ##__VA_ARGS__) + +#define RETURN_CUDA_ERR(err, fmt, ...) \ + CHECK_CUDA_ERR_W_ACTION( \ + err, return NVDSINFER_CUDA_ERROR, InferError, fmt, ##__VA_ARGS__) + +#define CONTINUE_CUDA_ERR(err, fmt, ...) 
\ + CHECK_CUDA_ERR_NO_ACTION(err, fmt, ##__VA_ARGS__) + +#define READ_SYMBOL(lib, func_name) \ + lib->symbol(#func_name) + +#define DIVIDE_AND_ROUND_UP(a, b) ((a + b - 1) / b) +#define INFER_ROUND_UP(value, align) (((value) + (align)-1) & (~((align)-1))) +#define INFER_ROUND_DOWN(value, align) ((value) & (~((align)-1))) +#define INFER_WILDCARD_DIM_VALUE -1 +#define INFER_MEM_ALIGNMENT 1024 + +#endif diff --git a/isaac_ros_triton/gxf/triton/nvds/include/infer_icontext.h b/isaac_ros_triton/gxf/triton/nvds/include/infer_icontext.h new file mode 100644 index 0000000..8becc53 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/infer_icontext.h @@ -0,0 +1,189 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2018-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef __NVDSINFERSERVER_ICONTEXT_H__ +#define __NVDSINFERSERVER_ICONTEXT_H__ + +#ifdef __cplusplus + +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace nvdsinferserver { + +/** + * Inference Output callback interface. + */ +using InferOutputCb = std::function; + +/** + * Inference Logging function interface. + */ +using InferLoggingFunc = + std::function; + +/** + * The DeepStream inference interface class. + */ +class IInferContext { +public: + virtual ~IInferContext() = default; + + /** + * Initialize InferConxt with config text. + * + * @param[in] prototxt is the config string. + * @param[in] logFunc use for print logs. If value is nullptr, use default + * log printer. + * @return NVDSINFER_SUCCESS if run good. + */ + virtual NvDsInferStatus initialize( + const std::string& prototxt, InferLoggingFunc logFunc) = 0; + + /** + * Run inference relavant processing behind. expect the call is async_mode. + * When all behind processing finished. \a done would be called. + * + * @param[in] input holds all batch buffer array. + * @param[in] outputCb use for callback with final status and output array. + * @return NVDSINFER_SUCCESS if run good. + */ + virtual NvDsInferStatus run( + SharedIBatchArray input, InferOutputCb outputCb) = 0; + + /** + * Destroy InferConxt + * + * @return NVDSINFER_SUCCESS if run good. + */ + virtual NvDsInferStatus deinit() = 0; + + /** + * Get the network input information. + * + * @param[in,out] networkInfo Reference to a NvDsInferNetworkInfo structure. + */ + virtual void getNetworkInputInfo(NvDsInferNetworkInfo &networkInfo) = 0; +}; + +/** + * Triton Server global instance. When it is instantiated, all models would be + * loaded prior to all InferContext. Class interfaces is coming soon. + */ +class ITritonServerInstance; + +} // namespace nvdsinferserver + +extern "C" { + +/** + * Creates a new instance of IInferContext initialized using the supplied + * parameters. + * + * @param[in] configStr Parameters to use for initialization of the context. 
+ * @param[in] configStrLen use for string length of \a configStr + * @return new instance of IInferContext. If failed, return nullptr + */ +INFER_EXPORT_API nvdsinferserver::IInferContext* createInferTrtISContext( + const char* configStr, uint32_t configStrLen); + +/** + * Creates a light weight Triton instance of IInferContext. + * + * @return new instance of IInferContext. If failed, return nullptr + */ +INFER_EXPORT_API nvdsinferserver::IInferContext* +createInferTritonSimpleContext(); + +INFER_EXPORT_API nvdsinferserver::IInferContext* +createInferTritonGrpcContext(const char* configStr, uint32_t configStrLen); + +/** + * Creates Triton Server Instance as global singleton. Application need hold it + * until no component need triton inference in process. + * + * @param[in] configStr Parameters for Triton model repo settings. + * @param[in] configStrLen use for string length of \a configStr + * @param[out] instance use for output. + * @return status. If ok, return NVDSINFER_SUCCESS. + */ +INFER_EXPORT_API NvDsInferStatus NvDsTritonServerInit( + nvdsinferserver::ITritonServerInstance** instance, const char* configStr, + uint32_t configStrLen); + +/** + * Destroys Triton Server Instance. Application need call this function before + * process exist. + * + * @param[in] instance use for instance to be destroyed. + * @return status. If ok, return NVDSINFER_SUCCESS. + */ +INFER_EXPORT_API NvDsInferStatus +NvDsTritonServerDeinit(nvdsinferserver::ITritonServerInstance* instance); + +/** + * Wrap a user buffer into SharedIBatchBuffer for IInferContext to use. + * + * @param[in] buf The raw content data pointer. + * @param[in] bufBytes Byte size of \a buf. + * @param[in] desc Buffer Description of \a buf + * @param[in] batchSize Batch size of \a buf. 0 indicates a full-dim buffer. + * @param[in] freeFunc A C++14 function indicates how to free \a buf. + * @return Batched buffer in shared_ptr. If failed, return nullptr. + */ +INFER_EXPORT_API nvdsinferserver::SharedIBatchBuffer NvDsInferServerWrapBuf( + void* buf, size_t bufBytes, + const nvdsinferserver::InferBufferDescription& desc, uint32_t batchSize, + std::function freeFunc); + +/** + * Create a empty BatchArray. + * + * @return A empty Batched array in shared_ptr. If failed, return nullptr. + */ +INFER_EXPORT_API nvdsinferserver::SharedIBatchArray +NvDsInferServerCreateBatchArray(); + +/** + * Create a SharedIBatchBuffer with a vector of strings stored inside. + * + * @param[in] strings A bunch of strings. + * @param[in] dims The shapes for each batch. It could be a full-dim if + * \a batchSize is 0. + * @param[in] batchSize Batch size of \a strings. 0 indicates non-batching. + * @param[in] name Tensor name of this buffer. + * @param[in] isInput Indicates whether the buffer is for input. It should + * always be true for external users. + * @return A Batched Buffer stroing all strings with memtype InferMemType::kCpu. + */ +INFER_EXPORT_API nvdsinferserver::SharedIBatchBuffer +NvDsInferServerCreateStrBuf( + const std::vector& strings, + const nvdsinferserver::InferDims& dims, uint32_t batchSize, + const std::string& name, bool isInput); +} + +#endif + +#endif diff --git a/isaac_ros_triton/gxf/triton/nvds/include/infer_ioptions.h b/isaac_ros_triton/gxf/triton/nvds/include/infer_ioptions.h new file mode 100644 index 0000000..2862760 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/infer_ioptions.h @@ -0,0 +1,171 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. 
All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef __NVDSINFERSERVER_I_OPTIONS_H__ +#define __NVDSINFERSERVER_I_OPTIONS_H__ + +#include +#include + +#include +#include +#include +#include +#include +#include + +namespace nvdsinferserver { + +enum class OptionType : int { + oBool = 0, + oDouble, + oInt, + oUint, + oString, + oObject, + oArray, + oNone = -1, +}; + +#define OPTION_SEQUENCE_ID "sequence_id" // uint64_t +#define OPTION_SEQUENCE_START "sequence_start" // bool +#define OPTION_SEQUENCE_END "sequence_end" // bool +#define OPTION_PRIORITY "priority" // uint64_t +#define OPTION_TIMEOUT "timeout_ms" // uint64_t +#define OPTION_NVDS_UNIQUE_ID "nvds_unique_id" // int64_t +#define OPTION_NVDS_SREAM_IDS "nvds_stream_ids" // source_id list, vector +#define OPTION_NVDS_FRAME_META_LIST "nvds_frame_meta_list" // vector +#define OPTION_NVDS_OBJ_META_LIST "nvds_obj_meta_list" // vector +#define OPTION_NVDS_BATCH_META "nvds_batch_meta" // NvDsBatchMeta* +#define OPTION_NVDS_GST_BUFFER "nvds_gst_buffer" // GstBuffer* +#define OPTION_NVDS_BUF_SURFACE "nvds_buf_surface" // NvBufSurface* +#define OPTION_NVDS_BUF_SURFACE_PARAMS_LIST "nvds_buf_surface_params_list" // vector +#define OPTION_TIMESTAMP "timestamp" // uint64_t timestamp nano seconds + +class IOptions { +public: + IOptions() = default; + virtual ~IOptions() = default; + virtual bool hasValue(const std::string& key) const = 0; + virtual OptionType getType(const std::string& name) const = 0; + virtual uint32_t getCount() const = 0; + virtual std::string getKey(uint32_t idx) const = 0; + +protected: + virtual NvDsInferStatus getValuePtr( + const std::string& name, OptionType t, void*& ptr) const = 0; + virtual NvDsInferStatus getArraySize(const std::string& key, uint32_t& size) const = 0; + virtual NvDsInferStatus getRawPtrArray( + const std::string& name, OptionType ot, void** ptrBase, uint32_t size) const = 0; + + template + struct OTypeV { + static constexpr OptionType v = V; + }; + template + struct oType; + +public: + NvDsInferStatus getDouble(const std::string& name, double& v) const + { + return getValue(name, v); + } + NvDsInferStatus getInt(const std::string& name, int64_t& v) const + { + return getValue(name, v); + } + NvDsInferStatus getUInt(const std::string& name, uint64_t& v) const + { + return getValue(name, v); + } + NvDsInferStatus getString(const std::string& name, std::string& v) + { + return getValue(name, v); + } + NvDsInferStatus getBool(const std::string& name, bool& v) const + { + return getValue(name, v); + } + template + NvDsInferStatus getObj(const std::string& name, Obj*& obj) const + { + return getValue(name, obj); + } + + template + NvDsInferStatus getValue(const std::string& name, Value& value) const + { + using ValueType = std::remove_const_t; + OptionType otype = oType::v; + void* ptr = nullptr; + auto status = getValuePtr(name, otype, ptr); + if (status == NVDSINFER_SUCCESS) { + assert(ptr); + value = *reinterpret_cast(ptr); + } 
+ return status; + } + + template + NvDsInferStatus getValueArray(const std::string& name, std::vector& values) const + { + using ValueType = std::remove_const_t; + OptionType otype = oType::v; + uint32_t size = 0; + auto status = getArraySize(name, size); + if (status != NVDSINFER_SUCCESS) { + return status; + } + std::vector valuePtrs(size); + void** ptrBase = reinterpret_cast(valuePtrs.data()); + values.resize(size); + status = getRawPtrArray(name, otype, ptrBase, size); + if (status == NVDSINFER_SUCCESS) { + values.resize(size); + for (uint32_t i = 0; i < size; ++i) { + values[i] = *valuePtrs[i]; + } + } + return status; + } +}; + +template +constexpr OptionType IOptions::OTypeV::v; + +template +struct IOptions::oType : IOptions::OTypeV { +}; +template <> +struct IOptions::oType : IOptions::OTypeV { +}; +template <> +struct IOptions::oType : IOptions::OTypeV { +}; +template <> +struct IOptions::oType : IOptions::OTypeV { +}; +template <> +struct IOptions::oType : IOptions::OTypeV { +}; +template <> +struct IOptions::oType : IOptions::OTypeV { +}; + +} // namespace nvdsinferserver + +#endif // __NVDSINFERSERVER_I_OPTIONS_H__ \ No newline at end of file diff --git a/isaac_ros_triton/gxf/triton/nvds/include/infer_options.h b/isaac_ros_triton/gxf/triton/nvds/include/infer_options.h new file mode 100644 index 0000000..7b26cbc --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/infer_options.h @@ -0,0 +1,258 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2019-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef __NVDSINFERSERVER_OPTIONS_H__ +#define __NVDSINFERSERVER_OPTIONS_H__ + +#include + +#include +#include +#include + +#ifdef FOR_PRIVATE +#include "infer_common.h" +#include "infer_utils.h" +#else +inline void +dsInferLogPrint__(NvDsInferLogLevel level, const char* fmt, ...) +{ + va_list args; + va_start(args, fmt); + vprintf(fmt, args); + va_end(args); +} +#define safeStr(str) str.c_str() + +#endif + +namespace nvdsinferserver { + +class BufOptions; +using SharedBufOptions = std::shared_ptr; + +class BufOptions : public IOptions { +private: + struct D { + struct BasicValue { + union { + int64_t vInt64; + uint64_t vUint64; + double vDouble; + bool vBool; + void* vPtr; + } value; + OptionType type = OptionType::oNone; + std::string vStr; + template + inline void setV(const V& v, OptionType t) + { + *((V*)(void*)&value) = v; + this->type = t; + } + } vHead; + std::vector vArray; + }; + +public: + OptionType getType(const std::string& key) const override + { + const auto i = m_Fields.find(key); + return (i == m_Fields.end() ? OptionType::oNone : i->second.vHead.type); + } + bool hasValue(const std::string& key) const override + { + const auto i = m_Fields.find(key); + return (i == m_Fields.end() ? 
false : true); + } + uint32_t getCount() const final { return (uint32_t)m_Fields.size(); } + std::string getKey(uint32_t idx) const final + { + assert(idx < m_Fields.size()); + auto i = m_Fields.cbegin(); + std::advance(i, idx); + return i->first; + } + +private: + NvDsInferStatus getValuePtr(const std::string& key, OptionType t, void*& ptr) const override + { + assert(t != OptionType::oNone && t != OptionType::oArray); + auto d = getValueD(key, t); + RETURN_IF_FAILED( + d, NVDSINFER_INVALID_PARAMS, "failed to get pointer value:%s", safeStr(key)); + if (t == OptionType::oString) { + ptr = (void*)&(d->vHead.vStr); + } else { + ptr = (void*)&(d->vHead.value); + } + return NVDSINFER_SUCCESS; + } + + NvDsInferStatus getArraySize(const std::string& key, uint32_t& size) const override + { + auto d = getValueD(key, OptionType::oArray); + RETURN_IF_FAILED(d, NVDSINFER_INVALID_PARAMS, "failed to get array value:%s", safeStr(key)); + size = d->vArray.size(); + return NVDSINFER_SUCCESS; + } + + NvDsInferStatus getRawPtrArray( + const std::string& key, OptionType ot, void** ptrBase, uint32_t size) const override + { + auto d = getValueD(key, OptionType::oArray); + RETURN_IF_FAILED( + d, NVDSINFER_INVALID_PARAMS, "failed to get pointer array value:%s", safeStr(key)); + assert(size <= d->vArray.size()); + for (uint32_t i = 0; i < size; ++i) { + auto& each = d->vArray[i]; + assert(each.type != OptionType::oArray && each.type != OptionType::oNone); + RETURN_IF_FAILED( + each.type == ot, NVDSINFER_INVALID_PARAMS, + "query value type:%d doesn't match exact type:%d in array.", (int)ot, + (int)each.type); + if (ot == OptionType::oString) { + ptrBase[i] = (void*)&(each.vStr); + } else { + ptrBase[i] = (void*)&(each.value); + } + } + return NVDSINFER_SUCCESS; + } + + template + struct convertType { + }; + +public: + template + inline void setValue(const std::string& key, const T& v) + { + using t = typename convertType>>::t; + auto& field = m_Fields[key]; + field.vHead.setV(v, oType::v); + } + + template + inline void setValueArray(const std::string& key, const std::vector& values) + { + if (values.empty()) { + return; + } + using t = typename convertType>>::t; + auto& field = m_Fields[key]; + field.vHead.type = OptionType::oArray; + field.vArray = std::vector(values.size()); + for (size_t i = 0; i < values.size(); ++i) { + auto& data = field.vArray[i]; + data.setV(t(values[i]), oType::v); + } + } + +private: + const D* getValueD(const std::string& key, OptionType t) const + { + const auto i = m_Fields.find(key); + if (i == m_Fields.end()) { + InferError("BufOptions: No option:%s found.", safeStr(key)); + return nullptr; + } + if (i->second.vHead.type != t) { + InferError( + "BufOptions: get option:%s but type is not matched.", + safeStr(key)); + return nullptr; + } + return &(i->second); + } + + std::unordered_map m_Fields; +}; + +template <> +inline void +BufOptions::D::BasicValue::setV(const std::string& v, OptionType t) +{ + this->vStr = v; + assert(t == OptionType::oString); + this->type = t; +} + +template +struct BufOptions::convertType { + typedef std::remove_const_t* t; +}; +template <> +struct BufOptions::convertType { + typedef int64_t t; +}; +template <> +struct BufOptions::convertType { + typedef int64_t t; +}; +template <> +struct BufOptions::convertType { + typedef int64_t t; +}; +template <> +struct BufOptions::convertType { + typedef int64_t t; +}; +template <> +struct BufOptions::convertType { + typedef uint64_t t; +}; +template <> +struct BufOptions::convertType { + typedef uint64_t t; +}; 
+template <> +struct BufOptions::convertType { + typedef uint64_t t; +}; +template <> +struct BufOptions::convertType { + typedef uint64_t t; +}; +template <> +struct BufOptions::convertType { + typedef double t; +}; +template <> +struct BufOptions::convertType { + typedef double t; +}; +template <> +struct BufOptions::convertType { + typedef bool t; +}; +template <> +struct BufOptions::convertType { + typedef std::string t; +}; + +template // not supported +struct BufOptions::convertType> { +}; + +template // not supported +struct BufOptions::convertType> { +}; + +} // namespace nvdsinferserver + +#endif //__NVDSINFERSERVER_OPTIONS_H__ diff --git a/isaac_ros_triton/gxf/triton/nvds/include/infer_post_datatypes.h b/isaac_ros_triton/gxf/triton/nvds/include/infer_post_datatypes.h new file mode 100644 index 0000000..e6bfd14 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/infer_post_datatypes.h @@ -0,0 +1,110 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2020-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef __INFER_POST_DATATYPES_H__ +#define __INFER_POST_DATATYPES_H__ + +#include +#include + +/** + * Holds the information about one detected object. + */ +typedef struct { + /** Offset from the left boundary of the frame. */ + float left; + /** Offset from the top boundary of the frame. */ + float top; + /** Object width. */ + float width; + /** Object height. */ + float height; + /* Index for the object class. */ + int classIndex; + /* String label for the detected object. */ + char* label; + /* confidence score of the detected object. */ + float confidence; +} NvDsInferObject; + +/** + * Holds the information on all objects detected by a detector network in one + * frame. + */ +typedef struct { + /** Array of objects. */ + NvDsInferObject* objects; + /** Number of objects in the array. */ + unsigned int numObjects; +} NvDsInferDetectionOutput; + +#ifdef __cplusplus +/** + * Holds the information on all attributes classifed by a classifier network for + * one frame. + */ + +struct InferAttribute : NvDsInferAttribute { + /* NvDsInferAttribute::attributeLabel would be ignored */ + std::string safeAttributeLabel; +}; + +typedef struct { + /** Array of attributes. Maybe more than one depending on the number of + * output coverage layers (multi-label classifiers) */ + std::vector attributes; + /** String label for the classified output. */ + std::string label; +} InferClassificationOutput; + +#endif + +/** + * Holds the information parsed from segmentation network output for + * one frame. + */ +typedef struct { + /** Width of the output. Same as network width. */ + unsigned int width; + /** Height of the output. Same as network height. */ + unsigned int height; + /** Number of classes supported by the network. */ + unsigned int classes; + /** Pointer to the array for 2D pixel class map. 
The output for pixel (x,y) + * will be at index (y * width + x). */ + int* class_map; + /** Pointer to the raw array containing the probabilities. The probability + * for class c and pixel (x,y) will be at index (c * width *height + y * + * width + x). */ + float* class_probability_map; +} NvDsInferSegmentationOutput; + +struct TritonClassParams { + uint32_t topK = 0; + float threshold = 0.0f; + std::string tensorName; +}; + +#define INFER_SERVER_PRIVATE_BUF "@@NvInferServer" + +#define INFER_SERVER_DETECTION_BUF_NAME INFER_SERVER_PRIVATE_BUF "Detections" +#define INFER_SERVER_CLASSIFICATION_BUF_NAME \ + INFER_SERVER_PRIVATE_BUF "Classfications" +#define INFER_SERVER_SEGMENTATION_BUF_NAME \ + INFER_SERVER_PRIVATE_BUF "Segmentations" + +#endif diff --git a/isaac_ros_triton/gxf/triton/nvds/include/nvdsinfer.h b/isaac_ros_triton/gxf/triton/nvds/include/nvdsinfer.h new file mode 100644 index 0000000..b06bd3d --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/nvdsinfer.h @@ -0,0 +1,309 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2017-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +/** + * @file + * NVIDIA DeepStream inference specifications + * + * @b Description: This file defines common elements used in the API + * exposed by the Gst-nvinfer plugin. + */ + +/** + * @defgroup ee_nvinf Gst-infer API Common Elements + * + * Defines common elements used in the API exposed by the Gst-inference plugin. + * @ingroup NvDsInferApi + * @{ + */ + +#ifndef _NVDSINFER_H_ +#define _NVDSINFER_H_ + +#include + +#ifdef __cplusplus +extern "C" +{ +#endif + +#define NVDSINFER_MAX_DIMS 8 + +#define _DS_DEPRECATED_(STR) __attribute__ ((deprecated (STR))) + +/** + * Holds the dimensions of a layer. + */ +typedef struct +{ + /** Holds the number of dimesions in the layer.*/ + unsigned int numDims; + /** Holds the size of the layer in each dimension. */ + unsigned int d[NVDSINFER_MAX_DIMS]; + /** Holds the number of elements in the layer, including all dimensions.*/ + unsigned int numElements; +} NvDsInferDims; + +/** + * Holds the dimensions of a three-dimensional layer. + */ +typedef struct +{ + /** Holds the channel count of the layer.*/ + unsigned int c; + /** Holds the height of the layer.*/ + unsigned int h; + /** Holds the width of the layer.*/ + unsigned int w; +} NvDsInferDimsCHW; + +/** + * Specifies the data type of a layer. + */ +typedef enum +{ + /** Specifies FP32 format. */ + FLOAT = 0, + /** Specifies FP16 format. */ + HALF = 1, + /** Specifies INT8 format. */ + INT8 = 2, + /** Specifies INT32 format. */ + INT32 = 3 +} NvDsInferDataType; + +/** + * Holds information about one layer in the model. + */ +typedef struct +{ + /** Holds the data type of the layer. */ + NvDsInferDataType dataType; + /** Holds the dimensions of the layer. */ + union { + NvDsInferDims inferDims; + NvDsInferDims dims _DS_DEPRECATED_("dims is deprecated. 
Use inferDims instead"); + }; + /** Holds the TensorRT binding index of the layer. */ + int bindingIndex; + /** Holds the name of the layer. */ + const char* layerName; + /** Holds a pointer to the buffer for the layer data. */ + void *buffer; + /** Holds a Boolean; true if the layer is an input layer, + or false if an output layer. */ + int isInput; +} NvDsInferLayerInfo; + +/** + * Holds information about the model network. + */ +typedef struct +{ + /** Holds the input width for the model. */ + unsigned int width; + /** Holds the input height for the model. */ + unsigned int height; + /** Holds the number of input channels for the model. */ + unsigned int channels; +} NvDsInferNetworkInfo; + +/** + * Sets values on a @ref NvDsInferDimsCHW structure from a @ref NvDsInferDims + * structure. + */ +#define getDimsCHWFromDims(dimsCHW,dims) \ + do { \ + (dimsCHW).c = (dims).d[0]; \ + (dimsCHW).h = (dims).d[1]; \ + (dimsCHW).w = (dims).d[2]; \ + } while (0) + +#define getDimsHWCFromDims(dimsCHW,dims) \ + do { \ + (dimsCHW).h = (dims).d[0]; \ + (dimsCHW).w = (dims).d[1]; \ + (dimsCHW).c = (dims).d[2]; \ + } while (0) + +/** + * Holds information about one parsed object from a detector's output. + */ +typedef struct +{ + /** Holds the ID of the class to which the object belongs. */ + unsigned int classId; + + /** Holds the horizontal offset of the bounding box shape for the object. */ + float left; + /** Holds the vertical offset of the object's bounding box. */ + float top; + /** Holds the width of the object's bounding box. */ + float width; + /** Holds the height of the object's bounding box. */ + float height; + + /** Holds the object detection confidence level; must in the range + [0.0,1.0]. */ + float detectionConfidence; +} NvDsInferObjectDetectionInfo; + +/** + * A typedef defined to maintain backward compatibility. + */ +typedef NvDsInferObjectDetectionInfo NvDsInferParseObjectInfo; + +/** + * Holds information about one parsed object and instance mask from a detector's output. + */ +typedef struct +{ + /** Holds the ID of the class to which the object belongs. */ + unsigned int classId; + + /** Holds the horizontal offset of the bounding box shape for the object. */ + float left; + /** Holds the vertical offset of the object's bounding box. */ + float top; + /** Holds the width of the object's bounding box. */ + float width; + /** Holds the height of the object's bounding box. */ + float height; + + /** Holds the object detection confidence level; must in the range + [0.0,1.0]. */ + float detectionConfidence; + + /** Holds object segment mask */ + float *mask; + /** Holds width of mask */ + unsigned int mask_width; + /** Holds height of mask */ + unsigned int mask_height; + /** Holds size of mask in bytes*/ + unsigned int mask_size; +} NvDsInferInstanceMaskInfo; + +/** + * Holds information about one classified attribute. + */ +typedef struct +{ + /** Holds the index of the attribute's label. This index corresponds to + the order of output layers specified in the @a outputCoverageLayerNames + vector during initialization. */ + unsigned int attributeIndex; + /** Holds the the attribute's output value. */ + unsigned int attributeValue; + /** Holds the attribute's confidence level. */ + float attributeConfidence; + /** Holds a pointer to a string containing the attribute's label. + Memory for the string must not be freed. Custom parsing functions must + allocate strings on heap using strdup or equivalent. 
*/ + char *attributeLabel; +} NvDsInferAttribute; + +/** + * Enum for the status codes returned by NvDsInferContext. + */ +typedef enum { + /** NvDsInferContext operation succeeded. */ + NVDSINFER_SUCCESS = 0, + /** Failed to configure the NvDsInferContext instance possibly due to an + * erroneous initialization property. */ + NVDSINFER_CONFIG_FAILED, + /** Custom Library interface implementation failed. */ + NVDSINFER_CUSTOM_LIB_FAILED, + /** Invalid parameters were supplied. */ + NVDSINFER_INVALID_PARAMS, + /** Output parsing failed. */ + NVDSINFER_OUTPUT_PARSING_FAILED, + /** CUDA error was encountered. */ + NVDSINFER_CUDA_ERROR, + /** TensorRT interface failed. */ + NVDSINFER_TENSORRT_ERROR, + /** Resource error was encountered. */ + NVDSINFER_RESOURCE_ERROR, + /** Triton error was encountered. Renamed TRT-IS to Triton. */ + NVDSINFER_TRITON_ERROR, + /** [deprecated]TRT-IS error was encountered */ + NVDSINFER_TRTIS_ERROR = NVDSINFER_TRITON_ERROR, + /** Unknown error was encountered. */ + NVDSINFER_UNKNOWN_ERROR +} NvDsInferStatus; + +/** + * Enum for the log levels of NvDsInferContext. + */ +typedef enum { + NVDSINFER_LOG_ERROR = 0, + NVDSINFER_LOG_WARNING, + NVDSINFER_LOG_INFO, + NVDSINFER_LOG_DEBUG, +} NvDsInferLogLevel; + +/** + * Get the string name for the status. + * + * @param[in] status An NvDsInferStatus value. + * @return String name for the status. Memory is owned by the function. Callers + * should not free the pointer. + */ +const char* NvDsInferStatus2Str(NvDsInferStatus status); + +#ifdef __cplusplus +} +#endif + +/* C++ data types */ +#ifdef __cplusplus + +/** + * Enum for selecting between minimum/optimal/maximum dimensions of a layer + * in case of dynamic shape network. + */ +typedef enum +{ + kSELECTOR_MIN = 0, + kSELECTOR_OPT, + kSELECTOR_MAX, + kSELECTOR_SIZE +} NvDsInferProfileSelector; + +/** + * Holds full dimensions (including batch size) for a layer. + */ +typedef struct +{ + int batchSize = 0; + NvDsInferDims dims = {0}; +} NvDsInferBatchDims; + +/** + * Extended structure for bound layer information which additionally includes + * min/optimal/max full dimensions of a layer in case of dynamic shape. + */ +struct NvDsInferBatchDimsLayerInfo : NvDsInferLayerInfo +{ + NvDsInferBatchDims profileDims[kSELECTOR_SIZE]; +}; + +#endif + +#endif + +/** @} */ diff --git a/isaac_ros_triton/gxf/triton/nvds/include/nvdsinferserver_common.proto b/isaac_ros_triton/gxf/triton/nvds/include/nvdsinferserver_common.proto new file mode 100644 index 0000000..529e460 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/nvdsinferserver_common.proto @@ -0,0 +1,255 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2020-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// +// SPDX-License-Identifier: Apache-2.0 + +syntax = "proto3"; +package nvdsinferserver.config; + +enum MediaFormat { + MEDIA_FORMAT_NONE = 0; + IMAGE_FORMAT_RGB = 1; + IMAGE_FORMAT_BGR = 2; + IMAGE_FORMAT_GRAY = 3; +} + +enum TensorOrder { + TENSOR_ORDER_NONE = 0; + TENSOR_ORDER_LINEAR = 1; + TENSOR_ORDER_NHWC = 2; +} + +enum TensorDataType { + TENSOR_DT_NONE = 0; + TENSOR_DT_FP32 = 1; + TENSOR_DT_FP16 = 2; + TENSOR_DT_INT8 = 3; + TENSOR_DT_INT16 = 4; + TENSOR_DT_INT32 = 5; + TENSOR_DT_UINT8 = 6; + TENSOR_DT_UINT16 = 7; + TENSOR_DT_UINT32 = 8; + TENSOR_DT_FP64 = 9; + TENSOR_DT_INT64 = 10; + TENSOR_DT_UINT64 = 11; + TENSOR_DT_STRING = 12; +} + +enum FrameScalingHW { + FRAME_SCALING_HW_DEFAULT = 0; + FRAME_SCALING_HW_GPU = 1; + FRAME_SCALING_HW_VIC = 2; +} + +/** Tensor memory type + */ +enum MemoryType { + MEMORY_TYPE_DEFAULT = 0; + MEMORY_TYPE_CPU = 1; + MEMORY_TYPE_GPU = 2; +} + +/** Custom lib for preload */ +message CustomLib { + /** Path point to the custom library */ + string path = 1; +} + +/** Network Input layer information */ +message InputLayer { + /** input tensor name, optional*/ + string name = 1; + /** fixed inference shape, only required when backend has wildcard shape */ + repeated int32 dims = 2; + /** tensor data type, optional. default TENSOR_DT_NONE */ + TensorDataType data_type = 3; +} + +/** Network Onput layer information */ +message OutputLayer { + /** output tensor name */ + string name = 1; + /** set max buffer bytes for output tensor */ + uint64 max_buffer_bytes = 2; +} + + +/** preprocessing settings */ +message PreProcessParams { + /** Input data normalization settings */ + message ScaleNormalize + { + /** Normalization factor to scale the input pixels with. */ + float scale_factor = 1; + /** Per channel offsets for mean subtraction. This is an alternative to + * the mean image file. The number of offsets in the array should be + * exactly equalto the number of input channels. + */ + repeated float channel_offsets = 2; + /** Path to the mean image file (PPM format). Resolution of the file + * should be equal to the network input resolution. + */ + string mean_file = 3; + } + /** Network input format */ + MediaFormat network_format = 1; + /** Network input tensor order */ + TensorOrder tensor_order = 2; + /** preprocessing data set to network tensor name */ + string tensor_name = 3; + /** Indicating if aspect ratio should be maintained when scaling to + * network resolution. Right/bottom areas will be filled with black areas. */ + int32 maintain_aspect_ratio = 4; + /** Compute hardware to use for scaling frames / objects. */ + FrameScalingHW frame_scaling_hw = 5; + /** Interpolation filter to use while scaling. Refer to + * NvBufSurfTransform_Inter for supported filter values. */ + uint32 frame_scaling_filter = 6; + /** Preprocessing methods */ + oneof preprocess_method { + /** usual scaling normalization for images */ + ScaleNormalize normalize = 7; + } + /** Indicating if symmetric padding should be used or not while scaling + * to network resolution. Bottom-right padding is used by default. */ + int32 symmetric_padding = 8; +} + +/** Deepstream Detection settings */ +message DetectionParams { + /** non-maximum-suppression cluster method */ + message Nms + { + /** detection score less this threshold would be rejected */ + float confidence_threshold = 1; + /** IOU threshold */ + float iou_threshold = 2; + /** top kth detection results to keep after nms. 
0), keep all */ + int32 topk = 3; + } + + /** DBScan object clustering */ + message DbScan { + /** Bounding box detection threshold. */ + float pre_threshold = 1; + // float post_threshold = 2; + /** Epsilon to control merging of overlapping boxes */ + float eps = 3; + /** Minimum boxes in a cluster to be considered an object */ + int32 min_boxes = 4; + /** Minimum score in a cluster for it to be considered as an object */ + float min_score = 5; + } + + /** cluster method based on grouping rectangles*/ + message GroupRectangle { + /** detection score less this threshold would be rejected */ + float confidence_threshold = 1; + /** how many bbox can be clustered together */ + int32 group_threshold = 2; + /** Epsilon to control merging of overlapping boxes */ + float eps = 3; + } + + /** simple cluster method for confidence filter */ + message SimpleCluster + { + /** detection score less this threshold would be rejected */ + float threshold = 1; + } + + /** specific parameters controled per class*/ + message PerClassParams { + /** pre-threshold used for filter out confidence less than the value */ + float pre_threshold = 1; + } + + /** Number of classes detected by a detector network. */ + int32 num_detected_classes = 1; + /** Per class detection parameters. key-value is for + * */ + map per_class_params = 2; + /** Name of the custom bounding box function in the custom library. */ + string custom_parse_bbox_func = 3; + + /** cluster methods for bbox, choose one only */ + oneof clustering_policy { + /** non-maximum-suppression, reserved, not supported yet */ + Nms nms = 4; + /** DbScan clustering parameters */ + DbScan dbscan = 5; + /** grouping rectagules */ + GroupRectangle group_rectangle = 6; + /** simple threshold filter */ + SimpleCluster simple_cluster = 7; + } +} + +/** Deepstream Classifciation settings */ +message ClassificationParams { + /** classifciation threshold */ + float threshold = 1; + /** custom function for classification parsing */ + string custom_parse_classifier_func = 2; +} + +/** Deepstream segmentation settings */ +message SegmentationParams { + /** reserved field */ + float threshold = 1; +} + +/** Other Network settings, need application to do postprocessing */ +message OtherNetworkParams { + /** reserved field */ + string type_name = 1; +} + +/** Triton classifcation settings */ +message TritonClassifyParams +{ + /** top k classification results */ + uint32 topk = 1; + /** classifciation threshold */ + float threshold = 2; + /** [optional] specify which output tensor is used for triton classification.*/ + string tensor_name = 3; +} + +/** Network LSTM Parameters */ +message LstmParams { + /** init constant value for lstm input tensors, usually zero or one */ + message InitConst { + /** const value */ + float value = 1; + } + /** LSTM loop information */ + message LstmLoop { + /** input tensor name */ + string input = 1; + /** output tensor name */ + string output = 2; + /** initialize input tensor for first frame */ + oneof init_state { + /** init const value, default is zero */ + InitConst init_const = 3; + } + /** enable if need keep lstm output tensor data for application output + * parsing, it's disabled by default */ + bool keep_output = 4; + } + repeated LstmLoop loops = 1; +} + diff --git a/isaac_ros_triton/gxf/triton/nvds/include/nvdsinferserver_config.proto b/isaac_ros_triton/gxf/triton/nvds/include/nvdsinferserver_config.proto new file mode 100644 index 0000000..d666901 --- /dev/null +++ 
b/isaac_ros_triton/gxf/triton/nvds/include/nvdsinferserver_config.proto @@ -0,0 +1,216 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2019-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +syntax = "proto3"; + +import "nvdsinferserver_common.proto"; + +package nvdsinferserver.config; + +/** Post-processing settings */ +message PostProcessParams { + /** label file path. It relative to config file path if value is not + * absoluate path + */ + string labelfile_path = 1; + + /** post-process can only have one of the following types*/ + oneof process_type + { + /** deepstream detection parameters */ + DetectionParams detection = 2; + /** deepstream classification parameters */ + ClassificationParams classification = 3; + /** deepstream segmentation parameters */ + SegmentationParams segmentation = 4; + /** deepstream other postprocessing parameters */ + OtherNetworkParams other = 5; + /** [deprecated] TRT-IS classification parameters */ + TritonClassifyParams trtis_classification = 6; + /** Triton classification parameters, replacing trtis_classification */ + TritonClassifyParams triton_classification = 7; + } +} + +/** Triton models repo settings */ +message TritonModelRepo +{ + /** Triton backend config settings */ + message BackendConfig { + /** backend name */ + string backend = 1; + /** backend setting name */ + string setting = 2; + /** backend setting value */ + string value = 3; + } + + /** Cuda Memory settings for GPU device */ + message CudaDeviceMem { + /** GPU device id */ + uint32 device = 1; + /** Cuda Memory Pool byte size */ + uint64 memory_pool_byte_size = 2; + } + + /** root directory for all models + * All models should set same @a root value */ + repeated string root = 1; + /** log verbose level, the larger the more logs output + * (0): ERROR; + * (1): WARNING; + * (2): INFO + * (3+): VERBOSE Level + */ + uint32 log_level = 2; + + /** enable strict model config + * true: config.pbtxt must exsit. + * false: Triton try to figure model's config file, it may cause failure on + * different input/output dims. + */ + bool strict_model_config = 3; + /** tensorflow gpu memory fraction, default 0.0 */ + float tf_gpu_memory_fraction = 4; + /** tensorflow soft placement, allowed by default */ + bool tf_disable_soft_placement = 5; + /** minimum compute capacity, + * dGPU: default 6.0; Jetson: default 5.3. + */ + float min_compute_capacity = 6; + /** triton backends directory */ + string backend_dir = 7; + /** triton model control mode, select from + * "none": load all models in 'root' repo at startup once. + * "explicit": load/unload models by 'TritonParams'. + * If value is empty, will use "explicit" by default. + */ + string model_control_mode = 8; + + /** Triton server reserved cuda memory size for each device. + * If device not added, will use Triton runtime's default memory size 256MB. 
+ * If \a cuda_memory_pool_byte_size is set 0, plugin will not reserve cuda + * memory on that device. + */ + repeated CudaDeviceMem cuda_device_memory = 9; + /** Triton server reserved pinned memory size during initialization */ + oneof pinned_mem { + uint64 pinned_memory_pool_byte_size = 10; + } + + repeated BackendConfig backend_configs = 11; +} + +message TritonGrpcParams { + string url = 1; +} + +/** Triton inference backend parameters */ +message TritonParams { + /** trt-is model name */ + string model_name = 1; + /** model version, -1 is for latest version, required */ + int64 version = 2; + /** Triton classifications, optional */ + repeated TritonClassifyParams class_params = 3; + + oneof server { + /** trt-is server model repo, all models must have same @a model_repo */ + TritonModelRepo model_repo = 4; + TritonGrpcParams grpc = 5; + } +} + +/** Network backend Settings */ +message BackendParams { + /** input tensors settings, optional */ + repeated InputLayer inputs = 1; + /** outputs tensor settings, optional */ + repeated OutputLayer outputs = 2; + + /** inference framework */ + oneof infer_framework + { + /** [deprecated] TRT-IS inference framework. Use triton instead of trt_is */ + TritonParams trt_is = 3; + /** Triton inference framework */ + TritonParams triton = 4; + } + + /** Output tensor memory type. + * Default: MEMORY_TYPE_DEFAULT, it is Triton preferred memory type. + */ + MemoryType output_mem_type = 5; +} + +/** extrac controls */ +message ExtraControl { + /** enable if need copy input tensor data for application output parsing, + * it's disabled by default */ + bool copy_input_to_host_buffers = 1; + /** defined how many buffers allocated for output tensors in the pool. + * Optional, default is 2, the value can be in range [2, 10+] */ + int32 output_buffer_pool_size = 2; + /** custom function to create a specific processor IInferCustomProcessor. + * e.g. custom_process_funcion: CreateCustomProcessor */ + string custom_process_funcion = 3; +} + +/** Input tensor is preprocessed */ +message InputTensorFromMeta +{ + /** first dims is not a batch-size*/ + bool is_first_dim_batch = 1; +} + +/** Inference configuration */ +message InferenceConfig { + /** unique id, larger than 0, required for multiple models inference */ + uint32 unique_id = 1; + /** gpu id settings. Optional. support single gpu only at this time + * default values [0] */ + repeated int32 gpu_ids = 2; + /** max batch size. Required, can be reset by plugin */ + uint32 max_batch_size = 3; + /** inference backend parameters. 
required */ + BackendParams backend = 4; + /** preprocessing for tensors, required */ + oneof preprocessing { + PreProcessParams preprocess = 5; + InputTensorFromMeta input_tensor_from_meta = 10; + } + /** postprocessing for all tensor data, required */ + oneof postprocessing { + PostProcessParams postprocess = 6; + } + /** Custom libs for tensor output parsing or preload, optional */ + CustomLib custom_lib = 7; + + /** advanced controls as optional */ + oneof advanced + { + /** extrac controls */ + ExtraControl extra = 8; + } + + /** LSTM controller */ + oneof lstm_control { + /** LSTM parameters */ + LstmParams lstm = 9; + } +} + diff --git a/isaac_ros_triton/gxf/triton/nvds/include/nvdsinferserver_plugin.proto b/isaac_ros_triton/gxf/triton/nvds/include/nvdsinferserver_plugin.proto new file mode 100644 index 0000000..7da0d81 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/include/nvdsinferserver_plugin.proto @@ -0,0 +1,137 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2019-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +syntax = "proto3"; +package nvdsinferserver.config; + +import "nvdsinferserver_config.proto"; + +/** Plugin Control settings for input / inference / output */ +message PluginControl { + + /** Color values for Red/Green/Blue/Alpha, all values are in range [0, 1] */ + message Color { + /** Red color value */ + float r = 1; + /** Green color value */ + float g = 2; + /** Blue color value */ + float b = 3; + /** Alpha color value */ + float a = 4; + } + + /** Boudingbox filter */ + message BBoxFilter { + /** Boudingbox minimum width */ + uint32 min_width = 1; + /** Boudingbox minimum height */ + uint32 min_height = 2; + /** Boudingbox maximum width */ + uint32 max_width = 3; + /** Boudingbox maximum height */ + uint32 max_height = 4; + } + + /** Detection of classes filter */ + message DetectClassFilter { + /** Detection Bounding box filter */ + BBoxFilter bbox_filter = 1; + /** Offset of the RoI from the top of the frame. Only objects within the + * RoI are output */ + uint32 roi_top_offset = 2; + /** Offset of the RoI from the bottom of the frame. 
Only objects within the + * RoI are output */ + uint32 roi_bottom_offset = 3; + + /** Specify border color for detection bounding boxes */ + Color border_color = 4; + /** Specify background color for detection bounding boxes */ + Color bg_color = 5; + } + + /** Output detection results control */ + message OutputDetectionControl { + /** Default detection classes filter */ + DetectClassFilter default_filter = 1; + /** specifies detection filters per class instead of default filter */ + map specific_class_filters = 2; + } + + /** Input objects control */ + message InputObjectControl { + /** Input bounding box of objects filter */ + BBoxFilter bbox_filter = 1; + } + + /** Processing Mode */ + enum ProcessMode { + /** Processing Default Mode */ + PROCESS_MODE_DEFAULT = 0; + /** Processing Full Frame Mode */ + PROCESS_MODE_FULL_FRAME = 1; + /** Processing Object Clipping Mode */ + PROCESS_MODE_CLIP_OBJECTS = 2; + } + + /** Plugin input data control policy */ + message InputControl { + /** Processing mode setting, optional */ + ProcessMode process_mode = 1; + /** Unique ID of the GIE on whose metadata (bounding boxes) this GIE is to + * operate on. It is used for secondary GIE only. */ + int32 operate_on_gie_id = 2; + /** Class IDs of the parent GIE on which this GIE is to operate on. + * It is used for secondary GIE only. */ + repeated int32 operate_on_class_ids = 3; + /** Specifies the number of consecutive, batches to be skipped for + * inference. Default is 0. */ + uint32 interval = 4; + /** Enables inference on detected objects and asynchronous metadata + * attachments. Works only when tracker-id is valid. It's used for + * classifier with secondary GIE only. */ + bool async_mode = 5; + + /** Input object filter policy */ + oneof object_filter { + /** input object control settings */ + InputObjectControl object_control = 6; + } + } + + /** Plugin output data control policy */ + message OutputControl { + /* Enable attaching inference output tensor metadata */ + bool output_tensor_meta = 1; + /* Postprocessing control policy */ + oneof postprocess_control { + /* Detection results filter */ + OutputDetectionControl detect_control = 2; + } + oneof ClassifierMeta { + /* Classifier type of a particular nvinferserver component. 
*/ + string classifier_type = 3; + } + } + + /** Low-level libnvds_infer_server inference configuration settings */ + InferenceConfig infer_config =1; + /** Control plugin input buffers, object filter before inference */ + InputControl input_control = 2; + /** Control plugin output meta data after inference */ + OutputControl output_control = 3; +} diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvbuf_fdmap.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvbuf_fdmap.so new file mode 100755 index 0000000..e66376b --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvbuf_fdmap.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8fbdc0e54bdcb7266aacbe2cb1a3ea8f2f8a60f45feccc94afeffa05130e54c2 +size 13144 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvbufsurface.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvbufsurface.so new file mode 100755 index 0000000..4871bdb --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvbufsurface.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9f02d6cf91551377c2388011b8af31b0d659315df7727b3bc4413faaaf4070ee +size 688648 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvbufsurftransform.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvbufsurftransform.so new file mode 100755 index 0000000..ee7457b --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvbufsurftransform.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2dba4b0fa4f0d2a22ddab5b19fe34e7ebbd566c636631840d6ea3878ce2b86d9 +size 23005768 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvds_infer_server.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvds_infer_server.so new file mode 100755 index 0000000..38d475e --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvds_infer_server.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82c8eb2b7e666f63015f4235edc819d709fed850e2cdb30ee9910eb87c780285 +size 7722944 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvds_inferlogger.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvds_inferlogger.so new file mode 100755 index 0000000..f77e8e1 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvds_inferlogger.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c6e74f0c2f8aab1fff93b2998a4a300daf92799077b0a1404ec41bd0d6ee377a +size 14312 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvds_inferutils.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvds_inferutils.so new file mode 100755 index 0000000..541077d --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_jetpack502/libnvds_inferutils.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de706e13e20882b7a3df32a6ad2c14817c418c27c57f54904f3124f5f2b21721 +size 100768 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvbuf_fdmap.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvbuf_fdmap.so new file mode 100755 index 0000000..40f2267 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvbuf_fdmap.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67699773596e1b5273478e5d1dad6570643ae0fb4fc3f9a8a2bdf73af015ed93 +size 23560 diff --git 
a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvbufsurface.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvbufsurface.so new file mode 100755 index 0000000..cef46dc --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvbufsurface.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f50f7409918603b0b1728d38f4ace3839580b00b292937595a437e973540d0e1 +size 35160 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvbufsurftransform.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvbufsurftransform.so new file mode 100755 index 0000000..f00c6b4 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvbufsurftransform.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a1d583f66519a0a148ff10f9a33f1eaf56b246fbee1d55a77f5b9efde71f4dcc +size 88543584 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvds_infer_server.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvds_infer_server.so new file mode 100644 index 0000000..69a0a28 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvds_infer_server.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3070aa1751ebc74922015d1aa76313000dcb225f2047c049f6903c1424934c37 +size 15256528 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvds_inferlogger.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvds_inferlogger.so new file mode 100755 index 0000000..6c59752 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvds_inferlogger.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ceeb966a8c4ca53357535e696c1e77d0ce7cdea7f92467e3db700275e8f2ba1 +size 22928 diff --git a/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvds_inferutils.so b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvds_inferutils.so new file mode 100755 index 0000000..d3714f3 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/nvds/lib/gxf_x86_64_cuda_11_8/libnvds_inferutils.so @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f06cbf5f6a099f8b9a0fc0f6a9f563a4b63f1cb739a64b47e555059308e1602 +size 150552 diff --git a/isaac_ros_triton/gxf/triton/triton_ext.cpp b/isaac_ros_triton/gxf/triton/triton_ext.cpp new file mode 100644 index 0000000..6530a1d --- /dev/null +++ b/isaac_ros_triton/gxf/triton/triton_ext.cpp @@ -0,0 +1,70 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// +// SPDX-License-Identifier: Apache-2.0 + +#include "gxf/core/component.hpp" +#include "gxf/std/codelet.hpp" +#include "gxf/std/extension_factory_helper.hpp" +#include "gxf/std/scheduling_term.hpp" + +#include "inferencers/triton_inferencer_impl.hpp" +#include "inferencers/triton_inferencer_interface.hpp" + +#include "triton_inference_request.hpp" +#include "triton_inference_response.hpp" +#include "triton_options.hpp" +#include "triton_scheduling_terms.hpp" +#include "triton_server.hpp" + + +GXF_EXT_FACTORY_BEGIN() + GXF_EXT_FACTORY_SET_INFO(0xa3c95d1cc06c4a4e, 0xa2f98d9078ab645c, + "NvTritonExt", + "Nvidia Triton Inferencing and Utilities Extension: 2.26.0 (x86_64), " + "2.30.0 (L4T - Jetpack 5.1)", + "NVIDIA", + "0.1.0", "LICENSE"); + GXF_EXT_FACTORY_ADD(0x26228984ffc44162, 0x9af56e3008aa2982, + nvidia::triton::TritonServer, + nvidia::gxf::Component, + "Triton Server Component for Direct Inference."); + GXF_EXT_FACTORY_ADD(0x1661c0156b1c422d, 0xa6f0248cdc197b1a, + nvidia::triton::TritonInferencerInterface, + nvidia::gxf::Component, + "Triton Inferencer Interface where specific Direct, Remote " + "or IPC inferencers can implement."); + GXF_EXT_FACTORY_ADD(0xb84cf267b2234df5, 0xac82752d9fae1014, + nvidia::triton::TritonInferencerImpl, + nvidia::triton::TritonInferencerInterface, + "Triton Inferencer that uses the Triton C API. Requires " + "Triton Server Component."); + GXF_EXT_FACTORY_ADD(0x34395920232c446f, 0xb5b746f642ce84df, + nvidia::triton::TritonInferenceRequest, + nvidia::gxf::Codelet, + "Triton Inference Request Codelet that wraps Triton Implementation."); + GXF_EXT_FACTORY_ADD(0x4dd957a7aa554117, 0x90d39a98e31ee176, + nvidia::triton::TritonInferenceResponse, + nvidia::gxf::Codelet, + "Triton Inference Response Codelet that wraps Triton Implementation."); + GXF_EXT_FACTORY_ADD_0(0x087696ed229d4199, 0x876f05b92d3887f0, + nvidia::triton::TritonOptions, + "Triton Inference Options for model control and sequence control."); + GXF_EXT_FACTORY_ADD(0xf860241212424e43, 0x9dbf9c559d496b84, + nvidia::triton::TritonRequestReceptiveSchedulingTerm, + nvidia::gxf::SchedulingTerm, + "Triton Scheduling Term that schedules Request Codelet when the inferencer " + "can accept a new request."); +GXF_EXT_FACTORY_END() diff --git a/isaac_ros_triton/gxf/triton/triton_inference_request.cpp b/isaac_ros_triton/gxf/triton/triton_inference_request.cpp new file mode 100644 index 0000000..62e4670 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/triton_inference_request.cpp @@ -0,0 +1,102 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// +// SPDX-License-Identifier: Apache-2.0 + +#include +#include +#include + +#include "gxf/std/tensor.hpp" +#include "gxf/std/timestamp.hpp" + +#include "triton_inference_request.hpp" +#include "triton_options.hpp" + +namespace nvidia { +namespace triton { + +gxf_result_t TritonInferenceRequest::start() { + auto result = inferencer_.get()->construct(); + if (input_tensor_names_.get().size() == 0) { + GXF_LOG_ERROR("At least one input tensor is needed."); + return GXF_FAILURE; + } + if (input_tensor_names_.get().size() != input_binding_names_.get().size()) { + GXF_LOG_ERROR("Mismatching number of input tensor names and bindings: %lu vs %lu", + input_tensor_names_.get().size(), input_binding_names_.get().size()); + return GXF_FAILURE; + } + if (input_tensor_names_.get().size() != rx_.get().size()) { + GXF_LOG_ERROR("Mismatching number of input tensor names and receivers: %lu vs %lu", + input_tensor_names_.get().size(), rx_.get().size()); + return GXF_FAILURE; + } + if (rx_.get().size() == 0) { + GXF_LOG_ERROR("At least one receiver is needed."); + return GXF_FAILURE; + } + return result; +} + +gxf_result_t TritonInferenceRequest::tick() { + // Create a new entity that will serve as a tensor map for the model inputs + auto inputs_tensor_map = nvidia::gxf::Entity::New(context()); + if (!inputs_tensor_map) { + return nvidia::gxf::ToResultCode(inputs_tensor_map); + } + + auto& receivers = rx_.get(); + auto& binding_names = input_binding_names_.get(); + auto& tensor_names = input_tensor_names_.get(); + + std::vector input_entities; + std::vector input_names; + + auto maybe_output_timestamp = inputs_tensor_map.value().add("timestamp"); + if (!maybe_output_timestamp) { + return nvidia::gxf::ToResultCode(maybe_output_timestamp); + } + + for (size_t input_index = 0; input_index < receivers.size(); input_index++) { + auto& receiver = receivers[input_index]; + auto maybe_message = receiver->receive(); + if (!maybe_message) { + return nvidia::gxf::ToResultCode(maybe_message); + } + + // ensure entity includes tensor as input + auto& tensor_name = tensor_names[input_index]; + auto maybe_tensor_incoming = maybe_message.value().get( + tensor_name.c_str()); + if (!maybe_tensor_incoming) { + GXF_LOG_ERROR("Unable to find Tensor with name '%s'", tensor_name.c_str()); + return nvidia::gxf::ToResultCode(maybe_tensor_incoming); + } + + input_entities.push_back(maybe_message.value()); + input_names.push_back(binding_names[input_index]); + } + + return inferencer_.get()->inferAsync(input_entities, input_names); +} + +gxf_result_t TritonInferenceRequest::stop() { + auto result = inferencer_.get()->destruct(); + return result; +} + +} // namespace triton +} // namespace nvidia diff --git a/isaac_ros_triton/gxf/triton/triton_inference_request.hpp b/isaac_ros_triton/gxf/triton/triton_inference_request.hpp new file mode 100644 index 0000000..356e746 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/triton_inference_request.hpp @@ -0,0 +1,102 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef NVIDIA_TRITON_TRITON_INFERENCE_REQUEST_HPP +#define NVIDIA_TRITON_TRITON_INFERENCE_REQUEST_HPP + +#include +#include + +#include "gxf/core/entity.hpp" +#include "gxf/core/expected.hpp" +#include "gxf/core/handle.hpp" +#include "gxf/std/codelet.hpp" +#include "gxf/std/parameter_parser_std.hpp" +#include "gxf/std/receiver.hpp" + +#include "inferencers/triton_inferencer_interface.hpp" + +namespace nvidia { +namespace triton { + +/** + * @brief Triton Inference Request that wraps generic TritonInferencer implementation. + * + * @details The Entity which holds this Codelet must also have TritonModelInput(s). + * + */ +class TritonInferenceRequest : public nvidia::gxf::Codelet { + public: + /** + * @brief Register Parameters + * + * @param registrar + * @return gxf_result_t + */ + gxf_result_t registerInterface(nvidia::gxf::Registrar* registrar) override { + nvidia::gxf::Expected result; + + result &= registrar->parameter(inferencer_, "inferencer", + "Inferencer Implementation", + "TritonInferenceInterface Inferencer Implementation Handle"); + result &= registrar->parameter(rx_, "rx", "Receivers", + "List of receivers to take input tensors"); + result &= registrar->parameter(input_tensor_names_, "input_tensor_names", "Input Tensor Names", + "Names of input tensors that exist in the ordered receivers in 'rx'."); + result &= registrar->parameter(input_binding_names_, "input_binding_names", + "Input Triton Binding Names", + "Names of input bindings corresponding to Triton's Config Inputs in the same order of " + "what is provided in 'input_tensor_names'."); + + return nvidia::gxf::ToResultCode(result); + } + + /** + * @brief Prepare TritonInferencerInterface + * + * @return gxf_result_t + */ + gxf_result_t start() override; + + /** + * @brief Receive tensors of TritonModelInput(s), create Tensor Map, submit + * inference request asynchronously. + * + * @return gxf_result_t + */ + gxf_result_t tick() override; + + /** + * @brief Destroys inferencer of type TritonInferencerInterface + * + * @return gxf_result_t + */ + gxf_result_t stop() override; + + private: + nvidia::gxf::Parameter> + inferencer_; + + nvidia::gxf::Parameter> input_tensor_names_; + nvidia::gxf::Parameter> input_binding_names_; + gxf::Parameter>> rx_; +}; + +} // namespace triton +} // namespace nvidia + +#endif diff --git a/isaac_ros_triton/gxf/triton/triton_inference_response.cpp b/isaac_ros_triton/gxf/triton/triton_inference_response.cpp new file mode 100644 index 0000000..2342fe6 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/triton_inference_response.cpp @@ -0,0 +1,135 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#include + +#include "gxf/std/tensor.hpp" +#include "gxf/std/timestamp.hpp" + +#include "triton_inference_response.hpp" +#include "triton_options.hpp" + +namespace nvidia { +namespace triton { + +gxf_result_t TritonInferenceResponse::start() { + if (!inferencer_.get()) { + GXF_LOG_ERROR("Inferencer unavailable"); + return GXF_FAILURE; + } + if (output_tensor_names_.get().size() == 0) { + GXF_LOG_ERROR("At least one output tensor is needed."); + return GXF_FAILURE; + } + if (output_tensor_names_.get().size() != output_binding_names_.get().size()) { + GXF_LOG_ERROR("Mismatching number of output tensor names and bindings: %lu vs %lu", + output_tensor_names_.get().size(), output_binding_names_.get().size()); + return GXF_FAILURE; + } + return GXF_SUCCESS; +} + +gxf_result_t TritonInferenceResponse::tick() { + // This getResponse() call is expected to be a blocking statement. + auto maybe_response = inferencer_.get()->getResponse(); + if (!maybe_response) { + return nvidia::gxf::ToResultCode(maybe_response); + } + + // Create a new entity for model output which will hold a Tensor Map + auto maybe_output_tensor_map = nvidia::gxf::Entity::New(context()); + if (!maybe_output_tensor_map) { + return nvidia::gxf::ToResultCode(maybe_output_tensor_map); + } + + auto& bindings = output_binding_names_.get(); + auto& tensors = output_tensor_names_.get(); + + // Implementation will return a tensor map with Triton Bindings. We need to translate that to the + // expected GXF Tensor names. + for (size_t output_index = 0; output_index < bindings.size(); output_index++) { + auto& tensor_name = tensors[output_index]; + auto maybe_tensor = maybe_output_tensor_map.value().add( + tensor_name.c_str()); + if (!maybe_tensor) { + return nvidia::gxf::ToResultCode(maybe_tensor); + } + + auto& triton_binding = bindings[output_index]; + auto maybe_response_tensor = maybe_response.value().get( + triton_binding.c_str()); + if (!maybe_response_tensor) { + GXF_LOG_ERROR("Unable to find tensor response for binding '%s'", triton_binding.c_str()); + return nvidia::gxf::ToResultCode(maybe_response_tensor); + } + + // Move incoming response tensor to the tensor that will be transmitted. There is no better way + // of redistributing tensors to varying sources with copy. + *(maybe_tensor.value().get()) = std::move(*(maybe_response_tensor.value().get())); + + // For String data, we need to publish nvidia::gxf::Shape so the serialized data can be + // interpreted correctly. + auto maybe_response_tensor_shape = maybe_response.value().get( + triton_binding.c_str()); + if (maybe_response_tensor_shape) { + auto maybe_shape = maybe_output_tensor_map.value().add( + tensor_name.c_str()); + if (!maybe_shape) { + return nvidia::gxf::ToResultCode(maybe_shape); + } + + *(maybe_shape.value().get()) = std::move(*(maybe_response_tensor_shape.value().get())); + } + } + + // Forward Triton Options so that consumer understands sequence id, end of sequence, etc. 
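+  // Illustrative sketch, assuming a downstream consumer holds the received entity
+  // as `message` (a hypothetical variable name for this example only):
+  //   auto maybe_options = message.get<nvidia::triton::TritonOptions>();
+  //   if (maybe_options) { /* inspect sequence id / end-of-sequence state via maybe_options.value() */ }
+  // The TritonOptions fields themselves are declared in triton_options.hpp.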
+ auto maybe_input_triton_option = maybe_response.value().get(); + if (maybe_input_triton_option) { + auto maybe_output_triton_option = + maybe_output_tensor_map.value().add(); + if (!maybe_output_triton_option) { + return nvidia::gxf::ToResultCode(maybe_output_triton_option); + } + // Move incoming TritonOption from receiver to the outgoing tensor. + // NOTE: This modifies the incoming Entity tensor component via the move. + *(maybe_output_triton_option.value().get()) = + std::move(*(maybe_input_triton_option.value().get())); + } + + nvidia::gxf::Expected result = nvidia::gxf::Unexpected{GXF_FAILURE}; + + auto maybe_timestamp = maybe_response.value().get("timestamp"); + if (!maybe_timestamp) { + result = tx_.get()->publish(maybe_output_tensor_map.value()); + } else { + result = tx_.get()->publish(maybe_output_tensor_map.value(), maybe_timestamp.value()->acqtime); + } + + if (!result) { + GXF_LOG_ERROR("Error when transmitting message output tensor map"); + return nvidia::gxf::ToResultCode(result); + } + + return GXF_SUCCESS; +} + +gxf_result_t TritonInferenceResponse::stop() { + return GXF_SUCCESS; +} + +} // namespace triton +} // namespace nvidia diff --git a/isaac_ros_triton/gxf/triton/triton_inference_response.hpp b/isaac_ros_triton/gxf/triton/triton_inference_response.hpp new file mode 100644 index 0000000..70568f0 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/triton_inference_response.hpp @@ -0,0 +1,100 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef NVIDIA_TRITON_TRITON_INFERENCE_RESPONSE_HPP +#define NVIDIA_TRITON_TRITON_INFERENCE_RESPONSE_HPP + +#include +#include + +#include "gxf/core/entity.hpp" +#include "gxf/core/expected.hpp" +#include "gxf/core/handle.hpp" +#include "gxf/std/codelet.hpp" +#include "gxf/std/transmitter.hpp" + +#include "inferencers/triton_inferencer_interface.hpp" + +namespace nvidia { +namespace triton { + +/** + * @brief Triton Inference Response that wraps generic TritonInferencer implementation. + * + * @details The Entity which holds this Codelet must also have TritonModelOutput(s). + * + */ +class TritonInferenceResponse : public nvidia::gxf::Codelet { + public: + /** + * @brief Register Parameters. 
+ * + * @param registrar + * @return gxf_result_t + */ + gxf_result_t registerInterface(nvidia::gxf::Registrar* registrar) override { + nvidia::gxf::Expected result; + + result &= registrar->parameter(inferencer_, "inferencer", + "Inferencer Implementation", + "TritonInferenceInterface Inferencer Implementation Handle"); + result &= registrar->parameter(output_tensor_names_, "output_tensor_names", + "Output Tensor Names", + "Names of output tensors in the order to be retrieved from the model."); + result &= registrar->parameter(output_binding_names_, "output_binding_names", + "Output Binding Names", + "Names of output bindings in the model in the same " + "order of of what is provided in output_tensor_names."); + result &= registrar->parameter(tx_, "tx", "TX", "Transmitter to publish output tensors"); + return nvidia::gxf::ToResultCode(result); + } + + /** + * @brief Return success. + * + * @return gxf_result_t + */ + gxf_result_t start() override; + + /** + * @brief Gets Response from Inferencer and transmits output tensors respectively + * to TritonModelOutput(s) Transmitters + * + * @return gxf_result_t + */ + gxf_result_t tick() override; + + /** + * @brief Return success. + * + * @return gxf_result_t + */ + gxf_result_t stop() override; + + private: + nvidia::gxf::Parameter> + inferencer_; + + nvidia::gxf::Parameter> tx_; + nvidia::gxf::Parameter> output_tensor_names_; + nvidia::gxf::Parameter> output_binding_names_; +}; + +} // namespace triton +} // namespace nvidia + +#endif diff --git a/isaac_ros_triton/gxf/triton/triton_scheduling_terms.cpp b/isaac_ros_triton/gxf/triton/triton_scheduling_terms.cpp new file mode 100644 index 0000000..3380fb2 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/triton_scheduling_terms.cpp @@ -0,0 +1,49 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#include "triton_scheduling_terms.hpp" + +namespace nvidia { +namespace triton { + +gxf_result_t TritonRequestReceptiveSchedulingTerm::initialize() { + if (!inferencer_.get()) { + GXF_LOG_ERROR("Inferencer unavailable"); + return GXF_FAILURE; + } + return GXF_SUCCESS; +} + +gxf_result_t TritonRequestReceptiveSchedulingTerm::check_abi(int64_t timestamp, + nvidia::gxf::SchedulingConditionType* type, int64_t* target_timestamp) const { + auto maybe_is_accepting_request = inferencer_.get()->isAcceptingRequest(); + if (!maybe_is_accepting_request) { + GXF_LOG_ERROR("Inference isAcceptingRequest had unexpected return"); + return GXF_FAILURE; + } + const auto& is_accepting_request = maybe_is_accepting_request.value(); + *type = is_accepting_request ? 
nvidia::gxf::SchedulingConditionType::READY : + nvidia::gxf::SchedulingConditionType::WAIT; + return GXF_SUCCESS; +} + +gxf_result_t TritonRequestReceptiveSchedulingTerm::onExecute_abi(int64_t dt) { + return GXF_SUCCESS; +} + +} // namespace triton +} // namespace nvidia diff --git a/isaac_ros_triton/gxf/triton/triton_scheduling_terms.hpp b/isaac_ros_triton/gxf/triton/triton_scheduling_terms.hpp new file mode 100644 index 0000000..87f2c19 --- /dev/null +++ b/isaac_ros_triton/gxf/triton/triton_scheduling_terms.hpp @@ -0,0 +1,90 @@ +// SPDX-FileCopyrightText: NVIDIA CORPORATION & AFFILIATES +// Copyright (c) 2021-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef NVIDIA_TRITON_TRITON_SCHEDULING_TERMS_HPP +#define NVIDIA_TRITON_TRITON_SCHEDULING_TERMS_HPP + +#include +#include +#include + +#include "gxf/core/handle.hpp" +#include "gxf/std/scheduling_term.hpp" + +#include "inferencers/triton_inferencer_interface.hpp" + +namespace nvidia { +namespace triton { + +/** + * @brief Scheduling term which permits execution only when the Triton Request Component is able to + * accept new requests. + * + */ +class TritonRequestReceptiveSchedulingTerm : public nvidia::gxf::SchedulingTerm { + public: + /** + * @brief Register Parameters. 
+ * + * @param registrar + * @return gxf_result_t + */ + gxf_result_t registerInterface(nvidia::gxf::Registrar* registrar) override { + nvidia::gxf::Expected result; + + result &= registrar->parameter(inferencer_, "inferencer", + "Inferencer Implementation", + "TritonInferenceInterface Inferencer Implementation Handle"); + return nvidia::gxf::ToResultCode(result); + } + + /** + * @brief Returns success + * + * @return gxf_result_t + */ + gxf_result_t initialize() override; + + /** + * @brief Only when an inferencer can accept new inference request, SchedulingCondition is + * Ready + * + * @param[in] timestamp Unused + * @param[out] type Ready (if inferencer can accept new inference request) or Wait (otherwise) + * @param[out] target_timestamp Unmodified + * @return gxf_result_t + */ + gxf_result_t check_abi(int64_t timestamp, nvidia::gxf::SchedulingConditionType* type, + int64_t* target_timestamp) const override; + + /** + * @brief Returns success + * + * @param dt Unused + * @return gxf_result_t + */ + gxf_result_t onExecute_abi(int64_t dt) override; + + private: + nvidia::gxf::Parameter> + inferencer_; +}; + +} // namespace triton +} // namespace nvidia + +#endif diff --git a/isaac_ros_triton/package.xml b/isaac_ros_triton/package.xml index efa77fb..c9deb3f 100644 --- a/isaac_ros_triton/package.xml +++ b/isaac_ros_triton/package.xml @@ -21,7 +21,7 @@ isaac_ros_triton - 0.20.0 + 0.30.0 DNN Inference support for Isaac ROS CY Chen @@ -37,6 +37,8 @@ isaac_ros_nitros isaac_ros_nitros_tensor_list_type + isaac_ros_common + ament_lint_auto ament_lint_common isaac_ros_test diff --git a/isaac_ros_triton/src/triton_node.cpp b/isaac_ros_triton/src/triton_node.cpp index 418b263..7986d75 100644 --- a/isaac_ros_triton/src/triton_node.cpp +++ b/isaac_ros_triton/src/triton_node.cpp @@ -50,10 +50,10 @@ constexpr char APP_YAML_FILENAME[] = "config/triton_node.yaml"; constexpr char PACKAGE_NAME[] = "isaac_ros_triton"; const std::vector> EXTENSIONS = { - {"isaac_ros_nitros", "gxf/std/libgxf_std.so"}, - {"isaac_ros_nitros", "gxf/cuda/libgxf_cuda.so"}, - {"isaac_ros_nitros", "gxf/serialization/libgxf_serialization.so"}, - {"isaac_ros_nitros", "gxf/triton/libgxf_triton_ext.so"} + {"isaac_ros_gxf", "gxf/lib/std/libgxf_std.so"}, + {"isaac_ros_gxf", "gxf/lib/cuda/libgxf_cuda.so"}, + {"isaac_ros_gxf", "gxf/lib/serialization/libgxf_serialization.so"}, + {"isaac_ros_triton", "gxf/triton/libgxf_triton_ext.so"} }; const std::vector PRESET_EXTENSION_SPEC_NAMES = { "isaac_ros_triton", diff --git a/isaac_ros_triton/test/isaac_ros_triton_test_onnx.py b/isaac_ros_triton/test/isaac_ros_triton_test_onnx.py index 3ba3273..e5dfdca 100644 --- a/isaac_ros_triton/test/isaac_ros_triton_test_onnx.py +++ b/isaac_ros_triton/test/isaac_ros_triton_test_onnx.py @@ -31,7 +31,7 @@ @pytest.mark.rostest def generate_test_description(): - """Generate launch description with all Triton ROS2 nodes for testing.""" + """Generate launch description with all Triton ROS 2 nodes for testing.""" # Loads and runs mobilenetv2-1.0 dir_path = os.path.dirname(os.path.realpath(__file__)) model_dir = dir_path + '/../../test/models' diff --git a/isaac_ros_triton/test/isaac_ros_triton_test_tf.py b/isaac_ros_triton/test/isaac_ros_triton_test_tf.py index 69c0c2e..648a259 100644 --- a/isaac_ros_triton/test/isaac_ros_triton_test_tf.py +++ b/isaac_ros_triton/test/isaac_ros_triton_test_tf.py @@ -32,7 +32,7 @@ @pytest.mark.rostest def generate_test_description(): - """Generate launch description with all Triton ROS2 nodes for testing.""" + """Generate launch 
description with all Triton ROS 2 nodes for testing.""" # Loads and runs a simple tensorflow model dir_path = os.path.dirname(os.path.realpath(__file__)) model_dir = dir_path + '/../../test/models' @@ -45,6 +45,7 @@ def generate_test_description(): parameters=[{ 'model_name': 'simple_triton_tf', 'model_repository_paths': [model_dir], + 'max_batch_size': 8, 'input_binding_names': ['input_1'], 'output_binding_names': ['add'], 'input_tensor_names': ['input'], diff --git a/resources/pipeline.png b/resources/graph.png similarity index 100% rename from resources/pipeline.png rename to resources/graph.png diff --git a/resources/isaac_ros_dnn_inference_nodegraph.png b/resources/isaac_ros_dnn_inference_nodegraph.png new file mode 100644 index 0000000..c5b3c7e --- /dev/null +++ b/resources/isaac_ros_dnn_inference_nodegraph.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7bee2d49ba0e9273bf7f537da16c9d9aefb859ba50d7ebc8d5b7d4b0daeab514 +size 27461 diff --git a/resources/isaac_ros_dnn_inference_peoplesemsegnet.jpg b/resources/isaac_ros_dnn_inference_peoplesemsegnet.jpg new file mode 100644 index 0000000..1c100ca --- /dev/null +++ b/resources/isaac_ros_dnn_inference_peoplesemsegnet.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f98cae81768f159d5be32c2fcef740752d50d9f4c8b07cad85b6b4145b5a796e +size 183818 diff --git a/resources/isaac_ros_dnn_peoplenet.jpg b/resources/isaac_ros_dnn_peoplenet.jpg new file mode 100644 index 0000000..44ba004 --- /dev/null +++ b/resources/isaac_ros_dnn_peoplenet.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:79f5751685d79660fe755a18e9e785dfb0e97785c3532336a5c03784f7a6e68c +size 505228