This repo contains deep learning inference nodes and camera/video streaming nodes for ROS/ROS2 with support for Jetson Nano/TX1/TX2/Xavier NX/AGX Xavier and TensorRT.
The nodes use the image recognition, object detection, and semantic segmentation DNNs from the jetson-inference library and the NVIDIA Hello AI World tutorial, which come with several built-in pretrained networks for classification, detection, and segmentation, and can also load customized user-trained models.
The camera/video streaming nodes support the following input/output interfaces:
- MIPI CSI cameras
- V4L2 cameras
- RTP / RTSP
- Videos & Images
- Image sequences
- OpenGL windows
ROS Melodic and ROS2 Eloquent are supported, and the latest version of JetPack is recommended.
First, install the latest version of JetPack on your Jetson.
Then, follow the steps below to install the needed components on your Jetson.
These ROS nodes use the DNN objects from the jetson-inference project (aka Hello AI World). To build and install jetson-inference, see this page or run the commands below:
$ cd ~
$ sudo apt-get install git cmake
$ git clone --recursive https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ mkdir build
$ cd build
$ cmake ../
$ make -j$(nproc)
$ sudo make install
$ sudo ldconfig
Before proceeding, it's worthwhile to test that jetson-inference is working properly on your system by following this step of the Hello AI World tutorial:
Install the ros-melodic-ros-base or ros-eloquent-ros-base package on your Jetson following these directions:
- ROS Melodic - ROS Install Instructions
- ROS2 Eloquent - ROS2 Install Instructions
Depending on which version of ROS you're using, install some additional dependencies and create a workspace:
# ROS Melodic
$ sudo apt-get install ros-melodic-image-transport ros-melodic-vision-msgs
For ROS Melodic, create a Catkin workspace (~/ros_workspace) using these steps:
http://wiki.ros.org/ROS/Tutorials/InstallingandConfiguringROSEnvironment#Create_a_ROS_Workspace
# ROS2 Eloquent
$ sudo apt-get install ros-eloquent-vision-msgs \
    ros-eloquent-launch-xml \
    ros-eloquent-launch-yaml \
    python3-colcon-common-extensions
For ROS2 Eloquent, create a workspace (~/ros_workspace) to use:
$ mkdir -p ~/ros_workspace/src
Next, navigate into your ROS workspace's src directory and clone ros_deep_learning:
$ cd ~/ros_workspace/src
$ git clone https://github.com/dusty-nv/ros_deep_learning
Then build it. If you are using ROS Melodic, use catkin_make; if you are using ROS2 Eloquent, use colcon build:
$ cd ~/ros_workspace/
# ROS Melodic
$ catkin_make
$ source devel/setup.bash
# ROS2 Eloquent
$ colcon build
$ source install/local_setup.bash
The nodes should now be built and ready to use. Remember to source the overlay as shown above so that ROS can find the nodes.
Before proceeding, if you're using ROS Melodic, make sure that roscore is running first:
$ roscore
If you're using ROS2, running the core service is no longer required.
First, it's recommended to test that you can stream a video feed using the video_source and video_output nodes. See Camera Streaming & Multimedia for valid input/output streams, and substitute your desired input and output arguments below. For example, you can use video files for the input or output, or use V4L2 cameras instead of MIPI CSI cameras. You can also use RTP/RTSP streams over the network.
# ROS Melodic
$ roslaunch ros_deep_learning video_viewer.ros1.launch input:=csi://0 output:=display://0
# ROS2 Eloquent
$ ros2 launch ros_deep_learning video_viewer.ros2.launch input:=csi://0 output:=display://0
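The ROS Melodic and ROS2 Eloquent commands differ only in the launcher and the launch-file suffix. As a rough sketch of that pattern, the helper below assembles the argument list for either version. `launch_command` is a hypothetical name introduced here for illustration and is not part of ros_deep_learning; it only builds the command and does not run it.

```python
def launch_command(demo, ros_version, input_uri="csi://0", output_uri="display://0"):
    """Return the argv list for one of the demo launch files in this README.

    demo        -- launch file prefix, e.g. "video_viewer" or "imagenet"
    ros_version -- 1 for ROS Melodic, 2 for ROS2 Eloquent
    """
    if ros_version == 1:
        prefix = ["roslaunch", "ros_deep_learning", f"{demo}.ros1.launch"]
    elif ros_version == 2:
        prefix = ["ros2", "launch", "ros_deep_learning", f"{demo}.ros2.launch"]
    else:
        raise ValueError(f"unsupported ROS version: {ros_version}")
    # Launch arguments use the name:=value form shown in the commands above.
    return prefix + [f"input:={input_uri}", f"output:={output_uri}"]


# Print the two video_viewer commands from this section.
print(" ".join(launch_command("video_viewer", 1)))
print(" ".join(launch_command("video_viewer", 2)))
```

The returned list could be handed to `subprocess.run` once the workspace overlay has been sourced.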
You can launch a classification demo with the following commands. Substitute your desired camera or video path for the input argument below (see here for valid input/output streams).
Note that the imagenet node also publishes classification metadata on the imagenet/classification topic in a vision_msgs/Classification2D message; see the Topics & Parameters section below for more info.
# ROS Melodic
$ roslaunch ros_deep_learning imagenet.ros1.launch input:=csi://0 output:=display://0
# ROS2 Eloquent
$ ros2 launch ros_deep_learning imagenet.ros2.launch input:=csi://0 output:=display://0
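A subscriber to the classification topic will usually want the most confident result. The sketch below shows that step in isolation, assuming the message exposes a `results` list whose entries carry `id` and `score` fields (my understanding of the vision_msgs hypothesis layout; check it against your installed version). The message here is a stand-in built with `SimpleNamespace`, so the code runs without ROS.

```python
from types import SimpleNamespace


def top_hypothesis(msg):
    """Return (class_id, score) of the most confident entry in msg.results, or None if empty."""
    if not msg.results:
        return None
    best = max(msg.results, key=lambda hyp: hyp.score)
    return best.id, best.score


# Stand-in for a received message (assumed field names: results, id, score).
fake_msg = SimpleNamespace(results=[
    SimpleNamespace(id=3, score=0.12),
    SimpleNamespace(id=7, score=0.81),
])
print(top_hypothesis(fake_msg))
```

In a real node, `top_hypothesis` would be called from the subscriber callback for the classification topic.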
To launch an object detection demo, substitute your desired camera or video path for the input argument below (see here for valid input/output streams). Note that the detectnet node also publishes the detection metadata in a vision_msgs/Detection2DArray message; see the Topics & Parameters section below for more info.
# ROS Melodic
$ roslaunch ros_deep_learning detectnet.ros1.launch input:=csi://0 output:=display://0
# ROS2 Eloquent
$ ros2 launch ros_deep_learning detectnet.ros2.launch input:=csi://0 output:=display://0
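Consumers of the detection results often need corner coordinates rather than a center and size. The following sketch assumes each detection has a `bbox` with `center.x`, `center.y`, `size_x`, and `size_y` fields, which is how I understand the vision_msgs bounding box to be laid out; verify against your installed message definitions. The detection is simulated so the code runs without ROS.

```python
from types import SimpleNamespace


def bbox_corners(detection):
    """Convert a center/size bounding box to (left, top, right, bottom) in pixels."""
    box = detection.bbox
    half_w = box.size_x / 2.0
    half_h = box.size_y / 2.0
    return (
        box.center.x - half_w,
        box.center.y - half_h,
        box.center.x + half_w,
        box.center.y + half_h,
    )


# Stand-in for one entry of a Detection2DArray (assumed field names).
fake_detection = SimpleNamespace(
    bbox=SimpleNamespace(
        center=SimpleNamespace(x=320.0, y=240.0, theta=0.0),
        size_x=100.0,
        size_y=50.0,
    )
)
print(bbox_corners(fake_detection))
```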
To launch a semantic segmentation demo, substitute your desired camera or video path for the input argument below (see here for valid input/output streams). Note that the segnet node also publishes raw segmentation results to the segnet/class_mask topic; see the Topics & Parameters section below for more info.
# ROS Melodic
$ roslaunch ros_deep_learning segnet.ros1.launch input:=csi://0 output:=display://0
# ROS2 Eloquent
$ ros2 launch ros_deep_learning segnet.ros2.launch input:=csi://0 output:=display://0
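Because the class mask is an 8-bit single-channel image in which each pixel holds a class ID, per-class pixel counts can be computed directly from the image buffer. The sketch below works on a plain `bytes` object plus `width`, `height`, and `step` (bytes per row), which mirror the fields of sensor_msgs/Image. `class_pixel_counts` is a hypothetical helper, not something the node provides.

```python
from collections import Counter


def class_pixel_counts(data, width, height, step=None):
    """Count pixels per class ID in an 8-bit single-channel image buffer.

    step is the number of bytes per row; it defaults to width (no row padding).
    """
    step = width if step is None else step
    counts = Counter()
    for row in range(height):
        start = row * step
        # Iterating over bytes yields integers, i.e. the class IDs themselves.
        counts.update(data[start:start + width])
    return dict(counts)


# A tiny 4x2 mask used as a stand-in for a received class_mask image.
mask = bytes([0, 0, 1, 1,
              0, 2, 2, 1])
print(class_pixel_counts(mask, width=4, height=2))
```

Dividing each count by `width * height` gives the fraction of the frame covered by each class.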
Below are the message topics and parameters that each node implements.

Topics for the imagenet node:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |
| classification | Output | vision_msgs/Classification2D | Classification results (class ID + confidence) |
| vision_info | Output | vision_msgs/VisionInfo | Vision metadata (class labels parameter list name) |
| overlay | Output | sensor_msgs/Image | Input image overlaid with the classification results |

Parameters for the imagenet node:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "googlenet" | Built-in model name (see here for valid values) |
| model_path | string | "" | Path to custom caffe or ONNX model |
| prototxt_path | string | "" | Path to custom caffe prototxt file |
| input_blob | string | "data" | Name of DNN input layer |
| output_blob | string | "prob" | Name of DNN output layer |
| class_labels_path | string | "" | Path to custom class labels file |
| class_labels_HASH | vector<string> | class names | List of class labels, where HASH is model-specific (actual name of parameter is found via the vision_info topic) |
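
Since the name of the class labels parameter is model-specific, a client first reads it from vision_info and then fetches the list. The sketch below illustrates that two-step lookup under the assumption that the parameter name arrives in the `database_location` field of the VisionInfo message; the parameter name and the dictionary standing in for the parameter server are hypothetical.

```python
from types import SimpleNamespace


def label_for(class_id, vision_info, params):
    """Map a class ID to its label using the parameter named by vision_info."""
    # Assumption: database_location holds the name of the class labels parameter.
    labels = params[vision_info.database_location]
    return labels[int(class_id)]


# Stand-ins so the sketch runs without ROS; the parameter name is made up.
fake_info = SimpleNamespace(database_location="/imagenet/class_labels_0123456789")
fake_params = {
    "/imagenet/class_labels_0123456789": ["tench", "goldfish", "great white shark"],
}
print(label_for(1, fake_info, fake_params))
```

With a live system, `params` would be replaced by a call to the parameter API of the ROS version in use.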

Topics for the detectnet node:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |
| detections | Output | vision_msgs/Detection2DArray | Detection results (bounding boxes, class IDs, confidences) |
| vision_info | Output | vision_msgs/VisionInfo | Vision metadata (class labels parameter list name) |
| overlay | Output | sensor_msgs/Image | Input image overlaid with the detection results |

Parameters for the detectnet node:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "ssd-mobilenet-v2" | Built-in model name (see here for valid values) |
| model_path | string | "" | Path to custom caffe or ONNX model |
| prototxt_path | string | "" | Path to custom caffe prototxt file |
| input_blob | string | "data" | Name of DNN input layer |
| output_cvg | string | "coverage" | Name of DNN output layer (coverage/scores) |
| output_bbox | string | "bboxes" | Name of DNN output layer (bounding boxes) |
| class_labels_path | string | "" | Path to custom class labels file |
| class_labels_HASH | vector<string> | class names | List of class labels, where HASH is model-specific (actual name of parameter is found via the vision_info topic) |
| overlay_flags | string | "box,labels,conf" | Flags used to generate the overlay (some combination of none, box, labels, conf) |
| mean_pixel_value | float | 0.0 | Mean pixel subtraction value to be applied to input (normally 0) |
| threshold | float | 0.5 | Minimum confidence value for positive detections (0.0 - 1.0) |
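
The overlay_flags value is a comma-separated string, so it can be worth checking it before passing it to the node. The sketch below is one way to do that, using only the four flag names listed in the table; `split_overlay_flags` is a hypothetical helper, and the node's own parsing may differ.

```python
VALID_OVERLAY_FLAGS = ("none", "box", "labels", "conf")


def split_overlay_flags(value):
    """Split a comma-separated overlay_flags string into (known, unknown) lists."""
    tokens = [token.strip().lower() for token in value.split(",") if token.strip()]
    known = [t for t in tokens if t in VALID_OVERLAY_FLAGS]
    unknown = [t for t in tokens if t not in VALID_OVERLAY_FLAGS]
    return known, unknown


# The default value from the table, followed by a string containing a typo.
print(split_overlay_flags("box,labels,conf"))
print(split_overlay_flags("box, lables"))
```

A non-empty second list signals a flag name that is not among those documented above.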

Topics for the segnet node:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |
| vision_info | Output | vision_msgs/VisionInfo | Vision metadata (class labels parameter list name) |
| overlay | Output | sensor_msgs/Image | Input image overlaid with the segmentation results |
| color_mask | Output | sensor_msgs/Image | Colorized segmentation class mask |
| class_mask | Output | sensor_msgs/Image | 8-bit single-channel image where each pixel is a class ID |

Parameters for the segnet node:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "fcn-resnet18-cityscapes-1024x512" | Built-in model name (see here for valid values) |
| model_path | string | "" | Path to custom caffe or ONNX model |
| prototxt_path | string | "" | Path to custom caffe prototxt file |
| input_blob | string | "data" | Name of DNN input layer |
| output_blob | string | "score_fr_21classes" | Name of DNN output layer |
| class_colors_path | string | "" | Path to custom class colors file |
| class_labels_path | string | "" | Path to custom class labels file |
| class_labels_HASH | vector<string> | class names | List of class labels, where HASH is model-specific (actual name of parameter is found via the vision_info topic) |
| mask_filter | string | "linear" | Filtering to apply to color_mask topic (linear or point) |
| overlay_filter | string | "linear" | Filtering to apply to overlay topic (linear or point) |
| overlay_alpha | float | 180.0 | Alpha blending value used by overlay topic (0.0 - 255.0) |
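
To give a feel for what an overlay_alpha on a 0.0 to 255.0 scale means, the sketch below applies ordinary linear alpha compositing to a single pixel. This is the textbook formula, shown as an assumption about how the value behaves; the source does not confirm that the node computes its overlay exactly this way.

```python
def blend_pixel(image_rgb, mask_rgb, overlay_alpha=180.0):
    """Blend one mask pixel over one image pixel with alpha on a 0-255 scale."""
    # Clamp to the documented range, then normalize to 0.0-1.0.
    a = max(0.0, min(255.0, overlay_alpha)) / 255.0
    return tuple(round(a * m + (1.0 - a) * i) for i, m in zip(image_rgb, mask_rgb))


# Black image pixel, white mask pixel, default alpha from the table.
print(blend_pixel((0, 0, 0), (255, 255, 255)))
```

Under this formula an alpha of 0.0 leaves the input image unchanged and 255.0 shows only the mask color.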

Topics for the video_source node:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| raw | Output | sensor_msgs/Image | Raw output image (BGR8) |

Parameters for the video_source node:

| Parameter | Type | Default | Description |
|---|---|---|---|
| resource | string | "csi://0" | Input stream URI (see here for valid protocols) |
| codec | string | "" | Manually specify codec for compressed streams (see here for valid values) |
| width | int | 0 | Manually specify desired width of stream (0 = stream default) |
| height | int | 0 | Manually specify desired height of stream (0 = stream default) |
| framerate | int | 0 | Manually specify desired framerate of stream (0 = stream default) |
| loop | int | 0 | For video files: 0 = don't loop, >0 = # of loops, -1 = loop forever |
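
The resource parameter is a URI whose scheme selects the kind of stream. As an illustration, the sketch below maps a scheme to the interface names listed at the top of this README. Only `csi://` and `display://` appear in this document; the other scheme names in the table (`v4l2`, `rtp`, `rtsp`, `file`) are assumptions on my part and should be checked against the Camera Streaming & Multimedia page.

```python
from urllib.parse import urlsplit

# Scheme -> interface. Entries other than csi and display are assumed, not
# taken from this README.
STREAM_KINDS = {
    "csi": "MIPI CSI camera",
    "v4l2": "V4L2 camera",
    "rtp": "RTP stream",
    "rtsp": "RTSP stream",
    "file": "video or image file",
    "display": "OpenGL window",
}


def stream_kind(resource):
    """Return a description of the stream type for a resource URI, or None if unrecognized."""
    scheme = urlsplit(resource).scheme.lower()
    return STREAM_KINDS.get(scheme)


for uri in ("csi://0", "display://0"):
    print(uri, "->", stream_kind(uri))
```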

Topics for the video_output node:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |

Parameters for the video_output node:

| Parameter | Type | Default | Description |
|---|---|---|---|
| resource | string | "display://0" | Output stream URI (see here for valid protocols) |
| codec | string | "h264" | Codec used for compressed streams (see here for valid values) |
| bitrate | int | 4000000 | Target VBR bitrate of encoded streams (in bits per second) |
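
Because bitrate is given in bits per second, a quick calculation gives a rough idea of how much data an encoded stream produces. The helper below is a hypothetical back-of-the-envelope estimate; since the bitrate is a VBR target, actual sizes will vary around this figure.

```python
def encoded_size_bytes(duration_s, bitrate_bps=4_000_000):
    """Estimate the encoded size in bytes for a stream of the given duration."""
    # bits per second * seconds = bits; divide by 8 for bytes.
    return duration_s * bitrate_bps / 8


# One minute at the default target bitrate from the table.
print(encoded_size_bytes(60) / 1_000_000, "MB")
```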