Issue with using GL-based processors #9334
Comments
Hi @dgrnbrg The GLSL system does not provide a noticeable performance advantage when used on low-end computing devices. This is described in the documentation at the link below, which advises on the pros and cons of GLSL and when to use it. It is therefore unlikely to be worthwhile to continue debugging it on your Raspberry Pi. It can be less processing-intensive to use the instruction RS2_PROJECT_COLOR_PIXEL_TO_DEPTH_PIXEL to convert a single pixel in the color frame to a depth pixel, instead of aligning the entire image.
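As a rough illustration of the single-pixel approach, a minimal C++ sketch is shown below. The pipeline setup, the 0.1 m to 10 m search range, and the helper name `color_to_depth_pixel` are illustrative assumptions rather than code from this issue.

```cpp
// Rough sketch: map one color pixel to its corresponding depth pixel instead
// of aligning whole frames. Pipeline setup and the search range are assumed.
#include <cstdint>
#include <librealsense2/rs.hpp>
#include <librealsense2/rsutil.h>

float color_to_depth_pixel(rs2::pipeline& pipe, float color_x, float color_y)
{
    rs2::frameset frames = pipe.wait_for_frames();
    rs2::depth_frame depth = frames.get_depth_frame();
    rs2::video_frame color = frames.get_color_frame();

    auto depth_profile = depth.get_profile().as<rs2::video_stream_profile>();
    auto color_profile = color.get_profile().as<rs2::video_stream_profile>();

    rs2_intrinsics depth_intrin = depth_profile.get_intrinsics();
    rs2_intrinsics color_intrin = color_profile.get_intrinsics();
    rs2_extrinsics color_to_depth = color_profile.get_extrinsics_to(depth_profile);
    rs2_extrinsics depth_to_color = depth_profile.get_extrinsics_to(color_profile);

    float depth_scale = pipe.get_active_profile().get_device()
                            .first<rs2::depth_sensor>().get_depth_scale();

    const float from_pixel[2] = { color_x, color_y };
    float to_pixel[2] = { 0.f, 0.f };

    rs2_project_color_pixel_to_depth_pixel(
        to_pixel,
        static_cast<const uint16_t*>(depth.get_data()),
        depth_scale,
        0.1f, 10.0f, // assumed min/max search depth in metres
        &depth_intrin, &color_intrin,
        &color_to_depth, &depth_to_color,
        from_pixel);

    // Return the distance (in metres) at the matched depth pixel.
    return depth.get_distance(static_cast<int>(to_pixel[0]),
                              static_cast<int>(to_pixel[1]));
}
```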
Hi Marty, thank you for that advice about GLSL on low-end devices. For the individual pixel projection, would that be useful if I need the entire aligned pair of images, or is it specifically useful when I may only need specific depths? The software package I'm feeding this to downstream requires aligned depth and color images. Alternatively, a few other ideas/questions:
I know this is a lot of questions, but I am highly motivated to get the RealSense usable on a Raspberry Pi for real-time localization, using a nearby computer via wireless for the heavier computations. I just want to get the Pi to be able to stream sufficient info to accomplish that :)
If you are aiming to transfer the camera data to another computer to perform the heavy computation, Intel's open-source networking system may be an appropriate solution, as it is based around using a Pi 4 as the remote computing device that the camera is attached to and a more powerful computer, such as a laptop, as the central host machine. The paper that describes the networking system in the above link is based on Ethernet cabling, but it states that the system could also be used over a Wi-Fi connection. Bear in mind, though, that there will be some limitations in supported resolution / FPS modes over a networking connection compared to accessing a camera directly. The release notes for the recent RealSense SDK version 2.48.0 also describe an example program for using GLSL with data from a networked remote camera.
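To make the networking suggestion concrete, a minimal client-side sketch is shown below. The IP address is a placeholder, and the build flag is an assumption about how the networking extension is enabled; treat this as a sketch rather than the exact setup from the paper.

```cpp
// Minimal client-side sketch, assuming librealsense was built with
// -DBUILD_NETWORK_DEVICE=ON and rs-server is running on the Raspberry Pi.
#include <librealsense2/rs.hpp>
#include <librealsense2-net/rs_net.hpp>

int main()
{
    rs2::context ctx;
    rs2::net_device remote("192.168.0.100"); // placeholder IP of the Pi
    remote.add_to(ctx);                      // expose it like a local device

    rs2::pipeline pipe(ctx);
    pipe.start();

    while (true)
    {
        rs2::frameset frames = pipe.wait_for_frames();
        // ... process depth/color frames on the host machine
    }
    return 0;
}
```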
Using the networking approach would be really good, except that I am also using a couple of cores on the Raspberry Pi to do some local processing as well, for when the Wi-Fi is spotty. Can I use the networking system and simultaneously use the camera frames locally, or is it an either/or situation?
A chart in the paper shows the CPU utilization across all cores for a selection of different configurations. It estimates around 50% utilization when running at VGA resolution (640x480) at 30 FPS, whilst using a lower resolution would reduce utilization.
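For illustration, explicitly requesting the VGA profile through the SDK's C++ API could look roughly like this; the chosen stream formats are assumptions.

```cpp
// Sketch: request 640x480 depth and color at 30 FPS to keep CPU load down.
#include <librealsense2/rs.hpp>

int main()
{
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_DEPTH, 640, 480, RS2_FORMAT_Z16, 30);
    cfg.enable_stream(RS2_STREAM_COLOR, 640, 480, RS2_FORMAT_RGB8, 30);

    rs2::pipeline pipe;
    pipe.start(cfg);

    while (true)
    {
        rs2::frameset frames = pipe.wait_for_frames();
        // ... align / publish the frames
    }
    return 0;
}
```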
Thank you, trying VGA seems like a better choice. I'd also like to know whether I can use
I'm not certain about this question. My expectation with a non-networked application would normally be that if a particular stream type is accessed by a process ('claimed') then another process could not access the same stream. This is in accordance with the rules of the SDK's multi-streaming model described in the link below. A practical example of these principles is that if you enable the depth stream in the RealSense Viewer, then another application that was launched afterwards and requested the depth stream could not access it. The reverse is also true: if the Viewer was launched second, it could not access the depth stream if another application was already using it. But if you had two cameras, a stream claimed on one camera would still be available on the other, because it is a specific stream on a specific camera that is claimed when that stream is enabled.
In that case, I don't think that I'll be able to use the networking system. Given that, what about the following approaches to improve the performance:
If none of those are straightforward, do you know of another small SBC with documented success running RTabMap and object detection/tracking (for instance, the Up2, Jetson Nano, or a particular NUC)? My other concern, if I change platforms, is being able to purchase it (thanks, chippageddon) and being able to fit it onto the robot chassis (i.e. it must be very small).
If you need the entire image to be aligned then using alignment instead of converting a single pixel with RS2_PROJECT_COLOR_PIXEL_TO_DEPTH_PIXEL would be the appropriate approach. Alignment is a processing-intensive operation, so unless you have access to graphics acceleration (other than GLSL) it may be difficult to avoid experiencing slowdown on your Pi when aligning.

If you were able to replace the Raspberry Pi with an Nvidia Jetson Nano board then you would have access to the librealsense SDK's support for CUDA acceleration of alignment, pointclouds and color conversion, thanks to the Nvidia GPU chip on Jetson boards (CUDA is an Nvidia-only feature). A Nano board is affordable in price and small in size at 70 x 45 mm, and Jetson boards are especially suited to vision computing and AI applications.

In regard to scaling down depth resolution, depth scene complexity can be reduced with post-processing by using a Decimation Filter. Post-processing takes place on the computing hardware instead of in the camera hardware, so there can be a CPU % usage cost to doing so. https://dev.intelrealsense.com/docs/post-processing-filters#section-decimation-filter

I do not have knowledge about achieving enhancement using the Neon architecture of the Pi's Arm CPU. The OpenVINO Toolkit vision computing platform, which is compatible with Raspberry Pi, has optimizations for Neon though. https://medium.com/sclable/intel-openvino-with-opencv-f5ad03363a38

A guide about RealSense installation on Raspberry Pi at the link below affirms that the CMake build flag -DBUILD_WITH_OPENMP=ON can be used on Pi 4 to enable usage of multiple cores in librealsense. The official SDK notes about the OpenMP flag state: "When enabled, YUY to RGB conversion and Depth-Color spatial alignment will take advantage of multiple-cores using OpenMP. This can reduce latency at expense of greater CPU utilization".
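As an illustration of the Decimation Filter suggestion, a minimal sketch using the SDK's post-processing API might look like the following; the magnitude value is an assumed example.

```cpp
// Sketch: downsample the depth frame on the host before further processing.
#include <librealsense2/rs.hpp>

int main()
{
    rs2::pipeline pipe;
    pipe.start();

    rs2::decimation_filter decimation;
    decimation.set_option(RS2_OPTION_FILTER_MAGNITUDE, 2); // 2x downsample (assumed value)

    while (true)
    {
        rs2::frameset frames = pipe.wait_for_frames();
        rs2::depth_frame depth = frames.get_depth_frame();
        rs2::depth_frame decimated = decimation.process(depth);
        // ... align / publish the smaller depth frame
    }
    return 0;
}
```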
@MartyG-RealSense, thank you for this wealth of information. One last question before I buy some more hardware and modify some builds: do you have any references to the framerates folks have achieved with realsense-ros on the Nano? I'm also looking at the LattePanda Alpha, which seems leagues faster and may be a more surefire choice.
You should be able to achieve 30 FPS on a Nano, as demonstrated in the tutorial article in the link below. I say 'should' because of the number of factors in ROS that can affect performance. |
Hi @dgrnbrg Do you require further assistance with this case, please? Thanks! |
Case closed due to no further comments received. |
Issue Description
Hi, I'm using realsense-ros to use the D435i on a Raspberry Pi for SLAM, object detection, and mapping. I've been running into major performance issues with realsense-ros (see IntelRealSense/realsense-ros#1929 for the debugging so far), which I've determined is actually due to `rs2::align` being too slow on the ARMv7 architecture. I tried changing realsense-ros to use `rs2::gl::align` instead to improve performance; however, the OpenGL-based processing block is failing to initialize correctly. I'm looking for help in understanding what the issue is, or even how to debug it. At this point, I believe I've got the system to a point where I can see in a debugger where the GLSL-based aligner segfaults (I've included the backtrace below). I'll now describe what I've done so far.

First, I rebuilt librealsense2 with `-DBUILD_GLSL_EXTENSIONS=true`, and then in realsense-ros I added `#include <librealsense2-gl/rs_processing_gl.hpp>` to `realsense_node_factory.h` and changed `base_realsense_node.cpp` to use `rs2::gl::align`. I didn't see a performance increase, but I determined that this is probably due to the "backup" node structure in `rs-gl.cpp` in librealsense that automatically falls back to the CPU version, so I hacked that out with this patch:

I validated that `rs-gl` could run, which required some coaxing because the RPi4 has OpenGL ES, which wasn't passing the OpenGL version check. I addressed this by exporting `MESA_GL_VERSION_OVERRIDE=3.0` and `MESA_GLSL_VERSION_OVERRIDE=130`, and I saw `rs-gl` run and display an image.

Next, I convinced `roslaunch realsense2_camera rs_camera.launch align_depth:=true` to start by adding those environment variables, as well as `export LD_PRELOAD=/usr/local/lib/librealsense2-gl.so.2.45`.

At this point, I was seeing an unexpected crash at startup in the nodelet manager, so I added `launch-prefix="xterm -e gdb --args"` to the crashing nodelet manager in order to get a backtrace of the crash site. I'm pretty sure that the RPi4's GPU is capable of running the GLSL-based aligner, but I need help understanding why `rs2::options::set_option` / `rs2::pointcloud::map_to` is getting called by the align code.
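For reference, the GLSL align path I'm trying to exercise boils down to roughly the following standalone sketch. This is a simplified approximation based on `rs_processing_gl.hpp`, not the exact change applied to `base_realsense_node.cpp`, and the off-screen-context behaviour noted in the comments is my understanding rather than something verified here.

```cpp
// Simplified standalone approximation of the GLSL align path (not the exact
// realsense-ros change). Assumes librealsense was built with
// -DBUILD_GLSL_EXTENSIONS=true.
#include <librealsense2/rs.hpp>
#include <librealsense2-gl/rs_processing_gl.hpp>

int main()
{
    // Initialize the GL processing module; my understanding is that without a
    // visible window this sets up an off-screen GL context for the blocks.
    rs2::gl::init_processing(true /* prefer GLSL over the CPU fallback */);

    rs2::pipeline pipe;
    pipe.start();

    rs2::gl::align align_to_color(RS2_STREAM_COLOR);

    while (true)
    {
        rs2::frameset frames = pipe.wait_for_frames();
        rs2::frameset aligned = align_to_color.process(frames);
        rs2::depth_frame aligned_depth = aligned.get_depth_frame();
        // ... hand the aligned depth/color pair to the downstream SLAM code
    }
    return 0;
}
```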