Unusual performance when working with large messages #2139
I forgot to mention, I also strayed from the ROS 2 tutorial by creating the publisher and subscriber with SensorDataQoS.
For debugging I've attached some merged logs here (merged_dual_listeners.log) with the tracing level turned up. In both runs I started up the listener(s) before the talker.
Excellent. I like it when people try things instead of sticking to the script 🙂, and then you can discover interesting things, because the improvement you describe doesn't quite fit with anything I remember having seen 😀

I do have a guess. I have slowly come to the conclusion that message loss with best-effort on modern networks and loopback networks practically always means packets/datagrams getting lost on the receiving side, and with best-effort, losing one packet means the entire point cloud is lost. On the receiving side, there are two places (that I am aware of) where you can lose data.
So my guess is that the publisher is pushing out the datagrams faster than the subscriber can pick them up and buffer them, or that the subscriber (often?) starts processing incoming datagrams too late. Either way you can overrun a receive buffer, especially if the receive buffer is substantially smaller than the point cloud. The default socket receive buffer is often too small for point clouds; if you haven't seen https://docs.ros.org/en/jazzy/How-To-Guides/DDS-tuning.html yet, then it could be as simple as that (a minimal configuration sketch is included after this comment).

It could also be that processing a point cloud once it has been received in its entirety takes enough time for the next point cloud to already overrun the receive buffer. (That depends on sizes, bandwidth, intervals ...) If that's the case, then moving the delivery of the data off the thread that reads from the socket should help ("asynchronous delivery" in Cyclone; it can be turned on with a configuration setting).

A third option is that scheduling is in your way. I don't think it is the case for you, but if you have 100 threads that are runnable, who's to say the thread that reads from the socket gets priority unless you give it priority?

Those are explanations for why you might be losing data. By themselves they are not enough to explain why adding a subscriber improves the situation. For that I think I need to assume you're using unicast: then a second subscribing process would cause the publisher to write each packet twice, slowing down the rate at which the packets arrive at a single subscriber. If I am not mistaken, ROS 2's default configuration means the data goes out over unicast.

With that said, looking at the log:
we see that multicast is disabled, and that the two readers indeed do request to receive the data at different (unicast) addresses. The socket receive buffer size visible in the log is substantially less than the ~5MB that you mention. That all fits with my guess that it is simply overflowing the buffer, and that the slowing down caused by having to send everything twice improves the matter.

Why Fast DDS does better I don't know for sure, but I suspect the reason is that they default to using a (proprietary) shared-memory-based transport on a single machine, and so they never touch the socket buffers in any meaningful way. Cyclone can do shared memory, too, by off-loading the traffic to Eclipse Iceoryx, but you need to configure it. (It would actually be nice to have a tighter integration with a shared-memory transport, but that's for another day.)
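For illustration, a Cyclone DDS configuration along the lines of the tuning suggested above might look like the sketch below. The 10MB value and the file path are assumptions for the sake of the example, not values taken from this thread, and depending on the Cyclone DDS version the element may instead be SocketReceiveBufferSize with a min attribute.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Hypothetical cyclonedds.xml, selected via CYCLONEDDS_URI=file:///path/to/cyclonedds.xml.
     Asks Cyclone to request a socket receive buffer large enough to hold a full point cloud;
     the kernel must also permit it (net.core.rmem_max), see the sysctl sketch further down. -->
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain id="any">
    <Internal>
      <!-- 10MB is an illustrative value, not one taken from this issue -->
      <MinimumSocketReceiveBufferSize>10MB</MinimumSocketReceiveBufferSize>
    </Internal>
  </Domain>
</CycloneDDS>
```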
Hi @eboasson, thanks for the response! I am on the same team as @dsobek and just wanted to chime in here regarding tuning. When we install our software on users' computers, we install a sysctl configuration file that raises the system receive buffer limits.
I confirmed with @dsobek that this is being set properly via this file on his machine, so I don't believe the system receive buffer is the limiting factor here.
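For reference, a sysctl drop-in of the kind described here might look like the sketch below. The file name and the values are illustrative assumptions, not ones taken from this thread.

```ini
# Hypothetical /etc/sysctl.d/99-dds-tuning.conf (name and values are examples only).
# Raises the kernel's maximum and default socket receive buffer sizes so that a DDS
# implementation is allowed to request buffers large enough for big messages.
# Apply with: sudo sysctl --system
net.core.rmem_max=2147483647
net.core.rmem_default=8388608
```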
Setting the configuration option you suggested did the trick. I do think this behavior, where a second subscriber improves the performance when the receive buffer is too small, is interesting though, and I'm not sure I agree with your unicast theory.

Unless there is something I misunderstood, I don't think the publisher sending messages in unicast mode changed the rate at which the subscriber receives them, since the bandwidth suggests it's receiving messages at 100 Hz. But thanks for the detailed response! That config change unblocks me.
Great!
The 100 Hz figure is the rate at which you send/receive application messages; I was considering the rate at which Cyclone sends/receives UDP datagrams. A single datagram is at most (roughly) 64 kB (less in the default config, 14720 B according to the log you provided earlier), so 4.5 MB means sending a great many datagrams for each application message. If you look at the networking statistics of the kernel, you should be able to see whether datagrams are being dropped because the socket receive buffer filled up.
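One way to do that check on Linux (my addition, not commands given in the thread) is to look at the kernel's UDP counters; "receive buffer errors" / RcvbufErrors incrementing while the subscriber is losing messages points at the socket receive buffer overflowing.

```sh
# UDP statistics from the kernel: look for "receive buffer errors" in the netstat
# output, or the RcvbufErrors column in /proc/net/snmp; both count datagrams dropped
# because the socket receive buffer was full.
netstat -su
grep '^Udp:' /proc/net/snmp
```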
Hello @eboasson, I am currently working with ROS 2 Humble on an Nvidia Jetson Orin NX 16GB, trying to set up a multimedia streaming pipeline to a GPU server, and I'm experiencing the same behaviour as @dsobek described in his initial issue post.

My setup: for networking I am using an Intel E810 network card with 25 Gbit/s transceivers, and with iperf3 I get about 13 Gbit/s UDP throughput (limited by CPU bottlenecks on the Jetson) with 0% packet loss.
My Cyclone DDS is already set up with a configuration that includes the receive buffer settings discussed above.
Experiment: with this configuration, I am trying to stream raw images from an Intel RealSense D435i to my GPU server.
Subscribing to the depth image on the receiving side works fine, but when subscribing to the RGB image (which is higher resolution) the framerate is really low, fluctuating between 10 and 20 FPS. I already tried increasing and decreasing these settings. Are there maybe any other parameters in the Cyclone DDS configuration I can tweak to potentially reach higher framerates with a single subscriber? Looking forward to your reply!
Hi, I am getting some strange results when working with large point cloud messages on a ROS 2 topic.
With a minimal publisher and subscriber (following this ROS tutorial) sending a ~5MB message at 10 Hz, the listener picks up maybe 10-20% of the messages; however, if I start up another subscriber on the same topic, both subscribers pick up pretty much everything. In this case I was running the code on a VM with 6 cores and 8 GB of RAM, but with more resources the same effect occurs when adding another subscriber; the only difference is that fewer messages are dropped by the single subscriber. At even higher frequencies I've seen a single subscriber pick up 0 messages, while two subscribers still pick up pretty much every message. (A minimal sketch of the publisher side is included at the end of this post.)
How is it that adding subscribers onto a high bandwidth topic improves performance?
I've also run this example using Fast DDS, where most messages are received with a single subscriber and adding another subscriber doesn't appear to affect performance.
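A minimal sketch (my reconstruction, not the reporter's actual code) of the publisher side described above: the ROS 2 tutorial talker with the QoS swapped for SensorDataQoS, publishing a ~5 MB blob at 10 Hz. The node, topic, and class names are made up, and the payload is just zero-filled bytes rather than a real point cloud.

```cpp
// Publishes ~5 MB PointCloud2 messages at 10 Hz with SensorDataQoS (best-effort),
// mirroring the setup described in this issue. Names are illustrative.
#include <chrono>
#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/point_cloud2.hpp"

using namespace std::chrono_literals;

class BigCloudTalker : public rclcpp::Node
{
public:
  BigCloudTalker() : Node("big_cloud_talker")
  {
    // SensorDataQoS: best-effort reliability, volatile durability, small history depth
    pub_ = create_publisher<sensor_msgs::msg::PointCloud2>("cloud", rclcpp::SensorDataQoS());
    timer_ = create_wall_timer(100ms, [this]() {
      sensor_msgs::msg::PointCloud2 msg;
      msg.data.resize(5 * 1024 * 1024);  // ~5 MB payload; contents irrelevant here
      pub_->publish(msg);
    });
  }

private:
  rclcpp::Publisher<sensor_msgs::msg::PointCloud2>::SharedPtr pub_;
  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<BigCloudTalker>());
  rclcpp::shutdown();
  return 0;
}
```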