use dedicated executor for component nodes #1774

Closed
gezp opened this issue Sep 15, 2021 · 11 comments

@gezp
Contributor

gezp commented Sep 15, 2021

Feature request

Feature description

Hi all,

I'm working on the project Reduce ROS2 Nodes and Determinism of OSPP2021 (mentored by Steve Macenski).

Currently, I'm trying to support composed bringup for the Nav2 stack. I have done some work on manually composed bringup (compile-time composition) of Nav2, which performs better than normal bringup. In addition, I found that one large multi-threaded executor consumes more CPU (a 30%-50% increase) than a set of single-threaded executors. You can find some details here: https://discourse.ros.org/t/nav2-composition/22175.

For run-time composition, I noticed that rclcpp_components only provides component_container, which uses a SingleThreadedExecutor for all component nodes in the container, and component_container_mt, which uses a MultiThreadedExecutor.

How about supporting a set of SingleThreadedExecutors in component_container? In other words, create a SingleThreadedExecutor with a dedicated thread for each component node.

@SteveMacenski
Collaborator

SteveMacenski commented Sep 15, 2021

This should be a parameterized option, with the default keeping the current behavior of one single-threaded executor. If a parameter, for instance dedicated_thread, is set to true, then each newly added component gets its own executor instead of sharing one executor with all other components.

I don't think it is clear that this is objectively the best solution, but it is a useful option to make available based on the needs and discussions in https://discourse.ros.org/t/nav2-composition/22175. In that thread, we present analysis showing that N single-threaded executors are far more computationally efficient than a single multi-threaded executor with N threads. We'd like to have this N-single-threaded-executor container option available so we can do dynamically composed bringup in Nav2 rather than having to manually compose the nodes in a process. Manual composition is not very flexible or configurable, and after all Nav2 is a framework rather than a standalone solution, so dynamic loading is more in line with what we need.

We could of course create our own container to use locally, but this has clear reuse value, as people want to compose their systems into a single process. Companies I spoke with expressed interest in this as well: they currently compose their systems manually because they cannot dynamically compose them with the N single-threaded executors they need. And it is a single parameter and a small code change (much simpler than what @gezp proposes above).

We do not suggest any changes to the multi-threaded executor container, nor are we suggesting merging them into a single component manager. That would be feature creep, and it makes sense for those to be separate executables. Please disregard those parts of the requested change; Zhengpeng is thinking a little bigger than I think this problem warrants.

What we do suggest is the following:

  • Add a parameter, such as dedicated_thread, that makes the container load components into a single executor if false and into individual dedicated executor threads if true. Default to false to keep the current behavior.
  • On loading of a component node, use that value to either create a new executor for the node or add the node to the single main executor.
  • On unloading of a component node, if the parameter is true, remove that node from its executor and delete the executor along with the component.

This will allow for the current behavior as well as the option to load each into their own executor thread, which is precisely what we're doing in manual composition (as well as some companies we spoke to that gave us that idea).
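
The load and unload steps above can be sketched as a toy model. This is a minimal Python sketch using only the standard library, not rclcpp code: ToyExecutor and ToyContainer are hypothetical stand-ins invented for illustration, and only the dedicated_thread parameter name comes from the proposal.

```python
import queue
import threading


class ToyExecutor:
    """Stand-in for a single-threaded executor: one thread draining one work queue."""

    def __init__(self):
        self._work = queue.Queue()
        self._thread = threading.Thread(target=self._spin, daemon=True)
        self._thread.start()

    def _spin(self):
        while True:
            job = self._work.get()
            if job is None:  # sentinel posted by shutdown()
                break
            job()

    def submit(self, job):
        self._work.put(job)

    def shutdown(self):
        self._work.put(None)
        self._thread.join()


class ToyContainer:
    """Hypothetical container; dedicated_thread mirrors the proposed parameter."""

    def __init__(self, dedicated_thread=False):
        self.dedicated_thread = dedicated_thread
        self._shared = ToyExecutor()  # the single main executor (current behavior)
        self._executors = {}          # node name -> executor running its callbacks

    def load_node(self, name):
        # Either create a new executor for this node or reuse the shared one.
        self._executors[name] = ToyExecutor() if self.dedicated_thread else self._shared

    def unload_node(self, name):
        executor = self._executors.pop(name)
        if executor is not self._shared:
            executor.shutdown()  # a dedicated executor goes away with its component

    def executor_of(self, name):
        return self._executors[name]

    def shutdown(self):
        for name in list(self._executors):
            self.unload_node(name)
        self._shared.shutdown()


def callback_thread_ids(container, names):
    """Run one callback per node and report which thread executed it."""
    seen, events = {}, {}
    for name in names:
        events[name] = threading.Event()

        def job(name=name):
            seen[name] = threading.get_ident()
            events[name].set()

        container.executor_of(name).submit(job)
    for event in events.values():
        event.wait(timeout=5)
    return seen
```

With dedicated_thread=False, callbacks of all loaded nodes run on the same thread; with dedicated_thread=True, each node's callbacks run on their own thread, which is the behavior requested here.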

@fujitatomoya what do you think about that suggestion?

@fujitatomoya
Collaborator

thanks for the information 👍

Sorry for answering a question with a question: is the "start navigation" result in https://discourse.ros.org/t/nav2-composition/22175 based on the same workload, in the same time window, with the same number of threads (for MultiThreadedExecutor and multiple SingleThreadedExecutors)?

For example, say an application has a workload to finish.

| Executor | # of threads | CPU | Elapsed time |
|---|---|---|---|
| MultiThreadedExecutor | 12 | 43% | 0.75T seconds |
| Multiple SingleThreadedExecutor | 9 | 34% | T seconds |

If the result is something like the above, I do not see a difference in performance. To me, it is just a trade-off.
In that case, maybe we want the ability to configure the number of threads for MultiThreadedExecutor (something like #1708)?
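
One way to read the hypothetical numbers above is to multiply CPU usage by elapsed time, which gives the total CPU work spent on the workload. The sketch below assumes the CPU percentage is an average over the whole run; cpu_work is a helper name made up for this illustration.

```python
def cpu_work(cpu_percent, elapsed):
    """Total CPU work in units of 'percent * T', assuming cpu_percent is a run average."""
    return cpu_percent * elapsed


multi = cpu_work(43, 0.75)   # MultiThreadedExecutor: 43% for 0.75T
single = cpu_work(34, 1.0)   # multiple SingleThreadedExecutors: 34% for T

print(multi, single)  # prints "32.25 34.0"
```

Under that assumption both configurations spend roughly the same total CPU work (32.25 versus 34), so a higher instantaneous CPU percentage alone does not show that one executor setup is less efficient.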

what do you think? if i am missing anything, please let me know!

@gezp
Contributor Author

gezp commented Sep 18, 2021

I made a simple package to test this. I created a publisher and a subscriber on a topic with message type geometry_msgs::msg::PolygonStamped (10000 points, published at 100 Hz) for three cases, and got these results:

case 1: standalone publisher and standalone subscriber
cpu: 7.4 memory: 0.09968215868648357
case 2: composed publisher and composed subscriber with a multi-threaded-executor
cpu: 7.8 memory: 0.0470663710230197
case 3: composed publisher and composed subscriber with multiple single-threaded-executor
cpu: 6.2 memory: 0.04695660234221972
  • The thread count of the multi-threaded executor equals the number of nodes; in this case, thread num = 2.

You can reproduce similar results by running python scripts/test_use_dedicated_executors.py in this package.

Composition with a multi-threaded executor (thread num = 2) consumes more CPU than the normal standalone case, which seems a bit strange.
Composition with multiple single-threaded executors performs better than the normal standalone case.

@fujitatomoya
Collaborator

@gezp here is what i got on my local environment.

case 1: standalone publisher and standalone subscriber
[psutil.Process(pid=1275415, name='standalone_publisher', started='23:09:04')]
[psutil.Process(pid=1275417, name='standalone_subscriber', started='23:09:04')]
cpu: 25.0 memory: 0.1808959626374977
case 2: composed publisher and composed subscriber with a multi-threaded-executor
[psutil.Process(pid=1275442, name='composed_npub_nsub', started='23:09:25')]
cpu: 27.9 memory: 0.09321206907146806
case 3: composed publisher and composed subscriber with multiple single-threaded-executor
[psutil.Process(pid=1275465, name='composed_npub_nsub', started='23:09:36')]
cpu: 21.0 memory: 0.09055736496036379

I also confirmed that all messages from the publisher are received by the subscription. All in all, I think MultiThreadedExecutor is less performant than multiple SingleThreadedExecutors in this case.

By the way, https://github.com/gezp/ros2_topic_performance needs the following update in order to build.
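
To compare the two environments on the same footing, the reported cpu values can be turned into relative changes. This is a small sketch over the numbers quoted in this thread only; the dictionary keys and the relative_change helper are names chosen for illustration, and the unit of the cpu value is taken as reported by the test script.

```python
# CPU values as reported in this thread for the three cases.
results = {
    "gezp": {"standalone": 7.4, "multi_threaded": 7.8, "dedicated": 6.2},
    "fujitatomoya": {"standalone": 25.0, "multi_threaded": 27.9, "dedicated": 21.0},
}


def relative_change(value, baseline):
    """Percentage change of value with respect to baseline."""
    return (value - baseline) / baseline * 100.0


summary = {}
for who, cpu in results.items():
    summary[who] = {
        "multi_vs_standalone": round(
            relative_change(cpu["multi_threaded"], cpu["standalone"]), 1),
        "dedicated_vs_standalone": round(
            relative_change(cpu["dedicated"], cpu["standalone"]), 1),
        "dedicated_vs_multi": round(
            relative_change(cpu["dedicated"], cpu["multi_threaded"]), 1),
    }

for who, rows in summary.items():
    print(who, rows)
```

In both runs the dedicated single-threaded executors use about 16% less CPU than the standalone processes and roughly 20% to 25% less than the multi-threaded executor, so the two measurements agree in direction and approximate size.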

diff --git a/package.xml b/package.xml
index 687383c..4250df7 100644
--- a/package.xml
+++ b/package.xml
@@ -10,14 +10,18 @@

   <build_depend>example_interfaces</build_depend>
   <build_depend>rclcpp</build_depend>
+  <build_depend>rclcpp_components</build_depend>
   <build_depend>rcutils</build_depend>
   <build_depend>std_msgs</build_depend>
+  <build_depend>geometry_msgs</build_depend>

   <exec_depend>example_interfaces</exec_depend>
   <exec_depend>launch_ros</exec_depend>
   <exec_depend>rclcpp</exec_depend>
+  <exec_depend>rclcpp_components</exec_depend>
   <exec_depend>rcutils</exec_depend>
   <exec_depend>std_msgs</exec_depend>
+  <exec_depend>geometry_msgs</exec_depend>

   <test_depend>ament_cmake_pytest</test_depend>
   <test_depend>ament_lint_auto</test_depend>

I think that having a dedicated-executor option makes sense for now. (Personally, I would like to investigate and improve MultiThreadedExecutor in the future.) @ivanpauno @wjwwood @clalancette any opinions?

@SteveMacenski
Collaborator

SteveMacenski commented Sep 20, 2021

I wouldn’t argue against a better multi-threaded executor as an alternative, but what’s elegant about the requested changes is that they would apply to both the single- and multi-threaded executor containers, offering an “executor per component” behavior for complex systems that need it for single- and/or multi-threaded nodes.

I think essentially we’d just make the container a class templated on the executor type so we could instantiate new executors based on T. That would future-proof these changes for any new executor types (like the static single-threaded executor, which currently doesn’t have a container).

So I think the best answer is not either-or, but both!
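
The idea of a container parameterized on the executor type can be sketched as follows. Python has no templates, so this sketch uses typing.Generic and a factory argument as a rough analogue of a class templated on T; IsolatedContainer, SingleThreadedToy, and MultiThreadedToy are hypothetical names and do not correspond to rclcpp classes.

```python
from typing import Callable, Dict, Generic, TypeVar


class SingleThreadedToy:
    """Placeholder for a single-threaded executor type."""
    kind = "single_threaded"


class MultiThreadedToy:
    """Placeholder for a multi-threaded executor type."""
    kind = "multi_threaded"


ExecutorT = TypeVar("ExecutorT")


class IsolatedContainer(Generic[ExecutorT]):
    """Hypothetical container that creates one executor of type ExecutorT per component."""

    def __init__(self, make_executor: Callable[[], ExecutorT]):
        self._make_executor = make_executor
        self.executors: Dict[str, ExecutorT] = {}

    def load_node(self, name: str) -> ExecutorT:
        # Every component gets a fresh executor of the configured type.
        self.executors[name] = self._make_executor()
        return self.executors[name]

    def unload_node(self, name: str) -> None:
        # The executor is dropped together with its component.
        del self.executors[name]


st_container = IsolatedContainer(SingleThreadedToy)
mt_container = IsolatedContainer(MultiThreadedToy)
```

Because the container only depends on a way to construct an executor, a new executor type would need no change to the container itself, which is the future-proofing argument made above.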

@ros-discourse

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2021-9-16/22372/1

@fujitatomoya
Collaborator

@SteveMacenski we had a quick discussion on this in the MW WG, and I think it is okay to have this option.

@gezp
Contributor Author

gezp commented Sep 23, 2021

I tried implementing this feature here: https://github.com/gezp/rclcpp/commit/3d48c0aacdc62c1ab688d1147679157ad578927f. What do you think about it?

@fujitatomoya
Collaborator

@gezp thanks for the contribution! Could you open a PR so that we can start the review?

@ZhenshengLee

Is this feature supported through launch files?

Thanks.

@SteveMacenski
Collaborator

Yes, you can select this container like any other.
