`spin_some` easily deadlocks action clients #2451
Comments
The issue can be fixed by re-ordering ClientBase::execute and ClientBase::take_data() such as:
go as the last conditions. So it'd be for both
Otherwise feedback & status will always have priority over server responses.
I think reordering would work. What of the fact that
rclcpp/rclcpp/src/rclcpp/experimental/executors/events_executor/events_executor.cpp Lines 341 to 344 in 5e14a28
The idea is that one single PD: I've already tested with the
* Fixes for intra-process Actions
* Fixes for Clang builds
* Fix deadlock
* Server to store results until client requests them
* Fix feedback/result data race See ros2#2451
* Add missing mutex
* Check return value of intra_process_action_send

Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
* Fixes for intra-process actions (#144)
  * Fixes for intra-process Actions
  * Fixes for Clang builds
  * Fix deadlock
  * Server to store results until client requests them
  * Fix feedback/result data race See ros2#2451
  * Add missing mutex
  * Check return value of intra_process_action_send

  Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
* Fix IPC Actions data race (#147)
  * Check if goal was sent through IPC before send responses
  * Add intra_process_action_server_is_available API to intra-process Client

  Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
* Fix data race in Actions: Part 2 (#148)
  * Fix data race in Actions: Part 2
  * Fix warning - copy elision

  Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
* fix: Fixed race condition in action server between is_ready and take"… (ros2#2531)
  * fix: Fixed race condition in action server between is_ready and take" (ros2#2495)

    Some background information: is_ready, take_data and execute data may be called from different threads in any order. The code in the old state expected them to be called in series, without interruption. This lead to multiple race conditions, as the state of the pimpl objects was altered by the three functions in a non thread safe way.

    Co-authored-by: William Woodall <william@osrfoundation.org>
    Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
  * fix: added workaround for call to double calls to take_data

    This adds a workaround for a known bug in the executor in iron.

    Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

  Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
  Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
  Co-authored-by: William Woodall <william@osrfoundation.org>

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: Mauro Passerino <mpasserino@irobot.com>
Co-authored-by: jmachowinski <jmachowinski@users.noreply.github.com>
Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: William Woodall <william@osrfoundation.org>
- See ros2#2451 Signed-off-by: Mauro Passerino <mpasserino@irobot.com>
Bug report
When running an `rclcpp_action::Client<T>` using `spin_some()` and an `rclcpp::Rate`, if the server you connect to publishes feedback at a faster rate than your `rclcpp::Rate`, your client waits forever for the goal to be accepted, effectively becoming deadlocked.
Required Info:
Note: based on my understanding of the root cause (see Additional information), I believe this bug exists in rolling too, but I've only tested on Galactic as that's what I have access to at the moment.
Steps to reproduce issue
The below server and client reproduce 100% of the time on my machine when running over loopback. Compilation settings shouldn't matter. Once built, they can be run with `ros2 run ...` or directly with `./path/to/binary`. I'm using `action_tutorials_interfaces/action/Fibonacci`, but this should repro with any action interface.
Server
Notes:
Client
Notes:
`SendGoalOptions` as the deadlock is not dependent on one.
Expected behavior
The action client will accept the goal, perhaps after working through a short backlog of feedback up to the queue depth in the feedback subscription's QoS.
Actual behavior
The server has accepted and started working on the goal but the client never sees that. The client continuously logs `Waiting for goal response.` in the repro, and debug logs have `Received feedback for unknown goal. Ignoring...`. Debug logs show `Client in wait set is ready`, which I presume to be the goal response, but it's never taken, so the client deadlocks.
Additional information
I did some digging and think I have tracked down the root cause, some other contributing factors, and some related impacts.
`rclcpp_action::Client<T>` derives from `rclcpp_action::ClientBase`, which derives from `rclcpp::Waitable`, and these are scheduled for work in an `rclcpp::Executor` as a single executable, not as multiple executables for their constituent parts. When using `rclcpp::Executor::spin_some`, which only collects work from ready entities once per call, the `rclcpp::Waitable` is scheduled once, regardless of the amount of work it has ready. This should be fine, but the `rclcpp_action::ClientBase::take_data` implementation only yields a single executable per call, and therefore the subsequent `execute` invoked by the `rclcpp::Executor` only performs one thing on the client rather than everything that's ready.
rclcpp/rclcpp_action/src/client.cpp, lines 550 to 687 in 5e14a28
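To make the "one unit of work per call" behavior concrete, here is a minimal standalone sketch. It does not use rclcpp at all: `ToyWaitable` and `toy_spin_some` are hypothetical names invented for illustration, and they only mimic the shape described above (ready work is collected once, and `take_data` hands back a single item).

```cpp
#include <deque>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for a waitable. This is NOT the rclcpp API, only a model
// of the behavior described in this issue.
struct ToyWaitable {
  std::deque<std::string> ready;  // pending units of work, oldest first

  bool is_ready() const { return !ready.empty(); }

  // Hands back exactly ONE unit of work per call, even if more is queued.
  std::optional<std::string> take_data() {
    if (ready.empty()) {
      return std::nullopt;
    }
    std::string item = ready.front();
    ready.pop_front();
    return item;
  }
};

// Toy spin_some: checks readiness once and executes the waitable once.
// Returns the units of work that were executed during this call.
std::vector<std::string> toy_spin_some(ToyWaitable & waitable) {
  std::vector<std::string> executed;
  if (waitable.is_ready()) {
    if (auto item = waitable.take_data()) {
      executed.push_back(*item);
    }
  }
  return executed;
}
```

With three items queued, a single `toy_spin_some` call executes only the first and leaves the other two for later calls, which is the property the rest of this analysis depends on.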
In most scenarios, this is fine, but it deadlocks in this specific scenario: a server publishing feedback faster than the client is spinning (specifically, faster than it calls `rclcpp::Executor::wait_for_work`). This can happen with any of the `spin*` implementations but happens most readily with `spin_some`. Why? Because `rclcpp_action::ClientBase::take_data` only returns the first ready executable and it prioritizes feedback over responses. The implementations of `rclcpp_action::ClientBase::take_data` and `execute` take and execute in this order: feedback, status, goal response, result response, cancel response.
When you combine the implementation of `rclcpp::Waitable` that only returns execution data representing a single unit of work, the implicit prioritization mechanism built into `ClientBase`, and an action server whose feedback topic publishes faster than the spin rate of the client, you get a client that always has ready feedback for an unknown goal, ironically never executing the goal response that the feedback is for.
Some ancillary effects of this: the goal response is never taken, the client never creates the goal handle, and therefore cancellation isn't easy. Cancellation can be done through `rclcpp_action::Client<T>::async_cancel_all_goals`, but that may have adverse side effects depending on your scenario. Additionally, the client never becomes "result aware" for the goal since that occurs when the goal response is processed, so result responses are permanently lost if they occur during the deadlock. Feedback is also always dropped because it's for an unknown goal.
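The starvation can be reproduced in a few lines without ROS. The sketch below is a simplified model, not the rclcpp implementation: the function name, the queue depth default, and the one-unit-per-cycle loop are assumptions made for illustration. `kFeedbackFirst` follows the order listed above, while `kResponsesFirst` is only one possible reading of the reordering suggested in the comments (feedback and status checked last).

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Kinds of work an action client can have ready.
enum Kind : std::size_t {
  Feedback, Status, GoalResponse, ResultResponse, CancelResponse, kNumKinds
};

using Order = std::array<Kind, kNumKinds>;

// Priority order described in this issue.
constexpr Order kFeedbackFirst = {
  Feedback, Status, GoalResponse, ResultResponse, CancelResponse};
// Assumed alternative: responses first, feedback and status last.
constexpr Order kResponsesFirst = {
  GoalResponse, ResultResponse, CancelResponse, Feedback, Status};

// Returns the spin cycle in which the goal response is executed, or -1
// if it is still pending after max_cycles.
int cycles_until_goal_response(
  const Order & order, std::size_t feedback_per_cycle, int max_cycles,
  std::size_t queue_depth = 10)
{
  std::array<std::size_t, kNumKinds> pending{};  // all counts start at zero
  pending[GoalResponse] = 1;  // the server accepted the goal before cycle 0

  for (int cycle = 0; cycle < max_cycles; ++cycle) {
    // The server publishes feedback between spins; the queue is bounded,
    // loosely modeling a subscription QoS depth.
    pending[Feedback] =
      std::min(queue_depth, pending[Feedback] + feedback_per_cycle);

    // One unit of work per cycle: the first ready kind in priority order.
    for (Kind kind : order) {
      if (pending[kind] > 0) {
        --pending[kind];
        if (kind == GoalResponse) {
          return cycle;
        }
        break;
      }
    }
  }
  return -1;
}
```

In this model, `cycles_until_goal_response(kFeedbackFirst, 1, 1000)` never reaches the goal response because every cycle finds feedback ready first, whereas the same call with `kResponsesFirst` handles it in the first cycle. With no feedback at all, the feedback-first order also handles it immediately, which matches the observation that most scenarios are fine.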