fuse -> ROS 2 fuse_core: Fix Async #293
Conversation
@ros-pull-request-builder retest this please
@ros-pull-request-builder retest this please
Review comment on this hunk:

```cpp
// @@ -112,23 +112,31 @@
void CallbackAdapter::addCallback(const std::shared_ptr<CallbackWrapperBase> & callback)
{
  std::lock_guard<std::mutex> lock(queue_mutex_);
  callback_queue_.push_back(callback);
  // NOTE(CH3): Do we need to keep triggering guard conditions if the queue size is > 1?
```
I'm pretty sure the answer is yes, and this is a good catch.
IIRC the waitable interface assumes that the waitable will become ready by something that will wake the wait set. Guard conditions wake the wait set, but they only wake it once no matter how many times they're triggered.
I think a good solution would be to trigger the guard condition in take_data(). If after taking data the queue is not empty, then trigger the guard condition again so the executor will wake for the next callback in the queue.
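A minimal sketch of that suggestion, reusing the queue_mutex_/callback_queue_ members from the hunk above and assuming the waitable holds an rclcpp::GuardCondition; the exact fuse_core signatures may differ:

```cpp
// Sketch only: member names beyond those in the hunk above are assumptions.
std::shared_ptr<void> CallbackAdapter::take_data()
{
  std::shared_ptr<CallbackWrapperBase> callback;
  std::lock_guard<std::mutex> lock(queue_mutex_);
  if (!callback_queue_.empty()) {
    callback = callback_queue_.front();
    callback_queue_.pop_front();
  }
  // A triggered guard condition wakes the wait set only once, so if more
  // callbacks remain queued, re-trigger it so the executor wakes again for
  // the next callback.
  if (!callback_queue_.empty()) {
    guard_condition_.trigger();  // assumed rclcpp::GuardCondition member
  }
  return std::static_pointer_cast<void>(callback);
}
```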
Closing in favor of #294
See: #276
Description
There are several outstanding issues with the async model used in fuse_core:

1. Queued callbacks can go unserviced, because the guard condition that wakes the executor only wakes the wait set once, no matter how many callbacks are queued.
2. The executor can miss the executor->cancel() call and spin forever.

I'm a little iffy on whether that's what's happening with (2), but regardless, the solutions implemented in this PR fix both issues.
The solution is to keep re-triggering the guard condition whenever callbacks are added to the callback queues, until the queues are fully serviced. A sketch of the enqueue side of that idea is below.
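For illustration, here is the enqueue-side shape of the approach, again reusing the names from the hunk above; this is a sketch of the idea, not the exact code merged in the PR:

```cpp
void CallbackAdapter::addCallback(const std::shared_ptr<CallbackWrapperBase> & callback)
{
  std::lock_guard<std::mutex> lock(queue_mutex_);
  callback_queue_.push_back(callback);
  // Trigger on every enqueue (not only when the queue was empty), so a wait
  // set that already consumed one wake-up gets woken again for this callback.
  guard_condition_.trigger();  // assumed rclcpp::GuardCondition member
}
```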
I also edited the tests to be a lot stricter; they now test for multiple callback additions and initializations. The tests now feature async classes with an executor thread count of 0 (which means the number of threads will equal the number of CPU cores), as troublesome a case as possible, to ensure my solutions are as robust as possible. I had to do this because a lot of the issues seem to be race-condition related and come up extremely rarely. A self-contained sketch of the stress pattern is below.
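To make that stress pattern concrete without depending on the real fuse_core async classes, here is a self-contained stand-in (every name here is hypothetical): each instance owns a node and a multi-threaded executor with a thread count of 0, spun on a background thread, and instances are constructed and destroyed in a tight loop.

```cpp
#include <thread>
#include <rclcpp/rclcpp.hpp>

// Hypothetical stand-in mimicking the threading shape of the async classes.
class MiniAsync
{
public:
  MiniAsync()
  : node_(std::make_shared<rclcpp::Node>("mini_async")),
    executor_(rclcpp::ExecutorOptions(), 0)  // 0 => one thread per CPU core
  {
    executor_.add_node(node_);
    spinner_ = std::thread([this] { executor_.spin(); });
    // Avoid the cancel-before-spin race in this toy example.
    while (!executor_.is_spinning()) {
      std::this_thread::yield();
    }
  }
  ~MiniAsync()
  {
    executor_.cancel();  // the call the executor was found to miss
    spinner_.join();     // blocks forever if cancel() is ever missed
  }

private:
  rclcpp::Node::SharedPtr node_;
  rclcpp::executors::MultiThreadedExecutor executor_;
  std::thread spinner_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  // Spin instances up and down sequentially; before the fix, this kind of
  // loop could randomly block indefinitely.
  for (int i = 0; i < 100; ++i) {
    MiniAsync instance;
  }
  rclcpp::shutdown();
  return 0;
}
```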
Notes
There was an alternative solution that I explored (which also worked): using the timeout argument of the multi-threaded executor. This removes any chance of the multi-threaded executor deadlocking on waitables, but at the cost of free-spinning at the rate of the timeout.
If we encounter deadlock or threading issues down the line, we might have to use that solution instead; a sketch is below.
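A minimal sketch of that alternative, using rclcpp's MultiThreadedExecutor constructor (options, thread count, yield flag, wait timeout); the 10 ms value is purely illustrative:

```cpp
#include <chrono>
#include <rclcpp/rclcpp.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  // With a finite timeout, worker threads stop blocking indefinitely on the
  // wait set and wake at least every 10 ms, so a missed guard-condition
  // trigger can only stall a callback for one timeout period -- at the cost
  // of the executor free-spinning at that rate.
  rclcpp::executors::MultiThreadedExecutor executor(
    rclcpp::ExecutorOptions(),
    0,                               // 0 => one thread per CPU core
    false,                           // yield_before_execute
    std::chrono::milliseconds(10));  // wait timeout (illustrative value)
  auto node = std::make_shared<rclcpp::Node>("timeout_demo");
  executor.add_node(node);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}
```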
There is also a rare hang where a test waits forever for the initialized_ atomic to become true. I can't reproduce it though... Is there a chance for callbacks to get dropped??

Update
EDIT: I pulled out all the stops and used all the solutions I found, all at once. I do not think I understand the problem well enough to come up with an elegant solution, since for some reason all the "good practices" (like locking in threads that notify condition variables, or relying on condition-variable waits instead of sleeps) result in deadlocks or weird behavior, like atomic flags not getting set even when the block that sets them ostensibly runs (???).
The solution I've devised combines the use of condition variables and manual guard-condition triggers (just for initialize()!!), plus a lower-frequency executor wakeup; a sketch is below.

On the other hand, I'm able to spin up and down a massive number of async instances sequentially now, where before it'd randomly block indefinitely, so it might just be good enough (I updated the tests to check for that case specifically). I have yet to see it fail, but I am not confident that it will never fail, since I never understood why it failed in the first place.
Some refinement might be helpful though; maybe I'm just not seeing something obvious.
Pinging @svwilliams for visibility.