Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TimedConditionVariable hanging forever when now() > max_blocking_time #1000

Conversation

ivanpauno
Copy link

I'm targetting the PR to 1.9.x just because it's the branch where I was working on.
The same probably applies to master.

The original code did:

std::chrono::nanoseconds nsecs = max_blocking_time - std::chrono::steady_clock::now();

overflows when max_blocking_time has already passed.

I've carefully checked that that was the cause of the hang I was experimenting (I printed everything and checked values, it was overflowing).

Here backtraces of such a hung (thread 1 and thread 2 are the problem).

Extra comment:
The return type of wait_until when a predicate is not passed should be std::cv_status, and not bool (see here). I'm not fixing that here, and returning false when timeout happens (same was happening before as return value of pthread_cond_timedwait is not 0 when timout happens).

Bottom line: Why not just using std::condition_variable_any? Is because a compiler not supporting it? I see that that's actually being used in not linux systems.

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
@MiguelCompany
Copy link
Member

We already have #944 for this. It has a different approach, as it leaves the decission on how to timeout to the native pthread function.

Regarding why we have this, we found out that certain implementations of std::condition_variable_any have an std::mutex inside that is taken without timeout. This can lead to the call to wait_until to be non-deterministic.

@MiguelCompany
Copy link
Member

Regarding why we have this, we found out that certain implementations of std::condition_variable_any have an std::mutex inside that is taken without timeout. This can lead to the call to wait_until to be non-deterministic.

Now that I think about it, we should only use this when building with STRICT_REALTIME

@ivanpauno
Copy link
Author

ivanpauno commented Feb 4, 2020

Didn't see the other PR, which looks better. Can that be pushed?
This might be the cause of many hangs we have in ros2 CI (I can't confirm that, as the error I was trying to fix was other).

@ivanpauno
Copy link
Author

Closing this, as I finally realized my fix didn't solve the issue.
Returning when now > max_blocking_time without releasing the mutex is just wrong.

I also tried #944, it doesn't work because it's not including <condition_variable> when std::condition_variable_any is used. Correcting that, it worked perfectly for me.
But if I edit the code to use the custom implementation of TimedConditionVariable (with the #944 fix included), it hungs.
As far as I've tried, it seems that passing past point times is still a problem.

I don't have time for looking at the bug, and I'm happy if #944 gets merged as it solves the problem for ROS2 (we don't use STRICT_REALTIME). But just for let you know, the custom implementation of TimedConditionVariable still have problems.
I can provide detailed steps of how to reproduce the issue if you want. It's not easy to reproduce, as I'm using many custom ros2 branches and I have to run a test more than 100 times to make it hung.

When using std::condition_variable_any, it doesn't have, even if I run the problematic test 2000 times.

@ivanpauno ivanpauno closed this Feb 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants