-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore old signal handler after shutdown #353
Conversation
@dirk-thomas has clarified that it is appropriate for the demo nodes to call shutdown themselves instead of having it called from rclcpp's signal handler. it doesn't necessarily make sense for the interrupt handler in rclcpp to call shutdown, since maybe users want to spin after sigint in some case. ros2/system_tests#215 adds some tests |
even if it's not appropriate to call shutdown from the signal handler I still think it's appropriate to ignore interrupts, because the deadlock occurs as a consequence of the guard condition triggering that happens in both places. as this PR is right now, I'm going to factorise out the logic of (ingoring interrupts + triggering the guard condition) and call it from both the signal handler and |
ec03f5f factors the guard condition triggering logic out, and also changes from manually ignoring SIGINTS to just skipping responding to them if |
Why should a second SIGINT not notify the guard condition? I would expect a second signal to notify the condition again. Only in
|
Thanks for the clear example: I had the ideas of interrupt and shutdown conflated. This is because I thought that it was a double interrupt that was causing another deadlock to occur, since the first interrupt triggered some destruction. Looking closer, the example that I was double interrupting was exiting after the first interrupt (that's where the destruction was coming from), so it was just another instance of the same issue as in ros2/rmw_implementation#25. Just de-registering our signal handler should be sufficient to fix the deadlocks. I'll post back with CI to confirm |
ok finally got to 50 test passes without the other parameter flakiness issue interfering: the other flakiness issue is fixed in #356, so this job which includes commits from both branches was able to get the tests to pass 100 times in a row: conclusion is that ignoring sigints during shutdown is not necessary; restoring the state of the old signal handler is sufficient |
It would be good to separate the refatoring parts of the patch into one commit and the functional changes into a second commit. |
21c595a
to
d7b7d74
Compare
done; the refactoring in d7b7d74 is optional, we can leave it out if it's easier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (to be merged without squashing)
* QoS Profile Overrides - Player Signed-off-by: Anas Abou Allaban <aabouallaban@pm.me>
connects to ros2/rmw_implementation#25
This PR makes two main changes:
during shutdown, ignore sigints from interrupting the shutdown process. otherwise deadlocks described in Deadlock on sigint when multiple rmw impl's available rmw_implementation#25 can occurThis has been done to fix flaky tests caused by the deadlock described in ros2/rmw_implementation#25. For it to take effect, it has to be combined with a change such as ros2/demos@190d2f5 so the processes call shutdown.
If we were to instead call shutdown in the rclcpp signal handler, that change wouldn't be necessary. Is it appropriate to replace this block of code with a call to shutdown?I have tried to maintain the preference for sigaction where available, following existing code, but please keep in mind that there may be subtleties of signal handlers that I'm likely to overlook
Standard CI
repeating the list_paramters* tests (usually very flaky):
(passed 60 times, failed on 60th because of startup issue that I understand to be unrelated)