Lifecycle destructor calls shutdown while in shuttingdown intermediate state #2520
Comments
i think this makes sense. we cannot change the state when it is in transition state anyway.
lifecycle node com interface cannot be finalized yet, https://github.com/ros2/rcl/blob/f19067726b9bc2d0b90ac0d0f246d1d7821d41ee/rcl_lifecycle/src/rcl_lifecycle.c#L264 since |
Just posting what needs to be done after jazzy release.
|
Continuing from #2520 (comment), i took a quick look at the implementation.
this could be a different problem, related to destruction order. @oysstu can you provide a minimal example to reproduce the issue?
this is NOT because the node is in a transition state. this is because the context is already destroyed before the lifecycle node dtor runs. this is the same problem that #2522 (comment) reported. this can be observed with 1st we need to fix this, #2527 |
@oysstu it would be really appreciated if you can take a look at, |
I'm traveling for the next couple of weeks, but I'll try to follow up as best as I can. I noticed now that the rclcpp context is shut down by the signal handler, and not by me after the executor stops spinning. See the following output:
Shutdown is called from here: rclcpp/rclcpp/src/rclcpp/signal_handler.cpp Line 271 in 343b29b
|
Some possible solutions come to mind.
|
I believe that whether the context is valid or invalid, we should clean up the lifecycle node and finalize it to avoid leaving the devices in an unknown state, which is the original issue #997. adding a shutdown callback for the context would work, but if the user also adds a pre-shutdown callback on the context, the callbacks would not run in the expected order (we might call Lifecycle::shutdown before the user's cleanup callbacks). I think we can call the |
Assuming that the shutdown callbacks are called in the reverse order of registration, the callback registered in the constructor would be the last one to be called relative to the lifetime of the node (i.e., any references to the node would be obtained by the user after the constructor). If the shutdown callback handles the shutdown transition on rclcpp::shutdown, and the node destructor handles it otherwise, are not all cases covered? Assuming that the node destructor does nothing if the context/publisher is invalid.
Hmm, I don't know enough about the lifecycle QoS and its intended design to comment on this. |
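A minimal sketch of the ordering argument made in this comment, assuming (as the comment itself does) that shutdown callbacks run in reverse order of registration. `MiniContext` and `run_shutdown_order_demo` are hypothetical names used only for illustration; this is not `rclcpp::Context`, and the real callback order should be checked against rclcpp.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a context. NOT rclcpp::Context.
class MiniContext {
public:
  void add_shutdown_callback(std::function<void()> cb) {
    assert(cb != nullptr);  // an empty std::function would throw when invoked
    callbacks_.push_back(std::move(cb));
  }

  // Assumption taken from the comment above: callbacks are invoked in
  // reverse order of registration.
  void shutdown() {
    for (auto it = callbacks_.rbegin(); it != callbacks_.rend(); ++it) {
      (*it)();
    }
    callbacks_.clear();
  }

private:
  std::vector<std::function<void()>> callbacks_;
};

// Registers a callback "from the node constructor" first and a user callback
// afterwards, then returns the order in which they actually ran.
std::vector<std::string> run_shutdown_order_demo() {
  MiniContext context;
  std::vector<std::string> order;
  context.add_shutdown_callback([&order]() { order.push_back("node"); });
  context.add_shutdown_callback([&order]() { order.push_back("user"); });
  context.shutdown();
  return order;
}
```

Under that assumption, `run_shutdown_order_demo()` reports the user callback first and the node callback last, which is the property the comment relies on.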
this is true. i was thinking that user would do something like, have the pointers for nodes, and reset those pointers in context shutdown pre-shutdown hook to clean everything up since the context is shutting down. and then this maybe there could be more ideas, lets keep this open for the fix! thanks for iterating, i will try to bring this issue for some WG meeting. |
I've used a weak_ptr to refer to the node without extending its lifetime. This only works when the node is managed by a shared_ptr though. If the lifecycle interface was instead defined in a NodeLifecycleInterface class and contained in the node as a shared_ptr, this could be guaranteed (i.e., similar to the other interfaces such as NodeBaseInterface). I've also de-registered the context shutdown callback in the lifecycle destructor, so that also should avoid dangling callbacks. That is good if an application creates and deletes a bunch of nodes. This alone is not sufficient though, since there could be a race condition between the destructor and shutdown callback (which is why a shared_ptr/weak_ptr is needed). |
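The approach described in this comment can be sketched roughly as follows. All names here (`MiniContext`, `MiniLifecycleNode`, `create`) are hypothetical stand-ins, not rclcpp types. The sketch is single-threaded and `MiniContext` is not thread safe, so it only shows the ownership pattern: the callback holds a `weak_ptr`, and the destructor removes the callback again.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical context with removable shutdown callbacks. NOT rclcpp::Context.
class MiniContext {
public:
  using Handle = std::size_t;

  Handle add_shutdown_callback(std::function<void()> cb) {
    assert(cb != nullptr);
    const Handle handle = next_++;
    callbacks_.emplace(handle, std::move(cb));
    return handle;
  }

  void remove_shutdown_callback(Handle handle) { callbacks_.erase(handle); }

  std::size_t callback_count() const { return callbacks_.size(); }

  void shutdown() {
    // Take the callbacks out first so a callback may safely call remove.
    auto callbacks = std::move(callbacks_);
    callbacks_.clear();
    for (auto & entry : callbacks) {
      entry.second();
    }
  }

private:
  Handle next_ = 0;
  std::map<Handle, std::function<void()>> callbacks_;
};

// Hypothetical node. The context must outlive it because it is held by reference.
class MiniLifecycleNode {
public:
  static std::shared_ptr<MiniLifecycleNode> create(MiniContext & context) {
    std::shared_ptr<MiniLifecycleNode> node(new MiniLifecycleNode(context));
    std::weak_ptr<MiniLifecycleNode> weak = node;
    // The callback holds only a weak_ptr, so it never extends the node's lifetime.
    node->handle_ = context.add_shutdown_callback([weak]() {
      if (auto locked = weak.lock()) {
        locked->shutdown();
      }
    });
    return node;
  }

  // De-register so the context does not accumulate callbacks for dead nodes.
  ~MiniLifecycleNode() { context_.remove_shutdown_callback(handle_); }

  void shutdown() { finalized_ = true; }
  bool finalized() const { return finalized_; }

private:
  explicit MiniLifecycleNode(MiniContext & context) : context_(context) {}

  MiniContext & context_;
  MiniContext::Handle handle_ = 0;
  bool finalized_ = false;
};
```

The factory function is needed because a `weak_ptr` to the node can only be formed once a `shared_ptr` owns it, which matches the remark that this only works for nodes managed by a `shared_ptr`.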
@oysstu just fyi,
this original problem should be now completed as #2520 (comment) signal case can be discussed more here. (when the context is shutdown gracefully with deferred signal thread) |
As demonstrated in #2553, that issue cannot be addressed from the dtor because the subclasses' dtors are called before the base class's, and it's illegal to call methods after the dtor. So I think the proper way to address that would be to use the context's shutdown callback as suggested by @oysstu, which is what we have been using (and which is completely broken now with that change). |
@g-arjones thanks for the information and issue. in that case, what about when the context is still valid? can we just leave the device or sensor in an unknown state until the context is destroyed? |
I think that's the best that can be done by the base class (that was actually what the author of the feature request was referring to since he mentioned CTRL+C). If a more fine-grained control of the state is required then the applications must handle the transitions themselves. Calling a subclass method from a base class destructor is just not possible in C++ (which also explains #2554). |
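The C++ rule referred to here can be shown with a small standalone sketch. `BaseNode`, `DerivedNode`, and `destroy_derived_node` are hypothetical names and have nothing to do with the rclcpp classes. While a base class destructor runs, the derived part of the object is already destroyed, so a virtual call resolves to the base class version.

```cpp
#include <cassert>
#include <string>
#include <vector>

class BaseNode {
public:
  explicit BaseNode(std::vector<std::string> & log) : log_(log) {}

  virtual ~BaseNode() {
    // The DerivedNode part is already gone at this point, so this call
    // resolves to BaseNode::on_shutdown, never to the override.
    on_shutdown();
  }

  virtual void on_shutdown() { log_.push_back("BaseNode::on_shutdown"); }

protected:
  std::vector<std::string> & log_;
};

class DerivedNode : public BaseNode {
public:
  using BaseNode::BaseNode;

  ~DerivedNode() override { log_.push_back("DerivedNode::~DerivedNode"); }

  // Never reached from ~BaseNode().
  void on_shutdown() override { log_.push_back("DerivedNode::on_shutdown"); }
};

// Creates and destroys a DerivedNode, returning what was logged on the way out.
std::vector<std::string> destroy_derived_node() {
  std::vector<std::string> log;
  {
    DerivedNode node(log);
  }
  return log;
}
```

The returned log contains the derived destructor entry followed by `BaseNode::on_shutdown`; the derived override is never recorded.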
The C++ RAII paradigm would suggest that since the subclass initializes the resource, it is up to the subclass dtor to handle releasing the resource properly. On a more distributed level, we use a manager node to handle transitioning when not terminating through exceptions or CTRL+C (e.g. something like the nav2 lifecycle manager). For us, the nominal shutdown pattern used is deactivate-cleanup-shutdown unless one of the intermediate transitions fails, in which case the direct shutdown transition is used (e.g. active-shutdown). |
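The shutdown pattern described in this comment could look roughly like the sketch below. `MiniLifecycle`, `Transition`, and `graceful_shutdown` are hypothetical; this is not the rclcpp_lifecycle or nav2 API. As a simplifying assumption, a failed transition is modeled as leaving the state unchanged.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class State { Unconfigured, Inactive, Active, Finalized };
enum class Transition { None, Deactivate, Cleanup };

// Hypothetical managed node with an optional transition that always fails.
class MiniLifecycle {
public:
  explicit MiniLifecycle(State initial, Transition failing = Transition::None)
  : state_(initial), failing_(failing) {}

  State state() const { return state_; }

  bool deactivate() {
    if (state_ != State::Active || failing_ == Transition::Deactivate) return false;
    state_ = State::Inactive;
    return true;
  }

  bool cleanup() {
    if (state_ != State::Inactive || failing_ == Transition::Cleanup) return false;
    state_ = State::Unconfigured;
    return true;
  }

  // Direct shutdown is allowed from any state that is not yet finalized.
  bool shutdown() {
    if (state_ == State::Finalized) return false;
    state_ = State::Finalized;
    return true;
  }

private:
  State state_;
  Transition failing_;
};

// Nominal path: deactivate, cleanup, shutdown. If an intermediate transition
// fails, the remaining steps are skipped and shutdown is requested directly.
std::vector<std::string> graceful_shutdown(MiniLifecycle & node) {
  std::vector<std::string> applied;
  if (node.state() == State::Active && node.deactivate()) applied.push_back("deactivate");
  if (node.state() == State::Inactive && node.cleanup()) applied.push_back("cleanup");
  if (node.shutdown()) applied.push_back("shutdown");
  return applied;
}
```

With a node constructed as `MiniLifecycle(State::Active, Transition::Deactivate)`, only the direct shutdown is applied, which corresponds to the active-shutdown case mentioned above.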
Same here 👍 |
Just to expand on that, since it's the subclasses that are initializing/handling devices and sensor states they are the ones that should be responsible for ensuring the states are not unknown (in their dtors and/or |
@oysstu @g-arjones @gabrielfpacheco thanks for the discussion and opinions. all comments here make sense to me.
this is also true.
hmm, so that is user's responsibility, and base class never calls shutdown. i am not sure at this moment, i am okay to roll back everything but before that i can bring this up with other maintainers. |
yeah, sorry that this happened. i really appreciate your feedback and comments. |
No reason to apologize. Keep up the good work! 👍 |
@oysstu @g-arjones @gabrielfpacheco so i came to a conclusion: i will roll back. the reason is pretty much what we discussed here. we are creating another complication (hard to anticipate, and the user cannot avoid it, #2554) in order to address one issue (#2553 (comment)). until we can figure out more fine-grained control for user applications, we should roll back the behavior so that the user is responsible for lifecycle management. so at this moment, it is the user application's responsibility to shut down the device or sensor to avoid leaving it in an unknown state (either by calling shutdown in the sub-class or by using a context shutdown callback).
as a follow-up, I guess we can improve two things: a doc section stating that the user needs to call shutdown if necessary, and a warning message that checks the current state and, if the node is not shut down, warns the user. what do you think? any concerns and thoughts? again, thanks for pointing out my oversight and sharing! |
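A rough sketch of the division of responsibility proposed here, under the assumption that the base class only reports an unfinalized node and the subclass performs the shutdown itself. `BaseLifecycle`, `SensorNode`, `ForgetfulNode`, and `destroy_nodes` are hypothetical names, and a string log stands in for a real logger; this is not the rclcpp_lifecycle implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class State { Active, Finalized };

class BaseLifecycle {
public:
  explicit BaseLifecycle(std::vector<std::string> & log) : log_(log) {}

  virtual ~BaseLifecycle() {
    // The base class never transitions on its own. It only reports.
    if (state_ != State::Finalized) {
      log_.push_back("warning: destroyed without shutdown");
    }
  }

  void shutdown() {
    if (state_ != State::Finalized) {
      on_shutdown();
      state_ = State::Finalized;
    }
  }

protected:
  virtual void on_shutdown() {}
  std::vector<std::string> & log_;

private:
  State state_ = State::Active;
};

// The subclass owns the device, so its destructor requests the shutdown.
// It is marked final so no further-derived override can be skipped.
class SensorNode final : public BaseLifecycle {
public:
  using BaseLifecycle::BaseLifecycle;

  // SensorNode is still intact here, so on_shutdown dispatches to it.
  ~SensorNode() override { shutdown(); }

protected:
  void on_shutdown() override { log_.push_back("sensor released"); }
};

// A subclass that never calls shutdown. The base destructor warns about it.
class ForgetfulNode final : public BaseLifecycle {
public:
  using BaseLifecycle::BaseLifecycle;
};

std::vector<std::string> destroy_nodes() {
  std::vector<std::string> log;
  {
    SensorNode sensor(log);
  }
  {
    ForgetfulNode forgetful(log);
  }
  return log;
}
```

Destroying the two nodes in turn logs the sensor release first and the warning second, so the warning is produced only for the node whose subclass did not shut down.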
@fujitatomoya Thank you for the update 👍
Yeah, that's what makes the most sense to me too.
Definitely...
Yeah, I guess that's fine. I'm just not sure which loglevel that should go in since for many applications not finalizing a node before terminating the process (or destroying its instance) is perfectly fine and even common practice within many examples and tutorials. |
i guess |
I think this is the correct move right now. The context shutting down before the node is cleaned up points to either a programming/logic error or something exceptional happening. It might be best to leave it to the user to decide the actions to take.
As I mentioned, it's possible to do this if the lifecycle functionality was implemented in a base interface contained in a shared pointer in the lifecycle class. I believe implementing this would enable some generic programming by treating the lifecycle node like a regular node through its base interfaces, but then have the additional lifecycle interface for the specific lifecycle functionality. Right now, the lifecycle API is a weird mix of functional programming and inheritance. I think it would be better to have the lifecycle class intended for use with inheritance, and disallow the direct callback functions. If the lifecycle functionality was implemented in a base interface it could be added to any node if users prefer the functional style. |
I appreciate your effort for looking into this, @fujitatomoya.
It makes sense to me that the user is the one responsible for rightfully finalizing the node since this may be application-specific. Rolling back and improving documentation seems the right move for me as well. |
@oysstu @g-arjones @gabrielfpacheco thanks for the quick response. i will start reverting, i will try some doc and debug print follow-up after reverting. |
rollback PRs,
to close the following issues,
|
see #2520 (comment), all related PRs are rolled back and merged. i will go ahead and close this issue, feel free to reopen if i missed anything. |
@oysstu @g-arjones @gabrielfpacheco just FYI, this is closed. |
@fujitatomoya Thanks a lot! Any idea when the PR to rosdistro will be open? |
that is something i am not sure, you are talking about humble, right? CC: @audrow |
Exactly... |
@audrow @clalancette do we have any plan for a humble sync or patch? |
Currently it is user application responsibility to manage the all state control. See more details for #2520. Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
I'll be doing syncs every two weeks or so. (A sync is in progress right now.) As for a patch, I'm going to aim for the week of July 22nd. Let me know if I should do a patch sooner. |
@audrow thanks for sharing the plan. |
I would really appreciate if this could be released sooner. It's an important bug fix (kind of a critical one, at least for us). Is that feasible? |
This issue has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/new-packages-for-humble-hawksbill-2024-06-20/38277/2 |
It looks like rclcpp wasn't included, unfortunately. Still broken... |
@g-arjones i had a chat with @audrow , that was just a sync, so just community packages only. next patch is being scheduled in middle of July at this moment. |
I would have expected something like this to have a higher priority but I guess we will have to wait. Thank you for the notice! |
* LifecycleNode base class resource needs to be reset via dtor.
  Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
* add debug notice that prints LifecycleNode is not shutdown in dtor. Currently it is user application responsibility to manage the all state control. See more details for #2520.
  Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
* add test cases to call shutdown from each primary state.
  Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
* address review comments.
  Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
---------
Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Bug report
Required Info:
Steps to reproduce issue
Shutdown transition was added to the lifecycle node destructor in #2450 (iron backport: #2490). This currently triggers shutdown if the node state is anything other than finalized. Would it make sense for this to check whether the node is in one of the primary states before sending the shutdown transition? I've seen the following warning quite a bit when terminating a launched node using ctrl+c.
Not sure if it is related to the publisher being invalid or not, but the node is clearly in the shuttingdown state, and not one of the primary states.
Expected behavior
Only attempt transition in primary states unconfigured, inactive, and active.
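A minimal sketch of the check proposed here, assuming a local `State` enum that lists the primary and transition states by name. The enum and `should_request_shutdown` are hypothetical and do not reproduce the identifiers or numeric values used by rclcpp_lifecycle.

```cpp
#include <cassert>

// Hypothetical state list: four primary states followed by transition states.
enum class State {
  Unconfigured,
  Inactive,
  Active,
  Finalized,
  Configuring,
  CleaningUp,
  ShuttingDown,
  Activating,
  Deactivating,
  ErrorProcessing
};

// True only for the primary states from which a shutdown transition makes
// sense: unconfigured, inactive, and active. Finalized needs no shutdown,
// and intermediate (transition) states cannot accept a new transition.
constexpr bool should_request_shutdown(State state) {
  return state == State::Unconfigured ||
         state == State::Inactive ||
         state == State::Active;
}

// The case from this report: a node that is already shutting down.
static_assert(!should_request_shutdown(State::ShuttingDown),
              "no shutdown request while already shutting down");
```

A destructor guarded this way would skip the transition for the shuttingdown state seen in the warning quoted in this report.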
Actual behavior
Shutdown transition is attempted for intermediate transitions (e.g., shutting down).
Ref. the following destructor snippet
rclcpp/rclcpp_lifecycle/src/lifecycle_node.cpp
Lines 156 to 169 in 5f912eb