-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Figure out a better way to handle expiration of sessions on fabric deletion #9642
Comments
Specifically to #9571 (comment), isn't the easiest thing to do to send the response without reliability (or equivalently to terminate the tcp connection immediately after sending the response)? Why does the exchange need to live on in the first place? So this is similar to option 2, but instead of not sending the response, we do send it but also tear down everything immediately after sending it. |
In fact, now that I'm thinking about it, isn't number two as written (send no response) already what's in the spec? It makes sense. How can you continue to communicate after the fabric is removed? You can't, or at least shouldn't be able to. |
This sounds like my option 3? Right now this does not work, apparently; I was not told what exactly does not work about it. |
Ah I see, yes that sounds like 3. And I guess the spec isn't totally clear on this:
This implies that no response (option 2) is a possible way to handle this. I'm not entirely sure that the spec is requiring that though. Sending a response and then immediately tearing down the session (option 3) seems much nicer ergonomically if you care about this operation being at all reliable. Else, you would be forced to infer success by trying to establish a new session and verifying you receive a no-trusted-root error response. |
Yes, the spec allows but does not require sending a response. Yes, sending a response would be nice. Option 3 would be good if we can figure out what's not working about that approach right now. |
So the specific things that don't work with option 3 are:
Why do we crash? Because we can't send any messages on the exchange after we shut down the session. So when we go to close the exchange we can't send the remaining standalone ack, never reach the
|
Note that not sending a response would not help with the above. |
Conceptually, I would have expected this to be simple. Omitting protocol matching, we have a stack that could look something like this:
What about the structure of our code prevents such an approach? Is fabric deletion async? Why can't we propagate the close-session action up the stack with function return values? |
Alternatively, why not just invoke a Post method to asynchronously handle clean-up of the accessing session? That way, in the code flow that is handling the message on the exchange, it can safely send a response back on that session (immediately though), and then in a async code path due to the post, actually clean-up the session. |
@msandstedt Sorry for the lag. We could push the session expiration further down into the networking stack so it doesn't actually get expired until after we have done the message send. It's all doable; it would just need to change the API contracts all up the callstack to until we get out of the exchange message handling.... @mrjerryjohns We want to clean up the session ASAP, not async, so no more messages will come in on it after fabric deletion. Or at least that's my impression based on why #9571 happened. As in, once we have removed the fabric, things on that fabric should not be able to change the node's state anymore. |
Now that #19328 is fixed, we should be able to do this much more sanely. |
This prevents establishment of new sessions for the fabric, and shuts down other communication on the sessions that do exist, which our old setup did not use to do. Fixes project-chip#9642
This prevents establishment of new sessions for the fabric, and shuts down other communication on the sessions that do exist, which our old setup did not use to do. Fixes #9642
Problem
#9571 introduces code that waits for an exchange to close by overriding the exchange delegate. This is fragile and we should have a better solution.
Proposed Solution
Some options:
I personally thing option 1 is too overdesigned for what we want here. Option 4 is pretty simple, but scatters the logic for the "deleting current fabric" and "deleting other fabric" case around a bit, unless we make CommandHandler expire sessions for whatever fabric is deleted, not the accessing fabric. Options 2 and 3 both mean the command sender will not get a response, which the spec allows but might not be ideal.
Thoughts?
@pan-apple @msandstedt @mrjerryjohns
The text was updated successfully, but these errors were encountered: