-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure to call connection callbacks in synchronous failure #16509
Conversation
If OperationalDeviceProxy::Connect fails synchronously it does not call pending failure callbacks except for the argument to the current call. This leads to unanswered connection requests as FindOrEstablishSession immediately calls ReleaseSession() after such a failure, which deletes the proxy.
PR #16509: Size comparison from a72ab9b to 1d0d4e0 Increases (16 builds for cyw30739, efr32, esp32, k32w, linux, nrfconnect, p6, telink)
Full report (27 builds for cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
|
@@ -98,6 +98,10 @@ CHIP_ERROR OperationalDeviceProxy::Connect(Callback::Callback<OnDeviceConnected> | |||
if (err != CHIP_NO_ERROR && onFailure != nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if onFailure is null we won't call the other callbacks?
@@ -98,6 +98,10 @@ CHIP_ERROR OperationalDeviceProxy::Connect(Callback::Callback<OnDeviceConnected> | |||
if (err != CHIP_NO_ERROR && onFailure != nullptr) | |||
{ | |||
onFailure->mCall(onFailure->mContext, mPeerId, err); | |||
|
|||
DequeueConnectionSuccessCallbacks(/* executeCallback */ false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So trying to understand the various cases here.
If we enter this function when state is State::Uninitialized
will we have existing callbacks? I'd think not...
If we enter this function when state is State::NeedsAddress
it actually looks like the existing logic (with or without this change) is buggy. We'll call LookupPeerAddress
, enqueue the new callbacks, and then if the LookupPeerAddress
call failed call the error callback but not dequeue it... which means we will later call it again.
If we enter this function when state is State::Initialized
we will try to EstablishConnection
... is this the part that was observed to sync-fail? But would we ever be in the Initialized
state with pending callbacks? I guess we could get here if we had an earlier UpdateDeviceData
call after DNS-SD resolve where EstablishConnection
failed, OnSessionEstablishmentError
was called but is a no-op (which it is), and we were left in the Initialized state with dangling callbacks? But then the right answer would seem to be to deque and call the callbacks at that point, so they're not left sitting there until someone else tries to connect and we sync-fail.
So overall, I think we should:
- Fix OperationalDeviceProxy::UpdateDeviceData to handle the sync-failure from
EstablishConnection
there correctly. Either by calling callbacks, or by callingConnect(nullptr, nullptr)
after moving to the new state. And nix the bogusOnSessionEstablishmentError
call (and probably the inheritance fromSessionEstablishmentDelegate
. - Fix this code to be consistent in its error handling. Probably most simply by ensuring that all branches of the swich enqueue the callbacks and set
err
as needed and after the switch iferr
is failure we invoke all of our queued callbacks. That would still cover up bugs like the one item 1 is fixing, but hopefully we won't have any at that point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree this needs more work, this was mostly intended as a fire extinguishing exercise as this particular case was reliably causing issues.
However other changes have mitigated that particular issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes proposed above were made in #17266.
|
||
DequeueConnectionSuccessCallbacks(/* executeCallback */ false); | ||
DequeueConnectionFailureCallbacks(err, /* executeCallback */ true); | ||
CloseCASESession(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really live inside OperationalDeviceProxy::EstablishConnection
in the case when EstablishSession
returns error, because there it's clear we need to clean up mCASEClient.... Putting it here is a little odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#17466 addresses this.
Closing this. Unfortunately, I don't have the cycles to own a proper cleanup pass of this code. |
Problem
If OperationalDeviceProxy::Connect fails synchronously it does not call
pending failure callbacks except for the argument to the current call.
This leads to unanswered connection requests as FindOrEstablishSession
immediately calls ReleaseSession() after such a failure, which deletes
the proxy.
Change overview
Make sure to call pending callbacks on synchronous failure.
Testing
Connect to an non-routable address, which fails synchronously in sendmsg()
in the resolve callback. Application is still notified of failure.