[BUG] Failed to reach the CommissioningStageComplete if network is unreachable during commissioning progress #30041
Comments
The commissioning state machine, when handling a state, needs to either queue up an async process that will eventually push the state machine along, or fail out and stop the commissioning process. We had a number of state handlers that could do neither if the attempt to send the message failed, which would leave the commissioner stuck: it thought it was waiting for the async work, while the async work had never been queued. This change adds the relevant error checks to ensure we never wait for work that has not started. Fixes project-chip#30041
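The invariant described in that commit message can be illustrated with a minimal, self-contained sketch. This is not the SDK's code: `ToyCommissioner`, `Status`, `Stage`, `PerformStep` and `StageComplete` are hypothetical names chosen for illustration, and the injected `SendFn` stands in for whatever actually sends the message. The point is only that a stage handler leaves the commissioner either waiting on queued work or holding a result, never neither.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-ins for illustration; none of these names come from the SDK.
enum class Status { kOk, kSendFailed };
enum class Stage { kArmFailSafe, kConfigureNetwork, kCleanup };

class ToyCommissioner
{
public:
    // Injected transport: returns kOk only if the message was actually queued.
    using SendFn = std::function<Status(Stage)>;

    explicit ToyCommissioner(SendFn send) : mSend(std::move(send)) {}

    // Handle one stage of the state machine.
    void PerformStep(Stage stage)
    {
        mWaiting = false;
        mResult.reset();

        const Status err = mSend(stage);
        if (err == Status::kOk)
        {
            // Async work is queued; its response path calls StageComplete() later.
            mWaiting = true;
        }
        else
        {
            // Nothing was queued, so finish the stage ourselves instead of waiting.
            StageComplete(err);
        }

        // Invariant: exactly one of "waiting for async work" or "have a result" holds.
        assert(mWaiting != mResult.has_value());
    }

    // Called by the async response path, or directly when the send fails.
    void StageComplete(Status result)
    {
        mWaiting = false;
        mResult  = result;
    }

    bool IsWaiting() const { return mWaiting; }
    const std::optional<Status> & Result() const { return mResult; }

private:
    SendFn mSend;
    bool mWaiting = false;
    std::optional<Status> mResult;
};
```

Constructing a `ToyCommissioner` with a send function that always fails and calling `PerformStep` leaves `IsWaiting()` false and `Result()` set to the failure, which is the "fail out" branch. The stuck state reported in this issue corresponds to the case where the `else` branch is missing.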
Sure, I am willing to verify that. I can do the verification once I am back in the office on Monday, @bzbarsky-apple. I would also like to ask about ArmFailSafe, since I don't quite understand the mechanism. Why can't the controller just delete the commissionee and return the commissioner to its initial state?
@lion6420 Yes, nothing here is urgent enough to be worth worrying about over the weekend!
It can, in terms of its own internal state. Clearing the fail-safe is about putting the commissionee device (if it's reachable) into a state where the user can try again immediately (perhaps with a different wireless network, if the issue was that the device could not join the IP network) instead of having to wait for the fail-safe to time out.
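A rough sketch of that cleanup logic, under stated assumptions: `ToyCleanup`, `SendStatus`, `sendArmFailSafe` and the other names are hypothetical and are not the SDK's API. The sketch also assumes that the fail-safe is cleared by sending ArmFailSafe with an expiry of 0 seconds. The commissioner always resets its own state; the extra message only saves the user from waiting for the device's timer when the device can still be reached.

```cpp
#include <cassert>
#include <functional>

// Hypothetical names for illustration only; not taken from the SDK.
enum class SendStatus { kQueued, kFailed };

struct ToyCleanup
{
    // Hypothetical hook that asks the device to arm its fail-safe for the given
    // number of seconds. Assumption: an expiry of 0 clears the fail-safe now.
    std::function<SendStatus(unsigned expirySeconds)> sendArmFailSafe;

    bool commissionerReset = false; // commissioner is back in its initial state
    bool waitingForDevice  = false; // a fail-safe clear request is in flight

    void Cleanup(bool deviceReachable)
    {
        assert(sendArmFailSafe != nullptr);
        waitingForDevice = false;

        if (deviceReachable && sendArmFailSafe(0) == SendStatus::kQueued)
        {
            // The request is on its way; finish in OnFailSafeCleared().
            waitingForDevice = true;
            return;
        }

        // Device unreachable or the send failed: do not wait. The device will
        // recover on its own once its fail-safe timer expires.
        ResetLocalState();
    }

    // Response path for the fail-safe clear request.
    void OnFailSafeCleared()
    {
        waitingForDevice = false;
        ResetLocalState();
    }

    void ResetLocalState() { commissionerReset = true; }
};
```

In this sketch a failed send during cleanup still ends in `ResetLocalState()`, so the commissioner cannot be left waiting for a response that will never arrive.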
@bzbarsky-apple Thanks for your reply, have a nice weekend!
Hi @bzbarsky-apple, controller log: HandleSendingErrorsDuringCommissioing.txt
@lion6420 Thank you for checking that!
Reproduction steps
We use our own Matter controller and are testing its error handling. During the following step, the chip-all-cluster-app is therefore shut down and the network is cut off on purpose.
Commissioning without the network cut off: Log without network cutdown.txt
Commissioning with the network cut off: LogNetworkCutdownCase.txt
Is it because the command to disarm the fail-safe (DisArmSafe) failed in DeviceCommissioner::CleanupCommissioning and that failure was not handled?
Bug prevalence
Whenever I do this
GitHub hash of the SDK that was being used
e059202
Platform
other
Platform Version(s)
No response
Anything else?
No response