BLE client "DISCONNECTING" state is terminal and leaks btproxy connections #6701
Comments
I can also reproduce this with 5 eq3 thermostats that need a short active connection to update the status.
|
Got a few stuck disconnecting. Noticed this while adding some more logging in Bluetooth-Devices/habluetooth#154 and there were fewer paths to the device than expected |
Looks like there is a race if we call disconnect while connecting |
Fix can be tested with 2025.2.x by adding the following to your yaml and rebuilding |
Hey there @jesserockz, mind taking a look at this issue as it has been labeled with an integration ( |
Great, flashed that on 2 out of 4 btproxies, will report how it goes. Thanks |
Update after ~24h run: I have not observed a substantial improvement in my situation. Running esphome 2025.2.0, with the snippet from #6701 (comment) which resolved to git rev 370c87536d0782426df005c8978886296cf781e1. Average uptime (before auto-restart due to no scan within 15-min interval) was about 6 hours. Looks like the culprit is still there in "Disconnecting before connected" logs. Log samples:
|
I find the journey of D8:71:4D:96:CB:1B (an Airthings Wave+ device) in btproxy-c25b74 particularly interesting. After failing connect on Similarly 80:6F:B0:59:2A:52 (another Airthings Wave+) in btproxy-c257a8. Could it be a race condition where the "disconnect" event has not been propagated properly and esphome went for a reconnect, so that when "disconnect" eventually gets to HA, HA issues a manual disconnect that this time affects "connection in-progress" slot #1 (addressed by peer MAC)? Without much knowledge of btproxy details, I am only speculating at this point. There's also a theme:
reason 256=0x100=ESP_GATT_CONN_CONN_CANCEL, reason 2 I couldn't find from the docs (source). |
Looks like we cannot change the state to disconnecting while we are connecting, at least not until the connection callback arrives: the scanner will start another connection attempt as soon as we change the state to disconnecting, which means we have two attempts in flight, and that is when the failure happens. So we need some other flag on the connection that does not change the state, to signal that we want to disconnect as soon as we can when a disconnect is requested while already connecting. It means we can never go directly from connecting to disconnecting, and when we set the state to idle we must clear the want_disconnect flag, as well as check it every time the state changes.... messy |
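For anyone following along, here is a rough sketch of the deferred-disconnect idea described above. It is only an illustration under assumptions: the class, the method names and the callbacks are hypothetical stand-ins, not ESPHome's actual code, and only the want_disconnect flag name comes from the comment above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of one BLE client connection.
// Names are illustrative and do not mirror ESPHome's real classes.
enum class ClientState : uint8_t { IDLE, CONNECTING, CONNECTED, DISCONNECTING };

class ConnectionSketch {
 public:
  ClientState state() const { return state_; }
  bool want_disconnect() const { return want_disconnect_; }

  void connect() {
    if (state_ == ClientState::IDLE)
      set_state(ClientState::CONNECTING);
  }

  // While CONNECTING, only record the intent. The state stays CONNECTING,
  // so nothing observing the state would treat the slot as reusable.
  void disconnect() {
    switch (state_) {
      case ClientState::CONNECTING:
        want_disconnect_ = true;
        break;
      case ClientState::CONNECTED:
        set_state(ClientState::DISCONNECTING);
        break;
      default:
        break;  // IDLE or already DISCONNECTING: nothing to do
    }
  }

  // Stand-in for the connection callback (success or failure).
  void on_connect_result(bool ok) {
    if (state_ != ClientState::CONNECTING)
      return;
    set_state(ok ? ClientState::CONNECTED : ClientState::IDLE);
  }

  // Stand-in for the "link closed" callback.
  void on_disconnected() { set_state(ClientState::IDLE); }

 private:
  void set_state(ClientState next) {
    // Invariant from the discussion: never CONNECTING -> DISCONNECTING directly.
    assert(!(state_ == ClientState::CONNECTING && next == ClientState::DISCONNECTING));
    state_ = next;
    if (state_ == ClientState::IDLE) {
      want_disconnect_ = false;  // clear the flag whenever we return to IDLE
    } else if (state_ == ClientState::CONNECTED && want_disconnect_) {
      state_ = ClientState::DISCONNECTING;  // honor the deferred request
    }
  }

  ClientState state_{ClientState::IDLE};
  bool want_disconnect_{false};
};
```

In this sketch the flag is checked in the single place where the state changes, which keeps the "check it every time the state changes" rule from spreading across callers.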
@dotdoom I updated the PR for the finding above, can you try again? Please note that |
@bdraco on it. Fetched both components externally, 2 out of 4 proxies are running patched to 3f47f72. |
I'm flying most of the day, but should still have wifi if it works in flight so I should be able to do another turn if needed. Feel free to reach out on discord (same handle) when you have the latest results. |
I pushed a change with a bit more logging (not a functional change so no need to update again if everything is going well) |
@bdraco thank you, the version from 3f47f72 is the most reliable so far. Getting to 24h uptime now (the previous, albeit rare, record being 48h, so I will let it bake a tad more). Observations are positive:
I kept the 2 proxies at 3f47f72 untouched to keep them running for more, and flashed the latest from esphome/esphome#8297 to the other two. Do you have interest in any specific logs at all? |
Thanks for the update. I'm glad it's going well so far.
Only if there are new errors. I have it running on 12 of mine and so far so good |
Alright, on the uptime graph it's a clear win, and no immediately noticeable difference in BLE device reachability or measurement latency. Thank you, @bdraco! |
Great news. We will get this shipped. Thanks! |
The problem
It looks like disconnect() may mark a Bluetooth connection as DISCONNECTING if some conditions are not satisfied. DISCONNECTING is a terminal state in this FSM, so the connection becomes unusable.
Relevant code:
https://github.com/esphome/esphome/blob/b454f63b3604d766abb038fe3c0f79dc20ab5cad/esphome/components/esp32_ble_client/ble_client_base.cpp#L125-L142
As a result, the connection is never released. Repeat this twice, and the ESP32 is out of connection slots and Bluetooth is dead on that proxy.
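To illustrate the leak, here is a toy model of a connection pool. It is a sketch under assumptions, not the esp32_ble_client code: the pool size of 2, the type names and the function names are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class SlotState { IDLE, CONNECTING, CONNECTED, DISCONNECTING };

// Hypothetical pool size; the real number of slots depends on configuration.
constexpr std::size_t kSlots = 2;

struct SlotPool {
  // Value-initialized, so every slot starts as IDLE (the first enumerator).
  std::array<SlotState, kSlots> slots{};

  // Returns the index of a free slot, or -1 if none is IDLE.
  int acquire() {
    for (std::size_t i = 0; i < slots.size(); i++) {
      if (slots[i] == SlotState::IDLE) {
        slots[i] = SlotState::CONNECTING;
        return static_cast<int>(i);
      }
    }
    return -1;
  }

  // Models the reported failure: the slot is parked in DISCONNECTING
  // and no later event ever moves it back to IDLE.
  void disconnect_without_completion(std::size_t i) {
    assert(i < slots.size());
    slots[i] = SlotState::DISCONNECTING;
  }
};
```

Once every slot has been parked this way, acquire() keeps returning -1, which is roughly how a proxy would look when it stops accepting new connections.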
Which version of ESPHome has the issue?
2024.12.2
What type of installation are you using?
pip
Which version of Home Assistant has the issue?
2025.1.3
What platform are you using?
ESP32-IDF
Board
Olimex PoE
Component causing the issue
bluetooth_proxy
YAML Config
Anything in the logs that might be useful for us?
Additional information
The problem was discovered following my post:
https://community.home-assistant.io/t/bt-proxy-btproxy-suddenly-stops-discovering-debugging/832665
Looks like related problems have been seen before:
https://community.home-assistant.io/t/esphome-bluetooth-ble-proxy-stops-working/649181/14