Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Device timesout after a getting data for a short time #202

Open
DurgNomis-drol opened this issue Apr 13, 2021 · 90 comments
Open

Device timesout after a getting data for a short time #202

DurgNomis-drol opened this issue Apr 13, 2021 · 90 comments
Labels
bug Something isn't working

Comments

@DurgNomis-drol
Copy link
Collaborator

Describe the bug

Hey guys.

I have for the last week or so been having a weird problem with my Google Nest Mini. Our integration works great, but after a couple of minutes i get this debug message (See logs below) which i expect.

I can still talk to the device, but in the Google Home app i cannot access my device settings for my device. My first thought was that something had gone wrong with the device. I tried reseting it multiple times. When i disable or stop our integration it seems to work again, but as soon as i activated our integration again, it stops working and is timing out again. 😞 A power cycle also makes it work for a couple of minutes, and then it stops again.

What do you guys think is happening?

I thought maybe we are hammering it to hard, but then again the would be weird if i am the only one having this problem.

Version of the integration

If you are unsure about the version check the github release here.

1.4.2

Logs

2021-04-13 19:54:40 DEBUG (MainThread) [custom_components.google_home] Fetching data from Google Home device Køkken - https://192.168.0.205:8443/setup/assistant/alarms
2021-04-13 19:54:42 DEBUG (MainThread) [custom_components.google_home] Køkken device timed out while trying to get alarms and timers.
2021-04-13 19:54:42 DEBUG (MainThread) [custom_components.google_home] Getting Do Not Disturb setting from Google Home device Køkken
2021-04-13 19:54:42 DEBUG (MainThread) [custom_components.google_home] Requesting endpoint setup/assistant/notifications for Google Home device Køkken - https://192.168.0.205:8443/setup/assistant/notifications
2021-04-13 19:54:44 DEBUG (MainThread) [custom_components.google_home] Køkken device timed out while trying to post data to it - Raw data: None
2021-04-13 19:54:44 DEBUG (MainThread) [custom_components.google_home] Finished fetching sensor data in 4.009 seconds
@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

I had something similar with Nest Hub Mini yesterday but then it remediated itself. Seems to work since then.

@leikoilja
Copy link
Owner

I did have the same behavior as well a few times. Thou currently my RPi has just stopped working with constant "Unable to connect to Home Assistant" since morning. Nothing seems to be helping, so i ll just focus on fixing it first 😅

@DurgNomis-drol
Copy link
Collaborator Author

DurgNomis-drol commented Apr 13, 2021

Mine have just gotten to a point now where i cannot get it to work at all and i cannot figure out why or how it is caused, since i am the only one having it, it should be something unique to my setup, but I don't have a fancy network setup, just a tp-link wireless router and nothing else.

How is your devices connected to the network (2.4G, 5G or ethernet)?

I also have a problem with my RPI locking up after 1 day or 2. 😭

@DurgNomis-drol
Copy link
Collaborator Author

I think i have found out what is happening.

When I spam an endpoint call like this just after a restart with Google Home integration disable curl -X GET https://192.168.0.205:8443/setup/eureka_info -k. It crash's after about 20-50 calls in a row. If I then stop and wait sometime, it still won't respond with new API calls.

So my conclusion is that the webserver that Google uses on there devices either have memory leak or is very badly build. Or that it is limiting/refusing request after a set number received.

What do you guys think?

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

I just did 500 requests to alarms endpoint in 53s (9.4 req/s) to Google Home Mini, it responded to all of them.
Another 100 requests to eureka_info finished in 10.4s also just fine.

@DurgNomis-drol
Copy link
Collaborator Author

DurgNomis-drol commented Apr 13, 2021

Strange, i may have a faulty one then 😞 Can you paste your script here so i can try to see if it replicates it?

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

Well, script 😅

for i in {1..100}; do echo "Request $i"; curl -H "cast-local-authorization-token: <token>" --insecure https://192.168.1.2:8443/setup/assistant/alarms; done

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

Btw, I think I broke it. I cancelled it with Ctrl-C and it no longer responds. Maybe unfinished requests block it? But power cycle fixed it.

I was able to reproduce it. Sometimes after Ctrl-C it becomes very slow. It actually responds but only after ~30 seconds.

@DurgNomis-drol
Copy link
Collaborator Author

I just tried doing a 1000 request, it worked fine until i opened the Google Home App and the it stopped. I then did Ctrl +C and tried to run it again and it timed out. But it seems to solve it self after sometime(as you suggest), but for me it is much more then 30 seconds. 🤔 I don't use the Google Home normally, only to delete alarms or timers, until i have implemented it.

Do you think it does not like parralle requests or?

I don't remember how much RAM these guys have, but could it be that some of them does not have enough ram to handle being polled often?

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

I mean in my case it responds for 30 seconds on a single request. Now it took almost 2 minutes.

I don't think it's a RAM size. It's probably shitty webserver implementation. It hangs on TLS handshake (client hello request).

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

I think we can detect this and reboot the device using api call with disabled timeout. In my case it worked. It rebooted with no sound, shouldn't be annoying.

@DurgNomis-drol
Copy link
Collaborator Author

How can you see that it hangs on TLS handshake? 🤔

That does not surprise me. 🙄

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

Run curl with --verbose.

@DurgNomis-drol
Copy link
Collaborator Author

It seem to switch from TLS 1.3 to 1.2 after server hello for some reason? I don't know much about this, but could that be anything?

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

I think it doesn't support 1.3. In my case it uses 1.2 and forcing 1.3 causes an error.

@DurgNomis-drol
Copy link
Collaborator Author

Right. I just did a little bit of research and it seems to be that there is a known bug in some versions of openssl that will cause the handshake to just hang forever. And if we are (un)lucky that is exactly that version they are using. 😭

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

Do you have a link to that issue?

@DurgNomis-drol
Copy link
Collaborator Author

Stackoverflow talking about it, see also comment under the one I have linked. This is not directly about openssl but why it is happening.

This is long manifest about it, i haven read it all. But I think it hits spot on with our problem.

@DurgNomis-drol
Copy link
Collaborator Author

This one seems to also be about it. Directly on openssl github

@DurgNomis-drol
Copy link
Collaborator Author

I realised that the first one is not that important. 🤔

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

I tried using different ciphers and downgrading to TLS 1.1 but it still becomes extremely slow after few cancelled requests. It seems rebooting is the only way to remediate this.

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

List of ciphers Google Home Mini supports:

PORT     STATE SERVICE
8443/tcp open  https-alt
| ssl-enum-ciphers:
|   TLSv1.0:
|     ciphers:
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
|     compressors:
|       NULL
|     cipher preference: client
|   TLSv1.1:
|     ciphers:
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
|     compressors:
|       NULL
|     cipher preference: client
|   TLSv1.2:
|     ciphers:
|       TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (secp256r1) - A
|       TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 (secp256r1) - A
|     compressors:
|       NULL
|     cipher preference: client
|_  least strength: A

@DurgNomis-drol
Copy link
Collaborator Author

DurgNomis-drol commented Apr 13, 2021

As far as i understand it is because of this line TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (secp256r1) - A we are having problems, it expect a SHA1 128 but gets SHA2 256 or am i misunderstanding something?

And yes that is probably the workaround for now that we should implement or maybe I should test run for sometime and see if it fixes it for me.

@KapJI
Copy link
Collaborator

KapJI commented Apr 13, 2021

No, it announces which ciphers it supports during the handshake and then curl uses what it expects. It hangs before this actually.

@KapJI
Copy link
Collaborator

KapJI commented Apr 14, 2021

@DurgNomis-drol I was experimenting with this, you can try it: https://github.com/KapJI/ha-google-home/tree/reboot-timedout

@KapJI KapJI added the bug Something isn't working label Apr 14, 2021
@leikoilja
Copy link
Owner

oh flip, you guys have some interesting talks here... :)
The problem is real, but we can't just start restarting google home devices :/ it can bring to a bad user experience very quickly :/ How about decreasing the API call frequency and looking at making those calls sync instead of async as @DurgNomis-drol proposed earlier? 🤔

@KapJI
Copy link
Collaborator

KapJI commented Apr 14, 2021

We don't make parallel calls to the same device. Decreasing frequency won't solve this problem and will make UX worse for everyone.

@leikoilja
Copy link
Owner

leikoilja commented Apr 14, 2021

Right, just double checked, those are separate devices we call in parallel, with all sync calls to the same device - that's good.

A weird observation but i did notice sometime the devices dont even appear in HA after HA is reset for the same reason of timeout. Maybe device force restart at HA initialisation can help to "sync them all up"? 😅 haha, not sure how that would work though.

I'm just afraid of us calling restart service on timeouts so that won't end up being restarting the devices every hours or so..

I'm also wondering, have we not started receiving timeouts just recently? 🤔 Don't remember us having timeouts in a old releases.

@KapJI
Copy link
Collaborator

KapJI commented Apr 14, 2021

Just an idea but maybe if restart happens during the request, it will cause this issue similar to ctrl-c in curl.

We started to see this after 2021.4.0 release but some people reported this issue on previous release as well, so that's probably not related.

@KapJI
Copy link
Collaborator

KapJI commented Jun 28, 2021

I think making update interval configuration via config flow is reasonable, so people experiencing this issue can adjust it to their needs.

I personally have this issue very rarely, e.g. once in few weeks which I think correlated to HA updates in my case.

@Raul-7-7
Copy link

@leikoilja Not a problem! I have a few units, anytime you guys needs testing, would be more than happy to help out.

@KapJI Great idea! I am currently on update 180 (3min) and it has 100% resolved my issue, not once I had to reboot my devices, I did restart HA 4-5 times, no issue, it is working.

@DurgNomis-drol maybe you can try changing it to 30 seconds? see how that works, just curious, what is your usecase that you need the update frequency be so low?

Cheers,

@Drakulix
Copy link
Collaborator

@DurgNomis-drol maybe you can try changing it to 30 seconds? see how that works, just curious, what is your usecase that you need the update frequency be so low

Well if you are running automations based on setting alarms/timers you generally want quick response times. If your automation triggers on a ringing alarm a round time of half a minute, could miss the event (if the user stops the alarm to quickly). And three minutes might even be very late, e.g. if you set a timer for a pizza in the oven and use home-assistant for additional notifications.

@DurgNomis-drol
Copy link
Collaborator Author

I use it as a kitchen timer, eg. short duration timers. I think 30 seconds may be the upper limit for me. I will test it now and report back in a couple of days 😃

@DurgNomis-drol
Copy link
Collaborator Author

Nope, unfortunately for me it does not work at all. 😭 I also tested 60 and 120 secs and it gives me the same result.

@Raul-7-7
Copy link

Raul-7-7 commented Jul 5, 2021

@DurgNomis-drol that sucks! mine is over a week now, going no issue with 3 min, I know 3 min wont work for you, but can you try it for a day? curious if thats the magic number!

@DurgNomis-drol
Copy link
Collaborator Author

DurgNomis-drol commented Jul 7, 2021

@Raul-7-7 Yeah, 3 mins seems to be the magic number. It is now stable for me too, though too slow for me to use.

@KapJI
Copy link
Collaborator

KapJI commented Jul 22, 2021

I noticed that in my case the same device was queried simultaneously as the device itself and as the main device in a group when querying groups. Can it be the same for you? You can also try to replace asyncio.gather in api.py with sequential queries and see if it helps.

@DurgNomis-drol
Copy link
Collaborator Author

@KapJI Tried that a long time ago. Didn't change anything.

@Raul-7-7
Copy link

Raul-7-7 commented Aug 1, 2021

@KapJI @DurgNomis-drol I been doing more testing past couple of weeks, seem to notice some consistency with this Scenario even at 3 min update Internal.
Setup: Google Home Mini is located in Bedroom & Google Home located in Living room.
I have no other device in Bedroom, however, I have ChromeCast with Google TV as well in Living room.

Testing Scenarios:

  • Update = 20 seconds
    • both devices eventually get to unavailable state in about 4-5 hours
  • Update = 180 seconds (3min)
    • Google home Mini in the bedroom rocksolid, not once it has gone offline.
    • Google home in the living room goes offline after 24-30 hours, and even if I unplug it, wait 10 seconds plug back, its available for 2-3 hours and then again offline.
    • This is where it gets interesting, I noticed in my Google Home app on my phone, I lose the "CONFIG" gear for the ChromeCast with Google TV, I reboot that device, then reboot Google Home and boom, they both stay online and Google Home is available and only I notice it become Unavailable when ChromeCast with Google TV also loses the Config Gear!

Is it possible that we ignore the update interval when another device in the same room becomes unavailable? and keep hammering it?

Anyway, I did this more as a test, for me it doesn't matter, I only use this integration for setting alarms, and I always set my alarms on the bedroom one which is RockSolid. However, I be happy to do more scenarios if you think it helps solve the issue for community.
Cheers,

@DurgNomis-drol
Copy link
Collaborator Author

@Raul-7-7 Thanks for testing. But i still believe the problem is coursed by a bad implementation of the webserver on Google's end. It does not handle timeout or parallel requests very well.

The things is with these products, they are constructed as cheap as possible, like when you buy a cheap old car. It is only meant the work for a short time until you can't complain or it breaks down. The API we are hammering have never been construct with the intentions of being pulled for data this offend. They are meant to be pulled only once in awhile, like when you open the settings in the app. And nearly all devices, even if they are not build by Google. They run a proprietary version of Android, specifically designed for this job. I know this because i was at some point researching how to root them, but learned that it is (almost) impossible.

@Raul-7-7
Copy link

Raul-7-7 commented Aug 1, 2021

@DurgNomis-drol I agree with you that these devices are not meant to be used the way we are using them, HOWEVER, the part that is driving nuts is, why only few of us are experiencing this? There has to be something different with our setup.

@DurgNomis-drol
Copy link
Collaborator Author

@Raul-7-7

It may have something to do with the internal hardware. I believe that Google gets these devices manufactured at many different companies and they may use different hardware and hence different firmware, and they may even be using different software versions for them (I think this is the problem). But for us the consumer, it will look like it is all the same. This can maybe explain why only some of us are experiencing this problem. There are many other small things that also can play a role, but i believe this one is the main one.

This is by the way, a very come way to manufacture a large amounts of cheap hardware.

@leikoilja
Copy link
Owner

ohh flip, these google devices are a bit crappy.... so sad 😞
Do you guys think any of you can help draft a little section about timeout issue into readme's "troubleshooting" section? We might even need to add some "limitations" paragraph in the beginning of the README file to warn ppl beforehand that the integration might not work for 'high frequency' applications due to the google imposed limitation (what we believe is the case). We can also open a new issue to implement configurable update frequency from HA and once it's implemented we could refer the user to try lowering down the update frequency from let's say default 3minutes to something they need under their own risk? 🤔 Also, big thanks for debugging and testing, @Raul-7-7 and @DurgNomis-drol

@DurgNomis-drol
Copy link
Collaborator Author

@leikoilja That is a great idea!

3 minutes does not even work for me, but my network seems to be peculiar thing.

Lets open an issue for implementing update interval in config flow and separate one for the readme section.

I can make a draft of the readme section, as i am probably the one who fought with problem the most. 😞

@Raul-7-7
Copy link

@leikoilja @DurgNomis-drol I can also help with testing and readme section. Let me know.
Cheers,

@DurgNomis-drol
Copy link
Collaborator Author

@Raul-7-7 I have already written something about it. You can see it in the readme just under about. But if you think that you have something to add, you are more then welcome to open a PR with it. 😊

@leikoilja
Copy link
Owner

@Raul-7-7 thanks for offering your help. We have released a new version yesterday, can you please upgrade and test if it works for you to tune the update frequency and if the devices stop timing out (with a default 3mins update freq)? 🤩

@jamieshaw
Copy link

jamieshaw commented Aug 17, 2021

I've been seeing this issue for a while too — but recently been struggling to get the devices to initialise all together.

For the record, we have 4 Google Home minis, and 2 Google Home Hubs — all pre-"Nest".

I did notice today that for the Google Home devices that did initialise through this component, they wouldn't show for the native "Google Cast" component. Although, deleting that has made little difference. I've managed to connect to 3 out of the 6 today, after several restarts. Initially, only one was working.

Tailing the logs, it seems like the API isn't providing the IP for connecting to the device — I've attempted several restarts, and a clean install when upgrading google_home to 1.7.0. The IP is still returned as "none".

Interestingly, in the past, I had issues connecting via the configuration options through the Google Home app too, as if all traffic was blacklisted temporarily. That no longer seems to be the case, and GH can connect even when HA can't.

UPDATE
Literally after posting this, and re-enabling Google Cast in Home Assistant, all the devices have come back online, but haven't persisted through a restart. The original three have fallen back offline for google_home — Cast is fine. Weird…

@oxamafia
Copy link

oxamafia commented Feb 7, 2023

Hello
I saw that reloading the integration makes the sensors available again.
Maybe we can make an automation after x minutes of an HA restart, we reload the integration ?
I will try it

@oxamafia
Copy link

oxamafia commented Feb 7, 2023

I set the refresh to 10 and made an automation to reload the integration after 5 minutes with homeassistant.reload_config_entry and the ID of the integration… seems to be OK all my sensors are available. I keep you posted

@leikoilja
Copy link
Owner

Sounds a bit hacky, but considering that this integration is based on the reversed unofficial google API - we do what we have to do. Thanks for posting your idea @oxamafia.
Can you please also post here your automation config for the reference? :)

@oxamafia
Copy link

oxamafia commented Feb 8, 2023

Yes sure!
YAML of the automation

trigger:
  - platform: homeassistant
    event: start
condition: []
action:
  - delay:
      hours: 0
      minutes: 5
      seconds: 0
      milliseconds: 0
  - service: homeassistant.reload_config_entry
    data:
      entry_id: ID of Google home integration

To get the ID of GH integration : File Editor (or equivalent) > repertory config/.storage > file core.config_entries
Search for google_home and copy/paste the entry_id field

@oxamafia
Copy link

oxamafia commented Feb 8, 2023

After almost 24hrs (without reboot) all my sensors are still ok.
it’s not very « clean » and a bit hacky but it can solve the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

8 participants