Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intermittent pyhtcc.pyhtcc.UnauthorizedError #14

Open
cjkrolak opened this issue Sep 27, 2023 · 21 comments
Open

intermittent pyhtcc.pyhtcc.UnauthorizedError #14

cjkrolak opened this issue Sep 27, 2023 · 21 comments

Comments

@cjkrolak
Copy link

I have been seeing intermittent failures for pyhtcc.pyhtcc.UnauthorizedError since July 2023, maybe an artifact of changes made in 0.1.53. Traceback is below:

Traceback (most recent call last):
  File "/home/pi/github/ThermostatSupervisor/thermostatsupervisor/honeywell.py", line 218, in get_zones_info
    return_val = super().get_zones_info()
  File "/home/pi/.local/lib/python3.9/site-packages/pyhtcc/pyhtcc.py", line 684, in get_zones_info
    data = self._post_zone_list_data(page_num)
  File "/home/pi/.local/lib/python3.9/site-packages/pyhtcc/pyhtcc.py", line 618, in _post_zone_list_data
    return self._request_json(
  File "/home/pi/.local/lib/python3.9/site-packages/pyhtcc/pyhtcc.py", line 669, in _request_json
    raise UnauthorizedError("Got unauthorized response from server")
pyhtcc.pyhtcc.UnauthorizedError: Got unauthorized response from server

Sometimes the error is preceeded (triggered?) by one or more pyhtcc.pyhtcc.NoZonesFoundError exceptions (see below).

Traceback (most recent call last):
  File "/home/pi/github/ThermostatSupervisor/thermostatsupervisor/honeywell.py", line 218, in get_zones_info
    return_val = super().get_zones_info()
  File "/home/pi/.local/lib/python3.9/site-packages/pyhtcc/pyhtcc.py", line 686, in get_zones_info
    raise NoZonesFoundError("No zones were found from GetZoneListData")
pyhtcc.pyhtcc.NoZonesFoundError: No zones were found from GetZoneListData

Is anyone else seeing these errors?

@csm10495
Copy link
Owner

csm10495 commented Sep 27, 2023

How long were you logged in with the same session/object before it happened? I wonder if the token times out or something.
My usage runs every minute to check / update the thermostats as needed. On each minute it gets a fresh PyHtcc object. I don't have this error in any logs for the last month or so.

@cjkrolak
Copy link
Author

I re-login once every 24 hours. I poll the TCC website once every 10 minutes between logins. This failure happened on the 4th day of running, approximately 14 hours since the last login. Do you re-login every minute?

@csm10495
Copy link
Owner

csm10495 commented Sep 27, 2023

Yep i login every minute (via getting a fresh obj)

(See next comment)

@csm10495
Copy link
Owner

Deleting the comment i just made since its wrong. In the worst case, i would login every minute. I actually only login if something changed from the control-side. If i change something via the webui or manually, it wouldn't know so it wouldn't login again until a planned schedule change happens.

In total i actually login only a few times a day.

@cjkrolak
Copy link
Author

OK, I'll reduce my re-connect time from 24 hours to 8 hours and see if that resolves the issue. As I noted above, this is new behavior or a new failure message since about July of this year, so something must have changed on the TCC server side or the manifestation of this failure with the python code.

@csm10495
Copy link
Owner

Hopefully it helps out.

@cjkrolak
Copy link
Author

updated, it failed again today, 6.3 hours since last login, I'm halving the login interval again to see if it helps.

@cjkrolak
Copy link
Author

I don't think the login interval is going to fix my issue, I've reduced it down to 4.0 hours in between logins and still see the authorization errors, in the last run it occurred 1.2 hours after the last login. How often are you polling the TCC site? That's the next knob I will experiment with. I'm currently polling every 10 minutes.

@csm10495
Copy link
Owner

I poll on the temperature change in the schedule.

So I have a Google Sheet with times and temperatures. It polls that sheet every minute. If the time passes a timestamp on the sheet, it sends commands to match what that interval says and with that creates a new PyHTCC object to do the updates then moves on.

A special case is it skips updating if the last timestamp event is the same temperatures as this one (it caches that locally).

In total it logs in maybe 6 times a day max in the worst case with a lot of temperature changes. On each of those logins it does it's actions all within a minute before throwing out the object.

This allows me to manually change the thermostats and know that eventually the schedule will kick in again but not immediately overwrite what I did manually.

@cjkrolak
Copy link
Author

@csm10495 so are you only hitting the TCC site about 6 times a day? In my case I'm hitting the TCC site every 10 minutes (144 times a day).

@csm10495
Copy link
Owner

Yep.

For fun I pickled a PyHTCC object and left it for a few days. Upon trying it again today, it still works too. No auth error or anything. I figured it would fail after leaving it stale for some time.

@cjkrolak
Copy link
Author

anyone know how often a Honeywell thermostat talks to the TCC web site? I'm wondering if I'm hitting TCC too often with my script and triggering the AuthErrors as some type of DDOS defense

@csm10495
Copy link
Owner

No idea. I'm guessing it long polls or something like that.

For fun I'm running:

for i in range(250): 
    print(i, z.set_permanent_cool_setpoint(77))

and they all passed. So I didn't hit anything at least yet.

@csm10495
Copy link
Owner

Did the same thing 1000 times, to no issue either. I figured that would hit something but I guess not.

@cjkrolak
Copy link
Author

thanks for the data, to precisely replicate my issue please try calling get_indoor_temperature_raw() on at a 5 or 10 minute interval. It's a slightly different POST message than the one you tried. For me it's been failing within 2000 calls.

@csm10495
Copy link
Owner

I left something running overnight to see if I'd hit the same issue. I hit a network issue at some point:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

but nothing like you're hitting. I'll leave it longer and see if something happens.

Though if it does, not sure how much we'll be able to do about it at the base level. We'll see.

@csm10495
Copy link
Owner

csm10495 commented Oct 18, 2023

After 245 iterations (at 5 min apart), got a:

UnexpectedError: Expected json data in the response

It was a 200, though was an html response instead of json. Inside the html we have:

    <div class="crumbs">Unexpected Error</div>

So i guess it worked till then, then failed with an unrelated error. I'll try again and ignore UnexpectedError to see what happens

@cjkrolak
Copy link
Author

After 245 iterations (at 5 min apart), got a:

UnexpectedError: Expected json data in the response

It was a 200, though was an html response instead of json. Inside the html we have:

    <div class="crumbs">Unexpected Error</div>

So i guess it worked till then, then failed with an unrelated error. I'll try again and ignore UnexpectedError to see what happens

I catch that intermittent error and a few others and try to recover by re-authenticating using this overridden function:

def get_zones_info(self) -> list:
        """
        Return a list of dicts corresponding with each one corresponding to a
        particular zone.

        Method overridden from base class to add exception handling.
        inputs:
            None
        returns:
            list of zone info.
        """
        return_val = []
        try:
            return_val = super().get_zones_info()
        except (pyhtcc.requests.exceptions.ConnectionError,
                pyhtcc.pyhtcc.UnexpectedError,
                pyhtcc.pyhtcc.NoZonesFoundError,
                pyhtcc.pyhtcc.UnauthorizedError,
                ) as ex:
            # force re-authenticating
            tc.connection_ok = False
            tc.connection_fail_cnt += 1
            print(traceback.format_exc())
            print(f"{util.get_function_name()}: WARNING: {ex}")

            # warning email
            time_now = (datetime.datetime.now().
                        strftime("%Y-%m-%d %H:%M:%S"))
            email_notification.send_email_alert(
                subject=(f"{self.thermostat_type} zone "
                         f"{self.zone_name}: "
                         "intermittent error "
                         "during get_zones_info()"),
                body=(f"{util.get_function_name()}: trial "
                      f"{tc.connection_fail_cnt} of "
                      f"{tc.max_connection_fail_cnt} at "
                      f"{time_now}\n{traceback.format_exc()}")
                )

            # exhausted retries, raise exception
            if tc.connection_fail_cnt > tc.max_connection_fail_cnt:
                raise ex
        else:
            # good response
            tc.connection_ok = True
            tc.connection_fail_cnt = 0  # reset

        return return_val

@csm10495
Copy link
Owner

csm10495 commented Oct 19, 2023

This time at 264 iterations (22 ish hours) i hit the issue you get.

html body matches up:

<body>\r\n<div id="header"><h1>Server Error</h1></div>\r\n<div id="content">\r\n <div class="content-container"><fieldset>\r\n  <h2>401 - Unauthorized: Access is denied due to invalid credentials.</h2>\r\n  <h3>You do not have permission to view this directory or page using the credentials that you supplied.</h3>\r\n </fieldset></div>\r\n</div>\r\n</body>

Maybe the solution is to just re-authenticate on a 401. Going to try some more experiments.

Edit, forget to mention: Nothing useful in the headers/cookies either.

@csm10495
Copy link
Owner

Another time at 116 iterations, I got a NoZonesFoundError. The body has this:

<body bgcolor="white">\r\n\r\n            <span><H1>Server Error in \'/portal\' Application.<hr width=100% size=1 color=silver></H1>\r\n\r\n            <h2> <i>Runtime Error</i> </h2></span>\r\n\r\n            <font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">\r\n\r\n            <b> Description: </b>An exception occurred while processing your request. Additionally, another exception occurred while executing the custom error page for the first exception. The request has been terminated.\r\n            <br><br>\r\n\r\n    </body>

With a 500 status code.

.. which is kind of funny. Still nothing interesting in cookies/headers.

I could sort of see a case where maybe we retry on 500s since it denotes that the server side had an error, but since it isn't a 4XX that our request would still be likely valid.

I'll do some more testing.

@csm10495
Copy link
Owner

csm10495 commented Oct 24, 2023

I started writing a way of handling retries. Then I hit an issue with it.. tried fixing it and realized I didn't like the way I was doing it. I'm going to come back to this in a bit and probably use the backoff module to help instead of writing something custom.

(My way worked for around 800 5-minute-separated requests.. then failed in a way I didn't think of that invalidated the approach. Before that I was failing around 200.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants