Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hang during get #11

Closed
jerryneedell opened this issue Feb 22, 2019 · 17 comments
Closed

hang during get #11

jerryneedell opened this issue Feb 22, 2019 · 17 comments

Comments

@jerryneedell
Copy link
Contributor

Not much to go on, but I wanted to document this as well as I can:
I was running the cheerlights demo overnight. In the morning I noticed it was not updating and appeared to just be hung. I entered control-C ather REPL and this is the traceback:

Fetching json from https://api.thingspeak.com/channels/1417/feeds.json?results=1
Traceback (most recent call last):
  File "code.py", line 51, in <module>
  File "code.py", line 41, in <module>
  File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 114, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 229, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 215, in request
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 188, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 88, in readline
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 545, in socket_available
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 288, in _wait_response_cmd
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 285, in _wait_response_cmd
KeyboardInterrupt:

It appears to have been stuck in _waot_response_cmd()

That's all the information I have at this point.

@anecdata
Copy link
Member

anecdata commented Mar 4, 2019

I see a similar behavior. On PyPortal running with >100k free memory, I have a wifimanager .get that fails routinely after some time (always seems to be during the 2nd .get in a sequence of 2 - I'm just pulling short pieces of text from an API), requiring manual reload:

Traceback (most recent call last):
  File "main.py", line 192, in <module>
  File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 110, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 165, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 76, in connect
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 572, in socket_connect
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 503, in socket_open
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 261, in _wait_response_cmd
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 173, in _wait_for_ready
RuntimeError: ESP32 not responding

If I can get it reproducible with a minimal demo, I'll post that.

@ladyada
Copy link
Member

ladyada commented Mar 4, 2019

@jerryneedell can you add a timeout? errors are fine - ESP32 sometimes goes off into a bad zone - but it should raise an error rather than hangning

@jerryneedell
Copy link
Contributor Author

I'll take a look at it -- timeout sounds like a good idea.

@jerryneedell
Copy link
Contributor Author

Still looking into just where this is hanging and where a timeout should be added. It takes a long time to reproduce ;-)

@anecdata
Copy link
Member

anecdata commented Apr 15, 2019

Only within the past week I've been getting hangs sometimes overnight on PyPortal (since updating to 4.0.0-beta6 or -beta7). esp.firmware_version reports 1.2.2 (NINA_W102_17_FEB_2018.bin I believe?) The following is after a run with 1,020 successful GETs (HTTP OK code 200), 8 408s (server timed out waiting for client - I've gotten some 408s regardless of version), and sporadic retries:

Traceback (most recent call last):
  File "main.py", line 315, in <module>
  File "main.py", line 270, in <module>
  File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 110, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request
  File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 194, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 90, in readline
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 545, in socket_available
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 261, in _wait_response_cmd
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 167, in _wait_for_ready
KeyboardInterrupt: 

I'll try to capture with debug enabled, and see if I can isolate anything.

@ladyada
Copy link
Member

ladyada commented Apr 15, 2019

the thing i dont get is that we have a timeout in taht code

    def _wait_for_ready(self):
        """Wait until the ready pin goes low"""
        if self._debug >= 3:
            print("Wait for ESP32 ready", end='')
        times = time.monotonic()
        while (time.monotonic() - times) < 10:  # wait up to 10 seconds
            if not self._ready.value: # we're ready!
                break
            if self._debug >= 3:
                print('.', end='')
                time.sleep(0.05)
        else:
            raise RuntimeError("ESP32 not responding")

which makes me wonder if eventually time.monotonic() stops working! can you set _debug to 3 and turn on REPL output so we can tell where it might be failing/hanging?

@anecdata
Copy link
Member

anecdata commented Apr 15, 2019

Done: debug=3 & I'm saving a large scrollback. I don't have the output anymore on the original hang, but if I recall correctly the raw time.montonic was approx. 40,000+ seconds, and the program run time was approx. 20,000+ seconds. (The PyPortal code does also occasionally restart itself when I'm not looking, but I assume that's unrelated.)

@anecdata
Copy link
Member

anecdata commented Apr 15, 2019

Oddly, now I'm routinely getting memory errors I haven't seen before (no other code changes, other than debug):
.Parameter #0 length is 171 Traceback (most recent call last): File "main.py", line 316, in <module> File "main.py", line 271, in <module> File "adafruit_esp32spi/adafruit_esp32spi_wifimanager.py", line 110, in get File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 194, in request File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 92, in readline File "adafruit_esp32spi/adafruit_esp32spi.py", line 559, in socket_read File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response File "adafruit_esp32spi/adafruit_esp32spi.py", line 288, in _wait_response_cmd File "adafruit_esp32spi/adafruit_esp32spi.py", line 286, in _wait_response_cmd File "adafruit_esp32spi/adafruit_esp32spi.py", line 240, in _read_bytes File "adafruit_esp32spi/adafruit_esp32spi.py", line 240, in <listcomp> MemoryError: memory allocation failed, allocating 1024 bytes
Which is odd since the available memory prior to the last HTTP call was 63792b
Something due to debug?? I don't think I'll catch the hang if I have to restart often.

ADDENDUM: Ignore this comment for now, doing more testing. esp._debug=3 seems to trigger a lot of HTTP 408 errors (presumably slowing down the client). That and/or serial may be factors in the memory issue. Trying esp._debug=2, looking better so far.

ADDENDUM 2: Ran for days w/o incident, likely a fluke that it happened a couple of times close together. I have identical code now running on PyPortal, Feather M4, and ItsyBitsy M4, so will keep an eye out.

@ZachNo
Copy link
Contributor

ZachNo commented May 5, 2019

I fixed it hanging on my end by adding the timeout in readline.

It appeared to hang in the while b'\r\n' not in self._buffer: while loop because the value returned from socket_available is always zero. (Most likely a transient network error not sending a response?)

After adding a max tries to that while loop, and throwing an exception after that threshold my PyPortal now seems to run perfectly.

@ladyada
Copy link
Member

ladyada commented May 5, 2019

@ZachNo more timeouts are better, if you have a PR for a timeout addition please submit!

@caternuson
Copy link

Was #44 suppose to fix this? Looks like there is some history here. I've only just recently run into this. It appears that only fixed read, not readline? I'm having issue with getting stuck here also:
https://github.com/adafruit/Adafruit_CircuitPython_ESP32SPI/blob/master/adafruit_esp32spi/adafruit_esp32spi_socket.py#L90
when the response is empty.

@caternuson
Copy link

I've been running with the PR44 version of the lib and it seems to have fixed this.

@jerryneedell Have you had a chance to test as well?

@ladyada
Copy link
Member

ladyada commented May 28, 2019

@caternuson maybe we should do a release? a couple of these were fixed

@jerryneedell
Copy link
Contributor Author

I did try #44 and had no problems with it.

@caternuson
Copy link

@ladyada yep, i'd say so.
@jerryneedell do you think this issue is resolved?

@jerryneedell
Copy link
Contributor Author

jerryneedell commented May 28, 2019

@caternuson Yes, I think so. If it recurs we can always file a new issue.

@caternuson
Copy link

cool. thanks. or reopen this one, closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants