ESP32-S3 ADC use causes crashes when WiFi in use #9291

Closed
Timeline8 opened this issue Jun 1, 2024 · 74 comments · Fixed by #9325

@Timeline8

Timeline8 commented Jun 1, 2024

CircuitPython version

1) Adafruit CircuitPython 9.0.5 on 2024-05-22; Waveshare ESP32-S3-Zero with ESP32S3
2) Adafruit CircuitPython 9.1.0-beta.3 on 2024-05-22; Waveshare ESP32-S3-Zero with ESP32S3
3) Adafruit CircuitPython 9.0.4 on 2024-04-16; Adafruit Feather ESP32-S3 TFT with ESP32S3

Code/REPL

import gc
import time
import board
import neopixel
from rainbowio import colorwheel
from adafruit_thermistor import Thermistor

led = neopixel.NeoPixel(board.NEOPIXEL, 1, brightness=0.1)

# Setup thermistor for readings
therm7 = Thermistor(board.A0, 10000, 10000, 25, 3695, high_side=True)  # pin, resistor, nom_thermistor, nom_temp
therm8 = Thermistor(board.A1, 10000, 10000, 25, 3695, high_side=True)

def get_average_temp(pin):
    readings = []

    for _ in range(5):
        reading = pin.temperature
        readings.append(reading)
        time.sleep(0.02)  # 20ms delay between readings

    average_temp_c = sum(readings) / len(readings)  # Average the C reading
    average_temp_f = (average_temp_c * 1.8) + 32  # Convert the average C reading to F (F = C * 9/5 + 32)

    return average_temp_c, average_temp_f


# main loop
count = 0
while True:
    count += 1
    print(count, f"{gc.mem_alloc()=}")

    average_temp_c, average_temp_f = get_average_temp(therm7)
    print(
        f"   Average therm7 Reading : {average_temp_c:.0f}\u00b0C {average_temp_f:.0f}\u00b0F"
    )

    average_temp_c, average_temp_f = get_average_temp(therm8)
    print(
        f"   Average therm8 Reading : {average_temp_c:.0f}\u00b0C {average_temp_f:.0f}\u00b0F\n\n"
    )

    for x in range(3):
        led.fill(colorwheel((time.monotonic() * 50) % 255))  # change Neopixel color
        time.sleep(1)

Behavior

Various failures, but the crashes usually have this in common: MU pops up “Could not find an attached drive”, macOS pops up “Disk Not Ejected Properly”, and MU has of course closed the serial window, so there is nothing to see. Printing gc.mem_alloc() on each loop shows allocated memory in the 4000-8000 range, so there is no apparent runaway memory issue.

Sometimes the board disconnects and comes back with the code still running, but the Neopixel is steady white as it is in the REPL. Other times it crashes with 3x yellow blinking (safe mode) and reports that an internal watchdog timer expired.

I have an S2 board that is on 9.0.4 and has been running this code for many weeks, sending the data to an IO feed. No chronic crashes like on the S3 boards.

Description

What follows is the long list of notes I have been taking as I tried different things. But the above, in behavior, is the executive summary. Below is tedious reading. Sorry...

Testing notes:

Waveshare ESP32-S3 Zero running 9.0.5, with libraries updated via Circup, starts with the “code chooser” code discussed here https://forums.adafruit.com/viewtopic.php?t=210926 starting with the 6th post down.

Code I am running (“choosing”) is a dual thermistor reading in a roughly 3+ second long loop that reads two thermistors and then changes the color of the Neopixel 3 times, once per second.

Crashes share in common: MU pops up “Could not find an attached drive”, macOS pops up “Disk Not Ejected Properly”, and MU has of course closed the serial window, so there is nothing to see. Printing gc.mem_alloc() on each loop shows allocated memory in the 4000-8000 range, so there is no apparent runaway memory issue.

First time I had no display configured, so I could not see an error, and the Neopixel wasn’t indicating any activity. Added code to display the serial window on an external display. Reset board with reset button. I failed to note whether the drive had reloaded itself after the crash and before resetting the board.

Second time, same “crash”. Observed the Neopixel to be constant white, indicating it was in the REPL; however, the external display showed the code was still running and getting valid thermistor readings. Crash happened around 290-300 loops. CIRCUITPY remounted its drive, I believe, but I am not certain. I let it run for a while longer, then reset the board with the reset button.

Third time crashed at loop 272, this time stopping and Neopixel flashing yellow in three blink bursts (safe mode). Reopening the MU serial window showed it failed due to “Internal watchdog timer expired.” Noted for sure that the CIRCUITPY drive had remounted. Ejected drive and power cycled the board by unplugging the USB cable.

Fourth time crashed at loop 233 (gc.mem_alloc at 5568). Same as third run with code stopped, three yellow flashes, and “Internal watchdog timer expired” in the reopened MU serial window.

Switching gears… Renamed the “code chooser” program from code.py and made my thermistor code code.py so it will load and run directly without the chooser resetting the MCU. Also power cycled the board to reset it.

Different type of crash this time. At loop 53 (mem = 5168). Drive did not unmount and the error in the REPL is

Traceback (most recent call last):
  File "code.py", line 66, in <module>
  File "code.py", line 46, in get_average_temp
  File "adafruit_thermistor.py", line 126, in temperature
  File "adafruit_thermistor.py", line 116, in resistance
ZeroDivisionError: division by zero

Odd. Normally I use 10k resistors with my 10k thermistor but this time I only had 1k resistors on hand. But I wouldn’t think that should matter. Source code for the library doesn’t indicate any restrictions on the resistor range. I believe this failure is just a result of random values when no thermistor is attached.

Ran again and it made it to run 72, but with the same divide by zero error. Switched to 10k resistors. Hard reset. Made it to run 38, and the previously described crash scenario (disk eject & reconnect, safe mode with an “Internal watchdog timer expired” error) is back. Done for the night!

Next day. Backed up entire Waveshare CIRCUITPY drive. Ran one more time as is. Crashed with the Neopixel showing steady white (REPL indicator) but code was still running. MU and Mac OS both reported drive ejected. Board did not remount and MU doesn’t see it.

Adafruit REV TFT S2 Feather. Copied over all the files that were on the Waveshare. Also verified 9.0.5 and ran Circup to verify all libraries were up to date (all were). Commented out all code that had anything to do with the external display. Thermistors on breadboard changed from D6 and D7 to A0 and A1. No failures after a few hours.

Switched back to Waveshare and ran as is. Eventually failed with the REPL white neopixel, ejected disk, but kept running. Drive did not remount. Did full reinstall of boot loader then 9.0.5. Copied over backed up files onto the MCU again. Hard power cycle reset. Restarted code. Crashed at cycle 288, 3 yellow blink safe mode and “Internal watchdog timer expired” and drive remounted.

Commented out all thermistor stuff and just ran the neopixel and gc memory allocation. Ran 13908 loops without issue (over 12 hours). Uncommented thermistor code and restarted the run (hard reset). Made it 457 loops (a little over 20 minutes) and crashed with the board disconnecting and the TFT fade to black and back in about 3 second pulses.

Restarted as is after getting home from work. Got to about 275, white NeoPixel, still running code, and disconnected. Moved it to a power supply connection only (not computer) and restarted. Looks like it crashed the same way with white Neopixel and code still displaying new lines.

Copied same drive contents to S3 TFT Feather running 9.0.4 and started it on the computer (no thermistors connected). S3 TFT Feather crashed, disconnected, reconnected and reports Safe Mode for Internal Watchdog timer expired. Restarted S3 TFT Feather. Dies same way.

Additional information

No response

@Timeline8 Timeline8 added the bug label Jun 1, 2024
@dhalbert
Collaborator

dhalbert commented Jun 1, 2024

adafruit_thermistor is quite simple. It uses analogio.AnalogIn. I'm thinking there is an ADC problem.

@Timeline8
Author

Timeline8 commented Jun 1, 2024

Additional observation this morning. Put the S3 TFT Feather onto a breadboard so I could hook up physical thermistors. The only change to the code is that the resistors I used are in a 1k SIP, so the setup code for the two thermistors changed from 10k to 1k. Crashed in the 194th loop with the divide by zero error that I have been seeing periodically. If I simply try a CTRL-D or a Save in MU to soft boot, the code immediately crashes on the first loop. Repeated CTRL-D multiple times in a row to verify. Restarting the code via soft boot does not clear whatever is causing this failure out of memory. Hitting the board's RST button does restart the code properly; it then went over 400 loops before going to safe mode with the internal watchdog timer expired error. I have pasted the REPL output below, showing the div by zero message in the 194th loop and then a subsequent Save (soft boot) in MU.

I did notice that on my S2 running this code (I also have it running on a Pico W without issue), I was taking samples for my averaging function every 50ms rather than the 20ms posted here. Changed my S3 TFT Feather to 50ms, but it still crashes with the WD timer expired safe mode message.

194 gc.mem_alloc()=5504
   Average therm7 Reading : 23°C 75°F
Traceback (most recent call last):
  File "code.py", line 40, in <module>
  File "code.py", line 19, in get_average_temp
  File "adafruit_thermistor.py", line 126, in temperature
  File "adafruit_thermistor.py", line 116, in resistance
ZeroDivisionError: division by zero

Code done running.

Press any key to enter the REPL. Use CTRL-D to reload.

Adafruit CircuitPython 9.0.4 on 2024-04-16; Adafruit Feather ESP32-S3 TFT with ESP32S3
>>> [D
... 
>>> 
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
1 gc.mem_alloc()=4768
Traceback (most recent call last):
  File "code.py", line 35, in <module>
  File "code.py", line 19, in get_average_temp
  File "adafruit_thermistor.py", line 126, in temperature
  File "adafruit_thermistor.py", line 116, in resistance
ZeroDivisionError: division by zero

Code done running.
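
The traceback above shows a single failed `temperature` read aborting the whole program. As a minimal sketch (not a fix for the underlying crashes), an averaging helper could skip samples that raise `ZeroDivisionError` rather than letting one bad read stop the loop. The name `average_valid` and the `read_sample` callable are hypothetical and chosen for illustration only:

```python
import time


def average_valid(read_sample, samples=5, delay=0.02):
    """Average up to `samples` readings, skipping any that raise ZeroDivisionError.

    Returns None if every reading failed.
    """
    readings = []
    for _ in range(samples):
        try:
            readings.append(read_sample())
        except ZeroDivisionError:
            # The library's resistance calculation raised; drop this sample.
            pass
        time.sleep(delay)
    if not readings:
        return None
    return sum(readings) / len(readings)
```

On the board this might be called as `average_valid(lambda: therm7.temperature)`. It only hides the symptom, so a run of skipped samples is still worth logging.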

@jepler
Member

jepler commented Jun 1, 2024

The division by zero error is:

        if self.high_side:
            # Thermistor connected from analog input to high logic level.
            reading = self.pin.value / 64
            reading = (1023 * self.series_resistor) / reading

if the analog pin value is 0, then it divides by 0.
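
To illustrate the point, here is a minimal pure-Python sketch of the high-side calculation with a guard for a zero reading. The first two steps mirror the lines quoted above; the final subtraction of the series resistor is my assumption about the rest of the property, which is not quoted here, and the function name is hypothetical:

```python
def high_side_resistance(raw_value, series_resistor):
    """Return the thermistor resistance in ohms, or None for a zero ADC reading."""
    reading = raw_value / 64  # scale the 16-bit value down to a 10-bit range
    if reading == 0:
        # Without this guard the next line raises ZeroDivisionError.
        return None
    resistance = (1023 * series_resistor) / reading
    # Assumption: the series resistor is then subtracted to leave the thermistor value.
    return resistance - series_resistor
```

Returning `None` pushes the decision to the caller, which can skip the sample or retry instead of crashing.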

@Timeline8
Author

Which is what I thought might be happening when I was running with no hardware connected to the pins, where random noise at the open pins could result in a zero reading. This is why I made sure to add the thermistors & resistors this morning, so there should be no way to get 0V at the pin, and I still experienced the divide by zero fault. It ran the loop 193 times with valid room temperature readings in the mid/upper 70s °F for both thermistors, then 3 seconds later on loop 194 it read the first thermistor and failed reading the second. Doesn't seem like a hardware problem with my simple circuit.

But that is kind of beside the point. I cannot get this code to run for any length of time without some sort of crash on an ESP32-S3, whereas the same code on an S2 and on a Pico W has run nonstop, day in and day out, for weeks now. With the S3, usually it goes into safe mode with an internal WD timer expired failure, with the board disconnecting and sometimes reconnecting; sometimes it is the divide by zero; and still other times the board disconnects with the Neopixel going steady white but the code still running, where I can see the measurements being reported every 3 seconds on the TFT but obviously the neopixel part of the code is doing nothing anymore.

One last test. I removed my thermistors and replaced them with a second 1k SIP so that the A0 and A1 pins are getting a straight fixed resistor voltage divider, and verified with my meter that both pins are at 1.65V. Reran the code as is (so the temp reported is 94°C 211°F since I didn't change the thermistor line from 10k@25°C). It managed to run over half an hour before it bombed out with the "Internal watchdog timer expired." safe mode message. At least no divide by zero.
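
The roughly 94°C reading from a plain resistor divider is what the thermistor math would be expected to produce. A minimal sketch, assuming the library applies the standard B-parameter equation and that the series resistor in the code was still set to 1000 as in the earlier breadboard test, so a mid-scale reading looks like a 1 kΩ "thermistor" against a 10 kΩ nominal value (the function name is hypothetical):

```python
import math


def beta_temperature_c(resistance, nominal_resistance=10000, nominal_temp_c=25, beta=3695):
    """Convert a thermistor resistance to degrees C with the B-parameter equation."""
    inv_t = math.log(resistance / nominal_resistance) / beta  # (1/B) * ln(R/R0)
    inv_t += 1.0 / (nominal_temp_c + 273.15)  # + 1/T0, with T0 in kelvin
    return 1.0 / inv_t - 273.15  # invert, then convert kelvin to C


print(beta_temperature_c(1000))
```

Under those assumptions the result is about 93°C, close to the reported value; a pin voltage slightly off mid-scale would plausibly account for the difference.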

@dhalbert dhalbert added this to the 9.x.x milestone Jun 2, 2024
@dhalbert
Collaborator

dhalbert commented Jun 3, 2024

Just to be clear, you are getting things like the watchdog crash or divide by zero on 9.1.0-beta.3 as well as 9.0.x, right? We moved to a new version of ESP-IDF for 9.1.0, and I want to make sure that version still has the issue.

@Timeline8
Author

Timeline8 commented Jun 4, 2024

I am fairly sure I did but re-reading my tedious notes above, I see I don't have an entry stating that. However I did note I ran the beta in the forum discussion. But I will double check tonight as the Waveshare should still be running the 9.1.0-beta.3. On the Adafruit S3 TFT I only ran that one on 9.0.4 as I thought it might be the version on the two S2s I have running this code for months without issue but checking one of the S2s, it is at 9.0.3. I can upgrade the S3 TFT tonight to 9.1.0-beta.3 and retest it as well. And I got a notice yesterday that my backorder from Digikey for the QtPy S3 I ordered has shipped, so when I get it I will load 9.1.0-beta.3 on that one as well and see how it acts (my S3 REV TFT is sadly still on backorder). I will report everything I find.

Do you think there is any value loading 9.0.3 onto any of the S3 boards since the S2 boards run fine with it? Or do you think this is an S3 specific issue and we should only be looking upward and onward with current versions only?

@dhalbert
Collaborator

dhalbert commented Jun 4, 2024

Do you think there is any value loading 9.0.3 onto any of the S3 boards since the S2 boards run fine with it? Or do you think this is an S3 specific issue and we should only be looking upward and onward with current versions only?

Based on your testing, I think this is an S3-specific issue. S3 with 9.1.0-beta.3 is the only test still to do, I would say. If that's a fix, great, otherwise I will set up an S3 board and let it run for hours.

@Timeline8
Author

Timeline8 commented Jun 5, 2024

2024-06-04

Waveshare - verified it was still on the beta that I last ran it. From boot_out.txt…
Adafruit CircuitPython 9.1.0-beta.3 on 2024-05-22; Waveshare ESP32-S3-Zero with ESP32S3

Running with TFT to see code running, using 2x 1k SIP packages to create a voltage divider at pins D7 & D8 to simulate the thermistors, and verified with a meter that each pin had 1.64V while running.

Looped just over 100 times (~ 5 minutes) before the board disconnected from my iMac, MU and macOS both reported disk ejection, the Neopixel changed to steady white, and the board did not reconnect (MU serial window closed and icon shows no board connected). However, the TFT shows the code continuing to run normally, and as I type this it is at loop 180.


S3 TFT - Ran “circup update --all” first to get the suggested link for the newest version of CP and also updated the libraries. Downloaded the beta and loaded it. From boot_out.txt…
Adafruit CircuitPython 9.1.0-beta.3 on 2024-05-22; Adafruit Feather ESP32-S3 TFT with ESP32S3

Hard reset board (pulled USB cable). Reduced code from the Waveshare because the external TFT code not required. Same 2x 1k SIP resistors to create resistor divider voltage at A0 and A1. Verified 1.57V at each pin while running.

Ran for 45+ minutes and last I checked it was over 700 loops. I came back a little later and it had disconnected, but had then reconnected and restarted. The loop was up to 170+ on the TFT. Neopixel was acting normally per the code.

CTRL-C, ejected board, power cycled it, and restarted. So far this second run it is behaving itself and has made it over an hour and at loop 1385.

@Timeline8
Author

And to add to the end of the last post, the S3 Feather disconnected and reconnected 4 times sometime between about 11pm and 2am, and finally stopped in safe mode with an "Internal watchdog timer expired".

@dhalbert
Collaborator

dhalbert commented Jun 5, 2024

On either board, do you have a settings.toml with CIRCUITPY_WIFI_SSID and CIRCUITPY_WIFI_PASSWORD? That will connect to the wifi network. In other words, wifi will be active even if not mentioned in the test program.

It certainly sounds like the boards are hard-crashing and resetting spontaneously. I will try a very simple test that simply reads as fast as possible from the ADC.
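
A quick way for a test program to report whether those two keys are set: as far as I know, CircuitPython 8 and later expose `settings.toml` values through `os.getenv()`, and the same call reads environment variables on desktop Python. The helper below is a hypothetical sketch, not part of any library:

```python
import os


def wifi_autoconnect_configured():
    """Return True if both auto-connect keys are set to non-empty values."""
    ssid = os.getenv("CIRCUITPY_WIFI_SSID")
    password = os.getenv("CIRCUITPY_WIFI_PASSWORD")
    return bool(ssid) and bool(password)


print("WiFi auto-connect configured:", wifi_autoconnect_configured())
```

Printing this at the top of a test run records whether wifi was likely active, even when the program itself never imports `wifi`.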

@Timeline8
Author

Yes, I still have the WiFi settings auto connecting like that. We did talk about disabling that over at the forums but I don't think I tried it. So that is now on my to-do list for tonight: Run both the Waveshare S3 and Feather TFT S3 without the .toml file.

If there is a conflict between the two (ADC and WiFi), that will be disappointing for me since I need both for my application, so I won't be able to simply not use WiFi. But the problem can't be fixed later if we don't narrow it down to a root cause, so I will try it sans WiFi.

Also just received my QtPy S3 today. So if the other two crash relatively quickly with the WiFi off, I can try the QtPy. While I wouldn't think the model board matters as they are all S3, I have noticed that the Waveshare is good about crashing sooner while the Feather likes to wait until later.

@dhalbert
Collaborator

dhalbert commented Jun 5, 2024

I was just testing on a QT Py ESP32-S3, with the very simple test program below. With CIRCUITPY_WIFI_SSID and CIRCUITPY_WIFI_PASSWORD commented out in settings.toml, it runs indefinitely, with millions of conversions. With them not commented out (so web set up in advance), it crashes hard almost immediately. I am using board.A2, which uses ADC1 on the QT Py S3.

So now I know how to reproduce this. It is strange because ADC2 is supposed to be shared with WiFi, not ADC1. But maybe something is interrupting the conversion in some bad way.

No need for you to test further at this point. Thanks for persevering through this.

Test program with two 1kohm resistors forming a 3.3/2 voltage divider connected to pin A2:

import analogio
import board

a2 = analogio.AnalogIn(board.A2)

count = 0
while True:
    count += 1
    if count % 100000 == 0:
        print("count", count)
    v = a2.value
    if v < 32000 or v > 33000:
        print(count, v)
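
For context on the 32000 to 33000 window in the test program: `AnalogIn.value` is a 16-bit number, so a 3.3/2 divider should read near mid-scale (about 32768). A small sketch of the conversion, assuming a 3.3 V reference; the helper names are hypothetical:

```python
def adc_to_volts(value, reference_voltage=3.3):
    """Convert a 16-bit AnalogIn-style reading (0-65535) to volts."""
    return value * reference_voltage / 65535


def in_window(value, low=32000, high=33000):
    """Return True when a reading falls inside the expected mid-scale window."""
    return low <= value <= high


print(adc_to_volts(32000), adc_to_volts(33000))
```

With these assumptions the window corresponds to roughly 1.61 V to 1.66 V, so any reading printed by the test program is well away from the expected 1.65 V.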

@dhalbert dhalbert modified the milestones: 9.x.x, 9.1.0 Jun 5, 2024
@dhalbert
Collaborator

dhalbert commented Jun 5, 2024

Testing with A0, which is an ADC2 pin, I also get crashes rather quickly, and sometimes get safe-mode "Internal watchdog timer expired".

@dhalbert dhalbert self-assigned this Jun 5, 2024
@dhalbert dhalbert changed the title from "ESP32-S3 problem with adafruit_thermistor.mpy library" to "ESP32-S3 ADC use causes crashes when WiFi in use" Jun 5, 2024
@Timeline8
Author

Hi Dan, thank you very much for confirming this. Since I first reported this on the forums, and after days of replies there and here with no one seeming to be interested in actually running the code to see if it could be reproduced, I was starting to wonder if it was me and everyone was just being polite by not telling me I'm the idiot. ;)

You and I actually had a conversation on the forums on the Wifi vs ADC about two months ago because I had read about the possible interference between the two and was concerned if I should be making a point to use ADC1 to avoid that. Regardless, it looks like your testing shows this is a different issue since you got the crash on both ADC1 & 2.

I guess for now I will have to proceed with my projects targeting S2 boards. The project I am slowly developing as I learn CP, and add features to as I go, depends on both ADC and WiFi for IO Feeds.

@dhalbert
Collaborator

dhalbert commented Jun 6, 2024

@Timeline8 Rest assured we were interested, but if we can delegate some testing to eliminate possibilities, then we try that. (We have all too many bugs to look at 🙂). For instance, was displayio involved or not? And I was hoping it was really fixed in 9.1.0-beta.3, since we'd upgraded the underlying Espressif software (ESP-IDF) in that release. I also spent time looking for similar reports in the ESP-IDF repo, but could not find any.

I hope we can figure this out soon, because broken ADCs when wifi is in use are a pretty serious limitation. As your testing indicates, the S2 boards could be a substitute. If you are interested in more precise temperature readings, then you could use an external I2C ADC breakout board. If you are measuring ambient air temperature (not liquids), then one of the I2C temperature breakouts (there are many) could be used. But the easiest would be to fix ESP32-S3, of course.

@bill88t

bill88t commented Jun 6, 2024

I just ran into this with the cardputer.
Spamming the adc with wifi connected produces instant resets, watchdog safemodes and hangups.

I just finished the battery driver, which uses IO10 of the ESP32-S3.
When connected to a network, after about 30 seconds it does one of the following:

  1. Resets (NO SAFEMODE, actual reset).
  2. hangs up (neopixel went white, not a color I use, and stayed there, unresponsive).
  3. Watchdog safemode.

The polling rate was 30 samples/s.

The code accounts for None reads.

@bill88t

bill88t commented Jun 6, 2024

Attempted this patch, which reduces the points of failure:

--- a/ports/espressif/common-hal/analogio/AnalogIn.c
+++ b/ports/espressif/common-hal/analogio/AnalogIn.c
@@ -115,10 +115,10 @@ uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) {
     #endif
 
     uint32_t adc_reading = 0;
-    size_t sample_count = 0;
+    int sample_count = 0;
     // Multisampling
     esp_err_t ret = ESP_OK;
-    for (int i = 0; i < NO_OF_SAMPLES; i++) {
+    while (sample_count < NO_OF_SAMPLES) {
         int raw;
         ret = adc_oneshot_read(adc_handle, channel, &raw);
         if (ret != ESP_OK) {
@@ -127,9 +127,6 @@ uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) {
         adc_reading += raw;
         sample_count += 1;
     }
-    if (sample_count == 0) {
-        raise_esp_error(ret);
-    }
     adc_reading /= sample_count;
 
     // This corrects non-linear regions of the ADC range with a LUT, so it's a better reading than raw

It didn't work. I think the error is in esp-idf.

I will attempt to create a minimal esp-idf application implementing oneshot adc and wifi.
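
For readers who do not follow the C, the multisampling logic the patch touches can be sketched in Python. This is an illustration only: the body of the error branch is not shown in the diff, so skipping a failed read is my assumption, and `read_once`, the use of `OSError`, and the default sample count are hypothetical stand-ins for `adc_oneshot_read`, `esp_err_t`, and `NO_OF_SAMPLES`:

```python
def multisample_average(read_once, samples=8):
    """Average the raw readings that succeed; raise if none of them did."""
    total = 0
    count = 0
    last_error = None
    for _ in range(samples):
        try:
            raw = read_once()
        except OSError as err:
            # Assumption: a failed conversion is skipped, not fatal.
            last_error = err
            continue
        total += raw
        count += 1
    if count == 0:
        # Mirrors the sample_count == 0 check, which avoids dividing by zero.
        raise last_error if last_error else ValueError("samples must be positive")
    return total // count
```

The zero-count check matters in this form because the division would otherwise fail whenever every read errors out.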

@bill88t

bill88t commented Jun 6, 2024

Yep, fun stuff:

I (21616) EXAMPLE: ADC1 Channel[3] Raw Data: 562
I (21616) EXAMPLE: ADC1 Channel[3] Cali Voltage: 481 mV
I (21626) EXAMPLE: ADC2 Channel[0] Raw Data: 537
E (21626) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (21626) task_wdt:  - IDLE0 (CPU 0)
E (21626) task_wdt: Tasks currently running:
E (21626) task_wdt: CPU 0: main
E (21626) task_wdt: CPU 1: IDLE1
E (21626) task_wdt: Print CPU 0 (current core) backtrace


Backtrace: 0x4200B48F:0x3FC93680 0x4200B8AC:0x3FC936A0 0x40377331:0x3FC936D0 0x42007582:0x3FC992B0 0x42008447:0x3FC992E0 0x42006FB1:0x3FC99300 0x42006B22:0x3FC99320 0x4200F58A:0x3FC99340 0x4200EDF5:0x3FC99360 0x4200EE52:0x3FC99380 0x4200F491:0x3FC993B0 0x420134C3:0x3FC993E0 0x42012E86:0x3FC99400 0x4200F601:0x3FC99720 0x4201B431:0x3FC99750 0x403804C9:0x3FC99780 0x42008C26:0x3FC997D0 0x4201AC97:0x3FC99840 0x4037AD1D:0x3FC99870

I (21626) EXAMPLE: ADC2 Channel[0] Cali Voltage: 467 mV
I (21706) EXAMPLE: ADC1 Channel[2] Raw Data: 732
I (21716) EXAMPLE: ADC1 Channel[2] Cali Voltage: 620 mV
I (21716) EXAMPLE: ADC1 Channel[3] Raw Data: 543

Stock idf 5.2.1, untouched oneshot adc example..
I didn't even setup wifi in it..
The nvs partition may have connected it automatically, idk.

@jepler
Member

jepler commented Jun 6, 2024

possibly related? espressif/esp-idf#12466

@bill88t

bill88t commented Jun 6, 2024

Using a YD-ESP32-S3 for this, which is a dual USB-C board that is excellent for debugging, I built some debug builds.

import wifi, board, analogio
wifi.radio.connect("SSID", "PASSWD")
a = analogio.AnalogIn(board.GPIO10)
while True:
    a.value

This reliably crashes it. Maximum 15s.

Decoded backtrace of debug build (clean tree, current master):

0x4037cc7a: ram_chip_i2c_readReg at ??:?
0x40378aa0: regi2c_ctrl_write_reg_mask at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/esp_hw_support/regi2c_ctrl.c:46
0x420a6761: adc_ll_calibration_init at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/hal/esp32s3/include/hal/adc_ll.h:790
 (inlined by) adc_hal_calibration_init at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/hal/adc_hal_common.c:92
0x420a09db: adc_oneshot_read at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/esp_adc/adc_oneshot.c:174
0x42042937: common_hal_analogio_analogin_get_value at /home/bill88t/git/circuitpython/ports/espressif/common-hal/analogio/AnalogIn.c:123
0x42039455: analogio_analogin_obj_get_value at /home/bill88t/git/circuitpython/ports/espressif/../../shared-bindings/analogio/AnalogIn.c:101
0x42014f0a: fun_builtin_1_call at /home/bill88t/git/circuitpython/ports/espressif/../../py/objfun.c:68
0x4200eaf5: mp_call_function_n_kw at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:725
0x4200ecdf: mp_convert_member_lookup at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:1183
0x4200ee19: mp_load_method_maybe at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:1253
0x4200ee2e: mp_load_method at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:1262
0x4200eeed: mp_load_attr at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:1071
0x4201f549: mp_execute_bytecode at /home/bill88t/git/circuitpython/ports/espressif/../../py/vm.c:437
0x420150e1: fun_bc_call at /home/bill88t/git/circuitpython/ports/espressif/../../py/objfun.c:273
0x4200eaf5: mp_call_function_n_kw at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:725
0x4200eb0a: mp_call_function_0 at /home/bill88t/git/circuitpython/ports/espressif/../../py/runtime.c:699
0x4206e056: parse_compile_execute at /home/bill88t/git/circuitpython/ports/espressif/../../shared/runtime/pyexec.c:152
0x4206e45d: pyexec_friendly_repl at /home/bill88t/git/circuitpython/ports/espressif/../../shared/runtime/pyexec.c:748
0x4202491b: run_repl at /home/bill88t/git/circuitpython/ports/espressif/../../main.c:946
0x42024f67: main at /home/bill88t/git/circuitpython/ports/espressif/../../main.c:1084 (discriminator 1)
0x42026cca: app_main at /home/bill88t/git/circuitpython/ports/espressif/supervisor/port.c:503
0x4215a854: main_task at /home/bill88t/git/circuitpython/ports/espressif/esp-idf/components/freertos/app_startup.c:208

For this crash, the reason was: Guru Meditation Error: Core 1 panic'ed (Interrupt wdt timeout on CPU1).

possibly related? espressif/esp-idf#12466

I feel like it is. I think memory corruption takes place.

I only sometimes get a coredump. Sometimes usb just dies and debug serial, 5 seconds after usb has died, says that:

I (54623) wifi:bcn_timeout,ap_probe_send_start
W (54625) CP wifi: event 21 0x21
I (57131) wifi:ap_probe_send over, resett wifi status to disassoc
I (57132) wifi:state: run -> init (c800)
I (57133) wifi:pm stop, total sleep time: lu us / lu us

I (57136) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
W (57145) CP wifi: disconnected
W (57146) CP wifi: reason 200 0xc8
I (57149) CP wifi: Retrying connect. 4 retries remaining
W (59593) CP wifi: disconnected
W (59593) CP wifi: reason 201 0xc9

As if it didn't crash.

@Timeline8
Author

@dhalbert, no problem with me doing some of the upfront testing. I know from my own experiences learning CircuitPython, and also from reading other people's pleas for help, that 99% of the time it is user error. So no harm in pushing back a bit on the user to kind of "prove it".

As for my project, it is for aquarium monitoring and eventually some controls later. Therefore I am sensing water temperature. Thermistors are well suited for this. Simple to use, and easy to waterproof if needed. I would love to see the built-in ADC of the S3 back on track again, but the S2 is just as capable (don't need dual cores to read a temperature once every five minutes) so I still have a path forward.

@bablokb

bablokb commented Jun 6, 2024

@Timeline8: have you thought about using DS18B20-sensors? They are available in a waterproof enclosure. And they are easy to use.

@bill88t

bill88t commented Jun 13, 2024

You can download the "Absolute Newest" build from the downloads page for your board.
That will have the fix, along with any future 9.1.x releases.

@dhalbert
Collaborator

dhalbert commented Jun 13, 2024

Will this be fixed under the next beta release? 9.1.0-beta.4 or 9.1.1 or whatever the next release will be?

Yes, and it is already fixed in builds with PR9325 in the filename or later. Download from https://adafruit-circuit-python.s3.amazonaws.com/index.html?prefix=bin

@Timeline8
Author

Timeline8 commented Jun 13, 2024

You can download the "Absolute Newest" build

Oh ya, totally forgot about that link on those pages. Duh! Thanks.

@Timeline8
Author

Timeline8 commented Jun 15, 2024

Appreciate everyone's help, but I am still experiencing problems. Back to the Waveshare S3 Zero board: I downloaded adafruit-circuitpython-waveshare_esp32_s3_zero-en_US-20240613-main-PR9325-03e42a8.uf2 and installed it. Did it a couple of times due to still having problems. The boot_out.txt reads:

Adafruit CircuitPython 9.1.0-beta.3-28-g03e42a8c0c on 2024-06-13; Waveshare ESP32-S3-Zero with ESP32S3
Board ID:waveshare_esp32_s3_zero
UID:437BAD9541C4

There is a more recent version by one day but it is (name truncated) ...PR9318-ed5591c.uf2 so I didn't try that one as 9318 is earlier than 9325.

Am I downloading the correct version (first file name listed above)?
And does the boot_out.txt indicate the contents match the file name?

I ask because as I am playing with it I am experiencing disconnects, resets, and the full brightness WHITE NeoPixel issue. However, the code either restarts or, in the case of the white neopixel, remains running (I see it on my TFT). I have not experienced any safe mode crashes due to the internal watchdog, so I guess that is something.

@dhalbert
Collaborator

Sorry to hear that. Which pin are you using, and do you know whether it's an ADC1 or an ADC2 pin? The UF2 you're downloading is correct. PR's (pull requests) are not merged in order so the order is not significant.
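
One way to answer the "does boot_out.txt match the file name" question mechanically is to compare the commit hash in both strings. A minimal sketch, assuming the build file name pattern seen in this thread (`...-YYYYMMDD-main-PR<number>-<commit>.uf2`) and a git-describe style `-g<commit>` suffix in the version line; both patterns are inferred from the examples here, and the function names are hypothetical:

```python
import re

BUILD_RE = re.compile(r"-(\d{8})-main-PR(\d+)-([0-9a-f]+)\.uf2$")
DESCRIBE_RE = re.compile(r"-g([0-9a-f]+)\b")


def parse_build_filename(filename):
    """Split a build file name into date, PR number and short commit hash."""
    match = BUILD_RE.search(filename)
    if match is None:
        return None
    date, pr, commit = match.groups()
    return {"date": date, "pr": int(pr), "commit": commit}


def boot_out_matches(filename, version_line):
    """Return True when the version line's commit starts with the file name's commit."""
    build = parse_build_filename(filename)
    described = DESCRIBE_RE.search(version_line)
    if build is None or described is None:
        return False
    return described.group(1).startswith(build["commit"])
```

Since PRs are not merged in numeric order, sorting candidate builds by the parsed date is a more useful ordering than the PR number.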

@Timeline8
Author

Timeline8 commented Jun 15, 2024

D7 & D8 so IO7 & IO8 which go to GPIO7 & GPIO8 of the S3 (fortunately Waveshare kept numbers the same on the board vs the ESP32) which according to the datasheet for the ESP32-S3 is ADC1

(cut & pasted from table in datasheet)

ADC1_CH6 GPIO7
ADC1_CH7 GPIO8
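
The two rows above fit the pattern I understand the ESP32-S3 datasheet to use: GPIO1 to GPIO10 on ADC1 channels 0 to 9, and GPIO11 to GPIO20 on ADC2 channels 0 to 9. Only GPIO7 and GPIO8 are confirmed in this thread, so treat the rest of the mapping as an assumption to check against the datasheet. A hypothetical lookup helper:

```python
def esp32s3_adc_channel(gpio):
    """Return (unit, channel) for an ESP32-S3 GPIO number, or None if it has no ADC."""
    if 1 <= gpio <= 10:
        return ("ADC1", gpio - 1)  # assumed: GPIO1..GPIO10 -> ADC1_CH0..CH9
    if 11 <= gpio <= 20:
        return ("ADC2", gpio - 11)  # assumed: GPIO11..GPIO20 -> ADC2_CH0..CH9
    return None


for pin in (7, 8):
    print(pin, esp32s3_adc_channel(pin))
```

A lookup like this makes it quick to tell whether a given board pin sits on the ADC unit that is documented as shared with WiFi.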

My full code that I am running at this moment where I am seeing this is ...

import gc
import time
import board
import neopixel
from random import randint
import busio
import displayio
from fourwire import FourWire
from adafruit_st7789 import ST7789
from adafruit_thermistor import Thermistor

displayio.release_displays()

spi = busio.SPI(clock=board.D1, MOSI=board.D2)
tft_res = board.D3
tft_dc = board.D4
tft_cs = board.D5
tft_blk = board.D6  # TFT's backlight control

display_bus = FourWire(spi, command=tft_dc, chip_select=tft_cs, reset=tft_res)
display = ST7789(
    display_bus,
    width=240,
    height=135,
    rowstart=40,  # (320 - width) / 2
    colstart=53,  # (240 - height) / 2
    rotation=270,
    backlight_pin=tft_blk,
)

# Row and column start above are because ST7789 driver is for 320x240 display
# size so need to center real display in that virtual space.

display.brightness = 1.0  # between 0 and 1

led = neopixel.NeoPixel(board.NEOPIXEL, 1, brightness=0.03)

# Setup thermistor for readings
# pin, resistor, therm, @temp, beta, therm on high side
therm7 = Thermistor(board.D7, 10000, 10000, 25, 3695, high_side=True)
therm8 = Thermistor(board.D8, 10000, 10000, 25, 3695, high_side=True)

def get_average_temp(pin):
    readings = []

    for _ in range(5):
        reading = pin.temperature
        readings.append(reading)
        time.sleep(0.02)  # 20ms delay between readings

    average_temp_c = sum(readings) / len(readings)  # Average the C reading
    average_temp_f = (average_temp_c * 1.8) + 32  # Convert the average C reading to F (F = C * 9/5 + 32)

    return average_temp_c, average_temp_f


# main loop
count = 0
while True:
    count += 1
    print(count, f"{gc.mem_alloc()=}")

    average_temp_c, average_temp_f = get_average_temp(therm7)
    print(
        f"  therm7 : {average_temp_c:.1f}\u00b0C {average_temp_f:.1f}\u00b0F"
    )

    average_temp_c, average_temp_f = get_average_temp(therm8)
    print(
        f"  therm8 : {average_temp_c:.1f}\u00b0C {average_temp_f:.1f}\u00b0F\n\n"
    )

    led[0] = (randint(0, 255), randint(0, 255), randint(0, 255))

    time.sleep(0.1)
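
The averaging and unit-conversion steps do not depend on the ADC, so they can be checked on a desktop Python before going back to the board. A minimal sketch, with hypothetical helper names, using the standard relation F = C × 9/5 + 32:

```python
def average(readings):
    """Arithmetic mean of a non-empty list of readings."""
    return sum(readings) / len(readings)


def celsius_to_fahrenheit(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


# Made-up sample readings, only to exercise the helpers.
sample_c = [24.0, 25.0, 26.0]
mean_c = average(sample_c)
print(mean_c, celsius_to_fahrenheit(mean_c))
```

Known fixed points such as 0 °C = 32 °F and 100 °C = 212 °F make a quick sanity check for the conversion factor.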

@bill88t

bill88t commented Jun 18, 2024

Adding this here to note how the bug looks with #9344.
(Link goes to the Adafruit Discord server)
https://discord.com/channels/327254708534116352/327298996332658690/1252425213765877882

@Timeline8
Author

CONFIG_FREERTOS_UNICORE=y, so that only one core is used: problem seems to go away. That would explain why ESP32-S2, which has only one core, doesn't have the problem.

Circling back to this comment, which was made before you thought the bug was fixed (only for us to find that it may not be): is this statement still true? And if so, is this something that can be done through CircuitPython?

I ask because I am trying to decide on a single configuration to use going forward on my project. If I stay with S3 modules, it would be great if I could just configure them to use one core, if that hides the problem for now. If not, I may be looking at the Pi Pico W as my core board. The idea is a single design using the same components and configuration (MCU, display, sensors, etc.), so that any later change or update is an easy rollout to all my deployments.

@dave4jr

dave4jr commented Aug 21, 2024

@Timeline8 @bill88t Hey guys, what’s the status on this? I’m also experiencing this issue on 9.1.1 on an ESP32-S3 Feather, using ADC on A4 while connected. It runs for about 15 to 20 minutes and then I get the watchdog error. Did you guys find a workaround? How do you configure CONFIG_FREERTOS_UNICORE in CP? Thanks a bunch!

@bill88t

bill88t commented Aug 21, 2024

@dave4jr I ran some more tests, force-feeding the watchdog so that we get no watchdog resets, only to get a hang instead.
(Which the watchdog does not save us from.)
I cannot get a crash dump, so I cannot debug this any further.

The only workarounds currently confirmed working are:

  • Having bluetooth paired.
  • Configuring CONFIG_FREERTOS_UNICORE at compile time.

Notes inside my head that could prove useful:

  • IDF 5.3 fixed the similar temperature bug, perhaps tracking back the commit will yield answers.
  • The hang is now a hang in the true sense: the core just *Windows XP logout sound*.
  • How does enabling bluetooth fix this?

@dhalbert Maybe it's unicore time? Letting this bug linger in stable isn't a suitable alternative really..

@Timeline8
Author

@dave4jr, as @bill88t mentioned, this was never resolved unfortunately. In the meantime I have reluctantly been trying out DS18B20 "1-Wire" bus temperature sensors, which work well enough but come with their own set of inconveniences and problems not present when using a simple thermistor (which means extra coding just to try to protect against these added problems). My preference would be to go back to my thermistor setup, but this ADC bug makes that impossible on S3 modules when also using Adafruit IO feeds.

@bill88t, I agree that this should be somehow fixed or worked around in one of the next stable revisions of CP. I am rather surprised this has not come up with more people as ADC and Wifi are both kind of fundamental features of the ESP32-S3 modules and while many may be instead utilizing ready made 1-wire/I2C/SPI connected sensors, I would assume there is some not insignificant number of makers using plain old ADC with Wifi.

@steveputz

@dhalbert...
I'm not sure if I'm seeing this same bug. I don't believe my program is using ADC (yet).

I've been experiencing infrequent but repeated watchdog timer resets on my ESP32-S3 Reverse TFT board running CircuitPython 9.1.1. By infrequent, I mean my program typically runs for one or more days before a watchdog reset halts it.

Although my program initializes an ADC AnalogIn on board.A5, it never actually reads it (I will be using it eventually).
I am reading board.A4 as a digital input pin, debounced using keypad.Keys() from an asyncio task.
When an occasional pulse is detected, data is sent to an Adafruit IO feed.
(interval is between several seconds up to 10 minutes).

I'm also using I2C to read RTC (from PCF8523), temperature (from MCP9808), and battery voltage (from MAX17048).
I'm periodically displaying to the TFT and NeoPixel about once per second from a second task.
Web Workflow is enabled via settings.toml, but not used except for software changes.
There are Serial print statements in the code, but USB is not connected.

Do you think it will help to update to CircuitPython 9.1.3 ?
Or to set CONFIG_FREERTOS_UNICORE ?

@dhalbert
Collaborator

@steveputz A few comments:

  • I seem to remember other reports of watchdog timeouts unrelated to ADC.
  • If you remove the AnalogIn object creation, does it make any difference? It might take days to find out.
  • We will have a 9.2.0-alpha.2352 release soon, which upgrades a few things, including ESP-IDF. That would be worth trying.

@steveputz

steveputz commented Sep 10, 2024

@dhalbert ...

  • After the previous restart, it has run for 22 hours without a reset, but that's not unusual.
  • I just commented out the AnalogIn object creation and restarted.
  • Are there any relevant changes between 9.1.1 and 9.1.3 suggesting I should upgrade to 9.1.3, or should I wait for CircuitPython 9.2?
  • Any suggestions of what might be triggering the infrequent watchdog timeouts? I could artificially increase frequency of something (e.g. WiFi operations, display updates, etc.)
  • Both of the program's asyncio tasks do WiFi requests to AIO. Might they be stepping on each other very occasionally?

@Timeline8
Author

@steveputz When 9.1.3 came out, I tested the ADC functionality because it has an updated ESP-IDF, but it still crashed for me when reading ADC values. So it might be best to wait for 9.2.0-alpha.2352 if it is going to include another ESP-IDF update. However, I don't know of any good reason not to upgrade from 9.1.1 to 9.1.3. I tend to always install the latest stable version whenever I am working with a board, and sometimes the latest development release when the mood strikes me. So far so good.

@dhalbert
Collaborator

The changes between 9.1.1 and 9.1.3 are slight, and mostly unrelated to Espressif in general. However, it's always good to update, because that narrows the space of where to look for a problem. 9.1.0 did an ESP-IDF update. 9.1.1 and later did not and won't (not on a bug-fix point release).

@bill88t

bill88t commented Sep 12, 2024

IDF 5.3.1 fixes this. I hope, I pray.
I have been running while True: adc.value for 15 minutes now with wifi connected, and it hasn't crashed.
The weird random 0 values are also gone, the adc values seem a lot more uniform.

Oh and I flashed the latest artifact on my watch, so it'll get uptime-tested.

@bill88t

bill88t commented Sep 12, 2024

Update: Left overnight.
This issue is resolved on the latest artifact.

@dhalbert
Collaborator

IDF 5.3.1 fixes this. I hope, I pray.

Was that hope based on reading something in the release notes? The only explicit ADC thing mentioned is:
espressif/esp-idf#14124.

@bill88t

bill88t commented Sep 12, 2024

Was that hope based on reading something in the release notes?

No, just on testing.
I retest everything I'm concerned about after idf / micropy core updates.

The idf release notes are incredibly long.
I physically don't have the time to read them.

The wording "I hope" was chosen because of the bluetooth workaround, which patched it until enabling bluetooth became conditional, leaving me uncertain whether the issue is really gone.

@Timeline8
Author

Dan, I agree. I searched the release document for version 5.3.1, and a search for "ADC" did not turn up any mention of this specific issue. Since they didn't explicitly mention this problem or a specific fix, I remain a bit skeptical: if it works now, that may be accidental, and the problem could come back in a later release.

@bill88t

bill88t commented Sep 12, 2024

Well, if it does appear again, we will at least be able to trace down which commit caused it.
So it's not all that bad, I guess.

The test case is simple:

import board, analogio, wifi
wifi.radio.connect(...)
adc = analogio.AnalogIn(board.Ax)
while True:
  adc.value

So long as this passes and doesn't die within seconds or minutes we shouldn't have trouble with adc.
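
A slightly longer variant of the same test, as a sketch only. It adds a time limit and counts zero readings, since spurious 0 values were reported earlier in this thread. The function names, the "A0" pin name, the 900-second duration, and the use of the CIRCUITPY_WIFI_SSID / CIRCUITPY_WIFI_PASSWORD keys from settings.toml are all assumptions; substitute whatever applies to your board.

```python
import time


def soak_test(read_adc, duration_s=60.0, report_every=1000, clock=time.monotonic):
    """Call read_adc() in a tight loop for duration_s seconds.

    Returns (total_reads, zero_reads). A hang or reset never returns,
    which is exactly the failure this loop is meant to expose.
    """
    total = 0
    zeros = 0
    start = clock()
    while clock() - start < duration_s:
        if read_adc() == 0:
            zeros += 1
        total += 1
        if total % report_every == 0:
            print(total, "reads,", zeros, "zero readings")
    return total, zeros


def run_on_board(pin_name="A0", duration_s=900.0):
    """Hardware entry point: call this from code.py on the device."""
    # Imported here so the loop above can also be exercised off-device.
    import os
    import analogio
    import board
    import wifi

    wifi.radio.connect(
        os.getenv("CIRCUITPY_WIFI_SSID"), os.getenv("CIRCUITPY_WIFI_PASSWORD")
    )
    adc = analogio.AnalogIn(getattr(board, pin_name))  # pin name is a placeholder
    return soak_test(lambda: adc.value, duration_s)
```

Passing the reader and clock in as arguments keeps the loop itself free of hardware imports, so it can be tried with a fake reader before flashing.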

@Timeline8
Author

I assume this won't be rolled out until CircuitPython 9.2.0-alpha.2352 (with 2351 being the current one)?

@bill88t

bill88t commented Sep 12, 2024

I assume this won't be rolled out until CircuitPython 9.2.0-alpha.2352 (with 2351 being the current one)?

CircuitPython has CI/CD, you can download the latest artifact for your board from:
https://adafruit-circuit-python.s3.amazonaws.com/index.html?prefix=bin/

@tannewt
Member

tannewt commented Sep 12, 2024

Ok, closing since it appears fixed. Feel free to request a reopen or make a new issue if it happens again.

@tannewt tannewt closed this as completed Sep 12, 2024