ESP32-S3 ADC use causes crashes when WiFi in use #9291
Comments
|
Additional observation this morning. I put the S3 TFT Feather on a breadboard so I could hook up physical thermistors. The only change to the code is that the resistors I used are in a 1k SIP, so the setup code for the two thermistors changed from 10k to 1k. It crashed in the 194th loop with the divide by zero error that I have been seeing periodically. If I simply try a CTRL-D or a Save in MU to soft boot, the code immediately crashes on the first loop. I repeated CTRL-D multiple times in a row to verify. Restarting the code via soft boot does not clear from memory whatever is causing this failure. Hitting the board's RST button does restart the code properly; it then ran over 400 loops before going to safe mode with the internal watchdog timer expired error. I have pasted the REPL below of the div by zero message in the 194th loop and then a subsequent Save (soft boot) in MU. I did notice that on my S2 running this code (I also have it running on a Pico W without issue), I was taking samples for my averaging function every 50ms rather than the 20ms posted here. I changed my S3 TFT Feather to 50ms, but it still crashes with the WD timer expired Safe Mode message.
|
The division by zero error is:
if self.high_side:
    # Thermistor connected from analog input to high logic level.
    reading = self.pin.value / 64
    reading = (1023 * self.series_resistor) / reading
If the analog pin value is 0, then it divides by 0. |
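A minimal sketch of how a caller could guard against that zero reading, assuming the high-side formula quoted above followed by the usual subtraction of the series resistor. The function name `high_side_resistance` and the choice to return `None` are hypothetical and are not part of the adafruit_thermistor library:

```python
def high_side_resistance(raw_value, series_resistor):
    """Return thermistor resistance in ohms, or None if the reading is 0.

    raw_value is a 16-bit AnalogIn-style value (0..65535). This mirrors the
    high-side branch quoted above; returning None is a hypothetical guard.
    """
    reading = raw_value / 64  # scale the 16-bit value down to a 10-bit range
    if reading == 0:
        return None  # the unguarded division would raise ZeroDivisionError
    resistance = (1023 * series_resistor) / reading
    return resistance - series_resistor  # assumed: remove the series resistor


print(high_side_resistance(0, 10000))      # guarded zero reading
print(high_side_resistance(32736, 10000))  # roughly mid-scale reading
```

This only hides the symptom in user code. It does not explain why a pin held at a valid voltage would ever read 0.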
Which is what I thought might be happening when I was running with no hardware connected to the pins, where random noise at the open pins could result in a zero reading. This is why I made sure to add the thermistors & resistors this morning, so there should be no way to get 0V at the pin, and I still experienced the divide by zero fault. It ran the loop 193 times with valid room temperature readings in the mid/upper 70s °F for both thermistors, then 3 seconds later on loop 194 it read the first thermistor and failed reading the second thermistor. It doesn't seem like a hardware problem with my simple circuit. But that is kind of beside the point. I cannot get this code to run for any length of time without some sort of crash on an ESP32-S3, whereas the same code on an S2 and on a Pico W has run nonstop day in and day out for weeks now. With the S3 it usually goes into safe mode with an internal WD timer expired failure, with the board disconnecting and sometimes reconnecting; sometimes it is the divide by zero; and still other times the board disconnects with the Neopixel going steady white but the code still running, where I can see the measurements being reported every 3 seconds on the TFT but obviously the neopixel part of the code is doing nothing anymore. One last test. I removed my thermistors and replaced them with a second 1k SIP so that the A0 and A1 pins are getting a straight fixed resistor voltage divider, verified with my meter that both pins are at 1.65V. Reran code as is (so temp reported is 94°C 211°F since I didn't change the thermistor line from 10k@25°C). It managed to run over a half hour before it bombed out with the "Internal watchdog timer expired." safe mode message. At least no divide by zero. |
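For context on why a fixed resistor reads so hot: a sketch of the standard B-parameter thermistor equation. It assumes the library uses this form, the beta of 3695 from the code posted later in this thread, and a computed resistance of about 1k (1k series resistor set in code, 1k fixed resistor in place of the thermistor). The function name `beta_temperature_c` is hypothetical:

```python
import math


def beta_temperature_c(resistance, nominal_resistance=10000.0,
                       nominal_temp_c=25.0, b_coefficient=3695.0):
    """Temperature in deg C from the B-parameter thermistor equation."""
    inv_t = math.log(resistance / nominal_resistance) / b_coefficient
    inv_t += 1.0 / (nominal_temp_c + 273.15)  # 1/T0 in kelvin
    return 1.0 / inv_t - 273.15


# A fixed 1k resistor interpreted as a thermistor that is 10k at 25 C
print(round(beta_temperature_c(1000.0), 1))
# A resistance equal to the nominal value gives the nominal temperature
print(round(beta_temperature_c(10000.0), 1))
```

Under these assumptions a 1k resistance works out to a temperature in the low 90s °C, in the same region as the reading reported above.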
Just to be clear, you are getting things like the watchdog crash or divide by zero on 9.1.0-beta.3 as well as 9.0.x, right? We moved to a new version of ESP-IDF for 9.1.0, and I want to make sure that version still has the issue. |
I am fairly sure I did but re-reading my tedious notes above, I see I don't have an entry stating that. However I did note I ran the beta in the forum discussion. But I will double check tonight as the Waveshare should still be running the 9.1.0-beta.3. On the Adafruit S3 TFT I only ran that one on 9.0.4 as I thought it might be the version on the two S2s I have running this code for months without issue but checking one of the S2s, it is at 9.0.3. I can upgrade the S3 TFT tonight to 9.1.0-beta.3 and retest it as well. And I got a notice yesterday that my backorder from Digikey for the QtPy S3 I ordered has shipped, so when I get it I will load 9.1.0-beta.3 on that one as well and see how it acts (my S3 REV TFT is sadly still on backorder). I will report everything I find. Do you think there is any value loading 9.0.3 onto any of the S3 boards since the S2 boards run fine with it? Or do you think this is an S3 specific issue and we should only be looking upward and onward with current versions only? |
Based on your testing, I think this is an S3-specific issue. S3 with 9.1.0-beta.3 is the only test still to do, I would say. If that's a fix, great, otherwise I will set up an S3 board and let it run for hours. |
2024-06-04 Waveshare - verified it was still on the beta that I last ran on it. From boot_out.txt… Running with the TFT to see the code running, using 2x 1k SIP packages to create a voltage divider at pins D7 & D8 to simulate the thermistors, and verified with a meter that each pin had 1.64V while running. Looped just over 100 times (~5 minutes) before the board disconnected from my iMac, MU and MacOS both reported disk ejection, the Neopixel changed to steady white, and the board did not reconnect (MU serial window closed and the icon shows no board connected). However, the TFT shows the code continuing to run normally and as I type this it is at loop 180. S3 TFT - Ran “circup update --all” first to get the suggested link for the newest version of CP and also updated the libraries. Downloaded the beta and loaded it. From boot_out.txt… Hard reset board (pulled USB cable). Reduced the code from the Waveshare because the external TFT code is not required. Same 2x 1k SIP resistors to create a resistor divider voltage at A0 and A1. Verified 1.57V at each pin while running. Ran for 45+ minutes and last I checked it was over 700 loops. I came back a little later and it had disconnected, but had then reconnected and restarted. The loop was up to 170+ on the TFT. The Neopixel was acting normally per the code. CTRL-C, ejected board, power cycled it, and restarted. So far this second run it is behaving itself and has made it over an hour and is at loop 1385. |
And to add to the end of the last post, the S3 Feather disconnected and reconnected 4 times sometime between about 11pm and 2am, and finally stopped in Safe Mode with an "Internal watchdog timer expired". |
On either board, do you have a It certainly sounds like the boards are hard-crashing resetting spontaneously. I will try a very simple test that simply reads as fast as possible from the ADC. |
Yes, I still have the WiFi settings auto connecting like that. We did talk about disabling that over at the forums but I don't think I tried it. So that is now on my to-do list for tonight: Run both the Waveshare S3 and Feather TFT S3 without the .toml file. If there is a conflict between the two (ADC and WiFi), that will be disappointing for me since I need both for my application, so I won't be able to simply not use WiFi. But the problem can't be fixed later if we don't narrow it down to a root cause, so I will try it sans WiFi. Also just received my QtPy S3 today. So if the other two crash relatively quickly with the WiFi off, I can try the QtPy. While I wouldn't think the model board matters as they are all S3, I have noticed that the Waveshare is good about crashing sooner while the Feather likes to wait until later. |
I was just testing on a QT Py ESP32-S3, with the very simple test program below. With So now I know how to reproduce this. It is strange because ADC2 is supposed to be shared with WiFi, not ADC1. But maybe something is interrupting the conversion in some bad way. No need for you to test further at this point. Thanks for persevering through this.
Test program with two 1kohm resistors forming a 3.3/2 voltage divider connected to pin A2:
import analogio
import board
a2 = analogio.AnalogIn(board.A2)
count = 0
while True:
    count += 1
    if count % 100000 == 0:
        print("count", count)
    v = a2.value
    if v < 32000 or v > 33000:
        print(count, v) |
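The 32000 to 33000 window in the test program above follows from the 3.3/2 divider: an ideal mid-scale reading on a 16-bit scale sits near 32768. A small host-side sketch of that arithmetic, assuming a 3.3V reference and an ideal ADC (real calibrated readings will differ somewhat). The helper names `expected_reading` and `in_window` are hypothetical:

```python
def expected_reading(v_pin, v_ref=3.3, full_scale=65535):
    """Ideal 16-bit AnalogIn-style value for a given pin voltage."""
    return round(v_pin / v_ref * full_scale)


def in_window(value, low=32000, high=33000):
    """True when a reading passes the same window the test program uses."""
    return low <= value <= high


mid = expected_reading(3.3 / 2)
print(mid, in_window(mid))
print(in_window(0))  # a spurious zero reading falls outside the window
```

Any value printed by the test program is therefore a reading well away from the divider voltage, not ordinary noise.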
Testing with |
Hi Dan, Thank you very much for confirming this. Since I first reported this on the forums, and after days of replies there and here where no one seemed to be interested in actually running the code to see if it could be reproduced, I was starting to wonder if it was me and everyone was just being polite by not telling me I'm the idiot. ;) You and I actually had a conversation on the forums about WiFi vs ADC about two months ago because I had read about the possible interference between the two and was concerned about whether I should be making a point to use ADC1 to avoid that. Regardless, it looks like your testing shows this is a different issue since you got the crash on both ADC1 & 2. I guess for now I will have to proceed with my projects targeting S2 boards. The project I am slowly developing as I learn CP, and adding features to as I go, depends on both ADC and WiFi for IO Feeds. |
@Timeline8 Rest assured we were interested, but if we can delegate some testing to eliminate possibilities, then we try that. (We have all too many bugs to look at 🙂). For instance, was I hope we can figure this out soon, because broken ADCs when wifi is in use is a pretty serious limitation. As your testing indicates, the S2 boards could be a substitute. If you are interested in more precise temperature readings, then you could use an external I2C ADC breakout board. If you are measuring ambient air temperature (not liquids), then one of the I2C temperature breakouts (there are many) could be used. But the easiest would be to fix ESP32-S3, of course. |
I just ran into this with the cardputer. I just finished the battery driver, which uses IO10 of the ESP32-S3.
The polling rate was 30 samples/s. The code accounts for |
Attempted this patch, which reduces the points of failure:
--- a/ports/espressif/common-hal/analogio/AnalogIn.c
+++ b/ports/espressif/common-hal/analogio/AnalogIn.c
@@ -115,10 +115,10 @@ uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) {
     #endif
     uint32_t adc_reading = 0;
-    size_t sample_count = 0;
+    int sample_count = 0;
     // Multisampling
     esp_err_t ret = ESP_OK;
-    for (int i = 0; i < NO_OF_SAMPLES; i++) {
+    while (sample_count < NO_OF_SAMPLES) {
         int raw;
         ret = adc_oneshot_read(adc_handle, channel, &raw);
         if (ret != ESP_OK) {
@@ -127,9 +127,6 @@ uint16_t common_hal_analogio_analogin_get_value(analogio_analogin_obj_t *self) {
         adc_reading += raw;
         sample_count += 1;
     }
-    if (sample_count == 0) {
-        raise_esp_error(ret);
-    }
     adc_reading /= sample_count;
     // This corrects non-linear regions of the ADC range with a LUT, so it's a better reading than raw
It didn't work. I think the error is in esp-idf. I will attempt to create a minimal esp-idf application implementing oneshot adc and wifi. |
Yep, fun stuff:
Stock idf 5.2.1, untouched oneshot adc example.. |
possibly related? espressif/esp-idf#12466 |
I used a YD-ESP32-S3 for this, which is a dual USB-C board that is excellent for debugging, and built some debug builds.
import wifi, board, analogio
wifi.radio.connect("SSID", "PASSWD")
a = analogio.AnalogIn(board.GPIO10)
while True:
    a.value
This reliably crashes it. Maximum 15s. Decoded backtrace of debug build (clean tree, current master):
For this crash, the reason was:
I feel like it is. I think memory corruption takes place. I only sometimes get a coredump. Sometimes usb just dies and debug serial, 5 seconds after usb has died, says that:
As if it didn't crash. |
@dhalbert, no problem with me doing some of the upfront testing. I know from my own experiences learning CircuitPython and also from reading other people's pleas for help that 99% of the time it is user error. So no harm in pushing back a bit on the user to kind of "prove it". As for my project, it is for aquarium monitoring and eventually some controls later. Therefore I am sensing water temperature. Thermistors are well suited for this. Simple to use, and easy to waterproof if needed. I would love to see the built-in ADC of the S3 back on track again, but the S2 is just as capable (I don't need dual cores to read a temperature once every five minutes) so I still have a path forward. |
@Timeline8: have you thought about using DS18B20-sensors? They are available in a waterproof enclosure. And they are easy to use. |
You can download the "Absolute Newest" build from the downloads page for your board. |
Yes, and it is already fixed in builds with |
Oh ya, totally forgot about that link on those pages. Duh! Thanks. |
Appreciate everyone's help, but I am still experiencing problems. Back to the Waveshare S3 Zero board, I downloaded adafruit-circuitpython-waveshare_esp32_s3_zero-en_US-20240613-main-PR9325-03e42a8.uf2 and installed it. Did it a couple of times due to still having problems. The boot_out.txt reads: Adafruit CircuitPython 9.1.0-beta.3-28-g03e42a8c0c on 2024-06-13; Waveshare ESP32-S3-Zero with ESP32S3 There is a more recent version by one day but it is (name truncated) ...PR9318-ed5591c.uf2 so I didn't try that one as 9318 is earlier than 9325. Am I downloading the correct version (first file name listed above)? I ask because as I am playing with it I am experiencing disconnects, resets, and the full brightness WHITE NeoPixel issue. However the code either restarts or remains running (I see it on my TFT) in the case of the white neopixel. I have not experienced any safe mode crashes due to the internal watchdog, so I guess that is something. |
Sorry to hear that. Which pin are you using, and do you know whether it's an ADC1 or an ADC2 pin? The UF2 you're downloading is correct. PR's (pull requests) are not merged in order so the order is not significant. |
D7 & D8 so IO7 & IO8 which go to GPIO7 & GPIO8 of the S3 (fortunately Waveshare kept numbers the same on the board vs the ESP32) which according to the datasheet for the ESP32-S3 is ADC1 (cut & pasted from table in datasheet)
My full code that I am running at this moment where I am seeing this is ...
import gc
import time
import board
import neopixel
from random import randint
import busio
import displayio
from fourwire import FourWire
from adafruit_st7789 import ST7789
from adafruit_thermistor import Thermistor
displayio.release_displays()
spi = busio.SPI(clock=board.D1, MOSI=board.D2)
tft_res = board.D3
tft_dc = board.D4
tft_cs = board.D5
tft_blk = board.D6 # TFT's backlight control
display_bus = FourWire(spi, command=tft_dc, chip_select=tft_cs, reset=tft_res)
display = ST7789(
    display_bus,
    width=240,
    height=135,
    rowstart=40,  # (320 - width) / 2
    colstart=53,  # (240 - height) / 2
    rotation=270,
    backlight_pin=tft_blk,
)
# Row and column start above are because ST7789 driver is for 320x240 display
# size so need to center real display in that virtual space.
display.brightness = 1.0 # between 0 and 1
led = neopixel.NeoPixel(board.NEOPIXEL, 1, brightness=0.03)
# Setup thermistor for readings
# pin, resistor, therm, @temp, beta, therm on high side
therm7 = Thermistor(board.D7, 10000, 10000, 25, 3695, high_side=True)
therm8 = Thermistor(board.D8, 10000, 10000, 25, 3695, high_side=True)
def get_average_temp(pin):
    readings = []
    for _ in range(5):
        reading = pin.temperature
        readings.append(reading)
        time.sleep(0.02)  # 20ms delay between readings
    average_temp_c = sum(readings) / len(readings)  # Average the C reading
    average_temp_f = (average_temp_c * 1.8) + 32  # Convert the Ave C Reading to F
    return average_temp_c, average_temp_f
# main loop
count = 0
while True:
    count += 1
    print(count, f"{gc.mem_alloc()=}")
    average_temp_c, average_temp_f = get_average_temp(therm7)
    print(
        f" therm7 : {average_temp_c:.1f}\u00b0C {average_temp_f:.1f}\u00b0F"
    )
    average_temp_c, average_temp_f = get_average_temp(therm8)
    print(
        f" therm8 : {average_temp_c:.1f}\u00b0C {average_temp_f:.1f}\u00b0F\n\n"
    )
    led[0] = (randint(0, 255), randint(0, 255), randint(0, 255))
    time.sleep(0.1) |
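As a stopgap for the intermittent ZeroDivisionError only, the averaging step in `get_average_temp` above could skip failed reads instead of letting the exception end the program. This is a sketch that runs on a desktop Python; `average_valid` and `fake_read` are hypothetical names, the 20ms delay is left out for brevity, and nothing here addresses the watchdog resets or USB disconnects:

```python
def average_valid(read_fn, samples=5):
    """Average samples from read_fn, skipping reads that raise ZeroDivisionError.

    Returns (average, failures); average is None if every read failed.
    """
    readings = []
    failures = 0
    for _ in range(samples):
        try:
            readings.append(read_fn())
        except ZeroDivisionError:
            failures += 1  # a raw ADC value of 0 trips the library division
    if not readings:
        return None, failures
    return sum(readings) / len(readings), failures


# Hypothetical stand-in for `lambda: therm7.temperature`; 0 marks a bad read
fake = iter([25.0, 0, 25.2, 24.8, 25.0])


def fake_read():
    value = next(fake)
    if value == 0:
        raise ZeroDivisionError("division by zero")
    return value


print(average_valid(fake_read))
```

On a board the call would look like `average_valid(lambda: therm7.temperature)`. Logging the failure count is worthwhile, since a rising count is itself evidence of the underlying ADC problem.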
Adding this here to note how the bug looks with #9344. |
Circling back to this comment, which was before you thought it was fixed only for us to find it may not be, is this statement still true? And if so, is this something that can be done through CircuitPython? I ask because as I try to decide which single configuration to use going forward on my project using S3 modules would be great if I could just configure it to one core if that hides the problem for now. If not, I may be looking at the Pi Pico W as my core board. The idea being a single design using the same components and configuration (MCU, display, sensors, etc.) in order to facilitate having any change or update later be an easy roll out to all my deployments. |
@Timeline8 @bill88t Hey guys, what’s the status on this? I’m also experiencing this issue on 9.1.1 on an esp32 s3 feather. Using adc on A4 while connected. Runs for about 15 - 20 min and get the watchdog error. Did you guys find a workaround? How do you configure the CONFIG_FREERTOS_UNICORE in CP? Thanks a bunch! |
@dave4jr I did perform some more tests, force-feeding watchdog, so we get no watchdog resets, only to get a hang. The only workarounds currently confirmed working are:
Notes inside my head that could prove useful:
@dhalbert Maybe it's unicore time? Letting this bug linger in stable isn't a suitable alternative really.. |
@dave4jr, as @bill88t mentioned, this was never resolved unfortunately. In the meantime I have reluctantly been trying out DS18B20 "1-Wire" bus temperature sensors, which work well enough, but they come with their own set of inconveniences and problems not present when using a simple thermistor (which means extra coding just to try and protect against these added problems). My preference would be to go back to my thermistor setup but this ADC bug makes that impossible on S3 modules when also using Adafruit IO feeds. @bill88t, I agree that this should be somehow fixed or worked around in one of the next stable revisions of CP. I am rather surprised this has not come up with more people, as ADC and WiFi are both kind of fundamental features of the ESP32-S3 modules, and while many may instead be using ready-made 1-Wire/I2C/SPI connected sensors, I would assume there is some not insignificant number of makers using plain old ADC with WiFi. |
@dhalbert... I've been experiencing infrequent but repeated watchdog timer resets on my ESP32-S3 Reverse TFT board running CircuitPython 9.1.1. By infrequent, I mean my program typically runs for one or more days before a watchdog reset halts it. Although my program initializes an ADC AnalogIn on I'm also using I2C to read RTC (from PCF8523), temperature (from MCP9808), and battery voltage (from MAX17048). Do you think it will help to update to CircuitPython 9.1.3 ? |
@steveputz A few comments:
|
@dhalbert ...
|
@steveputz When 9.1.3 came out because it has an updated ESP-IDF, I tested the ADC functionality but still crashed for me when reading ADC values. So might be best to wait for 9.2.0-alpha.2352 if it is going to include another ESP-IDF update. However I don't know of any good reason not to upgrade from 9.1.1 to 9.1.3. I tend to always install the latest stable version whenever I am working with a board and sometimes the latest development release when the mood strikes me. So far so good. |
The changes between 9.1.1 and 9.1.3 are slight, and mostly unrelated to Espressif in general. However, it's always good to update, because that narrows the space of where to look for a problem. 9.1.0 did an ESP-IDF update. 9.1.1 and later did not and won't (not on a bug-fix point release). |
IDF 5.3.1 fixes this. I hope, I pray. Oh and I flashed the latest artifact on my watch, so it'll get uptime-tested. |
Update: Left overnight. |
Was that hope based on reading something in the release notes? The only explicit ADC thing mentioned is: |
No, just on testing. The idf release notes are incredibly long. The wording "I hope" was chosen because of the bluetooth workaround, which patched it, till enabling bluetooth became conditional, making me feel uncertain as to if the issue is really gone. |
Dan, I agree in that I went and searched the release document for the 5.3.1 version and a search for "ADC" did not turn up any mention of this specific issue. Since they didn't explicitly mention this problem nor a specific fix, I remain a bit skeptical that if it works now it might be more accidental and potentially something that could happen again on a later release. |
Well, if it does appear again, we will at least be able to trace down which commit caused it. The test case is simple:
import board, analogio, wifi
wifi.radio.connect(...)
adc = analogio.AnalogIn(board.Ax)
while True:
    adc.value
So long as this passes and doesn't die within seconds or minutes we shouldn't have trouble with adc. |
I assume this won't be rolled out until CircuitPython 9.2.0-alpha.2352 (with 2351 being the current one)? |
CircuitPython has CI/CD, you can download the latest artifact for your board from: |
Ok, closing since it appears fixed. Feel free to request a reopen or make a new issue if it happens again. |
CircuitPython version
Code/REPL
Behavior
Various failures, but usually the crashes share in common: MU pops up “Could not find an attached drive”, Mac OS pops up “Disk Not Ejected Properly”, and MU of course has closed the serial window so there is nothing to see. Printing gc.mem_alloc() with each loop in my code shows allocated memory in the 4000-8000 range, so no apparent runaway memory issues.
Sometimes the board will disconnect and come back, and the code stays running but the Neopixel is steady white like it is in the REPL. Other times it crashes with 3x yellow blinking (Safe mode) and reports an internal watchdog timer expired.
I have an S2 board that is on 9.0.4 and has been running this code for many weeks and sending the data to an IO feed. No chronic crashes like the S3 boards.
Description
What follows is the long list of notes I have been taking as I tried different things. But the above, in behavior, is the executive summary. Below is tedious reading. Sorry...
Testing notes:
Waveshare ESP32-S3 Zero running 9.0.5 and libraries updated via Circup is starting with the “code chooser” code discussed here https://forums.adafruit.com/viewtopic.php?t=210926 starting with the 6th post down.
Code I am running (“choosing”) is a dual thermistor reading in a roughly 3+ second long loop that reads two thermistors and then changes the color of the Neopixel 3 times once per second.
Crashes share in common: MU pops up “Could not find an attached drive”, Mac OS pops up “Disk Not Ejected Properly”, and MU of course has closed the serial window so there is nothing to see. Printing gc.mem_alloc() with each loop in my code shows allocated memory in the 4000-8000 range, so no apparent runaway memory issues.
First time I had no display configured so I could not see an error and the Neopixel wasn’t indicating any activity. Added code to display the serial window on an external display. Reset board with reset button. I failed to note if the drive had reloaded itself after the crash and before resetting the board.
Second time same “crash”. Observed the Neopixel to be constant white indicating it was in the REPL, however the external display showed the code was still running and getting valid thermistor readings. Crash happened around 290-300 loops. CIRCUITPY remounted its drive I believe but not certain. I let it run for a while longer then reset the board with the reset button.
Third time crashed at loop 272, this time stopping and Neopixel flashing yellow in three blink bursts (safe mode). Reopening the MU serial window, it failed due to “Internal watchdog timer expired.” Noted for sure that the CIRCUITPY drive had remounted. Ejected drive and power cycled the board by unplugging the USB cable.
Fourth time crashed at loop 233 (gc.mem_alloc at 5568). Same as third run with code stopped, three yellow flashes, and “Internal watchdog timer expired” in the reopened MU serial window.
Switching gears… Renamed the “code chooser” program from code.py and made my thermistor code code.py so it will load and run directly without the chooser resetting the MCU. Also power cycle reset the board.
Different type of crash this time. At loop 53 (mem = 5168). Drive did not unmount and the error in the REPL is
Traceback (most recent call last):
  File "code.py", line 66, in <module>
  File "code.py", line 46, in get_average_temp
  File "adafruit_thermistor.py", line 126, in temperature
  File "adafruit_thermistor.py", line 116, in resistance
ZeroDivisionError: division by zero
Odd. Normally I use 10k resistors with my 10k thermistor but this time I only had 1k resistors on hand. But I wouldn’t think that should matter. Source code for the library doesn’t indicate any restrictions on the resistor range. I believe this failure is just a result of random values when no thermistor is attached.
Ran again and it made it to run 72 but with the same divide by zero error. Switched to 10k resistors. Hard reset. Made it to run 38, and the previously described crash scenario (disk eject & reconnect, safe mode with an “Internal watchdog timer expired” error) is back. Done for the night!
Next day. Backed up entire Waveshare CIRCUITPY drive. Ran one more time as is. Crashed with the Neopixel showing steady white (REPL indicator) but code was still running. MU and Mac OS both reported drive ejected. Board did not remount and MU doesn’t see it.
Adafruit REV TFT S2 Feather. Copied over all the files that were on the Waveshare. Also verified 9.0.5 and ran Circup to verify all libraries were up to date (all were). Commented out all code that had anything to do with the external display. Thermistors on breadboard changed from D6 and D7 to A0 and A1. No failures after a few hours.
Switched back to Waveshare and ran as is. Eventually failed with the REPL white neopixel, ejected disk, but kept running. Drive did not remount. Did full reinstall of boot loader then 9.0.5. Copied over backed up files onto the MCU again. Hard power cycle reset. Restarted code. Crashed at cycle 288, 3 yellow blink safe mode and “Internal watchdog timer expired” and drive remounted.
Commented out all thermistor stuff and just ran the neopixel and gc memory allocation. Ran 13908 loops without issue (over 12 hours). Uncommented thermistor code and restarted the run (hard reset). Made it 457 loops (a little over 20 minutes) and crashed with the board disconnecting and the TFT fade to black and back in about 3 second pulses.
Restarted as is after getting home from work. Got to about 275, white NeoPixel, still running code, and disconnected. Moved it to a power supply connection only (not computer) and restarted. Looks like it crashed the same way with white Neopixel and code still displaying new lines.
Copied same drive contents to S3 TFT Feather running 9.0.4 and started it on the computer (no thermistors connected). S3 TFT Feather crashed, disconnected, reconnected and reports Safe Mode for Internal Watchdog timer expired. Restarted S3 TFT Feather. Dies same way.
Additional information
No response