QT PY ESP32-S2 core crash when Wi-Fi router is power-cycled #7230

Open
rdagger opened this issue Nov 19, 2022 · 42 comments

@rdagger

rdagger commented Nov 19, 2022

CircuitPython version

Adafruit CircuitPython 8.0.0-beta.4-21-g8f414eb4e on QT PY ESP32-S2.  

Code/REPL

from secrets import secrets
from time import sleep
import wifi

def connect_to_wifi():
    """Connect to Wi-Fi."""
    print(f'Connecting to {secrets["ssid"]}...')
    # Test Wi-Fi signal strength
    rssi = -255
    for network in wifi.radio.start_scanning_networks():
        if network.ssid == secrets["ssid"] and network.rssi > rssi:
            rssi = network.rssi
    wifi.radio.stop_scanning_networks()
    print(f'Wi-Fi rssi: {rssi} db')

    while True:
        try:
            wifi.radio.connect(secrets["ssid"], secrets["password"])
            print(f'Connected to {secrets["ssid"]}')
            break
        except Exception as wifi_err:
            print(f"Wi-fi connection error: {wifi_err}")
        sleep(1)

connect_to_wifi()

gateway_ip = wifi.radio.ipv4_gateway

while True:
    # Check Wi-Fi is connected
    ping = wifi.radio.ping(gateway_ip, timeout=2)
    if ping is None:
        print("Gateway could not be pinged!")
        connect_to_wifi()
        continue
    print(f"Ping: {ping}") 
    sleep(1)

Behavior

Had 4 QT-PY ESP32-S2's running in my office. They all core crashed when I restarted my Wi-Fi router. Simplified my code and reproduced the problem multiple times.
The code runs until the router is rebooted or powered down. Then it throws the following error:

Wi-Fi: off | Done | 8.0.0-beta.4-21-g8f414eb4e
Auto-reload is off.
Running in safe mode! Not running saved code.

You are in safe mode because:
CircuitPython core code crashed hard. Whoops!
Crash into the HardFault_Handler.
Please file an issue with the contents of your CIRCUITPY drive at
https://github.com/adafruit/circuitpython/issues

Description

I don't think this is a duplicate of "ping too frequently results in Safe Mode #5980" because I am waiting 1 second between pings and the code doesn't crash from frequent pings. Instead, it crashes while the Wi-Fi is trying to reconnect. Furthermore, I've had a more complicated version of the code running for several days on 4 QT-PY's and only encountered a core crash when the router went down.

Additional information

Removing the RSSI check that uses wifi.radio.start_scanning_networks() causes the crashes less frequently but they can still occur during reconnect. Occasionally, the code will not crash and reconnect properly. I suspect the longer the router is down the more likely the core crash. My router takes over a minute to restart.

@rdagger rdagger added the bug label Nov 19, 2022
@tannewt tannewt added this to the 8.x.x milestone Nov 21, 2022
@rdagger
Author

rdagger commented Nov 22, 2022

I have done more testing using my 4 QT-PY ESP32-S2's. They will often run for a dozen hours and then all core crash within minutes of each other. I've been checking the memory and there don't appear to be any leaks.

I've been trying to come up with a way to reliably reconnect the WiFi when it's disrupted. Pinging the gateway, and then reconnecting on no response, does work most of the time. However, certain types of disconnections seem to always cause a core crash.

All 4 QT PY's are in the same room with direct line of sight to my access point. I don't notice any WiFi issues with any of the other equipment in the house including an older MicroPython ESP32 in my vegetable garden outside which has been running great for over a year.

btw: The docs state: "Reconnections are handled automatically once one connection succeeds." This is not the case with the QT PY ESP32-S2.

I added multiple ping attempts before reconnection to avoid unnecessary reconnects. It seems that many failed pings to the gateway will resolve on a 2nd or 3rd try.

        # Check Wi-Fi is connected
        wifi_verified = False
        for i in range(10):
            if wifi.radio.ping(gateway_ip, timeout=2):
                wifi_verified = True
                break
            else:
                print(f'Gateway could not be pinged! Attempt {i+1} of 10.')
                sleep(3)
        if not wifi_verified:
            connect_to_wifi()
            continue

I also removed the RSSI check prior to WiFi connection because the start_scanning_networks() command was definitely causing more frequent core crashes.

@anecdata
Member

anecdata commented Nov 22, 2022

On espressif port (ESP32-S2), you can call connect() whenever needed with negligible penalty even if the device is already connected, so there's no harm and no need to ping.

The truest test of whether the device is connected is checking for an IPv4 address (wifi.radio.ipv4_address); if there is an IPv4 address, then the device is connected to the AP. This is how it's tracked internally on espressif port.

btw: The docs state: "Reconnections are handled automatically once one connection succeeds." This is not the case with the QT PY ESP32-S2.

There are several automatic reconnection attempts, but not infinite, so connection retries are ultimately needed if there is an extended disconnection.
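
The approach described above (treat a missing IPv4 address as "disconnected" and retry connect() yourself once the automatic retries give up) might look roughly like the sketch below. The names `ensure_connected` and `FakeRadio`, and all parameters, are hypothetical and introduced only for illustration. `FakeRadio` merely stands in for `wifi.radio` so the loop can be exercised off-device; on a board you would pass `wifi.radio` instead.

```python
import time


def ensure_connected(radio, ssid, password, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Return True once radio.ipv4_address is set, retrying connect() as needed.

    `radio` is expected to look like CircuitPython's wifi.radio: it has an
    `ipv4_address` attribute (None while disconnected) and a
    `connect(ssid, password)` method that raises on failure.
    """
    for attempt in range(1, max_attempts + 1):
        if radio.ipv4_address is not None:
            return True  # An IPv4 address means we are connected to the AP.
        try:
            radio.connect(ssid, password)
        except Exception as err:  # Broad on purpose; mirrors the code in this thread.
            print(f"Connect attempt {attempt} failed: {err}")
            sleep(delay)
    return radio.ipv4_address is not None


class FakeRadio:
    """Hypothetical stand-in for wifi.radio, for off-device testing only."""

    def __init__(self, failures_before_success):
        self.ipv4_address = None
        self._failures_left = failures_before_success
        self.connect_calls = 0

    def connect(self, ssid, password):
        self.connect_calls += 1
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError("No network with that ssid")
        self.ipv4_address = "192.168.1.50"  # made-up address


radio = FakeRadio(failures_before_success=2)
connected = ensure_connected(radio, "my-ssid", "my-password", sleep=lambda s: None)
print(connected, radio.connect_calls)
```

Bounding the number of attempts keeps the caller in control of what happens after a long outage (for example, sleeping longer before trying again). Whether this avoids the crash reported in this issue is not established by the thread.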

@rdagger
Author

rdagger commented Nov 22, 2022

@anecdata The crashes occur on wifi.radio.connect(). They don't occur on the pings. That's why I'm trying to minimize the reconnects. Thanks for the tip on the ipv4_address! I will try setting up one of the QT PY's with an ipv4_address check instead of pinging to determine if the WiFi connection has dropped.

@anecdata
Member

anecdata commented Nov 22, 2022

How did you determine that the connect() is triggering the hardfault? I guess it's the print statements, but it is possible some issue takes time to manifest.

btw, espressif boards will automatically connect to the AP with the best signal, among multiple of the same SSID (no need to scan and find best RSSI):

Another thing that need to be considered is that the reconnection may not connect the same AP if there are more than one APs with the same SSID. The reconnection always select current best APs to connect.

https://docs.espressif.com/projects/esp-idf/en/latest/esp32s2/api-guides/wifi.html#wi-fi-reconnect

There was an issue with unstable wifi scanning; it's ostensibly fixed, but maybe try an example without pings or scanning just to see if that makes any difference?

@rdagger
Author

rdagger commented Nov 22, 2022

I have relied on the print statements. I have print statements on the lines before and after wifi.radio.connect. When a crash occurs, the preceding print statement prints but the subsequent one never does.

There is only 1 access point in the house.
I was using the RSSI code because I was having signal strength issues with 2 of my QT PY's which have external antennas instead of built-in on the board. The built-in antennae seem to work better although it doesn't matter when everything is in the same room. I did find that WiFi scanning did result in more core crashes. That's why I removed the RSSI check.

I have 2 QT PY's running now using only the ipv4_address to verify the WiFi connection. I also have 2 QT PY's running my previous code slightly modified to reattempt failed pings before reconnecting.

@tannewt
Member

tannewt commented Nov 22, 2022

You can also do a DEBUG=1 build to get a backtrace when the crash occurs. There is a backtrace decoder in ports/espressif/tools.

@rdagger
Author

rdagger commented Nov 22, 2022

Never tried to build CP. Is there a build guide for the QT PY ESP32-S2 or are there any prebuilt UF2's with debug enabled?

@dhalbert
Collaborator

@rdagger See https://learn.adafruit.com/building-circuitpython/ and particularly https://learn.adafruit.com/building-circuitpython/espressif-build. If you have or can set up a Linux box, that's generally easiest. The DEBUG=1 build might be too big, and you may need to disable some features. Setting CIRCUITPY_ULAB=0 is often enough, because that's large. We can help you with builds in https://adafru.it/discord in #circuitpython-dev.

@rdagger
Author

rdagger commented Nov 23, 2022

I was able to build with DEBUG=1. For some reason the I2S is not working, but I don't really need it to test the WiFi. I will let you know how it goes, and thanks for the build help.

@rdagger
Author

rdagger commented Nov 29, 2022

I’m stuck trying to debug the core crash. I tried multiple custom builds of CircuitPython using DEBUG=1 but I haven’t been able to collect any data yet because the core crash takes out the USB serial before any data is displayed. I’ve tried to implement a console uart to catch the debug data but I can’t get it working. The QT PY ESP32-S2 does not have a 'Send ESP_LOG output to TX/RX pins' sample in the sdkconfig file so I copied the section from the QT PY ESP32 Pico sdkconfig. I modified the console uart TX and RX pins to 5 and 16 to match the TX and RX labels on the QT PY ESP32-S2:


# Uncomment (remove ###) to send ESP_LOG output to TX/RX pins
#
# ESP System Settings
#
CONFIG_ESP_SYSTEM_PANIC_PRINT_HALT=y
# CONFIG_ESP_SYSTEM_PANIC_SILENT_REBOOT is not set
CONFIG_ESP_CONSOLE_UART_CUSTOM=y
# CONFIG_ESP_CONSOLE_NONE is not set
CONFIG_ESP_CONSOLE_UART=y
CONFIG_ESP_CONSOLE_UART_CUSTOM_NUM_0=y
# CONFIG_ESP_CONSOLE_UART_CUSTOM_NUM_1 is not set
CONFIG_ESP_CONSOLE_UART_NUM=0
CONFIG_ESP_CONSOLE_UART_TX_GPIO=5
CONFIG_ESP_CONSOLE_UART_RX_GPIO=16
CONFIG_ESP_CONSOLE_UART_BAUDRATE=115200
# CONFIG_ESP_SYSTEM_CHECK_INT_LEVEL_5 is not set
# end of ESP System Settings

Unfortunately, I’m not getting any serial output on either GPIO pin. I hooked the QT PY up to an oscilloscope to verify no communication. Is there something else I have to modify or are my settings wrong?

Btw: switching from a pinging approach to an ipv4_address check for WiFi connectivity has increased the time between core crashes but they still occur.

@tannewt
Member

tannewt commented Nov 30, 2022

That sdkconfig seems to work for me. Here is my build:

firmware.uf2.zip

Part of the output is:

I (1380) cpu_start: Pro cpu start user code
I (1380) cpu_start: cpu freq: 240000000
I (1380) cpu_start: Application information:
I (1385) cpu_start: Project name:     circuitpython
I (1391) cpu_start: App version:      8.0.0-beta.4-194-g2f5ec1cab-dir
I (1398) cpu_start: Compile time:     Nov 30 2022 11:16:18
I (1404) cpu_start: ELF file SHA256:  824bf4d7c9ce4654...
I (1410) cpu_start: ESP-IDF:          716d8531d7

@rdagger
Author

rdagger commented Dec 1, 2022

@tannewt Thanks for doing that. I guess I'm missing something because I'm still not getting any serial communication. Did you use pins TX and RX (5 & 16)? I tried wiping the Pi just to be sure. Are there any special UART settings? I'm just using:

pi@raspberrypibplus:~ $ tio /dev/ttyAMA0
[tio 17:33:25] tio v1.32
[tio 17:33:25] Press ctrl-t q to quit
[tio 17:33:25] Connected

I should see something on TX when I reboot the QT PY, right?

@rdagger
Author

rdagger commented Dec 1, 2022

btw: when I uploaded your firmware I got the following upon boot:

Adafruit CircuitPython 8.0.0-beta.4-59-g2f5ec1cab-dirty on 2022-11-30; Adafruit QT Py ESP32S2 with ESP32S2

Is that the correct version? I also tried doing a factory reset of the QT PY and loading your UF2 again just to be sure.

@tannewt
Member

tannewt commented Dec 1, 2022

I'm not using a Pi to read the UART. I've got a USB to serial adapter board I use.

That version number matches mine.

Could you post a picture of the board?

@rdagger
Author

rdagger commented Dec 1, 2022

Here are some pics:
[Image: QT PY ESP32-S2]

[Image: QT PY wired to the Pi]

@tannewt
Member

tannewt commented Dec 1, 2022

Wiring looks right to me. What does ls /dev/tty* show? I've never seen one that starts with AMA. (Though I haven't really used the built in uart on rpi.)

@rdagger
Author

rdagger commented Dec 1, 2022

I did do a loop back test to make sure /dev/ttyAMA0 was the correct UART for my wiring and I verified it was the high speed UART (PL011). I tried on 2 Raspberry Pi's (B plus and 4). I'm on the road so I don't have my scope or any USB dongles.

@rdagger
Author

rdagger commented Dec 1, 2022

ls -l /dev
lrwxrwxrwx 1 root root 7 Nov 30 20:03 serial0 -> ttyAMA0

dmesg | grep tty
[ 3.325760] 20201000.serial: ttyAMA0 at MMIO 0x20201000 (irq = 81, base_baud = 0) is a PL011 rev2

ttyAMA0 is the high speed UART.

@tannewt
Member

tannewt commented Dec 1, 2022

Ok, I have no idea why you aren't seeing output then.

@rdagger
Author

rdagger commented Dec 1, 2022

I'll try another QT PY ESP32-S2 when I get home. I did have some quality issues with this last batch such as a cold solder joint. Thanks for all your help!

@rdagger
Author

rdagger commented Dec 1, 2022

I reloaded CircuitPython 8.0.0-beta.4 from CircuitPython.com and ran the following code:

from board import TX, RX
from busio import UART
uart = UART(TX, RX, baudrate=115200)
print('UART ready')

while True:
    if uart.in_waiting:
        data = uart.readline()
        data_string = ''.join([chr(b) for b in data])
        print(data_string, end="")
        uart.write(data)

I then connected using Tio on the Pi with the same wiring pictured above and typed a test:

pi@raspberrypibplus:~ $ tio /dev/ttyAMA0 -b 115200
[tio 12:25:06] tio v1.32
[tio 12:25:06] Press ctrl-t q to quit
[tio 12:25:06] Connected
Test from Tio on Raspberry Pi

It showed up in the Mu serial console and was transmitted back to the Pi.

Wi-Fi: off | code.py | 8.0.0-beta.4
UART ready
Test from Tio Raspberry Pi

I think that shows that the Raspberry Pi and wiring are OK.

@rdagger
Author

rdagger commented Dec 4, 2022

I got a brand new QT PY ESP32-S2. I loaded your DEBUG=1 firmware above. I got a CP210x USB to serial dongle and connected it to the QT PY (TX to RX and RX to TX). I connected to the serial port using Putty at 115200 from a Windows computer. Unfortunately, I did not receive any communication from the QT PY. I tried rebooting the QT PY and nothing came through. I reversed RX and TX but still nothing.

I hooked up the CP210x dongle to my scope with serial decoding enabled and was able to receive data from Putty. I hooked up the TX pin of the QT PY to the scope and I didn’t get anything.

Perhaps I have inaccurate expectations. I thought the RX and TX pins would give REPL access like the serial screen in Mu. Is that not the case? Is there something I have to enable, or do to get the QT PY to transmit on the TX pin? Is there some better way to test my set up?

@tannewt
Member

tannewt commented Dec 5, 2022

My build should have debug output to TX, not the REPL. Try flashing the bin with esptool (or make BOARD= PORT= flash from within circuitpython). Perhaps the UF2 bootloader doesn't change the second stage bootloader.

@rdagger
Author

rdagger commented Dec 5, 2022

Erasing the board and then flashing my firmware.bin with https://adafruit.github.io/Adafruit_WebSerial_ESPTool/ did the trick. Thanks!
Both Putty on Windows using a dongle and Tio on the Raspberry Pi using GPIO pins work.
I was also able to redirect the debug UART to pins 40 and 41 so I can run the debug serial cable out the side of my project enclosure.

Unfortunately, my program gets stuck right after I2S is configured:

self._audio = audiobusio.I2SOut(bit_clock=TX, word_select=RX, data=D6)

The debug console just keeps printing the following:
[Image: debug01]

The program itself hangs at this point. It's been printing the above for over 10 minutes now. The program will run if I comment out the I2S section, but I'm wondering if that could be part of what was originally causing the core crash.

@tannewt
Member

tannewt commented Dec 6, 2022

You may want to change the debug level in the debug sdkconfig. You can change DEBUG to INFO to do this. That will stop that message. It looks to me like it is doing the I2S output by alternating buffers.

I'd then add more prints to see where CP is getting stuck.

@rdagger
Author

rdagger commented Dec 6, 2022

The program still freezes with the same debug output right after initializing the audiobusio.I2SOut. I changed the debug level and ran make clean before running make again. I erased and programmed the QT PY using ESPTool with the firmware.bin. Are my edits correct?
[Image: Debug3]
I did notice the following line in the debug output:
E (2648) I2S: i2s_driver_uninstall(2047): I2S port 0 has not installed
and these are the only other mentions of I2S before the repeating debug output loop:

I (10438) I2S: queue free spaces: 3
D (10448) I2S: Addr[0] = 1073695632
D (10448) I2S: Addr[1] = 1073696148
I (10448) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=2
I (10458) I2S: I2S0, MCLK output by GPIO0
D (10458) I2S: size: 512, rw_pos: 0, buf_size: 512, curr_ptr: 1073695632
D (10468) I2S: size: 512, rw_pos: 0, buf_size: 512, curr_ptr: 1073696148

Here's my mpconfigport.mk. I did disable ble, rotaryio, esp32_camera and touchio_use_native to make it fit with DEBUG=1.

[Image: Config3]

@tannewt
Member

tannewt commented Dec 6, 2022

I pointed you to the wrong debug level. That one is for the second stage bootloader. There is one further down in the file for the user program: https://github.com/adafruit/circuitpython/blob/main/ports/espressif/esp-idf-config/sdkconfig-debug.defaults#L76

@rdagger
Author

rdagger commented Dec 6, 2022

That fixed it. Actually, the program is now up and running without any issues. For some reason the debug level was preventing the I2S from working.

I'll let it run and hopefully I'll get a core crash soon with a back trace.

Thanks for your patience!

@rdagger
Author

rdagger commented Dec 6, 2022

Not sure why but the program can't maintain a stable Adafruit IO connection with DEBUG=1 and also using I2S. My version without I2S ran for a day without dropping a connection. However, the 2 boards I just set up with I2S have been online for only an hour and have dropped 3 times with very unreliable service.

Here's the REPL view:

WiFi Hostname: QTPY19871
Connecting to Adafruit IO...
Adafruit IO connected.  Client ID: QTPY19871
Subscribed to rdagger/feeds/test at QOS level 1
Free memory: 1945680, hour: 14
Topic: rdagger/feeds/test, message: 2:21 PM
Topic: rdagger/feeds/test, message: 2:23 PM
MMQTTException: PINGRESP not returned from broker.
MMQTTException: PINGRESP not returned from broker.
MMQTTException: PINGRESP not returned from broker.
OSError: Unhandled ESP TLS error 80 0 8018 -80
Attempting to reconnect to MQTT broker
Adafruit IO connected.  Client ID: QTPY19871
Subscribed to rdagger/feeds/test at QOS level 1
MMQTTException: PINGRESP not returned from broker.
MMQTTException: PINGRESP not returned from broker.
MMQTTException: PINGRESP not returned from broker.
OSError: Unhandled ESP TLS error 80 0 8018 -80
Attempting to reconnect to MQTT broker
Adafruit IO connected.  Client ID: QTPY19871
Subscribed to rdagger/feeds/test at QOS level 1
Topic: rdagger/feeds/test, message: wifi
MMQTTException: PINGRESP not returned from broker.
MMQTTException: PINGRESP not returned from broker.
MMQTTException: PINGRESP not returned from broker.
OSError: Unhandled ESP TLS error 80 0 8018 -80
Attempting to reconnect to MQTT broker
Adafruit IO connected.  Client ID: QTPY19871
Subscribed to rdagger/feeds/test at QOS level 1
Free memory: 1945184, hour: 15

And the only debug output:

I (64475) I2S: queue free spaces: 3
I (64475) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=2
I (64475) I2S: I2S0, MCLK output by GPIO0
I (64485) gpio: GPIO[7]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (71685) esp-x509-crt-bundle: Certificate validated
E (749515) esp-tls-mbedtls: write error :-0x0050:
I (751205) esp-x509-crt-bundle: Certificate validated
E (1548615) esp-tls-mbedtls: write error :-0x0050:
I (1550325) esp-x509-crt-bundle: Certificate validated
E (2289985) esp-tls-mbedtls: write error :-0x0050:
I (2291495) esp-x509-crt-bundle: Certificate validated

I also have 2 boards without DEBUG=1 running the full program with I2S and they have been stable for a day.

@rdagger
Author

rdagger commented Dec 7, 2022

One of the QT PY's core crashed last night at 2AM. Here is the last REPL message:

MQTTException: PINGRESP not returned from broker.
[tio 02:27:18] Disconnected
[tio 02:27:25] Connected
Auto-reload is off.
Running in safe mode! Not running saved code.
You are in safe mode because:
Internal watchdog timer expired.

Unfortunately, I didn't notice until this morning and my terminal only had 1000 lines of scrollback. The debug console keeps outputting the following lines (about 100 per minute), so I lost the debug output concerning the core crash:

I (24718074) gpio: GPIO[39]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (24718074) gpio: GPIO[38]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
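
One host-side way to keep the crash output from scrolling away would be to drop the repeating gpio lines before they reach a bounded buffer. The sketch below is only an illustration of that idea, not something suggested by the maintainers in this thread; `keep_interesting`, `NOISE`, and `max_lines` are hypothetical names, and the sample lines are copied from the logs in this issue.

```python
import re
from collections import deque

# Matches the repeating status lines such as "I (24718074) gpio: GPIO[39]| ..."
NOISE = re.compile(r"^I \(\d+\) gpio: GPIO\[\d+\]")


def keep_interesting(lines, max_lines=1000):
    """Return at most the last `max_lines` lines that are not gpio noise."""
    kept = deque(maxlen=max_lines)
    for line in lines:
        if not NOISE.match(line):
            kept.append(line.rstrip("\n"))
    return list(kept)


sample = [
    "I (24718074) gpio: GPIO[39]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 ",
    "Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.",
    "I (24718074) gpio: GPIO[38]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 ",
]
for line in keep_interesting(sample):
    print(line)
```

In practice `lines` could be any iterable of text lines, for example a file object reading a saved terminal log.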

@tannewt
Member

tannewt commented Dec 7, 2022

The status LED blinks tend to output that. It is probably easiest to comment out that print in the IDF, clean, build and flash again.

@rdagger
Author

rdagger commented Dec 7, 2022

Is this the correct code to modify:
https://github.com/adafruit/circuitpython/blob/main/supervisor/shared/safe_mode.c#:~:text=allow%20for%20reset.-,%23if%20CIRCUITPY_STATUS_LED,%23endif,-if%20(boot_in_safe_mode)%20%7B

Specifically comment out the following lines in supervisor/shared/safe_mode.c as shown:

/*
#if CIRCUITPY_STATUS_LED
status_led_init();
#endif
*/
...
...
/*
#ifdef CIRCUITPY_STATUS_LED
// Blink on for 100, off for 100
bool led_on = (diff % 250) < 125;
if (led_on) {
	new_status_color(SAFE_MODE);
} else {
	new_status_color(BLACK);
}
*/
...
...
/*	
#if CIRCUITPY_STATUS_LED
new_status_color(BLACK);
status_led_deinit();
#endif
*/

@rdagger
Author

rdagger commented Dec 8, 2022

I haven't had a chance to implement the changes above, but the other QT PY crashed about 2 hours ago and didn't go into safe mode. Instead, it just locked up. Here's the debug output:

Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4009361b  PS      : 0x00060630  A0      : 0x3ffdfc80  A1      : 0x3ffdfc60  
A2      : 0xffffffff  A3      : 0x00000001  A4      : 0xffffffff  A5      : 0x3ffd7e54  
A6      : 0x3ffcffb8  A7      : 0x00000000  A8      : 0x80093619  A9      : 0x3ffe3830  
A10     : 0x00000002  A11     : 0x3fe150a0  A12     : 0x3ffd2c90  A13     : 0x0000000f  
A14     : 0x3ffd2cd4  A15     : 0x0000000c  SAR     : 0x0000001c  EXCCAUSE: 0x0000001d  
EXCVADDR: 0x0000001d  LBEG    : 0x3ffd2c90  LEND    : 0x0000000f  LCOUNT  : 0x4002cbfc  

Backtrace: 0x40093618:0x3ffdfc60 0x3ffdfc7d:0x3ffdfc80 |<-CORRUPTED

ELF file SHA256: 146992c619b26705

CPU halted.

There was no info in the REPL other than the following:

Subscribed to rdagger/feeds/test at QOS level 1
MMQTTException: PINGRESP not returned from broker.

[tio 17:39:49] Disconnected

@rdagger
Author

rdagger commented Dec 8, 2022

I ran decode_backtrace and got the following:

rdagger@ubuntucp:~/circuitpython/ports/espressif$ python3 tools/decode_backtrace.py adafruit_qtpy_esp32s2
adafruit_qtpy_esp32s2
? 0x40093618:0x3ffdfc60 0x3ffdfc7d:0x3ffdfc80
0x40093618: fun_bc_call at /home/rdagger/circuitpython/ports/espressif/../../py/objfun.c:337 (discriminator 1)
0x3ffdfc7d: ?? ??:0
? 
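
For reference, the Backtrace line above is a list of PC:SP pairs. A small sketch of pulling those pairs apart is shown below; `parse_backtrace` and `pcs_inside_stack` are hypothetical helper names, and the "PC lies between the stack pointers" check is only a heuristic assumed here, not part of decode_backtrace.py. For this crash it flags the second PC (0x3ffdfc7d), which sits between the two stack pointer values and did not resolve to a source line.

```python
import re

# Matches the "PC:SP" pairs printed on a "Backtrace:" line.
_PAIR = re.compile(r"0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)")


def parse_backtrace(line):
    """Return ([(pc, sp), ...], corrupted_flag) for one Backtrace line."""
    frames = [(int(pc, 16), int(sp, 16)) for pc, sp in _PAIR.findall(line)]
    corrupted = "CORRUPTED" in line
    return frames, corrupted


def pcs_inside_stack(frames):
    """Heuristic: list PCs that fall within the range spanned by the SPs."""
    sps = [sp for _, sp in frames]
    if not sps:
        return []
    low, high = min(sps), max(sps)
    return [pc for pc, _ in frames if low <= pc <= high]


line = "Backtrace: 0x40093618:0x3ffdfc60 0x3ffdfc7d:0x3ffdfc80 |<-CORRUPTED"
frames, corrupted = parse_backtrace(line)
for pc, sp in frames:
    print(f"pc=0x{pc:08x} sp=0x{sp:08x}")
print("corrupted:", corrupted)
print("suspicious PCs:", [hex(pc) for pc in pcs_inside_stack(frames)])
```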

@tannewt
Member

tannewt commented Dec 8, 2022

Bummer. I was hoping that backtrace would be more useful. Something must be corrupting the stack. I'm not sure what.

@rdagger
Author

rdagger commented Dec 8, 2022

Could there be an issue with running multiple QT PY's at the same time? I noticed they all use the same network hostname (espressif). I'll try giving them unique hostnames.

Also, could you please let me know what file I need to modify to disable the safe mode status LED. The edits I made above to supervisor/shared/safe_mode.c did not work.

@tannewt
Member

tannewt commented Dec 9, 2022

@rdagger
Author

rdagger commented Dec 9, 2022

Not sure if it's a coincidence but all 4 boards made it through the night without crashing since I gave them unique hostnames. I also tried rebooting my access point and running deauth attacks against the QT PY's. So far, they haven't crashed. I added a fifth board and I'll let them all run for a few days. Afterwards, if there are no crashes, I'll put the hostnames back to espressif and see if the problem recurs.

@rdagger
Author

rdagger commented Dec 14, 2022

It's been 6 days and I have not had another core crash on any of the 5 QT PY's. I think the problem was due to running multiple QT PY boards with identical default hostnames on the same network. The problem may be specific to my Asus Wi-Fi router. Many of the simultaneous core crashes did seem to coincide with deauths in the router log.

@dhalbert
Collaborator

I don't know that these kinds of errors should cause literal crashes, but that may be beyond our control.

@tannewt
Member

tannewt commented Dec 15, 2022

I think it may be worth using the cpy-MAC hostname in dhcp in addition to mdns. That'd make them much more likely to be unique.
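
A MAC-derived hostname along these lines could be sketched as follows. The helper name `mac_hostname`, the choice of the last three MAC bytes, and the example address are all assumptions made for illustration; the thread does not specify the exact cpy-MAC format.

```python
def mac_hostname(mac, prefix="cpy-"):
    """Build a hostname from the last three bytes of a MAC address."""
    suffix = "".join("{:02x}".format(b) for b in bytes(mac)[-3:])
    return prefix + suffix


example_mac = bytes([0x7C, 0xDF, 0xA1, 0x12, 0x34, 0x56])  # made-up address
print(mac_hostname(example_mac))  # cpy-123456

# On a board, one might (as an assumption, untested here) apply it with:
#   wifi.radio.hostname = mac_hostname(wifi.radio.mac_address)
```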

@rdagger
Author

rdagger commented Dec 15, 2022

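I generated a unique name from 19 bits of the microcontroller.cpu.uid. Note that the right shift below discards the low 29 bits, so, assuming a 6-byte UID, these are the upper 19 bits of the little-endian value rather than the least significant ones: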
hostname = 'QTPY' + str(int.from_bytes(microcontroller.cpu.uid, 'little') >> 29)

Example: QTPY362041

That way it's easy to identify when using network tools.
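
To make the bit arithmetic in that expression concrete, here is a small off-device sketch. It assumes a 6-byte (48-bit) UID, which is what makes a 29-bit right shift leave 19 bits; `hostname_from_uid` and the example bytes are hypothetical. It also shows how the low 19 bits would be taken with a mask, for comparison.

```python
def hostname_from_uid(uid, shift=29, prefix="QTPY"):
    """Mirror the expression above: little-endian integer, shifted right."""
    value = int.from_bytes(uid, "little")
    return prefix + str(value >> shift)


example_uid = bytes([0x7C, 0xDF, 0xA1, 0x12, 0x34, 0x56])  # made-up 6-byte UID
value = int.from_bytes(example_uid, "little")

high_bits = value >> 29             # keeps the upper 19 of 48 bits
low_bits = value & ((1 << 19) - 1)  # the least significant 19 bits instead

print(hostname_from_uid(example_uid))
print("upper 19 bits fit in 19 bits:", high_bits.bit_length() <= 19)
print("low 19 bits would be:", low_bits)
```

Either choice yields a number below 2**19 (524288), so the hostname suffix has at most six digits. Two boards can still collide if they share those particular bits.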
