Crash in VFS esp_vfs_write / uart_write / _lock_acquire(release)_recursive (IDFGH-385) #2470
Comments
Could you please also attach the panic handler output which is sent to the console on crash? That would show what kind of exception/crash this is.
Apologies, but all these devices are headless and running at remote sites, so there is no UART output; the coredump is uploaded to a central host.
I started adding code to log the reset cause/reason in the firmware but have not been able to deploy due to
#2474 (comment)
As soon as this is resolved I will deploy new firmware that embeds the reset cause in the coredump name.
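A minimal sketch of what that could look like, assuming ESP-IDF's `esp_reset_reason()` is available in the IDF version in use; the `build_coredump_name()` helper and the exact name format are hypothetical and only loosely mirror the filenames attached below:

```c
#include <stdint.h>
#include <stdio.h>
#include "esp_system.h"   // esp_reset_reason()

// Hypothetical helper: build a coredump upload name of the form
// "<MAC>_<fw-version>_<reset-reason>_<timestamp>", similar to the
// filenames attached to this issue.
static void build_coredump_name(char *buf, size_t len,
                                const char *mac, const char *fw_ver,
                                uint32_t timestamp)
{
    // esp_reset_reason() reports the cause of the last reset, e.g.
    // ESP_RST_PANIC, ESP_RST_TASK_WDT, ESP_RST_BROWNOUT, ...
    esp_reset_reason_t reason = esp_reset_reason();
    snprintf(buf, len, "%s_%s_%d_%u",
             mac, fw_ver, (int)reason, (unsigned)timestamp);
}
```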
The backtraces suggest that this might be a stack overflow, but without seeing panic handler output it's impossible to tell exactly. Please try reproducing this on a device that you have physical access to.
@igrr
Ideally, the entire panic handler output. I understand that it's not feasible to get it remotely, hence my suggestion to try reproducing the same issue locally.
I have 3 test devices running the same firmware locally, but not one has crashed, so I assume it is usage related. The production devices are deployed in student hostels to control access to rooms and dispensing of hot/cold shower water, so the devices operate 24 hours a day and usage on site is completely random. Maybe the panic handler output should be redirected to flash as well, together with the coredump?
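One caveat on that idea: the panic handler writes directly to the UART and does not go through the esp_log hooks, so only regular log output can be captured this way. A minimal sketch of mirroring log output into a buffer that could later be written to flash, assuming `esp_log_set_vprintf()`; the `ringbuf_put()` storage helper is a hypothetical placeholder:

```c
#include <stdarg.h>
#include <stdio.h>
#include "esp_log.h"

// Hypothetical placeholder: append a chunk of log text to a RAM ring
// buffer that the application later flushes to flash alongside coredumps.
static void ringbuf_put(const char *data, int len)
{
    (void)data;
    (void)len;
}

// Custom vprintf hook: copy each log line into the ring buffer and
// still forward it to the default console output.
static int log_to_flash_vprintf(const char *fmt, va_list args)
{
    char line[256];
    va_list copy;
    va_copy(copy, args);                    // args is consumed twice below
    int len = vsnprintf(line, sizeof(line), fmt, copy);
    va_end(copy);
    if (len > 0) {
        ringbuf_put(line, len < (int)sizeof(line) ? len : (int)sizeof(line) - 1);
    }
    return vprintf(fmt, args);              // keep normal console logging
}

void log_capture_init(void)
{
    // Route esp_log output through the hook above; panic handler output
    // still bypasses this and goes straight to the UART.
    esp_log_set_vprintf(&log_to_flash_vprintf);
}
```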
I have not been able to get any local test devices (3 running for 48 hrs) to crash, but have had 8 coredumps at the remote site (from 50 devices over 48 hrs), of which 6 are reasonably similar and fit into 2 patterns. I have spent a lot of time trying to make sense of the current thread stack, but no luck. Some common trends can however be identified.
Pattern 1: we have 3, maybe 4, reasonably similar occurrences, all zipped together. The reset cause is the single digit in the filename just after the MAC address.
All help appreciated to stabilize this site. Thanks
Hi @ammaree, sorry for the late response. What is the stack size of the two tasks where the issue happens? One common thing in every core dump is that the used stack size is 2568 or 2572 bytes at the point where the panic handler is invoked, which suggests that this might be a stack overflow if the stack size was set to 2560 bytes?
Hi @igrr, it is very difficult to know which task is running.
Pattern 1: both xMqttPublish() and xMqttPublishBuild() ----> xPrint() are part of the MQTTtx task, and its stack size is large, ~7 kB. But ultimately, based on frame #0 up to __swbuf_r(), I have no idea which task was running.
Pattern 2:
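Since a stack overflow is the leading suspect, one way to narrow down which task is running short would be to log each task's stack high-water mark periodically with FreeRTOS's `uxTaskGetStackHighWaterMark()`. A minimal sketch, where the `mqtt_tx_task_handle` saved at task creation is hypothetical:

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"

static const char *TAG = "stackmon";

// Hypothetical: handle saved when the MQTTtx task was created with xTaskCreate().
extern TaskHandle_t mqtt_tx_task_handle;

// Periodically log how close monitored tasks have come to exhausting their
// stacks. On the ESP32 port stack depths are given in bytes, so the
// high-water mark is the minimum number of free bytes observed so far.
static void stack_monitor_task(void *arg)
{
    (void)arg;
    for (;;) {
        UBaseType_t mqtt_free = uxTaskGetStackHighWaterMark(mqtt_tx_task_handle);
        UBaseType_t own_free  = uxTaskGetStackHighWaterMark(NULL);  // this task
        ESP_LOGI(TAG, "min free stack: MQTTtx=%u monitor=%u",
                 (unsigned)mqtt_free, (unsigned)own_free);
        vTaskDelay(pdMS_TO_TICKS(10000));   // every 10 s
    }
}

void stack_monitor_start(void)
{
    // Small dedicated task; the 3 kB stack size here is an arbitrary choice.
    xTaskCreate(stack_monitor_task, "stackmon", 3072, NULL, 1, NULL);
}
```

Enabling the FreeRTOS stack overflow check in menuconfig (Component config → FreeRTOS) would also make an overflow trip an assert closer to where it happens, instead of silently corrupting whatever sits next to the stack in memory.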
Closing this issue as well. The problem went away somewhere during the last couple of weeks, possibly related to changes in the vfs module.
Environment
IDF version (run git rev-parse --short HEAD to get the commit id.): LATEST (today)
Problem Description
Repeated crash in uart_write / _lock_acquire_recursive AND/OR uart_write / _lock_release_recursive
Expected Behavior
Should not crash, must work consistently
Actual Behavior
Crashes at irregular intervals on 4 out of 50+ devices.
Steps to reproduce
Nothing can be done to reproduce this consistently. Have had 6 crashes on 4 devices out of 50+ during a 24-hour period.
Code to reproduce this issue
Other items if possible
Have attached the sdkconfig; it is the same for all crashes since all devices run the exact same firmware.
Have attached the coredumps (both binary, all zipped into one archive, and decoded text) for 6 events.
ELF file not attached.
Debug Logs
30aea432bc04_4p9_1537855759 esp_vfs_write.txt
30aea432c9a4_4p9_1537884881 esp_vfs_write.txt
30aea432c70c_4p9_1537848949 esp_vfs_write.txt
30aea432c95c_4p9_1537853193 esp_vfs_write.txt
30aea432c95c_4p9_1537859708 esp_vfs_write.txt
30aea432c95c_4p9_1537861983 esp_vfs_write.txt
sdkconfig.txt
uart_write.zip