Coredump inconsistency.. (IDFGH-676) #1650
Comments
I'm also interested in doing the same, pushing the core dump to AWS etc. I'll be watching here to see if you get additional clarification. One point I was curious about was whether enabling flash output of coredumps entirely disables UART output. I'd prefer to have both enabled in the typical case where I'm debugging and looking at a serial terminal. |
Any chance for some light on the 2 questions, please ... |
The core dump module saves only the part of each task's stack that is actually in use at the moment of the crash, so the dump is much smaller than the theoretical maximum.
With this optimization in mind, I see the following possible approaches:
When calculating the core dump partition size, you can add a margin to the minimal size for every task, for example by multiplying it by a coefficient of 1.15. So the formula should look like
Command syntax for this: |
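The sizing approach described in this comment (minimal size plus a 1.15 margin) can be sketched in Python. The formula from the original comment is not preserved above, so this is only an illustration under assumptions: `TCB_SIZE`, `HEADER_OVERHEAD`, and the rounding to 4 KiB flash sectors are hypothetical values chosen for the example, not figures from ESP-IDF.

```python
import math

SAFETY_COEF = 1.15      # margin suggested in the comment above
FLASH_SECTOR = 4096     # assumption: round the partition up to whole 4 KiB sectors
TCB_SIZE = 400          # hypothetical per-task TCB size in bytes
HEADER_OVERHEAD = 64    # hypothetical fixed overhead in bytes


def coredump_partition_size(used_stack_per_task):
    """Estimate a partition size from the expected *used* stack of each task."""
    minimal = HEADER_OVERHEAD + sum(TCB_SIZE + s for s in used_stack_per_task)
    padded = math.ceil(minimal * SAFETY_COEF)
    # Round up to a whole number of flash sectors.
    return math.ceil(padded / FLASH_SECTOR) * FLASH_SECTOR


if __name__ == "__main__":
    # 20 tasks, each assumed to be using about 1000 bytes of stack at crash time.
    print(coredump_partition_size([1000] * 20))
```

Because only the used part of each stack is saved, the input here is the expected stack usage, not the allocated stack size.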
@gerekon To close this issue just 3 more points:
Thanks |
@gerekon In addition to the questions above, two more requests: #1 Would it be possible to add a mechanism whereby the user application can register area(s) of memory to be included in the coredump? It would be of great value to us to have visibility of, for example, our stdout buffer, since it would contain the last few error messages before the crash. #2 Since you have access to the task names inside the TCBs, would it be possible to add logic to identify the tasks under "THREADS INFO" as well as "CORE DUMP MEMORY CONTENTS"? |
The core dump will be aborted and no data will be written to flash. A corresponding error message will be printed to UART.
There should be a dedicated API for this, but I do not see it in the GitHub repo (I will check this). In any case, you can read the actual length from the second 4 bytes of the RAW data in flash. See comments at
I think yes. We will discuss and plan this feature.
This is also possible, but needs a GDB update. We will discuss and plan this feature too. Thanks a lot for your suggestions! I think we can implement them. |
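Reading the actual dump length from the raw flash data, as suggested in this comment, might look like the following Python sketch. It assumes the partition image is already in a `bytes` object, that the length is a little-endian 32-bit value at byte offset 4 ("the second 4 bytes"), and that it counts bytes from the start of the image. None of these details are confirmed in the thread, and both helper names are hypothetical.

```python
import struct


def coredump_length(raw: bytes) -> int:
    """Return the length field stored in the second 4-byte word of the image."""
    if len(raw) < 8:
        raise ValueError("image too short to contain a length field")
    (length,) = struct.unpack_from("<I", raw, 4)  # assumed little-endian uint32
    return length


def trim_coredump(raw: bytes) -> bytes:
    """Cut a full partition image down to the bytes actually used by the dump."""
    length = coredump_length(raw)
    # Erased flash reads as 0xFFFFFFFF, which fails this sanity check.
    if length == 0 or length > len(raw):
        raise ValueError(f"implausible core dump length: {length}")
    return raw[:length]
```

Trimming before upload would avoid sending the unused remainder of a 192KB partition when the dump itself is only around 16K.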
Apologies, could and should have found this info myself. I will close this issue now, and look forward to the extra feature to attach a memory buffer to the coredump |
@gerekon The "Current Thread Stack" stops with the message "Backtrace stopped: Cannot access memory at address 0x4000bd86" or "Backtrace stopped: previous frame identical to this frame (corrupt stack?)". In one case the "Current Thread Stack" has only a single entry before the "Backtrace stopped: Cannot access memory at address 0x4000bd86" message. All help appreciated. |
0x4000bd86 points to a function in ROM code. Please see https://docs.espressif.com/projects/esp-idf/en/latest/api-guides/core_dump.html#rom-functions-in-backtraces. |
I have added the reference to esp32_rom.elf and it adds the ROM functions, but that does not help when there is a single entry with addresses: or: In neither of these cases can I identify the task, which makes it impossible to trace further. |
@ammaree Crash behavior seems to be unstable. Also |
Hi @gerekon Thanks for your feedback. I have increased the size of all task stacks, so now the symptoms have changed. With the latest firmware, built using IDF as at yesterday morning (27 Sep at 08h00 UTC), the coredump now (only on some crashes) yields even less info. The dump is empty and complains about overlapping regions. I have attached a couple of examples and the resulting decoded dump. Any chance you can shed some light here? PS: The 4 in the bin/txt filenames indicates the reset cause |
@ammaree We are planning to update the core dump script to give more information under such conditions (overlapping regions). In general, this message appears when a TCB or task stack overlaps another TCB or stack. This should not happen in normal operation.
|
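The "overlapping regions" condition described above can be illustrated with a small check over memory regions. This is a sketch of the general idea only, not the logic used in `espcoredump.py`; the `(name, start, size)` tuples and the `find_overlaps` helper are hypothetical.

```python
def find_overlaps(regions):
    """Return pairs of region names whose address ranges overlap.

    regions: iterable of (name, start_address, size_in_bytes) tuples.
    """
    ordered = sorted(regions, key=lambda r: r[1])
    overlaps = []
    for i, (name_a, start_a, size_a) in enumerate(ordered):
        end_a = start_a + size_a  # first address past region A
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later region can overlap A
            overlaps.append((name_a, name_b))
    return overlaps


if __name__ == "__main__":
    dump_regions = [
        ("tcb_1", 0x3FFB0000, 0x100),
        ("stack_1", 0x3FFB0080, 0x200),   # starts inside tcb_1
        ("stack_2", 0x3FFB1000, 0x100),
    ]
    print(find_overlaps(dump_regions))
```

A check like this, run over the TCB and stack ranges extracted from a dump, would show which pair of regions triggers the message.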
Hi @gerekon , very concerning situation. In all my own private code I only have 2 occurrences of recursion. Will again check the json_parser and Paho MQTT libraries, but nothing there as far as I know. |
Hi @gerekon I have been travelling for the last 4 weeks and returned to find 310 crash dump files generated by 38 different devices during the period. Thanks |
Yes.
Yes. You should use the modified version because
No special updates for core dump are needed. Let me know if you have problems. |
A'luta continua.... @gerekon @igrr Any help to urgently resolve these crashes will be highly appreciated, even if I have to use a beta version that at least provides some information. Andre 30aea432bbfc_5_48C_1550822685_0.txt 30aea432c6e0_5_48C_1550806228_0.txt 30aea432c6ec_5_48C_1550787188_0.txt 30aea432c9b8_4_48C_1550684836_0.txt 30aea432c9c4_4_48C_1550684642_0.txt |
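With hundreds of dump files from dozens of devices, grouping them by device can help with triage. The Python sketch below splits filenames shaped like the ones attached above. The field meanings are guesses: the first field looks like a MAC address, the second is the reset cause (per an earlier comment), the fourth looks like a Unix timestamp, and the third (`48C`) is unexplained in the thread and is kept as an opaque `tag`. All function names are hypothetical.

```python
from collections import defaultdict


def parse_dump_name(filename):
    """Split '<mac>_<reset>_<tag>_<timestamp>_<index>.<ext>' into named fields."""
    stem = filename.rsplit(".", 1)[0]
    mac, reset_cause, tag, timestamp, index = stem.split("_")
    return {
        "mac": mac,
        "reset_cause": int(reset_cause),
        "tag": tag,                      # meaning not stated in the thread
        "timestamp": int(timestamp),     # assumed to be seconds since the epoch
        "index": int(index),
    }


def group_by_device(filenames):
    """Map each MAC to its dump records, oldest first."""
    groups = defaultdict(list)
    for name in filenames:
        record = parse_dump_name(name)
        groups[record["mac"]].append(record)
    for records in groups.values():
        records.sort(key=lambda r: r["timestamp"])
    return dict(groups)
```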
Upgraded last week to the latest IDF and coredump. Rebuilt the same application with the new IDF and deployed on some test bench devices. Trying again to redo the crash analysis but not having any luck. Latest coredump now just results in:
Also, the old version "espcoredump.py" now only shows 6 TCBs and stacks detected instead of the existing 18 with no error messages. This is close to becoming a small crisis, not being able to do crash analysis and fix problems. Any recommendations PLEASE? |
Gents @gerekon @igrr @projectgus, I have rechecked the options on my side. Using the latest IDF, coredump bombs out with a version error. The older v2 (modified by @gerekon) produces some output, but it does not make any sense. Any input on the above problem? I am sitting with a number of core dumps that cannot be analyzed, and a regular stream of devices continuing to reboot... Please? |
@ammaree Could you post coredump file and ELF file of the application? |
I think I might have found the problem. The coredump does not seem to start with a 4 byte (uint32_t) magic number but starts directly with the size, followed by version, number of tasks and TCB size. Is this correct? |
Yes, there have been changes in format. For now you must use the proper version of |
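The header layout described in the question above (size first, then version, number of tasks, and TCB size, with no leading magic number) can be sketched as a small parser. It assumes four consecutive little-endian `uint32_t` fields at offset 0, which the thread does not state explicitly; the field and function names are illustrative only.

```python
import struct
from typing import NamedTuple


class CoreDumpHeader(NamedTuple):
    data_len: int
    version: int
    tasks_num: int
    tcb_size: int


# Assumed layout: four little-endian 32-bit unsigned fields, no magic number.
HEADER = struct.Struct("<4I")


def parse_header(raw: bytes) -> CoreDumpHeader:
    """Decode the first 16 bytes of a raw core dump image."""
    if len(raw) < HEADER.size:
        raise ValueError("image too short for a core dump header")
    return CoreDumpHeader(*HEADER.unpack_from(raw, 0))
```

Printing the decoded `version` field before running the analysis script is a quick way to tell which script version a given dump needs.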
Can I ask where you get this? I'm getting "cannot use Version12, expected 1". If I disable version checking in espcoredump.py, I get the error about growing-up stacks not being supported. I have a coredump but can't analyse it. Any advice? |
Thanks for reporting, feel free to reopen. |
We are trying to understand the core dump module but need some clarity on a number of items. Some background first.
We have implemented a mechanism to track planned and unplanned (crash) restarts and to make an informed decision about automatically reverting to the previous version of OTA firmware. Whenever an unplanned restart is detected, the full core dump partition is uploaded to our cloud host using HTTP PUT and uniquely identified by MAC address and time.
http://esp-idf.readthedocs.io/en/latest/api-guides/core_dump.html#save-core-dump-to-flash states that the core dump size is the sum of all the TCBs and all the stacks, plus a small additional overhead. Based on this we calculated a core dump partition of ~140KB, and for safety we made it 192KB.
We configured the coredump to use logging level 5, expecting the coredump to contain more info, but like @nkolban in https://www.esp32.com/viewtopic.php?f=13&t=1099 we found that only the log output before and during core dump generation increases; the amount of info in the coredump stays the same.
Q1: How should we calculate the core dump partition size since 20 tasks with total of ~140k stack allocated only creates a core dump of ~16K ?
Q2: What is the exact syntax required to enable the 'espcoredump.py' utility to take its input from the uploaded binary file? We have tried using:
"python /c/Dropbox/devs/ws/z-sdk/esp-idf/components/espcoredump/espcoredump.py --chip esp32 info_corefile --core /c/Dropbox/devs/ws/z-appl/ewm-irmacos/coredump/30aea432bc00_1.bin --core-format raw --print-mem"
and a number of variations on the above but the help message says:
"usage: espcoredump info_corefile [-h] [--gdb GDB] [--core CORE]
[--core-format CORE_FORMAT] [--off OFF]
[--save-core SAVE_CORE] [--print-mem]
prog
espcoredump info_corefile: error: too few arguments"
The "prog" parameter is not documented anywhere but from the python source it appears to be the path to the ELF file.
Adding the path [+filename] to the command line causes an error message:
"Growing up stacks are not supported for now!
Failed to create corefile!"
What parameters are we missing or have wrong?
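Going only by the usage message quoted above, the ELF path goes last as the positional `prog` argument. The Python sketch below assembles such a command line using just the options that appear in that usage text. All paths are placeholders, `build_info_corefile_cmd` is a hypothetical helper, and this does not by itself address the "Growing up stacks are not supported" error.

```python
import shlex


def build_info_corefile_cmd(script, core_file, elf_file,
                            core_format="raw", print_mem=True):
    """Build an argv list for 'espcoredump.py info_corefile' with prog last."""
    cmd = [
        "python", script,
        "info_corefile",
        "--core", core_file,
        "--core-format", core_format,
    ]
    if print_mem:
        cmd.append("--print-mem")
    cmd.append(elf_file)  # positional 'prog': the application ELF file
    return cmd


if __name__ == "__main__":
    argv = build_info_corefile_cmd(
        "esp-idf/components/espcoredump/espcoredump.py",  # placeholder path
        "coredump/30aea432bc00_1.bin",                    # placeholder path
        "build/app.elf",                                  # placeholder path
    )
    print(" ".join(shlex.quote(a) for a in argv))
```

The resulting list can be passed to `subprocess.run(argv)` as-is, which avoids shell quoting problems with long paths.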