
w5500 phy core dump (IDFGH-5876) #7575

Closed
achao1104 opened this issue Sep 17, 2021 · 8 comments
Labels
Resolution: Cannot Reproduce (issue cannot be reproduced) · Status: Done (issue is done internally)

Comments

@achao1104

Hello,

After a couple of hours of running, this core dump will occur irregularly:

==================== THREAD 2 (TCB: 0x3ff9fb8c, name: 'Tmr Svc') =====================
#0 0x4001a8d4 in ?? ()
#1 0x400312cc in vPortExitCritical (mux=0x3ffec068) at D:/project/esp32/es-idf-v4.3-release/components/freertos/port/xtensa/port.c:472
#2 0x4002ef42 in xQueueGenericSend (xQueue=0x3ffec01c, pvItemToQueue=, xTicksToWait=, xCopyPosition=0) at D:/project/esp32/es-idf-v4.3-release/components/freertos/queue.c:896
#3 0x40109818 in esp_event_post_to (event_loop=0x3ffec000, event_base=, event_id=, event_data=, event_data_size=, ticks_to_wait=0) at D:/project/esp32/es-idf-v4.3-release/components/esp_event/esp_event.c:865
#4 0x40108d42 in esp_event_post (event_base=, event_id=2, event_data=0x3ffdb1e0, event_data_size=4, ticks_to_wait=0) at D:/project/esp32/es-idf-v4.3-release/components/esp_event/default_event_loop.c:83
#5 0x400a3ea0 in eth_on_state_changed (eth=, state=, args=0x0) at D:/project/esp32/es-idf-v4.3-release/components/esp_eth/src/esp_eth.c:126
#6 0x400a5c04 in w5500_update_link_duplex_speed (w5500=0x3fff3adc) at D:/project/esp32/es-idf-v4.3-release/components/esp_eth/src/esp_eth_phy_w5500.c:93
#7 0x400a5e00 in w5500_get_link (phy=0x3fff3adc) at D:/project/esp32/es-idf-v4.3-release/components/esp_eth/src/esp_eth_phy_w5500.c:116
#8 0x400a48c8 in eth_check_link_timer_cb (/builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gdb/gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) [answered Y; input not from terminal] /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gdb/gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed.

@espressif-bot espressif-bot added the Status: Opened label Sep 17, 2021
@github-actions github-actions bot changed the title w5500 phy core dump w5500 phy core dump (IDFGH-5876) Sep 17, 2021
@achao1104 (Author)

The full list is as below:

===============================================================
==================== ESP32 CORE DUMP START ====================

Crashed task handle: 0x3ffdce90, name: 'tiT', GDB name: 'process 1073598096'

================== CURRENT THREAD REGISTERS ===================
exccause 0x1d (StoreProhibitedCause)
excvaddr 0x0
epc1 0x40108196
epc2 0x0
epc3 0x0
epc4 0x0
epc5 0x0
epc6 0x0
epc7 0x0
eps2 0x60520
eps3 0x0
eps4 0x0
eps5 0x0
eps6 0x0
eps7 0x4008926c
pc 0x4002e0a1 0x4002e0a1 <panic_abort+21>
lbeg 0x0 0
lend 0x0 0
lcount 0x0 0
sar 0x8 8
ps 0x60e20 396832
threadptr
br
scompare1
acclo
acchi
m0
m1
m2
m3
expstate
f64r_lo
f64r_hi
f64s
fcr
fsr
a0 0x8002e874 -2147293068
a1 0x3ffdccd0 1073597648
a2 0x3ffdcd10 1073597712
a3 0x3ffdcd5d 1073597789
a4 0xa 10
a5 0x3f016fcc 1057058764
a6 0x40000aa0 1073744544
a7 0x0 0
a8 0x0 0
a9 0x1 1
a10 0x3ffdcd5d 1073597789
a11 0x3ffdcd5d 1073597789
a12 0xa 10
a13 0xa 10
a14 0x3ffdccb9 1073597625
a15 0x0 0

==================== CURRENT THREAD STACK =====================
#0 0x4002e0a1 in panic_abort (details=0x3ffdcd10 "abort() was called at PC 0x40019fbe on core 0") at D:/project/esp32/es-idf-v4.3-release/components/esp_system/panic.c:367
#1 0x4002e874 in esp_system_abort (details=0x3ffdcd10 "abort() was called at PC 0x40019fbe on core 0") at D:/project/esp32/es-idf-v4.3-release/components/esp_system/system_api.c:112
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

======================== THREADS INFO =========================
Id Target Id Frame

  • 1 process 1073598096 0x4002e0a1 in panic_abort (details=0x3ffdcd10 "abort() was called at PC 0x40019fbe on core 0") at D:/project/esp32/es-idf-v4.3-release/components/esp_system/panic.c:367
    2 process 1073347468 0x4001a8d4 in ?? ()
    3 process 1073347124 0x40108196 in esp_pm_impl_waiti () at D:/project/esp32/es-idf-v4.3-release/components/hal/esp32s2/include/hal/cpu_ll.h:202
    4 process 1073615160 0x4002f302 in xQueueReceive (xQueue=0x3ffecd34, pvBuffer=0x3ffe1100, xTicksToWait=) at D:/project/esp32/es-idf-v4.3-release/components/hal/esp32s2/include/hal/cpu_ll.h:33
    5 process 1073345240 0x4002f456 in xQueueSemaphoreTake (xQueue=0x3ffed444, xTicksToWait=) at D:/project/esp32/es-idf-v4.3-release/components/hal/esp32s2/include/hal/cpu_ll.h:33
    6 process 1073610440 0x4002f302 in xQueueReceive (xQueue=0x3ffec464, pvBuffer=0x3ffdfe88, xTicksToWait=) at D:/project/esp32/es-idf-v4.3-release/components/hal/esp32s2/include/hal/cpu_ll.h:33
    7 process 1073687028 0x4001a8d4 in ?? ()
    8 process 1073605996 0x4002f302 in xQueueReceive (xQueue=0x3ffed558, pvBuffer=0x3ffded34, xTicksToWait=) at D:/project/esp32/es-idf-v4.3-release/components/hal/esp32s2/include/hal/cpu_ll.h:33
    9 process 1073344512 0x4001a8d4 in ?? ()
    10 process 1073620876 0x4002f302 in xQueueReceive (xQueue=0x3ffe12f0, pvBuffer=0x3ffe2750, xTicksToWait=) at D:/project/esp32/es-idf-v4.3-release/components/hal/esp32s2/include/hal/cpu_ll.h:33
    11 process 1073601260 0x4002f456 in xQueueSemaphoreTake (xQueue=0x3ffebf50, xTicksToWait=) at D:/project/esp32/es-idf-v4.3-release/components/hal/esp32s2/include/hal/cpu_ll.h:33

==================== THREAD 1 (TCB: 0x3ffdce90, name: 'tiT') =====================
#0 0x4002e0a1 in panic_abort (details=0x3ffdcd10 "abort() was called at PC 0x40019fbe on core 0") at D:/project/esp32/es-idf-v4.3-release/components/esp_system/panic.c:367
#1 0x4002e874 in esp_system_abort (details=0x3ffdcd10 "abort() was called at PC 0x40019fbe on core 0") at D:/project/esp32/es-idf-v4.3-release/components/esp_system/system_api.c:112
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

==================== THREAD 2 (TCB: 0x3ff9fb8c, name: 'Tmr Svc') =====================
#0 0x4001a8d4 in ?? ()
#1 0x400312cc in vPortExitCritical (mux=0x3ffec068) at D:/project/esp32/es-idf-v4.3-release/components/freertos/port/xtensa/port.c:472
#2 0x4002ef42 in xQueueGenericSend (xQueue=0x3ffec01c, pvItemToQueue=, xTicksToWait=, xCopyPosition=0) at D:/project/esp32/es-idf-v4.3-release/components/freertos/queue.c:896
#3 0x40109818 in esp_event_post_to (event_loop=0x3ffec000, event_base=, event_id=, event_data=, event_data_size=, ticks_to_wait=0) at D:/project/esp32/es-idf-v4.3-release/components/esp_event/esp_event.c:865
#4 0x40108d42 in esp_event_post (event_base=, event_id=2, event_data=0x3ffdb1e0, event_data_size=4, ticks_to_wait=0) at D:/project/esp32/es-idf-v4.3-release/components/esp_event/default_event_loop.c:83
#5 0x400a3ea0 in eth_on_state_changed (eth=, state=, args=0x0) at D:/project/esp32/es-idf-v4.3-release/components/esp_eth/src/esp_eth.c:126
#6 0x400a5c04 in w5500_update_link_duplex_speed (w5500=0x3fff3adc) at D:/project/esp32/es-idf-v4.3-release/components/esp_eth/src/esp_eth_phy_w5500.c:93
#7 0x400a5e00 in w5500_get_link (phy=0x3fff3adc) at D:/project/esp32/es-idf-v4.3-release/components/esp_eth/src/esp_eth_phy_w5500.c:116
#8 0x400a48c8 in eth_check_link_timer_cb (/builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gdb/gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) [answered Y; input not from terminal] /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gdb/gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) [answered Y; input not from terminal]
Traceback (most recent call last):
File "D:\project\esp32\es-idf-v4.3-release\components\espcoredump\espcoredump.py", line 295, in
info_corefile()
File "D:\project\esp32\es-idf-v4.3-release\components\espcoredump\espcoredump.py", line 156, in info_corefile
task_name = gdb.get_freertos_task_name(tcb_addr)
File "D:\project\esp32\es-idf-v4.3-release\components\espcoredump\corefile\gdb.py", line 94, in get_freertos_task_name
val = self._gdbmi_data_evaluate_expression('(char *)((TCB_t *)0x%x)->pcTaskName' % tcb_addr)
File "D:\project\esp32\es-idf-v4.3-release\components\espcoredump\corefile\gdb.py", line 89, in _gdbmi_data_evaluate_expression
'done', 'result')['payload']['value']
File "D:\project\esp32\es-idf-v4.3-release\components\espcoredump\corefile\gdb.py", line 84, in _gdbmi_run_cmd_get_one_response
return self._gdbmi_run_cmd_get_responses(cmd, resp_message, resp_type, multiple=False)[0]
File "D:\project\esp32\es-idf-v4.3-release\components\espcoredump\corefile\gdb.py", line 68, in _gdbmi_run_cmd_get_responses
more_responses = self.p.get_gdb_response(timeout_sec=0, raise_error_on_timeout=False)
File "D:\project\esp32.espressif\python_env\idf4.3_py3.7_env\lib\site-packages\pygdbmi\gdbcontroller.py", line 269, in get_gdb_response
self.verify_valid_gdb_subprocess()
File "D:\project\esp32.espressif\python_env\idf4.3_py3.7_env\lib\site-packages\pygdbmi\gdbcontroller.py", line 177, in verify_valid_gdb_subprocess
% str(self.gdb_process.poll())
pygdbmi.gdbcontroller.NoGdbProcessError: gdb process has already finished with return code: 3

@achao1104 (Author)

esp32s2 + w5500
We use the sample in the IDF SDK: we turn on the power of the w5500, init the stack according to the SDK, read some data from a slave device via IPv6, then deinit the w5500 and power it off. Then we wait for 30 seconds and power the w5500 on again...
It can run smoothly, but after a couple of hours it will panic. The log is like the above.
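
An illustrative sketch of that power/init/deinit cycle against the IDF v4.3 w5500 APIs (the SPI device handle, netif handle, power-control helpers, and MAC address here are assumptions, not the actual project code):

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/spi_master.h"
#include "esp_eth.h"
#include "esp_netif.h"

// Hypothetical helpers toggling the w5500 supply rail.
extern void w5500_power_on(void);
extern void w5500_power_off(void);

static void w5500_one_cycle(spi_device_handle_t spi, esp_netif_t *eth_netif)
{
    w5500_power_on();

    // Create the w5500 MAC/PHY instances (IDF v4.3-style config macros).
    eth_w5500_config_t w5500_config = ETH_W5500_DEFAULT_CONFIG(spi);
    eth_mac_config_t mac_config = ETH_MAC_DEFAULT_CONFIG();
    eth_phy_config_t phy_config = ETH_PHY_DEFAULT_CONFIG();
    esp_eth_mac_t *mac = esp_eth_mac_new_w5500(&w5500_config, &mac_config);
    esp_eth_phy_t *phy = esp_eth_phy_new_w5500(&phy_config);

    esp_eth_config_t eth_config = ETH_DEFAULT_CONFIG(mac, phy);
    esp_eth_handle_t eth_handle = NULL;
    ESP_ERROR_CHECK(esp_eth_driver_install(&eth_config, &eth_handle));

    // The w5500 has no factory-burned MAC address, so assign one.
    uint8_t mac_addr[6] = {0x02, 0x00, 0x00, 0x12, 0x34, 0x56};
    ESP_ERROR_CHECK(esp_eth_ioctl(eth_handle, ETH_CMD_S_MAC_ADDR, mac_addr));

    void *glue = esp_eth_new_netif_glue(eth_handle);
    ESP_ERROR_CHECK(esp_netif_attach(eth_netif, glue));
    ESP_ERROR_CHECK(esp_eth_start(eth_handle));

    // ... exchange data with the slave device over IPv6 here ...

    // Tear everything down in reverse order before cutting power.
    ESP_ERROR_CHECK(esp_eth_stop(eth_handle));
    ESP_ERROR_CHECK(esp_eth_del_netif_glue(glue));
    ESP_ERROR_CHECK(esp_eth_driver_uninstall(eth_handle));
    phy->del(phy);
    mac->del(mac);

    w5500_power_off();
    vTaskDelay(pdMS_TO_TICKS(30000)); // wait 30 s before the next cycle
}
```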

@achao1104 (Author)

IDF version: v4.3.1 release

@achao1104 (Author)

Adding more log:

assertion "dhcp_inc_pcb_refcount(): memory leak" failed: file "D:/project/esp32/es-idf-v4.3-release/components/lwip/lwip/src/core/ipv4/dhcp.c", line 232, function: dhcp_inc_pcb_refcount

abort() was called at PC 0x40019fbe on core 0

Backtrace:0x4002e09e:0x3ffdcce0 0x4002e871:0x3ffdcd00 0x4003810e:0x3ffdcd20 0x40019fbe:0x3ffdcd90 0x4001a42b:0x3ffdcdb0 0x400b2a7a:0x3ffdcde0 0x400b3757:0x3ffdce00 0x400a6da8:0x3ffdce20 0x400a6997:0x3ffdce40 0x400363ad:0x3ffdce60
0x4002e09e: panic_abort at D:/project/esp32/es-idf-v4.3-release/components/esp_system/panic.c:367

0x4002e871: esp_system_abort at D:/project/esp32/es-idf-v4.3-release/components/esp_system/system_api.c:112

0x4003810e: abort at D:/project/esp32/es-idf-v4.3-release/components/newlib/abort.c:46

0x400b2a7a: dhcp_inc_pcb_refcount at D:/project/esp32/es-idf-v4.3-release/components/lwip/lwip/src/core/ipv4/dhcp.c:232
(inlined by) dhcp_inc_pcb_refcount at D:/project/esp32/es-idf-v4.3-release/components/lwip/lwip/src/core/ipv4/dhcp.c:229

0x400b3757: dhcp_start at D:/project/esp32/es-idf-v4.3-release/components/lwip/lwip/src/core/ipv4/dhcp.c:856

0x400a6da8: esp_netif_dhcpc_start_api at D:/project/esp32/es-idf-v4.3-release/components/esp_netif/lwip/esp_netif_lwip.c:1035

0x400a6997: esp_netif_api_cb at D:/project/esp32/es-idf-v4.3-release/components/esp_netif/lwip/esp_netif_lwip.c:107

0x400363ad: tcpip_thread_handle_msg at D:/project/esp32/es-idf-v4.3-release/components/lwip/lwip/src/api/tcpip.c:208
(inlined by) tcpip_thread at D:/project/esp32/es-idf-v4.3-release/components/lwip/lwip/src/api/tcpip.c:154

@david-cermak (Collaborator)

Hi @achao1104

This looks like a memory corruption issue: the first backtrace points to a store-prohibited exception in the FreeRTOS queues when posting the link status event, and the other is clearly a memory issue in a pcb ref count.
Is the issue also reproducible with simple example code? Could you please share some portion of the code that we can test?
I'd suggest looking into memory corruption debugging or posting a minimal reproducible sample application.
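
One such debugging aid, as an illustrative sketch: enable heap poisoning (CONFIG_HEAP_POISONING_COMPREHENSIVE=y in menuconfig) and run a periodic integrity check, so the system fails close to the corruption instead of hours later. The task/timer wiring is an assumption; only the check itself is the stock API:

```c
#include <stdlib.h>
#include "esp_heap_caps.h"

// Hypothetical check to run periodically from a low-priority task or timer.
// With comprehensive heap poisoning enabled, overwritten canaries are
// detected here, near the corruption, rather than at some later free().
static void check_heap_integrity(void)
{
    if (!heap_caps_check_integrity_all(true /* print errors */)) {
        abort(); // details of the corrupted block were printed above
    }
}
```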

> esp32s2 + w5500: we use the sample in the IDF SDK: we turn on the power of the w5500, init the stack according to the SDK, read some data from a slave device via IPv6, then deinit the w5500 and power it off. Then we wait for 30 seconds and power the w5500 on again... It can run smoothly, but after a couple of hours it will panic. The log is like the above.

Tried exactly this (exchanging data over IPv6 with UDP sockets), controlling the power of the w5500 with a relay; it's been running for some hours already with no issues (so far; I also altered check_link_period_ms to try to hit the first issue). Here's my test code:

https://gist.github.com/david-cermak/87462309176722d6051ac2bd56e5d32b

Running this with example_test.py (the IPv6 case) as the server part.

PTAL at the test code; feel free to modify it according to your use case.
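
(For context, check_link_period_ms mentioned above is a plain field of the driver config; a fragment showing how it can be shortened, assuming the usual mac/phy/eth_handle setup and 100 ms as an arbitrary example value:)

```c
// Poll the PHY link state more often than the 2000 ms default to
// stress the link-status event path from the first backtrace.
esp_eth_config_t eth_config = ETH_DEFAULT_CONFIG(mac, phy);
eth_config.check_link_period_ms = 100;
ESP_ERROR_CHECK(esp_eth_driver_install(&eth_config, &eth_handle));
```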

@achao1104 (Author)

Thanks for your feedback. In our case, the w5500 is connected to a slave device by a point-to-point cable, so there is no router between them. We use IPv6.
We have fixed this issue by disabling dhcpc, since we don't need it.
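
(A minimal sketch of that workaround, assuming an eth_netif handle from the standard setup; the stop call is the stock esp_netif API:)

```c
#include "esp_netif.h"

// Stop the IPv4 DHCP client on the Ethernet netif; on a pure-IPv6
// point-to-point link no DHCP lease is needed anyway.
static void disable_dhcpc(esp_netif_t *eth_netif)
{
    esp_err_t err = esp_netif_dhcpc_stop(eth_netif);
    if (err != ESP_OK && err != ESP_ERR_ESP_NETIF_DHCP_ALREADY_STOPPED) {
        ESP_ERROR_CHECK(err); // anything else is a real failure
    }
}
```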

@david-cermak (Collaborator)

@achao1104 Could you please close the issue, then?

> there is no router between them.

Also tested with no router, just a switch, so a direct connection without any ND6 messages (I added the neighbors' addresses to the internal cache).

> We use IPv6.

The test code above also uses IPv6, with the device acting as a UDP client.

> We have fixed this issue by disabling dhcpc

Very likely you've just postponed the problem, i.e. reduced the probability of hitting this memory issue.
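
(If it is a latent heap issue, heap tracing can help localize it; an illustrative sketch, assuming CONFIG_HEAP_TRACING_STANDALONE=y and an arbitrary record count:)

```c
#include "esp_err.h"
#include "esp_heap_trace.h"

#define NUM_RECORDS 100
static heap_trace_record_t s_trace_records[NUM_RECORDS];

// Wrap one w5500 power/init/deinit cycle and dump every allocation
// that was never freed, together with its allocation call stack.
static void trace_one_cycle(void (*run_cycle)(void))
{
    ESP_ERROR_CHECK(heap_trace_init_standalone(s_trace_records, NUM_RECORDS));
    ESP_ERROR_CHECK(heap_trace_start(HEAP_TRACE_LEAKS));
    run_cycle();
    ESP_ERROR_CHECK(heap_trace_stop());
    heap_trace_dump();
}
```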

@achao1104 (Author)

Thanks, we will simplify the code and test again.

@espressif-bot espressif-bot added the Resolution: Cannot Reproduce and Status: Done labels and removed the Status: Opened label Sep 24, 2021