Crash but recovered on 1.6 #211

Closed
rdeutsch3 opened this issue Jul 4, 2024 · 7 comments

rdeutsch3 commented Jul 4, 2024

I have no concrete information on why this happened, but the timing roughly aligns with my router mysteriously rebooting. I noticed that the device had crashed and rebooted because its uptime was lower than I expected.

Crash information recovered from EEPROM
Crash # 1 at 1152540083 ms
Restart reason: 254
Exception (0):
epc1=0x00000000 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

stack>>>

ctx: sys

sp: 3ffffb50 end: 3fffffb0
3ffffb50: 3ffffe90 3ffffeb8 0000021c 40233ccc
3ffffb60: 64732d73 3ffffeb8 3fff2930 4023993e
3ffffb70: 61685f04 745f0470 6c057063 6c61636f
3ffffb80: 00000000 00000000 00000000 00000000
3ffffb90: 00000000 00000000 00000000 00000000
3ffffba0: 00000000 00000000 00000000 00000000
3ffffbb0: 00000000 00000000 00000000 00000000
3ffffbc0: 00000000 00000000 00000000 00000000
3ffffbd0: 00000000 00000000 00000000 00000000
3ffffbe0: 00000000 00000000 00000000 00000000
3ffffbf0: 00000000 00000000 00000000 00000000
3ffffc00: 00000000 00000000 00000000 00000000
3ffffc10: 00000000 00000000 00000000 00000000
3ffffc20: 00000000 00000000 00000000 00000000
3ffffc30: 00000000 00000000 00000000 00000000
3ffffc40: 00000000 00000000 00000000 00000000
3ffffc50: 00000000 00000000 00000000 00000000
3ffffc60: 00000000 00000000 00000000 00000000
3ffffc70: 000c0011 3fff0001 00001191 3fff0014
3ffffc80: 3ffffe90 00000001 3fff2930 40236dcd
3ffffc90: 00000000 61685f04 745f0470 6c057063
3ffffca0: 6c61636f 00000000 00000000 00000000
3ffffcb0: 00000000 00000000 00000000 00000000
3ffffcc0: 00000000 00000000 00000000 00000000
3ffffcd0: 00000000 00000000 00000000 00000000
3ffffce0: 00000000 00000000 00000000 00000000
3ffffcf0: 00000000 00000000 00000000 00000000
3ffffd00: 00000000 00000000 00000000 00000000
3ffffd10: 00000000 00000000 00000000 00000000
3ffffd20: 00000000 00000000 00000000 00000000
3ffffd30: 00000000 00000000 00000000 00000000
3ffffd40: 00000000 00000000 00000000 00000000
3ffffd50: 00000000 00000000 00000000 00000000
3ffffd60: 00000000 00000000 00000000 00000000
3ffffd70: 00000000 00000000 00000000 00000000
3ffffd80: 00000000 00000000 00000000 00000000
3ffffd90: 00000000 000c0011 3f000001 00000000
3ffffda0: 00000082 3fff2a18 00000012 000000b2
3ffffdb0: 4024682c 3fff53ac 00000000 3fffc258
3ffffdc0: 40100ad4 00002000 c0035001 00000030
3ffffdd0: 000088f0 0000111e 3ffe8f10 40101012
3ffffde0: 00000016 3ffefdf8 00000020 3ffef650
3ffffdf0: 00000005 00000000 00000020 40100854
3ffffe00: 40251ba8 3fff2b24 00000005 40102864
3ffffe10: 3ffeb1f5 40105afb 3ffeec08 00000016
3ffffe20: 401033ef 3ffeec08 3ffefdf8 00000016
3ffffe30: fffffff9 58b88dd8 3ffef550 401035cc
3ffffe40: 3ffebaac 00000000 00000000 00000030
3ffffe50: fffffff9 58b88dd8 40103a86 00000100
3ffffe60: 3ffebaac 7fffffff 00002200 00000001
3ffffe70: 0000000a 00006208 3ffefdf8 00000030
3ffffe80: 3ffe0003 00000002 3fff6024 40239459
3ffffe90: 3fffff1a 00000001 3fff6024 40239459
3ffffea0: 3fff0000 00000000 00000000 3f000001
3ffffeb0: 3fff0000 00000000 00000000 4023976c
3ffffec0: 00000000 0000002d 3fffff10 40239ad5
3ffffed0: 00000200 00000a60 3ffe8f10 3fff24e4
3ffffee0: 3fffb73c 3fffb75a 3fff2930 40237139
3ffffef0: 00000000 00000000 00000000 00000000
3fffff00: 00000000 00000000 00000000 40247cc6
3fffff10: 00400000 002d0001 00000000 4023774e
3fffff20: 3fffb73c 3fffb75a 3fffb75a
Incomplete stack trace saved!
<<<stack<<<No more EEPROM space available to save crash information!

Flash CRC OK
Firmware Version: 1.6.0

ACKET(0x112918 @ 0x1DA2) Motion - NoData: [Zero: 0x00000000, Parity: 0xF]

[1150665555] RATGDO: Motion Detected
[1150665560] RATGDO: ENCODING 00000937 0000000000826539 00000080
[1150665659] RATGDO: reader completed packet
[1150665659] RATGDO: DECODED 00008C2F 00000000BE9600D2 4260B281
[1150665660] RATGDO: PACKET(0x9600D2 @ 0x8C2F) Status - Status: [DoorState Closed, Parity 0xB, Obs 1, Lock 0, Light 1]
[1150665669] RATGDO: tgt 1 curr 1
[1150670559] RATGDO: Motion Cleared
[1150670559] HomeKit: [Client 1073698068] Got characteristic 1.21 change event
[1150670560] RATGDO: SSE send to client 192.168.1.166 on channel 0, data: { "garageMotion": false, "upTime": 1150670560 }
[1150699741] HomeKit: [Client 1073698068] Get Characteristics
[1150761644] HomeKit: [Client 1073698068] Get Characteristics
[1150944407] HomeKit: wifiClient creation error, IP address is not set
[1150944410] HomeKit: [Client 1073698068] Get Characteristics
!!! [1150944411] HomeKit: [Client 1073698068] The socket is null! (or is closed)
!!! [1150944417] HomeKit: [Client 1073698068] The socket is null! (or is closed)
!!! [1150944423] HomeKit: [Client 1073698068] The socket is null! (or is closed)
[1151116295] HomeKit: wifiClient creation error, IP address is not set
[1151116296] HomeKit: [Client 1073698068] Disconnected!
[1151116296] HomeKit: [Client 1073698068] Closing client connection
[1151288034] HomeKit: wifiClient creation error, IP address is not set
[1151459898] HomeKit: wifiClient creation error, IP address is not set
[1151631737] HomeKit: wifiClient creation error, IP address is not set
[1151803502] HomeKit: wifiClient creation error, IP address is not set
[1151975590] HomeKit: wifiClient creation error, IP address is not set
[1152147454] HomeKit: wifiClient creation error, IP address is not set
[1152319418] HomeKit: wifiClient creation error, IP address is not set
[1152491084] HomeKit: wifiClient creation error, IP address is not set

Edit: Updated to include the full stack dump, which the device itself reports as incomplete.
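
For anyone reading the timing in the log above, here is a rough sketch that converts the crash timestamp to an uptime and measures the spacing of the repeated `wifiClient creation error` lines. It assumes the bracketed values are milliseconds of uptime, like the `Crash # 1 at ... ms` line (the `upTime` field in the SSE message matches its bracketed stamp, which supports that reading). The helper name `format_uptime` is made up for this example.

```python
from datetime import timedelta

# Millisecond uptime stamps copied from the log above.
crash_ms = 1152540083
wifi_error_ms = [
    1150944407, 1151116295, 1151288034, 1151459898, 1151631737,
    1151803502, 1151975590, 1152147454, 1152319418, 1152491084,
]


def format_uptime(ms: int) -> str:
    """Render a millisecond counter as days/hours/minutes/seconds."""
    td = timedelta(milliseconds=ms)
    hours, rem = divmod(td.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{td.days}d {hours}h {minutes}m {seconds}s"


# Gap between consecutive "IP address is not set" errors, in seconds.
intervals_s = [(b - a) / 1000 for a, b in zip(wifi_error_ms, wifi_error_ms[1:])]

# Time from the first such error to the recorded crash, in seconds.
first_error_to_crash_s = (crash_ms - wifi_error_ms[0]) / 1000

print("uptime at crash:", format_uptime(crash_ms))
print("error spacing (s): min", min(intervals_s), "max", max(intervals_s))
print("first error to crash (s):", first_error_to_crash_s)
```

Under that assumption the errors recur roughly every 172 seconds, and the crash comes a little under 27 minutes after the first one.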

Collaborator

dkerr64 commented Jul 4, 2024

We have seen crashes when the WiFi network goes out, usually in the mDNS code. Did you clip out some of the stack dump? It looks incomplete. If you can attach the full crash log then we may be able to decode the stack dump and see if it is in mDNS.

The good news is that it should recover just fine. And you should delete the crash log (after you save a copy) so that there is room to save any future crashes.

thanks.
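
As a starting point for decoding a dump like the one above, this minimal sketch pulls out the stack words that look like code addresses, so they can be looked up against the firmware's ELF file (for example with the toolchain's `addr2line`). The address ranges are assumptions about the usual ESP8266 memory map (instruction RAM at `0x40100000`, memory-mapped flash at `0x40200000`), and `candidate_code_addresses` is a hypothetical helper, not part of the ratgdo code.

```python
import re

# Assumed ESP8266 code regions: instruction RAM and memory-mapped flash.
IRAM = range(0x40100000, 0x40108000)
FLASH = range(0x40200000, 0x40300000)

# One dump row: an 8-digit hex address, a colon, then one or more hex words.
ROW = re.compile(r"^([0-9a-f]{8}):((?:\s+[0-9a-f]{8})+)\s*$")


def candidate_code_addresses(dump: str) -> list[int]:
    """Return stack words that fall inside the assumed code regions."""
    found = []
    for line in dump.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip "ctx: sys", "sp: ...", blank lines, and so on
        for word in match.group(2).split():
            value = int(word, 16)
            if value in IRAM or value in FLASH:
                found.append(value)
    return found


# Three rows copied from the dump in this issue.
sample = """\
3ffffb50: 3ffffe90 3ffffeb8 0000021c 40233ccc
3ffffb60: 64732d73 3ffffeb8 3fff2930 4023993e
3ffffc80: 3ffffe90 00000001 3fff2930 40236dcd
"""

for addr in candidate_code_addresses(sample):
    print(f"0x{addr:08x}")
```

Not every word in those ranges is a real return address, so the output is a list of candidates to check, not a guaranteed call chain.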

Author

rdeutsch3 commented Jul 5, 2024

> We have seen crashes when the WiFi network goes out, usually in the mDNS code. Did you clip out some of the stack dump? It looks incomplete. If you can attach the full crash log then we may be able to decode the stack dump and see if it is in mDNS.
>
> The good news is that it should recover just fine. And you should delete the crash log (after you save a copy) so that there is room to save any future crashes.
>
> thanks.

Yes, I truncated the stack. In my previous life we rarely looked at stack dumps, because we usually knew where the crash was from the log messages. It was only us 'old hands' who even knew how to read them.

I've updated the initial post with the entire stack, which the device reports as incomplete.

The WiFi was actually still up, as I've got several routers acting as WAPs and only the main router decided to reboot (weirdly). The signal strength reported by the ratgdo tells me it's not on the main router, and I just checked which router it is connected to as well. The ratgdo IP address is static, so it didn't get reassigned when the router came back up. It basically lost connectivity to the outside world, along with the rest of my network, for a couple of minutes. I haven't looked at the code: does ratgdo talk to the outside world?

Collaborator

jgstroud commented Jul 5, 2024

The stack trace is actually a lot more helpful than the log messages. I'll try to analyze it tomorrow. No, the device does not communicate with the outside world.

Collaborator

jgstroud commented Jul 5, 2024

Yep, it crashed in the mDNS responder. We have seen this a number of times. It seems to happen during or immediately after a WiFi outage, and it always seems to recover. No root cause or solution yet. Thanks for the logs. Hopefully we'll figure it out at some point.
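
One hint in the dump that is consistent with this: some stack words are plain ASCII. The sketch below reads the four words from the row at `3ffffb70` as little-endian bytes (the ESP8266's Xtensa core is little-endian) and treats them as DNS length-prefixed labels. The helper `dns_labels` is written for this example only. It shows that mDNS name data was on the stack, but on its own it does not identify the faulting function.

```python
import struct

# Four consecutive words from the stack row at 3ffffb70 in the dump above.
words = [0x61685F04, 0x745F0470, 0x6C057063, 0x6C61636F]

# Pack each 32-bit word in little-endian order to recover the byte layout.
raw = struct.pack("<4I", *words)


def dns_labels(buf: bytes) -> list[str]:
    """Split a DNS wire-format name into labels (length byte, then text)."""
    labels, i = [], 0
    while i < len(buf) and buf[i] != 0:
        length = buf[i]
        labels.append(buf[i + 1 : i + 1 + length].decode("ascii"))
        i += 1 + length
    return labels


print(raw)
print(".".join(dns_labels(raw)))
```

The decoded name is `_hap._tcp.local`, the service type HomeKit accessories advertise over mDNS.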

Collaborator

dkerr64 commented Oct 19, 2024

Possible fix in PR #242 by using IRAM heap to free up more memory.

Collaborator

dkerr64 commented Oct 26, 2024

Believe fixed in 1.8.0

@dkerr64 dkerr64 closed this as completed Oct 26, 2024
@rdeutsch3
Author

So far so good. Thank you for the follow up


3 participants