Presence of boot.py in many cases will cause a boot loop on ESP32-S2 with TinyUF2 (rc, not beta) #5305
Comments
I think it's more complicated... if I start clean, empty boot & empty code will work repeatedly. It seems like there's some unexpected state being retained in some cases, and once the error triggers, it will keep triggering across hard resets until boot.py is removed. It would almost have to be something in flash? Another behavior I hadn't seen before: Some non-trivial |
Can you replicate this with a DEBUG build when it is outputting to the debug UART? |
I haven't done a DEBUG build since it became too big to generate, but I'll ask around to see how to make it fit. |
Paulsk cut the wifi buffers in half and made it fit: https://discord.com/channels/327254708534116352/537365702651150357/881908023436640356 |
Switched to Cucumber RIS since Funhouse doesn't have debug pins exposed.
Initial test with no
Creating an empty
The gibberish varies each time. No boot loop in this case. Seems like something is getting clobbered. (The |
If I put a larger
Oh, and this appears earlier in the debug console every time when boot looping:
These warning messages were not present in the case in the previous comment. |
Manual bisect shows earliest S3 artifact with this issue is: |
Backing out just the "-1" and "+1" additions on lines 133 and 142 respectively in |
What are the contents of your boot.py and code.py? This does sound like the issue we thought we had fixed. |
I've been testing recently only with an empty I'm trying but having difficulty coming up with a minimal reproducible example for the initial @Neradoc replicated the boot looping on FunHouse with an empty Necessary but insufficient conditions seem to be:
But as far as I can tell, it needs those things but also has to be triggered by a
Once triggered, you can even go into safe mode and reduce
If we suspect corruption, a number of variables like build differences, port and board differences, etc., may affect the observed behavior. I'll keep trying to get an example for an initial Addendum: Addendum 2: Addendum 3:
I don't really understand what that PR does, or the issues that it addressed. Any ideas or intuition about what kinds of code might trigger this... large code size or complex code to interpret? big buffers? fragmentation? stack vs heap vs pystack vs esp-idf? other? I finally bodged together 600+ lines of code snippets that will consistently trigger the issue, but it's ugly and introduces variables because a lot of modules get exercised. With some direction, I might be able to produce a more targeted example. |
The issue we were trying to fix was a rare case where the ATB reads into the FTB. The ATB tracks block status and the FTB tracks whether each block has a finalizer. If the first block of the FTB looks like a TAIL block from the ATB, then the collector will read past the end of RAM and hang the chip. Once hung, the WDT will reset the chip after a time. So it is a specific case of allocating something at the top of the heap and allocating something at the bottom with a finalizer. I talk about this in more detail in this stream: https://www.youtube.com/watch?v=DQVXNNUGvbk I think the next step would be to print out the structure of the heap here and you can validate the addresses read here. The addresses (ptrs) should always be less than |
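To make the ATB/FTB overrun described above a bit more concrete, here is a toy Python model. It is only a sketch under stated assumptions, not the collector's actual code: it assumes 2 bits per block in the ATB, 1 bit per block in the FTB, and the FTB stored directly after the ATB. The heap size, the function names, and the bounds check in the second function are all hypothetical and are not taken from the PR.

```python
# Toy model of the allocation table layout described above.
# Assumptions: 2 bits per block in the ATB, 1 bit per block in the FTB,
# and the FTB placed directly after the ATB in memory.
AT_FREE, AT_HEAD, AT_TAIL, AT_MARK = 0, 1, 2, 3

TOTAL_BLOCKS = 8  # hypothetical tiny heap


def atb_kind(table, block):
    """Return the 2-bit kind for a block, reading whatever byte it lands on."""
    return (table[block // 4] >> (2 * (block % 4))) & 0b11


def run_length_unbounded(table, head):
    """Count the head block plus following TAIL blocks, with no end-of-heap check."""
    n = 1
    while atb_kind(table, head + n) == AT_TAIL:
        n += 1
    return n


def run_length_bounded(table, head, total_blocks):
    """Same scan, but stop at the last real block (illustrative guard only)."""
    n = 1
    while head + n < total_blocks and atb_kind(table, head + n) == AT_TAIL:
        n += 1
    return n


atb = bytes([
    AT_HEAD | (AT_HEAD << 2),         # blocks 0-3: two 1-block objects at the bottom
    (AT_HEAD << 4) | (AT_TAIL << 6),  # blocks 4-7: a 2-block object at the top
])
ftb = bytes([0b00000010])  # only block 1 has a finalizer
table = atb + ftb

overrun = run_length_unbounded(table, 6)
guarded = run_length_bounded(table, 6, TOTAL_BLOCKS)
print(overrun, guarded)
```

In this model the unbounded scan reports 3 blocks for a 2-block object, because the first two FTB bits (0b10, meaning block 1 has a finalizer) are read as if they were a TAIL entry. The bounded scan reports 2. On real hardware the consequence would be the collector working on a range that extends past the heap, which matches the hang-then-WDT-reset behavior described above.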
That's all way over my head. I missed that deep dive, so I'll watch and see what I can learn. |
I wasn't able to replicate this on a UM FeatherS2. Have you only replicated it on devices with a screen? |
It affects any ESP32-S2 device I've tried so far: MagTag, FunHouse, Cucumber RS (no display). If I delete some code, doesn't seem to matter which code, more like how much, it's often fine again. Just verified that the latest build: FeatherS2 seems harder to trigger, but I don't know why. It has more PSRAM, but it's not like anything I've done fills the 2MB of PSRAM in the other boards. I'd suggest testing on Saola or something else WROVER-based. Addendum: [EDIT: note that an empty (at least) |
@anecdata I think you mentioned on Monday that TinyUF2 is not necessary, at least with the example above? So let's edit the title, etc. |
Just a note that in the large example, there are functions and function code that are never executed, but if I delete too much of that, the issue goes away. That leads me to think it has more to do with code size or complexity, and the interpretation to byte code stage. And indeed, if I have a |
At what point does it fail with that I also see you are enabling the WDT yourself. Please post more of your debug output. The S2 has multiple watchdogs and it should tell us which of the two reset the chip for you. |
I don't think the user watchdog is required, it was just a handy snippet. The code will run and soft-reload fine. What it won't do is recover from a hard reset (or other behavior causing an auto-reset) after it's done a
If I rip the watchdog code out of the example and replace it with a couple of unused functions, it will still boot loop on reset:
CircuitPython version
Code/REPL
None
Behavior
On an ESP32-S2 device (FunHouse in this case, but on others as well) without TinyUF2, the device behaves as expected.
On the same device with TinyUF2, starting clean from a fresh storage.erase_filesystem(), with either empty boot.py + empty code.py, or some (but not all) combinations of trivial and non-trivial boot.py + code.py, when reset the device will display the Blinka icon, then go into an apparent loop where nothing else is displayed (regardless of any print statements in boot.py) and the violet status lights and the red LED will flash every second.
To recover, go into safe mode and delete or rename the boot.py file, then hard reset.
Saw similar behaviors on rc.0, but not on various beta versions or prior.
UPDATE:
Further testing indicates that the presence of TinyUF2 does not seem relevant; the issue can be triggered with or without TinyUF2.