Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RGBMatrix, FrameBufferDisplay bugs #3344

Merged
merged 10 commits into from
Sep 2, 2020
Merged

Conversation

jepler
Copy link
Member

@jepler jepler commented Aug 28, 2020

There were a number of problems. First, when allocation failed during _PM_begin, a crash would occur if you tried to _PM_free the object. Second, there were problems with how allocation errors were signaled that presented additional problems for recovery. (it was doing longjmps out of the protomatter code, and it erroneously disabled interrupts, both of which made post- or mid-crash behavior extremely hard to fathom until noticing those things)

This does not fix 'the other' problem, which is that calling "release_displays()" can't actually release all the rgbmatrix-related storage back into the CircuitPython heap. This is because of the folllowing:

  • when a display is initially allocated, it is while circuitpython is running. It places all allocations on the CircuitPython heap
  • when the program reloads, the allocations are moved from the CP heap to the supervisor heap
  • so when CP starts again, the CP heap is smaller by that amount
  • but it is not possible to re-use the supervisor allocations from step 2, because of the current structure of heap allocations
    This can lead to an allocation failure on the second run that does not occur on a first run, part of why this crash seemed so spoopy.

Most of my testing was done with a setting where the FIRST allocation would fail, e.g., by allocating a 192x32 RGBMatrix. So it's possible I still haven't covered the problem with the original program. However, the original program never reproduced the problem reliably for me, it was more like 2-5% of the time.

Additional bugs acknowledged and fixed:

  • Setting a nonzero rotation led to a garbled display or hard crashes
  • Small displays (e.g., 16 pixels wide) caused safe mode resets
  • Unneeded memory allocations
  • Low-probability memory splatting during soft reset

@jepler jepler linked an issue Aug 28, 2020 that may be closed by this pull request
@tannewt tannewt added this to the 6.0.0 milestone Aug 28, 2020
Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions. Thanks for looking into this!

ProtomatterStatus stat = _PM_init(&self->core,
self->width, self->bit_depth,
self->rgb_count/6, self->rgb_pins,
self->addr_count, self->addr_pins,
self->clock_pin, self->latch_pin, self->oe_pin,
self->doublebuffer, self->timer);

common_hal_rgbmatrix_rgbmatrix_deinit(self);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why always deinit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is wrong and should be deleted.

@@ -84,31 +84,36 @@ void common_hal_rgbmatrix_rgbmatrix_reconstruct(rgbmatrix_rgbmatrix_obj_t* self,
_PM_FREE(self->core.screenData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the framebuffer case above call mp_obj_new_bytearray? Isn't the idea that the framebuf to use is provided?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird. This line of preexisting code should probably also be deleted.

I encountered this when changing optimization flags for debugging
purposes.  The diagnostic appears spurious and unrelated to what
I'm debugging.
Otherwise, out of range writes would occur in tilegrid_set_tile, causing a safe mode reset.
```
Hardware watchpoint 6: -location *stack_alloc->ptr

Old value = 24652061
New value = 24641565
0x000444f2 in common_hal_displayio_tilegrid_set_tile (self=0x200002c8 <supervisor_terminal_text_grid>, x=1, y=1, tile_index=0 '\000')
    at ../../shared-module/displayio/TileGrid.c:236
236	    if (!self->partial_change) {
(gdb)
```
e.g., allocating a 192x32x6bpp matrix would be enough to trigger this
reliably on a Metro M4 Express using the "memory hogging" layout.
Allocating 64x32x6bpp could trigger it, but somewhat unreliably.

There are several things going on here:
 * we make the failing call with interrupts off
 * we were throwing an exception with interrupts off
 * protomatter failed badly in _PM_free when it was partially-initialized

Incorporate the fix from protomatter, switch to a non-throwing malloc
variant, and ensure that interrupts get turned back on.

This decreases the quality of the MemoryError (it cannot report the size
of the failed allocation) but allows CircuitPython to survive, rather
than faulting.
The expectations of displayio.Display and frambufferio.FramebufferDisplay
are different when it comes to rotation.

In displayio.Display, if you call core_construct with a WxH = 64x32
and rotation=90, you get something that is 32 pixels wide and 64 pixels
tall in the LCD's coordinate system.

This is fine, as the existing definitions were written to work like this.
With framebuffer displays, however, the underlying framebuffer (such as
RGBMatrix) says "I am WxH pixels wide in my coordinate system" and the
constructor is given a rotation; when the rotation indicates a transpose
that means "exchange rows and columns, so that to the Groups displayed
on it, there is an effectively HxW pixel region for use".

Happily, we already have a set_rotation method.  Thus (modulo the time
spent debugging things anyway:) the fix is simple: Always request no
rotation from core_construct, then immediately fix up the rotation
to match what was requested.

Testing performed: 32x16 RGBMatrix on Metro M4 Express (but using
the Airlift firmware, as this is the configuration the error was reported
on):
 * initially construct display at 0, 90, 180, 270 degrees
 * later change angle to 0, 90, 180, 270 degrees
 * no garbled display
 * no safe mode crashes
@tannewt noticed this in a pull request review.  The allocated
memory was never used, but the GC would have collected it eventually.
@jepler jepler changed the title Fix crash when allocation fails during rgbmatrix setup Fix RGBMatrix, FrameBufferDisplay bugs Sep 1, 2020
@jepler jepler requested a review from tannewt September 1, 2020 16:09
@jepler
Copy link
Member Author

jepler commented Sep 1, 2020

@tannewt I cracked the case so we don't need to have a discussion about how rotation is supposed to work. But maybe if you end up wanting to request changes we can take that to a chat and get it resolved quickly so this can be merged. I'm eager for this family of bugs to be fixed.

@PaintYourDragon
Copy link

I’ve not extensively tested yet, but have at least confirmed this resolves a huge and consistent problem with rotation that would previously cause a HardFault, or occasionally would run but graphics appeared in the wrong position on the matrix.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannewt I cracked the case so we don't need to have a discussion about how rotation is supposed to work. But maybe if you end up wanting to request changes we can take that to a chat and get it resolved quickly so this can be merged. I'm eager for this family of bugs to be fixed.

What was the issue? It's not clear to me why you changed what you did.

self->bufinfo.len = self->bufsize;
self->bufinfo.typecode = 'H' | MP_OBJ_ARRAY_TYPECODE_FLAG_RW;
}

memset(&self->core, 0, sizeof(self->core));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you clearing the core state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can probably be removed. In Arduino, either your program is starting fresh so the heap is zeroed OR you allocate storage with operator new which returns zero-filled storage. In CircuitPython, this storage could be filled with data left over from earlier. I wanted to eliminate the possibility that any bugs encountered were due to core containing something other than zeros when _PM_init is called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PaintYourDragon do you want to say whether it is OK to call _PM_init with core pointing to a memory region full of junk, and if it's not we can deal with that as a protomatter bug rather than a circuitpython bug?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think this is fine. I was confused because core meant display core to me. Renaming this to protomatter_core would be clearer.

Also, I'd expect rgbmatrix_rgbmatrix_obj_t to be defined in a shared-module header, not shared-bindings because it is implementation specific.

@jepler
Copy link
Member Author

jepler commented Sep 2, 2020

@tannewt Here's the relevant commit message about the rotation fix:

The expectations of displayio.Display and frambufferio.FramebufferDisplay
are different when it comes to rotation.

In displayio.Display, if you call core_construct with a WxH = 64x32
and rotation=90, you get something that is 32 pixels wide and 64 pixels
tall in the LCD's coordinate system.

This is fine, as the existing definitions were written to work like this.
With framebuffer displays, however, the underlying framebuffer (such as
RGBMatrix) says "I am WxH pixels wide in my coordinate system" and the
constructor is given a rotation; when the rotation indicates a transpose
that means "exchange rows and columns, so that to the Groups displayed
on it, there is an effectively HxW pixel region for use".

Happily, we already have a set_rotation method.  Thus (modulo the time
spent debugging things anyway:) the fix is simple: Always request no
rotation from core_construct, then immediately fix up the rotation
to match what was requested.

Testing performed: 32x16 RGBMatrix on Metro M4 Express (but using
the Airlift firmware, as this is the configuration the error was reported
on):
 * initially construct display at 0, 90, 180, 270 degrees
 * later change angle to 0, 90, 180, 270 degrees
 * no garbled display
 * no safe mode crashes

@jepler
Copy link
Member Author

jepler commented Sep 2, 2020

I also tested Sharp Memory Display 400x240 in portrait/rotated mode and it works.

tannewt
tannewt previously approved these changes Sep 2, 2020
Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as-is. You can merge if you like or update with my suggestions.

self->bufinfo.len = self->bufsize;
self->bufinfo.typecode = 'H' | MP_OBJ_ARRAY_TYPECODE_FLAG_RW;
}

memset(&self->core, 0, sizeof(self->core));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think this is fine. I was confused because core meant display core to me. Renaming this to protomatter_core would be clearer.

Also, I'd expect rgbmatrix_rgbmatrix_obj_t to be defined in a shared-module header, not shared-bindings because it is implementation specific.

@jepler
Copy link
Member Author

jepler commented Sep 2, 2020

@tannewt OK check out the just-added commit, should make no difference to behavior, it's strictly moving that structure definition and renaming the 'core' member to 'protomatter'. 17a5a85

@jepler jepler requested a review from tannewt September 2, 2020 18:35
Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better! Thank you!

@tannewt tannewt merged commit 786f4ed into adafruit:main Sep 2, 2020
@jepler jepler deleted the issue-3184 branch November 3, 2021 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Displaying External BMPs on the RGB Matrix is resulting in Safe Mode
3 participants