-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[spi_flash] handle reentrance gracefully #4397
Conversation
Looks like an HTTPS misconfiguration is blocking the CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look fine to me, but did not test directly.
while (!common_hal_busio_spi_try_lock(&supervisor_flash_spi_bus)) {} | ||
common_hal_digitalio_digitalinout_set_value(&cs_pin, false); | ||
static bool flash_enable(void) { | ||
if (common_hal_busio_spi_try_lock(&supervisor_flash_spi_bus)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth retrying X amount of times even? I know adafruit_bus_device
had a routine to catch background tasks and keep trying for the lock, I'm not familiar enough to say for sure this needs it just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be that instead of the busy-wait in https://github.com/adafruit/Adafruit_CircuitPython_BusDevice/blob/master/adafruit_bus_device/spi_device.py#L73, it's better to throw an exception when the SPI is locked? Then the user would have control over how many times, if at all, to retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looping over common_hal_busio_spi_try_lock()
in C will not succeed if it doesn't work the first time: If foreground Python code has locked the bus, it has to be able to proceed in order to unlock the bus. I don't think we ever un/lock the bus from an interrupt context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jepler doesn't the same apply to adafruit_bus_device
? Or may run_background_tasks()
lock SPI in one tick, and unlock it in another?
I added it three years ago... Have you reproduced an issue with this or just suspect a problem? I think the assumption is that it'd be incorrect for this code to occur when the spi bus is locked. Having this be a loop allows you to detect the error with a debugger when developing. |
My particular use case was attempting a file access in an interrupt handler, and observing that it locks up the board if the interrupt happens during screen update.
There is nothing to prevent this code from being called when the SPI bus is locked: interrupts notwithstanding, the Python user can lock the bus via |
@tannewt It seems the most likely place for this to be encountered in practice is with the Meowbit, which shares the SPI bus between the flash and the screen (I assume this is where you first encountered the problem, @tyomitch). Specifically, when a module tries to access the flash while the screen is updating, it causes an irrecoverable hang. Then you need to figure out how to manually wipe the NOR flash somehow, like introducing an intentional exception into your build. I just ran into it with the AudioPWMIO PR - it's not fun to deal with. |
If there are unwanted wider consequences for this change, and since the Meowbit is just one board (and a historically annoying one, for this same reason), this could be folded into an optional mpconfigboard macro so it doesn't apply to other boards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Code that calls transfer
should already be able to handle failures. Thanks! Looks like it just needs a final merge.
Report failure instead of deadlocking the device
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you @tyomitch ! |
Report failure instead of deadlocking the device