-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
uart_mcux_lpuart: Enable Asynchronous UART API. #42750
uart_mcux_lpuart: Enable Asynchronous UART API. #42750
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks ok to me, there's some stuff I'd like to see comments/changes on before my +1. The serial driver changes are large so I'm leaving it up to others to review those as I'm not as familiar with it.
drivers/dma/dma_mcux_edma.c
Outdated
@@ -33,16 +33,42 @@ struct dma_mcux_edma_config { | |||
void (*irq_config_func)(const struct device *dev); | |||
}; | |||
|
|||
#ifdef CONFIG_DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this could be rearranged somehow to create a define for an attribute that might be nicer. It's a bit hard to grok at first glance.
The end goal would be removing the repeated declarations of tcdpool here with one that looks more like...
static __aligned(32) EDMA_TCD_ATTR tcdpool[...];
Where your ifdefs instead define the EDMA_TCD_ATTR as needed (nothing, __nocache, __dtcm_noinit_section, etc) to use perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is better! Will make the change.
drivers/dma/dma_mcux_edma.c
Outdated
static __aligned(32) __nocache edma_tcd_t | ||
tcdpool[DT_INST_PROP(0, dma_channels)][CONFIG_DMA_TCD_QUEUE_SIZE]; | ||
#else | ||
#warning tcdpool may be located in cacheable memory which can cause EDMA issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no cache flush/invalidate for imxrt? I imagine there is. Either it works and there's cache flush/invalidation work done in the driver to ensure it works, or it should be disallowed if it doesn't would be my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. using cacheable memory should not be allowed if the driver doesn't maintain cache coherency. I think it is best to force the tcdpool in non-cacheable memory, and this should be a #error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed--I was concerned about breaking the ability to compile projects which had used this in the past but did not have proper cache setup. I agree an error is more appropriate here so I will make this change.
drivers/dma/dma_mcux_edma.c
Outdated
/* Clear the done flag before the callback so a user can optionally | ||
* install (reload) the next handle during the callback if desired | ||
*/ | ||
if (transfer_done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an edma expert, but I'm wondering if there's more transfer descriptors to do in the tcd, will marking the current transfer as done like this cause the hardware to move to the next one before the callback is done? That seems to go against the prior intent here and could be problematic potentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change does break my code, and DONE bit should not be cleared before the callback. I am updating an I2S driver to use scatter/gather mode of eDMA, and also added dma_reload() to the eDMA driver (I am working now to merge in your updates before submitting my PR). channel_irq() in this file is based on EDMA_HandleIRQ() from SDK eDMA driver, and we should be careful about diverging from the SDK driver. The SDK driver clears the DONE bit after the callback for a specific reason.
When using scatter/gather mode, EDMA_SubmitTransfer() has some logic for the coherency check when TCD registers are updated. That logic depends on the DONE bit set in some cases. The I2S test case I am working on is one of these cases, and the SubmitTransfer() does not work properly with this update where the DONE bit is cleared before the callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of the UART ASYNC driver, the callback will trigger a call to DMA reload when the currently used buffer is full, and the callback requests from the user, via: UART_RX_BUF_REQUEST
If dma_reload is called when the done bit is set (as a result of uart_rx_buf_rsp
) then the DMA transfer is not restarted and the ping-pong transfer stops. This problem was mitigated by clearing the DONE bit before calling into the user callback.
I'll make the change back now to clear the done bit where as done in the SDK driver, but will have to look at how to correctly modify the UART driver so that transfers continue in this case.
drivers/dma/dma_mcux_edma.c
Outdated
switch (config->channel_direction) { | ||
case MEMORY_TO_MEMORY: | ||
case MEMORY_TO_MEMORY: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These brackets in this switch case aren't necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought uncrustify added these for me, but I will remove for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncrustify does some unfortunate things. I know the docs still mention it. Please do try the newer clang format setup if you are interested in doing auto formatting like me. See #41307
LOG_ERR("not support transfer direction"); | ||
return -EINVAL; | ||
} | ||
} | ||
|
||
if (!data_size_valid(config->source_data_size)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a nice cleanup of this check
@NickolasLapp Thank you for the contribution. Can you please share how this feature was tested. |
drivers/dma/dma_mcux_edma.c
Outdated
@@ -33,16 +33,42 @@ struct dma_mcux_edma_config { | |||
void (*irq_config_func)(const struct device *dev); | |||
}; | |||
|
|||
#ifdef CONFIG_DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to change the name of this Kconfig symbol, to help avoid confusion. With the eDMA, there are 2 items concerned with cache coherency: the buffers that DMA read/write data to/from, plus the Transfer Control Descriptors (TCDs). For the lpuart driver, the buffers appear to be allocated by the app, and passed to the driver in mcux_lpuart_rx_enable() and mcux_lpuart_tx(). This symbol controls where the TCDs are placed, not the buffers.
#ifdef CONFIG_DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER | |
#ifdef CONFIG_DMA_MCUX_USE_DTCM_FOR_DMA_DESCRIPTORS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, will make the change!
drivers/dma/Kconfig.mcux_edma
Outdated
@@ -26,4 +26,11 @@ config DMA_MCUX_TEST_SLOT_START | |||
help | |||
test slot start num | |||
|
|||
config DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment in dma_mcux_edma.c
config DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER | |
config DMA_MCUX_USE_DTCM_FOR_DMA_DESCRIPTORS |
drivers/dma/Kconfig.mcux_edma
Outdated
@@ -26,4 +26,11 @@ config DMA_MCUX_TEST_SLOT_START | |||
help | |||
test slot start num | |||
|
|||
config DMA_MCUX_USE_DTCM_FOR_DMA_BUFFER | |||
bool "Use DTCM for DMA buffers" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool "Use DTCM for DMA buffers" | |
bool "Use DTCM for DMA descriptors" |
drivers/dma/Kconfig.mcux_edma
Outdated
bool "Use DTCM for DMA buffers" | ||
default y | ||
help | ||
When this option is activated, the buffers for DMA transfer are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this option is activated, the buffers for DMA transfer are | |
When this option is activated, the descriptors for DMA transfer are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for implementing all these features. Clearing the eDMA DONE bit in the ISR must be done after the callback. Let me know if I can help with your code to work with that change. Other than that change, the rest of my feedback are comments.
drivers/dma/dma_mcux_edma.c
Outdated
/* Clear the done flag before the callback so a user can optionally | ||
* install (reload) the next handle during the callback if desired | ||
*/ | ||
if (transfer_done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change does break my code, and DONE bit should not be cleared before the callback. I am updating an I2S driver to use scatter/gather mode of eDMA, and also added dma_reload() to the eDMA driver (I am working now to merge in your updates before submitting my PR). channel_irq() in this file is based on EDMA_HandleIRQ() from SDK eDMA driver, and we should be careful about diverging from the SDK driver. The SDK driver clears the DONE bit after the callback for a specific reason.
When using scatter/gather mode, EDMA_SubmitTransfer() has some logic for the coherency check when TCD registers are updated. That logic depends on the DONE bit set in some cases. The I2S test case I am working on is one of these cases, and the SubmitTransfer() does not work properly with this update where the DONE bit is cleared before the callback.
@NickolasLapp thank you for your contribution. @DerekSnell and I have been discussing the cache coherency changes you put into the UART driver when interfacing with the EDMA. I'm not sure that the proper place for those changes are in the UART driver as it seems like they should be put down into the EDMA driver. My thinking is that multiple stacks use the EDMA driver and would potentially have the same coherency issues and if we centralized the code in the edma driver all of the other drivers would get the same benefit. Otherwise, the code you added to the UART driver would need to be duplicated to the other drivers. Thoughts? |
Sorry all about the delay in getting back to your feedback. I got caught up in some other work things and just now got back to this. I appreciate the thorough review and will make the changes as requested
I agree that it would be preferable to include the cache clean / cache invalidate operations in the |
@mmahadevan108 I tested this on an RT1062EVKB. We have a zephyr project which is based closely on the RT1062EVK project in zephyr. I have 2 UART interfaces connected (lpuart0 and lpuart3) and I passed data with checksum back and forth between the two UARTs, confirming the correct data was received. |
01844a6
to
82e574b
Compare
@dleach02 and @DerekSnell , I was thinking a bit about this comment:
and remembered why I had not done this in the first place. My concern is that the dcache operations have to operate on a full cache line (32 bytes for the processor I'm working on), and so we round up the length for invalidate / clean operations as appropriate to ensure we always pass a multiple of the DCACHE line length to the This could be an issue, however, if the buffers passed to the EDMA module are not padded / aligned to the appropriate value. For example, if an end user did something like:
and passed I think this is probably a fine tradeoff given that the alternative is the data that's received is possibly not correct because the cache operations haven't been performed after the DMA rx, but this is definitely non-obvious behavior and not documented in the zephyr DMA page (which makes no mention of aligning the DMA buffers). I still think this functionality does belong inside the EDMA driver itself, but I'm not sure in general what the right behavior here is. |
drivers/dma/dma_mcux_edma.c
Outdated
/* Invoke callback function. */ | ||
if (handle->callback != NULL) { | ||
(handle->callback)(handle, handle->userData, | ||
transfer_done, tcds_done); | ||
} | ||
|
||
if (transfer_done) { | ||
handle->base->CDNE = handle->channel; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back through and figured out why it was an issue to call EDMA_SubmitTransfer()
from within the callback ISR context.
If you look at fsl_edma.c
, considering the case of a ping-pong buffer, we end up in this block:
https://github.com/NXPmicro/mcux-sdk/blob/79cdc5799191a7411479a2617e28a191f4383276/drivers/edma/fsl_edma.c#L1176-L1207
In this block, we've correctly loaded the next DMA descriptor to point to, and now we are trying to set the ESG bit for the active DMA descriptor to enable it to continue on into the reloaded buffer. However, because we haven't cleared the DONE bit yet, writes to ESG are disabled and so it looks to the driver like the previous transfer finished (see the comment here).
I get around this in the PR that I pushed where I just delay reloading the DMA reload until we're out of ISR context in the case that dma_reload
is called from w/in ISR, but this feels a bit ugly to me. Happy to take suggestions if you have thoughts about a better way to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @NickolasLapp , you are right about this. I have been working on my PR #43227 and ran into the same issue with eDMA. I learned that EDMA_SubmitTransfer() was not intended to be called within the eDMA callback. Working with the NXP SDK team to enable this, their solution was similar to yours, and clears the DONE bit in the ISR before the callback. I have submitted their updated driver in the hal_nxp PR fsl_edma driver fix scatter/gather mode. This fixes the issue I had, and I think it will work for you as well.
Our PRs overlap with the DMA changes, so I have been waiting for yours to merge before rebasing and submitting mine. One change we need to make to dma_mcux_edma.c is call the IRQ handler in fsl_edma.c, rather than the copy channel_irq(). This will help keep Zephyr's eDMA driver in sync with the SDK driver. I have this change in my PR. But if you want to resolve this DONE bit problem in your PR, then you can first use the attached patch drivers-dma-mcux_edma-usr-ISR-from-NXP-SDK.zip to update the IRQ handler in dma_mcux_edma.c, and then point the west.yml to hal_nxp PR. Or I am happy to handle this in my PR if you want :)
Thanks again for your contributions and the thought you have put into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I think I've got this correctly updated to use the changes from the hal_nxp PR (#144), and to use the HandleIRQ
function from within the driver itself instead of the custom channel_irq
. I tested with the driver fixes on my board and all seems to work as intended.
Hi @NickolasLapp , Thank you |
I spent some time this morning digging into this, and the more I think about it, the more I think it's not feasible to do the DCACHE invalidate/clean operations from within the EDMA driver itself. The parameters to the callback Instead, what I think makes sense (and I added a commit which implements this behavior) is that we just disallow DMA transfers to memory locations located in cacheable memory. I think this is probably the use case for 90% of users anyway, and it seems to be what's recommended in the MCUX example projects. Let me know what you think about this commit or if there is anything else I can do to work towards getting this PR merged. |
Thanks for the hint at @danieldegrasse! I was able to get this working with your tips. After getting the external loopback cable connected and resolving the cache issues, this unit tests exposed a couple bugs in the serial implementation, as well as a few functional differences to the nominal serial drivers as written for the unit tests. I fixed these up so that all unit tests are now passing and I get the desired: Thanks a lot for your help, and please let me know if there's anything else you notice that should be tested/fixed! |
@danieldegrasse looks like this PR is failing the twister unit test build because some of the RT parts don't have the DMA cells/channels defined for enabled UARTs. I can't be sure without checking all of the applicable reference manuals, however, that the dma mux channel assignments are the same. Additionally, I have no way of testing that these targets actually work since I don't have the applicable hardware. What are your thoughts on best way forward here? |
@NickolasLapp I've verified your latest push on the RT1060 EVK, and it works for me as well. Thank you for getting the implementation updated to pass that test.
Given that this PR effectively enables the asynchronous UART API for all in tree boards with an LPUART peripheral, we'd like to run twister against it using our internal test range. To do that all boards that contain LPUART peripherals will need to have DMA definitions added and the correct pinmux settings applied. What I was considering was opening a separate PR with the required DTS changes and pinmux settings for all boards, that you can base your PR off of. That PR could be merged into |
@danieldegrasse That works great for me. I imagine it should be relatively quick to merge through the edma channel / pinmux settings, and then I can rebase as soon as that lands. Please just post here and let me know when the PR is available with these changes and I can rebase as appropriate. |
@NickolasLapp Just wanted to give you an update- I'm finished the pinmux settings, but during testing I ran into issues with the async API on the RT1170 EVK, and I'm debugging those. Will update you once I've narrowed down the issue. |
@NickolasLapp I've figured out a solution for the RT11xx series boards- moving buffers to noncacheable OCRAM results in the test passing. I've opened #43885 to enable that functionality, and provide appropriate pinmuxes for all RT series boards that should be tested with the uart async api. I'll update you once that PR merges, and we can proceed from there. |
@NickolasLapp FYI- #43885 is merged, so this PR should be ready for testing once it is rebased against the latest main. |
8b9fda7
to
565a646
Compare
Thanks for the update @danieldegrasse .
Looking forward to getting this merged finally! :) |
@NickolasLapp I've tested this PR on the RT1170, RT1064, RT1050, and RT1024 EVKs, and it passes on all of them. Can you squash your commits appropriately so that they will pass CI? |
565a646
to
ff534f7
Compare
Squashed and pushed @danieldegrasse ! Thanks for testing |
@NickolasLapp just a heads up: you can check for issues like trailing whitespace using the Also, can you split the changes to the EDMA driver into a separate commit? This will make traceability easier, since the changes within the EDMA driver are intended to properly support scatter/gather mode, which is required for this PR but a separate enhancement from enabling the UART async API itself. |
This PR fixes up the Scatter-Gather EDMA mode for the MCUX EDMA Driver, as well as enabling the dma reload feature for the same EDMA Driver. Signed-off-by: Nickolas Lapp <nickolaslapp@gmail.com>
This PR enables the Ansynchronous UART API for using the MCUX drivers. It is tested on the RT1062EVKB. Signed-off-by: Nickolas Lapp <nickolaslapp@gmail.com>
ff534f7
to
a7dc108
Compare
Got it. 🤞 that it runs now and all tests pass. Checkpatch passes locally:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. Thanks for all your hard work, @NickolasLapp !
…l be removed when PR merged zephyrproject-rtos#42750
Waiting to resolve cache coherence discussion in PR zephyrproject-rtos#42750
This PR enables the Ansynchronous UART API for using the MCUX drivers.
It is tested on the RT1062EVKB. In order to make this functionality
possible, the MCUX EDMA driver is also exteneded to enable the
dma_reload functionality.
Signed-off-by: Nickolas Lapp nickolaslapp@gmail.com