drivers: ethernet: stm32h7 IT based ethernet TX #27188
Please add @erwango as well. I will look at your PR tomorrow. Thank you for providing it. I hope my prepared repository helped you.
Hey Jeremy @lochej. Thank you very much for your new approach. I like it. :) As I see from the code, you've changed the memory buffer sizes and introduced the IT approach. Therefore I would split your PR into the following two commits:
Please also try to describe the changes in the commit descriptions briefly and/or with bullet points. Try also to avoid describing advantages and disadvantages, because it is always clear that every commit gains us some advantage ;) I've also added some review comments from my side. You can always contact me by email if I can help you.
drivers/ethernet/eth_stm32_hal.c
Outdated
ETH_BufferTypeDef tx_buffer_def[ETH_TXBUFNB];
/* Only one tx_buffer is sufficient to pass only 1 dma_buffer */
/* We can save some memory here by allocating only 1 buffer */
ETH_BufferTypeDef tx_buffer_def[1];
I would use the ETH_TXBUFNB define here and change its value in the code above to an appropriate one. Please put this change into a separate memory management commit.
I have split it into 2 commits, but ETH_TXBUFNB should not be changed, as it is used in dma_tx_desc_tab. The ETH_BufferTypeDef from the HAL is just a linked list through which we can pass successive arrays to be transmitted. Here we only pass 1 dma_buffer, corresponding to the first dma_buffer filled from the net_buffer. So these symbols are different.
Thanks for splitting into 2 commits. Then introduce a new define with a meaningful name (e.g. ETH_TXBUF_DEF_LEN) and value (1U in your case). Please leave this thread open. I will test the buffer size match on H7 later (e.g. tx_buffer_def and dma_buffer sizes). It was on my to-do list. I think you and I can clarify this behavior in this thread and provide a proper solution.
I also thought of it in the sense that we only fill 1 DMA descriptor at a time. 1 descriptor is used to transmit 1 Ethernet frame. At the beginning of the function we exit if the packet is bigger than the Ethernet frame size. So this function needs a lot of work in order to send multiple frames at once. I will add the new symbol, by the way.
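The single-buffer path described in this comment can be modelled on a host in plain C. This is only a sketch under stated assumptions, not the driver code: `struct tx_buf_def`, `prepare_tx`, `MAX_FRAME_SIZE` (1514 is an illustrative value) and `TXBUF_DEF_NB` are hypothetical stand-ins for the HAL's ETH_BufferTypeDef, the driver's send function, its frame-size limit and the new define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAME_SIZE 1514U /* illustrative limit, not taken from the driver */
#define TXBUF_DEF_NB   1U    /* one buffer definition per transmitted frame */

/* Hypothetical stand-in for the HAL's ETH_BufferTypeDef linked-list node. */
struct tx_buf_def {
	uint8_t *buffer;
	uint32_t len;
	struct tx_buf_def *next;
};

/* Single DMA buffer that holds the frame currently being sent. */
static uint8_t dma_buffer[MAX_FRAME_SIZE];

/*
 * Copy one packet into the single DMA buffer and describe it with one
 * list node. Returns 0 on success, -1 if the packet exceeds one frame.
 */
int prepare_tx(struct tx_buf_def defs[TXBUF_DEF_NB],
	       const uint8_t *pkt, size_t pkt_len)
{
	assert(defs != NULL && pkt != NULL);

	if (pkt_len > MAX_FRAME_SIZE) {
		return -1; /* early exit: one descriptor carries one frame */
	}

	memcpy(dma_buffer, pkt, pkt_len);
	defs[0].buffer = dma_buffer;
	defs[0].len = (uint32_t)pkt_len;
	defs[0].next = NULL; /* no second buffer is chained */
	return 0;
}
```

Because only `defs[0]` is ever filled, an array of length 1 is enough here, independently of how many DMA descriptors the ring holds.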
@Nukersson Changed to new define ETH_TXBUF_DEF_NB set to 1U.
Is it a Kconfig define then? Can you tell me the name of this define, if so?
I do not understand what you mean by this. What defines are you looking for?
I am looking for the upper-layer stack variable/define that configures the packet size for an Ethernet frame.
For that you can use NET_ETH_MAX_FRAME_SIZE
Thank you very much!
I had the same problem with buffers. I agree with you. We need only one TX buffer, not 4 as proposed in the original driver. I think we will fix it in this PR or open a new one later. As I said, I will test your PR later today.
@Nukersson I suggest adding this improvement in a new PR, as the IT approach is a minimal change. Changing the handling of the buffer needs more time and testing to get working properly (as it plays with the net stack itself). We'll also have to consider checking the availability of every DMA TX descriptor before initiating a TX call, because HAL_TransmitIT could fail if one of the DMA descriptors is not available.
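One possible shape for the availability check mentioned above, as a host-side sketch only. The thread leaves open how availability would be determined on real hardware; this example assumes a per-descriptor ownership flag in the top bit of a status word. `struct tx_desc`, `DESC_OWN_BIT` and `tx_descs_available` are hypothetical names, not HAL or driver symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the "DMA owns this descriptor" flag. */
#define DESC_OWN_BIT UINT32_C(0x80000000)

/* Hypothetical, simplified TX descriptor: only the status word is modelled. */
struct tx_desc {
	volatile uint32_t desc3;
};

/* Return true only if the CPU owns every descriptor in the ring. */
bool tx_descs_available(const struct tx_desc *ring, size_t count)
{
	assert(ring != NULL);

	for (size_t i = 0; i < count; i++) {
		if ((ring[i].desc3 & DESC_OWN_BIT) != 0U) {
			return false; /* still owned by the DMA engine */
		}
	}
	return true;
}
```

A send path could call such a check before starting a transfer and return an error early instead of letting the transmit call fail.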
Ok. Sounds good.
ad9eb28
to
4b2e58d
Compare
Added some small change requests
4b2e58d
to
a9b8079
Compare
I can't test this PR currently. Some of the commits in Zephyr's master broke my board or clock. Investigating it now.
Could it be #26980?
adc6cb2
to
0072f9f
Compare
The latest rebase had a merge conflict with the new thread stack definitions. I have resolved it in the last rebase.
0072f9f
to
65aa1cd
Compare
I've added one change request. Otherwise it runs well on the M7 core of the nucleo_h745zi_q board.
65aa1cd
to
b63ea31
Compare
Your request has been fulfilled 👍 Please approve this PR or feel free to comment on anything else.
cafcb93
to
9d2f346
Compare
drivers/ethernet/eth_stm32_hal.c
Outdated
}

/* Wait for end of TX buffer transmission */
if (k_sem_take(&dev_data->tx_int_sem, K_FOREVER) != 0) {
Hello - I'm thinking about the K_FOREVER here. While testing this commit with Civetweb, I'm definitely able to get into a situation where I have a deadlock. I still do not know the reason; I have a feeling that Civetweb does not play well with the current TCP stack.
I tried to change it to some long timeout (20s), and at least it prevents (or more precisely gives a chance to recover from) the deadlocks...
Are you using civetweb http-server?
Yes, I am. I have a simple site that provides a web interface to a sensor. There is an AJAX routine that refreshes the information every 2s by requesting a page that returns a short JSON string. When I wait long enough I usually see a deadlock here. I have not found exact steps to reproduce it though...
Does the master's branch version of the driver work?
Hmmm, I need to check. But what do you mean by the master branch version of the driver? I think I have merged lochej/drivers/stm32h7_it_based_eth into my master, but it was probably very close.
So at least it doesn't deadlock your application and manages to send the next frames?
Yes, next frames are sent
Is there any standard dumping function? I cannot find one...
You can use LOG_HEXDUMP_DBG() for that.
@xhpohanka Add the following code in the semaphore timeout if statement:
LOG_HEXDUMP_ERR(dma_buffer,total_len,"ethernet frame timeout");
Then post your log when you get a semaphore timeout.
I suspect that sending an empty buffer doesn't enable the DMA tx complete interrupt.
@xhpohanka Logging has been added to the last rebase :)
@lochej here is a log of timeouts. I will probably share some more later, but I have now changed the settings a bit and I cannot reproduce it as easily as last week. The timeouts below occur very soon after power-up of the device.
[00:00:05.049,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:05.049,000] <err> eth_stm32_hal: eth packet timeout
d8 d0 90 52 8b 2e cc bd 35 ff ff ff 08 00 45 00 |...R.... 5.....E.
00 28 00 00 00 00 40 06 f5 c6 c0 a8 01 fb c0 a8 |.(....@. ........
01 be 93 5b 07 5b 2c 6f c4 9a e5 b2 a1 af 50 10 |...[.[,o ......P.
05 00 12 a8 00 00 |......
[00:00:05.247,000] <inf> mqtt: MQTT client connected!
[00:00:05.264,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:05.265,000] <err> eth_stm32_hal: eth packet timeout
d8 d0 90 52 8b 2e cc bd 35 ff ff ff 08 00 45 00 |...R.... 5.....E.
00 28 00 00 00 00 40 06 f5 c6 c0 a8 01 fb c0 a8 |.(....@. ........
01 be 93 5b 07 5b 2c 6f c4 b4 e5 b2 a1 b3 50 10 |...[.[,o ......P.
05 00 12 8a 00 00 |......
[00:00:05.917,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:05.917,000] <err> eth_stm32_hal: eth packet timeout
d8 d0 90 52 8b 2e cc bd 35 ff ff ff 08 00 45 00 |...R.... 5.....E.
00 44 00 00 00 00 40 01 f5 af c0 a8 01 fb c0 a8 |.D....@. ........
01 be 03 03 82 21 00 00 00 00 45 00 00 28 ed 8a |.....!.. ..E..(..
40 00 40 06 c8 3b c0 a8 01 be c0 a8 01 fb 07 5b |@.@..;.. .......[
80 0a 1c ce be dc 24 d8 1d 81 50 11 fa 1c 8b 43 |......$. ..P....C
00 00 |..
[00:00:06.767,000] <inf> mqtt: PUBACK packet id: 56264
[00:00:08.283,000] <inf> mqtt: PUBACK packet id: 51096
[00:00:08.475,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:08.476,000] <err> eth_stm32_hal: eth packet timeout
d8 d0 90 52 8b 2e cc bd 35 ff ff ff 08 00 45 00 |...R.... 5.....E.
00 44 00 00 00 00 40 01 f5 af c0 a8 01 fb c0 a8 |.D....@. ........
01 be 03 03 82 21 00 00 00 00 45 00 00 28 ed 8c |.....!.. ..E..(..
40 00 40 06 c8 39 c0 a8 01 be c0 a8 01 fb 07 5b |@.@..9.. .......[
80 0a 1c ce be dc 24 d8 1d 81 50 11 fa 1c 8b 43 |......$. ..P....C
00 00 |..
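For reading the dumps above, a small host-side decoder can help. The sketch below is not part of the driver; `struct frame_info`, `decode_frame` and `sample_frame` are names introduced here for illustration. It parses the Ethernet II and IPv4 headers of a dumped frame. Applied to the first dump, it yields a TCP segment (protocol 6) with an IP total length of 40 bytes, sent from 192.168.1.251 to 192.168.1.190, destination port 1883.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First timed-out frame from the log above (54 bytes). */
static const uint8_t sample_frame[] = {
	0xd8, 0xd0, 0x90, 0x52, 0x8b, 0x2e, 0xcc, 0xbd,
	0x35, 0xff, 0xff, 0xff, 0x08, 0x00, 0x45, 0x00,
	0x00, 0x28, 0x00, 0x00, 0x00, 0x00, 0x40, 0x06,
	0xf5, 0xc6, 0xc0, 0xa8, 0x01, 0xfb, 0xc0, 0xa8,
	0x01, 0xbe, 0x93, 0x5b, 0x07, 0x5b, 0x2c, 0x6f,
	0xc4, 0x9a, 0xe5, 0xb2, 0xa1, 0xaf, 0x50, 0x10,
	0x05, 0x00, 0x12, 0xa8, 0x00, 0x00,
};

struct frame_info {
	uint16_t ethertype;
	uint8_t ip_proto;
	uint16_t ip_total_len;
	uint8_t src_ip[4];
	uint8_t dst_ip[4];
	uint16_t dst_port; /* 0 unless the payload is TCP or UDP */
};

/* Read a big-endian 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is too short or not IPv4. */
int decode_frame(const uint8_t *f, size_t len, struct frame_info *out)
{
	assert(f != NULL && out != NULL);

	if (len < 14U + 20U) {
		return -1; /* Ethernet header plus minimal IPv4 header */
	}
	out->ethertype = be16(&f[12]);
	if (out->ethertype != 0x0800U || (f[14] >> 4) != 4U) {
		return -1;
	}
	size_t ihl = (size_t)(f[14] & 0x0FU) * 4U; /* IPv4 header length */
	if (ihl < 20U || len < 14U + ihl) {
		return -1;
	}
	out->ip_total_len = be16(&f[16]);
	out->ip_proto = f[23];
	for (size_t i = 0; i < 4U; i++) {
		out->src_ip[i] = f[26U + i];
		out->dst_ip[i] = f[30U + i];
	}
	out->dst_port = 0U;
	if ((out->ip_proto == 6U || out->ip_proto == 17U) &&
	    len >= 14U + ihl + 4U) {
		out->dst_port = be16(&f[14U + ihl + 2U]);
	}
	return 0;
}
```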
9d2f346
to
9640b24
Compare
@lochej does this also apply to F4 hardware, or is it only for H7 hardware?
This only applies to H7 series.
drivers/ethernet/eth_stm32_hal.c
Outdated
void HAL_ETH_DMAErrorCallback(ETH_HandleTypeDef *heth_handle)
{
	__ASSERT_NO_MSG(heth_handle != NULL);

	/* State of eth handle is ERROR in case of unrecoverable error */
	/* unrecoverable (ETH_DMACSR_FBE | ETH_DMACSR_TPS | ETH_DMACSR_RPS) */
	if (HAL_ETH_GetState(heth_handle) & HAL_ETH_STATE_ERROR) {
		/* Log the error */
		LOG_ERR("ETH_DMAErrorCallback errorcode:%x dmaerror:%x",
			HAL_ETH_GetError(heth_handle),
			HAL_ETH_GetDMAError(heth_handle));

		/* TODO go to error state */
		return;
	}

	/* Recoverable errors don't put ETH in error state */
	/* ETH_DMACSR_CDE | ETH_DMACSR_ETI | ETH_DMACSR_RWT */
	/* | ETH_DMACSR_RBU | ETH_DMACSR_AIS) */

	/* TODO Check if we were TX transmitting and the unlock semaphore */
	/* To return the error as soon as possible else we'll just wait */
	/* for the timeout */
}

void HAL_ETH_MACErrorCallback(ETH_HandleTypeDef *heth_handle)
{
	__ASSERT_NO_MSG(heth_handle != NULL);

	/* TODO handle MAC error */
}
#endif /* CONFIG_SOC_SERIES_STM32H7X */
@Nukersson @jukkar @xhpohanka @erwango @FRASTM @ABOSTM
I'm wondering about how to recover from an error on the ETH device that would go for every STM32 series supporting Ethernet.
But that might be for a future PR, I'm just asking for your opinion here.
My initial idea would be that, in case of an error on the peripheral, whether detected in the RX function, the TX function or any error callback, we stop and restart the IP using:
HAL_ETH_Stop_IT(heth); /* Stop any ongoing processes and shut down the IP */
HAL_ETH_Start_IT(heth); /* Restart all the DMA process and reset IP state to ready*/
This piece of code could be added to an error handler for the ethernet driver. All unrecoverable errors could try to restart the IP.
What do you think ?
For now, only logging messages are added in the different error handlers. We'll need a test program that can generate ethernet errors to see if we can recover from them using my suggested method.
My opinion is that only non-recoverable or non-manageable errors should trigger an ETH DMA HW reset. In other cases, the Zephyr network stack should probably be informed. Here I don't know how to propagate errors to the stack, or which ones.
But generally it depends on the particular application.
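The policy discussed in this thread (restart the IP only on unrecoverable errors, otherwise report) could be expressed as a small classification step. The sketch below is a host-side illustration only: the `DEMO_DMACSR_*` bit positions are placeholder values chosen for this example, and real code should use the HAL's ETH_DMACSR_* macros. The grouping into unrecoverable and recoverable flags follows the comments in the error callback under review; `classify_dma_error` and the enum are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions for illustration; use the HAL macros in real code. */
#define DEMO_DMACSR_TPS (UINT32_C(1) << 1)
#define DEMO_DMACSR_RBU (UINT32_C(1) << 7)
#define DEMO_DMACSR_RPS (UINT32_C(1) << 8)
#define DEMO_DMACSR_RWT (UINT32_C(1) << 9)
#define DEMO_DMACSR_ETI (UINT32_C(1) << 10)
#define DEMO_DMACSR_FBE (UINT32_C(1) << 12)
#define DEMO_DMACSR_CDE (UINT32_C(1) << 13)
#define DEMO_DMACSR_AIS (UINT32_C(1) << 14)

/* Grouping taken from the comments in the error callback. */
#define DEMO_UNRECOVERABLE \
	(DEMO_DMACSR_FBE | DEMO_DMACSR_TPS | DEMO_DMACSR_RPS)
#define DEMO_RECOVERABLE \
	(DEMO_DMACSR_CDE | DEMO_DMACSR_ETI | DEMO_DMACSR_RWT | \
	 DEMO_DMACSR_RBU | DEMO_DMACSR_AIS)

static_assert((DEMO_UNRECOVERABLE & DEMO_RECOVERABLE) == 0U,
	      "error groups must not overlap");

enum eth_err_action {
	ETH_ERR_NONE,    /* nothing to do */
	ETH_ERR_REPORT,  /* recoverable: inform the upper layer */
	ETH_ERR_RESTART, /* unrecoverable: stop and restart the peripheral */
};

/* Unrecoverable flags take priority when both kinds are set. */
enum eth_err_action classify_dma_error(uint32_t dma_err)
{
	if ((dma_err & DEMO_UNRECOVERABLE) != 0U) {
		return ETH_ERR_RESTART;
	}
	if ((dma_err & DEMO_RECOVERABLE) != 0U) {
		return ETH_ERR_REPORT;
	}
	return ETH_ERR_NONE;
}
```

In a driver, ETH_ERR_RESTART would map to the HAL_ETH_Stop_IT / HAL_ETH_Start_IT pair suggested earlier in the thread; how ETH_ERR_REPORT reaches the network stack is the open question raised above.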
9640b24
to
efe3fac
Compare
@lowlander |
efe3fac
to
e9754d9
Compare
F4/F7 could work the same way as my H7 implementation. As I can see the current implementation waits for the DMA to be ready (by checking DMA_OWN bit) before calling If we want to keep the same approach as F4/F7 on H7 series, we could wait for the semaphore before calling the This would mean that
#ifdef CONFIG_SOC_SERIES_STM32H7X
struct k_sem tx_int_sem;
#endif /* CONFIG_SOC_SERIES_STM32H7X */
K_THREAD_STACK_MEMBER(rx_thread_stack,
Should be K_KERNEL_STACK_MEMBER
You're right, my bad. I'm updating the PR; the fixing push is available now 👍
Modify the ethernet driver to use TX complete interrupts. Adds HAL ethernet TX complete callback and locking semaphore. Due to changing behavior/content of the TX DMA descriptors on STM32H7 series, based on the state of the IP, it is more reliable to wait for the TX complete interrupt to check for DMA end of transmission event. This avoids polling the DMA_DESC_OWN bit in the descriptors. Improves reliability and performance of the ethernet peripheral. Tested on CoapServer sample, Dumb HTTP server, telnet sample. Signed-off-by: Jeremy LOCHE <lochejeremy@gmail.com>
Reduced the size of tx_buffer_def array to 1 to save on function stack memory. Here only 1 buffer is enough to call the transmit function. Signed-off-by: Jeremy LOCHE <lochejeremy@gmail.com>
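The handshake described in the first commit message, where the TX complete callback signals a semaphore that the send path waits on, can be modelled on a host as follows. This is a single-threaded sketch with hypothetical names (`struct model_sem` and its functions), assuming a semaphore that starts at 0 and saturates at 1. It does not block, whereas the driver blocks in k_sem_take() until the interrupt arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of a counting semaphore with a limit. */
struct model_sem {
	unsigned int count;
	unsigned int limit;
};

void model_sem_init(struct model_sem *s, unsigned int initial,
		    unsigned int limit)
{
	assert(s != NULL && initial <= limit);
	s->count = initial;
	s->limit = limit;
}

/* Called from the modelled TX complete callback; saturates at the limit. */
void model_sem_give(struct model_sem *s)
{
	assert(s != NULL);
	if (s->count < s->limit) {
		s->count++;
	}
}

/* Non-blocking take: true if a completion was pending, false otherwise. */
bool model_sem_try_take(struct model_sem *s)
{
	assert(s != NULL);
	if (s->count == 0U) {
		return false;
	}
	s->count--;
	return true;
}
```

With a limit of 1, a spurious second completion signal cannot let a later transmit pass the wait without its own interrupt.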
e9754d9
to
63a89a5
Compare
Looks good to me. It would be a pleasure for me. And I really like to work with you (guys from ST) and Jeremy also!
Of course, that would be with great pleasure! I have projects running on ST boards and I could sure help :)
Modify the ethernet driver to use TX complete interrupts.
Adds HAL ethernet TX complete callback and locking semaphore.
Due to changing behavior/content of the TX DMA descriptors
on STM32H7 series, based on the state of the IP,
it is more reliable to wait for the TX complete interrupt to check
for DMA end of transmission event. This avoids polling the
DMA_DESC_OWN bit in the descriptors.
This improves reliability and performance of the ethernet peripheral.
Tested on CoapServer sample, Dumb HTTP server, telnet sample.
This implementation avoids all of the HAL_ETH_Transmit{Frame} failed
errors that occurred with the current implementation on my
Nucleo-H743ZI board. Avoids system crashes and HAL_Timeout
of the original implementation.
@Nukersson tested and approves this approach, which was presented in the
original ethernet driver PR #26226
Benchmarked using the following:
Setup: Killer E2400 Gigabit Ethernet
Board: Nucleo_H743ZI clocked from HSI 64 MHz
Signed-off-by: Jeremy LOCHE <lochejeremy@gmail.com>