Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: ethernet: stm32h7 IT based ethernet TX #27188

Merged

Conversation

lochej
Copy link
Collaborator

@lochej lochej commented Jul 27, 2020

Modify the ethernet driver to use TX complete interrupts.
Adds HAL ethernet TX complete callback and locking semaphore.

Due to changing behavior/content of the TX DMA descriptors
on STM32H7 series, based on the state of the IP,
it is more reliable to wait for the TX complete interrupt to check
for DMA end of transmission event. This avoids polling the
DMA_DESC_OWN bit in the descriptors.

This improves reliability and performance of the ethernet peripheral.

Tested on CoapServer sample, Dumb HTTP server, telnet sample.

This implementation avoids all of the HAL_ETH_Transmit{Frame} failed
errors that occured with the current implementation on my
Nucleo-H743ZI board. Avoids system crashes and HAL_Timeout
of the original implementation.

@Nukersson tested approves this approach that was exposed in the
original ethernet driver PR #26226

Benchmarked using the following:
Setup: Killer E2400 Gigabit Ethernet
Board: Nucleo_H743ZI clocked from HSI 64 MHz

sudo ping -i 0.01 192.0.2.1 -c 1000 
--- 192.0.2.1 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 11822ms
rtt min/avg/max/mdev = 0.650/0.955/2.050/0.154 ms

Signed-off-by: Jeremy LOCHE lochejeremy@gmail.com

@github-actions github-actions bot added the platform: STM32 ST Micro STM32 label Jul 27, 2020
@KozhinovAlexander
Copy link
Collaborator

KozhinovAlexander commented Jul 27, 2020

Please add @erwango also. I will look at your PR tomorrow also. Thank you for providing it. I hope, my prepared repository helped you also.

@lochej
Copy link
Collaborator Author

lochej commented Jul 27, 2020

Please add @erwango also. I will look at your PR tomorrow also. Thank you for providing it. I hope, my prepared repository helped you also.

I agree @erwango should check this out. You helped me a lot with the contribution workflow and I have been able to provide several PR including this one 👍

@KozhinovAlexander
Copy link
Collaborator

KozhinovAlexander commented Jul 28, 2020

Hey Jeremy @lochej. Thank you very much for your new approach. I like it. :)

As I see from the code you've changed memory buffer sizes and introduced IT approach. Therefore I would split your PR into two following commits:

  1. Changing the size of tx_buffer_def
  2. Introducing interrupt approach

Please also try just to introduce the changes in commits descriptions briefly and/or with bullet points. Try also to avoid descriptions of advantages and disadvantages. Cause it is always clear, that every commit gains us some advantages ;)

I've putted also some review from my side. You can always contact me by eMail, if I can help you.

ETH_BufferTypeDef tx_buffer_def[ETH_TXBUFNB];
/* Only one tx_buffer is sufficient to pass only 1 dma_buffer */
/* We can save some memory here by allocating only 1 buffer */
ETH_BufferTypeDef tx_buffer_def[1];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use ETH_TXBUFNB define here and change it's value in code above to appropriate one. Please put this change into separate memory management commit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have split into 2 commits, but ETH_TXBUFNB should be changed as they are used in the dma_tx_desc_tab. The ETH_BufferTypeDef from the HAL is just a linked list from which we can pass successive arrays to be transmitted. Here we only pass 1 dma_buffer corresponding to the first dma_buffer filled up with the net_buffer. So these symbols are differents.

Copy link
Collaborator

@KozhinovAlexander KozhinovAlexander Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for splitting into 2 commits. Then introduce new one define with meaningful name (e.g. ETH_TXBUF_DEF_LEN) and value (1U in your case). Please let this thread open. I will test the buffers size match on H7 later (e.g. tx_buffer_def and dma_buffer sizes). It was on my ToDo list. I think, you and I can clarify this behavior in this thread and provide proper solution.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought of it in the sense that we only fill up 1 DMA descriptor at a time. 1 descriptor is used to transmit 1 ethernet frame. At the beginning of the function we exit if the size of the packet is bigger that the eth frame size. So this function needs a lot of work in order to send multiple frames at once. I will add the new symbol btw.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nukersson Changed to new define ETH_TXBUF_DEF_NB set to 1U.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it KConfig define then? Can you tell the name of this define, if yes?

I do not understand what you mean by this. What defines are you looking for?

I am looking for upper layer stack variable/define, which needed to configures packet's size for Ethernet frame.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For that you can use NET_ETH_MAX_FRAME_SIZE

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem with budders I had also. I agree with you. We need only one tx buffer but not 4 as proposed in original driver. I think wi will fix it in this PR or open later a new one. As I said - I would test your PR later today.

@Nukersson I suggest to add this improvement into a new PR as the IT approach is a minimal change. Changing the handling of the buffer needs more time and testing to get working properly (as it plays with the net stack itself). We'll also have to consider checking for the availability of every DMA TX Descriptors before initiating a TX call because HAL_TransmitIT could fail if one of the DMA descriptors is not available.

Copy link
Collaborator

@KozhinovAlexander KozhinovAlexander Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Sounds good.

@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch 3 times, most recently from ad9eb28 to 4b2e58d Compare July 28, 2020 09:53
Copy link
Collaborator

@KozhinovAlexander KozhinovAlexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some small change requests

@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch from 4b2e58d to a9b8079 Compare July 28, 2020 10:37
@KozhinovAlexander
Copy link
Collaborator

Can't test this PR currently. Some of the commits in the Zephyr's master broke my board or clock. Investigating it now.

@lochej
Copy link
Collaborator Author

lochej commented Jul 28, 2020

Can't test this PR currently. Some of the commits in the Zephyr's master broke my board or clock. Investigating it now.

Could it be #26980 ?

@KozhinovAlexander
Copy link
Collaborator

Can't test this PR currently. Some of the commits in the Zephyr's master broke my board or clock. Investigating it now.

Could it be #26980 ?

Looks like this. I will switch now the conversation to #26980 to avoid message flood here.

@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch 2 times, most recently from adc6cb2 to 0072f9f Compare August 4, 2020 16:49
Copy link
Collaborator Author

@lochej lochej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest rebase had merge conflict with new thread stack definitions. I have resolved them in the last rebase.

@lochej
Copy link
Collaborator Author

lochej commented Aug 4, 2020

@ABOSTM @gmarull Would you be interested in reviewing this PR as you did with the original stm32h7 ethernet support (#26226) ?

@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch from 0072f9f to 65aa1cd Compare August 5, 2020 14:45
Copy link
Collaborator

@KozhinovAlexander KozhinovAlexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added one change request. Otherwise runs well on M7 core of nucleo_h745zi_q board.

@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch from 65aa1cd to b63ea31 Compare August 6, 2020 09:59
@lochej
Copy link
Collaborator Author

lochej commented Aug 7, 2020

I've added one change request. Otherwise runs well on M7 core of nucleo_h745zi_q board.

Your request has been fulfiled 👍 Please approve this PR or feel free to comment on anything else.

@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch from cafcb93 to 9d2f346 Compare August 14, 2020 19:10
}

/* Wait for end of TX buffer transmission */
if (k_sem_take(&dev_data->tx_int_sem, K_FOREVER) != 0) {
Copy link
Contributor

@xhpohanka xhpohanka Aug 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello - I'm thinking about the K_FOREVER here. While testing this commit with Civetweb, I'm definitely able to get into situation when I have a deadlock. I still do not know what is the reason, I have a feeling that civetweb do not play well with current TCP stack.
I tried to change it to some long timeout (20s) and at least it prevents (or more precisely give a chance to recover from) the deadlocks...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello - I'm thinking about the K_FOREVER here. While testing this commit with Civetweb, I'm definitely able to get into situation when I have a deadlock. I still do not know what is the reason, I have a feeling that civetweb do not play well with current TCP stack.
I tried to change it to some long timeout (20s) and at least it prevents (or more precisely give a chance to recover from) the deadlocks...

Are you using civetweb http-server?

Copy link
Contributor

@xhpohanka xhpohanka Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using civetweb http-server?

Yes, I am. I have a simple site that provides a web interface to a sensor. There is an ajax routine that refreshes the information each 2s by request to a page returning short json string. When I wait long enough I usually see a deadlock here. I have not find exact steps to reproduce it though...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using civetweb http-server?

Yes, I am. I have a simple site that provides a web interface to a sensor. There is an ajax routine that refreshes the information each 2s by request to a page returning short json string. When I wait long enough I usually see a deadlock here. I have not find exact steps to reproduce it though...

Does the master's branch version of the driver work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the master's branch version of the driver work?

Hmmm, I need to check. But what do you mean by master's branch version of the driver? I think I have merged lochej/drivers/stm32h7_it_based_eth into my master, but it was probably very close.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least it doesn't deadlock your application but manages to resend the next frames ?

Yes, next frames are sent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any standard dumping function? I cannot find one...

You can use LOG_HEXDUMP_DBG() for that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xhpohanka Add the following code in the semaphore timeout if statement:

LOG_HEXDUMP_ERR(dma_buffer,total_len,"ethernet frame timeout"); 

Then post your log when you get a semaphore timeout.

I suspect that sending an empty buffer doesn't enable the DMA tx complete interrupt.

Copy link
Collaborator Author

@lochej lochej Aug 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xhpohanka Logging has been added to the last rebase :)

Copy link
Contributor

@xhpohanka xhpohanka Aug 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lochej here is a log of timeouts, I will probably share some more later, but now I changed settings a bit and I cannot reproduce it so easy like last week. These timeouts below occurs very soon after power up of the device.

[00:00:05.049,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:05.049,000] <err> eth_stm32_hal: eth packet timeout
d8 d0 90 52 8b 2e cc bd  35 ff ff ff 08 00 45 00 |...R.... 5.....E.
00 28 00 00 00 00 40 06  f5 c6 c0 a8 01 fb c0 a8 |.(....@. ........
01 be 93 5b 07 5b 2c 6f  c4 9a e5 b2 a1 af 50 10 |...[.[,o ......P.
05 00 12 a8 00 00                                |......           
[00:00:05.247,000] <inf> mqtt: MQTT client connected!
[00:00:05.264,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:05.265,000] <err> eth_stm32_hal: eth packet timeout
d8 d0 90 52 8b 2e cc bd  35 ff ff ff 08 00 45 00 |...R.... 5.....E.
00 28 00 00 00 00 40 06  f5 c6 c0 a8 01 fb c0 a8 |.(....@. ........
01 be 93 5b 07 5b 2c 6f  c4 b4 e5 b2 a1 b3 50 10 |...[.[,o ......P.
05 00 12 8a 00 00                                |......           
[00:00:05.917,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:05.917,000] <err> eth_stm32_hal: eth packet timeout
d8 d0 90 52 8b 2e cc bd  35 ff ff ff 08 00 45 00 |...R.... 5.....E.
00 44 00 00 00 00 40 01  f5 af c0 a8 01 fb c0 a8 |.D....@. ........
01 be 03 03 82 21 00 00  00 00 45 00 00 28 ed 8a |.....!.. ..E..(..
40 00 40 06 c8 3b c0 a8  01 be c0 a8 01 fb 07 5b |@.@..;.. .......[
80 0a 1c ce be dc 24 d8  1d 81 50 11 fa 1c 8b 43 |......$. ..P....C
00 00                                            |..               
[00:00:06.767,000] <inf> mqtt: PUBACK packet id: 56264
[00:00:08.283,000] <inf> mqtt: PUBACK packet id: 51096
[00:00:08.475,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:08.476,000] <err> eth_stm32_hal: eth packet timeout
d8 d0 90 52 8b 2e cc bd  35 ff ff ff 08 00 45 00 |...R.... 5.....E.
00 44 00 00 00 00 40 01  f5 af c0 a8 01 fb c0 a8 |.D....@. ........
01 be 03 03 82 21 00 00  00 00 45 00 00 28 ed 8c |.....!.. ..E..(..
40 00 40 06 c8 39 c0 a8  01 be c0 a8 01 fb 07 5b |@.@..9.. .......[
80 0a 1c ce be dc 24 d8  1d 81 50 11 fa 1c 8b 43 |......$. ..P....C
00 00                                            |..   

@lochej lochej changed the title drivers: ethernet: stm32h7 IT based ethernet TX [DNM] drivers: ethernet: stm32h7 IT based ethernet TX Aug 20, 2020
@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch from 9d2f346 to 9640b24 Compare August 20, 2020 09:22
@lowlander
Copy link
Collaborator

@lochej does this also apply to F4 hardware, or is it only for H7 hardware?

@lochej
Copy link
Collaborator Author

lochej commented Aug 22, 2020

@lochej does this also apply to F4 hardware, or is it only for H7 hardware?

This only applies to H7 series.

Comment on lines 540 to 602

void HAL_ETH_DMAErrorCallback(ETH_HandleTypeDef *heth_handle)
{
__ASSERT_NO_MSG(heth_handle != NULL);

/* State of eth handle is ERROR in case of unrecoverable error */
/* unrecoverable (ETH_DMACSR_FBE | ETH_DMACSR_TPS | ETH_DMACSR_RPS) */
if (HAL_ETH_GetState(heth_handle) & HAL_ETH_STATE_ERROR) {
/* Log the error */
LOG_ERR("ETH_DMAErrorCallback errorcode:%x dmaerror:%x",
HAL_ETH_GetError(heth_handle),
HAL_ETH_GetDMAError(heth_handle));

/* TODO go to error state */
return;
}

/* Recoverable errors don't put ETH in error state */
/* ETH_DMACSR_CDE | ETH_DMACSR_ETI | ETH_DMACSR_RWT */
/* | ETH_DMACSR_RBU | ETH_DMACSR_AIS) */

/* TODO Check if we were TX transmitting and the unlock semaphore */
/* To return the error as soon as possible else we'll just wait */
/* for the timeout */
}
void HAL_ETH_MACErrorCallback(ETH_HandleTypeDef *heth_handle)
{
__ASSERT_NO_MSG(heth_handle != NULL);

/* TODO handle MAC error */
}
#endif /* CONFIG_SOC_SERIES_STM32H7X */
Copy link
Collaborator Author

@lochej lochej Aug 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nukersson @jukkar @xhpohanka @erwango @FRASTM @ABOSTM

I'm wondering about how to recover from an error on the ETH device that would go for every STM32 series supporting Ethernet.
But that might be for a future PR, I'm just asking for your opinion here.

My initial idea would be that in case of an error on the peripheral, either detected on RX function, TX function or any Error Callback to stop and restart the IP using:

HAL_ETH_Stop_IT(heth); /* Stop any ongoing processes and shut down the IP */
HAL_ETH_Start_IT(heth); /* Restart all the DMA process and reset IP state to ready*/

This piece of code could be added to an error handler for the ethernet driver. All unrecoverable errors could try to restart the IP.

What do you think ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, only logging messages are added in the different error handlers. We'll need a test program that can generate ethernet errors to see if we can recover from them using my suggested method.

Copy link
Collaborator

@KozhinovAlexander KozhinovAlexander Aug 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is - only non-recoverable or non-manageable errors should become eth-dma HW reset. In other cases - the network stack of Zephyr should be probably informed. Here I don't know, how to propagate errors to the stack and which ones.

But generally it depends on certain application

@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch from 9640b24 to efe3fac Compare August 22, 2020 17:38
@KozhinovAlexander
Copy link
Collaborator

KozhinovAlexander commented Aug 22, 2020

@lowlander
It is originally your implementation of Zephyr's ETH driver for F4/F7. Would you like adapt it also to interrupt driven approach?

@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch from efe3fac to e9754d9 Compare August 22, 2020 17:53
@lochej
Copy link
Collaborator Author

lochej commented Aug 22, 2020

@lowlander
It is originally your implementation of Zephyr's ETH driver for F4/F7. Would you like adapt it also to interrupt driven approach?

F4/F7 could work the same way as my H7 implementation. As I can see the current implementation waits for the DMA to be ready (by checking DMA_OWN bit) before calling HAL_ETH_TransmitFrame which is non-blocking for F4/F7 IP. We could get rid of this code for H7 and F4/F7 series and instead wait for the transmission to end after with a semaphore just like the H7 approach.

If we want to keep the same approach as F4/F7 on H7 series, we could wait for the semaphore before calling the Transmit function.

This would mean that eth_tx could return before the ETH frame being sent and make error handling more difficult. But on the other hand waiting before transmission could result in more performance as the CPU profit of the sending time to do processing for the next frame and only wait a bit before transmission is ready.

@lochej lochej changed the title [DNM] drivers: ethernet: stm32h7 IT based ethernet TX drivers: ethernet: stm32h7 IT based ethernet TX Aug 23, 2020
@lochej lochej requested a review from jukkar August 23, 2020 09:39
#ifdef CONFIG_SOC_SERIES_STM32H7X
struct k_sem tx_int_sem;
#endif /* CONFIG_SOC_SERIES_STM32H7X */
K_THREAD_STACK_MEMBER(rx_thread_stack,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be K_KERNEL_STACK_MEMBER

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right my bad, I'm updating the PR, fixing push available now 👍

Modify the ethernet driver to use TX complete interrupts.
Adds HAL ethernet TX complete callback and locking semaphore.

Due to changing behavior/content of the TX DMA descriptors
on STM32H7 series, based on the state of the IP,
it is more reliable to wait for the TX complete interrupt to check
for DMA end of transmission event. This avoids polling the
DMA_DESC_OWN bit in the descriptors.

Improves reliability and performance of the ethernet peripheral.

Tested on CoapServer sample, Dumb HTTP server, telnet sample.

Signed-off-by: Jeremy LOCHE <lochejeremy@gmail.com>
Reduced the size of tx_buffer_def array to 1 to save
on function stack memory. Here only 1 buffer is
enough to call the transmit function.

Signed-off-by: Jeremy LOCHE <lochejeremy@gmail.com>
@lochej lochej force-pushed the drivers/stm32h7_it_based_eth branch from e9754d9 to 63a89a5 Compare September 1, 2020 16:57
@jukkar jukkar added this to the v2.4.0 milestone Sep 1, 2020
@jukkar jukkar closed this Sep 2, 2020
@jukkar jukkar reopened this Sep 2, 2020
@erwango erwango requested review from carlescufi and galak September 3, 2020 13:41
@carlescufi carlescufi merged commit ddcc385 into zephyrproject-rtos:master Sep 3, 2020
@erwango
Copy link
Member

erwango commented Sep 4, 2020

@lochej Thanks again for this PR.
@Nukersson, @lochej I'm looking for active reviewers on STM32 components. Given your recent contribution in this area, would you be interested be added as codeowners for stm32 ethernet ? I know you mainly focus on H7, but this is a good start ;-)

@KozhinovAlexander
Copy link
Collaborator

@lochej Thanks again for this PR.
@Nukersson, @lochej I'm looking for active reviewers on STM32 components. Given your recent contribution in this area, would you be interested be added as codeowners for stm32 ethernet ? I know you mainly focus on H7, but this is a good start ;-)

Looks good to me. It would be pleasure for me. And I really like to work with you (guys from ST) and Jeremy also!

@lochej
Copy link
Collaborator Author

lochej commented Sep 4, 2020

@lochej Thanks again for this PR.
@Nukersson, @lochej I'm looking for active reviewers on STM32 components. Given your recent contribution in this area, would you be interested be added as codeowners for stm32 ethernet ? I know you mainly focus on H7, but this is a good start ;-)

Of course that would be with great pleasure ! I have projects running on ST boards and I could sure help :)

@lochej lochej deleted the drivers/stm32h7_it_based_eth branch September 10, 2020 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants