[BUG] Invalid calls to vSocketClose() when system bogged down and multiple TCP ports are closed all at once #570
Comments
…ets causing DataAbort interrupts. (FreeRTOS#570)
Hello @phelter, thank you for this report. There is something I don't understand: when This is where
if( ( eTCPState == eCLOSED ) ||
    ( eTCPState == eCLOSE_WAIT ) )
{
    if( ( pxSocket->u.xTCP.bits.bPassQueued != pdFALSE_UNSIGNED ) ||
        ( pxSocket->u.xTCP.bits.bPassAccept != pdFALSE_UNSIGNED ) )
    {
        if( pxSocket->u.xTCP.bits.bReuseSocket == pdFALSE_UNSIGNED )
        {
            vSocketCloseNextTime( pxSocket );
        }
    }
}
So the TCP status must be
The time between the following statements is very small:
vSocketCloseNextTime( pxSocket );
...
vSocketCloseNextTime( NULL );
/* Now the IP-task will block again */
xQueueReceive( xNetworkEventQueue, ...);
I would be curious to see/understand your TCP-related code, especially the API calls, and the call to
Why does your code call
In the PR you show a sequence of events in a log. Do you have the same type of log without applying the patch?
And just out of curiosity: what priorities have you assigned to the IP-task and to the task that calls TCP APIs? |
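The deferred-close pattern discussed in this thread can be modelled in isolation. The following is a minimal sketch, not the library's code: `ModelSocket_t`, `modelSocketCloseNextTime()`, `modelUserClose()` and `modelDoubleCloseScenario()` are hypothetical names, and the "close" only increments a counter instead of freeing resources. It assumes, as described in the issue, that one call remembers a socket and a later call with a different argument closes it, while the application still holds the same handle.

```c
#include <stddef.h>

/* Hypothetical, simplified socket: it only counts how often it was closed. */
typedef struct
{
    int closeCount;
} ModelSocket_t;

/* Assumption: the socket remembered for a deferred close. */
static ModelSocket_t * pxModelSocketToClose = NULL;

static void modelSocketClose( ModelSocket_t * pxSocket )
{
    /* A real close would free buffers and remove list items. */
    pxSocket->closeCount++;
}

/* Close the previously remembered socket (if any), then remember the new one. */
void modelSocketCloseNextTime( ModelSocket_t * pxSocket )
{
    if( ( pxModelSocketToClose != NULL ) && ( pxModelSocketToClose != pxSocket ) )
    {
        modelSocketClose( pxModelSocketToClose );
    }

    pxModelSocketToClose = pxSocket;
}

/* Stand-in for a close requested by the application. */
void modelUserClose( ModelSocket_t * pxSocket )
{
    modelSocketClose( pxSocket );
}

/* Replays the sequence reported in the issue and returns the close count. */
int modelDoubleCloseScenario( void )
{
    ModelSocket_t xSocket = { 0 };

    modelSocketCloseNextTime( &xSocket ); /* internal: defer the close   */
    modelSocketCloseNextTime( NULL );     /* next iteration: close it    */
    modelUserClose( &xSocket );           /* user still holds the handle */

    return xSocket.closeCount;
}
```

In this model the same socket is closed twice as soon as both the internal path and the application act on it. Whether the real library can reach that sequence is exactly the question debated in this thread.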
@htibosch The use case is: There are two Server A: - is at priority 14 - The first thing that gets executed within each of the tasks is
void prvConnectionListeningTask( void *pvParameters )
{
    struct freertos_sockaddr xClient;
    Socket_t xListeningSocket = FREERTOS_INVALID_SOCKET;
    Socket_t xConnectedSocket = FREERTOS_INVALID_SOCKET;
    socklen_t xSize = sizeof( xClient );
    serverTaskInfo_t *pxServerTaskInfo = NULL;

    pxServerTaskInfo = ( serverTaskInfo_t* ) pvParameters;
    xListeningSocket = vCreateTCPServerSocket(pxServerTaskInfo->portNum);
    FreeRTOS_listen( xListeningSocket, pxServerTaskInfo->connectionBacklog );

    for( ;; )
    {
        xConnectedSocket = FreeRTOS_accept( xListeningSocket, &xClient, &xSize );
        configASSERT( xConnectedSocket != FREERTOS_INVALID_SOCKET ); // NOLINT performance-no-int-to-ptr
        xTaskCreate( pxServerTaskInfo->serverHandlerTask,
                     pxServerTaskInfo->taskName,
                     pxServerTaskInfo->stackSize,
                     ( void * ) xConnectedSocket,
                     pxServerTaskInfo->taskPriority,
                     NULL );
    }
}

static Socket_t vCreateTCPServerSocket( uint16_t pPortNum )
{
    struct freertos_sockaddr xBindAddress;
    Socket_t xListeningSocket = FREERTOS_INVALID_SOCKET;
    static const TickType_t xReceiveTimeOut = portMAX_DELAY;

    #if( ipconfigUSE_TCP_WIN == 1 )
        WinProperties_t xWinProps;
        xWinProps.lTxBufSize = ipconfigTCP_TX_BUFFER_LENGTH;
        xWinProps.lTxWinSize = ipconfigTCP_WIN_SEG_COUNT;
        xWinProps.lRxBufSize = ipconfigTCP_RX_BUFFER_LENGTH;
        xWinProps.lRxWinSize = ipconfigTCP_WIN_SEG_COUNT;
    #endif /* ipconfigUSE_TCP_WIN */

    xListeningSocket = FreeRTOS_socket( FREERTOS_AF_INET, FREERTOS_SOCK_STREAM, FREERTOS_IPPROTO_TCP );
    configASSERT( xListeningSocket != FREERTOS_INVALID_SOCKET );
    FreeRTOS_setsockopt( xListeningSocket, 0, FREERTOS_SO_RCVTIMEO, &xReceiveTimeOut, sizeof( xReceiveTimeOut ) );

    #if( ipconfigUSE_TCP_WIN == 1 )
        FreeRTOS_setsockopt( xListeningSocket, 0, FREERTOS_SO_WIN_PROPERTIES, ( void * ) &xWinProps, sizeof( xWinProps ) );
    #endif /* ipconfigUSE_TCP_WIN */

    xBindAddress.sin_port = FreeRTOS_htons( pPortNum );
    FreeRTOS_bind( xListeningSocket, &xBindAddress, sizeof( xBindAddress ) );
    return xListeningSocket;
}
As you can see the listening socket spawns an independent task for each accepted connection at the same priority as the listening server socket. The function that is passed as:
void vServerConnectionInstance( void *pvParameters )
{
    Socket_t xSocket = FREERTOS_INVALID_SOCKET;
    static char cRxedData[ TCP_RX_DATA_BUFFER_SIZE ];
    CR02BaseType_t lBytesReceived = 0;
    static const TickType_t xReceiveTimeOut = pdMS_TO_TICKS( CONFIG_SERVER_TCP_TIMEOUT_MS );
    static const TickType_t xSendTimeOut = pdMS_TO_TICKS( CONFIG_SERVER_TCP_TIMEOUT_MS );

    /* The socket has already been created and connected before
       being passed into this RTOS task using the RTOS task's parameter. */
    xSocket = ( Socket_t ) pvParameters;
    FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_RCVTIMEO, &xReceiveTimeOut, sizeof( xReceiveTimeOut ) );
    FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_SNDTIMEO, &xSendTimeOut, sizeof( xSendTimeOut ) );

    for( ;; )
    {
        lBytesReceived = FreeRTOS_recv( xSocket, &cRxedData, TCP_RX_DATA_BUFFER_SIZE, 0 );
        if( lBytesReceived > 0 )
        {
            prvProcessRXConfigData( xSocket, cRxedData, (size_t)lBytesReceived );
        }
        else if( lBytesReceived == 0 )
        {
        }
        else
        {
            FreeRTOS_shutdown( xSocket, FREERTOS_SHUT_RDWR );
            break;
        }
    }

    while( FreeRTOS_recv( xSocket, &cRxedData, TCP_RX_DATA_BUFFER_SIZE, 0 ) >= 0 )
    {
        vTaskDelay( pdMS_TO_TICKS( 250 ) );
    }

    FreeRTOS_closesocket( xSocket );
    vTaskDelete( NULL );
}
This is the only place the
The failure happens then because these are not
For logs - the same set of log output occurs when the patch is NOT applied but when the
I believe you have provided the reason for this bug in your previous comment:
If this is an assumption that must hold true, then there is a bug in the code here. The
There might be a simpler approach to fixing this by checking |
Thank you for the code and your comments. Your assumption about a possible race condition does not seem correct to me. As long as
xQueueReceive()                   // returns 'eNetworkRxEvent'
vSocketCloseNextTime( pxSocket ); // xSocketToClose = pxSocket
// A competing task calls 'FreeRTOS_closesocket()', it must wait
vSocketCloseNextTime( NULL );     // pxSocket is closed, xSocketToClose = NULL
xQueueReceive();                  // returns 'eSocketCloseEvent'
vSocketClose();                   // Because an 'eSocketCloseEvent' was received
Between two calls to
The buffer parameter to
static char cRxedData[ TCP_RX_DATA_BUFFER_SIZE ];
- lBytesReceived = FreeRTOS_recv( xSocket, &cRxedData, TCP_RX_DATA_BUFFER_SIZE, 0 );
+ lBytesReceived = FreeRTOS_recv( xSocket, cRxedData, TCP_RX_DATA_BUFFER_SIZE, 0 );
which may lead to data corruption.
I don't know if it is easy to reproduce the problem, but then I would rewrite
bPassAccept : 1, /**< when true, this socket may be returned in a call to accept() */
bPassQueued : 1, /**< when true, this socket is an orphan until it gets connected
                  * Why an orphan? Because it may not be returned in a accept() call until it
                  * gets the state eESTABLISHED */
bReuseSocket : 1, /**< When a listening socket gets a connection, do not create a new instance but keep on using it */
I had some more minor remarks about the code and attach them here: demo_code.zip
One remark
And also: an invalid socket is defined as a socket that is either
Could you post the |
Hello @htibosch , While I agree with you about the use of the Please see: ISO/IEC 98/99 section 6.3.2.1 - paragraph 2. So it is not this. All I know is when the code provided in PR #571 - which is explicitly checking the lifetime of a socket is put in place, this issue no longer happens. As mentioned the sockets returned to the user from It is very difficult to prove that the state of a TCP socket is correct or will only terminate a socket NOT provided back to the user. However, the fact that the logs provided in the original description show that the same socket is closed multiple times - (once by user, and once internally) should be sufficient proof that there is a lifetime issue in this code and that the separation of the act of closing and the act of destroying a socket are two independent I have provided the necessary info and fix for this issue which has been proven on our machine to work and operate. If you require further proof the issue exists, I suggest creating a unit test or proof that a socket returned by |
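The disagreement about `&cRxedData` versus `cRxedData` can be checked with a small standalone program. This is a sketch under stated assumptions: `fillBuffer()` is a hypothetical stand-in for a receive-style function that takes a `void *` buffer, and `RX_BUFFER_SIZE` is an arbitrary size. It shows that both expressions convert to the same address when passed as `void *`, even though their types differ (`char (*)[N]` versus `char *`).

```c
#include <stddef.h>
#include <string.h>

#define RX_BUFFER_SIZE    16

static char cRxedData[ RX_BUFFER_SIZE ];

/* Hypothetical stand-in for a receive call with a 'void *' buffer parameter. */
static size_t fillBuffer( void * pvBuffer, size_t uxLength )
{
    memset( pvBuffer, 0x5A, uxLength );
    return uxLength;
}

/* Returns 1 when '&array' and 'array' designate the same address. */
int sameAddress( void )
{
    return ( ( void * ) &cRxedData == ( void * ) cRxedData ) ? 1 : 0;
}

/* Writes through '&cRxedData' and reads back through 'cRxedData'. */
int writeThroughAddressOfArray( void )
{
    ( void ) fillBuffer( &cRxedData, RX_BUFFER_SIZE );
    return ( cRxedData[ 0 ] == 0x5A ) && ( cRxedData[ RX_BUFFER_SIZE - 1 ] == 0x5A );
}
```

Passing the plain array name remains the more conventional spelling, but for a `void *` parameter the two forms write to the same memory.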
Yes I am aware. I did not use
Yes I am aware. And I fixed several places in the PR #571 - where the socket received by the IPTask event queue was not checking for both
|
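One way to picture an explicit lifetime check is a small registry of live sockets that is consulted before any close. The sketch below is an assumption about the general technique only and does not reproduce PR #571: `ModelSocket_t`, `registerSocket()`, `guardedClose()` and `guardedCloseScenario()` are hypothetical names, and the table is assumed to be touched by a single task (otherwise it would need a critical section).

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED_SOCKETS    8

/* Hypothetical, simplified socket: it only counts how often it was closed. */
typedef struct
{
    int closeCount;
} ModelSocket_t;

/* Assumption: every socket that is still alive has an entry in this table. */
static ModelSocket_t * pxLiveSockets[ MAX_TRACKED_SOCKETS ];

bool registerSocket( ModelSocket_t * pxSocket )
{
    for( size_t i = 0; i < MAX_TRACKED_SOCKETS; i++ )
    {
        if( pxLiveSockets[ i ] == NULL )
        {
            pxLiveSockets[ i ] = pxSocket;
            return true;
        }
    }

    return false; /* table full */
}

/* Closes the socket only when it is still registered; returns false otherwise. */
bool guardedClose( ModelSocket_t * pxSocket )
{
    for( size_t i = 0; i < MAX_TRACKED_SOCKETS; i++ )
    {
        if( ( pxSocket != NULL ) && ( pxLiveSockets[ i ] == pxSocket ) )
        {
            pxLiveSockets[ i ] = NULL; /* end of lifetime */
            pxSocket->closeCount++;    /* a real close would free resources here */
            return true;
        }
    }

    return false; /* unknown or already closed: ignore the request */
}

/* An internal close followed by a user close on the same handle. */
int guardedCloseScenario( void )
{
    ModelSocket_t xSocket = { 0 };

    ( void ) registerSocket( &xSocket );
    ( void ) guardedClose( &xSocket ); /* first close succeeds   */
    ( void ) guardedClose( &xSocket ); /* second close is refused */

    return xSocket.closeCount;
}
```

With such a guard the second close request becomes a no-op instead of touching freed resources. It protects against the symptom and does not explain how the double close arises.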
Hi @phelter, mind you that I am (we are) very grateful that you came up with this issue, and that you spent so much time on it.
About And about my remark about
Before we make a repair or a patch that protects against unwanted behaviour, I would like to understand why a socket can "have two owners": the IP-task and the application.
About Priorities:
The FreeRTOS+TCP library was written with the silent assumption that the IP-task has a higher priority than any of its users. In fact, we have always recommended this scheme: Higher : Network Interface (
In
vTaskSuspendAll();
{
    if( pxSocket->u.xTCP.bits.bReuseSocket == pdFALSE_UNSIGNED )
    {
        pxClientSocket = pxSocket->u.xTCP.pxPeerSocket;
    }
    else
    {
        pxClientSocket = pxSocket;
    }

    if( pxClientSocket != NULL )
    {
        pxSocket->u.xTCP.pxPeerSocket = NULL;

        /* Is it still not taken ? */
        if( pxClientSocket->u.xTCP.bits.bPassAccept != pdFALSE_UNSIGNED )
        {
            pxClientSocket->u.xTCP.bits.bPassAccept = pdFALSE;
        }
        else
        {
            pxClientSocket = NULL;
        }
    }
}
( void ) xTaskResumeAll();
This is safe to do because the scheduler is temporarily suspended by calling
When the user task has a higher priority than the IP-task, we can:
The cause
I think that the following happens: For an unknown (but likely legitimate) reason, the IP-task puts the socket in status
The user application calls
In a solution, I think that
A TCP connection can be closed because of the following reasons:
Question 1, do you have any idea why the sockets were put into
Question 2, could you describe the steps that must be taken to reproduce the problem?
Question 3, you wrote "When the system is bogged down (cpu oversubscribed)"
Thanks |
Hello @htibosch, thank you for your response. Regarding:
I did not take it personally; however, it would be appreciated that bugs of this nature are treated
A socket does not have 2
Without confirmation - my current theory is: A socket is returned from
pxClientSocket = pxSocket->u.xTCP.pxPeerSocket;
...
pxSocket->u.xTCP.pxPeerSocket = NULL;
...
pxClientSocket->u.xTCP.bits.bPassAccept = pdFALSE;
( void ) xSendEventStructToIPTask( &xAskEvent, portMAX_DELAY );
return pxClientSocket;
At this point the
Due to the way the u.xTCP.bits.bPassAccept and bPassQueued are being used and changed, their state may not correctly reflect the actual state of the TCP socket. These bits are consumed in the following functions in the associated ways:
The code assumes these to be atomic and a common state, i.e. that using both together in an if statement gives correct knowledge of PassQueued and PassAccept as a combined state. E.g. the case where:
if( ( pxSocket->u.xTCP.bits.bPassQueued != pdFALSE_UNSIGNED ) ||
    ( pxSocket->u.xTCP.bits.bPassAccept != pdFALSE_UNSIGNED ) )
This translates into two actions: confirming
The solution to this is to either:
Unfortunately no, I am working remotely helping out a team that is performing the debugging on the device itself. I believe I have provided the necessary info to reproduce, but I cannot share all of the code to reproduce the issue.
No I don't believe so - they are all being serviced, just not necessarily very often due to other tasks in the system. |
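The concern about reading two separate bits as one combined state can be illustrated with a single ownership field that changes in one atomic step. This is only a sketch of that general idea, not one of the options proposed in this thread: the `ModelOwner_t` states, `modelTryTransition()` and `modelAcceptVersusClose()` are hypothetical, and C11 `<stdatomic.h>` is assumed to be available (on a FreeRTOS target a critical section could play the same role).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single ownership state replacing two independent flag bits. */
typedef enum
{
    eModelQueued = 0,     /* not yet connected, owned by the stack  */
    eModelReadyForAccept, /* connected, may be handed to the user   */
    eModelOwnedByUser,    /* returned from accept, user must close  */
    eModelClosing         /* the stack has decided to close it      */
} ModelOwner_t;

typedef struct
{
    _Atomic int eOwner;
} ModelSocket_t;

/* Moves from eFrom to eTo in one atomic step; fails if the state changed meanwhile. */
bool modelTryTransition( ModelSocket_t * pxSocket, ModelOwner_t eFrom, ModelOwner_t eTo )
{
    int iExpected = ( int ) eFrom;

    return atomic_compare_exchange_strong( &pxSocket->eOwner, &iExpected, ( int ) eTo );
}

/* The user claims the socket first, then the stack tries to close it.
 * Returns a bit mask: 1 = user won, 2 = stack won. */
int modelAcceptVersusClose( void )
{
    ModelSocket_t xSocket;
    int iResult = 0;

    atomic_init( &xSocket.eOwner, ( int ) eModelReadyForAccept );

    if( modelTryTransition( &xSocket, eModelReadyForAccept, eModelOwnedByUser ) )
    {
        iResult |= 1;
    }

    if( modelTryTransition( &xSocket, eModelReadyForAccept, eModelClosing ) )
    {
        iResult |= 2;
    }

    return iResult;
}
```

Because only one of the two transitions can succeed, exactly one party ends up responsible for closing the socket in this model.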
Hello @phelter, thank you for this report.
Please suggest if your observation is something like this. |
We have this change #705 for the above observation. Please check if this helps solving your scenario as well. |
Closing this issue as a probable fix is available. |
Describe the bug
When the system is bogged down (CPU oversubscribed), multiple TCP sockets are open, and all of them are closed at once, there are cases where extra calls to vSocketClose() are performed on sockets that have already been closed, with bad socket data.
After much review of the environment and the message queue to the FreeRTOS-Plus-TCP IP task, I narrowed this down to a bug in the lifetime management of the sockets in the socket layer.
Due to the use of vSocketCloseNextTime, a socket may be closed twice. The first close is internal: the pxSocket socket is stored in the xSocketToClose global variable and then closed later by another call to vSocketCloseNextTime(NULL), which is called first thing in prvIPTask(). The second close, FreeRTOS_closesocket(), is performed by the user, since the user still has a local store of the opened FreeRTOS_Socket_t.
I believe that at some point in the TCP code a vTCPStateChange( pxSocket, eCLOSE_WAIT) is issued without intervention by the user. The user also requests a FreeRTOS_closesocket() to terminate the socket, but by then the socket has already performed the close.
Because vSocketClose() deletes its resources, when a closed socket is closed again a DataAbort interrupt is raised due to a Bad Address and an attempt to delete a list item twice.
Target
Host
To Reproduce
Expected behavior
The Socket layer maintains a well known lifetime definition of all sockets created by the user, and ensures if at any point the socket is closed for internal reasons, it does not perform the same action again.
Screenshots
None
Wireshark logs
None
Additional context
Debug - additional messages from Close Execution:
Note that vSocketClose is called twice on pxSocket 0x0036FDD0 - due to another socket's close being executed.
Where a dump of the callers is:
So the first close of the first socket is due to the vSocketCloseNextTime() and the second one is due to user request. On the second close, a DataAbort occurs inside the uxListRemove function.