Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]Function named "freeSampleStreamingSession" called "timerQueueCancelTimer", only enter "timerQueueCancelTimer" but not exited, please confirm! #1273

Closed
August88618 opened this issue Sep 17, 2021 · 13 comments
Assignees
Labels
bug Something isn't working

Comments

@August88618
Copy link

I added printing before and after calling the function, as follows:
MUTEX_LOCK(pSampleConfiguration->configurationObjLock);
if (pSampleConfiguration->iceCandidatePairStatsTimerId != MAX_UINT32 && pSampleConfiguration->streamingSessionCount == 0 &&
pSampleConfiguration->iceCandidatePairStatsTimerId != MAX_UINT32) {
AEDA_LOG_INFO(MODULE, "CY1111:timerQueueCancelTimer begin");
CHK_LOG_ERR(timerQueueCancelTimer(pSampleConfiguration->timerQueueHandle, pSampleConfiguration->iceCandidatePairStatsTimerId,
(uint64_t)(uint32_t)pSampleConfiguration));
AEDA_LOG_INFO(MODULE, "CY1111:timerQueueCancelTimer end");
pSampleConfiguration->iceCandidatePairStatsTimerId = MAX_UINT32;
}
MUTEX_UNLOCK(pSampleConfiguration->configurationObjLock);

When our IPC is abnormal,the last time "timerQueueCancelTimer" was called, only begin but not end.The log reference is as follows
errorLog.txt

@August88618 August88618 added the bug Something isn't working label Sep 17, 2021
@hassanctech
Copy link
Contributor

Can you please post which release/version of the SDK you're using and what platform you're on (i.e. Linux/Mac/Windows) and other important details about your platform?

@hassanctech
Copy link
Contributor

Also what is the impact of what you are seeing do you suspect a deadlock? There is nothing blocking in timerQueueCancelTimer except for waiting to lock the executorLock, it looks like we first acquire configurationObjLock then wait for the executorLock perchance another thread is doing this in the reverse order, we will need to investigate.

@August88618
Copy link
Author

A、SDK version is webrtc-sdk-c-update-libwebsockets-4.2-stable,platform is linux,cross-compiling tool named arm-linux-gnueabihf-gcc4.9.4-uclibc1.0.31-
B、Yes,I suspect it's a deadlock,but I don't see timerQueueCancelTimer waiting for ”configurationObjLock“,because I added printing in SessionCleanupWait when it get ”configurationObjLock“,after obtaining the lock, proceed with the following process.Function call procedure:SessionCleanupWait--->...--->get sampleConfigurationObjLock(get lock success)--->...--->get streamingSessionListReadLock(get lock success)--->...--->freeSampleStreamingSession--->...--->(get lock success) timerQueueCancelTimer, only begin but not end.

@ilyaikik
Copy link

As I remember I've faces similar problem.

Try to add IS_VALID checker to this part of code

// De-initialize the session stats timer if there are no active sessions
    // NOTE: we need to perform this under the lock which might be acquired by
    // the running thread but it's OK as it's re-entrant
    MUTEX_LOCK(pSampleConfiguration->sampleConfigurationObjLock);
    if (pSampleConfiguration->iceCandidatePairStatsTimerId != MAX_UINT32 && pSampleConfiguration->streamingSessionCount == 0 &&
        pSampleConfiguration->iceCandidatePairStatsTimerId != MAX_UINT32) {
        CHK_LOG_ERR(timerQueueCancelTimer(pSampleConfiguration->timerQueueHandle, pSampleConfiguration->iceCandidatePairStatsTimerId,
                                          (UINT64) pSampleConfiguration));
        pSampleConfiguration->iceCandidatePairStatsTimerId = MAX_UINT32;
    }
    MUTEX_UNLOCK(pSampleConfiguration->sampleConfigurationObjLock);

to

// De-initialize the session stats timer if there are no active sessions
    // NOTE: we need to perform this under the lock which might be acquired by
    // the running thread but it's OK as it's re-entrant
    MUTEX_LOCK(pSampleConfiguration->sampleConfigurationObjLock);
    if (pSampleConfiguration->iceCandidatePairStatsTimerId != MAX_UINT32 && pSampleConfiguration->streamingSessionCount == 0 &&
        pSampleConfiguration->iceCandidatePairStatsTimerId != MAX_UINT32)
    {
        if (IS_VALID_TIMER_QUEUE_HANDLE(pSampleConfiguration->timerQueueHandle))
        {
            CHK_LOG_ERR(timerQueueCancelTimer(pSampleConfiguration->timerQueueHandle, pSampleConfiguration->iceCandidatePairStatsTimerId,
                                              (UINT64)pSampleConfiguration));
            pSampleConfiguration->iceCandidatePairStatsTimerId = MAX_UINT32;
        }
    }
    MUTEX_UNLOCK(pSampleConfiguration->sampleConfigurationObjLock);

and modify freeSampleConfiguration (sorry for debug messages):

STATUS freeSampleConfiguration(PSampleConfiguration *ppSampleConfiguration)
{
    ENTERS();
    STATUS retStatus = STATUS_SUCCESS;
    PSampleConfiguration pSampleConfiguration;
    UINT32 i;
    UINT64 data;
    StackQueueIterator iterator;
    BOOL locked = FALSE;
    printf("free 0");
    CHK(ppSampleConfiguration != NULL, STATUS_NULL_ARG);
    pSampleConfiguration = *ppSampleConfiguration;
    printf("free 1");
    CHK(pSampleConfiguration != NULL, retStatus);
    printf("free 2");

    if (IS_VALID_TIMER_QUEUE_HANDLE(pSampleConfiguration->timerQueueHandle))
    {
        if (pSampleConfiguration->iceCandidatePairStatsTimerId != MAX_UINT32)
        {
            printf("free 19\n\r");
            printf("FREE 19: %ull, %ull\n\r", pSampleConfiguration->timerQueueHandle, pSampleConfiguration->iceCandidatePairStatsTimerId);
            retStatus = timerQueueCancelTimer(pSampleConfiguration->timerQueueHandle, pSampleConfiguration->iceCandidatePairStatsTimerId,
                                              (UINT64)pSampleConfiguration);
            if (STATUS_FAILED(retStatus))
            {
                DLOGE("Failed to cancel stats timer with: 0x%08x", retStatus);
            }
            pSampleConfiguration->iceCandidatePairStatsTimerId = MAX_UINT32;
        }
        printf("free 20");
        if (pSampleConfiguration->pregenerateCertTimerId != MAX_UINT32)
        {
            printf("free 21");
            retStatus = timerQueueCancelTimer(pSampleConfiguration->timerQueueHandle, pSampleConfiguration->pregenerateCertTimerId,
                                              (UINT64)pSampleConfiguration);
            if (STATUS_FAILED(retStatus))
            {
                DLOGE("Failed to cancel certificate pre-generation timer with: 0x%08x", retStatus);
            }
            pSampleConfiguration->pregenerateCertTimerId = MAX_UINT32;
        }
        printf("free 22");
        timerQueueFree(&pSampleConfiguration->timerQueueHandle);
    }

    if (pSampleConfiguration->pPendingSignalingMessageForRemoteClient != NULL)
    {
        // Iterate and free all the pending queues
        stackQueueGetIterator(pSampleConfiguration->pPendingSignalingMessageForRemoteClient, &iterator);
        while (IS_VALID_ITERATOR(iterator))
        {
            stackQueueIteratorGetItem(iterator, &data);
            stackQueueIteratorNext(&iterator);
            printf("free 3");
            freeMessageQueue((PPendingMessageQueue)data);
        }
        printf("free 4");
        stackQueueClear(pSampleConfiguration->pPendingSignalingMessageForRemoteClient, FALSE);
        stackQueueFree(pSampleConfiguration->pPendingSignalingMessageForRemoteClient);
        pSampleConfiguration->pPendingSignalingMessageForRemoteClient = NULL;
    }
    printf("free 5");
    hashTableClear(pSampleConfiguration->pRtcPeerConnectionForRemoteClient);
    hashTableFree(pSampleConfiguration->pRtcPeerConnectionForRemoteClient);
    printf("free 6");
    if (IS_VALID_MUTEX_VALUE(pSampleConfiguration->sampleConfigurationObjLock))
    {
        MUTEX_LOCK(pSampleConfiguration->sampleConfigurationObjLock);
        locked = TRUE;
    }
    printf("free 7");
    for (i = 0; i < pSampleConfiguration->streamingSessionCount; ++i)
    {
        retStatus = gatherIceServerStats(pSampleConfiguration->sampleStreamingSessionList[i]);
        if (STATUS_FAILED(retStatus))
        {
            DLOGW("Failed to ICE Server Stats for streaming session %d: %08x", i, retStatus);
        }
        printf("free 8");
        freeSampleStreamingSession(&pSampleConfiguration->sampleStreamingSessionList[i]);
    }
    printf("free 9");
    if (locked)
    {
        MUTEX_UNLOCK(pSampleConfiguration->sampleConfigurationObjLock);
    }
    printf("free 10");
    // deinitKvsWebRtc();
    printf("free 11");
    SAFE_MEMFREE(pSampleConfiguration->pVideoFrameBuffer);
    SAFE_MEMFREE(pSampleConfiguration->pAudioFrameBuffer);
    printf("free 12");
    if (IS_VALID_CVAR_VALUE(pSampleConfiguration->cvar) && IS_VALID_MUTEX_VALUE(pSampleConfiguration->sampleConfigurationObjLock))
    {
        CVAR_BROADCAST(pSampleConfiguration->cvar);
        MUTEX_LOCK(pSampleConfiguration->sampleConfigurationObjLock);
        MUTEX_UNLOCK(pSampleConfiguration->sampleConfigurationObjLock);
    }
    printf("free 13");
    if (IS_VALID_MUTEX_VALUE(pSampleConfiguration->sampleConfigurationObjLock))
    {
        MUTEX_FREE(pSampleConfiguration->sampleConfigurationObjLock);
    }
    printf("free 14");
    if (IS_VALID_MUTEX_VALUE(pSampleConfiguration->streamingSessionListReadLock))
    {
        MUTEX_FREE(pSampleConfiguration->streamingSessionListReadLock);
    }
    printf("free 15");
    if (IS_VALID_MUTEX_VALUE(pSampleConfiguration->signalingSendMessageLock))
    {
        MUTEX_FREE(pSampleConfiguration->signalingSendMessageLock);
    }
    printf("free 16");
    if (IS_VALID_CVAR_VALUE(pSampleConfiguration->cvar))
    {
        CVAR_FREE(pSampleConfiguration->cvar);
    }
    printf("free 17");
    freeStaticCredentialProvider(&pSampleConfiguration->pCredentialProvider);
    printf("free 18");
    // if (IS_VALID_TIMER_QUEUE_HANDLE(pSampleConfiguration->timerQueueHandle)) {
    //     if (pSampleConfiguration->iceCandidatePairStatsTimerId != MAX_UINT32) {
    //         printf("free 19\n\r");
    //         printf("FREE 19: %ull, %ull\n\r", pSampleConfiguration->timerQueueHandle, pSampleConfiguration->iceCandidatePairStatsTimerId);
    //         retStatus = timerQueueCancelTimer(pSampleConfiguration->timerQueueHandle, pSampleConfiguration->iceCandidatePairStatsTimerId,
    //                                           (UINT64) pSampleConfiguration);
    //         if (STATUS_FAILED(retStatus)) {
    //             DLOGE("Failed to cancel stats timer with: 0x%08x", retStatus);
    //         }
    //         pSampleConfiguration->iceCandidatePairStatsTimerId = MAX_UINT32;
    //     }
    //     printf("free 20");
    //     if (pSampleConfiguration->pregenerateCertTimerId != MAX_UINT32) {
    //         printf("free 21");
    //         retStatus = timerQueueCancelTimer(pSampleConfiguration->timerQueueHandle, pSampleConfiguration->pregenerateCertTimerId,
    //                                           (UINT64) pSampleConfiguration);
    //         if (STATUS_FAILED(retStatus)) {
    //             DLOGE("Failed to cancel certificate pre-generation timer with: 0x%08x", retStatus);
    //         }
    //         pSampleConfiguration->pregenerateCertTimerId = MAX_UINT32;
    //     }
    //     printf("free 22");
    //     timerQueueFree(&pSampleConfiguration->timerQueueHandle);
    // }
    printf("free 23");
    if (pSampleConfiguration->pregeneratedCertificates != NULL)
    {
        stackQueueGetIterator(pSampleConfiguration->pregeneratedCertificates, &iterator);
        while (IS_VALID_ITERATOR(iterator))
        {
            stackQueueIteratorGetItem(iterator, &data);
            stackQueueIteratorNext(&iterator);
            printf("free 24");
            freeRtcCertificate((PRtcCertificate)data);
        }
        printf("free 25");
        CHK_LOG_ERR(stackQueueClear(pSampleConfiguration->pregeneratedCertificates, FALSE));
        CHK_LOG_ERR(stackQueueFree(pSampleConfiguration->pregeneratedCertificates));
        pSampleConfiguration->pregeneratedCertificates = NULL;
    }
    printf("free 26");
    MEMFREE(*ppSampleConfiguration);
    *ppSampleConfiguration = NULL;

CleanUp:
    printf("free 27");
    LEAVES();
    return retStatus;
}

The main thing is to Cancel timers before anything.

@August88618
Copy link
Author

Let me confirm it,Is this problem not located at the root cause? Do all debugging prints need to be merged? And why did you block “deinitKvsWebRtc()”?

@hassanctech hassanctech linked a pull request Sep 23, 2021 that will close this issue
@hassanctech hassanctech removed a link to a pull request Sep 23, 2021
@disa6302 disa6302 reopened this Sep 23, 2021
@August88618
Copy link
Author

The problem was not solved, and it reappeared after the changes were merged.Log is the same as above,only begin but not exit.

@disa6302
Copy link
Contributor

@August88618 ,

I misunderstood the issue to be linked
Reopened, so this issue is still being tracked

@hassanctech hassanctech mentioned this issue Sep 24, 2021
@hassanctech
Copy link
Contributor

Sorry it appears there was some confusion, I'm linking PRs to the issue so when the correct one is merged I will take care of closing the issue.

@August88618 can you confirm if you were able to try the change from @ilyaikik and if it worked for you? I have a PR right now with those changes but I kept deinitKvsWebRtc(); as its needed when there's a data channel to clean up libsrtp resources.

@hassanctech hassanctech self-assigned this Sep 24, 2021
@August88618
Copy link
Author

Do you mean merge all Fix bugs #1279 changes to our branches?

@ilyaikik
Copy link

Let me confirm it,Is this problem not located at the root cause? Do all debugging prints need to be merged? And why did you block “deinitKvsWebRtc()”?

Didn't understand your first question, sorry.
No, no debug messages needs to be merged (sorry for that).
deinitKvsWebRtc() is blocked because I modified the code a little bit and call this function in other place. So it do not need to be blocked here.

Actually I did not expect that this code will be PRed as it is. It's better to check funtionality)
But in my case this changes except extra debug messages and blocked deinitKvsWebRtc() solve the problem of stuck in timerQueueCancelTimer.

I could clear the code, check it and prepare it for PR if needed.

@hassanctech
Copy link
Contributor

@August88618 If you can pull the latest from the develop branch and test it should be fixe there. @ilyaikik feel free to try it as well this should be fixed there as well.

@hassanctech
Copy link
Contributor

Closing because this should be resolved on the latest develop branch.

@August88618
Copy link
Author

The problem was not solved, and it reappeared after the changes were merged.Log is the same as above,only begin but not exit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants