Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Fixed proper reporting of sending blocked state. Fixed API value for connect-non-blocking #1698

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

ethouris
Copy link
Collaborator

@ethouris ethouris commented Dec 7, 2020

  1. Collecting information about blocked sending (with none succeeded)
  2. In non-blocking mode, this should result in reporting AGAIN error
  3. In blocking mode, it should block until at least one socket is ready for sending, and then only report errors if none succeeded.

Additionally: the value returned in case of blocking-mode srt_connect_group it should return the socket ID of the first successful socket, while in case of non-blocking mode it always returns 0.

Findings (update):

NOT updating, turning to draft.

@ethouris ethouris changed the title [core] Fixed proper reporting of sending blocked state. Fixed API val…ue for connect-non-blocking [core] Fixed proper reporting of sending blocked state. Fixed API value for connect-non-blocking Dec 7, 2020
@ethouris ethouris marked this pull request as draft December 7, 2020 14:26
… exit checks in tsbpd. Suppressed IPE log when unsubscribing
srtcore/core.cpp Outdated
}

HLOGC(tslog.Debug, log << self->CONID() << "tsbpd: WAKE UP!!!");
}
ExitLoops:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new goto statements, please.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to exit two loops, and additionally maintain the thread checker.

srtcore/core.cpp Outdated
Comment on lines 5784 to 5796
bool signaled = false;
while (!signaled)
{
// For safety reasons, do wakeup once per 1s and re-check the flag.
THREAD_PAUSED();
signaled = tsbpd_cc.wait_for(seconds_from(1));
THREAD_RESUMED();
if (self->m_bClosing)
{
HLOGC(tslog.Debug, log << "tsbpd: IPE? Closing flag set in the meantime of waiting. Exiting");
goto ExitLoops;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just let TSBPD to re-check the receiver buffer even in case of a spurious wake-up?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yes, might be a better idea. OTOH this 1s delay is too much if that deadlock would have happened when closing a socket in the group receiver.

@ethouris ethouris marked this pull request as ready for review December 8, 2020 16:43
srtcore/group.cpp Outdated Show resolved Hide resolved
srtcore/group.cpp Outdated Show resolved Hide resolved
Comment on lines 3402 to 3412
int swait_timeout = 0;

// There's also a hidden condition here that is the upper if condition.
is_pending_blocked = (nsuccessful == 0);

// If this is the case when
if (m_bSynSending && is_pending_blocked)
{
HLOGC(gslog.Debug, log << "grp/sendBroadcast: will block for " << m_iSndTimeOut << " - waiting for any writable in blocking mode");
swait_timeout = m_iSndTimeOut;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code duplication: the same was used in send_broadcast... Consider some common funtion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will. There's a PR for it, but we need to be clear off with all potential problems to solve first. There are marked places that should make this process easy enough. Note however that the things happen differently in broadcast and backup, which means that even conditions for getting sockets blocked are not exactly the same.

srtcore/core.cpp Outdated
Comment on lines 11234 to 11237
void CUDT::updateBrokenConnection()
{
HLOGC(smlog.Debug, log << "updateBrokenConnection: setting closing=true and taking out epoll events");
m_bClosing = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fix require other changes introduced in the PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simply a completion for all cases when releaseSync is called, which is connected to the fact that releaseSync is about to make TSBPD thread exit, and it can only exit if m_bClosing == true.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does only this change make sense without the rest changes in this PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest of the changes are formal fixes for real problems, others - including this one - are "sanity fixes". This flag must be set true when the TSBPD thread is expected to exit otherwise it risks a deadlock. Relying on that a flag is set appropriately before exitting isn't the best approach, but it's simply derived from UDT and it's a consequence of how the TSBPD thread is expected to work. Changes here were due to an in-between lifted lock on m_RecvLock, which was newly introduced with the groups.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this fix to a separate PR, so it is not blocked by the review of other changes introduced here.

ethouris and others added 2 commits December 8, 2020 18:06
Co-authored-by: Maxim Sharabayko <maxlovic@gmail.com>
@codecov

This comment has been minimized.

@maxsharabayko maxsharabayko added [core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior labels Dec 10, 2020
@maxsharabayko maxsharabayko added this to the v1.4.3 milestone Dec 10, 2020
srtcore/core.cpp Outdated
@@ -7819,15 +7854,20 @@ void CUDT::destroySynch()
void CUDT::releaseSynch()
{
SRT_ASSERT(m_bClosing);
if (!m_bClosing)
{
HLOGC(smlog.Debug, log << "releaseSynch: IPE: m_bClosing not set to false, TSBPD might hangup!");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might rather be an error message: LOGC(smlog.Error, ...).

Comment on lines +1691 to +1700
// Return value rules:
// In non-blocking mode:
// - return Socket ID, if:
// - you requested only one connection in this call
// In blocking mode:
// - return Socket ID, if:
// - you requested only one connection in this call
// - you connect a group that was not connected yet
// - otherwise return 0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"if you requested only one connection in this call"
Not clear. I guess the meaning is that there is only one target passed to this group connect function (arraysize).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that it's so if you do call srt_connect or if you do srt_connect_group with an array of size 1.

Comment on lines +1703 to +1704
if (arraysize > 1)
retval = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that arraysize is 400 lines above from this line, it would make sense to make it const, so that no one mistakenly changes it in between.

int CUDTUnited::groupConnect(CUDTGroup* pg, SRT_SOCKGROUPCONFIG* targets, const int arraysize

Comment on lines +5793 to +5802
{
// For safety reasons, do wakeup once per 1/8s and re-check the flag.
// This should be enough long time that during a normal transmission
// the TSBPD thread would be woken up much earlier when required by
// ACK per ACK timer (at most 10ms since the last check) and in case
// when this might result in a deadlock, it would only hold up to 125ms,
// which should be little harmful for the application. NOTE THAT THIS
// IS A SANITY CHECK FOR A SITUATION THAT SHALL NEVER HAPPEN.
THREAD_PAUSED();
signaled = tsbpd_cc.wait_for(milliseconds_from(125));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will hide possible synchronization issues, and it would much harder to track them.
Also might introduce a potential minor performance degradation.

CUDT::releaseSynch() notifies m_RcvTsbPdCond used in this tsbpd_cc. If it happens to be blocked forever, then there is another issue to fix.

Copy link
Collaborator Author

@ethouris ethouris Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's only a sanity check, and I doubt the performance degradation here, although might be that 250ms would be a less likely watchdog to bite, while applications also usually have their own independent watchdogs. Initially I planned 1s here, but that would be way too much. I understand that it might make errors kinda hidden, but OTOH a deadlock here would make a scrap in video transmission (so it won't go "invisible", just it won't make a total disaster).

Note also that releaseSynch won't be called another time, that's true, but TSBPD should simply exit on seeing m_bClosing flag anyway.

@maxsharabayko maxsharabayko modified the milestones: v1.4.3, v1.4.4 Apr 13, 2021
@maxsharabayko maxsharabayko modified the milestones: v1.4.4, Backlog Aug 18, 2021
@ethouris ethouris marked this pull request as draft September 24, 2024 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants