Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Broadcast sending logic does not handle the lack of SND buffer space correctly. #2959

Closed
maxsharabayko opened this issue Jun 5, 2024 · 2 comments · Fixed by #2966
Closed
Labels
[core] Area: Changes in SRT library core Priority: Critical Type: Bug Indicates an unexpected problem or unintended behavior
Milestone

Comments

@maxsharabayko
Copy link
Collaborator

Assuming a broadcast connection with two member links A and B.
At some point sending link A may not have enough space to buffer a payload for sending, while the other member B does.
Then the next time the next payload is submitted to link A it is assigned the wrong sequential number of that payload that failed to get a space in the SND buffer.
Then there is a mess, that can even be observed in Wireshark: packets with the same seqno coming from different members have different payload.

Steps to Reproduce

The issue can be easily reproduced on a localhost.
SRT latency: 6s.
Bitrate 50 Mbps.
Default RCV and SND buffer sizes, that are not enough for this streaming configuration.

srt-xtransmit receive srt://:4200?groupconnect=1 -v --enable-metrics

srt-xtransmit generate "srt://127.0.0.1:4200?grouptype=broadcast&latency=6000" srt://127.0.0.1:4200 --sendrate 50Mbps -v --enable-metrics --duration 10s

The following can be observed in the standard output:

11:15:07.812449 [W] [METRICS] Detected loss of 54 packets (seqno [20612; 20666))
11:15:07.812673 [W] [METRICS] Detected reordered packet, seqno 20651, expected 20667 (dist 16)
11:15:07.812862 [W] [METRICS] Detected reordered packet, seqno 20652, expected 20667 (dist 15)
11:15:07.813048 [W] [METRICS] Detected reordered packet, seqno 20653, expected 20667 (dist 14)

Reordered packet must never come out of an SRT socket!

Details

The issue is on the sender side. For example, here is a log from the receiver. There is no sequence discontinuity neither in pktseqno (%pktseqno), nor in msgno (#msgno), but the first four bytes of the payload do not increase sequentially.
First, there is a gap 0x5083 -> 0x50BA on the same member link. Then the following payload is read from another member, and it is not out of order: 0x50BA -> 0x50AB.

11:15:07.812000/T43264.N:SRT.gr: grp/recv: $2145967382: @1072225559: Extracted data with %970690259 #18541: 0x5083
11:15:07.812000/T43264.N:SRT.gr: grp/recv: $2145967382: @1072225559: Extracted data with %970690260 #18542: 0x50BA
11:15:07.812000/T43264.N:SRT.gr: grp/recv: $2145967382: @1072225557: Extracted data with %970690261 #18556: 0x50AB
11:15:07.812000/T43264.N:SRT.gr: grp/recv: $2145967382: @1072225557: Extracted data with %970690262 #18557: 0x50AC

SRT v1.5.3 and likely prior versions.

@maxsharabayko maxsharabayko added Priority: Critical Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core labels Jun 5, 2024
@maxsharabayko maxsharabayko added this to the v1.5.4 milestone Jun 5, 2024
@maxsharabayko
Copy link
Collaborator Author

The way to resolve the bug, given each member has its own SND buffer, is to break the connection if it does not have enough space. And report an error. The application can decide to reestablish a connection with a new SND buffer that can accept further packets. It is not the cleanest way, but at least the out-of-order packets bug could be resolved.

Suggested srt_sendmsg2 behavior in case there is not space in the SND buffer of one or several member connection:

  • One link in the group - SRT_EASYNCSND
  • Several links in a group - all don't have space - SRT_EASYNCSND
  • Several links in a group - at least one has space - break those that don't have space.

The SND buffer size configuration is the same for all members of a group. So maybe all the links would end up being broken except the last one that will be returning SRT_EASYNCSND.

@ethouris
Copy link
Collaborator

I think what you described as cases, 1 and 2 are the same. We still have two cases then: ALL or SOME. If SOME, proceed, but just kill failed ones; if ALL (even if it's only one), report async error, or wait for ability in blocking mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Priority: Critical Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants