SRT Protocol Technical Overview #479

stevomatthews · 2018-10-17T20:31:28Z

Over the past several months, some of the original SRT developers have been working on documenting the protocol as we work towards bringing it to the point where an RFC can be submitted. You will find attached an initial draft release of the SRT Protocol Technical Overview, which we invite you to review. All comments will be addressed in subsequent releases.

SRT_Protocol_TechnicalOverview_DRAFT_2018-10-17.pdf

maxsharabayko · 2018-10-18T10:15:02Z

The Drop Request packet structure is missing in the spec.

ethouris · 2018-10-18T13:19:33Z

Findings

Page 5:

" * Periodic NAK reports (UDT4 disabled this draft feature in favor of timeout retransmission of unACKed packets, which consumes too much bandwidth for real-time streaming applications)."

(... plus the continued description on page 6, also details referred to on Page 22).

This is somewhat more complicated, and I'm not sure how it can be explained briefly.

The LOSSREPORT sent immediately after detecting a loss only increases the
probability of recovering packets quickly. It may still fail, however, if the
report itself is lost or the retransmitted packets are lost again.

This situation is an "ultimate retransmission failure" and it needs a fallback
in the form of one of these:

- Ultimate retransmission. The retransmission process must be repeated stubbornly
  until it finally succeeds.
- Conditional dropping. There are still lost packets, but recovering them is given
  up and the receiver agrees to continue delivery with loss.

As UDT was designed for file transmission, it could not allow conditional dropping,
so it had to use Ultimate retransmission. UDT did this by repeated
Blind retransmission using the Late retransmission method.

Blind retransmission means sending everything since the last ACK again. It is
"blind" because the sender does not know whether any of the packets in this range
were received. It is certain of the fate of packets only up to the last ACK,
so it has to assume the worst case, that all of them are lost. This
retransmission is "stubbornly" repeated in a cycle until the ACK finally moves
forward.

The method named Late retransmission means performing Blind retransmission
only when the loss list is empty and the maximum flight window is reached.
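
The two definitions above can be sketched roughly as follows. This is only an illustration, not the UDT code: the function names are hypothetical, `last_ack` is assumed to be the first sequence number not yet acknowledged, and sequence number wraparound is ignored.

```python
def blind_retransmit_range(last_ack, next_seq):
    """Sequence numbers the sender has to assume lost: everything not yet ACKed.

    last_ack: first sequence number not yet acknowledged (assumption).
    next_seq: sequence number that the next new packet would get.
    Sequence number wraparound is ignored to keep the sketch short.
    """
    return list(range(last_ack, next_seq))


def should_blind_retransmit(loss_list, in_flight, max_flight_window):
    """Late retransmission: act only when nothing is reported as lost
    and the flight window is full."""
    return not loss_list and in_flight >= max_flight_window
```

With `last_ack = 10` and `next_seq = 14`, the sender would resend packets 10 through 13 without knowing which of them actually arrived.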

SRT added the following new mechanisms:

- "Periodic NAKREPORT": the receiver sends a LOSSREPORT with all lost
  packets seen so far once the time in which the retransmission was expected has
  passed. Being "periodic", this step is repeated until the ACK finally moves
  forward as expected.
- "Too-late packet drop", that is, conditional dropping, which happens when the
  latency tolerance for recovering previously lost packets has passed. Without this,
  any "ultimate retransmission failure" handler would have to be repeated indefinitely
  until it succeeds, at the expense of stopping the transmission until then. This
  simply causes packets to be artificially ACK-ed, thereby faking a successful
  retransmission on the receiver side.
- "FASTREXMIT". This is a faster replacement for Late retransmission, that
  is, it performs Blind retransmission repeatedly just on the premise that the ACK
  did not move forward within the time in which it was expected to. This feature
  is turned off when "Periodic NAKREPORT" is on, as the latter is deemed
  efficient enough at retransmission and places less burden on the link
  bandwidth.

I think in short it might be described as:

"Periodic NAKREPORT", which increases the probability of a successful retransmission
by sending the loss report repeatedly in case the previous attempt failed because
of another loss. Together with conditional dropping of too-late
packets, it made blind retransmission unnecessary. Blind retransmission, that is,
retransmitting all unACK-ed packets, was the only fallback mechanism for a failed
retransmission in UDT.

Exchange of keying material and decryption status using a UDT user-defined control packet (sender knows if a receiver can properly decrypt stream)

In the latest SRT with HSv5, this is exchanged in the handshake. Also, in HSv4
the sender is completely unaware that the receiver cannot decrypt the stream
when the passphrase is wrong but has the correct length.

I'd say that it can be exchanged either by extended control messages or in the handshake.

Page 8:

The picture of the control packet uses the word "Reserved" for a part that is
later referred to as "Extended type".

Page 9:

It could at least be mentioned that the "TIMESTAMP" and "Destination Socket ID"
fields are used here as well, with essentially the same interpretation as described
for data packets. I think it's worth mentioning that the
destination socket ID may have the value 0 only in a handshake
control packet, and never in any other packet, especially not in data packets.

Odd symbol names are used in the table: SRT_HSREQ etc. In the
code these names are rather SRT_CMD_HSREQ. I think it would be even better
to have two separate tables: one that lists the values of Type with
Extended type 0, and one for Type = 0x7FFF with the given values of Extended Type.

It is also odd that some names are all uppercase, others are capitalized, and
some look like symbols. It would be better to use
one convention: either capitalized descriptive names, or the symbols
used in the code, or the latter first, followed by a description, like:

- UMSG_KEEPALIVE (sent in the absence of other meaningful packets to
  clear the connection watchdog)
- UMSG_LOSSREPORT (NAK, reports missing packets)
- UMSG_DROPREQ (informs the peer that requested packets are no longer
  available)

Page 15:

I think it's worth mentioning that the timestamps are 32-bit numbers holding
a relative time, and may therefore overflow after some time.
The overflow should be handled by whatever consumes the timestamps, currently
TSBPD, by tracking the moment when the time value overflows and recording
the carryover. Currently SRT doesn't use the timestamps
from data packets for any purpose other than TSBPD latency tracking, and
the handshake process is too short-lived to experience a 32-bit time overflow.
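
A minimal sketch of such carryover tracking, for illustration only. The class and method names are made up, and it assumes timestamps arrive in non-decreasing order except at the wrap point (a late packet from before the wrap would be mishandled by this simple version).

```python
TS_WRAP = 1 << 32  # the timestamp field is 32 bits wide


class TimestampUnwrapper:
    """Turns wrapping 32-bit timestamps into a monotonically growing value."""

    def __init__(self):
        self.carryover = 0   # accumulated value of all overflows seen so far
        self.last_ts = None  # last raw 32-bit timestamp observed

    def unwrap(self, ts):
        # A large backward jump is interpreted as an overflow of the 32-bit field.
        if self.last_ts is not None and self.last_ts - ts > TS_WRAP // 2:
            self.carryover += TS_WRAP
        self.last_ts = ts
        return self.carryover + ts
```

Feeding it `0xFFFFFFF0` and then `0x10` yields `0xFFFFFFF0` and `0x100000010`, so the second value is still later than the first.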

Page 16:

It may be interesting to mention that the sender relies on a predefined sending
speed, expressed as the time gap between two consecutive packets sent over
the same UDP socket (snd period). New packets are scheduled for sending as
needed (including packets for retransmission), but they are only picked up
and sent over the socket once this time gap is certain to have
passed; otherwise the sender thread sleeps for the remaining time.
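
The pacing rule can be sketched as a pure function that computes when each packet would actually leave, instead of really sleeping. The function name and the integer time unit are assumptions made for this example.

```python
def schedule_send_times(ready_times, snd_period):
    """Return the time at which each queued packet is actually sent.

    ready_times: times at which packets were scheduled for sending, in order.
    snd_period: minimum gap between two consecutive sends on the same UDP socket.
    """
    send_times = []
    next_allowed = None  # earliest time the next packet may be sent
    for ready in ready_times:
        if next_allowed is None or ready >= next_allowed:
            send_at = ready          # the gap has already passed
        else:
            send_at = next_allowed   # the sender sleeps for the remaining time
        send_times.append(send_at)
        next_allowed = send_at + snd_period
    return send_times
```

Three packets scheduled at time 0 with a period of 250 go out at 0, 250 and 500, while a fourth one scheduled at 1000 is sent right away.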

Page 19:

No, this time of course isn't absolute, but it isn't local either: it's
relative. That is, it counts the time passed since the "time base", where
the time base for the caller is the time when it started the handshake,
and for the listener it's the time when it was contacted by the caller.
The time base on the two peers is therefore probably shifted by half of an
optimistic RTT, but this is treated as negligible. At any later point during
the transmission, the time is calculated as an offset from this base,
taking into account the carryover values recorded when
32-bit time value overflows have been observed.

Page 20:

I think the NAK is described in the wrong order.

When packet #4 arrives but packet #3 has not, what happens is:

- The UMSG_LOSSREPORT message is sent immediately. It contains the
  range from the packet following the last received one to the packet
  preceding the currently received one, which may also be a single sequence
  number if just one packet is missing.
- The loss is recorded in the receiver loss list for the purpose of
  the later "Periodic NAKREPORT". The report is then re-sent periodically
  to increase the probability of a successful retransmission in case
  the retransmission failed (because the report or the retransmitted packet was lost).

There's also a mechanism for delaying the loss report until a configured
number of packets following the loss have been received, in order to avoid sending
the loss report when the apparent loss was caused by packet reordering
(although this mechanism is turned off by default).
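
The immediate loss detection described above can be sketched like this. It is an illustration with hypothetical names; sequence number wraparound, the reorder-tolerance delay and removal of recovered packets from the loss list are all left out.

```python
def on_data_packet(seq, last_received, loss_list):
    """Process an arriving data packet.

    Returns (loss_report, new_last_received), where loss_report is either None
    or an inclusive (first_lost, last_lost) range to send immediately.
    """
    if seq > last_received + 1:
        # Gap detected: from the packet following the last received one
        # up to the packet preceding the one that has just arrived.
        lost = (last_received + 1, seq - 1)
        loss_list.append(lost)  # kept for the later periodic NAK report
        return lost, seq
    return None, max(seq, last_received)
```

For the example in the text (packet #2 was the last one received and packet #4 arrives), the report is `(3, 3)`, a single sequence number.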

Page 34:

I'd describe the multiplexer in more detail.

The mechanism called "Multiplexer" in UDT is actually a facility that is
directly bound to the "Channel" (the wrapper for a UDP socket). What is
important about this facility is that it is a gate between an SRT socket and
the Channel (that is, the UDP socket). On the input side, it allows a packet
received from the Channel to be dispatched to the right SRT socket (as defined
in the "Destination Socket ID" field in the SRT header), or causes a new socket
to be created if it is a connection-requesting handshake packet (socket ID
value 0). On the output side, it takes packets queued by the various SRT
sockets and sends them through the given Channel. Often there's just one
SRT socket associated with a given Channel, and in that situation the
multiplexer is just a useless interceptor. But the multiplexing and dispatching
features are used when more than one SRT socket is associated with one Channel,
that is:

When the socket is an accepted socket out of a listener socket

An SRT listener is simply bound to the UDP port declared as the SRT listener's
binding port. It means, however, that the UDP socket created for that purpose
and bound to this UDP port will be used for every accepted socket as
well, both for receiving data within the connection and for
sending packets through the accepted SRT socket.
Therefore the multiplexer originally created by the listener is shared between
all sockets accepted from this listener.

When more than one "connectable" socket is bound to the same local port

Normally you don't bind a socket on which you want to call srt_connect,
relying instead on a new UDP socket being assigned and the local port being
chosen by the system. But you can also bind the socket manually by
explicitly calling srt_bind, thereby forcing it to use a particular local
port number. If you use the same port for two SRT sockets in one application
(including when you just reuse a port number that was system-generated
for another SRT socket), the second socket will take over the sender queue, multiplexer
and channel from the socket that is already using this port (this feature can
be disabled by unsetting the SRTO_REUSEADDR option), so these facilities will be
shared between the two sockets, just as they are shared between sibling
accepted sockets.

This mechanism is therefore a must for a listener socket, whereas for
"connectable" sockets it's a nice feature that may be used to mitigate some
problems you might have with firewall settings. If we define a pair
of UDP sockets on two hosts, set up with the appropriate port numbers and with
UDP packets being sent in both directions between them, as a
"UDP channel", then you can use one such UDP channel to handle multiple SRT
connections at a time, and the network facilities in the middle are none the
wiser.
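
The input side of the multiplexer can be pictured with a toy dispatcher like the one below. It is not the UDT/SRT code: the class, its methods and the ID allocator are hypothetical, and the multi-step handshake is collapsed into "a packet with socket ID 0 creates a new socket".

```python
class Multiplexer:
    """Toy dispatcher between one Channel (UDP socket) and many SRT sockets."""

    def __init__(self, listening=False):
        self.listening = listening  # True if a listener socket owns this multiplexer
        self.sockets = {}           # socket ID -> packets delivered to that socket
        self._next_id = 1           # hypothetical ID allocator; IDs are never 0

    def add_socket(self):
        socket_id = self._next_id
        self._next_id += 1
        self.sockets[socket_id] = []
        return socket_id

    def dispatch(self, dest_socket_id, packet):
        """Route an incoming packet; return the receiving socket ID, or None if discarded."""
        if dest_socket_id == 0:
            # Connection-requesting handshake: only a listener can accept it.
            if not self.listening:
                return None
            new_id = self.add_socket()
            self.sockets[new_id].append(packet)
            return new_id
        if dest_socket_id not in self.sockets:
            return None  # no SRT socket with this ID on this Channel
        self.sockets[dest_socket_id].append(packet)
        return dest_socket_id
```

All sockets created this way share the same `Multiplexer` instance, which corresponds to accepted sockets sharing the listener's UDP port.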

Page 35:

The Congestion Control Mechanism has been completely reinvented since version
1.3.0. The description on this page may be misleading, as it refers to the CC from
UDT, which no longer exists as of that version. Some more description can be
provided; however, as far as I can see, there are some extended descriptions in
the "handshake.md" document. I think that instead of being described so thoroughly
there, they could be moved to a place with a more general description, as
this doesn't touch upon the handshake itself.

Actually the only thing about the Smoother type that is important for the
handshake is that this should be set exactly the same on both parties,
otherwise the connection will be rejected.

Page 71:

The diagram of the HSv5 handshake packet breakdown contains a field with a pink
background (so, as I understand it, intended to be constant) named
"TsbPd Resv = 0". In HSv5, both of these "TsbPd" fields are in use and both
define TsbPdDelay, just in different directions.

There's an explanation at the end of the page, but defining things this way
is wrong in this diagram, because this handshake structure with
extensions is used exclusively in HSv5, and in HSv5 the meaning of these
fields is always the same. It is different in HSv4, but then HSv4
isn't done through handshake extensions but with separate messages,
for which the breakdown description is given separately.

Page 76:

The table, like an earlier one, contains "4a17h", which is the magic
code designated as SrtHsRequest::SRT_MAGIC_CODE, with the value sometimes written
as 0x4A17. I think it would be better to make the notation consistent.

I don't know if this is important, but it might be mentioned that this value
was initially 32-bit, 0x4A171510, and the intent was to mimic
the word HAIVISIO(n); however, the need to introduce the PBKEYLEN
advertisement forced it to be cut to the first 16 bits, hence 0x4A17.
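
As a quick check of the arithmetic, keeping the first (upper) 16 bits of the original 32-bit value gives the current code. The Python constant names below are only for this example.

```python
FULL_MAGIC = 0x4A171510            # original 32-bit value
SRT_MAGIC_CODE = FULL_MAGIC >> 16  # keep only the first (upper) 16 bits

# The same number in the two notations that appear in the document.
as_c_literal = "0x%04X" % SRT_MAGIC_CODE
as_h_suffix = "%04xh" % SRT_MAGIC_CODE
```

Here `as_c_literal` is `0x4A17` and `as_h_suffix` is `4a17h`, so both spellings denote the same value.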

maxsharabayko · 2018-10-18T14:40:08Z

The TLPKTDROP abbreviation is not explained.
Suggested:

Too-late Packet Drop. When enabled on the receiver, it skips
missing packets that have not been delivered in time and
delivers the following packets to the application when
their time-to-play has come. It also sends a fake ACK to
the sender. When enabled on the sender and enabled on the
receiving peer, the sender drops the older packets that
have no chance of being delivered in time. It is
automatically enabled on the sender if the receiver supports it.
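
The receiver-side behaviour could be sketched like this. It is only an illustration under simplifying assumptions: the function and its arguments are made up, the buffer is a plain dict holding only sequence numbers not yet delivered, and sending the fake ACK is reduced to returning the dropped sequence numbers.

```python
def deliver_ready(buffer, next_seq, now):
    """Deliver packets whose time-to-play has come, skipping missing ones.

    buffer: dict mapping sequence number -> (play_time, payload).
    next_seq: next sequence number the application expects.
    Returns (delivered payloads, new next_seq, dropped sequence numbers).
    """
    delivered, dropped = [], []
    while buffer:
        if next_seq in buffer:
            play_time, payload = buffer[next_seq]
            if play_time > now:
                break  # not yet time to play this packet
            delivered.append(payload)
            del buffer[next_seq]
            next_seq += 1
        else:
            first = min(buffer)  # earliest packet that did arrive
            if buffer[first][0] > now:
                break  # the missing packets still have time to be recovered
            # Too late: give up on the gap (these would be fake-ACKed).
            dropped.extend(range(next_seq, first))
            next_seq = first
    return delivered, next_seq, dropped
```

With packets 1, 3 and 4 buffered for play times 10, 30 and 40, a call at time 35 delivers packets 1 and 3, drops packet 2 and leaves packet 4 waiting.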

stoyanovgeorge · 2019-02-20T12:50:39Z

@maxlovic, can you also add some information about the srt-multiplex usage? Currently, there is only some scant information in issues from people experimenting with it. I wasn't able to find anything else in the official documentation.

stevomatthews · 2019-03-12T19:20:38Z

@stoyanovgeorge Your request has been addressed in PR-602.

maxsharabayko · 2019-04-04T15:01:00Z

The StreamID field in Handshake v5 contains a text field of variable length, a multiple of 4 bytes. Each group of 4 bytes is stored as BIG_ENDIAN. E.g. when `StreamID="HISTORY"`, the field will have the value "TSIH YRO". Might be worth mentioning in the Tech. Spec. This reordering is in fact done for the whole control packet, so every textual field would be affected.
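
The reordering can be reproduced with a short sketch. The function names are hypothetical, and it assumes the text is padded with zero bytes to a multiple of 4 and that the blank in "TSIH YRO" stands for that padding byte.

```python
def encode_stream_id(stream_id):
    """Pad to a multiple of 4 bytes, then reverse the bytes of each 4-byte group."""
    data = stream_id.encode("utf-8")
    data += b"\x00" * (-len(data) % 4)  # zero padding (assumption)
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


def decode_stream_id(field):
    """Undo the per-group byte reversal and strip the padding."""
    data = b"".join(field[i:i + 4][::-1] for i in range(0, len(field), 4))
    return data.rstrip(b"\x00").decode("utf-8")
```

For "HISTORY" the encoded field is the bytes `TSIH`, a zero byte, then `YRO`.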

ethouris · 2019-05-07T06:43:30Z

Now that the big-endian fix has been added, we can fix it in the protocol description.


maxsharabayko added the [docs] Area: Improvements or additions to documentation label Feb 11, 2019

stevomatthews added a commit to stevomatthews/srt that referenced this issue Jul 4, 2019

Changed link

475d171

Added direct link to PDF instead of to Issue Haivision#479

ethouris added the Status: Revision Needed label Aug 13, 2019

ethouris added this to the v1.4.0 milestone Aug 13, 2019

ethouris mentioned this issue Sep 2, 2019

Updated doc with proper description #845

Merged

rndi closed this as completed in #845 Sep 13, 2019
