Transport Protocol Specification
Aeron attempts to address the following:
- high-throughput and low-latency communication for unicast and multicast
- reliable multicast operation for modest receiver set size (< 100 receivers)
- multiple transmission media support (UDP, InfiniBand, Shared Memory, etc.)
- multiple streams that can provide different QoS
- effective flow and congestion control for multicast and unicast
- receiver application paced flow control on a per stream basis
- easy monitoring of buffering on a per stream basis
- flow control tied to message processing (as opposed to just delivery for processing later)
Aeron is influenced by several other modern protocols, including, but not limited to, SPDY, HTTP/2, IPv6, and WebSocket.
This document covers the protocol specification for Aeron transport. This is separate from the archive and cluster control protocols which operate at higher layers.
- Transmission Media: Generic term used to indicate the media over which the protocol runs. Can be UDP, InfiniBand, shared memory, etc.
- Media Driver: Driver for reading/writing from/to transmission media for Aeron.
- Publisher: The client application which produces messages by offering them into a Publication.
- Publication: The source of a message stream identified by channel, stream id, and session id.
- Publication Image: Rebuilt image of a Publication by a Receiver and associated with a Subscription.
- Subscriber: The client application which consumes messages from Publication Images captured by a Subscription.
- Sender: The media driver endpoint which sends the messages produced by the client publisher.
- Receiver: The media driver endpoint which receives messages sent by the Sender.
- Driver Subscription: The media driver in charge of message receipt. These messages are passed on to client Subscriber applications.
- Session: A unique association that identifies a single Publication as source and all Subscriptions for that Publication which can span many receivers in the case of Multicast or Multi-Destination-Cast.
- Session ID: A unique identifier for a Session from a given source.
- Channel: A transmission media needs to have a means of identifying a flow of data and the addressing model of the media. For Aeron, this is called a Channel. For different transmission media, the channel is defined differently. In general, a URI is used for specifying a channel.
- Physical Source: Source of a Session.
- Physical Receiver: Receiver of a Session.
- Stream: Is an ordered stream of messages scoped within a channel that can have multiple instances over time identified by session id, stream id, and channel.
- Stream ID: A unique identifier for a Stream multiplexed within a channel. The value of 0 is reserved.
- Term: A section of data within a Stream. Each Term is associated with a Media Driver send and receive log buffer. The length of a Term must be a power of two and must be the same length on both ends.
- Term ID: A unique identifier for a Term within a Stream. It starts at a random value, must increase monotonically, and can wrap around. It cannot go back to a wrapped value, so the maximum position is the term length times the range of term ids.
- Term Offset: Identifier of a single byte within the Term. Always starts at 0. This is the index of the byte within a given term starting from the beginning.
- Frame: The unit of data for Aeron. Measured in bytes. The transmission media may include multiple Frames into a single packet of data for batching.
- Message: The unit of data for the application, also known as an APDU (Application Data Unit). A single Message may be fragmented over multiple Frames. Alternatively, a single Message may fit into a single Frame. A Message (all of its Fragments) must fit into a single Term.
- Fragment: The unit of data for a fragmented Message that fits into a single Frame.
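The Term ID and Term Offset definitions above imply an absolute byte position within a Stream. The following is a minimal sketch of that arithmetic, assuming the position is counted from the initial Term ID and that Term IDs are 32-bit values that may wrap. The function name `compute_position` is hypothetical and not taken from the specification.

```python
def compute_position(term_id: int, term_offset: int,
                     initial_term_id: int, term_length: int) -> int:
    """Absolute byte position of a term offset within a stream (illustrative)."""
    if term_length <= 0 or term_length & (term_length - 1) != 0:
        raise ValueError("term length must be a positive power of two")
    if not 0 <= term_offset < term_length:
        raise ValueError("term offset must lie within the term")
    # Assumption: Term IDs are 32-bit and may wrap, so count terms modulo 2**32.
    term_count = (term_id - initial_term_id) & 0xFFFFFFFF
    return term_count * term_length + term_offset


# Two full terms of 64 KiB plus 64 bytes into the third term.
print(compute_position(term_id=7, term_offset=64,
                       initial_term_id=5, term_length=65536))
```

Masking the difference keeps the term count correct when the Term ID has wrapped past the initial value.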
The Aeron protocol is designed to be run directly over many different types of transmission media, including shared memory/IPC, InfiniBand/RDMA, UDP, TCP, Raw IP, HTTP, WebSocket, BLE, etc. This means that the following assumptions are made:
- Transmission Media may be a stream media, such as TCP or RDMA without inherent frame boundaries.
- Transmission Media may have only unicast modes of operation.
- Byte ordering of fields of length 16-bits and larger use Little Endian. This is for pure efficiency on performance sensitive platforms. Sub-byte ordering is not a concern as the byte is treated as the atomic unit.
Aeron is a transport protocol and may operate over unreliable media to provide a reliable connection-oriented stream as an OSI Transport Layer protocol. For this reason, some additional assumptions must be embraced, which Aeron will detect and correct, such as:
- Duplication of packets may occur
- Packets may be lost
- Packets may arrive out of order
Aeron assumes some operational conditions, such as:
- Low number of Streams. As each Term is a buffer for the Media Driver, and a Stream has a number of terms, the number of active Streams is assumed to be less than 1000 or even less than 100.
Aeron is designed to work hand-in-hand with the underlying concurrent data structures it is founded upon. In this regard, the header layout has a dual purpose in that it is also the data structure framing layout. This leads to a symbiotic relationship between the data structure and the base protocol operation, framing, etc.
Session IDs and initial Term IDs need to be generated in a pseudo-random manner. Term IDs will progress monotonically after generation, but need to start out randomly. Applications may set Session IDs to a specific value. However, it is up to the application to use a temporally unique Session ID in such cases.
Sender:    +SETUP, *[DATA | DATA-PAD | HEARTBEAT | RTTM | RTTM-R], *[DATA | DATA-PAD | HEARTBEAT-EOS]
Receiver:  +[STATUS | STATUS-SETUP], *[STATUS | NAK | RTTM | RTTM-R]

Connect
=======
Sender:    +SETUP                          +SETUP
                 \                        /      \
Receiver:         +[STATUS | STATUS-SETUP]        +STATUS

Flow Control
============
Sender:            *DATA
                  /
Receiver:  +STATUS

Loss Recovery
=============
Sender:    +[DATA | DATA-PAD | HEARTBEAT]     +[DATA | DATA-PAD]
                                         \   /
Receiver:                                 NAK

RTT Measurement
===============
Sender:    RTTM-R               RTTM
                 \             /
Receiver:         RTTM | RTTM-R
Aeron Frames begin with a header. The specifics of the header change based on operational type, but the general layout is given below.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|                        Frame Length                         |
+---------------+---------------+-------------------------------+
|   Version     |     Flags     |             Type              |
+---------------+---------------+-------------------------------+
|                       Depends on Type                       ...
- Frame Length: (31 = max 2147483647 bytes) Length of the Frame, including the header.
- Version: (8) Current version is 0.
- Flags: (8) Depend on Type.
- Type: (16) Indicates the type of header and any format after Frame Length.
Type | Value | Description |
---|---|---|
HDR_TYPE_PAD | 0x0000 | PAD: Padding Frame. |
HDR_TYPE_DATA | 0x0001 | DATA: Data Fragment. |
HDR_TYPE_NAK | 0x0002 | NAK: Request retransmission. |
HDR_TYPE_SM | 0x0003 | SM: Feedback from subscription on window and buffer status. |
HDR_TYPE_ERR | 0x0004 | ERROR: Error. |
HDR_TYPE_SETUP | 0x0005 | SETUP: Setup. |
HDR_TYPE_RTTM | 0x0006 | RTTM: Round Trip Time Measurement. |
HDR_TYPE_RES | 0x0007 | RES: Resolution. |
HDR_TYPE_EXT | 0xFFFF | Extension Header: Used to extend more options as well as extensions (TBD). |
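The common header and type table above can be decoded with a few lines of code. This is a minimal sketch using Python's `struct` module with a little-endian layout, as the byte-ordering assumption earlier in the document requires. Treating the R bit as the sign bit of a signed 32-bit frame length is an assumption, and the names `COMMON_HEADER`, `HEADER_TYPE_NAMES`, and `parse_common_header` are hypothetical.

```python
import struct

# Little-endian: 32-bit frame length, 8-bit version, 8-bit flags, 16-bit type.
COMMON_HEADER = struct.Struct("<iBBH")

HEADER_TYPE_NAMES = {
    0x0000: "PAD", 0x0001: "DATA", 0x0002: "NAK", 0x0003: "SM",
    0x0004: "ERR", 0x0005: "SETUP", 0x0006: "RTTM", 0x0007: "RES",
    0xFFFF: "EXT",
}


def parse_common_header(buffer: bytes) -> dict:
    """Decode the first 8 bytes of a frame into its common header fields."""
    frame_length, version, flags, frame_type = COMMON_HEADER.unpack_from(buffer, 0)
    # Assumption: the reserved R bit is the sign bit, so the length is never negative.
    if frame_length < 0:
        raise ValueError("frame length must be non-negative")
    return {
        "frame_length": frame_length,
        "version": version,
        "flags": flags,
        "type": frame_type,
        "type_name": HEADER_TYPE_NAMES.get(frame_type, "UNKNOWN"),
    }


# A DATA frame header announcing 51 bytes with flags 0xC0.
header = COMMON_HEADER.pack(51, 0, 0xC0, 0x0001)
print(parse_common_header(header))
```

A receiver would typically dispatch on the decoded type before interpreting the type-specific fields that follow.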
Data flow in Aeron is uni-directional. Bi-directional communication is accomplished by establishing sessions in both directions separately.
The stream setup sequence varies by unicast vs. multicast transmission media and is intimately tied to Status Messages (SM).
- For unicast:
  - Receiver listens on a given unicast channel.
  - Sender sends a SETUP Frame to the Receiver and waits for an SM to be sent back. If there is no response, the Sender can retransmit the SETUP Frame until it gets an SM back.
  - Receiver sees a SETUP Frame and sends a unicast SM directly back to the Sender with initial buffer status and reception window.
  - Sender sees an SM and can commence streaming data while honouring the reception window.
  - NOTES: NAKs and SMs are always sent unicast back to Publications.
- For multicast, there is an endpoint for data and an endpoint for control (NAKs and SMs). See UDP Multicast Mode of Operation for how these endpoints map onto UDP multicast addresses.
  - Receivers listen on a given multicast endpoint for data and MAY listen on another for control.
  - Senders listen on a given multicast endpoint for control and send periodic DATA Frames to the data endpoint. A Sender MUST NOT send DATA Frames until it receives an SM from at least one Receiver for the Stream ID.
  - Receivers that see DATA Frames for Stream IDs of interest to them send back Status Messages on the control endpoint with the SETUP flag set to elicit a SETUP Frame from the sender. Receivers MUST NOT send an SM with the SETUP flag set more than a few times a second.
  - Senders that receive an SM with the SETUP flag set should respond by sending a SETUP Frame to the control endpoint.
  - Receiver sees a SETUP Frame and sends an SM via unicast directly back to the Sender with initial buffer status and reception window.
- Notes:
  - Multicast setup is in general the same as unicast setup aside from also supporting joining existing data streams.
  - NAKs and SMs are always sent to the control endpoint.
  - Receivers may listen to the control frames and suppress NAK generation. But this is an implementation choice.
  - Senders never need to listen to the data endpoint.
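The unicast setup steps above amount to a small state machine on the Sender side: keep retransmitting SETUP until an SM arrives, then start streaming. The following sketch illustrates that loop under stated assumptions. The class name, method names, and the retransmit interval parameter are hypothetical, and timing is passed in explicitly rather than read from a clock.

```python
class UnicastSenderSetup:
    """Tracks whether a Sender may start streaming (illustrative only)."""

    def __init__(self, retransmit_interval_ns: int):
        self.retransmit_interval_ns = retransmit_interval_ns
        self.connected = False
        self.receiver_window = 0
        self.last_setup_ns = None

    def poll(self, now_ns: int):
        """Return "SETUP" when a SETUP Frame should be (re)sent, else None."""
        if self.connected:
            return None
        due = (self.last_setup_ns is None
               or now_ns - self.last_setup_ns >= self.retransmit_interval_ns)
        if due:
            self.last_setup_ns = now_ns
            return "SETUP"
        return None

    def on_status_message(self, receiver_window: int) -> None:
        """An SM from the Receiver completes setup; data may now flow."""
        self.connected = True
        self.receiver_window = receiver_window


sender = UnicastSenderSetup(retransmit_interval_ns=100)
print(sender.poll(0))      # first SETUP is sent immediately
print(sender.poll(50))     # too early to retransmit
sender.on_status_message(receiver_window=4096)
print(sender.connected)
```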
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|               Frame Length (= header length)                |
+---------------+---------------+-------------------------------+
|   Version     |     Flags     |         Type (=0x05)          |
+---------------+---------------+-------------------------------+
|R|                         Term Offset                         |
+---------------------------------------------------------------+
|                          Session ID                           |
+---------------------------------------------------------------+
|                           Stream ID                           |
+---------------------------------------------------------------+
|                        Initial Term ID                        |
+---------------------------------------------------------------+
|                        Active Term ID                         |
+---------------------------------------------------------------+
|                          Term Length                          |
+---------------------------------------------------------------+
|                              MTU                              |
+---------------------------------------------------------------+
|                              TTL                              |
+---------------------------------------------------------------+
- Frame Length: (32) Value is length of SETUP Frame Header.
- Version: (8) Current version is 0.
- Flags: (8) Depend on Type.
- Type: (16) HDR_TYPE_SETUP
- Term Offset: (31) Offset of the first byte of the stream to start reception on.
- Session ID: (32) Session ID.
- Stream ID: (32) Stream ID.
- Initial Term ID: (32) Term ID of first Term within the Stream that has been sent.
- Active Term ID: (32) Term ID of latest Term within the Stream that has been sent.
- Term Length: (32) Length of Term. Must be positive power of 2.
- MTU: (32) Sender MTU length in bytes.
- TTL: (32) Sender Multicast TTL.
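The SETUP layout above is ten 32-bit words, so it packs into 40 bytes. Below is a minimal sketch of encoding and decoding it with Python's `struct` module, assuming all 32-bit fields are signed little-endian integers. The function names `encode_setup` and `decode_setup` are hypothetical.

```python
import struct

# Common header (length, version, flags, type) followed by eight 32-bit fields.
SETUP_FRAME = struct.Struct("<iBBH8i")
HDR_TYPE_SETUP = 0x0005

FIELD_NAMES = ("frame_length", "version", "flags", "type", "term_offset",
               "session_id", "stream_id", "initial_term_id", "active_term_id",
               "term_length", "mtu", "ttl")


def encode_setup(term_offset, session_id, stream_id, initial_term_id,
                 active_term_id, term_length, mtu, ttl) -> bytes:
    """Build a SETUP frame; the frame length equals the header length."""
    if term_length <= 0 or term_length & (term_length - 1) != 0:
        raise ValueError("term length must be a positive power of two")
    return SETUP_FRAME.pack(SETUP_FRAME.size, 0, 0, HDR_TYPE_SETUP,
                            term_offset, session_id, stream_id,
                            initial_term_id, active_term_id,
                            term_length, mtu, ttl)


def decode_setup(frame: bytes) -> dict:
    """Decode a SETUP frame into a field-name to value mapping."""
    return dict(zip(FIELD_NAMES, SETUP_FRAME.unpack_from(frame, 0)))


frame = encode_setup(0, 7, 10, 100, 102, 65536, 1408, 0)
print(len(frame), decode_setup(frame))
```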
Aeron uses the Data Frame to hold all data. The data may represent a single APDU or a fragment.
Fragmentation & reassembly information is carried in each fragment via the B & E flags and the term offset. The B bit indicates this fragment begins an application APDU (or Message). The E bit indicates this fragment ends an APDU. For a self-contained APDU, the B and E bits will always be set together.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|                Frame Length (=data + header)                |
+---------------+-+-+-+---------+-------------------------------+
|   Version     |B|E|S|  Flags  |         Type (=0x01)          |
+---------------+-+-+-+---------+-------------------------------+
|R|                         Term Offset                         |
+---------------------------------------------------------------+
|                          Session ID                           |
+---------------------------------------------------------------+
|                           Stream ID                           |
+---------------------------------------------------------------+
|                            Term ID                            |
+---------------------------------------------------------------+
|                        Reserved Value                         |
|                                                               |
+---------------------------------------------------------------+
|                            Data                             ...
...                                                             |
+---------------------------------------------------------------+
- Frame Length: (32) Value is length of data + length of Data Frame Header.
- Version: (8) Current version is 0.
- Flags: (8)
  - (B)egin Message: Fragment begins an APDU.
  - (E)nd Message: Fragment ends an APDU.
  - (S)End of Stream: Fragment ends current stream.
- Type: (16) HDR_TYPE_DATA
- Term Offset: (31) Offset of the first byte of the frame header within the Term.
- Session ID: (32) Session ID.
- Stream ID: (32) Stream that message is for.
- Term ID: (32) Term that message is for within the Stream.
- Reserved Value: (64) Reserved value for application to use.
- Data: (varies) Data Fragment or entire APDU.
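As an illustration of the Data Frame layout, the sketch below packs and unpacks the 32-byte header followed by the payload. It assumes little-endian signed fields and flag bit values of 0x80 for B, 0x40 for E, and 0x20 for S, which is consistent with the heartbeat flag values of 192 and 224 given later in this document. The function names are hypothetical.

```python
import struct

# length, version, flags, type, term offset, session, stream, term id, reserved.
DATA_HEADER = struct.Struct("<iBBHiiiiq")
HDR_TYPE_DATA = 0x0001
BEGIN_FLAG, END_FLAG, EOS_FLAG = 0x80, 0x40, 0x20  # assumed bit values


def encode_data_frame(payload: bytes, flags: int, term_offset: int,
                      session_id: int, stream_id: int, term_id: int,
                      reserved_value: int = 0) -> bytes:
    """Frame length covers the header plus the payload, excluding alignment padding."""
    frame_length = DATA_HEADER.size + len(payload)
    header = DATA_HEADER.pack(frame_length, 0, flags, HDR_TYPE_DATA, term_offset,
                              session_id, stream_id, term_id, reserved_value)
    return header + payload


def decode_data_frame(frame: bytes) -> dict:
    """Split a frame into its header fields and payload bytes."""
    (frame_length, version, flags, frame_type, term_offset,
     session_id, stream_id, term_id, reserved_value) = DATA_HEADER.unpack_from(frame, 0)
    return {
        "frame_length": frame_length, "version": version, "flags": flags,
        "type": frame_type, "term_offset": term_offset, "session_id": session_id,
        "stream_id": stream_id, "term_id": term_id, "reserved_value": reserved_value,
        "payload": frame[DATA_HEADER.size:frame_length],
    }


# A self-contained 19-byte message: B and E are set together.
frame = encode_data_frame(b"x" * 19, BEGIN_FLAG | END_FLAG, 0, 1, 10, 5)
print(decode_data_frame(frame)["frame_length"])
```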
Note: Multiple Senders sending redundant data is supported. This can be as simple as having each Sender use the same Session ID, Stream ID, Term ID, and term offset, so that normal duplicate elimination applies. TCP multipath capability is very much akin, so the same mechanism can be used to provide multipath support. This means that applications need to be able to set the Session ID, Stream ID, and Term ID, and to consistently send the same term offset and data length.
Note: The R-bit for Term Offset captures the requirement that Term Offset be a non-negative integer in languages that do not naturally support unsigned 32-bit values (such as Java), without resorting to 64-bit signed values.
Note: 0 Length Data Frame headers are DATA Frames with 0 data bytes. These are to be used for heartbeat messages as well as for initial channel setup.
Note: Aeron aligns frames to a given frame boundary, currently 32 bytes. So, an individual frame might have a frame length value less than the number of bytes transmitted on the wire. For example, a message of 19 data bytes occupies 64 bytes on the wire, but has a frame length of 51 (32 bytes header + 19 data bytes). The inclusion of this alignment padding on the wire is for efficiency of operation and reduced latency.
Note: The S bit is used to signal the end of a stream and should only be sent in heartbeat messages.
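The alignment and fragmentation rules in the notes above can be sketched together. The example below assumes the 32-byte header and 32-byte frame alignment stated in this document, and flag bit values of 0x80 for B and 0x40 for E. The `max_payload_length` parameter (for example, MTU minus header length) and the function names are hypothetical.

```python
FRAME_ALIGNMENT = 32
HEADER_LENGTH = 32
BEGIN_FLAG, END_FLAG = 0x80, 0x40  # assumed bit values


def align(value: int, alignment: int = FRAME_ALIGNMENT) -> int:
    """Round value up to the next multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def fragment(message_length: int, max_payload_length: int, term_offset: int):
    """Return (flags, term_offset, frame_length) for each fragment of a message."""
    if message_length < 0 or max_payload_length <= 0:
        raise ValueError("lengths must be non-negative and payload limit positive")
    fragments = []
    remaining = message_length
    first = True
    while True:
        payload_length = min(remaining, max_payload_length)
        remaining -= payload_length
        flags = (BEGIN_FLAG if first else 0) | (END_FLAG if remaining == 0 else 0)
        frame_length = HEADER_LENGTH + payload_length
        fragments.append((flags, term_offset, frame_length))
        # The next frame starts at the aligned end of this one.
        term_offset += align(frame_length)
        first = False
        if remaining == 0:
            return fragments


print(align(51))                 # the 19-byte message example occupies 64 bytes
print(fragment(19, 1376, 0))     # one frame with B and E both set
print(fragment(3000, 1376, 0))   # three fragments: B, middle, E
```

Only the first fragment carries B and only the last carries E, so a receiver can reassemble by following consecutive aligned term offsets.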
Padding frames are used to pad out the end of a term or to gap fill an unblocked message. A padding frame is the length of a normal data frame header. However, the frame length field value will be the length of the padding to be applied to the log so that only the Data Frame header need be transmitted over the media. A padding frame when transmitted will be the last, or only, frame in a packet.
Data recovery in Aeron is negative acknowledgement (NAK) based. It is the Media Driver's responsibility to request retransmission of missed data.
Note: For background on NAK processing dynamics, please see IETF RFC 5401 - Multicast Negative-Acknowledgement (NACK) Building Blocks. Aeron has adopted and adapted many of these aspects.
Aeron places very few requirements on NAK processing, but the following are general guidelines for how Aeron publication and subscription implementations should behave.
- When a Receiver notices missing data, it MUST send a NAK to the publication immediately (in the case of unicast) or after some delay (in the case of multicast).
- When a Sender receives a NAK, it MUST send the indicated Data Frame again immediately if possible. Retransmissions MAY be rate controlled based on the implementation.
- Senders MUST ignore NAKs for a particular Data Frame for a time after sending a retransmission.
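The last guideline, ignoring repeated NAKs for a time after a retransmission, can be sketched as a small tracker keyed by term ID and term offset. This is an illustrative sketch only. The class name, the `linger_ns` parameter, and the choice of key are assumptions, and the specification leaves the actual duration to the implementation.

```python
class RetransmitTracker:
    """Decides whether a NAK should trigger a retransmission (illustrative)."""

    def __init__(self, linger_ns: int):
        self.linger_ns = linger_ns
        self._last_sent_ns = {}  # (term_id, term_offset) -> time of last retransmit

    def on_nak(self, term_id: int, term_offset: int, now_ns: int) -> bool:
        """Return True if the data should be resent now, False if the NAK is ignored."""
        key = (term_id, term_offset)
        last = self._last_sent_ns.get(key)
        if last is not None and now_ns - last < self.linger_ns:
            return False  # a retransmission was sent recently; ignore this NAK
        self._last_sent_ns[key] = now_ns
        return True


tracker = RetransmitTracker(linger_ns=1000)
print(tracker.on_nak(5, 64, now_ns=0))     # first NAK: retransmit
print(tracker.on_nak(5, 64, now_ns=500))   # duplicate NAK within linger: ignored
print(tracker.on_nak(5, 64, now_ns=1000))  # linger elapsed: retransmit again
```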
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|                Frame Length (=header length)                |
+---------------+---------------+-------------------------------+
|   Version     |     Flags     |         Type (=0x02)          |
+---------------+---------------+-------------------------------+
|                          Session ID                           |
+---------------------------------------------------------------+
|                           Stream ID                           |
+---------------------------------------------------------------+
|                            Term ID                            |
+---------------------------------------------------------------+
|R|                         Term Offset                         |
+---------------------------------------------------------------+
|                            Length                             |
+---------------------------------------------------------------+
- Frame Length: (32) Value is 28.
- Version: (8) Current version is 0.
- Flags: (8) Reserved.
- Type: (16) HDR_TYPE_NAK
- Session ID: (32) Session for retransmission.
- Stream ID: (32) Stream for retransmission.
- Term ID: (32) Term ID for the Term to request retransmission for.
- Term Offset: (31) Term Offset being requested.
- Length: (32) Length of data being requested in bytes.
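The NAK frame is seven 32-bit words, which matches the stated frame length of 28. A minimal encoding sketch follows, assuming little-endian signed fields. The function names `encode_nak` and `decode_nak` are hypothetical.

```python
import struct

# Common header followed by session id, stream id, term id, term offset, length.
NAK_FRAME = struct.Struct("<iBBH5i")
HDR_TYPE_NAK = 0x0002


def encode_nak(session_id: int, stream_id: int, term_id: int,
               term_offset: int, length: int) -> bytes:
    """Build a NAK requesting `length` bytes starting at term_offset."""
    return NAK_FRAME.pack(NAK_FRAME.size, 0, 0, HDR_TYPE_NAK,
                          session_id, stream_id, term_id, term_offset, length)


def decode_nak(frame: bytes) -> dict:
    """Decode a NAK frame into a field-name to value mapping."""
    names = ("frame_length", "version", "flags", "type",
             "session_id", "stream_id", "term_id", "term_offset", "length")
    return dict(zip(names, NAK_FRAME.unpack_from(frame, 0)))


nak = encode_nak(session_id=1, stream_id=10, term_id=5, term_offset=1408, length=1408)
print(len(nak), decode_nak(nak))
```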
Aeron's Status Messages are used for both flow and congestion control. They carry feedback from subscriptions to publications and also serve as monitoring and status indicators. Users can apply different strategies for how SMs are sent and how the Receiver Window is managed.
Flow control in Aeron is Stream specific. When used over UDP or non-rate controlled media, each Stream has, in effect, a different QoS.
Aeron senders are dumb in that they only send as much as the receivers will allow on each Stream at any one time.
For more information on flow control, please see here.
Central to the design of Aeron's flow and congestion control is the receiver message window. This is the number of bytes that a receiver is willing to immediately receive. A value of 0 means no data. A value of 1000 means 1000 bytes. This number does NOT count any retransmissions. This window is essentially the same flow control window used in TCP and can be managed in a similar manner.
On Aeron stream setup, Receivers send initial SMs to set initial window length. A suggested initial window length is limited to no more than 1/4th the Term length in bytes or at least a single MTU. A media driver MAY be configured to set the initial window to 0 to prevent a publication from sending immediately.
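The window rules above can be expressed as simple arithmetic. The sketch below is one reading of the suggested initial window (no more than a quarter of the Term length, but at least one MTU) and of how a Sender might bound what it sends. The function names and the `configured_window` parameter are hypothetical, and the interpretation of the bounds is an assumption.

```python
def initial_window_length(term_length: int, mtu: int, configured_window: int) -> int:
    """Cap the window at a quarter of the term, but never go below one MTU."""
    return max(mtu, min(configured_window, term_length // 4))


def sender_limit(consumption_position: int, receiver_window: int) -> int:
    """Highest stream position the Sender may send up to."""
    return consumption_position + receiver_window


def bytes_allowed(sender_position: int, consumption_position: int,
                  receiver_window: int) -> int:
    """New bytes the Sender may send right now (retransmissions are not counted)."""
    return max(0, sender_limit(consumption_position, receiver_window) - sender_position)


print(initial_window_length(term_length=65536, mtu=1408, configured_window=131072))
print(bytes_allowed(sender_position=3000, consumption_position=1000, receiver_window=4096))
```

A window of 0 yields no allowance at all, which matches the option of configuring an initial window of 0 to hold a publication back.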
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|                Frame Length (=header + data)                |
+---------------+-+-------------+-------------------------------+
|   Version     |S|    Flags    |         Type (=0x03)          |
+---------------+-+-------------+-------------------------------+
|                          Session ID                           |
+---------------------------------------------------------------+
|                           Stream ID                           |
+---------------------------------------------------------------+
|                      Consumption Term ID                      |
+---------------------------------------------------------------+
|R|                   Consumption Term Offset                   |
+---------------------------------------------------------------+
|                        Receiver Window                        |
+---------------------------------------------------------------+
|                          Receiver ID                          |
|                                                               |
+---------------------------------------------------------------+
|                   Receiver Group Tag Length                   |
+---------------------------------------------------------------+
|          Receiver Group Tag (8 bytes is standard)           ...
...                                                             |
+---------------------------------------------------------------+
- Frame Length: (32) Value is 28 + length of application specific data (if present)
- Version: (8) Current version is 0.
- Flags: (8) Reserved.
  - (S)ETUP: SETUP Flag
- Type: (16) HDR_TYPE_SM
- Session ID: (32) Session that SM pertains to.
- Stream ID: (32) Stream that SM pertains to.
- Consumption Term ID: (32) Term ID of last byte of complete data consumed by subscribers.
- Consumption Term Offset: (31) Term Offset of last byte of complete data consumed by subscribers.
- Receiver Window: (32) Subscription advertised window.
- Receiver ID: (64) ID for this Receiver. Should be as unique as possible across media drivers (UUID). A media driver may use a single ID.
- Receiver Group Tag Length: (32) Denotes the length of the group tag to follow. This is 8 bytes for `gtag`, which replaces the deprecated Application Specific Feedback (ASF), which was typically 4 bytes.
- Receiver Group Tag: Used to tag a receiver as being part of a group for the purposes of flow control.
Some transmission media, such as UDP, can have significant transmission delay as well as queuing delay. This delay may change rapidly during operation. Applications, as well as some congestion control handling, may want to measure RTT during operation and adjust behaviour based on how RTT changes. A special RTTM frame is used for measuring RTT during operation.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|                Frame Length (=header length)                |
+---------------+-+-------------+-------------------------------+
|   Version     |R|    Flags    |         Type (=0x06)          |
+---------------+-+-------------+-------------------------------+
|                          Session ID                           |
+---------------------------------------------------------------+
|                           Stream ID                           |
+---------------------------------------------------------------+
|R|                       Echo Timestamp                        |
|                                                               |
+---------------------------------------------------------------+
|R|                       Reception Delta                       |
|                                                               |
+---------------------------------------------------------------+
|                          Receiver ID                          |
|                                                               |
+---------------------------------------------------------------+
- Frame Length: (32) Value is 40
- Version: (8) Current version is 0.
- Flags: (8) Reserved.
  - (R)eply: Generate reply Flag
- Type: (16) HDR_TYPE_RTTM
- Session ID: (32) Session that RTT Measurement pertains to.
- Stream ID: (32) Stream that RTT Measurement pertains to.
- Echo Timestamp: (64) Timestamp to echo in a reply or the timestamp in the original RTT Measurement.
- Reception Delta: (64) Time in nanoseconds between receiving original RTT Measurement and sending Reply RTT Measurement.
- Receiver ID: (64) ID for this Receiver. Should be as unique as possible across media drivers (UUID). A media driver may use a single ID. May be 0 to signify all receivers.
A Sender may measure RTT to a specific Receiver by sending an RTT Measurement frame to the Receiver via the Data channel with the R flag set and the Receiver ID set to that of the Receiver that should reply. The echo timestamp in the RTT Measurement should be the time of sending the RTT Measurement in nanoseconds from a useful epoch.
A Receiver upon receiving an RTT Measurement with the (R) flag set with a valid session Id, stream Id, and receiver Id, should send an RTT Measurement frame without the (R) flag set that echoes the Echo Timestamp field and Receiver ID field. The Reception Delta field should hold the time (in nanoseconds) between original RTT Measurement frame reception and generation and sending of reply RTT Measurement frame or 0 to indicate no significant time has elapsed.
A Receiver may measure RTT to a Sender by sending an RTT Measurement frame to the Sender via the Control channel with the R flag set and the Receiver ID set to the receiver to reply to. The echo timestamp in the RTT Measurement should be the time of sending the RTT Measurement in nanoseconds from a useful epoch.
A Sender upon receiving an RTT Measurement with the (R) flag set with a valid session Id, and stream Id, should send an RTT Measurement frame without the (R) flag set that echoes the Echo Timestamp field and Receiver ID field. The Reception Delta field should hold the time (in nanoseconds) between original RTT Measurement frame reception and generation and sending of reply RTT Measurement frame or 0 to indicate no significant time has elapsed.
To avoid possible reflection attacks, RTT Measurements with the (R) flag set should be ignored for a short time after being processed by a Sender or Receiver.
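The request and reply exchange described above can be sketched as follows. The RTT estimate shown, elapsed time since the echoed timestamp minus the reception delta, is the natural reading of the fields rather than a formula stated in this document. The R flag bit value of 0x80 and all names are assumptions for illustration.

```python
from dataclasses import dataclass, replace

REPLY_FLAG = 0x80  # assumed bit value for the (R)eply flag


@dataclass(frozen=True)
class RttMeasurement:
    flags: int
    session_id: int
    stream_id: int
    echo_timestamp_ns: int
    reception_delta_ns: int
    receiver_id: int


def build_reply(request: RttMeasurement, received_ns: int, now_ns: int):
    """Echo the timestamp and receiver id without the R flag, or None if no reply is due."""
    if not request.flags & REPLY_FLAG:
        return None
    return replace(request,
                   flags=request.flags & ~REPLY_FLAG,
                   reception_delta_ns=now_ns - received_ns)


def round_trip_time_ns(now_ns: int, reply: RttMeasurement) -> int:
    """Elapsed time since the request was sent, minus time spent at the remote end."""
    return now_ns - reply.echo_timestamp_ns - reply.reception_delta_ns


request = RttMeasurement(REPLY_FLAG, 1, 10, echo_timestamp_ns=1000,
                         reception_delta_ns=0, receiver_id=42)
reply = build_reply(request, received_ns=5000, now_ns=5200)
print(round_trip_time_ns(now_ns=1900, reply=reply))
```

The two ends never compare clocks directly: the requester only subtracts its own timestamps, and the reception delta is a duration measured entirely on the replying side.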
The RTT Measurement Header above may be used for measuring one way latency. In this case, the Sender bursts out a set of RTT Measurement frames with a new measure in each one. The Receiver ID field is set to 0. A Receiver receiving this set can then estimate the clock drift of the Sender. After this burst, a periodic frame can be sent for measuring one way latency. In all cases, the use of Receiver ID 0 signifies the use of one way latency measurement.
The duration, size, frequency, and repetition of bursts for accurate measurement and tracking of Sender clocks is left up to the implementation. But it should be noted that for initial measurement, data frames should NOT be sent at high rate.
Aeron uses a generic error handling method similar to ICMP errors like Destination Unreachable, etc. The first set of bytes from the offending frame is included in the error message.
The sending of Error Headers is implementation dependent.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|                    Frame Length (varies)                    |
+---------------+---------------+-------------------------------+
|   Version     |  Error Code   |         Type (=0x04)          |
+---------------+---------------+-------------------------------+
|R|          Frame Length of Offending Header (varies)          |
+---------------------------------------------------------------+
|                      Offending Header                       ...
+---------------------------------------------------------------+
...                                                             |
+---------------------------------------------------------------+
|                        Error String                         ...
+---------------------------------------------------------------+
...                                                             |
+---------------------------------------------------------------+
- Frame Length: (32) Value varies based on length of Offending Header, length of Error String, and length of Error Header itself.
- Version: (8) Current version is 0.
- Error Code: (8) Type of Error. May be specific to the Offending Header contents.
- Type: (16) HDR_TYPE_ERR
- Frame Length of Offending Header: (32) Length of the Offending Header
- Offending Header: (Varies) Frame Header that generated error. Does not include any Data.
- Error String: (Optional, Varies) Human readable string for error. Length determined by Frame Length - Frame Length of Offending Header - Error Header length.
Streams must be reclaimed after a period of inactivity. Heartbeats are DATA Frames sent with no data, but the highest sent Term ID and Term Offset. They keep the Stream alive, but do not contain new data and can leverage the existing duplicate detection logic. When the publisher has closed the stream the (S)EOS flag is set for heartbeats. Heartbeat message:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|                      Frame Length (=0)                      |
+---------------+-+-+-+---------+-------------------------------+
|   Version     |1|1|S|Flags(=0)|         Type (=0x01)          |
+---------------+-+-+-+---------+-------------------------------+
|R|                         Term Offset                         |
+---------------------------------------------------------------+
|                          Session ID                           |
+---------------------------------------------------------------+
|                           Stream ID                           |
+---------------------------------------------------------------+
|                            Term ID                            |
+---------------------------------------------------------------+
- Frame Length: (32) Special value of 0.
- Version: (8) Current version is 0.
- Flags: (8) B=1, E=1, S=(1 or 0), no other flags set: binary 11000000 = 192 (0xC0) or binary 11100000 = 224 (0xE0).
- Type: (16) HDR_TYPE_DATA
- Term Offset: (31) Offset of the first byte of the frame header within the Term.
- Session ID: (32) Session ID.
- Stream ID: (32) Stream that message is for.
- Term ID: (32) Term that message is for within the Stream.
Heartbeats also inform receivers of the highest sent Term ID and Term Offset for determining missing data and thus initiating loss handling. This is particularly important for detecting tail loss.
Heartbeats are sent only in absence of application data to send.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|                    Frame Length (varies)                    |
+---------------+---------------+-------------------------------+
|   Version     |     Flags     |         Type (=0x07)          |
+---------------+---------------+-------------------------------+
|                   List of Resolved Entities                   |
+---------------------------------------------------------------+
- Frame Length: (32) Varies based on number of entities.
- Version: (8) Current version is 0.
- Flags: (8) Reserved.
- Type: (16) HDR_TYPE_RES
Each Frame has a number of RES (Resolutions) of various types. Each entry holds a Name and information about that Name, such as what type it is, how long ago it was seen, the UDP port of the resolver, and the IPv4/IPv6 address for the name. The resolved entities are a list. Each list entry follows the format below.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+-+-------------+-------------------------------+
|   RES Type    |S|    Flags    |           UDP Port            |
+---------------+-+-------------+-------------------------------+
|               Age (in msec) since last activity               |
+---------------------------------------------------------------+
|                                                               |
|                 IPv4 Address or IPv6 Address                  |
|                     (4 bytes or 16 bytes)                     |
|                                                               |
+-------------------------------+-------------------------------+
|          Name Length          |            Name             ...
+-------------------------------+-------------------------------+
...                       Pad to 4 bytes.                       |
+---------------------------------------------------------------+
- RES Type: (8) Type of Resolved Data for the entry.
- Flags: (8) Reserved.
  - (S)elf: Entry is for sending resolver.
- UDP Port: (16) UDP port for the resolver with the given name.
- Age: (32) Age in milliseconds since last activity for the entry.
- Address: (32 or 128) IPv4 or IPv6 (depending on RES Type field) for the entry.
- Name Length: (16) length (in bytes) of the Name field.
- Name: (varies) Name for the entry.
Type | Value | Description |
---|---|---|
Reserved | 0x00 | Reserved. |
NAME_TO_IP4_MD | 0x01 | Name mapping to given IPv4 (4 bytes) and UDP port (2 bytes) for resolver. |
NAME_TO_IP6_MD | 0x02 | Name mapping to given IPv6 (16 bytes) and UDP port (2 bytes) for resolver. |
NAME_TO_IP4_ENDPOINT | 0x03 | Reserved for IPv4 Channel Endpoint resolution. |
NAME_TO_IP6_ENDPOINT | 0x04 | Reserved for IPv6 Channel Endpoint resolution. |
NAME_TO_IP4_STREAM | 0x05 | Reserved for IPv4 Stream resolution. |
NAME_TO_IP6_STREAM | 0x06 | Reserved for IPv6 Stream resolution. |
RESERVED | 0xFF | Reserved for extension. |
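To make the entry layout concrete, the sketch below decodes a single resolution entry as laid out above: type, flags, port, age, address, name length, name, then padding to a 4-byte boundary. It is a sketch under stated assumptions: the (S)elf flag bit value of 0x80, address bytes in network order, and ASCII names are not stated in this document, and the function name `decode_res_entry` is hypothetical.

```python
import ipaddress
import struct

NAME_TO_IP4_MD = 0x01
NAME_TO_IP6_MD = 0x02
SELF_FLAG = 0x80  # assumed bit value for the (S)elf flag


def decode_res_entry(buffer: bytes, offset: int = 0):
    """Decode one entry; return (fields, padded entry length in bytes)."""
    res_type, flags, port, age_ms = struct.unpack_from("<BBHi", buffer, offset)
    if res_type == NAME_TO_IP4_MD:
        address_length = 4
    elif res_type == NAME_TO_IP6_MD:
        address_length = 16
    else:
        raise ValueError(f"unsupported RES type: {res_type:#04x}")
    address_start = offset + 8
    name_length_start = address_start + address_length
    # Assumption: address bytes are in network order, as ipaddress expects.
    address = ipaddress.ip_address(bytes(buffer[address_start:name_length_start]))
    (name_length,) = struct.unpack_from("<H", buffer, name_length_start)
    name_start = name_length_start + 2
    name = bytes(buffer[name_start:name_start + name_length]).decode("ascii")
    # Entries are padded to a 4-byte boundary.
    entry_length = (name_start + name_length - offset + 3) & ~3
    fields = {"res_type": res_type, "is_self": bool(flags & SELF_FLAG),
              "port": port, "age_ms": age_ms, "address": str(address), "name": name}
    return fields, entry_length


entry = (struct.pack("<BBHi", NAME_TO_IP4_MD, SELF_FLAG, 8050, 250)
         + bytes([192, 168, 0, 3]) + struct.pack("<H", 5) + b"nodeA" + b"\x00")
print(decode_res_entry(entry))
```

The returned entry length lets a caller step through the list of resolved entities by advancing the offset until the frame length is reached.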
Assumptions: (1) UDP, and (2) ignore Ethernet. IP = 20 bytes, UDP = 8 bytes, Aeron = 32 bytes (Data Frame)
- Payload of 256 bytes + 60 bytes overhead = 81.0% efficiency
- Payload of 1024 bytes + 60 bytes overhead = 94.5% efficiency
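The efficiency figures follow from dividing the payload by the payload plus the 60 bytes of IP, UDP, and Aeron header overhead. A small sketch of that arithmetic, with a hypothetical function name:

```python
IP_HEADER = 20
UDP_HEADER = 8
AERON_DATA_HEADER = 32


def payload_efficiency(payload_length: int,
                       overhead: int = IP_HEADER + UDP_HEADER + AERON_DATA_HEADER) -> float:
    """Fraction of bytes on the wire that carry application payload."""
    return payload_length / (payload_length + overhead)


for payload in (256, 1024):
    print(payload, f"{payload_efficiency(payload) * 100:.1f}%")
```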
The 31-bit unsigned (32-bit signed) data space per Term is exhausted roughly every 2 seconds at a constant rate of 1 GBps (8 Gbps). Thus Term IDs tick every 2 seconds. At that rate, the 31-bit unsigned (32-bit signed) Term ID space is exhausted after 2386092.9 hours, or 272 years.
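The exhaustion figures can be checked with a few lines of arithmetic. The sketch below is only a calculation aid with hypothetical names. It shows that the quoted 2386092.9 hours is reproduced by 2^32 Term ID values at a rounded 2 seconds per term, while 2^31 values at the same rate would give half that duration.

```python
TERM_DATA_SPACE = 2 ** 31     # bytes addressable within one Term
RATE_BYTES_PER_SEC = 10 ** 9  # 1 GBps


def seconds_per_term(term_bytes: int = TERM_DATA_SPACE,
                     rate: int = RATE_BYTES_PER_SEC) -> float:
    """Time to fill one maximum-length Term at a constant rate."""
    return term_bytes / rate


def hours_to_exhaust(term_id_count: int, term_seconds: float) -> float:
    """Time to use up a given number of Term IDs at one Term per term_seconds."""
    return term_id_count * term_seconds / 3600


print(round(seconds_per_term(), 2))            # about 2 seconds
hours = hours_to_exhaust(2 ** 32, 2)
print(round(hours, 1), int(hours / 24 / 365))  # matches the quoted figures
print(round(hours_to_exhaust(2 ** 31, 2), 1))  # half of the quoted duration
```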
Aeron functions over UDP in one of three modes. The first is Unicast (or point-to-point) mode. The second is Multicast mode. The third is Multi-Destination-Cast mode. These modes pertain to the uni-directional flow of data. It is quite possible to set up a combined bi-directional connection that is mixed mode: Unicast in one direction and Multicast in the other.
Aeron over UDP may pack more than one Frame into a single datagram as it sees fit, provided the Frames fit.
Channel designation for Aeron over UDP is based on the following URI scheme:
aeron:udp?[interface=local-interface[:local-port]|]endpoint=receiver-address:receiver-port[|control=explicit-control-address:control-port]
Receiver address and port means slightly different things based on unicast vs. multicast operation.
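The URI scheme above is a fixed prefix followed by `key=value` pairs separated by `|`. The following is a minimal parsing sketch for that shape only. The function names are hypothetical, the example addresses are illustrative, and bracketed IPv6 literals are not handled.

```python
def parse_udp_channel(uri: str) -> dict:
    """Split an aeron:udp channel URI into its parameters."""
    prefix = "aeron:udp?"
    if not uri.startswith(prefix):
        raise ValueError("not an aeron:udp channel URI")
    params = {}
    for pair in uri[len(prefix):].split("|"):
        key, separator, value = pair.partition("=")
        if not separator or not key:
            raise ValueError(f"malformed parameter: {pair!r}")
        params[key] = value
    if "endpoint" not in params:
        raise ValueError("endpoint parameter is required")
    return params


def split_host_port(address: str):
    """Split 'host:port' into (host, port as int)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


channel = parse_udp_channel(
    "aeron:udp?interface=10.0.0.1:0|endpoint=224.10.9.7:4050|control=10.0.0.1:4051")
print(channel)
print(split_host_port(channel["endpoint"]))
```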
Aeron Receivers listen on specific interface (IP address) and UDP port for Data Frames. Receivers send SMs and NAKs unicast back to the publication. Aeron Receivers must send SMs and NAKs back to the IP address of source of the Data Frames.
Aeron Senders send Data Frames to subscription IP address and UDP port. However, Senders may listen for SMs and NAKs on any port, including an ephemeral port.
Example URIs:
- aeron:udp?endpoint=192.168.0.3:4050: Receiver binds to 192.168.0.3 on port 4050. Sender sends data to 192.168.0.3 port 4050.
Aeron senders and receivers send to and listen on a specific interface (IP address), a specific IP multicast address/group, and a destination UDP port for the data endpoint. Traffic on this endpoint is Data Frames. The IP multicast address must be odd.
Aeron senders and receivers send to and listen on a specific interface (IP address), a specific IP multicast address/group, and a destination UDP port for the control endpoint. Traffic on this endpoint is Status Messages and NAKs. The IP multicast address must be the next even address above the data endpoint multicast address.
An example of the relationship between data and control multicast addresses would be data on 224.10.9.7 and control on 224.10.9.8.
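The odd/even pairing can be expressed as a small check. The following is a sketch using Python's standard `ipaddress` module; the function name `control_address_for` is hypothetical and the validation is only what the two rules above imply.

```python
import ipaddress


def control_address_for(data_address: str) -> str:
    """Derive the control multicast address from an odd data multicast address."""
    addr = ipaddress.ip_address(data_address)
    if not addr.is_multicast:
        raise ValueError(f"{data_address} is not a multicast address")
    if int(addr) % 2 == 0:
        raise ValueError(f"data multicast address {data_address} must be odd")
    control = addr + 1  # the next address up, which is therefore even
    if not control.is_multicast:
        raise ValueError(f"no multicast control address above {data_address}")
    return str(control)


print(control_address_for("224.10.9.7"))
```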
Example URIs:
- aeron:udp?endpoint=224.10.9.7:4050: Receivers join IP multicast address 224.10.9.7 on destination port 4050 for data, and send SMs and NAKs to IP multicast address 224.10.9.8 destination port 4050. Sender joins IP multicast address 224.10.9.8 on destination port 4050 for SMs and NAKs, and sends data to 224.10.9.7 destination port 4050.
In Multi-Destination-Cast (MDC) mode, a sender sends explicitly to a list of destinations and manages them as it would a multicast group, but uses unicast UDP instead of multicast addressing. This list of destinations may be controlled manually, with the publication adding and removing destinations, or it may be dynamic, where destinations add themselves and are removed due to inactivity.
Aeron Receivers listen on a specific interface (IP address) and UDP port for Data Frames. Receivers send SMs and NAKs unicast back to the publication's explicit control port. Aeron Receivers must send SMs and NAKs back to the IP address of the source of the Data Frames.
Aeron Senders send Data Frames to each added destination IP address and UDP port. Senders may listen for SMs and NAKs on the explicit control port.
Example URIs:
- aeron:udp?control=192.168.0.3:4050|control-mode=manual: Sender binds to 192.168.0.3 on port 4050 for control frames (SMs and NAKs).
- aeron:udp?endpoint=192.168.0.4:4051: Receiver binds to 192.168.0.4 on port 4051.
- aeron:udp?endpoint=192.168.0.5:4052: Receiver binds to 192.168.0.5 on port 4052.
- Publication API adds aeron:udp?endpoint=192.168.0.4:4051: Sender sends data to 192.168.0.4 port 4051.
- Publication API adds aeron:udp?endpoint=192.168.0.5:4052: Sender sends data to 192.168.0.4 port 4051 as well as 192.168.0.5 port 4052.
Aeron Receivers listen on a specific interface (IP address) and UDP port for Data Frames. Receivers send SMs and NAKs unicast back to the sender's explicit control port. Aeron Receivers must send SMs and NAKs back to the IP address of the publication's explicit control endpoint.
Aeron Receivers MUST periodically send specially crafted SMs to the control IP and port of the publication in the absence of SMs being generated by data traffic. These specific SMs MUST have session Id value of 0, stream Id value of 0, and the Elicit SETUP flag set.
Aeron Senders send DATA Frames to each IP address and UDP port from which they receive SMs eliciting SETUP frames. Each such address and port pair is a destination. A destination is considered active as long as SMs are seen from its IP address and UDP port. If a timeout period elapses without receiving an SM, then that destination can be removed. Senders MUST listen for SMs and NAKs on the explicit control port.
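The liveness rule for dynamic destinations can be sketched as a table keyed by source address and port, refreshed on every SM and pruned on timeout. This is an illustration only: `DestinationTracker`, its method names, and the 5 second timeout used in the example are assumptions, not values or APIs from Aeron.

```python
class DestinationTracker:
    """Tracks dynamic MDC destinations by the time of their last Status Message."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self._last_sm = {}  # (ip, port) -> time the last SM was seen

    def on_status_message(self, ip: str, port: int, now: float) -> None:
        # An SM from a new source adds a destination; a repeat SM refreshes it.
        self._last_sm[(ip, port)] = now

    def remove_inactive(self, now: float) -> list:
        # Drop destinations whose last SM is older than the timeout.
        expired = [dest for dest, seen in self._last_sm.items()
                   if now - seen > self.timeout_seconds]
        for dest in expired:
            del self._last_sm[dest]
        return sorted(expired)

    def active(self) -> list:
        return sorted(self._last_sm)


tracker = DestinationTracker(timeout_seconds=5.0)  # hypothetical timeout
tracker.on_status_message("192.168.0.4", 4051, now=0.0)
tracker.on_status_message("192.168.0.5", 4052, now=3.0)
removed = tracker.remove_inactive(now=6.0)
print("removed:", removed)
print("active:", tracker.active())
```

Passing the clock in as `now` keeps the sketch deterministic; a real sender would use its own time source.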
Example URIs:
- aeron:udp?control=192.168.0.3:4050: Sender binds to 192.168.0.3 on port 4050 for control frames (SMs and NAKs).
- aeron:udp?endpoint=192.168.0.4:4051|control=192.168.0.3:4050: Receiver binds to 192.168.0.4 on port 4051. Publication sends data to 192.168.0.4 port 4051.
- aeron:udp?endpoint=192.168.0.5:4052|control=192.168.0.3:4050: Receiver binds to 192.168.0.5 on port 4052. Publication sends data to 192.168.0.5 port 4052.
Aeron uses shared memory for passing messages from publisher to the Sender for transfer over the network media to other machines. Messages are passed via the log buffers. This same mechanism is used for exchanging messages between publishers and subscribers on the same machine. The protocol of exchange is by data frames written to the log. There is no need for control messages over SHM media.
To avoid the use of locks when writing message frames into shared memory, a protocol must be employed so that a reader can know when the operation is complete and how to recover when a client crashes during the process. This is achieved by writing the frame in the following order.
- Write the length field of the frame header atomically with a negative value so that the space can be padded later on failure. Readers check for a value greater than zero to determine whether the frame is ready to read.
- Insert a memory ordering store fence, aka release fence, appropriate for the programming language.
- Write the remainder of the header fields.
- Copy in the frame body.
- Insert a memory ordering store fence, aka release fence, appropriate for the programming language.
- Atomically write the length field with its positive value so readers know the frame is available.
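The ordering of the steps above can be sketched as follows. Python offers no atomic stores or memory fences, so this single-threaded illustration only shows the sequence of writes and the reader's check; a real implementation needs the atomic and fence primitives of its language. The little-endian 32-byte header layout used here (frame length, version, flags, type, term offset, session id, stream id, term id, reserved value) is an assumption to be checked against the Data Frame definition, and the function names are hypothetical.

```python
import struct

HEADER_LENGTH = 32     # Data Frame header size
DATA_FRAME_TYPE = 0x01  # assumed type value for a Data Frame
UNFRAGMENTED = 0xC0     # assumed begin + end flags


def write_frame(log: bytearray, offset: int, payload: bytes,
                session_id: int, stream_id: int, term_id: int) -> None:
    frame_length = HEADER_LENGTH + len(payload)
    # Step 1: publish a negative length so the space is claimed but not readable.
    struct.pack_into("<i", log, offset, -frame_length)
    # Step 2: a release fence would go here in a real implementation.
    # Step 3: remaining header fields (version, flags, type, term offset,
    # session id, stream id, term id, reserved value).
    struct.pack_into("<BBHiiiiq", log, offset + 4, 0, UNFRAGMENTED,
                     DATA_FRAME_TYPE, offset, session_id, stream_id, term_id, 0)
    # Step 4: copy in the frame body.
    log[offset + HEADER_LENGTH:offset + frame_length] = payload
    # Step 5: a second release fence would go here.
    # Step 6: publish the positive length so readers see the frame as available.
    struct.pack_into("<i", log, offset, frame_length)


def read_frame(log: bytearray, offset: int):
    (frame_length,) = struct.unpack_from("<i", log, offset)
    if frame_length <= 0:
        return None  # nothing written yet, or a write is still in progress
    return bytes(log[offset + HEADER_LENGTH:offset + frame_length])


log = bytearray(128)
print(read_frame(log, 0))
write_frame(log, 0, b"hello", session_id=1, stream_id=10, term_id=0)
print(read_frame(log, 0))
```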
Note: The Sender reads the frames created by Publishers using the same mechanism as Subscribers do via IPC. The same also applies with the Receiver rebuilding an image of a Publication so Subscribers can consume it with the same semantics. This approach allows for a lock-less implementation.