Merge pull request #8390 from ofiwg/pr/update-nroff-generated-man-pages-main

Update nroff-generated man pages
ofiwg · Jan 2, 2023 · 1eb9905
2 parents 386ffd2 + fddfb73
commit 1eb9905
Showing 10 changed files with 3,245 additions and 254 deletions.
diff --git a/man/man3/fi_cm.3 b/man/man3/fi_cm.3
@@ -1,6 +1,6 @@
 .\" Automatically generated by Pandoc 2.9.2.1
 .\"
-.TH "fi_cm" "3" "2022\-12\-09" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
+.TH "fi_cm" "3" "2023\-01\-02" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
 .hy
 .SH NAME
 .PP
@@ -86,7 +86,7 @@ User context associated with the request.
 .SH DESCRIPTION
 .PP
 Connection management functions are used to connect a
-connection-oriented endpoint to a peer endpoint.
+connection-oriented (FI_EP_MSG) endpoint to a listening peer.
 .SS fi_listen
 .PP
 The fi_listen call indicates that the specified endpoint should be

diff --git a/man/man3/fi_collective.3 b/man/man3/fi_collective.3
@@ -1,6 +1,6 @@
 .\" Automatically generated by Pandoc 2.9.2.1
 .\"
-.TH "fi_collective" "3" "2022\-12\-09" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
+.TH "fi_collective" "3" "2023\-01\-02" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
 .hy
 .SH NAME
 .TP
@@ -144,13 +144,7 @@ User specified pointer to associate with the operation.
 This parameter is ignored if the operation will not generate a
 successful completion, unless an op flag specifies the context parameter
 be used for required input.
-.SH DESCRIPTION (EXPERIMENTAL APIs)
-.PP
-The collective APIs are new to the 1.9 libfabric release.
-Although, efforts have been made to design the APIs such that they align
-well with applications and are implementable by the providers, the APIs
-should be considered experimental and may be subject to change in future
-versions of the library until the experimental tag has been removed.
+.SH DESCRIPTION
 .PP
 In general collective operations can be thought of as coordinated atomic
 operations between a set of peer endpoints.

diff --git a/man/man3/fi_getinfo.3 b/man/man3/fi_getinfo.3
@@ -1,6 +1,6 @@
 .\" Automatically generated by Pandoc 2.9.2.1
 .\"
-.TH "fi_getinfo" "3" "2022\-12\-09" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
+.TH "fi_getinfo" "3" "2023\-01\-02" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
 .hy
 .SH NAME
 .PP
@@ -46,8 +46,18 @@ A pointer to a linked list of fi_info structures containing response
 information.
 .SH DESCRIPTION
 .PP
+The fi_getinfo() call is used to discover what communication features
+are available in the system, as well as how they might best be used by
+an application.
+The call is loosely modeled on getaddrinfo().
+fi_getinfo() lets an application and the libfabric providers exchange
+information about the set of communication features the application
+requires.
+It exposes complex network details through an interface that balances
+expressiveness with ease of use.
+.PP
 fi_getinfo returns information about available fabric services for
-reaching specified node or service, subject to any provided hints.
+reaching a specified node or service, subject to any provided hints.
 Callers may specify NULL for node, service, and hints in order to
 retrieve information about what providers are available and their
 optimal usage models.

diff --git a/man/man3/fi_tagged.3 b/man/man3/fi_tagged.3
@@ -1,6 +1,6 @@
 .\" Automatically generated by Pandoc 2.9.2.1
 .\"
-.TH "fi_tagged" "3" "2022\-12\-09" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
+.TH "fi_tagged" "3" "2023\-01\-02" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
 .hy
 .SH NAME
 .PP
@@ -112,6 +112,10 @@ This can be stated as:
 .nf
 \f[C]
 send_tag & \[ti]ignore == recv_tag & \[ti]ignore
+
+or
+
+send_tag | ignore == recv_tag | ignore
 \f[R]
 .fi
 .PP
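
The two matching expressions in the fi_tagged.3 hunk above can be checked with a small sketch. It is plain C, not libfabric code: the helper names `tags_match_and` and `tags_match_or` are hypothetical, and the 64-bit tag width is an assumption made for illustration. Note that in C the `==` operator binds more tightly than `&` and `|`, so the parentheses that the man page notation leaves out are required here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the AND-with-complement form of the match rule.
 * Bits set in `ignore` are cleared on both sides before comparing. */
static bool tags_match_and(uint64_t send_tag, uint64_t recv_tag,
                           uint64_t ignore)
{
    return (send_tag & ~ignore) == (recv_tag & ~ignore);
}

/* Hypothetical helper: the OR form of the same rule.
 * Bits set in `ignore` are forced to 1 on both sides before comparing. */
static bool tags_match_or(uint64_t send_tag, uint64_t recv_tag,
                          uint64_t ignore)
{
    return (send_tag | ignore) == (recv_tag | ignore);
}
```

Both helpers give the same answer for any inputs, because each one neutralizes exactly the ignored bit positions and compares the rest. For example, with `ignore = 0xFFFF`, tags `0x00010002` and `0x0001FFFF` match, while `0x00010002` and `0x00020002` do not.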

diff --git a/man/man7/fabric.7 b/man/man7/fabric.7
@@ -1,6 +1,6 @@
 .\" Automatically generated by Pandoc 2.9.2.1
 .\"
-.TH "fabric" "7" "2022\-12\-09" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
+.TH "fabric" "7" "2023\-01\-02" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
 .hy
 .SH NAME
 .PP
@@ -15,6 +15,8 @@ fabric - Fabric Interface Library
 .PP
 Libfabric is a high-performance fabric software library designed to
 provide low-latency interfaces to fabric hardware.
+For an in-depth discussion of the motivation and design see
+\f[C]fi_guide\f[R](7).
 .SH OVERVIEW
 .PP
 Libfabric provides `process direct I/O' to application software

diff --git a/man/man7/fi_arch.7 b/man/man7/fi_arch.7
@@ -1 +1,271 @@
-update me
+.\" Automatically generated by Pandoc 2.9.2.1
+.\"
+.TH "fi_arch" "7" "2023\-01\-02" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
+.hy
+.PP
+[Figure: connection establishment sequence involving passive and active
+endpoints and their event queues; the only recoverable label is step 9,
+the CONNECTED event.]
+.PP
+Connections require the use of both passive and active endpoints.
+To establish a connection, an application must first create a passive
+endpoint and associate it with an event queue.
+The event queue is used to report connection management events.
+The application then calls listen on the passive endpoint.
+A single passive endpoint can be used to form multiple connections.
+.PP
+The connecting peer allocates an active endpoint, which is also
+associated with an event queue.
+Connect is called on the active endpoint, which results in sending a
+connection request (CONNREQ) message to the passive endpoint.
+The CONNREQ event is inserted into the passive endpoint\[cq]s event
+queue, where the listening application can process it.
+.PP
+Upon processing the CONNREQ, the listening application allocates an
+active endpoint to use with the connection.
+The active endpoint is bound to an event queue.
+Although the diagram shows the use of a separate event queue, the
+active endpoint may use the same event queue as the passive endpoint.
+Accept is called on the active endpoint to finish forming the
+connection.
+Note that the OFI accept call differs from the accept call used by
+sockets.
+The differences result from OFI supporting process direct I/O.
+.PP
+libfabric does not define the connection establishment protocol, but
+does support the traditional three-way handshake used by many
+technologies.
+After accept is called, a response is sent to the connecting active
+endpoint.
+That response generates a CONNECTED event on the remote event queue.
+If a three-way handshake is used, the remote endpoint generates an
+acknowledgment message that in turn generates a CONNECTED event for the
+accepting endpoint.
+Regardless of the connection protocol, both the active and passive
+sides of the connection receive a CONNECTED event that signals that
+the connection has been established.
+.SS Connectionless Communications
+.PP
+Connectionless communication allows data transfers between active
+endpoints without going through a connection setup process.
+The diagram below shows the basic components needed to set up
+connectionless communication.
+Connectionless communication setup differs from UDP sockets in that it
+requires that the remote addresses be stored with libfabric.
+.PP
+[Figure: connectionless communication setup.
+Recoverable labels: 1 insert_addr(), 2 send(), 3 lookup, drawn between
+an address vector and an active endpoint.]
+.PP
+libfabric requires that the addresses of peer endpoints be inserted
+into a local addressing table, or address vector, before data transfers
+can be initiated against the remote endpoint.
+Address vectors abstract fabric-specific addressing requirements and
+avoid long queuing delays on data transfers when address resolution is
+needed.
+For example, IP addresses may need to be resolved into Ethernet MAC
+addresses.
+Address vectors allow this resolution to occur during application
+initialization.
+libfabric does not define how an address vector is implemented, only
+its conceptual model.
+.PP
+All connectionless endpoints that transfer data must be associated with
+an address vector.
+.SH ENDPOINTS
+.PP
+At a low level, endpoints are usually associated with a transmit
+context, or queue, and a receive context, or queue.
+Although the terms transmit and receive queues are easier to
+understand, libfabric uses the term context, since queue-like FIFO
+(first-in, first-out) behavior is not guaranteed.
+Transmit and receive contexts may be implemented using hardware queues
+mapped directly into the process\[cq]s address space.
+An endpoint may be configured to only transmit or only receive data.
+Data transfer requests are converted by the underlying provider into
+commands that are inserted into hardware transmit and/or receive
+contexts.
+.PP
+Endpoints are also associated with completion queues.
+Completion queues are used to report the completion of asynchronous
+data transfer operations.
+.SS Shared Contexts
+.PP
+An advanced usage model allows for sharing resources among multiple
+endpoints.
+The most common form of sharing is having multiple connected endpoints
+make use of a single receive context.
+This can reduce receive-side buffering requirements, allowing the
+number of connected endpoints that an application can manage to scale
+to larger numbers.
+.SH DATA TRANSFERS
+.PP
+A primary goal of network communication is to transfer data between
+processes running on different systems.
+Much as the socket API defines different data transfer semantics for
+TCP versus UDP sockets, that is, streaming versus datagram messages,
+libfabric defines different types of data transfers.
+However, unlike sockets, libfabric allows different semantics over a
+single endpoint, even when communicating with the same peer.
+.PP
+libfabric uses separate API sets for the different data transfer
+semantics, although there are strong similarities between the API sets.
+The differences are the result of the parameters needed to invoke each
+type of data transfer.
+.SS Message transfers
+.PP
+Message transfers are most similar to UDP datagram transfers, except
+that transfers may be sent and received reliably.
+Message transfers may also be gigabytes in size, depending on the
+provider implementation.
+The sender requests that data be transferred as a single transport
+operation to a peer.
+Even if the data is referenced using an I/O vector, it is treated as a
+single logical unit or message.
+The data is placed into a waiting receive buffer at the peer, with the
+receive buffer usually chosen using FIFO ordering.
+Note that even though receive buffers are selected using FIFO ordering,
+the received messages may complete out of order.
+This can occur as a result of data between and within messages taking
+different paths through the network, the handling of lost or
+retransmitted packets, and so on.
+.PP
+Message transfers are usually invoked using API calls that contain the
+string \[lq]send\[rq] or \[lq]recv\[rq].
+As a result, they may be referred to simply as sent or received
+messages.
+.PP
+Message transfers involve the target process posting memory buffers to
+the receive (Rx) context of its endpoint.
+When a message arrives from the network, a receive buffer is removed
+from the Rx context, and the data is copied from the network into the
+receive buffer.
+Messages are matched with posted receives in the order that they are
+received.
+Note that this may differ from the order in which messages are sent,
+depending on the transmit side\[cq]s ordering semantics.
+.PP
+Conceptually, on the transmit side, messages are posted to a transmit
+(Tx) context.
+The network processes messages from the Tx context, packetizing the
+data into outbound messages.
+Although many implementations process the Tx context in order
+(i.e.\ the Tx context is a true queue), ordering guarantees specified
+through the libfabric API determine the actual processing order.
+As a general rule, the more relaxed an application is about its message
+and data ordering, the more optimizations the networking software and
+hardware can leverage, providing better performance.
+.SS Tagged messages
+.PP
+Tagged messages are similar to message transfers except that the
+messages carry one additional piece of information, a message tag.
+Tags are application-defined values that are part of the message
+transfer protocol and are used to route packets at the receiver.
+At a high level, they are roughly similar to message IDs.
+The difference is that tag values are set by the application, may be
+any value, and duplicate tag values are allowed.
+.PP
+Each sent message carries a single tag value, which is used to select a
+receive buffer into which the data is copied.
+On the receiving side, message buffers are also marked with a tag.
+When a message arrives from the network, the posted receive buffers are
+searched until one with a matching tag is found.
+.PP
+Tags are often used to identify virtual communication groups or roles.
+In practice, message tags are typically divided into fields.
+For example, the upper 16 bits of the tag may indicate a virtual group,
+with the lower 16 bits identifying the message purpose.
+The tag message interface in libfabric is designed around this usage
+model.
+Each sent message carries exactly one tag value, specified through the
+API.
+At the receiver, buffers are associated with both a tag value and a
+mask.
+The mask is used as part of the buffer matching process.
+The mask is applied against the tag value carried in the sent message
+prior to checking the tag against the receive buffer.
+For example, the mask may indicate that the lower 16 bits of a tag are
+to be ignored.
+If the resulting values match, then the tags are said to match.
+The received data is then placed into the matched buffer.
+.PP
+For performance reasons, the mask is specified as \[oq]ignore\[cq]
+bits.
+Although this is backwards from how many developers think of a mask
+(where the bits that are valid would be set to 1), the definition ends
+up mapping well with applications.
+The actual operation performed when matching tags is:
+.IP
+.nf
+\f[C]
+send_tag | ignore == recv_tag | ignore
+
+/* this is equivalent to:
+ * send_tag & \[ti]ignore == recv_tag & \[ti]ignore
+ */
+\f[R]
+.fi
+.PP
+Tagged messages are equivalent to message transfers if a single tag
+value is used.
+However, tagged messages require that the receiver perform a matching
+operation at the target, which can reduce performance relative to
+untagged messages.
+.SS RMA
+.PP
+RMA operations are architected such that they can require no processing
+by the CPU at the RMA target.
+NICs which offload transport functionality can perform RMA operations
+without impacting host processing.
+RMA write operations transmit data from the initiator to the target.
+The memory location where the data should be written is carried within
+the transport message itself, with verification checks at the target to
+prevent invalid access.
+.PP
+RMA read operations fetch data from the target system and transfer it
+back to the initiator of the request, where it is placed into memory.
+This too can be done without involving the host processor at the target
+system when the NIC supports transport offloading.
+.PP
+The advantage of RMA operations is that they decouple the processing of
+the peers.
+Data can be placed or fetched whenever the initiator is ready without
+necessarily impacting the peer process.
+.PP
+Because RMA operations allow a peer to directly access the memory of a
+process, additional protection mechanisms are used to prevent
+unintentional or unwanted access.
+RMA memory that is updated by a write operation or is fetched by a read
+operation must be registered for access with the correct permissions
+specified.
+.SS Atomic operations
+.PP
+Atomic transfers are used to read and update data located in remote
+memory regions in an atomic fashion.
+Conceptually, they are similar to local atomic operations of a similar
+nature (e.g.\ atomic increment, compare and swap, etc.).
+The benefit of atomic operations is they enable offloading basic
+arithmetic capabilities onto a NIC.
+Unlike other data transfer operations, which merely need to transfer
+bytes of data, atomics require knowledge of the format of the data being
+accessed.
+.PP
+A single atomic function operates across an array of data, applying an
+atomic operation to each entry.
+However, atomicity is limited to a single entry (one element of the
+given data type); it does not extend across the entire array.
+libfabric defines a wide variety of atomic operations across all common
+data types.
+Support for a given operation, however, depends on the provider
+implementation.
+.SS Collective operations
+.PP
+In general, collective operations can be thought of as coordinated
+atomic operations between a set of peer endpoints, almost like a
+multicast atomic request.
+A single collective operation can result in data being collected from
+multiple peers, combined using a set of atomic primitives, and the
+results distributed to all peers.
+A collective operation is a group communication exchange.
+It involves multiple peers exchanging data with other peers
+participating in the collective call.
+Collective operations require close coordination by all participating
+members, and collective calls can strain the fabric, as well as local
+and remote data buffers.
+.PP
+Collective operations are an area of heavy research, with dedicated
+libraries focused almost exclusively on implementing collective
+operations efficiently.
+Such libraries are a specific target of libfabric.
+The main objective of the libfabric collective APIs is to expose network
+acceleration features for implementing collectives to higher-level
+libraries and applications.
+It is recommended that applications needing collective communication
+target higher-level libraries, such as MPI, instead of using libfabric
+collective APIs for that purpose.
+.SH AUTHORS
+OpenFabrics.
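
The "Collective operations" text in the fi_arch.7 hunk above describes data being collected from multiple peers, combined, and the result distributed to all peers. A single-process simulation of that pattern is sketched below. It does not call libfabric: `sim_allreduce_sum` is a hypothetical name, each array slot stands in for one peer's contribution, and summation is chosen only as an example combining operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Single-process simulation of an allreduce-style sum (not libfabric API).
 * peer_vals[i] holds the value contributed by simulated peer i. */
static void sim_allreduce_sum(uint64_t *peer_vals, size_t npeers)
{
    uint64_t total = 0;

    /* Collect and combine: fold every peer's contribution into one value. */
    for (size_t i = 0; i < npeers; i++)
        total += peer_vals[i];

    /* Distribute: every peer ends up holding the combined result. */
    for (size_t i = 0; i < npeers; i++)
        peer_vals[i] = total;
}
```

Calling the function on an array holding 1, 2, 3, and 4 leaves 10 in every slot. In a real fabric the collect and distribute phases involve network exchanges among the peers, which is why the man page text stresses coordination among all participants.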