The Tracing API consists of these main classes:
- TracerProvider is the entry point of the API. It provides access to Tracers.
- Tracer is the class responsible for creating Spans.
- Span is the API to trace an operation.
While languages and platforms have different ways of representing data, this section defines some generic requirements for this API.
OpenTelemetry can operate on time values up to nanosecond (ns) precision. The representation of those values is language specific.
A timestamp is the time elapsed since the Unix epoch.
- The minimal precision is milliseconds.
- The maximal precision is nanoseconds.
A duration is the elapsed time between two events.
- The minimal precision is milliseconds.
- The maximal precision is nanoseconds.
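The precision requirements above can be illustrated with a minimal Python sketch. It assumes timestamps are held as integer nanoseconds since the Unix epoch; the helper names `now_unix_nanos`, `duration_nanos`, and `to_millis` are hypothetical and not part of any API defined here.

```python
import time

NANOS_PER_MILLI = 1_000_000


def now_unix_nanos() -> int:
    """Current timestamp as integer nanoseconds since the Unix epoch."""
    return time.time_ns()


def duration_nanos(start_ns: int, end_ns: int) -> int:
    """Duration between two timestamps, in nanoseconds."""
    return end_ns - start_ns


def to_millis(nanos: int) -> int:
    """Truncate a nanosecond value to the minimal (millisecond) precision."""
    return nanos // NANOS_PER_MILLI
```

Keeping values as integers avoids the rounding that floating-point seconds would introduce at nanosecond precision.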
Tracers can be accessed with a TracerProvider.
In implementations of the API, the TracerProvider is expected to be the
stateful object that holds any configuration.
Normally, the TracerProvider is expected to be accessed from a central place.
Thus, the API SHOULD provide a way to set/register and access
a global default TracerProvider.
Notwithstanding any global TracerProvider, some applications may want to or
have to use multiple TracerProvider instances,
e.g. to have different configuration (like SpanProcessors) for each
(and consequently for the Tracers obtained from them),
or because it's easier with dependency injection frameworks.
Thus, implementations of TracerProvider SHOULD allow creating an arbitrary
number of TracerProvider instances.
The TracerProvider MUST provide functions to:
- Get a Tracer
That API MUST accept the following parameters:
- name (required): This name must identify the instrumentation library (e.g. io.opentelemetry.contrib.mongodb) and not the instrumented library. If an invalid name (null or empty string) is specified, a working default Tracer implementation is returned as a fallback rather than returning null or throwing an exception. A library implementing the OpenTelemetry API may also ignore this name and return a default instance for all calls if it does not support "named" functionality (e.g. an implementation which is not even observability-related). A TracerProvider could also return a no-op Tracer here if application owners configure the SDK to suppress telemetry produced by this library.
- version (optional): Specifies the version of the instrumentation library (e.g. 1.0.0).
It is unspecified whether or under which conditions the same or different
Tracer instances are returned from this function.
Implementations MUST NOT require users to repeatedly obtain a Tracer
with the same name+version to pick up configuration changes.
This can be achieved either by allowing a Tracer to work with an outdated configuration or
by ensuring that new configuration also applies to previously returned Tracers.
Note: This could, for example, be implemented by storing any mutable
configuration in the TracerProvider and having Tracer implementation objects
have a reference to the TracerProvider from which they were obtained.
If configuration must be stored per-tracer (such as disabling a certain tracer),
the tracer could, for example, do a look-up with its name+version in a map in
the TracerProvider, or the TracerProvider could maintain a registry of all
returned Tracers and actively update their configuration if it changes.
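One way the note above could play out is sketched below in Python. This is an illustration under stated assumptions, not a prescribed design: the `disabled_libraries` set, the `enabled` property, and the method name `get_tracer` are hypothetical. The point is only that the Tracer keeps a reference to its provider, so configuration changes reach Tracers that were already handed out.

```python
from typing import Optional


class TracerProvider:
    """Hypothetical provider that owns all mutable configuration."""

    def __init__(self) -> None:
        # Assumed per-library switch; names of libraries whose telemetry is suppressed.
        self.disabled_libraries = set()

    def get_tracer(self, name: str, version: Optional[str] = None) -> "Tracer":
        # An invalid name (None or empty) still yields a working Tracer
        # instead of returning None or raising.
        return Tracer(self, name or "", version)


class Tracer:
    """Hypothetical tracer that holds no configuration of its own."""

    def __init__(self, provider: TracerProvider, name: str, version: Optional[str]) -> None:
        self._provider = provider
        self.name = name
        self.version = version

    @property
    def enabled(self) -> bool:
        # Looked up on every access, so later configuration changes
        # apply to tracers that were obtained earlier.
        return self.name not in self._provider.disabled_libraries
```

With this shape, a user who obtained a Tracer once never needs to fetch it again after the provider's configuration changes.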
Tracing Context Utilities contains all operations within tracing that
modify the Context.
As these utilities operate solely on the context API, they MAY be exposed as static methods on the trace module instead of a class.
The Tracing Context Utilities MUST provide the following functions:
- Get the currently active span
- Set the currently active span
The above methods MUST be equivalent to a single parameterized method call of
the Context management system.
The tracer is responsible for creating Spans.
Note that Tracers should usually not be responsible for configuration.
This should be the responsibility of the TracerProvider instead.
The Tracer MUST provide functions to:
- Create a new Span (see the section on Span)
The Tracer MAY provide functions to:
- Get the currently active span
- Set the currently active span
These functions MUST delegate to the Tracing Context Utilities.
A Span represents a single operation within a trace. Spans can be nested to
form a trace tree. Each trace contains a root span, which typically describes
the entire operation and, optionally, one or more sub-spans for its sub-operations.
A Span encapsulates:
- TraceId
- SpanId
- TraceState
- TraceFlags
- The span name
- A parent span in the form of a Span that is stored in a Context, or null
- A SpanKind
- A start timestamp
- An end timestamp
- Attributes
- A list of Links to other Spans
- A list of timestamped Events
- A Status
The TraceId, SpanId, TraceState, and TraceFlags are immutable and set on creation.
The span name concisely identifies the work represented by the Span, for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation. The span name SHOULD be the most general string that identifies a (statistically) interesting class of Spans, rather than individual Span instances while still being human-readable. That is, "get_user" is a reasonable name, while "get_user/314159", where "314159" is a user ID, is not a good name due to its high cardinality. Generality SHOULD be prioritized over human-readability.
For example, here are potential span names for an endpoint that gets hypothetical account information:
Span Name | Guidance |
---|---|
get | Too general |
get_account/42 | Too specific |
get_account | Good, and account_id=42 would make a nice Span attribute |
get_account/{accountId} | Also good (using the "HTTP route") |
The Span's start and end timestamps reflect the elapsed real time of the
operation.
For example, if a span represents a request-response cycle (e.g. HTTP or an RPC), the span should have a start time that corresponds to the start time of the first sub-operation, and an end time of when the final sub-operation is complete. This includes:
- receiving the data from the request
- parsing of the data (e.g. from a binary or json format)
- any middleware or additional processing logic
- business logic
- construction of the response
- sending of the response
Child spans (or in some cases events) may be created to represent sub-operations which require more detailed observability. Child spans should measure the timing of the respective sub-operation, and may add additional attributes.
A Span's start time SHOULD be set to the current time on span
creation. After the Span is created, it SHOULD be possible to
change its name, set its Attributes, and add Links and Events. These
MUST NOT be changed after the Span's end time has been set.
Spans are not meant to be used to propagate information within a process. To
prevent misuse, implementations SHOULD NOT provide access to a Span's
attributes besides its identifiers.
Vendors may implement the Span interface to effect vendor-specific logic.
However, alternative implementations MUST NOT allow callers to create Spans
directly. All Spans MUST be created via a Tracer.
A Span has a two-part identifier associated with it, consisting of:
- TraceId: A valid trace identifier is a 16-byte array with at least one non-zero byte.
- SpanId: A valid span identifier is an 8-byte array with at least one non-zero byte.
The TraceId is a globally unique identifier for the entire trace, while a SpanId is
a unique identifier within a trace. The concatenation can be considered the global
identifier of an individual Span.
The API MUST allow retrieving the TraceId and SpanId from a span in the following forms:
- Hex - returns the lowercase hex encoded TraceId (result MUST be a 32-hex-character lowercase string) or SpanId (result MUST be a 16-hex-character lowercase string).
- Binary - returns the binary representation of the TraceId (result MUST be a 16-byte array) or SpanId (result MUST be an 8-byte array).
The API should not expose details about how they are internally stored.
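The validity and encoding rules above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes identifiers are passed around as `bytes`; a real API may store them differently, as long as the retrieval forms match.

```python
TRACE_ID_LENGTH = 16  # bytes
SPAN_ID_LENGTH = 8    # bytes


def is_valid_trace_id(trace_id: bytes) -> bool:
    """A valid TraceId has 16 bytes, at least one of them non-zero."""
    return len(trace_id) == TRACE_ID_LENGTH and any(trace_id)


def is_valid_span_id(span_id: bytes) -> bool:
    """A valid SpanId has 8 bytes, at least one of them non-zero."""
    return len(span_id) == SPAN_ID_LENGTH and any(span_id)


def trace_id_hex(trace_id: bytes) -> str:
    """Lowercase hex form of a TraceId (32 characters)."""
    if len(trace_id) != TRACE_ID_LENGTH:
        raise ValueError("TraceId must be exactly 16 bytes")
    return trace_id.hex()  # bytes.hex() always produces lowercase digits


def span_id_hex(span_id: bytes) -> str:
    """Lowercase hex form of a SpanId (16 characters)."""
    if len(span_id) != SPAN_ID_LENGTH:
        raise ValueError("SpanId must be exactly 8 bytes")
    return span_id.hex()
```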
TraceFlags contain some additional propagated information about the span. Unlike TraceState values,
TraceFlags are present in all spans. The current version of the specification
only supports a single flag called sampled.
TraceState carries vendor-specific propagated data, represented as a list
of key-value pairs. TraceState allows multiple tracing
systems to participate in the same trace. It is fully described in the W3C Trace Context
specification.
TraceState is represented by an immutable list of string key/value pairs and
formally defined by the W3C Trace Context specification.
The Tracing API MUST provide at least the following operations on TraceState:
- Return an empty TraceState
- Get value for a given key
- Create a copy with an added key/value pair
- Create a copy with an updated value for an existing key
- Create a copy with a removed key/value pair
These operations MUST follow the rules described in the W3C Trace Context specification.
TraceState MUST at all times be valid according to the rules specified in the W3C Trace Context specification.
Every mutating operation MUST validate input parameters.
If an invalid value is passed, the operation MUST NOT return a TraceState containing invalid data
and MUST follow the general error handling guidelines (e.g. it usually must not return null or throw an exception).
Please note, since TraceState is fixed during span creation, it is not possible to update a span with a new TraceState.
Such changes then make sense only right before Context propagation
or telemetry data exporting.
In both cases, Propagators and SpanExporters may create a modified TraceState
copy before serializing it to the wire.
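The TraceState operations listed earlier could look roughly like the Python sketch below. Two assumptions are worth flagging: the regular expressions are a deliberately simplified approximation of the W3C key and value grammar, not the full rules, and the choice to return the unchanged instance on invalid input is just one way to avoid returning null or throwing. The class and method names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Simplified approximations of the W3C Trace Context grammar (assumption).
_KEY_RE = re.compile(r"[a-z0-9][a-z0-9_\-*/@]{0,255}")
_VALUE_RE = re.compile(
    r"[\x20-\x2b\x2d-\x3c\x3e-\x7e]{0,255}[\x21-\x2b\x2d-\x3c\x3e-\x7e]"
)


class TraceState:
    """Immutable list of string key/value pairs."""

    def __init__(self, entries: Tuple[Tuple[str, str], ...] = ()) -> None:
        self._entries = entries

    def get(self, key: str) -> Optional[str]:
        for existing_key, value in self._entries:
            if existing_key == key:
                return value
        return None

    def set(self, key: str, value: str) -> "TraceState":
        """Copy with the key added or updated; invalid input yields self unchanged."""
        if not (_KEY_RE.fullmatch(key) and _VALUE_RE.fullmatch(value)):
            return self
        rest = tuple((k, v) for k, v in self._entries if k != key)
        # New or updated entries go to the front of the list.
        return TraceState(((key, value),) + rest)

    def remove(self, key: str) -> "TraceState":
        """Copy without the given key."""
        return TraceState(tuple((k, v) for k, v in self._entries if k != key))
```

Since every operation returns a new object, a Propagator or SpanExporter can derive a modified copy right before serialization without touching the TraceState held by the span.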
There MUST NOT be any API for creating a Span other than with a Tracer.
Span creation MUST NOT set the newly created Span as the currently
active Span by default, but this functionality MAY be offered additionally
as a separate operation.
The API MUST accept the following parameters:
- The span name. This is a required parameter.
- The parent Context or an indication that the new Span should be a root Span. The API MAY also have an option for implicitly using the current Context as parent as a default behavior. This API MUST NOT accept a Span as parent, only a full Context. The semantic parent of the Span MUST be determined according to the rules described in Determining the Parent Span from a Context.
- SpanKind, defaulting to SpanKind.Internal if not specified.
- Attributes. Additionally, these attributes may be used to make a sampling decision as noted in sampling description. An empty collection will be assumed if not specified. Whenever possible, users SHOULD set any already known attributes at span creation instead of calling SetAttribute later.
- Links - see API definition here. An empty list will be assumed if not specified.
- Start timestamp, defaulting to the current time. This argument SHOULD only be set when span creation time has already passed. If the API is called at the moment of a Span's logical start, the API user MUST NOT explicitly set this argument.
Each span has zero or one parent span and zero or more child spans, which
represent causally related operations. A tree of related spans comprises a
trace. A span is said to be a root span if it does not have a parent. Each
trace includes a single root span, which is the shared ancestor of all other
spans in the trace. Implementations MUST provide an option to create a Span as
a root span, and MUST generate a new TraceId for each root span created.
For a Span with a parent, the TraceId MUST be the same as the parent.
Also, the child span MUST inherit all TraceState values of its parent by default.
A Span is said to have a remote parent if it is the child of a Span
created in another process. Each propagator's deserialization must set
IsRemote to true on a parent Span so that Span creation knows if the
parent is remote.
When a new Span is created from a Context, the Context may contain a Span
representing the currently active instance, which will be used as the parent.
This may be a Propagated Span added by a Propagator.
If there is no Span in the Context, the newly created Span will be a root span.
During Span creation, the user MUST have the ability to record links to other Spans.
Linked Spans can be from the same or a different trace. See Links description.
Links cannot be added after Span creation.
A Link is defined by the following properties:
- (Required) the Span to link to, or a Context containing the Span to link to, or the TraceId, SpanId, TraceFlags, and TraceState of the Span to link to
- (Optional) One or more Attributes as defined here.
The Link SHOULD be an immutable type.
The Span creation API MUST provide:
- An API to record a single Link where the Link properties are passed as arguments. This MAY be called AddLink.
Links SHOULD preserve the order in which they're set.
With the exception of the accessors and recording status, none of the below may be called after
the Span is finished.
A Span must allow retrieving TraceId and SpanId, as described in
Retrieving the TraceID and SpanID, and provide simple accessors for
TraceFlags and TraceState.
An API called IsValid on a Span, that returns a boolean value, which is true
if the Span has a non-zero TraceID and a non-zero SpanID, MUST be provided.
An API called IsRemote on a Span, that returns a boolean value, which is true
if the Span was propagated from a remote parent, MUST be provided.
IsRemote must return true unless the SpanId for the Span
was generated by this API implementation.
When IsRemote is true, IsRecording is always false.
IsRecording returns true if this Span is recording information like events with the
AddEvent operation, attributes using SetAttributes, status with SetStatus, etc.
There should be no parameter.
This flag SHOULD be used to avoid expensive computations of a Span's attributes or
events when a Span is definitely not recorded. Note that any child
span's recording is determined independently from the value of this flag
(typically based on the sampled flag in the TraceFlags of the Span).
This flag may be true despite the entire trace being sampled out. This
allows recording and processing information about the individual Span without
sending it to the backend. An example of this scenario may be recording and
processing of all incoming requests for the processing and building of
SLA/SLO latency charts while sending only a subset - sampled spans - to the
backend. See also the sampling section of SDK design.
Users of the API should only access the IsRecording property when
instrumenting code and never access SampledFlag unless used in context
propagators.
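The intended use of IsRecording in instrumentation code might look like the Python sketch below. The `Span` class is a hypothetical stand-in with just enough behavior for the example, and the attribute key `payload.summary` is invented for illustration.

```python
class Span:
    """Hypothetical minimal span used only for this illustration."""

    def __init__(self, recording: bool) -> None:
        self._recording = recording
        self.attributes = {}

    def is_recording(self) -> bool:
        return self._recording

    def set_attribute(self, key: str, value) -> None:
        # Non-recording spans drop attributes silently.
        if self._recording:
            self.attributes[key] = value


def annotate(span: Span, payload: bytes, summarize) -> None:
    # Only pay for the summary when the span will actually record it.
    if span.is_recording():
        span.set_attribute("payload.summary", summarize(payload))
```

The guard matters only when computing the attribute value is costly; cheap attributes can be set unconditionally, since a non-recording span discards them anyway.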
A Span MUST have the ability to set Attributes associated with it.
The Span interface MUST provide:
- An API to set a single Attribute where the attribute properties are passed as arguments. This MAY be called SetAttribute. To avoid extra allocations some implementations may offer a separate API for each of the possible value types.
Setting an attribute with the same key as an existing attribute SHOULD overwrite the existing attribute's value.
Note that the OpenTelemetry project documents certain "standard attributes" that have prescribed semantic meanings.
Note that Samplers can only consider information already present during span creation. Any changes done later, including new or changed attributes, cannot change their decisions.
A Span MUST have the ability to add events. Events have a time associated
with the moment when they are added to the Span.
An Event is defined by the following properties:
- Name of the event.
- A timestamp for the event. Either the time at which the event was added or a custom timestamp provided by the user.
- Attributes further describing the event.
The Event SHOULD be an immutable type.
The Span interface MUST provide:
- An API to record a single Event where the Event properties are passed as arguments. This MAY be called AddEvent. This API takes the name of the event, optional Attributes and an optional Timestamp which can be used to specify the time at which the event occurred. If no custom timestamp is provided by the user, the implementation automatically sets the time at which this API is called on the event.
Events SHOULD preserve the order in which they are recorded. This will typically match the ordering of the events' timestamps, but events may be recorded out-of-order using custom timestamps.
Consumers should be aware that an event's timestamp might be before the start or after the end of the span if custom timestamps were provided by the user for the event or when starting or ending the span. The specification does not require any normalization if provided timestamps are out of range.
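A minimal Python sketch of the event behavior described above follows. It assumes nanosecond integer timestamps and uses hypothetical names (`Event`, `Span.add_event`); it shows the default timestamp, the immutable event type, and the fact that record order is preserved even when custom timestamps are out of order.

```python
import time
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Event:
    """Immutable event: a name, a timestamp, and read-only attributes."""
    name: str
    timestamp: int  # nanoseconds since the Unix epoch (assumed unit)
    attributes: Mapping[str, Any]


class Span:
    """Hypothetical span that only knows how to collect events."""

    def __init__(self) -> None:
        self.events = []

    def add_event(
        self,
        name: str,
        attributes: Optional[Mapping[str, Any]] = None,
        timestamp: Optional[int] = None,
    ) -> None:
        if timestamp is None:
            # No custom timestamp: use the time of this call.
            timestamp = time.time_ns()
        frozen = MappingProxyType(dict(attributes or {}))
        # Append in call order; no sorting or normalization is applied.
        self.events.append(Event(name, timestamp, frozen))
```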
Note that the OpenTelemetry project documents certain "standard event names and keys" which have prescribed semantic meanings.
Note that RecordException is a specialized variant of
AddEvent for recording exception events.
Sets the Status of the Span. If used, this will override the
default Span status, which is Unset.
Only the value of the last call will be recorded, and implementations are free to ignore previous calls.
The Span interface MUST provide:
- An API to set the Status where the new status is the only argument. This SHOULD be called SetStatus.
Updates the Span name. Upon this update, any sampling behavior based on Span
name will depend on the implementation.
Note that Samplers can only consider information already present during span creation. Any changes done later, including updated span name, cannot change their decisions.
Alternatives to updating the name are late Span creation, where the Span is
started with an explicit timestamp from the past once the final
Span name is known, or reporting a Span with the desired name as a child Span.
Required parameters:
- The new span name, which supersedes whatever was passed in when the Span was started
Finish the Span. This call will take the current timestamp to set as the Span's
end time. Implementations MUST ignore all subsequent calls to End (there might
be exceptions when the Tracer is streaming events and has no mutable state associated
with the Span).
A call to End on a Span MUST NOT have any effect on child spans. Those may
still be running and can be ended later.
Parameters:
- (Optional) Timestamp to explicitly set the end timestamp
This API MUST be non-blocking.
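The End semantics above (optional explicit timestamp, later calls ignored) can be sketched as follows. The `Span` class and its field names are hypothetical, and timestamps are assumed to be integer nanoseconds.

```python
import time
from typing import Optional


class Span:
    """Hypothetical span that tracks only its end time."""

    def __init__(self) -> None:
        self.end_time: Optional[int] = None

    def end(self, end_time: Optional[int] = None) -> None:
        if self.end_time is not None:
            # Already ended: ignore every subsequent call.
            return
        # Use the explicit timestamp if given, otherwise the current time.
        self.end_time = time.time_ns() if end_time is None else end_time
```

The method does no I/O and takes no locks in this sketch, which keeps it non-blocking; a real implementation would hand the finished span off without waiting.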
To facilitate recording an exception, languages SHOULD provide a
RecordException method if the language uses exceptions.
This is a specialized variant of AddEvent,
so for anything not specified here, the same requirements as for AddEvent apply.
The signature of the method is to be determined by each language
and can be overloaded as appropriate.
The method MUST record an exception as an Event with the conventions outlined in
the exception semantic conventions document.
The minimum required argument SHOULD be no more than only an exception object.
If RecordException is provided, the method MUST accept an optional parameter
to provide any additional event attributes
(this SHOULD be done in the same way as for the AddEvent method).
If attributes with the same name would be generated by the method already,
the additional attributes take precedence.
Note: RecordException may be seen as a variant of AddEvent with
additional exception-specific parameters and all other parameters being optional
(because they have defaults from the exception semantic convention).
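As a rough illustration of RecordException layered on AddEvent, consider the Python sketch below. The event name `exception` and the attribute keys `exception.type`, `exception.message`, and `exception.stacktrace` are assumed here to match the exception semantic conventions, which this section references but does not reproduce; `RecordingSpan` and `record_exception` are hypothetical names.

```python
import traceback
from typing import Any, Mapping, Optional


class RecordingSpan:
    """Hypothetical span exposing only an AddEvent-like method."""

    def __init__(self) -> None:
        self.events = []

    def add_event(self, name: str, attributes: Mapping[str, Any]) -> None:
        self.events.append((name, dict(attributes)))


def record_exception(
    span: RecordingSpan,
    exc: BaseException,
    attributes: Optional[Mapping[str, Any]] = None,
) -> None:
    # Attributes derived from the exception itself (assumed convention keys).
    event_attributes = {
        "exception.type": type(exc).__qualname__,
        "exception.message": str(exc),
        "exception.stacktrace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }
    # Caller-supplied attributes take precedence over generated ones.
    event_attributes.update(attributes or {})
    span.add_event("exception", event_attributes)
```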
Span lifetime represents the process of recording the start and the end timestamps to the Span object:
- The start time is recorded when the Span is created.
- The end time needs to be recorded when the operation is ended.
The start and end times, as well as Event timestamps, MUST be recorded at the time the corresponding API is called.
The API MUST provide an operation for creating an object implementing the Span interface,
given a TraceId, SpanId, TraceFlags, and TraceState. This is done in order to expose it in operations such
as in-process Span propagation.
If a new type is required for supporting this operation, it SHOULD be named PropagatedSpan.
The behavior is defined as follows:
- IsRecording MUST return false to signal that events, attributes and other elements are not being recorded, i.e. they are being dropped.
- IsRemote MUST return true to signal the Span corresponds to a remote span.
The remaining functionality of Span MUST be defined as no-op operations.
This functionality MUST be fully implemented in the API, and SHOULD NOT be overridable.
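A PropagatedSpan along the lines described above might be sketched like this in Python. The method names mirror the operations named in this document, but their exact spelling and signatures are assumptions for illustration.

```python
from typing import Any, Tuple


class PropagatedSpan:
    """Carries identifiers only; every recording operation is a no-op."""

    def __init__(
        self,
        trace_id: bytes,
        span_id: bytes,
        trace_flags: int = 0,
        trace_state: Tuple[Tuple[str, str], ...] = (),
    ) -> None:
        self.trace_id = trace_id
        self.span_id = span_id
        self.trace_flags = trace_flags
        self.trace_state = trace_state

    def is_recording(self) -> bool:
        return False  # events, attributes, and status are dropped

    def is_remote(self) -> bool:
        return True  # corresponds to a remote span

    # The remaining Span operations intentionally do nothing.
    def set_attribute(self, key: str, value: Any) -> None:
        pass

    def add_event(self, name: str, attributes: Any = None, timestamp: Any = None) -> None:
        pass

    def set_status(self, status: Any) -> None:
        pass

    def update_name(self, name: str) -> None:
        pass

    def end(self, end_time: Any = None) -> None:
        pass
```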
The Status interface represents the status of a finished Span. It's composed of
a canonical code, and an optional descriptive message.
StatusCanonicalCode represents the canonical set of status codes of a finished Span.
- Unset - The default status.
- Error - The operation contains an error.
- Ok - The operation has been validated by an Application developer or Operator to have completed successfully, or contain
The status code SHOULD remain unset, except for the following circumstances:
When the status is set to Error by Instrumentation Libraries, the status codes
SHOULD be documented and predictable. The status code should only be set to Error
according to the rules defined within the semantic conventions. For operations
not covered by the semantic conventions, Instrumentation Libraries SHOULD
publish their own conventions, including status codes.
Generally, Instrumentation Libraries SHOULD NOT set the status code to Ok,
unless explicitly configured to do so. Instrumentation Libraries SHOULD leave the
status code as Unset unless there is an error, as described above.
Application developers and Operators may set the status code to Ok.
Analysis tools SHOULD respond to an Ok status by suppressing any errors they
would otherwise generate, for example, to suppress noisy errors such as 404s.
The API MUST provide a way to create a new Status.
Required parameters
- StatusCanonicalCode of this Status.
Optional parameters
- Description of this Status.
Returns the StatusCanonicalCode of this Status.
Returns the description of this Status.
Languages should follow their usual conventions on whether to return null
or an empty string here if no description was given.
SpanKind describes the relationship between the Span, its parents,
and its children in a Trace. SpanKind describes two independent
properties that benefit tracing systems during analysis.
The first property described by SpanKind reflects whether the Span
is a remote child or parent. Spans with a remote parent are
interesting because they are sources of external load. Spans with a
remote child are interesting because they reflect a non-local system
dependency.
The second property described by SpanKind reflects whether a child
Span represents a synchronous call. When a child span is synchronous,
the parent is expected to wait for it to complete under ordinary
circumstances. It can be useful for tracing systems to know this
property, since synchronous Spans may contribute to the overall trace
latency. Asynchronous scenarios can be remote or local.
In order for SpanKind to be meaningful, callers should arrange that
a single Span does not serve more than one purpose. For example, a
server-side span should not be used directly as the parent of another
remote span. As a simple guideline, instrumentation should create a
new Span prior to extracting and serializing the span context for a
remote call.
These are the possible SpanKinds:
- SERVER: Indicates that the span covers server-side handling of a synchronous RPC or other remote request. This span is the child of a remote CLIENT span that was expected to wait for a response.
- CLIENT: Indicates that the span describes a synchronous request to some remote service. This span is the parent of a remote SERVER span and waits for its response.
- PRODUCER: Indicates that the span describes the parent of an asynchronous request. This parent span is expected to end before the corresponding child CONSUMER span, possibly even before the child span starts. In messaging scenarios with batching, tracing individual messages requires a new PRODUCER span per message to be created.
- CONSUMER: Indicates that the span describes the child of an asynchronous PRODUCER request.
- INTERNAL: Default value. Indicates that the span represents an internal operation within an application, as opposed to an operation with remote parents or children.
To summarize the interpretation of these kinds:
SpanKind | Synchronous | Asynchronous | Remote Incoming | Remote Outgoing |
---|---|---|---|---|
CLIENT | yes | | | yes |
SERVER | yes | | yes | |
PRODUCER | | yes | | maybe |
CONSUMER | | yes | maybe | |
INTERNAL | | | | |
For languages which support concurrent execution, the Tracing APIs provide specific guarantees and safeties. Not all API functions are safe to be called concurrently.
TracerProvider - all methods are safe to be called concurrently.
Tracer - all methods are safe to be called concurrently.
Span - All methods of Span are safe to be called concurrently.
Event - Events are immutable and safe to be used concurrently.
Link - Links are immutable and safe to be used concurrently.
The API layer MAY include the following Propagators:
- A TextMapPropagator implementing the W3C TraceContext Specification.
In general, in the absence of an installed SDK, the Trace API is a "no-op" API. This means that operations on a Tracer, or on Spans, should have no side effects and do nothing. However, there is one important exception to this general rule, and that is related to propagation of a Span.
The following cases must be considered when a new Span is requested to be created, especially in relation to the requested parent Span:
- A valid Span is specified as the parent of the new Span: The API MUST treat this parent context as the context for the newly created Span. This means that a Span that has been provided by a configured Propagator will be propagated through to any child span.
- No valid Span is specified as the parent of the new Span: The API MUST create a non-valid Span (both SpanID and TraceID are equivalent to being all zeros) for use by the API caller. This means that both the TraceID and the SpanID should be invalid.
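The two cases above can be sketched in Python as follows. This is only an illustration of API behavior without an SDK; `NonRecordingSpan`, `INVALID_SPAN`, and `start_span_without_sdk` are hypothetical names, and the parent is passed directly rather than through a Context to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NonRecordingSpan:
    trace_id: bytes
    span_id: bytes

    def is_valid(self) -> bool:
        # Valid means both identifiers have at least one non-zero byte.
        return any(self.trace_id) and any(self.span_id)


# All-zero identifiers mark the span as non-valid.
INVALID_SPAN = NonRecordingSpan(bytes(16), bytes(8))


def start_span_without_sdk(parent: Optional[NonRecordingSpan]) -> NonRecordingSpan:
    if parent is not None and parent.is_valid():
        # Carry the parent's identifiers forward so propagation keeps working.
        return NonRecordingSpan(parent.trace_id, parent.span_id)
    # No valid parent: hand back a non-valid span.
    return INVALID_SPAN
```

The first branch is what lets a service without an SDK still forward an incoming trace context to downstream calls.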