As mentioned above, the first "event" in each ND-JSON stream contains metadata to fold into subsequent events. The metadata that agents should collect includes are described in the following sub-sections.
- service metadata
- global labels (requires APM Server 7.2 or greater)
The process for proposing new metadata fields is detailed here.
System metadata relates to the host/container in which the service being monitored is running:
- hostname
- architecture
- operating system
- container ID
- kubernetes
- namespace
- node name
- pod name
- pod UID
On Linux, the container ID and some of the Kubernetes metadata can be extracted by parsing /proc/self/cgroup
. For each line in the file, we split the line according to the format "hierarchy-ID:controller-list:cgroup-path", extracting the "cgroup-path" part. We then attempt to extract information according to the following algorithm:
-
Split the path into dirname/basename (i.e. on the final slash)
-
If the basename ends with ".scope", check for a hyphen and remove everything up to and including that. This allows us to match
.../docker-<container-id>.scope
as well as.../<container-id>
. -
Attempt to extract the Kubernetes pod UID from the dirname by matching one of the following regular expressions:
(?:^/kubepods[\\S]*/pod([^/]+)$)
(?:^/kubepods\.slice/(kubepods-[^/]+\.slice/)?kubepods[^/]*-pod([^/]+)\.slice$)
The first capturing group in the first case and the second capturing group in the second case is the pod UID. In the latter case, we must unescape underscores (
_
) to hyphens (-
) in the pod UID. If we match a pod UID then we record the hostname as the pod name since, by default, Kubernetes will set the hostname to the pod name. Finally, we record the basename as the container ID without any further checks. -
If we did not match a Kubernetes pod UID above, then we check if the basename matches one of the following regular expressions:
^[[:xdigit:]]{64}$
^[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4,}$
If we match, then the basename is assumed to be a container ID.
If the Kubernetes pod name is not the hostname, it can be overridden by the KUBERNETES_POD_NAME
environment variable, using the Downward API. In a similar manner, you can inform the agent of the node name and namespace, using the environment variables KUBERNETES_NODE_NAME
and KUBERNETES_NAMESPACE
.
Process level metadata relates to the process running the service being monitored:
- process ID
- parent process ID
- process arguments
- process title (e.g. "node /app/node_")
Service metadata relates to the service/application being monitored:
- service name and version
- environment name ("production", "development", etc.)
- agent name (e.g. "ruby") and version (e.g. "2.8.1")
- language name (e.g. "ruby") and version (e.g. "2.5.3")
- runtime name (e.g. "jruby") and version (e.g. "9.2.6.0")
- framework name (e.g. "flask") and version (e.g. "1.0.2")
For official Elastic agents, the agent name should just be the name of the language for which the agent is written, in lower case.
Cloud provider metadata is collected from local cloud provider metadata services:
- availability_zone
- account
- id
- name
- instance
- id
- name
- machine.type
- project
- id
- name
- provider (required)
- region
This metadata collection is controlled by a configuration value,
CLOUD_PROVIDER
. The default is auto
, which automatically detects the cloud
provider. If set to none
, no cloud metadata will be generated. If set to
any of aws
, gcp
, or azure
, metadata will only be generated from the
chosen provider.
Any intake API requests to the APM server should be delayed until this metadata is available.
A sample implementation of this metadata collection is available in the Python agent.
Events sent by the agents can have labels associated, which may be useful for custom aggregations, or document-level access control. It is possible to add "global labels" to the metadata, which are labels that will be applied to all events sent by an agent. These are only understood by APM Server 7.2 or greater.
Global labels can be specified via the environment variable ELASTIC_APM_GLOBAL_LABELS
, formatted as a comma-separated list of key=value
pairs.