Docker plugin together with Prometheus fails for some containers due to inconsistent label cardinality #1263

lightblu · 2016-05-25T09:04:21Z

Bug report

System info:

Amazon Linux AMI release 2016.03 (based on RHEL)
Docker: Version: 1.9.1 (build a34a1d5/1.9.1), API version: 1.21
Telegraf - version 0.13.0

Steps to reproduce:

Run Telegraf with the docker input plugin and the prometheus output against containers that do not all carry the same set of Docker labels.

Expected behavior:

Prometheus should report stats about all containers.

Actual behavior:

Prometheus output reports only stats about some containers.

Additional info:

Only some of my containers show up in the Prometheus output; for most of them Telegraf logs an inconsistent label cardinality error, e.g.:

2016/05/24 18:58:30 ERROR Getting metric in Prometheus output, key: docker_container_net_tx_errors, labels: map[network:eth0 container_image:SOMEIMAGE container_name:SOMENAME host:SOMEIP], err: inconsistent label cardinality

When I look at a container that is successfully reported in the Prometheus output:
docker_container_net_tx_packets{License="GPLv2",Vendor="CentOS",container_image="SOMEIMAGE2",container_name="SOMENAME2",host="SOMEIP",network="eth0"} 773869
I see that in this case the container has the additional labels License and Vendor.

Verifying with curl --unix-socket /var/run/docker.sock http:/containers/json | jq '.[].Labels' they indeed have inconsistent labels:
{}
{}
{}
{ "build-date": "2016-03-31", "license": "GPLv2", "name": "CentOS Base Image", "vendor": "CentOS" }
{ "License": "GPLv2", "Vendor": "CentOS" }
{ "License": "GPLv2", "Vendor": "CentOS" }
{ "License": "GPLv2", "Vendor": "CentOS" }

The Docker documentation says that arbitrary labels can be set at build time and at container creation. A workaround would be to make sure all my containers have the same label set, but unfortunately labels cannot be changed or removed on a running container.
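
To illustrate the failure mode described above, here is a toy Python model of a metric family whose label names are fixed by the first sample it receives. It is only a sketch: `ToyMetricVec` and `report` are hypothetical names, the check compares label names rather than reproducing the Prometheus client's real logic, and the value `0` for SOMENAME is made up for the example.

```python
class InconsistentLabelCardinality(Exception):
    """Raised when a sample's label names differ from the first sample's."""


class ToyMetricVec:
    """Toy metric family; NOT the Prometheus client's actual implementation."""

    def __init__(self, name):
        self.name = name
        self.label_names = None  # fixed by the first sample that arrives
        self.samples = {}

    def set(self, labels, value):
        names = tuple(sorted(labels))
        if self.label_names is None:
            self.label_names = names
        elif names != self.label_names:
            raise InconsistentLabelCardinality(
                f"{self.name}: expected {self.label_names}, got {names}"
            )
        self.samples[tuple(labels[n] for n in names)] = value


def report(vec, containers):
    """Record every container; return the names of those that were rejected."""
    rejected = []
    for labels, value in containers:
        try:
            vec.set(labels, value)
        except InconsistentLabelCardinality:
            rejected.append(labels["container_name"])
    return rejected


containers = [
    # Label set taken from the successfully reported metric above.
    ({"container_name": "SOMENAME2", "network": "eth0",
      "License": "GPLv2", "Vendor": "CentOS"}, 773869),
    # Container without Docker labels; the value 0 is a placeholder.
    ({"container_name": "SOMENAME", "network": "eth0"}, 0),
]

vec = ToyMetricVec("docker_container_net_tx_packets")
rejected = report(vec, containers)
print(rejected)
```

Whichever label set arrives first wins, so which containers get dropped depends on ordering, which matches the symptom that only some containers are reported.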

Proposal:

Quick: Do not add the labels that Docker reports for a container as metric labels. I assume that, as in this case, they are irrelevant most of the time.

Bonus: Make the set of container labels that the docker plugin picks up configurable, and make sure that a configured label that is not present on a container still gets added with a default value (e.g. "missing"). That way, anyone who wants to use labels for some fancy grouping can still do so.


sparrc · 2016-05-25T09:35:41Z

Docker Labels were actually added to metrics because users were specifically asking for them. This is a tough problem because we'd have to track the "cardinality" of every single metric, which seems like overkill.

lightblu · 2016-05-25T09:42:30Z

If some users really need them, then I would go for the "bonus" option, e.g. have a container_labels option in the config:

# Read metrics about docker containers
[[inputs.docker]]
  ## Docker Endpoint
  ##   To use TCP, set endpoint = "tcp://[ip]:[port]"
  ##   To use environment variables (ie, docker-machine), set endpoint = "ENV"
  endpoint = "unix:///var/run/docker.sock"
  ## Only collect metrics for these containers, collect all if empty
  container_names = []
  ## Only collect these container labels from docker daemon, collect all if empty but note that
  ## this will break prometheus output if containers have inconsistent label sets.
  container_labels = ['my_label_id_like_to_include']
  ## Timeout for docker list, info, and stats commands
  timeout = "5s"

Then, instead of adding all labels to a metric, add only those defined in container_labels (with a default value if one is missing on a container); no cardinality tracking should be required in that case. Unfortunately, my Go experience is currently far too limited to do this myself, but I might find some spare time at the weekend.
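
The selection rule proposed here can be sketched in a few lines of Python. This is an illustration of the idea only, not Telegraf code: `select_container_labels` is a hypothetical helper, and the "empty list means collect all" behaviour is taken from the config comment above.

```python
def select_container_labels(container_labels, wanted, default="missing"):
    """Return the label set to attach to a metric.

    An empty `wanted` list keeps every label (the proposed default).
    Otherwise every wanted label is emitted, using `default` when the
    container does not carry it, so all containers share one label set.
    """
    if not wanted:
        return dict(container_labels)
    return {name: container_labels.get(name, default) for name in wanted}


# Label sets as returned by the /containers/json query in the report.
containers = [
    {},
    {"build-date": "2016-03-31", "license": "GPLv2",
     "name": "CentOS Base Image", "vendor": "CentOS"},
    {"License": "GPLv2", "Vendor": "CentOS"},
]

wanted = ["License", "Vendor"]
selected = [select_container_labels(c, wanted) for c in containers]
for labels in selected:
    print(labels)
```

Docker label keys are case-sensitive, so in this sketch the container labelled `license`/`vendor` also falls back to the default for `License`/`Vendor`.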

sparrc · 2016-05-25T10:28:24Z

Yes, we could add an option, but I think it should default to adding all labels, because Prometheus is the only output plugin that has this problem with metrics having different cardinalities. IMO it's a very strange limitation to put on users....

lightblu · 2016-05-25T12:00:05Z

Good news: a colleague with Go experience quickly built what I needed, and we will submit it as a PR soon.
One slightly unaesthetic point: to keep the default of "add all labels", there is no way to tell Telegraf to add no labels at all from Container.Labels (because an empty array means add all), so you need to specify at least one "fake" label to get my desired behaviour. Do you have a suggestion for this? Otherwise I think it is fine this way (as you said, this is only a limitation of Prometheus; btw I agree it is strange, but it is owed to their internal model and I think it is one reason for their claimed efficiency, whatever).

lightblu · 2016-05-25T14:15:57Z

Ok, this works fine if you specify
taginclude = ["network", "container_image", "container_name", "device", "cpu", "unit"]
in the inputs.docker section of the configuration.
Maybe you could add a note to the documentation under Tags that every container label also gets added as a tag and that one needs to specify this for Prometheus, or maybe a note should be added to the Prometheus output docs to be careful with plugins that emit arbitrary tags. Thanks for your time!
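
As a rough picture of what the taginclude workaround does to a metric's tags, here is a small Python sketch. It is an assumption-laden illustration, not Telegraf's filter code: `apply_taginclude` is a hypothetical helper, glob-style matching via `fnmatchcase` is assumed, and the host tag is left out of the toy input.

```python
from fnmatch import fnmatchcase


def apply_taginclude(tags, patterns):
    """Keep only the tags whose key matches one of the include patterns."""
    return {
        key: value
        for key, value in tags.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    }


taginclude = ["network", "container_image", "container_name",
              "device", "cpu", "unit"]

# Tags from the successfully reported metric in the report (host omitted).
tags = {
    "License": "GPLv2",
    "Vendor": "CentOS",
    "container_image": "SOMEIMAGE2",
    "container_name": "SOMENAME2",
    "network": "eth0",
}

filtered = apply_taginclude(tags, taginclude)
print(sorted(filtered))
```

Because the container-specific labels are dropped, every container ends up with the same tag keys for a given measurement. Note that the filter only removes tags; it does not select which measurements are collected.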

kamalvpm777 · 2017-02-23T05:55:28Z

Hello,
Using the taginclude option below, I am trying to pull only the CPU-related metrics for docker, but have had no luck so far. Could you please check and advise what the right taginclude parameter is to pull only docker CPU metrics? Any help will be greatly appreciated.

## telegraf conf
[[inputs.docker]]
  taginclude = ["container_image", "container_name", "cpu"]
  ## Docker Endpoint
  ##   To use TCP, set endpoint = "tcp://[ip]:[port]"
  ##   To use environment variables (ie, docker-machine), set endpoint = "ENV"
  endpoint = "unix:///var/run/docker.sock"
  ## Only collect metrics for these containers, collect all if empty
  container_names = []

This was referenced May 25, 2016

Added ContainerLabels parameter for filtering the container labels #1263 #1269

Closed

Added container_labels parameter #1270

Closed

lightblu closed this as completed May 25, 2016
