
Docker plugin together with Prometheus fails for some containers due to inconsistent label cardinality #1263

Closed
lightblu opened this issue May 25, 2016 · 6 comments


@lightblu
Contributor

Bug report

System info:

Amazon Linux AMI release 2016.03 (based on RHEL)
Docker: Version: 1.9.1 (build a34a1d5/1.9.1), API version: 1.21
Telegraf - version 0.13.0

Steps to reproduce:

  1. Run Telegraf with the docker input plugin, the prometheus output, and containers that do not all have the same labels.

Expected behavior:

Prometheus should report stats about all containers.

Actual behavior:

Prometheus output reports only stats about some containers.

Additional info:

Only some of my containers are reported, because for most of them the Prometheus output logs an inconsistent label cardinality error, e.g.:

2016/05/24 18:58:30 ERROR Getting metric in Prometheus output, key: docker_container_net_tx_errors, labels: map[network:eth0 container_image:SOMEIMAGE container_name:SOMENAME host:SOMEIP], err: inconsistent label cardinality

Looking at a container that is successfully reported in the Prometheus output:
docker_container_net_tx_packets{License="GPLv2",Vendor="CentOS",container_image="SOMEIMAGE2",container_name="SOMENAME2",host="SOMEIP",network="eth0"} 773869
In this case, the container has the additional labels License and Vendor.
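To make the failure mode concrete, here is a simplified, hypothetical model (not Telegraf's or the Prometheus client library's actual code) of an exporter that, like Prometheus, fixes the set of label names for a metric name from the first sample it sees and rejects later samples with a different set. The type and function names are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelRegistry is a hypothetical, simplified model of an exporter that
// requires every sample of a metric name to carry the same label names.
type labelRegistry struct {
	// labelSets maps a metric name to the sorted label names of its first sample.
	labelSets map[string]string
}

func newLabelRegistry() *labelRegistry {
	return &labelRegistry{labelSets: make(map[string]string)}
}

// labelSetKey returns the sorted, comma-joined label names of a sample.
func labelSetKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	return strings.Join(names, ",")
}

// Add records a sample. The first sample for a metric name fixes its label
// set; any later sample with a different label set is rejected.
func (r *labelRegistry) Add(metric string, labels map[string]string) error {
	k := labelSetKey(labels)
	first, seen := r.labelSets[metric]
	if !seen {
		r.labelSets[metric] = k
		return nil
	}
	if first != k {
		return fmt.Errorf("metric %s: inconsistent label cardinality (have [%s], got [%s])", metric, first, k)
	}
	return nil
}

func main() {
	r := newLabelRegistry()
	// A container whose image carries License and Vendor labels is seen first...
	withLabels := map[string]string{"License": "GPLv2", "Vendor": "CentOS", "container_name": "SOMENAME2", "network": "eth0"}
	// ...so a container without those labels is rejected for the same metric name.
	without := map[string]string{"container_name": "SOMENAME", "network": "eth0"}
	fmt.Println(r.Add("docker_container_net_tx_packets", withLabels)) // <nil>
	fmt.Println(r.Add("docker_container_net_tx_packets", without))
	// metric docker_container_net_tx_packets: inconsistent label cardinality (have [License,Vendor,container_name,network], got [container_name,network])
}
```

In this model, whichever label set arrives first wins, and every container with a different set of labels is dropped for that metric name.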

Checking with curl --unix-socket /var/run/docker.sock http:/containers/json | jq '.[].Labels' confirms that the containers indeed have inconsistent labels:
{}
{}
{}
{ "build-date": "2016-03-31", "license": "GPLv2", "name": "CentOS Base Image", "vendor": "CentOS" }
{ "License": "GPLv2", "Vendor": "CentOS" }
{ "License": "GPLv2", "Vendor": "CentOS" }
{ "License": "GPLv2", "Vendor": "CentOS" }

The Docker documentation states that arbitrary labels can be set at build time and at container creation. One workaround would be to make sure all my containers have the same label set, but unfortunately labels cannot be changed or removed on a running container.

Proposal:

Quick: Do not add the container labels reported by Docker as metric labels. I assume that, as in this case, they are irrelevant most of the time.

Bonus: Make the set of container labels that the docker plugin picks up configurable, and make sure that a configured label that is not present on a container still gets added with a default value (e.g. "missing"). That way, anyone who wants to use labels for grouping still can, and every container produces the same label set.
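A minimal sketch of the "bonus" idea, assuming a hypothetical helper selectContainerLabels and a configured list of wanted label names (neither exists in Telegraf as written here): only the configured labels are copied into the tags, and an absent one is filled with "missing", so every container yields the same tag keys.

```go
package main

import "fmt"

// defaultLabelValue is the placeholder suggested in the proposal for a
// configured label that a container does not have.
const defaultLabelValue = "missing"

// selectContainerLabels is a hypothetical helper: it copies only the wanted
// container labels into tags, filling any absent label with defaultLabelValue
// so that every container produces the same set of tag keys.
func selectContainerLabels(tags, containerLabels map[string]string, wanted []string) {
	for _, name := range wanted {
		if value, ok := containerLabels[name]; ok {
			tags[name] = value
		} else {
			tags[name] = defaultLabelValue
		}
	}
}

func main() {
	wanted := []string{"Vendor"} // hypothetical configured label list
	a := map[string]string{"container_name": "SOMENAME"}
	b := map[string]string{"container_name": "SOMENAME2"}
	selectContainerLabels(a, map[string]string{}, wanted)
	selectContainerLabels(b, map[string]string{"License": "GPLv2", "Vendor": "CentOS"}, wanted)
	fmt.Println(a) // map[Vendor:missing container_name:SOMENAME]
	fmt.Println(b) // map[Vendor:CentOS container_name:SOMENAME2]
}
```

Labels that are not in the configured list (License here) are simply ignored, so they can never change the label set.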

@sparrc
Contributor

sparrc commented May 25, 2016

Docker labels were actually added to metrics because users specifically asked for them. This is a tough problem, because we'd have to track the "cardinality" of every single metric, which seems like overkill.

@lightblu
Contributor Author

lightblu commented May 25, 2016

If some users really need them, then I would go for the "bonus" option, e.g. a container_labels setting in the config:

# Read metrics about docker containers
[[inputs.docker]]
  ## Docker Endpoint
  ##   To use TCP, set endpoint = "tcp://[ip]:[port]"
  ##   To use environment variables (ie, docker-machine), set endpoint = "ENV"
  endpoint = "unix:///var/run/docker.sock"
  ## Only collect metrics for these containers, collect all if empty
  container_names = []
  ## Only collect these container labels from docker daemon, collect all if empty but note that
  ## this will break prometheus output if containers have inconsistent label sets.
  container_labels = ['my_label_id_like_to_include']
  ## Timeout for docker list, info, and stats commands
  timeout = "5s"

Then, instead of adding all labels to a metric, add only those defined in container_labels (with a default value if a label is missing). No cardinality tracking should be required then, right? Unfortunately, my Go experience is currently much too limited to do it myself, but I might find some spare time on the weekend.

@sparrc
Copy link
Contributor

sparrc commented May 25, 2016

Yes, we could add an option, but I think it should default to adding all labels, because prometheus is the only output plugin that has this problem with metrics having different cardinalities. IMO it's a very strange limitation to put on users...

@lightblu
Contributor Author

Good news: a colleague with Go experience quickly built what I needed, and we will submit a PR soon.
One slightly unaesthetic point: to keep the default of "add all labels", there is no way to tell Telegraf to add none of the Container.Labels (because an empty array means "add all"), so you need to specify at least one "fake" label to get my desired behaviour. Do you have a suggestion for this? Otherwise I think it is fine this way (as you said, this is only a limitation of Prometheus; by the way, I agree it is strange, but it is owed to their internal model and I think it is one reason for their claimed efficiency, whatever).

@lightblu
Contributor Author

OK, this works fine if you specify
taginclude = ["network", "container_image", "container_name", "device", "cpu", "unit"]
in the inputs.docker section in the configuration.
Maybe you could add a note to the documentation under Tags that every container label is also added as a tag and that one needs to specify this for Prometheus, or maybe a note should be added to the Prometheus output about being careful with plugins that emit arbitrary tags. Thanks for your time!
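For reference, taginclude keeps only the tags whose keys match one of its glob patterns, so tags such as License and Vendor that come from container labels are dropped. The sketch below approximates that with Go's path.Match; filterTags is a hypothetical name, and Telegraf's own glob matching may differ in edge cases. As with an unset taginclude, an empty include list leaves the tags untouched.

```go
package main

import (
	"fmt"
	"path"
)

// filterTags keeps only tags whose key matches one of the include patterns.
// It approximates Telegraf's taginclude using path.Match glob syntax.
// An empty include list means no filtering.
func filterTags(tags map[string]string, include []string) map[string]string {
	if len(include) == 0 {
		return tags
	}
	kept := make(map[string]string)
	for key, value := range tags {
		for _, pattern := range include {
			if ok, err := path.Match(pattern, key); err == nil && ok {
				kept[key] = value
				break
			}
		}
	}
	return kept
}

func main() {
	include := []string{"network", "container_image", "container_name", "device", "cpu", "unit"}
	tags := map[string]string{
		"License": "GPLv2", "Vendor": "CentOS", // container labels: dropped
		"container_image": "SOMEIMAGE2", "container_name": "SOMENAME2", "network": "eth0",
	}
	fmt.Println(filterTags(tags, include))
	// map[container_image:SOMEIMAGE2 container_name:SOMENAME2 network:eth0]
}
```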

@kamalvpm777

kamalvpm777 commented Feb 23, 2017

Hello,
Using the taginclude option below, I am trying to pull only the CPU-related Docker metrics, but no luck so far. Could you please check and advise what the right taginclude parameter is to pull only Docker CPU metrics? Any help will be greatly appreciated.

######telegraf conf
[[inputs.docker]]
taginclude = ["container_image", "container_name","cpu"]
######Docker Endpoint
######To use TCP, set endpoint = "tcp://[ip]:[port]"
######To use environment variables (ie, docker-machine), set endpoint = "ENV"
endpoint = "unix:///var/run/docker.sock"
######Only collect metrics for these containers, collect all if empty
container_names = []
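Note that taginclude only controls which tag keys are kept; it does not drop whole measurements. Selecting measurements by name is what Telegraf's namepass filter does, e.g. namepass = ["docker_container_cpu"] in the [[inputs.docker]] section, assuming the CPU measurement is named docker_container_cpu (consistent with the docker_container_net_* names in the error earlier in this issue). The Go sketch below models that name filter with path.Match; the metric type and namePass function are hypothetical stand-ins, not Telegraf code.

```go
package main

import (
	"fmt"
	"path"
)

// metric is a minimal stand-in for a Telegraf metric: a measurement name plus tags.
type metric struct {
	name string
	tags map[string]string
}

// namePass keeps only metrics whose measurement name matches one of the
// patterns, approximating Telegraf's namepass with path.Match globs.
// An empty pattern list keeps every metric.
func namePass(metrics []metric, patterns []string) []metric {
	if len(patterns) == 0 {
		return metrics
	}
	var kept []metric
	for _, m := range metrics {
		for _, p := range patterns {
			if ok, err := path.Match(p, m.name); err == nil && ok {
				kept = append(kept, m)
				break
			}
		}
	}
	return kept
}

func main() {
	metrics := []metric{
		{name: "docker_container_cpu", tags: map[string]string{"cpu": "cpu-total"}},
		{name: "docker_container_mem", tags: map[string]string{}},
		{name: "docker_container_net", tags: map[string]string{"network": "eth0"}},
	}
	for _, m := range namePass(metrics, []string{"docker_container_cpu"}) {
		fmt.Println(m.name) // docker_container_cpu
	}
}
```

Tag filtering (taginclude) and measurement filtering (namepass) are independent, so both can be set in the same input section.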
