
Some users still missing container metrics. #1635

Closed
dashpole opened this issue Apr 10, 2017 · 7 comments

@dashpole (Collaborator) commented Apr 10, 2017

@stensonb, @dzavalkinolx, and @andrewsykim have all reported that in cAdvisor v0.25.0 (or Kubernetes v1.5.6 or v1.6.0) they still see disappearing metrics.

To help us get to the bottom of this, please provide your OS, OS version, and cAdvisor/Kubernetes version. I suspect this is a variant of #1572, which is caused by incorrectly adding the container "/system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount". See #1572 for more details. Please check your logs for other containers that may have been added incorrectly.
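One quick way to check for this is to filter cAdvisor's Prometheus output for stray `*-shm.mount` container entries like the one quoted above. A hedged sketch (the port, metric name, and container hash in the sample line are assumptions, not taken from any reporter's setup):

```shell
# Filter metrics lines whose id label points at a *-shm.mount "container",
# the kind of incorrectly added entry described in #1572.
find_shm_mounts() {
  grep -oE 'id="[^"]*shm\.mount"' | sort -u
}

# In practice, feed it live output, e.g.:
#   curl -s localhost:8080/metrics | find_shm_mounts
# Demo on a sample line with a shortened, made-up container hash:
printf '%s\n' \
  'container_memory_usage_bytes{id="/system.slice/var-lib-docker-containers-abc123-shm.mount"} 0' \
  | find_shm_mounts
```

Any output from this filter suggests the #1572 variant; no output means the problem likely lies elsewhere.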

@stensonb commented:

FYI - I observed this with 1.5.6. I've subsequently upgraded to 1.6.1, and will report the error here if I see it again. So far, after 24 hours, nothing.

@dashpole (Collaborator, Author) commented:

I just realized that kubernetes/kubernetes#39477 never made it into the 1.5 branch. That is probably why people are still experiencing this...

@carlpett commented Jul 5, 2017

I think I may be seeing this? The issue I'm having is that, seemingly at random, we do not get all the containers (or at least not all their labels) on the /metrics endpoint. As an example, here I'm grepping for the container_cpu_usage_seconds_total metric and checking whether it has an image label:

# curl -s localhost:9190/metrics | grep -E 'container_cpu_usage_seconds_total.+image' | wc -l
580
# curl -s localhost:9190/metrics | grep -E 'container_cpu_usage_seconds_total.+image' | wc -l
20
# curl -s localhost:9190/metrics | grep -E 'container_cpu_usage_seconds_total.+image' | wc -l
159
# curl -s localhost:9190/metrics | grep -E 'container_cpu_usage_seconds_total.+image' | wc -l
0
# curl -s localhost:9190/metrics | grep -E 'container_cpu_usage_seconds_total.+image' | wc -l
159
# curl -s localhost:9190/metrics | grep -E 'container_cpu_usage_seconds_total.+image' | wc -l
580

(These lines were in quick succession.)
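The flapping counts in the session above can be reproduced with a small polling loop. A minimal sketch, reusing the port 9190 from the session; the one-second interval and the sample label values in the demo are assumptions:

```shell
# Count container_cpu_usage_seconds_total series that carry an image label.
count_with_image() {
  grep -cE 'container_cpu_usage_seconds_total\{[^}]*image=' || true
}

# To watch the count flap, poll the live endpoint, e.g.:
#   for i in $(seq 1 6); do
#     curl -s localhost:9190/metrics | count_with_image
#     sleep 1
#   done
# Demo on two sample lines (only the first has an image label):
printf '%s\n' \
  'container_cpu_usage_seconds_total{image="nginx:latest",id="/docker/abc"} 1.5' \
  'container_cpu_usage_seconds_total{id="/system.slice"} 0.1' \
  | count_with_image
```

If a stable workload makes the count swing between runs, as in the session above, series (or their labels) are being dropped rather than the containers actually churning.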

We're running cAdvisor as a systemd service (not in a container). Upgrading from 0.23.8 to 0.26.1 made no difference.
The OS is CentOS Linux release 7.2.1511 (Core), with Docker 17.05.0-ce. We are not using Kubernetes.

(I'm also hitting random SIGSEGVs on startup, but that seems unrelated.)

@maxramqvist commented Jul 7, 2017

Seeing the same issue, running cAdvisor 0.26.1 as a container.
Curling the Prometheus endpoint and grepping for unique container_label_image values 30 times in a row, with a one-second pause between requests, returns either 0, 8, or 18 containers, never the correct 28.
Docker version 17.05.0-ce, build 89658be
Ubuntu 16.10, kernel 4.8, x86_64

@mindw commented Jul 14, 2017

We had our metrics disappear after exactly 24 hours and tracked the issue down to the periodic 24-hour rkt-gc.timer.

  • kubelet is run natively on the host (kubelet-wrapper not used)
  • k8s 1.3.x-1.5.x
  • CoreOS 13xx.x
  • rkt was used to run one-shot containers during boot

Either disabling the rkt-gc.timer or using the kubelet-wrapper resolves the issue.
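The first workaround can be sketched as follows. This is an ops fragment, not something the reporter posted: it assumes a systemd host where the rkt-gc.timer unit exists (as on CoreOS), and that disabling a vendor timer persists on your image:

```shell
# Stop the periodic rkt garbage collection and keep it from being rescheduled.
sudo systemctl stop rkt-gc.timer
sudo systemctl disable rkt-gc.timer

# Verify it is no longer in the timer list:
systemctl list-timers --all | grep rkt-gc || echo "rkt-gc.timer not scheduled"
```

The alternative, per the comment above, is to run the kubelet via the kubelet-wrapper instead of natively.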

@zeisss commented Jul 25, 2017

We are seeing this too: cAdvisor as a systemd service, with Docker 1.13.1. No Kubernetes.

cadvisor_version_info{cadvisorRevision="d19cc94",cadvisorVersion="v0.26.1",dockerVersion="1.13.1",kernelVersion="3.16.0-4-amd64",osVersion="Debian GNU/Linux 8 (jessie)"} 1

The WebUI shows the disappearing containers just fine.

@dashpole (Collaborator, Author) commented Feb 1, 2018

Closing as outdated; this shouldn't be an issue in newer versions.

@dashpole closed this as completed Feb 1, 2018