kubernetes_state: plumb more container waiting reasons #1763

stevvooe · 2018-06-19T17:14:44Z

We'd like to create monitors that fire when containers are stuck waiting
for various reasons. Two particular reasons, ImagePullBackOff and
CrashLoopBackOff, can be used to detect bad or broken deployments. These
have been plumbed as of kube-state-metrics 1.3 but are not currently
whitelisted in the DataDog agent integration. The tests have also been
updated with fixture data.

Signed-off-by: Stephen Day <stephen.day@getcruise.com>

hkaj

that's great, thanks @stevvooe !

stevvooe · 2018-06-26T20:17:40Z

@hkaj @masci What's the timeline for this getting merged and released?

masci

LGTM

hkaj · 2018-07-04T15:44:00Z

thanks @stevvooe ! This will go out with 6.4 (scheduled for end of July)

deiwin · 2018-07-25T10:30:43Z

Could ContainerCreating also be included? Need this to monitor for known issues with https://github.com/aws/amazon-vpc-cni-k8s.

Why's there a whitelist in the first place? I see some discussion in #853, but don't understand the reason for skipping metrics with new reasons instead of simply passing the reason through.

stevvooe · 2018-08-01T18:26:42Z

@deiwin I think the whitelist is there to reduce metric volume by dropping reasons that would likely be ignored or unused.
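For illustration, here is a minimal sketch of the kind of whitelist filtering being discussed. It is not the integration's actual code: `WHITELISTED_WAITING_REASONS`, `filter_waiting_reasons`, and the sample layout are hypothetical names chosen for this example, and the whitelist holds only the two reasons this PR adds.

```python
# Hypothetical sketch of whitelist-based filtering of container waiting reasons.
# None of these names come from the Datadog integration; they are assumptions.

# Reasons are stored lowercased so the comparison is case-insensitive.
WHITELISTED_WAITING_REASONS = frozenset(
    reason.lower() for reason in ("ImagePullBackOff", "CrashLoopBackOff")
)


def filter_waiting_reasons(samples, whitelist=WHITELISTED_WAITING_REASONS):
    """Return only the samples whose 'reason' label is in the whitelist.

    Each sample is a (labels, value) pair, where labels is a dict that
    may contain a 'reason' key.
    """
    kept = []
    for labels, value in samples:
        reason = labels.get("reason", "").lower()
        if reason in whitelist:
            kept.append((labels, value))
    return kept


# Made-up samples, shaped like waiting-reason metrics with a 'reason' label.
samples = [
    ({"pod": "web-1", "container": "app", "reason": "CrashLoopBackOff"}, 1.0),
    ({"pod": "web-2", "container": "app", "reason": "ContainerCreating"}, 1.0),
    ({"pod": "web-3", "container": "app", "reason": "ImagePullBackOff"}, 1.0),
]

kept = filter_waiting_reasons(samples)
kept_pods = [labels["pod"] for labels, _ in kept]
```

With a whitelist like this, the `ContainerCreating` sample is dropped rather than passed through, which is the behaviour questioned above. Supporting a new reason then amounts to adding one entry to the set.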

I think you could easily add it with a PR like this one. I only focused on the failure scenarios, as those are the most problematic. What would be the use case of monitoring ContainerCreating?

deiwin · 2018-08-13T12:06:54Z

> What would be the use case of monitoring ContainerCreating?

With the CNI linked to above, pods can get stuck in the ContainerCreating phase when the CNI is unable to reserve an IP for them.

stevvooe requested a review from a team as a code owner June 19, 2018 17:14

masci added integration/kubernetes community changelog/Added and removed changelog/Added labels Jun 19, 2018

hkaj approved these changes Jun 20, 2018

hkaj added this to the 6.3.1 milestone Jun 20, 2018

JulienBalestra modified the milestones: 6.3.1, 6.4 Jun 25, 2018

masci removed this from the 6.4 milestone Jun 25, 2018

masci approved these changes Jun 28, 2018

hkaj merged commit 41adce4 into DataDog:master Jul 4, 2018

stevvooe deleted the sday-plumb-crashloopbackoff branch July 5, 2018 17:53

CharlyF added the test-card-created label Jul 23, 2018

deiwin mentioned this pull request Aug 16, 2018

Include ContainerCreating in pod waiting status reasons #2063

Merged

4 tasks
