Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add liveness probe for calico-kube-controllers #4407

Merged

Conversation

hakman
Copy link
Contributor

@hakman hakman commented Feb 18, 2021

Description

Due to kubernetes/client-go#374, calico-kube-controllers takes a very long time to get ready when upgrading a cluster, as described in #3751.

The underlying issue was fixed by projectcalico/libcalico-go#1356, but only for v3.18.
This change is a workaround that can be cherry-picked to v3.17 and earlier, but it's also a good practice for liveness probe to be added, same as for other calico components.

Related issues/PRs

https://github.com/projectcalico/libcalico-go/issues/1267

Todos

  • Tests
  • Documentation
  • Release note

Release Note

Add liveness probe for calico-kube-controllers

@marvin-tigera marvin-tigera added this to the Calico v3.19.0 milestone Feb 18, 2021
@marvin-tigera marvin-tigera added docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact labels Feb 18, 2021
@hakman
Copy link
Contributor Author

hakman commented Feb 18, 2021

/cc @lwr20 @caseydavenport

Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR! Added a comment and a couple questions.

- -r
periodSeconds: 10
initialDelaySeconds: 10
failureThreshold: 6
readinessProbe:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that this readinessProbe doesn't match the one above? (initialDelaySeconds vs periodSeconds).

Could you comment on why we should make these changes to the readinessprobe as well? The PR seems to mostly be about the liveness probe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, should be periodSeconds. Will fix. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For periodSeconds, the default value is 10. Other probes in the same manifest have this set anyway and wanted to keep the same style.

exec:
command:
- /usr/bin/check-status
- -r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, the -r flag was intended to mean "ready", and we intended to add a "-l" flag to mean "alive" - code is here: https://github.com/projectcalico/kube-controllers/blob/master/cmd/check-status/main.go#L37

I think we can probably get away with this as a quick fix, but should we want to have separate ready / live conditions we'll need to have separate flags, so longer term I'd definitely like to see that even if for now -l == -r.

Would you be interested in adding a new option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, didn't know this. Would not mind adding a new flag for liveness in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hakman hakman force-pushed the kube-controllers-liveness-probe branch from 910bdee to 552e5b5 Compare March 10, 2021 16:23
@hakman hakman force-pushed the kube-controllers-liveness-probe branch from 552e5b5 to 4eb06ed Compare March 10, 2021 16:55
@caseydavenport
Copy link
Member

CC @tmjd - you may be interested in this if we want to include in the operator as well.

@tmjd
Copy link
Member

tmjd commented Mar 23, 2021

@caseydavenport yeah I think we would certainly want to add this to this operator.

@caseydavenport caseydavenport merged commit e6f9c4f into projectcalico:master Mar 29, 2021
@hakman hakman deleted the kube-controllers-liveness-probe branch March 30, 2021 02:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required Docs not required for this change release-note-not-required Change has no user-facing impact
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants