kubernetes: Delay health check for 180 seconds to account for long re… #1271

Merged
merged 1 commit into from
Aug 6, 2017

tgraf
Member

@tgraf tgraf commented Aug 6, 2017

…store

If many endpoints were running before a restart, endpoint restore can
take a while because it has to contact the kvstore and compile the
endpoints. Bump the initial delay before health checks occur to 180
seconds to avoid unneeded pod restarts.

Signed-off-by: Thomas Graf <thomas@cilium.io>

@tgraf tgraf added the kind/bug and pending-review labels Aug 6, 2017
@tgraf tgraf added this to the 0.11 milestone Aug 6, 2017
@tgraf tgraf requested a review from aanm August 6, 2017 21:19
@ianvernon ianvernon self-requested a review August 6, 2017 21:23
@ianvernon
Member

It makes sense to have a longer time between health checks to avoid unnecessary restarts, but a jump from 10 to 180 seconds is quite a big one. Does it take almost three minutes to get endpoints restored after restarting? This means that a user can have Cilium in a bad state for almost three minutes before it restarts.

@tgraf
Member Author

tgraf commented Aug 6, 2017

> It makes sense to have a longer time between health checks to avoid unnecessary restarts, but a jump from 10 to 180 seconds is quite a big one. Does it take almost three minutes to get endpoints restored after restarting? This means that a user can have Cilium in a bad state for almost three minutes before it restarts.

This is only the initial delay before the first health check is performed; it does not change the interval between subsequent checks. Cilium should error out with a fatal error message if there is any error during bootstrap.
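
For reference, a minimal sketch of how an initial delay like this is typically expressed in a Kubernetes container spec. Only the 180-second value comes from this PR; the probe command and the `periodSeconds` value are assumptions for illustration, not copied from the Cilium manifest.

```yaml
# Hypothetical fragment of a container spec; not the actual Cilium DaemonSet.
livenessProbe:
  exec:
    # Assumed health check command, shown for illustration only.
    command:
      - cilium
      - status
  # Wait 180 seconds after the container starts before the first probe,
  # leaving time for endpoint restore (value from this PR).
  initialDelaySeconds: 180
  # Interval between probes once probing has begun. 10 is the Kubernetes
  # default and is an assumption here; this PR does not change it.
  periodSeconds: 10
```

`initialDelaySeconds` only postpones the first probe. Once probing starts, checks repeat every `periodSeconds`, so an agent that becomes unhealthy later is still detected at the normal interval.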

@ianvernon
Member

> This is only the initial delay before the first health check is performed. Cilium should error out with a fatal error message if there is any error during bootstrap.

OK, makes sense, thanks!

@tgraf tgraf merged commit 663e245 into master Aug 6, 2017
@tgraf tgraf deleted the initialDelay branch August 6, 2017 22:23