connect: register the service before the proxy #305
Conversation
Because the proxy service registers an alias health check that points to the service ID of the main service, and we register the proxy service before the main service, the alias check starts out critical (red): at that point the main service doesn't exist yet. Consul re-runs this check every minute, so it only turns healthy (green) after about a minute. This is particularly bad when a service is restarted due to scheduled or unscheduled maintenance. For example, when you have a Deployment and trigger a re-deploy (kubectl rollout restart), Kubernetes by default performs a rolling deploy, where it won't terminate the old instance until the new one comes up and is healthy. But Consul takes an additional minute or so to mark the service healthy, causing downtime where no downtime, or at most minimal downtime, should be experienced.
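For illustration, one way to observe that window (a sketch, assuming a Deployment named static-server and a Consul HTTP API reachable on localhost:8500, e.g. via kubectl port-forward; the names and port are assumptions, not part of this PR):

```shell
# Trigger a rolling restart of the example Deployment.
kubectl rollout restart deployment/static-server

# Watch the sidecar proxy's checks; before this fix the alias check stays
# critical for up to a minute after the new pod becomes ready.
watch -n 1 'curl -s http://localhost:8500/v1/health/checks/static-server-sidecar-proxy'
```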
This looks great. Was able to successfully reproduce the error and the fix. 🎉
The code changes look fine, but the changelog needs to include a warning about the implications of this change.
CHANGELOG.md

BUG FIXES:

* Connect: Reduce downtime caused by an alias health check of the sidecar proxy not being healthy for up to 1 minute
  when a Connect-enabled service is restarted [[GH-305](https://github.com/hashicorp/consul-k8s/pull/305)].
Per our conversation earlier, this should include the caveat that while this fix reverts to the previous behavior, that previous behavior means Consul may route to services that are not yet ready.
Changes proposed in this PR:
Switch the order in which the services are registered with Consul: register the main service first and the proxy service after it (see the sketch below).
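For illustration only, not the actual consul-k8s code path, a sketch of the new ordering expressed against the Consul agent HTTP API; the service names, IDs, and ports are assumptions:

```shell
# Register the main service first, so it already exists when the
# proxy's alias check is evaluated.
curl -s -X PUT http://localhost:8500/v1/agent/service/register -d '{
  "ID": "static-server",
  "Name": "static-server",
  "Port": 8080
}'

# Register the sidecar proxy second; its alias check points at the main
# service ID, so it can pass immediately instead of staying critical
# until the next check run.
curl -s -X PUT http://localhost:8500/v1/agent/service/register -d '{
  "ID": "static-server-sidecar-proxy",
  "Name": "static-server-sidecar-proxy",
  "Kind": "connect-proxy",
  "Port": 20000,
  "Proxy": {
    "DestinationServiceName": "static-server",
    "DestinationServiceID": "static-server"
  },
  "Checks": [
    {
      "Name": "Connect Sidecar Aliasing static-server",
      "AliasService": "static-server"
    }
  ]
}'
```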
Steps to reproduce and test
To fix, upgrade to the image built from this PR and run helm upgrade:
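A sketch of what that upgrade could look like, assuming the chart was installed as a release named consul from the hashicorp/consul Helm chart and that the consul-k8s image is set via global.imageK8S (release name, chart, and value name are assumptions about the local setup):

```shell
helm upgrade consul hashicorp/consul \
  --set global.imageK8S=<image-built-from-this-PR> \
  --reuse-values
```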
Once the connect injector becomes healthy, restart the static-server deployment again (step 4 above). You should see either no errors or only 1-2 errors (i.e. 1-2 seconds of downtime) printed by the while loop running in the static-client container.
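The while loop itself is not shown in this excerpt; a minimal equivalent, assuming static-client reaches static-server through a Connect upstream on localhost:1234 (the deployment name, port, and URL are assumptions):

```shell
kubectl exec deploy/static-client -- sh -c \
  'while true; do curl -sf http://localhost:1234 > /dev/null || echo "$(date) error"; sleep 1; done'
```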
Checklist: