[teleport-update] Add proper healthcheck for agents #51613

Open: sclevine wants to merge 10 commits into master from sclevine/autoupdate-diag

sclevine
Member

@sclevine sclevine commented Jan 29, 2025

During testing, @hugoShaka found that the PID-based crash detection in teleport-update misses failure scenarios in which a bug causes Teleport to repeatedly attempt to reconnect without ever crashing.

This PR adds a readiness check on Teleport's debug socket. @hugoShaka is working on a separate PR in parallel to prevent the readiness endpoint from being disabled.

Instead of monitoring the PID for 30 seconds, teleport-update now waits 60 seconds for the process to start, stabilize, and return HTTP 200 from /readyz. If the PID file is missing, the socket is missing, or /readyz isn't implemented, the corresponding check is disabled with a warning.
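
For illustration, the readiness wait described above might look roughly like the sketch below. It is not the code from this PR. The names `waitReady`, `checkAgent`, `isSkipped`, and `ErrCheckSkipped` are hypothetical, the socket path `/var/lib/teleport/debug.sock` is an assumed default, and treating a 404 as "/readyz isn't implemented" is an assumption. The sketch also returns on the first 200 rather than waiting for the process to stabilize, and it leaves out the PID file check.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"os"
	"time"
)

// ErrCheckSkipped (hypothetical) marks a check that was disabled, not failed.
var ErrCheckSkipped = errors.New("readiness check skipped")

// isSkipped reports whether err means the check was disabled with a warning.
func isSkipped(err error) bool { return errors.Is(err, ErrCheckSkipped) }

// newSocketClient returns an HTTP client that always dials the Unix socket.
func newSocketClient(socketPath string) *http.Client {
	return &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// probe issues one GET /readyz and returns the HTTP status code.
func probe(ctx context.Context, clt *http.Client) (int, error) {
	// The host part is ignored because the transport dials the socket.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://debug/readyz", nil)
	if err != nil {
		return 0, err
	}
	resp, err := clt.Do(req)
	if err != nil {
		return 0, err
	}
	resp.Body.Close()
	return resp.StatusCode, nil
}

// waitReady polls /readyz until it returns 200 or ctx expires.
func waitReady(ctx context.Context, socketPath string, interval time.Duration) error {
	if _, err := os.Stat(socketPath); errors.Is(err, os.ErrNotExist) {
		return fmt.Errorf("%w: socket %s is missing", ErrCheckSkipped, socketPath)
	}
	clt := newSocketClient(socketPath)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		status, err := probe(ctx, clt)
		switch {
		case err == nil && status == http.StatusOK:
			return nil
		case err == nil && status == http.StatusNotFound:
			// Assumption: 404 means this agent version has no /readyz.
			return fmt.Errorf("%w: /readyz is not implemented", ErrCheckSkipped)
		}
		// Connection errors and other statuses are retried until the deadline.
		select {
		case <-ctx.Done():
			return fmt.Errorf("agent not ready: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// checkAgent bounds the whole wait with a single timeout.
func checkAgent(socketPath string, timeout, interval time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return waitReady(ctx, socketPath, interval)
}

func main() {
	err := checkAgent("/var/lib/teleport/debug.sock", 60*time.Second, time.Second)
	switch {
	case err == nil:
		fmt.Println("agent is ready")
	case isSkipped(err):
		fmt.Println("warning:", err)
	default:
		fmt.Println("error:", err)
	}
}
```

Keeping "skipped" distinct from "failed" lets the caller log a warning and continue when an older agent lacks the socket or the endpoint, while still failing when the agent never becomes ready.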


The teleport-update binary will be used to enable, disable, and trigger automatic Teleport agent updates. The new auto-updates system manages a local installation of the cluster-specified version of Teleport stored in /opt/teleport.

RFD: #47126
Goal (internal): https://github.com/gravitational/cloud/issues/10289

@sclevine added the no-changelog and teleport-update labels on Jan 29, 2025
@sclevine force-pushed the sclevine/autoupdate-diag branch from be7af20 to 3ed4944 on January 29, 2025 21:56
@sclevine marked this pull request as ready for review on January 29, 2025 22:25
@sclevine requested review from hugoShaka and vapopov on January 29, 2025 22:25
@@ -55,8 +56,9 @@ func NewClient(socketPath string) *Client {
clt: &http.Client{
Contributor

While you're here, could you also add a CheckRedirect that errors unconditionally?
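
As a rough sketch of what this suggestion could look like (not the change actually made in the PR), the client below sets `CheckRedirect` to a function that always returns an error, so a redirect response from the socket is never followed. The names `errRedirect`, `noRedirect`, `newSocketClient`, `redirectingTransport`, and `redirectRejected` are hypothetical. The fake transport exists only so the behavior can be exercised without a real socket.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"strings"
)

// errRedirect is a hypothetical sentinel returned for every redirect.
var errRedirect = errors.New("redirects are not allowed on the debug socket")

// noRedirect rejects every redirect unconditionally.
func noRedirect(_ *http.Request, _ []*http.Request) error {
	return errRedirect
}

// newSocketClient builds a client that dials a Unix socket and never
// follows redirects.
func newSocketClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
		CheckRedirect: noRedirect,
	}
}

// redirectingTransport is a test double that answers every request with a
// 302 pointing at another host, without touching the network.
type redirectingTransport struct{}

func (redirectingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusFound,
		Header:     http.Header{"Location": []string{"http://example.invalid/"}},
		Body:       io.NopCloser(strings.NewReader("")),
		Request:    req,
	}, nil
}

// redirectRejected reports whether the client refused to follow a redirect.
func redirectRejected() bool {
	clt := newSocketClient("/nonexistent.sock")
	clt.Transport = redirectingTransport{} // swap in the test double
	_, err := clt.Get("http://debug/readyz")
	// http.Client wraps the CheckRedirect error in a *url.Error.
	return errors.Is(err, errRedirect)
}

func main() {
	fmt.Println("redirect rejected:", redirectRejected())
}
```

Returning a plain error, rather than `http.ErrUseLastResponse`, makes the request fail outright, so a redirect can never be mistaken for a successful response from the local socket.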

Labels: no-changelog (indicates that a PR does not require a changelog entry), size/md, teleport-update
2 participants