Skip to content

Commit

Permalink
health: begin work to use structured health warnings instead of strin…
Browse files Browse the repository at this point in the history
…gs, pipe changes into ipn.Notify (tailscale#12406)

Updates tailscale#4136

This PR is the first round of work to move from encoding health warnings as strings and use structured data instead. The current health package revolves around the idea of Subsystems. Each subsystem can have (or not have) a Go error associated with it. The overall health of the backend is given by the concatenation of all these errors.

This PR polishes the concept of Warnable introduced by @bradfitz a few weeks ago. Each Warnable is a component of the backend (for instance, things like 'dns' or 'magicsock' are Warnables). Each Warnable has a unique identifying code. A Warnable is an entity we can warn the user about, by setting (or unsetting) a WarningState for it. Warnables have:

- an identifying Code, so that the GUI can track them as their WarningStates come and go
- a Title, which the GUIs can use to tell the user what component of the backend is broken
- a Text, which is a function that is called with a set of Args to generate a more detailed error message to explain the unhappy state

Additionally, this PR also begins to send Warnables and their WarningStates through LocalAPI to the clients, using ipn.Notify messages. An ipn.Notify is only issued when a warning is added or removed from the Tracker.

In a next PR, we'll get rid of subsystems entirely, and we'll start using structured warnings for all errors affecting the backend functionality.

Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
  • Loading branch information
agottardo authored Jun 14, 2024
1 parent e8ca30a commit a8ee83e
Show file tree
Hide file tree
Showing 11 changed files with 939 additions and 213 deletions.
13 changes: 7 additions & 6 deletions control/controlclient/direct.go
Original file line number Diff line number Diff line change
Expand Up @@ -1581,9 +1581,9 @@ func postPingResult(start time.Time, logf logger.Logf, c *http.Client, pr *tailc
}

// ReportHealthChange reports to the control plane a change to this node's
// health.
func (c *Direct) ReportHealthChange(sys health.Subsystem, sysErr error) {
if sys == health.SysOverall {
// health. w must be non-nil. us can be nil to indicate a healthy state for w.
func (c *Direct) ReportHealthChange(w *health.Warnable, us *health.UnhealthyState) {
if w == health.NetworkStatusWarnable || w == health.IPNStateWarnable || w == health.LoginStateWarnable {
// We don't report these. These include things like the network is down
// (in which case we can't report anyway) or the user wanted things
// stopped, as opposed to the more unexpected failure types in the other
Expand All @@ -1602,12 +1602,13 @@ func (c *Direct) ReportHealthChange(sys health.Subsystem, sysErr error) {
if c.panicOnUse {
panic("tainted client")
}
// TODO(angott): at some point, update `Subsys` in the request to be `Warnable`
req := &tailcfg.HealthChangeRequest{
Subsys: string(sys),
Subsys: string(w.Code),
NodeKey: nodeKey,
}
if sysErr != nil {
req.Error = sysErr.Error()
if us != nil {
req.Error = us.Text
}

// Best effort, no logging:
Expand Down
30 changes: 30 additions & 0 deletions health/args.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause

package health

// Arg is a type for the key to be used in the Args of a Warnable.
type Arg string

const (
// ArgAvailableVersion provides an update notification Warnable with the available version of the Tailscale client.
ArgAvailableVersion Arg = "available-version"

// ArgCurrentVersion provides an update notification Warnable with the current version of the Tailscale client.
ArgCurrentVersion Arg = "current-version"

// ArgDuration provides a Warnable with how long the Warnable has been in an unhealthy state.
ArgDuration Arg = "duration"

// ArgError provides a Warnable with the underlying error behind an unhealthy state.
ArgError Arg = "error"

// ArgMagicsockFunctionName provides a Warnable with the name of the Magicsock function that caused the unhealthy state.
ArgMagicsockFunctionName Arg = "magicsock-function-name"

// ArgRegionID provides a Warnable with the ID of a DERP server involved in the unhealthy state.
ArgRegionID Arg = "region-id"

// ArgServerName provides a Warnable with the hostname of a server involved in the unhealthy state.
ArgServerName Arg = "server-name"
)
Loading

0 comments on commit a8ee83e

Please sign in to comment.