Skip to content

Commit

Permalink
cmd/containerboot,kube,util/linuxfw: configure kube egress proxies to…
Browse files Browse the repository at this point in the history
… route to 1+ tailnet targets (tailscale#13531)

* cmd/containerboot,kube,util/linuxfw: configure kube egress proxies to route to 1+ tailnet targets

This commit is first part of the work to allow running multiple
replicas of the Kubernetes operator egress proxies per tailnet service +
to allow exposing multiple tailnet services via each proxy replica.

This expands the existing iptables/nftables-based proxy configuration
mechanism.

A proxy can now be configured to route to one or more tailnet targets
via a (mounted) config file that, for each tailnet target, specifies:
- the target's tailnet IP or FQDN
- mappings of container ports to which cluster workloads will send traffic to
tailnet target ports where the traffic should be forwarded.

Example configfile contents:
{
  "some-svc": {"tailnetTarget":{"fqdn":"foo.tailnetxyz.ts.net","ports"{"tcp:4006:80":{"protocol":"tcp","matchPort":4006,"targetPort":80},"tcp:4007:443":{"protocol":"tcp","matchPort":4007,"targetPort":443}}}}
}

A proxy that is configured with this config file will configure firewall rules
to route cluster traffic to the tailnet targets. It will then watch the config file
for updates as well as monitor relevant netmap updates and reconfigure firewall
as needed.

This adds a bunch of new iptables/nftables functionality to make it easier to dynamically update
the firewall rules without needing to restart the proxy Pod as well as to make
it easier to debug/understand the rules:

- for iptables, each portmapping is a DNAT rule with a comment pointing
at the 'service',i.e:

-A PREROUTING ! -i tailscale0 -p tcp -m tcp --dport 4006 -m comment --comment "some-svc:tcp:4006 -> tcp:80" -j DNAT --to-destination 100.64.1.18:80
Additionally there is a SNAT rule for each tailnet target, to mask the source address.

- for nftables, a separate prerouting chain is created for each tailnet target
and all the portmapping rules are placed in that chain. This makes it easier
to look up rules and delete services when no longer needed.
(nftables allows hooking a custom chain to a prerouting hook, so no extra work
is needed to ensure that the rules in the service chains are evaluated).

The next steps will be to get the Kubernetes Operator to generate
the configfile and ensure it is mounted to the relevant proxy nodes.

Updates tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
  • Loading branch information
irbekrm authored Sep 29, 2024
1 parent c62b073 commit 096b090
Show file tree
Hide file tree
Showing 14 changed files with 1,667 additions and 25 deletions.
66 changes: 45 additions & 21 deletions cmd/containerboot/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,7 @@ func main() {
PodIP: defaultEnv("POD_IP", ""),
EnableForwardingOptimizations: defaultBool("TS_EXPERIMENTAL_ENABLE_FORWARDING_OPTIMIZATIONS", false),
HealthCheckAddrPort: defaultEnv("TS_HEALTHCHECK_ADDR_PORT", ""),
EgressSvcsCfgPath: defaultEnv("TS_EGRESS_SERVICES_CONFIG_PATH", ""),
}

if err := cfg.validate(); err != nil {
Expand Down Expand Up @@ -275,10 +276,8 @@ authLoop:
switch *n.State {
case ipn.NeedsLogin:
if isOneStepConfig(cfg) {
// This could happen if this is the
// first time tailscaled was run for
// this device and the auth key was not
// passed via the configfile.
// This could happen if this is the first time tailscaled was run for this
// device and the auth key was not passed via the configfile.
log.Fatalf("invalid state: tailscaled daemon started with a config file, but tailscale is not logged in: ensure you pass a valid auth key in the config file.")
}
if err := authTailscale(); err != nil {
Expand Down Expand Up @@ -376,6 +375,9 @@ authLoop:
}
})
)
// egressSvcsErrorChan will get an error sent to it if this containerboot instance is configured to expose 1+
// egress services in HA mode and errored.
var egressSvcsErrorChan = make(chan error)
defer t.Stop()
// resetTimer resets timer for when to next attempt to resolve the DNS
// name for the proxy configured with TS_EXPERIMENTAL_DEST_DNS_NAME. The
Expand All @@ -401,6 +403,7 @@ authLoop:
failedResolveAttempts++
}

var egressSvcsNotify chan ipn.Notify
notifyChan := make(chan ipn.Notify)
errChan := make(chan error)
go func() {
Expand Down Expand Up @@ -575,31 +578,50 @@ runLoop:
h.Unlock()
healthzRunner()
}
if egressSvcsNotify != nil {
egressSvcsNotify <- n
}
}
if !startupTasksDone {
// For containerboot instances that act as TCP
// proxies (proxying traffic to an endpoint
// passed via one of the env vars that
// containerbot reads) and store state in a
// Kubernetes Secret, we consider startup tasks
// done at the point when device info has been
// successfully stored to state Secret.
// For all other containerboot instances, if we
// just get to this point the startup tasks can
// be considered done.
// For containerboot instances that act as TCP proxies (proxying traffic to an endpoint
// passed via one of the env vars that containerboot reads) and store state in a
// Kubernetes Secret, we consider startup tasks done at the point when device info has
// been successfully stored to state Secret. For all other containerboot instances, if
// we just get to this point the startup tasks can be considered done.
if !isL3Proxy(cfg) || !hasKubeStateStore(cfg) || (currentDeviceEndpoints != deephash.Sum{} && currentDeviceID != deephash.Sum{}) {
// This log message is used in tests to detect when all
// post-auth configuration is done.
log.Println("Startup complete, waiting for shutdown signal")
startupTasksDone = true

// Wait on tailscaled process. It won't
// be cleaned up by default when the
// container exits as it is not PID1.
// TODO (irbekrm): perhaps we can
// replace the reaper by a running
// cmd.Wait in a goroutine immediately
// after starting tailscaled?
// Configure egress proxy. Egress proxy will set up firewall rules to proxy
// traffic to tailnet targets configured in the provided configuration file. It
// will then continuously monitor the config file and netmap updates and
// reconfigure the firewall rules as needed. If any of its operations fail, it
// will crash this node.
if cfg.EgressSvcsCfgPath != "" {
log.Printf("configuring egress proxy using configuration file at %s", cfg.EgressSvcsCfgPath)
egressSvcsNotify = make(chan ipn.Notify)
ep := egressProxy{
cfgPath: cfg.EgressSvcsCfgPath,
nfr: nfr,
kc: kc,
stateSecret: cfg.KubeSecret,
netmapChan: egressSvcsNotify,
podIP: cfg.PodIP,
tailnetAddrs: addrs,
}
go func() {
if err := ep.run(ctx, n); err != nil {
egressSvcsErrorChan <- err
}
}()
}

// Wait on tailscaled process. It won't be cleaned up by default when the
// container exits as it is not PID1. TODO (irbekrm): perhaps we can replace the
// reaper by a running cmd.Wait in a goroutine immediately after starting
// tailscaled?
reaper := func() {
defer wg.Done()
for {
Expand Down Expand Up @@ -637,6 +659,8 @@ runLoop:
}
backendAddrs = newBackendAddrs
resetTimer(false)
case e := <-egressSvcsErrorChan:
log.Fatalf("egress proxy failed: %v", e)
}
}
wg.Wait()
Expand Down
Loading

0 comments on commit 096b090

Please sign in to comment.