This is a repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Kubernetes, Flux, Renovate and GitHub Actions.
This hyper-converged cluster runs Talos Linux, an immutable and ephemeral Linux distribution tailored for Kubernetes, and is deployed on bare-metal Minisforum MS-01 mini-PCs. Currently, persistent storage is provided via Rook in order to enable resilient block-, file-, and object-storage within the cluster. A Synology NAS handles media file storage and backups, and is also available as an alternate storage location with the help of a custom fork of the official Synology CSI for workloads that should not be hyper-converged. The cluster is designed to enable a full teardown without any data loss.
Click here to see my Talos configuration.
There is a template at onedr0p/cluster-template if you want to follow along with many of the practices I use here.
- cert-manager: Manage SSL certificates for services in my cluster.
- cilium: eBPF-based networking for my workloads.
- cloudflared: Enables Cloudflare secure access to my services.
- external-dns: Automatically syncs ingress DNS records to a DNS provider.
- external-secrets: Manages Kubernetes secrets using 1Password Connect.
- ingress-nginx: Kubernetes ingress controller using NGINX as a reverse proxy and load balancer.
- rook: Distributed block, file, and object storage for stateful workloads.
- spegel: Stateless cluster-local OCI registry mirror.
- volsync: Backup and recovery of persistent volume claims.
Flux monitors my `kubernetes` folder (see Directories below) and implements changes to my cluster based on the YAML manifests. Flux operates by recursively searching the `kubernetes/apps` folder until it locates the top-level `kustomization.yaml` in each directory, then applies all the resources listed in it. This `kustomization.yaml` typically contains a namespace resource and one or more Flux Kustomizations, which usually include a `HelmRelease` or other application-related resources that are then applied.
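As a rough sketch of this layout (the directory, namespace, and application names below are illustrative, not copied from the repo):

```yaml
# kubernetes/apps/<namespace>/kustomization.yaml -- illustrative layout
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ./namespace.yaml   # the Namespace for this group of apps
  - ./atuin/ks.yaml    # a Flux Kustomization that deploys the app's HelmRelease
```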
Renovate monitors my entire repository for dependency updates, automatically creating a PR when updates are found. When the relevant PRs are merged, Flux then applies the changes to my cluster.
This Git repository contains the following directories under kubernetes/.
📁 kubernetes
├── 📁 apps          # applications
├── 📁 bootstrap     # bootstrap procedures
├── 📁 components    # reusable kustomize components
└── 📁 flux          # core flux configuration
This is a high-level look at how Flux deploys my applications with dependencies. Below there are three Flux Kustomizations: `cloudnative-pg`, `postgres-cluster`, and `atuin`. `cloudnative-pg` must be running and healthy before `postgres-cluster` is deployed, and once `postgres-cluster` is healthy, `atuin` will be deployed.
graph TD;
id1>Kustomization: cluster] -->|Creates| id2>Kustomization: cluster-apps];
id2>Kustomization: cluster-apps] -->|Creates| id3>Kustomization: cloudnative-pg];
id2>Kustomization: cluster-apps] -->|Creates| id5>Kustomization: postgres-cluster]
id2>Kustomization: cluster-apps] -->|Creates| id8>Kustomization: atuin]
id3>Kustomization: cloudnative-pg] -->|Creates| id4[HelmRelease: cloudnative-pg];
id5>Kustomization: postgres-cluster] -->|Depends on| id3>Kustomization: cloudnative-pg];
id5>Kustomization: postgres-cluster] -->|Creates| id10[Postgres Cluster];
id8>Kustomization: atuin] -->|Creates| id9(HelmRelease: atuin);
id8>Kustomization: atuin] -->|Depends on| id5>Kustomization: postgres-cluster];
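As a sketch of how such a dependency is declared in a Flux Kustomization (the path, interval, and `sourceRef` name below are illustrative assumptions, not copied from the repo):

```yaml
# ks.yaml -- Flux Kustomization for atuin, gated on postgres-cluster
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: atuin
  namespace: flux-system
spec:
  interval: 30m
  path: ./kubernetes/apps/default/atuin/app   # illustrative path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system                         # assumed GitRepository name
  dependsOn:
    - name: postgres-cluster                  # wait for this Kustomization to be Ready first
  wait: true                                  # wait for the applied resources to become Ready
```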
Apps hosted on my cluster are exposed using any combination of three different methods, depending on their use-case, security requirements, and intended audience. All three methods utilise fully encrypted HTTPS connections; TLS certificates are automatically provisioned and renewed by cert-manager for each application.
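As an illustration of what that involves (assuming an ACME issuer with Cloudflare DNS-01 validation, which fits the Cloudflare usage described below; the actual issuer configuration may differ):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com              # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-production        # Secret holding the ACME account key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:            # API token sourced from a Kubernetes Secret
              name: cloudflare-api-token
              key: api-token
```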
The first and easiest way that an app can be exposed is strictly on my local network. This is most often used for apps and services that have to do with home automation: given that every smart home device is on my local network, there is no need to expose e.g. a supporting service like MQTT any further than that.
Local deployments are accomplished by creating an Ingress of the `internal` class, which will register a virtual IP for the service in a designated subnet (advertised via BGP) and provision a DNS record on the router with the ExternalDNS webhook provider for UniFi.
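A minimal sketch of such an Ingress (the application, hostname, and port are placeholders; TLS details are omitted and depend on the actual cert-manager setup):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: zigbee2mqtt
spec:
  ingressClassName: internal            # served by the internal controller's BGP-advertised VIP
  rules:
    - host: zigbee2mqtt.example.com     # external-dns creates this record on the UniFi router
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: zigbee2mqtt
                port:
                  number: 8080
```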
The second and most common way that an app can be exposed is via Tailscale. Creating an Ingress with the `tailscale` class will expose the application to my Tailnet and automagically configure DNS records. Most self-hosted apps and dashboards are exposed using this Ingress class, so that they are accessible on my personal devices at a consistent URL whether I'm at home or abroad.
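For example, a sketch of a Tailscale-exposed app (the app name is a placeholder; with the Tailscale operator, the `tls` host becomes the MagicDNS name on the tailnet):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: grafana
      port:
        number: 80
  tls:
    - hosts:
        - grafana        # exposed as grafana.<tailnet-name>.ts.net
```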
Tailscale also serves as a Kubernetes auth proxy, which I use in conjunction with the Nautik iOS app to monitor and administer my Kubernetes cluster on-the-go.
The final and least common way to expose an app is via `cloudflared`, the Cloudflare Tunnel daemon. By routing all external traffic through Cloudflare's network, I gain the benefits of their global security infrastructure (notably DDoS protection). This is generally used for webhook endpoints which require access from the wider Internet, though I do expose a select few apps for friends and family.
Creating an `external` Ingress prompts ExternalDNS to provision a CNAME record on Cloudflare pointing at the Cloudflare Tunnel endpoint. The tunnel routes traffic securely into my cluster, where the ingress controller further routes it to the destination Service.
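A sketch of the external variant (hostnames and the tunnel ID are placeholders; pointing the CNAME at the tunnel via the ExternalDNS target annotation is one common approach, assumed here):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webhook-receiver
  annotations:
    # CNAME target: the Cloudflare Tunnel endpoint (placeholder tunnel ID)
    external-dns.alpha.kubernetes.io/target: <tunnel-id>.cfargotunnel.com
spec:
  ingressClassName: external
  rules:
    - host: webhook.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webhook-receiver
                port:
                  number: 8080
```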
While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about three things:
- Dealing with chicken/egg scenarios (e.g. services the cluster itself depends on in order to come up).
- Critical services that need to be accessible whether my cluster is online or not.
- The "hit by a bus" scenario: what happens to critical apps (e.g. Email, Password Manager, Photos, etc.) that my friends and family rely on when I'm no longer around.
Alternative solutions to the first two of these problems would be to host a Kubernetes cluster in the cloud and deploy applications like Vault, Vaultwarden, ntfy, and Gatus; however, maintaining another cluster and monitoring another group of workloads would frankly be more time and effort than I am willing to put in (and would likely cost about as much as, or more than, the services listed below).
Service | Use | Cost |
---|---|---|
1Password | Secrets with External Secrets | ~$36/yr |
Cloudflare | Domain/DNS | ~$24/yr |
Backblaze | S3-compatible object storage | ~$36/yr |
GitHub | Hosting this repository and continuous integration/deployments | Free |
Pushover | Kubernetes alerts and application notifications | $5 one-time purchase |
UptimeRobot | Monitoring internet connectivity and external-facing applications | Free |
Healthchecks.io | Dead man's switch for monitoring cron jobs | Free |
Total | | ~$10/mo |
Device | Count | OS Disk | Data Disk | RAM | OS | Purpose |
---|---|---|---|---|---|---|
MS-01 (i9-12900H) | 3 | 1TB M.2 SSD | 2TB M.2 SSD (Rook) | 96GB | Talos Linux | Kubernetes |
Synology DS918+ | 1 | - | 2x14TB HDD + 2x18TB HDD + 2x1TB SSD R/W Cache | 16GB | DSM 7 | NAS/NFS/Backup |
JetKVM | 2 | - | - | - | - | KVM |
Home Assistant Yellow | 1 | 8GB eMMC | 1TB M.2 SSD | 4GB | HAOS | Home Automation |
UniFi UDM Pro | 1 | - | - | - | UniFi OS | Router |
UniFi USW Pro 24 PoE | 1 | - | - | - | UniFi OS | Core Switch |
UniFi USP PDU Pro | 1 | - | - | - | UniFi OS | PDU |
CyberPower OR500LCDRM1U | 1 | - | - | - | - | UPS |
Huge thank-you to the folks over at the Home Operations community, especially @onedr0p, @bjw-s, and @buroa; their home-ops repos have been an amazing resource to draw upon.
Be sure to check out kubesearch.dev for further ideas and reference for deploying applications on Kubernetes.
See the latest release notes.
See LICENSE.