
Building a Production-Grade Homelab with K3s, Vault, and FluxCD

I wanted a homelab that wasn’t just “k3s install and call it a day.” Something I could actually use to trial tooling, break things safely, and run real workloads. The kind of setup where if someone asked “how does your secrets management work?” the answer isn’t “I hardcoded them in a ConfigMap.”

This is that setup. Single mini PC, four VMs, full GitOps pipeline, proper secrets management, and observability that would hold up in a real environment. I’m going to walk through the entire build from unboxing to running workloads - every step, every gotcha.

The repo is public: github.com/moabukar/homelab

TL;DR

  • Single Ace Magician mini PC running Proxmox VE bare metal
  • 3-node K3s cluster (1 control plane + 2 workers) + dedicated Vault VM
  • FluxCD for GitOps - push to main, everything reconciles
  • HashiCorp Vault + External Secrets Operator for secrets (JWT auth, no manual secret management)
  • Full observability: Prometheus, Grafana, Jaeger, OTel Collector
  • Cloudflare Tunnel for external access without port forwarding
  • Gateway API + Traefik for routing (not traditional Ingress)
  • Everything as code, everything in Git

Why a Mini PC?

I considered a few options before settling on this approach:

  • Raspberry Pi cluster - I’ve done this before (wrote about it here). It works, but ARM architecture means some container images aren’t available and the Pi 5 tops out at 8GB RAM. Not enough for a full observability stack.
  • Old desktop/server - Loud, power-hungry, takes up space. My wife would not have been happy.
  • Cloud VMs - Defeats the purpose. I wanted something I own, on my network, that I can break and rebuild without a billing surprise.
  • Mini PC - Small, quiet, x86_64, decent specs for under 300 quid. Winner.

The mini PC sits on my desk, plugged into the router. It draws about 15W idle. Silent. My wife doesn’t even know it’s there (she does now).

The Hardware

Here’s everything I bought:

PART                    MODEL / SPEC                   APPROX COST
====                    ============                   ===========
Mini PC                 Ace Magician AM08 Pro           ~£280
CPU (included)          AMD Ryzen 7 7730U (8c/16t)      -
RAM (included)          30GB DDR4                        -
Storage (included)      500GB NVMe SSD                   -
USB drive               Any 8GB+ USB stick               ~£5
Ethernet cable          Cat6, 1m                         ~£3

That’s it. Under 300 quid for a machine that runs four VMs comfortably.

The AM08 Pro comes with everything you need - the NVMe and RAM are pre-installed. No need to open it up. It ships with Windows 11, which we’re going to nuke from orbit in about five minutes.

SPEC                VALUE
====                =====
Model               Ace Magician AM08 Pro
CPU                 AMD Ryzen 7 7730U (8c/16t, 2.0GHz base, 4.5GHz boost)
RAM                 30GB DDR4-3200
Storage             500GB NVMe PCIe 3.0
Network             Gigabit Ethernet (RJ45) + Wi-Fi 6
Ports               2x HDMI, 4x USB 3.0, 1x USB-C
Power               65W adapter, ~15W idle draw
Size                127 x 128 x 57mm (fits in your palm)

You’ll also want a monitor and keyboard for the initial Proxmox install. After that, everything is headless via the Proxmox web UI.

Creating the Proxmox Boot USB

Proxmox VE is a Debian-based hypervisor. Free, open source, enterprise-grade. It runs on bare metal and gives you a web UI to manage VMs, containers, storage, and networking.

First, download the Proxmox VE ISO from the official site:

https://www.proxmox.com/en/downloads/proxmox-virtual-environment/iso

I used Proxmox VE 8.x. Download the ISO (about 1.2GB).

Next, flash it to a USB stick. On macOS:

# Find your USB drive
diskutil list

# Unmount it (replace diskN with your USB disk number)
diskutil unmountDisk /dev/diskN

# Flash the ISO (use rdiskN for faster writes)
sudo dd if=proxmox-ve_8.x-x.iso of=/dev/rdiskN bs=4M status=progress

# Eject when done
diskutil eject /dev/diskN

On Linux:

# Find your USB drive
lsblk

# Flash it
sudo dd if=proxmox-ve_8.x-x.iso of=/dev/sdX bs=4M status=progress conv=fdatasync

On Windows, use Rufus or balenaEtcher - drag the ISO in, select the USB, click flash. Done.

Installing Proxmox on the Mini PC

This is the “nuke Windows” step.

  1. Plug the USB into the mini PC. Also connect a monitor (HDMI) and keyboard.

  2. Boot from USB. Power on the mini PC and hammer the boot menu key. For the Ace Magician AM08 Pro, it’s F7 or Del to enter BIOS, then change boot order to USB first. Some machines use F2, F10, or F12 - check your model.

  3. BIOS settings. While you’re in BIOS, make sure:

    • Virtualization is enabled (AMD-V / SVM Mode = Enabled). This is critical - Proxmox needs hardware virtualisation.
    • Secure Boot is disabled. Proxmox doesn’t play nice with Secure Boot.
    • Boot mode is set to UEFI (not Legacy/CSM).
  4. Boot the Proxmox installer. Save BIOS settings and reboot. The Proxmox installer should load from USB.

  5. Walk through the installer:

    • Accept the EULA
    • Select the target disk (the 500GB NVMe). This wipes everything - bye bye Windows.
    • Set your country, timezone, and keyboard layout
    • Set the root password and admin email
    • Configure networking:
      • Management interface: the Ethernet port (enp1s0 or similar)
      • Hostname: pve.local (or whatever you like)
      • IP: I used a static IP on my LAN - 192.168.1.100/24
      • Gateway: 192.168.1.1 (your router)
      • DNS: 192.168.1.1 (or 1.1.1.1)
  6. Install and reboot. Takes about 5 minutes. Remove the USB when prompted.

  7. Access the web UI. From any machine on your network, open:

    https://192.168.1.100:8006

    Login with root and the password you set. You’ll get a certificate warning - that’s expected, click through it.

You now have a bare-metal hypervisor running on your mini PC. The monitor and keyboard can be disconnected - everything from here is via the web UI or SSH.

Setting Up Tailscale for Remote Access

I didn’t want to be limited to my home network. Tailscale gives you a private WireGuard mesh network - install it on Proxmox and you can access the web UI from anywhere.

SSH into the Proxmox host (or use the web UI console):

# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh

# Authenticate
tailscale up

# Note the Tailscale IP
tailscale ip -4
# e.g., 100.93.110.7

Now you can access Proxmox at https://100.93.110.7:8006 from any device on your Tailscale network. Coffee shop, office, phone - doesn’t matter.

Creating the Ubuntu Cloud-Init Template

Instead of installing Ubuntu manually on every VM, I created a cloud-init template once and clone it for each new VM. Cloud-init handles SSH keys, hostname, static IPs, and package installs automatically on first boot.

On the Proxmox host:

# Download Ubuntu 24.04 cloud image
wget https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img

# Create a new VM (ID 9000) that will become our template
qm create 9000 --name ubuntu-cloud --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0

# Import the cloud image as a disk
qm importdisk 9000 noble-server-cloudimg-amd64.img local-lvm

# Attach the disk as SCSI
qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0

# Add a cloud-init drive
qm set 9000 --ide2 local-lvm:cloudinit

# Set boot order to the SCSI disk
qm set 9000 --boot c --bootdisk scsi0

# Add a serial console (needed for cloud-init)
qm set 9000 --serial0 socket --vga serial0

# Configure cloud-init defaults
qm set 9000 --ciuser mo
qm set 9000 --sshkeys ~/.ssh/authorized_keys
qm set 9000 --ipconfig0 ip=dhcp

# Convert to template
qm template 9000

This template is the foundation for every VM in the lab. Clone it, customise the resources, set a static IP, and boot. Takes about 30 seconds per VM.

Provisioning the K3s VMs

Three VMs cloned from the template. Each gets a static IP, specific RAM allocation, and a resized disk.

# Clone for control plane (VM 101)
qm clone 9000 101 --name cp1 --full
qm set 101 --memory 6144 --cores 2
qm resize 101 scsi0 +18G    # 20GB total
qm set 101 --ipconfig0 ip=192.168.1.21/24,gw=192.168.1.1
qm start 101

# Clone for worker1 (VM 102)
qm clone 9000 102 --name worker1 --full
qm set 102 --memory 8192 --cores 2
qm resize 102 scsi0 +18G
qm set 102 --ipconfig0 ip=192.168.1.22/24,gw=192.168.1.1
qm start 102

# Clone for worker2 (VM 103)
qm clone 9000 103 --name worker2 --full
qm set 103 --memory 8192 --cores 2
qm resize 103 scsi0 +18G
qm set 103 --ipconfig0 ip=192.168.1.23/24,gw=192.168.1.1
qm start 103

Wait about a minute for cloud-init to finish, then SSH in:

ssh mo@192.168.1.21   # cp1
ssh mo@192.168.1.22   # worker1
ssh mo@192.168.1.23   # worker2

All three VMs boot in under a minute. Cloud-init sets the hostname, creates the user, installs the SSH key. No manual OS install, no clicking through wizards.

VM      NAME       IP               ROLE                RAM     DISK
==      ====       ==               ====                ===     ====
101     cp1        192.168.1.21     K3s control plane   6GB     20GB
102     worker1    192.168.1.22     K3s worker          8GB     20GB
103     worker2    192.168.1.23     K3s worker          8GB     20GB
104     vault      192.168.1.104    HashiCorp Vault     4GB     20GB
9000    template   -                Cloud-init base     -       -

The Vault VM (Terraform)

The K3s VMs were created manually with qm commands. For the Vault VM, I used Terraform with the bpg/proxmox provider. Partly because I wanted the Vault infrastructure to be reproducible, partly because I wanted to test the provider.

resource "proxmox_virtual_environment_vm" "vault" {
  name      = "vault"
  node_name = "pve"
  vm_id     = 104

  clone { vm_id = 9000 }

  cpu    { cores = 2 }
  memory { dedicated = 4096 }
  agent  { enabled = false }  # no qemu-guest-agent

  initialization {
    ip_config {
      ipv4 {
        address = "192.168.1.104/24"
        gateway = "192.168.1.1"
      }
    }
    user_data_file_id = proxmox_virtual_environment_file.cloud_config.id
  }
}

The cloud-init user data installs Vault automatically on first boot:

#cloud-config
packages:
  - gpg
  - wget
runcmd:
  - wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
  - echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com noble main" > /etc/apt/sources.list.d/hashicorp.list
  - apt-get update && apt-get install -y vault
  - systemctl enable vault && systemctl start vault

Two gotchas I hit:

  1. agent.enabled = false - If you haven’t installed qemu-guest-agent in the VM, you must set this to false. Otherwise Terraform hangs forever waiting for the agent to respond. Cost me an hour staring at a frozen terminal.

  2. user_data_file_id overrides SSH keys - When using a cloud-init file for user data, the user_account block’s SSH keys get ignored. Put your SSH key in the cloud-config file instead.

Run terraform apply, wait two minutes, and you have a Vault VM with Vault pre-installed and running.
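
To address the second gotcha, the SSH key goes into the cloud-config itself via cloud-init's `users` module. A minimal sketch (the username and key are illustrative - swap in your own):

```yaml
#cloud-config
users:
  - name: mo
    groups: [sudo]
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      # your public key, not the private one
      - ssh-ed25519 AAAA... mo@laptop
```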

K3s Cluster

K3s v1.34.5 with Traefik disabled at install (re-enabled later with custom config via HelmChartConfig). Flannel for CNI. Single control plane node - it’s a homelab, not a bank.

On the control plane (cp1):

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.34.5+k3s1" \
  sh -s - server --disable traefik --write-kubeconfig-mode 644

Why disable Traefik? Because K3s bundles Traefik with default settings. I wanted to customise it (enable the dashboard, configure tracing, adjust resource limits), which is easier to do by disabling the bundled version and deploying it fresh with a HelmChartConfig.
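
K3s merges a HelmChartConfig's `valuesContent` into the chart's Helm values. A sketch of the kind of overrides I mean (field names follow the Traefik Helm chart - check them against your chart version):

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    # expose the built-in dashboard
    ingressRoute:
      dashboard:
        enabled: true
    # sane limits instead of the chart defaults
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        memory: 256Mi
```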

Grab the join token:

cat /var/lib/rancher/k3s/server/node-token

On each worker:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.34.5+k3s1" \
  K3S_URL=https://192.168.1.21:6443 \
  K3S_TOKEN="<token-from-above>" sh -

Workers joined within seconds. Cluster up and running in under 10 minutes.

$ kubectl get nodes
NAME      STATUS   ROLES                  AGE   VERSION
cp1       Ready    control-plane,master   10m   v1.34.5+k3s1
worker1   Ready    <none>                 8m    v1.34.5+k3s1
worker2   Ready    <none>                 7m    v1.34.5+k3s1

GitOps with FluxCD

Everything runs through Flux. No kubectl apply in production - push to main and Flux reconciles.

Bootstrap Flux:

flux bootstrap github \
  --owner=moabukar \
  --repository=homelab \
  --branch=main \
  --path=clusters/homelab \
  --personal

This creates the repo structure and installs Flux controllers. From here, everything is managed by adding YAML to the repo.

The dependency chain is explicit and critical - get this wrong and things install in the wrong order and break:

flux-system
  └── gateway-api-crds
        └── infrastructure-controllers    (MetalLB, cert-manager, ESO, OTel, etc.)
              └── infrastructure-configs  (MetalLB pool, Gateway, ClusterIssuers, Cloudflared)
                    └── apps              (monitoring stack, Authentik, Home Assistant, etc.)

Each layer waits for the previous one using Flux’s dependsOn and health checks. CRDs install before controllers that need them. Controllers install before configs that reference them. No race conditions, no “apply it again and hope” situations.

One thing I got wrong early on: I had wait: true on the controller Kustomizations. This blocks the entire reconciliation chain until every single resource is ready. The problem is some resources (like CRDs) don’t have meaningful ready conditions, so Flux waits forever. Switched to explicit healthChecks targeting specific Deployments instead. Much more reliable.
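
On a controller Kustomization, the switch from `wait: true` to explicit health checks looks roughly like this (names are illustrative; the API fields are Flux's own):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-controllers
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/controllers
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: gateway-api-crds
  # wait: true would block on every resource, including ones with no
  # meaningful ready condition - target specific Deployments instead
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: metallb-controller
      namespace: metallb-system
```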

Secrets Management

This is the part I’m most pleased with. Zero secrets in Git. Zero manual secret management.

The flow:

Terraform (random_password)
        ↓
Vault KV v2 (secret/apps/*)
        ↓
External Secrets Operator (ClusterSecretStore)
        ↓
Kubernetes Secret
        ↓
Reloader (rolling restart)

Terraform generates random passwords and writes them to Vault. ESO syncs them into K8s Secrets. Reloader watches for Secret changes and triggers rolling restarts on dependent pods. Add a new app secret? Add it to Terraform, run terraform apply, and ESO picks it up within an hour.

Vault Setup

Vault runs in dev mode… just kidding. Raft storage, TLS disabled (internal network only), UI enabled.

After the VM boots, initialise Vault:

vault operator init -key-shares=1 -key-threshold=1

This gives you an unseal key and root token. Store them safely (they’re saved to /opt/vault/init.json on the VM). Single key share because it’s a homelab - in production you’d use Shamir’s Secret Sharing with 3-of-5 or similar.

vault operator unseal <unseal-key>
vault login <root-token>

Enable the KV v2 secrets engine:

vault secrets enable -path=secret kv-v2

The Vault Terraform config (terraform/vault/config/) manages everything else - the KV engine, policies, auth backends, and the actual secrets:

resource "random_password" "grafana_admin" {
  length  = 32
  special = false
}

resource "vault_kv_secret_v2" "grafana" {
  mount = "secret"
  name  = "apps/grafana"
  data_json = jsonencode({
    admin-password = random_password.grafana_admin.result
    admin-user     = "admin"
  })
}

Every app gets a random_password resource and a corresponding Vault secret. No more “changeme” passwords.

JWT Auth (The Kubernetes Auth Detour)

The Vault authentication story deserves its own section because I wasted significant time on it.

I initially tried the kubernetes auth method. This is the “standard” way - Vault calls back to the K8s API to validate ServiceAccount tokens via TokenReview. Spent hours debugging 403 “permission denied” errors. The TokenReview chain between K3s and a Vault running outside the cluster had issues I never fully resolved. The K3s API server, the Vault kubernetes auth config, the CA certificates - something in that chain wasn’t happy.

Switched to jwt auth instead. Completely different approach. Vault validates K8s ServiceAccount tokens locally using the cluster’s JWKS public key. No network call back to the K8s API. Pure cryptographic verification.

To set it up:

# Extract the JWKS public key from K3s
kubectl get --raw /openid/v1/jwks > jwks.json

# Convert the JWK to PEM format - jwt_validation_pubkeys expects
# PEM-encoded public keys, not raw JWKS JSON (any JWK-to-PEM tool works)

# Configure JWT auth in Vault
vault auth enable jwt
vault write auth/jwt/config \
  jwt_validation_pubkeys="$(cat jwks-pem.pem)"

# Create a policy for ESO
vault policy write eso-read - <<EOF
path "secret/data/*" {
  capabilities = ["read", "list"]
}
EOF

# Create a role for ESO
vault write auth/jwt/role/eso \
  role_type="jwt" \
  bound_audiences="kubernetes.default.svc.cluster.local" \
  user_claim="sub" \
  policies="eso-read" \
  ttl="1h"

Then the ClusterSecretStore in K8s:

apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "http://192.168.1.104:8200"
      path: "secret"
      version: "v2"
      auth:
        jwt:
          path: "jwt"
          role: "eso"
          kubernetesServiceAccountToken:
            serviceAccountRef:
              name: external-secrets-operator
              namespace: external-secrets-system
            audiences:
              - kubernetes.default.svc.cluster.local

The key insight: JWKS keys are public. They’re safe to commit to Git. They’re the public half of the signing key - anyone can verify a token, but only the K8s API server can sign one. No chicken-and-egg problem for bootstrapping secrets.

ExternalSecrets in Practice

Each app that needs a secret gets an ExternalSecret manifest:

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: grafana-admin-password
  namespace: monitoring
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend
  target:
    name: grafana-admin-password
    creationPolicy: Owner
  data:
    - secretKey: admin-password
      remoteRef:
        key: apps/grafana
        property: admin-password
    - secretKey: admin-user
      remoteRef:
        key: apps/grafana
        property: admin-user

ESO reads from Vault every hour. If a secret changes in Vault, it updates the K8s Secret. Reloader (Stakater) watches the Secret and triggers a rolling restart on any Deployment with the annotation:

metadata:
  annotations:
    reloader.stakater.com/auto: "true"

Rotate a password? Change it in Terraform, terraform apply, wait for ESO to sync, Reloader restarts the pod. Zero manual steps.

Networking

Gateway API with Traefik as the controller. Not traditional Ingress - Gateway API is the future and Traefik supports it natively in K3s.

MetalLB

K3s doesn’t give you LoadBalancer IPs by default (unlike cloud providers). MetalLB fills that gap. L2 mode, IP pool of 192.168.1.200-250:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.250
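
One easy thing to miss with MetalLB's CRD-based config: L2 mode also needs an L2Advertisement alongside the pool, or addresses get assigned but never announced on the network. A minimal sketch:

```yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  # announce every address in this pool via ARP on the LAN
  ipAddressPools:
    - default-pool
```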

The Traefik Gateway grabs 192.168.1.200 and all HTTPRoutes hang off it.

Gateway API

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: traefik-gateway
  namespace: kube-system
spec:
  gatewayClassName: traefik
  listeners:
    - name: http
      protocol: HTTP
      port: 80
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        certificateRefs:
          - name: wildcard-cert

Each service gets an HTTPRoute:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  parentRefs:
    - name: traefik-gateway
      namespace: kube-system
  hostnames:
    - grafana.homelab.local
    - grafana.moabukar.co.uk
  rules:
    - backendRefs:
        - name: kube-prometheus-stack-grafana
          port: 80

ROUTE                        SERVICE
=====                        =======
grafana.homelab.local        Grafana dashboards
prometheus.homelab.local     Prometheus UI
alertmanager.homelab.local   Alertmanager
jaeger.homelab.local         Jaeger trace UI
traefik.homelab.local        Traefik dashboard
ha.homelab.local             Home Assistant
auth.homelab.local           Authentik SSO
it-tools.homelab.local       IT-Tools

For local access, add entries to /etc/hosts pointing *.homelab.local to 192.168.1.200. Or use a local DNS server if you’re fancy.
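
Note that /etc/hosts has no wildcard support, so each hostname is listed explicitly against the Gateway IP:

```
# /etc/hosts - all *.homelab.local routes resolve to the Traefik Gateway
192.168.1.200  grafana.homelab.local prometheus.homelab.local alertmanager.homelab.local
192.168.1.200  jaeger.homelab.local traefik.homelab.local ha.homelab.local
```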

External Access (Cloudflare Tunnel)

I didn’t want to port-forward or expose my home IP. Cloudflare Tunnel creates an outbound-only connection from the cluster to Cloudflare’s edge. External traffic routes through *.moabukar.co.uk without any inbound firewall rules.

Cloudflared runs as a Deployment in the cluster. The tunnel token is stored in Vault and synced via ESO - no secrets in Git.

ExternalDNS watches HTTPRoute resources and automatically creates Cloudflare DNS records for any hostname matching *.moabukar.co.uk. Add a new HTTPRoute with a .moabukar.co.uk hostname, ExternalDNS creates the DNS record, Cloudflare Tunnel routes the traffic. Fully automated.
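
The ExternalDNS side is just a handful of flags on its Deployment. A sketch (flag names follow the external-dns docs; the values are mine):

```yaml
args:
  # watch Gateway API HTTPRoute resources instead of Ingress
  - --source=gateway-httproute
  - --provider=cloudflare
  # only manage records under this zone
  - --domain-filter=moabukar.co.uk
  # TXT registry ownership marker, so it won't touch records it didn't create
  - --txt-owner-id=homelab
```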

Observability

Three pillars, all wired up and talking to each other.

Metrics

kube-prometheus-stack provides Prometheus (10Gi storage, 7d retention), Grafana, and Alertmanager. Deployed via Flux HelmRelease.

Custom Grafana dashboards are stored as JSON files in the repo. Kustomize’s configMapGenerator creates ConfigMaps with the grafana_dashboard: "1" label, and Grafana’s sidecar auto-discovers them. Add a dashboard? Drop a JSON file in the repo, push to main, done.
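
The generator stanza is short. A sketch (file names are illustrative):

```yaml
# kustomization.yaml
configMapGenerator:
  - name: dashboard-homelab-overview
    files:
      - dashboards/homelab-overview.json
    options:
      labels:
        # the label Grafana's sidecar watches for
        grafana_dashboard: "1"
generatorOptions:
  # stable names, so the sidecar doesn't see a new ConfigMap on every change
  disableNameSuffixHash: true
```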

Dashboards deployed:

  • Homelab Overview (custom - cluster-wide resource usage at a glance)
  • Flux GitOps (custom - reconciliation status, errors, sync times)
  • Traefik (from grafana.net - request rates, error rates, latencies)
  • Node Exporter Full (from grafana.net - deep-dive per-node metrics)
  • K8s Cluster (from grafana.net - namespace and pod-level overview)
  • K8s Pods (from grafana.net - per-pod resource usage)

10 custom PrometheusRule alerts covering the things that actually matter:

ALERT                          CONDITION
=====                          =========
NodeHighCPU                    > 80% for 5m
NodeHighMemory                 > 85% for 5m
NodeDiskAlmostFull             > 80%
NodeDown                       Unreachable for 5m
PodCrashLooping                > 5 restarts in 15m
PodNotReady                    Not ready for 10m
DeploymentReplicasMismatch     For 10m
FluxReconciliationFailure      Any Kustomization/HR failing
TraefikHighErrorRate           > 5% 5xx for 5m
CertificateExpiringSoon        < 7 days

Alertmanager sends notifications to Telegram via a bot. HTML-formatted messages with alert name, severity, namespace, and a direct link to the relevant Grafana dashboard.
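
Alertmanager has native Telegram support, so no webhook shim is needed. A sketch of the receiver (field names per Alertmanager's `telegram_configs`; the token file path and template name are mine, and `bot_token_file` needs a reasonably recent Alertmanager):

```yaml
receivers:
  - name: telegram
    telegram_configs:
      - bot_token_file: /etc/alertmanager/secrets/telegram-token  # synced from Vault via ESO
        chat_id: -1001234567890
        parse_mode: HTML
        message: '{{ template "telegram.message" . }}'
        send_resolved: true
```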

Traces

OpenTelemetry Collector runs as a DaemonSet on every node. It receives OTLP traces on ports 4317 (gRPC) and 4318 (HTTP), enriches them with Kubernetes metadata (pod, namespace, node, deployment), and exports to Jaeger.

The OTel Collector pipeline:

receivers:
  otlp (gRPC + HTTP)

processors:
  k8sattributes    (add pod/namespace/node labels)
  resource         (add cluster name)
  transform        (derive service.name from k8s metadata)
  batch            (batch before export)

exporters:
  otlp/jaeger      (Jaeger's OTLP endpoint)

One critical thing I missed initially: the DaemonSet had no Service. The pods were running fine, collecting traces from their own nodes, but application pods couldn’t send traces to the collector because there was no Service to resolve. Added a ClusterIP Service on ports 4317/4318 and everything clicked.
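
The fix was nothing exotic - a plain ClusterIP Service fronting the DaemonSet pods. A sketch (the namespace and label selector depend on how you deployed the collector):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: observability
spec:
  type: ClusterIP
  selector:
    app: otel-collector   # must match the DaemonSet's pod labels
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318
```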

Jaeger uses Badger persistent storage (5Gi PVC, 72h retention). Another gotcha - Jaeger needs a ClusterIP Service, not headless, for OTel gRPC export. Headless Services don’t work with gRPC load balancing the way you’d expect.

What’s sending traces:

  • Traefik - Every HTTP request through the gateway gets a trace automatically. This is the backbone - you can trace a request from ingress all the way through.
  • Grafana - Internal operation traces (dashboard loads, query execution).
  • OTel Demo - The official OpenTelemetry Astronomy Shop demo app. Generates realistic cross-service traces across Go, Python, Java, Node.js, and more. Great for testing trace visualisation and understanding distributed tracing patterns.

What’s Also Running

Beyond the core platform:

  • Home Assistant - Home automation. Runs with host networking and privileged access (needs USB/Bluetooth for Zigbee). Pinned to cp1 node via node affinity (PV is local-path, tied to that node).
  • IT-Tools - Handy web-based developer tools (base64 encode/decode, JWT debugger, cron expression parser, etc.). 2 replicas.
  • Authentik - SSO provider. PostgreSQL + Redis backend. Currently broken - the PostgreSQL PVC was initialised with an empty password, then Vault generated a new one. Needs a PVC reset to fix the mismatch.
  • Podinfo - Smoke test app. If podinfo works, the cluster works.
  • cert-manager - TLS certificates. ClusterIssuers for Let’s Encrypt prod and staging.

What I’d Do Differently

A few things I’d change if I started over:

  • Skip kubernetes auth for Vault entirely. JWT auth is simpler, more reliable with K3s, and has zero network dependencies. Don’t waste time debugging TokenReview.
  • Set resource limits from the start. I added them after the fact based on kubectl top data. Should have done it from day one. Without limits, one misbehaving pod can starve the entire node.
  • Use Terraform for all VMs, not just Vault. The K3s VMs were created manually with qm commands. Works, but it means rebuilding them requires remembering the exact steps.
  • Pin the Traefik version. K3s bundles Traefik and upgrades it on K3s updates. Pin it if you care about stability.
  • Start with 4GB RAM for Vault. I initially gave it 2GB, and it hit 100% memory under load. Rebuilt it with 4GB. Save yourself the rebuild.
  • Don’t use wait: true in Flux Kustomizations. Use explicit healthChecks targeting specific Deployments instead. wait: true blocks on every resource, including ones that don’t have meaningful ready conditions.

What’s Next

  • Authentik SSO wired into Grafana (OAuth2 proxy)
  • Backup strategy for etcd and PVs
  • Network policies (currently everything is wide open within the cluster)
  • Longhorn for replicated storage (currently local-path, no redundancy)
  • More Grafana dashboards as I add services
  • Loki for log aggregation (Promtail DaemonSet, LogQL in Grafana)

The full repo is at github.com/moabukar/homelab. Every manifest, every config, every Terraform file. Fork it, break it, make it yours.

Proxmox + K3s + Vault + FluxCD

GitOps all the way down.
