blumeops/docs/k8s-migration.md
Erich Blume 4d916a46d3 Add Kubernetes migration plan documentation
Comprehensive phased plan for migrating blumeops services from direct
hosting on indri to a minikube cluster. Documents technical decisions
(Zot registry, Podman driver, CloudNativePG, Tailscale Operator) and
9 migration phases with verification and rollback procedures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 13:12:09 -08:00

Blumeops Minikube Migration Plan

This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes.

Architecture Overview

Services Staying on Indri (Outside K8s)

Service             Reason
Zot Registry (NEW)  Avoid circular dependency - k8s needs images to start
Prometheus          Observability backbone must survive k8s failures
Loki                Log aggregation backbone
Borgmatic           Backup system
Grafana-alloy       Metrics/logs collector on host
Plex                Until Jellyfin replacement
Transmission        Downloads for kiwix ZIM files

Services Moving to K8s

Service        Complexity  Dependencies
Grafana        LOW         Phase 1
Kiwix          LOW         Phase 1
Miniflux       MEDIUM      PostgreSQL
devpi          MEDIUM      Registry
PostgreSQL     HIGH        Phase 1
Forgejo        HIGH        PostgreSQL
Woodpecker CI  MEDIUM      Forgejo

Technical Decisions

Container Registry: Zot

  • OCI-native, lightweight
  • Native support for proxying multiple registries (Docker Hub, GHCR, Quay)
  • Single binary, ARM64 native
  • Config at /etc/zot/config.json
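
To make the pull-through idea concrete, here is a minimal sketch of a Zot config with on-demand sync for Docker Hub and GHCR. The `write_zot_config` helper, the storage directory, and the destination prefixes are assumptions for illustration, not the deployed config. The field names follow Zot's sync extension and should be checked against the installed Zot version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal Zot config to the given path.
write_zot_config() {
  local target="$1"
  cat > "$target" <<'EOF'
{
  "distSpecVersion": "1.1.0",
  "storage": { "rootDirectory": "/opt/homebrew/var/zot" },
  "http": { "address": "127.0.0.1", "port": "5000" },
  "log": { "level": "info" },
  "extensions": {
    "sync": {
      "enable": true,
      "registries": [
        {
          "urls": ["https://registry-1.docker.io"],
          "onDemand": true,
          "tlsVerify": true,
          "content": [{ "prefix": "**", "destination": "/docker.io" }]
        },
        {
          "urls": ["https://ghcr.io"],
          "onDemand": true,
          "tlsVerify": true,
          "content": [{ "prefix": "**", "destination": "/ghcr.io" }]
        }
      ]
    }
  }
}
EOF
}

# Write to a scratch location first; copy to /etc/zot/config.json once reviewed.
write_zot_config "${TMPDIR:-/tmp}/zot-config.json"
```

With `onDemand` set, an image is fetched from upstream only the first time it is requested, which matches the "cache is re-fetchable" assumption made later in the plan.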

Minikube Driver: Podman

  • Rootless containers for better security
  • Lighter than a dedicated full-VM driver (QEMU); note that on macOS Podman still runs its containers inside a podman machine VM
  • Uses existing container ecosystem
  • minikube start --driver=podman --container-runtime=containerd

PostgreSQL: CloudNativePG Operator

  • Production-grade operator
  • Built-in backup/restore
  • Prometheus metrics
  • PITR support

K8s Service Exposure: Tailscale Operator

  • loadBalancerClass: tailscale on Services
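  • Automatic MagicDNS names; HTTPS certificates are provisioned automatically when a workload is exposed through the operator's Ingress class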
  • ACL-controlled access
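
As a rough sketch of what this looks like per service, the helper below renders a Service manifest that the operator would pick up. The `render_tailscale_service` function, the `app` selector label, and the example name and port are assumptions. `loadBalancerClass: tailscale` comes from the plan, and `tailscale.com/hostname` is the operator annotation for choosing the tailnet hostname.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Service manifest exposed through the
# Tailscale operator. Arguments: service name, port.
render_tailscale_service() {
  local name="$1" port="$2"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  annotations:
    tailscale.com/hostname: "${name}"
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale
  selector:
    app: ${name}
  ports:
    - port: ${port}
      targetPort: ${port}
EOF
}

# Example: render a manifest for a service named grafana on port 3000.
render_tailscale_service grafana 3000
```

The output can be piped to `kubectl apply -f -` once the operator from Phase 1 is running.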

LaunchAgent Requirements (Critical)

LaunchAgents do NOT get homebrew on PATH. All commands must use absolute paths:

  • /opt/homebrew/bin/zot not zot
  • /opt/homebrew/opt/mise/bin/mise x -- for mise-managed tools
  • /opt/homebrew/opt/postgresql@18/bin/pg_dump for postgres tools

This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors). Services started with brew services resolve their own paths automatically, but they are not tracked in ansible.
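
One way to catch a bare command name before it reaches a plist is a small guard in the wrapper script or role test. This is a sketch only. `require_abs` is a hypothetical helper, not something that exists in the repo.

```shell
#!/usr/bin/env bash
# Hypothetical guard for LaunchAgent wrapper scripts: accept a command only
# if it is an absolute path to an executable file.
require_abs() {
  local cmd="$1"
  case "$cmd" in
    /*) ;;
    *) echo "not an absolute path: $cmd" >&2; return 1 ;;
  esac
  if [ ! -x "$cmd" ]; then
    echo "not executable: $cmd" >&2
    return 1
  fi
}

# /bin/sh exists on macOS and Linux, so this demo call succeeds anywhere.
require_abs /bin/sh && echo "ok: /bin/sh"
```

On indri the same check would be pointed at paths such as /opt/homebrew/bin/zot, so a missing binary fails loudly at deploy time instead of silently at launch.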


Phase 0: Foundation

Goal: Container registry + minikube cluster without disrupting existing services

Steps

  1. Install Podman on indri

    # Add to Brewfile
    brew "podman"
    
    • Create ansible role podman for machine setup
  2. Install and configure Zot registry

    • Create ansible role zot
    • Deploy as mcquack LaunchAgent (like devpi pattern)
    • Bind to localhost:5000
    • Configure pull-through for Docker Hub + GHCR
    • Add Tailscale serve: svc:registry
  3. Install minikube

    # Add to Brewfile
    brew "minikube"
    
    # Start with podman driver
    minikube start --driver=podman --container-runtime=containerd \
      --cpus=4 --memory=8192 --disk-size=100g
    
    • Create ansible role minikube for initial setup
  4. Update Pulumi ACLs

    • Add tag:registry for registry service
    • Add tag:k8s for cluster services
  5. Configure kubeconfig on gilbert

    • Add minikube context to ~/.kube/config
    • Keep work EKS config separate (already isolated)
    • K9s will auto-discover contexts
  6. Observability for new services (follow existing patterns)

    Zot Registry:

    • Create zk card ~/code/personal/zk/zot.md (like devpi.md, forgejo.md)
    • Add log collection to Alloy config (stdout/stderr from LaunchAgent)
    • Create zot_metrics role with periodic script writing to textfile collector
    • Create Grafana dashboard: cache hit rates, storage usage, pull/push counts

    Minikube:

    • Create zk card ~/code/personal/zk/minikube.md
    • Metrics via kube-state-metrics (deployed in cluster)
    • Node metrics already collected by Alloy
    • Create Grafana dashboard: cluster health, resource usage

    Note: Backups not needed for these services:

    • Zot cache is re-fetchable from upstream registries
    • Minikube state is recreatable from ansible/k8s manifests
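
The "periodic script writing to textfile collector" for zot_metrics could look roughly like the sketch below. It only reports storage usage. The metric name `zot_storage_bytes` and the `write_zot_metrics` helper are assumptions, and cache hit rates or pull counts would need Zot's own metrics endpoint or log parsing, which is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical metrics writer. Arguments: Zot storage directory,
# node_exporter textfile directory.
write_zot_metrics() {
  local data_dir="$1" textfile_dir="$2"
  local bytes tmp
  # du -sk reports kibibytes on both macOS and Linux.
  bytes=$(( $(du -sk "$data_dir" | awk '{print $1}') * 1024 ))
  # Write to a temp file in the same directory, then rename, so the
  # collector never reads a half-written file. Only *.prom files are read.
  tmp="$(mktemp "${textfile_dir}/zot.prom.XXXXXX")"
  {
    echo "# HELP zot_storage_bytes Disk space used by the Zot storage directory."
    echo "# TYPE zot_storage_bytes gauge"
    echo "zot_storage_bytes ${bytes}"
  } > "$tmp"
  chmod 644 "$tmp"
  mv "$tmp" "${textfile_dir}/zot.prom"
}

# Demo against a scratch directory.
demo_dir="$(mktemp -d)"
write_zot_metrics "$demo_dir" "$demo_dir"
cat "${demo_dir}/zot.prom"
```

On indri the second argument would be /opt/homebrew/var/node_exporter/textfile, the directory the Phase 0 verification step reads from.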

New Files

  • ansible/roles/zot/ - Registry role
  • ansible/roles/zot_metrics/ - Metrics collection
  • ansible/roles/podman/ - Podman setup
  • ansible/roles/minikube/ - Cluster setup
  • ~/code/personal/zk/zot.md - Registry management log
  • ~/code/personal/zk/minikube.md - Cluster management log

Verification

# Registry working
curl http://localhost:5000/v2/_catalog

# Minikube running
minikube status
kubectl get nodes

# Metrics flowing
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/zot.prom'

# Logs in Loki
# Query: {service="zot"}
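
The checks above could be wrapped in a small pass/fail runner, for example as a starting point for the k8s additions to mise-tasks/indri-services-check. This is a sketch. `run_check` and the `failures` counter are hypothetical names, and the five-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical check runner: run a command quietly and record the result.
failures=0
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS ${label}"
  else
    echo "FAIL ${label}"
    failures=$((failures + 1))
  fi
}

# The Phase 0 checks; each one reports FAIL rather than aborting the script.
run_check "zot registry" curl -fsS --max-time 5 http://localhost:5000/v2/_catalog
run_check "minikube status" minikube status
run_check "k8s nodes" kubectl get nodes
echo "failed checks: ${failures}"
```

A real task would likely end with a non-zero exit when `failures` is greater than zero, so that callers can react to it.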

Rollback

minikube stop && minikube delete
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist

Phase 1: Kubernetes Infrastructure

Goal: Tailscale operator + CloudNativePG operator

Steps

  1. Create Tailscale OAuth client

    • Scopes: Devices Core, Auth Keys, Services write
    • Tag: tag:k8s-operator
    • Store in 1Password
  2. Deploy Tailscale Kubernetes Operator

    helm repo add tailscale https://pkgs.tailscale.com/helmcharts
    helm repo update
    helm install tailscale-operator tailscale/tailscale-operator \
      --namespace tailscale-system --create-namespace \
      --set-string oauth.clientId="${CLIENT_ID}" \
      --set-string oauth.clientSecret="${CLIENT_SECRET}"
    
  3. Deploy CloudNativePG operator

    # Server-side apply avoids the annotation size limit on the large CNPG CRDs
    kubectl apply --server-side -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml
    
  4. Create PostgreSQL cluster

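    # Assumes the databases namespace already exists
    # (kubectl create namespace databases)
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: blumeops-pg
      namespace: databases
    spec:
      instances: 1
      storage:
        size: 10Gi
        storageClass: standard
      monitoring:
        # PodMonitor is a Prometheus Operator CRD and must be installed in the
        # cluster; otherwise scrape the instance metrics port from Alloy instead
        enablePodMonitor: true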
    
  5. Update Alloy config

    • Add a discovery.kubernetes component for k8s metrics (Alloy's counterpart to Prometheus kubernetes_sd_configs)
    • Scrape operator metrics

New Files

  • ansible/k8s/operators/ - Operator manifests
  • ansible/k8s/databases/ - PostgreSQL cluster

Verification

kubectl get pods -n tailscale-system
kubectl get pods -n cnpg-system
kubectl get cluster -n databases

Phase 2: Grafana Migration (Pilot)

Goal: Migrate Grafana as lowest-risk pilot service

Steps

  1. Deploy Grafana via Helm

    • Copy datasource config from existing role
    • Copy dashboards from ansible/roles/grafana/files/dashboards/
    • Point to indri Prometheus/Loki (http://indri:9090, http://indri:3100)
  2. Configure Tailscale LoadBalancer

    service:
      type: LoadBalancer
      loadBalancerClass: tailscale
    
  3. Verify all dashboards work

  4. Update tailscale_serve - remove grafana entry

  5. Stop brew grafana: brew services stop grafana
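
Steps 1 and 2 can be captured in a single Helm values file. The sketch below renders one. The `render_grafana_values` helper is hypothetical, the datasource URLs are the ones named in step 1, and the keys follow the upstream grafana/grafana chart layout as far as I know, so they should be confirmed against the chart version in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Helm values for the Grafana pilot.
render_grafana_values() {
  cat <<'EOF'
service:
  type: LoadBalancer
  loadBalancerClass: tailscale
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://indri:9090
        isDefault: true
      - name: Loki
        type: loki
        url: http://indri:3100
EOF
}

# Write the values to a scratch file for review.
render_grafana_values > "${TMPDIR:-/tmp}/grafana-values.yaml"
```

The file would then be passed to Helm with `-f`, for example `helm install grafana grafana/grafana -f grafana-values.yaml` after adding the https://grafana.github.io/helm-charts repo. Whether pods can resolve the bare hostname indri is worth checking before the old instance is stopped.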

Verification


Phase 3: PostgreSQL Migration

Goal: Migrate miniflux database to CloudNativePG

Steps

  1. Create databases and users in k8s PostgreSQL

    • miniflux database/user
    • borgmatic read-only user
  2. Export from brew PostgreSQL

    pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql
    
  3. Expose k8s PostgreSQL via Tailscale

    • Service with loadBalancerClass: tailscale
    • Tag: svc:pg-k8s
  4. Import data

    psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql
    
  5. Update borgmatic config

    • Change hostname to k8s PostgreSQL
  6. Verify data integrity
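
Step 1 could be done with a few SQL statements like the ones below. This is a sketch under assumptions: `render_bootstrap_sql` is a hypothetical helper, passwords are set separately, and the read-only access for borgmatic relies on the built-in `pg_read_all_data` role (PostgreSQL 14 and later). CloudNativePG can also declare roles in the Cluster spec, which may be the better fit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL for the Miniflux database and the
# read-only borgmatic role.
render_bootstrap_sql() {
  cat <<'EOF'
-- Application role and database for Miniflux.
CREATE ROLE miniflux LOGIN;
CREATE DATABASE miniflux OWNER miniflux;
-- Read-only role used by borgmatic for dumps.
CREATE ROLE borgmatic LOGIN;
GRANT pg_read_all_data TO borgmatic;
EOF
}

render_bootstrap_sql
```

The output would be piped into a superuser psql session on the new cluster, for instance through `kubectl exec -i` against the primary pod in the databases namespace.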

Rollback

Keep brew PostgreSQL running until Phase 4 verified


Phase 4: Miniflux Migration

Goal: Migrate Miniflux to k8s

Steps

  1. Deploy Miniflux

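    image: ghcr.io/miniflux/miniflux:latest
    env:
      - name: DATABASE_URL      # read from a Secret; the name and key here are placeholders
        valueFrom: {secretKeyRef: {name: miniflux-db, key: url}}
      - name: RUN_MIGRATIONS
        value: "1"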
    
  2. Configure Tailscale LoadBalancer - tag: svc:feed

  3. Update Alloy log collection - add k8s namespace

  4. Verify: login, feeds refresh, API works

  5. Stop brew miniflux: brew services stop miniflux


Phase 5: devpi Migration

Goal: Migrate devpi to k8s

Steps

  1. Build devpi container

    • Dockerfile with devpi-server + devpi-web
    • Push to local Zot registry
  2. Deploy as StatefulSet

    • PVC for data (50Gi)
    • Migrate existing data (excluding PyPI cache)
  3. Configure Tailscale LoadBalancer - tag: svc:pypi

  4. Update pip.conf on gilbert

  5. Stop mcquack devpi
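
For step 4, the pip.conf change amounts to pointing `index-url` at the new endpoint. In the sketch below the `write_pip_conf` helper and the hostname are assumptions: the hostname is guessed from the svc:pypi tag and the tailnet name used in Phase 3, and `root/pypi/+simple/` is devpi's default mirror index.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a pip.conf that points at a devpi index.
# Arguments: index URL, target file.
write_pip_conf() {
  local index_url="$1" target="$2"
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<EOF
[global]
index-url = ${index_url}
EOF
}

# Write an example to a scratch path; the real file on gilbert would be
# pip's user config (for example ~/.config/pip/pip.conf).
write_pip_conf "https://pypi.tail8d86e.ts.net/root/pypi/+simple/" \
  "${TMPDIR:-/tmp}/pip.conf.example"
```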


Phase 6: Kiwix Migration

Goal: Migrate kiwix-serve to k8s

Steps

  1. Create NFS/hostPath PV for ZIM files

    • Point to transmission download directory
    • ReadOnlyMany access
  2. Deploy Kiwix

    image: ghcr.io/kiwix/kiwix-serve:3.8.1
    args: ["/data/*.zim"]
    
  3. Configure Tailscale LoadBalancer - tag: svc:kiwix

  4. Stop mcquack kiwix-serve
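
For step 1, a static hostPath PersistentVolume is one option. The sketch below renders it. The `render_zim_pv` helper, the PV name, the nominal capacity, and the example path are assumptions. A hostPath resolves on the minikube node rather than on indri itself, so the download directory would first have to be made visible inside the node (for example with `minikube mount`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a read-only hostPath PV for the ZIM files.
# Argument: path as seen from inside the minikube node.
render_zim_pv() {
  local node_path="$1"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kiwix-zim
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  hostPath:
    path: ${node_path}
EOF
}

# Example with a placeholder mount point.
render_zim_pv /mnt/zim
```

The empty `storageClassName` keeps the default provisioner from claiming the volume, so a PVC has to bind to it explicitly.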


Phase 7: Forgejo Migration (Highest Risk)

Goal: Migrate Forgejo to k8s

Pre-Migration Checklist

  • Full borgmatic backup verified
  • Manual backup of /opt/homebrew/var/forgejo
  • Document SSH keys and webhooks
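
The manual backup item could be scripted along these lines. It is a sketch: `backup_dir` is a hypothetical helper and the destination directory is up to the operator. Forgejo should be stopped first if a consistent copy of the repositories and any embedded database is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive a directory into a timestamped tarball and
# confirm the archive can be listed. Arguments: source dir, destination dir.
# Prints the archive path on success.
backup_dir() {
  local src="$1" dest_dir="$2"
  local stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${dest_dir}/$(basename "$src")-${stamp}.tar.gz"
  mkdir -p "$dest_dir" || return 1
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Listing the archive catches truncated or corrupt output early.
  tar -tzf "$archive" >/dev/null || return 1
  echo "$archive"
}

# Intended use on indri (not run here):
#   backup_dir /opt/homebrew/var/forgejo "$HOME/forgejo-premigration"
```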

Steps

  1. Deploy Forgejo via Helm

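    # The Forgejo chart is published as an OCI artifact, so no helm repo add is needed
    helm install forgejo oci://code.forgejo.org/forgejo-helm/forgejo \
      --namespace forgejo --create-namespace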
    
  2. Migrate data

    • Stop brew forgejo
    • Copy data to PVC
    • Start k8s forgejo
  3. Configure Tailscale services

    • HTTPS 443 via LoadBalancer
    • SSH port 22 (TCP proxy)
  4. Verify all repositories accessible

Rollback

Restore brew forgejo and tailscale serve config


Phase 8: CI/CD (Woodpecker)

Goal: Deploy Woodpecker CI integrated with Forgejo

Steps

  1. Create Forgejo OAuth application

  2. Deploy Woodpecker Server + Agent

  3. Configure Tailscale LoadBalancer - tag: svc:ci

  4. Test pipeline - create .woodpecker.yaml in test repo
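
A minimal test pipeline for step 4 might look like the one written by the sketch below. The `write_test_pipeline` helper, the step name, and the alpine image are assumptions. The `when` and `steps` layout follows current Woodpecker syntax and should be checked against the deployed server version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a smoke-test pipeline into a repo checkout.
# Argument: repository directory.
write_test_pipeline() {
  local repo_dir="$1"
  cat > "${repo_dir}/.woodpecker.yaml" <<'EOF'
when:
  - event: push

steps:
  - name: smoke-test
    image: alpine:3
    commands:
      - echo "hello from woodpecker"
      - uname -m
EOF
}

# Demo against a scratch directory standing in for the test repo.
demo_repo="$(mktemp -d)"
write_test_pipeline "$demo_repo"
cat "${demo_repo}/.woodpecker.yaml"
```

The `uname -m` step is there to confirm the agent is running ARM64 images, and it also exercises image pulls through the Zot registry if the cluster is configured to use it.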


Phase 9: Cleanup

Goal: Remove deprecated services, harden system

Steps

  1. Stop/remove unused brew services

    • postgresql@18, grafana, miniflux, forgejo
  2. Update ansible playbook

    • Remove migrated service roles
    • Add k8s deployment references
  3. Configure Velero backups (optional)

    • Install with MinIO on sifaka
    • Schedule daily cluster backups
  4. Update zk documentation

    • New architecture
    • Runbooks
    • DR procedures

Critical Files

File                                             Purpose
ansible/playbooks/indri.yml                      Main playbook - add k8s roles, remove migrated services
ansible/roles/tailscale_serve/defaults/main.yml  Transition services to Tailscale operator
pulumi/policy.hujson                             Add tags: k8s, registry, ci
ansible/roles/borgmatic/defaults/main.yml        Update PostgreSQL endpoint
mise-tasks/indri-services-check                  Add k8s health checks

New Directory Structure

ansible/
  k8s/
    operators/
      tailscale-operator.yaml
      cloudnative-pg.yaml
    databases/
      blumeops-pg.yaml
    apps/
      grafana/
      miniflux/
      forgejo/
      devpi/
      kiwix/
      woodpecker/
  roles/
    zot/           # NEW
    podman/        # NEW
    minikube/      # NEW

Risk Mitigation

  • Circular dependency prevention: Zot registry runs outside k8s
  • Observability: Prometheus/Loki stay on indri
  • Data loss prevention: borgmatic + manual backups before each phase
  • Recovery: Can manually push images, restore from backups

Container Images (All ARM64)

Service     Image
Miniflux    ghcr.io/miniflux/miniflux:latest
Forgejo     codeberg.org/forgejo/forgejo:10
Grafana     grafana/grafana:latest
Kiwix       ghcr.io/kiwix/kiwix-serve:3.8.1
Woodpecker  woodpeckerci/woodpecker-server
Zot         ghcr.io/project-zot/zot-linux-arm64