From 4d916a46d3a3205669b77cefde7851b84e4e1322 Mon Sep 17 00:00:00 2001
From: Erich Blume
Date: Sat, 17 Jan 2026 13:12:09 -0800
Subject: [PATCH] Add Kubernetes migration plan documentation

Comprehensive phased plan for migrating blumeops services from direct
hosting on indri to a minikube cluster. Documents technical decisions
(Zot registry, Podman driver, CloudNativePG, Tailscale Operator) and
9 migration phases with verification and rollback procedures.

Co-Authored-By: Claude Opus 4.5
---
 docs/k8s-migration.md | 469 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 469 insertions(+)
 create mode 100644 docs/k8s-migration.md

diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md
new file mode 100644
index 0000000..140fb60
--- /dev/null
+++ b/docs/k8s-migration.md
@@ -0,0 +1,469 @@
+# Blumeops Minikube Migration Plan
+
+This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes.
+
+## Architecture Overview
+
+### Services Staying on Indri (Outside K8s)
+| Service | Reason |
+|---------|--------|
+| **Zot Registry** (NEW) | Avoid circular dependency - k8s needs images to start |
+| **Prometheus** | Observability backbone must survive k8s failures |
+| **Loki** | Log aggregation backbone |
+| **Borgmatic** | Backup system |
+| **Grafana-alloy** | Metrics/logs collector on host |
+| **Plex** | Until Jellyfin replacement |
+| **Transmission** | Downloads for kiwix ZIM files |
+
+### Services Moving to K8s
+| Service | Complexity | Dependencies |
+|---------|------------|--------------|
+| Grafana | LOW | Phase 1 |
+| Kiwix | LOW | Phase 1 |
+| Miniflux | MEDIUM | PostgreSQL |
+| devpi | MEDIUM | Registry |
+| PostgreSQL | HIGH | Phase 1 |
+| Forgejo | HIGH | PostgreSQL |
+| Woodpecker CI | MEDIUM | Forgejo |
+
+## Technical Decisions
+
+### Container Registry: Zot
+- OCI-native, lightweight
+- Native support for proxying multiple registries (Docker Hub, GHCR, Quay)
+- Single binary, ARM64 native
+- Config at `/etc/zot/config.json`
+
+### Minikube Driver: Podman
+- Rootless containers for better security
+- Lighter than full VM (QEMU)
+- Uses existing container ecosystem
+- `minikube start --driver=podman --container-runtime=containerd`
+
+### PostgreSQL: CloudNativePG Operator
+- Production-grade operator
+- Built-in backup/restore
+- Prometheus metrics
+- PITR support
+
+### K8s Service Exposure: Tailscale Operator
+- `loadBalancerClass: tailscale` on Services
+- Automatic TLS and MagicDNS names
+- ACL-controlled access
+
+### LaunchAgent Requirements (Critical)
+LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths**:
+- `/opt/homebrew/bin/zot` not `zot`
+- `/opt/homebrew/opt/mise/bin/mise x --` for mise-managed tools
+- `/opt/homebrew/opt/postgresql@18/bin/pg_dump` for postgres tools
+
+This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors).
+`brew services` handles this automatically but those aren't tracked in ansible.
+
+---
+
+## Phase 0: Foundation
+
+**Goal**: Container registry + minikube cluster without disrupting existing services
+
+### Steps
+
+1. **Install Podman on indri**
+   ```bash
+   # Add to Brewfile
+   brew "podman"
+   ```
+   - Create ansible role `podman` for machine setup
+
+2. **Install and configure Zot registry**
+   - Create ansible role `zot`
+   - Deploy as mcquack LaunchAgent (like devpi pattern)
+   - Bind to `localhost:5000`
+   - Configure pull-through for Docker Hub + GHCR
+   - Add Tailscale serve: `svc:registry`
+
+3. **Install minikube**
+   ```bash
+   # Add to Brewfile
+   brew "minikube"
+
+   # Start with podman driver
+   minikube start --driver=podman --container-runtime=containerd \
+     --cpus=4 --memory=8192 --disk-size=100g
+   ```
+   - Create ansible role `minikube` for initial setup
+
+4. **Update Pulumi ACLs**
+   - Add `tag:registry` for registry service
+   - Add `tag:k8s` for cluster services
+
+5. **Configure kubeconfig on gilbert**
+   - Add minikube context to `~/.kube/config`
+   - Keep work EKS config separate (already isolated)
+   - K9s will auto-discover contexts
+
+6. **Observability for new services** (follow existing patterns)
+
+   **Zot Registry:**
+   - Create zk card `~/code/personal/zk/zot.md` (like devpi.md, forgejo.md)
+   - Add log collection to Alloy config (stdout/stderr from LaunchAgent)
+   - Create `zot_metrics` role with periodic script writing to textfile collector
+   - Create Grafana dashboard: cache hit rates, storage usage, pull/push counts
+
+   **Minikube:**
+   - Create zk card `~/code/personal/zk/minikube.md`
+   - Metrics via kube-state-metrics (deployed in cluster)
+   - Node metrics already collected by Alloy
+   - Create Grafana dashboard: cluster health, resource usage
+
+   **Note:** Backups not needed for these services:
+   - Zot cache is re-fetchable from upstream registries
+   - Minikube state is recreatable from ansible/k8s manifests
+
+### New Files
+- `ansible/roles/zot/` - Registry role
+- `ansible/roles/zot_metrics/` - Metrics collection
+- `ansible/roles/podman/` - Podman setup
+- `ansible/roles/minikube/` - Cluster setup
+- `~/code/personal/zk/zot.md` - Registry management log
+- `~/code/personal/zk/minikube.md` - Cluster management log
+
+### Verification
+```bash
+# Registry working
+curl http://localhost:5000/v2/_catalog
+
+# Minikube running
+minikube status
+kubectl get nodes
+
+# Metrics flowing
+ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/zot.prom'
+
+# Logs in Loki
+# Query: {service="zot"}
+```
+
+### Rollback
+```bash
+minikube stop && minikube delete
+launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist
+```
+
+---
+
+## Phase 1: Kubernetes Infrastructure
+
+**Goal**: Tailscale operator + CloudNativePG operator
+
+### Steps
+
+1. **Create Tailscale OAuth client**
+   - Scopes: Devices Core, Auth Keys, Services write
+   - Tag: `tag:k8s-operator`
+   - Store in 1Password
+
+2. **Deploy Tailscale Kubernetes Operator**
+   ```bash
+   helm repo add tailscale https://pkgs.tailscale.com/helmcharts
+   helm install tailscale-operator tailscale/tailscale-operator \
+     --namespace tailscale-system --create-namespace \
+     --set oauth.clientId=$CLIENT_ID \
+     --set oauth.clientSecret=$CLIENT_SECRET
+   ```
+
+3. **Deploy CloudNativePG operator**
+   ```bash
+   kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml
+   ```
+
+4. **Create PostgreSQL cluster**
+   ```yaml
+   apiVersion: postgresql.cnpg.io/v1
+   kind: Cluster
+   metadata:
+     name: blumeops-pg
+     namespace: databases
+   spec:
+     instances: 1
+     storage:
+       size: 10Gi
+       storageClass: standard
+     monitoring:
+       enablePodMonitor: true
+   ```
+
+5. **Update Alloy config**
+   - Add kubernetes_sd_configs for k8s metrics
+   - Scrape operator metrics
+
+### New Files
+- `ansible/k8s/operators/` - Operator manifests
+- `ansible/k8s/databases/` - PostgreSQL cluster
+
+### Verification
+```bash
+kubectl get pods -n tailscale-system
+kubectl get pods -n cnpg-system
+kubectl get cluster -n databases
+```
+
+---
+
+## Phase 2: Grafana Migration (Pilot)
+
+**Goal**: Migrate Grafana as lowest-risk pilot service
+
+### Steps
+
+1. **Deploy Grafana via Helm**
+   - Copy datasource config from existing role
+   - Copy dashboards from `ansible/roles/grafana/files/dashboards/`
+   - Point to indri Prometheus/Loki (http://indri:9090, http://indri:3100)
+
+2. **Configure Tailscale LoadBalancer**
+   ```yaml
+   service:
+     type: LoadBalancer
+     loadBalancerClass: tailscale
+   ```
+
+3. **Verify all dashboards work**
+
+4. **Update tailscale_serve** - remove grafana entry
+
+5. **Stop brew grafana**: `brew services stop grafana`
+
+### Verification
+- https://grafana.tail8d86e.ts.net loads
+- All dashboards functional
+
+---
+
+## Phase 3: PostgreSQL Migration
+
+**Goal**: Migrate miniflux database to CloudNativePG
+
+### Steps
+
+1. **Create databases and users in k8s PostgreSQL**
+   - miniflux database/user
+   - borgmatic read-only user
+
+2. **Export from brew PostgreSQL**
+   ```bash
+   pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql
+   ```
+
+3. **Expose k8s PostgreSQL via Tailscale**
+   - Service with `loadBalancerClass: tailscale`
+   - Tag: `svc:pg-k8s`
+
+4. **Import data**
+   ```bash
+   psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql
+   ```
+
+5. **Update borgmatic config**
+   - Change hostname to k8s PostgreSQL
+
+6. **Verify data integrity**
+
+### Rollback
+Keep brew PostgreSQL running until Phase 4 verified
+
+---
+
+## Phase 4: Miniflux Migration
+
+**Goal**: Migrate Miniflux to k8s
+
+### Steps
+
+1. **Deploy Miniflux**
+   ```yaml
+   image: ghcr.io/miniflux/miniflux:latest
+   env:
+     DATABASE_URL: from secret
+     RUN_MIGRATIONS: "1"
+   ```
+
+2. **Configure Tailscale LoadBalancer** - tag: `svc:feed`
+
+3. **Update Alloy log collection** - add k8s namespace
+
+4. **Verify**: login, feeds refresh, API works
+
+5. **Stop brew miniflux**: `brew services stop miniflux`
+
+---
+
+## Phase 5: devpi Migration
+
+**Goal**: Migrate devpi to k8s
+
+### Steps
+
+1. **Build devpi container**
+   - Dockerfile with devpi-server + devpi-web
+   - Push to local Zot registry
+
+2. **Deploy as StatefulSet**
+   - PVC for data (50Gi)
+   - Migrate existing data (excluding PyPI cache)
+
+3. **Configure Tailscale LoadBalancer** - tag: `svc:pypi`
+
+4. **Update pip.conf on gilbert**
+
+5. **Stop mcquack devpi**
+
+---
+
+## Phase 6: Kiwix Migration
+
+**Goal**: Migrate kiwix-serve to k8s
+
+### Steps
+
+1. **Create NFS/hostPath PV for ZIM files**
+   - Point to transmission download directory
+   - ReadOnlyMany access
+
+2. **Deploy Kiwix**
+   ```yaml
+   image: ghcr.io/kiwix/kiwix-serve:3.8.1
+   args: ["/data/*.zim"]
+   ```
+
+3. **Configure Tailscale LoadBalancer** - tag: `svc:kiwix`
+
+4. **Stop mcquack kiwix-serve**
+
+---
+
+## Phase 7: Forgejo Migration (Highest Risk)
+
+**Goal**: Migrate Forgejo to k8s
+
+### Pre-Migration Checklist
+- [ ] Full borgmatic backup verified
+- [ ] Manual backup of `/opt/homebrew/var/forgejo`
+- [ ] Document SSH keys and webhooks
+
+### Steps
+
+1. **Deploy Forgejo via Helm**
+   ```bash
+   helm install forgejo forgejo/forgejo \
+     --namespace forgejo --create-namespace
+   ```
+
+2. **Migrate data**
+   - Stop brew forgejo
+   - Copy data to PVC
+   - Start k8s forgejo
+
+3. **Configure Tailscale services**
+   - HTTPS 443 via LoadBalancer
+   - SSH port 22 (TCP proxy)
+
+4. **Verify all repositories accessible**
+
+### Rollback
+Restore brew forgejo and tailscale serve config
+
+---
+
+## Phase 8: CI/CD (Woodpecker)
+
+**Goal**: Deploy Woodpecker CI integrated with Forgejo
+
+### Steps
+
+1. **Create Forgejo OAuth application**
+   - Callback: https://ci.tail8d86e.ts.net/authorize
+   - Store in 1Password
+
+2. **Deploy Woodpecker Server + Agent**
+
+3. **Configure Tailscale LoadBalancer** - tag: `svc:ci`
+
+4. **Test pipeline** - create `.woodpecker.yaml` in test repo
+
+---
+
+## Phase 9: Cleanup
+
+**Goal**: Remove deprecated services, harden system
+
+### Steps
+
+1. **Stop/remove unused brew services**
+   - postgresql@18, grafana, miniflux, forgejo
+
+2. **Update ansible playbook**
+   - Remove migrated service roles
+   - Add k8s deployment references
+
+3. **Configure Velero backups** (optional)
+   - Install with MinIO on sifaka
+   - Schedule daily cluster backups
+
+4. **Update zk documentation**
+   - New architecture
+   - Runbooks
+   - DR procedures
+
+---
+
+## Critical Files
+
+| File | Purpose |
+|------|---------|
+| `ansible/playbooks/indri.yml` | Main playbook - add k8s roles, remove migrated services |
+| `ansible/roles/tailscale_serve/defaults/main.yml` | Transition services to Tailscale operator |
+| `pulumi/policy.hujson` | Add tags: k8s, registry, ci |
+| `ansible/roles/borgmatic/defaults/main.yml` | Update PostgreSQL endpoint |
+| `mise-tasks/indri-services-check` | Add k8s health checks |
+
+## New Directory Structure
+
+```
+ansible/
+  k8s/
+    operators/
+      tailscale-operator.yaml
+      cloudnative-pg.yaml
+    databases/
+      blumeops-pg.yaml
+    apps/
+      grafana/
+      miniflux/
+      forgejo/
+      devpi/
+      kiwix/
+      woodpecker/
+  roles/
+    zot/           # NEW
+    podman/        # NEW
+    minikube/      # NEW
+```
+
+## Risk Mitigation
+
+- **Circular dependency prevention**: Zot registry runs outside k8s
+- **Observability**: Prometheus/Loki stay on indri
+- **Data loss prevention**: borgmatic + manual backups before each phase
+- **Recovery**: Can manually push images, restore from backups
+
+## Container Images (All ARM64)
+
+| Service | Image |
+|---------|-------|
+| Miniflux | `ghcr.io/miniflux/miniflux:latest` |
+| Forgejo | `codeberg.org/forgejo/forgejo:10` |
+| Grafana | `grafana/grafana:latest` |
+| Kiwix | `ghcr.io/kiwix/kiwix-serve:3.8.1` |
+| Woodpecker | `woodpeckerci/woodpecker-server` |
+| Zot | `ghcr.io/project-zot/zot-linux-arm64` |
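+
+The "Configure Tailscale LoadBalancer" steps in Phases 2 through 8 show only the `type` and `loadBalancerClass` fields. A minimal sketch of what a complete Service might look like for Miniflux follows. The `miniflux` namespace, the `app: miniflux` selector, the `feed` hostname, and container port 8080 are assumptions for illustration and are not fixed by this plan. The `tailscale.com/hostname` and `tailscale.com/tags` annotations come from the Tailscale operator's documentation.
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: miniflux
+  namespace: miniflux              # assumed namespace
+  annotations:
+    tailscale.com/hostname: feed   # assumed MagicDNS machine name
+    tailscale.com/tags: tag:k8s    # tag added to the Pulumi ACLs in Phase 0
+spec:
+  type: LoadBalancer
+  loadBalancerClass: tailscale     # hands the Service to the Tailscale operator
+  selector:
+    app: miniflux                  # assumed pod label
+  ports:
+    - name: http
+      port: 80
+      targetPort: 8080             # assumed container port
+```
+
+For the tag annotation to take effect, the ACL policy would likely need to list `tag:k8s-operator` as an owner of `tag:k8s`.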