Add Kubernetes migration plan documentation

Comprehensive phased plan for migrating blumeops services from direct
hosting on indri to a minikube cluster. Documents technical decisions
(Zot registry, Podman driver, CloudNativePG, Tailscale Operator) and
9 migration phases with verification and rollback procedures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Erich Blume 2026-01-17 13:12:09 -08:00
commit 4d916a46d3

docs/k8s-migration.md

@ -0,0 +1,469 @@
# Blumeops Minikube Migration Plan
This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes.
## Architecture Overview
### Services Staying on Indri (Outside K8s)
| Service | Reason |
|---------|--------|
| **Zot Registry** (NEW) | Avoid circular dependency - k8s needs images to start |
| **Prometheus** | Observability backbone must survive k8s failures |
| **Loki** | Log aggregation backbone |
| **Borgmatic** | Backup system |
| **Grafana-alloy** | Metrics/logs collector on host |
| **Plex** | Until Jellyfin replacement |
| **Transmission** | Downloads for kiwix ZIM files |
### Services Moving to K8s
| Service | Complexity | Dependencies |
|---------|------------|--------------|
| Grafana | LOW | Phase 1 |
| Kiwix | LOW | Phase 1 |
| Miniflux | MEDIUM | PostgreSQL |
| devpi | MEDIUM | Registry |
| PostgreSQL | HIGH | Phase 1 |
| Forgejo | HIGH | PostgreSQL |
| Woodpecker CI | MEDIUM | Forgejo |
## Technical Decisions
### Container Registry: Zot
- OCI-native, lightweight
- Native support for proxying multiple registries (Docker Hub, GHCR, Quay)
- Single binary, ARM64 native
- Config at `/etc/zot/config.json`
### Minikube Driver: Podman
- Rootless containers for better security
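- Lighter than a dedicated minikube VM (QEMU driver), although on macOS Podman itself runs containers inside a `podman machine` VM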
- Uses existing container ecosystem
- `minikube start --driver=podman --container-runtime=containerd`
### PostgreSQL: CloudNativePG Operator
- Production-grade operator
- Built-in backup/restore
- Prometheus metrics
- PITR support
### K8s Service Exposure: Tailscale Operator
- `loadBalancerClass: tailscale` on Services
- Automatic TLS and MagicDNS names
- ACL-controlled access
### LaunchAgent Requirements (Critical)
LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths**:
- `/opt/homebrew/bin/zot` not `zot`
- `/opt/homebrew/opt/mise/bin/mise x --` for mise-managed tools
- `/opt/homebrew/opt/postgresql@18/bin/pg_dump` for postgres tools
This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors).
`brew services` handles this automatically, but brew-managed services aren't tracked in ansible.
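A small lint can catch relative commands before a LaunchAgent is deployed. The sketch below is illustrative only: `check_commands` is a hypothetical helper, not something that exists in the current ansible roles.

```bash
# Hypothetical helper: flag any command that is not an absolute path.
# launchd does not add Homebrew's bin directory to PATH, so a bare name
# such as "zot" would fail to start under a LaunchAgent.
check_commands() {
  local rc=0 cmd
  for cmd in "$@"; do
    case "$cmd" in
      /*) ;;                                   # absolute path: fine
      *)  echo "not absolute: $cmd"; rc=1 ;;   # bare name: report it
    esac
  done
  return "$rc"
}
```

For example, `check_commands /opt/homebrew/bin/zot pg_dump` would report `pg_dump` and return a non-zero status.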
---
## Phase 0: Foundation
**Goal**: Container registry + minikube cluster without disrupting existing services
### Steps
1. **Install Podman on indri**
```bash
# Brewfile entry (Brewfile syntax, not a shell command)
brew "podman"
```
- Create ansible role `podman` for machine setup
2. **Install and configure Zot registry**
- Create ansible role `zot`
- Deploy as mcquack LaunchAgent (like devpi pattern)
- Bind to `localhost:5000`
- Configure pull-through for Docker Hub + GHCR
- Add Tailscale serve: `svc:registry`
3. **Install minikube**
```bash
# Brewfile entry (Brewfile syntax, not a shell command)
brew "minikube"

# Shell: start the cluster with the podman driver
minikube start --driver=podman --container-runtime=containerd \
  --cpus=4 --memory=8192 --disk-size=100g
```
- Create ansible role `minikube` for initial setup
4. **Update Pulumi ACLs**
- Add `tag:registry` for registry service
- Add `tag:k8s` for cluster services
5. **Configure kubeconfig on gilbert**
- Add minikube context to `~/.kube/config`
- Keep work EKS config separate (already isolated)
- K9s will auto-discover contexts
6. **Observability for new services** (follow existing patterns)
**Zot Registry:**
- Create zk card `~/code/personal/zk/zot.md` (like devpi.md, forgejo.md)
- Add log collection to Alloy config (stdout/stderr from LaunchAgent)
- Create `zot_metrics` role with periodic script writing to textfile collector
- Create Grafana dashboard: cache hit rates, storage usage, pull/push counts
**Minikube:**
- Create zk card `~/code/personal/zk/minikube.md`
- Metrics via kube-state-metrics (deployed in cluster)
- Node metrics already collected by Alloy
- Create Grafana dashboard: cluster health, resource usage
**Note:** Backups not needed for these services:
- Zot cache is re-fetchable from upstream registries
- Minikube state is recreatable from ansible/k8s manifests
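The registry settings from step 2 might translate into a Zot config along these lines. This is a sketch, not the deployed file: the storage directory, the output path, the upstream URLs, and the `destination` prefixes are all assumptions, and the `sync` extension keys should be checked against the Zot release actually installed.

```bash
# Write a minimal Zot config: listen on localhost:5000 and mirror
# Docker Hub and GHCR on demand. ZOT_CONFIG and ZOT_ROOT are
# hypothetical variables used only by this sketch.
ZOT_CONFIG="${ZOT_CONFIG:-./zot-config.json}"
ZOT_ROOT="${ZOT_ROOT:-/opt/homebrew/var/zot}"

cat > "$ZOT_CONFIG" <<EOF
{
  "distSpecVersion": "1.1.0",
  "storage": { "rootDirectory": "$ZOT_ROOT" },
  "http": { "address": "127.0.0.1", "port": "5000" },
  "log": { "level": "info" },
  "extensions": {
    "sync": {
      "enable": true,
      "registries": [
        {
          "urls": ["https://index.docker.io"],
          "onDemand": true,
          "tlsVerify": true,
          "content": [{ "prefix": "**", "destination": "/docker.io" }]
        },
        {
          "urls": ["https://ghcr.io"],
          "onDemand": true,
          "tlsVerify": true,
          "content": [{ "prefix": "**", "destination": "/ghcr.io" }]
        }
      ]
    }
  }
}
EOF
```

The LaunchAgent would then run something like `/opt/homebrew/bin/zot serve <path-to-config>`, using the absolute path as required above.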
### New Files
- `ansible/roles/zot/` - Registry role
- `ansible/roles/zot_metrics/` - Metrics collection
- `ansible/roles/podman/` - Podman setup
- `ansible/roles/minikube/` - Cluster setup
- `~/code/personal/zk/zot.md` - Registry management log
- `~/code/personal/zk/minikube.md` - Cluster management log
### Verification
```bash
# Registry working
curl http://localhost:5000/v2/_catalog
# Minikube running
minikube status
kubectl get nodes
# Metrics flowing
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/zot.prom'
# Logs in Loki
# Query: {service="zot"}
```
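The `zot.prom` file checked above could be produced by a periodic script similar to the following. It is a minimal sketch of the planned `zot_metrics` role: the metric name is an assumption, and only storage usage is covered (cache hit rates and pull/push counts would need data from Zot itself).

```bash
# Hypothetical zot_metrics sketch: write registry storage usage in
# Prometheus text format for the node_exporter textfile collector.
write_zot_metrics() {
  local storage_dir="$1" out_file="$2" kb tmp
  # du -sk reports usage in 1 KiB blocks on both macOS and Linux
  kb="$(du -sk "$storage_dir" | awk '{print $1}')"
  tmp="${out_file}.tmp.$$"
  {
    echo "# HELP zot_storage_bytes Disk space used by the Zot storage directory."
    echo "# TYPE zot_storage_bytes gauge"
    echo "zot_storage_bytes $((kb * 1024))"
  } > "$tmp"
  mv "$tmp" "$out_file"   # rename so the collector never reads a partial file
}
```

A LaunchAgent could call it as `write_zot_metrics <zot-storage-dir> /opt/homebrew/var/node_exporter/textfile/zot.prom` on an interval.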
### Rollback
```bash
minikube stop && minikube delete
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist
```
---
## Phase 1: Kubernetes Infrastructure
**Goal**: Tailscale operator + CloudNativePG operator
### Steps
1. **Create Tailscale OAuth client**
- Scopes: Devices Core, Auth Keys, Services write
- Tag: `tag:k8s-operator`
- Store in 1Password
2. **Deploy Tailscale Kubernetes Operator**
```bash
helm repo add tailscale https://pkgs.tailscale.com/helmcharts
helm repo update
helm install tailscale-operator tailscale/tailscale-operator \
  --namespace tailscale-system --create-namespace \
  --set-string oauth.clientId="$CLIENT_ID" \
  --set-string oauth.clientSecret="$CLIENT_SECRET"
```
3. **Deploy CloudNativePG operator**
```bash
kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml
```
4. **Create PostgreSQL cluster**
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: blumeops-pg
  namespace: databases
spec:
  instances: 1
  storage:
    size: 10Gi
    storageClass: standard
  monitoring:
    enablePodMonitor: true
```
5. **Update Alloy config**
- Add a `discovery.kubernetes` component (Alloy's counterpart to Prometheus `kubernetes_sd_configs`) for k8s metrics
- Scrape operator metrics
### New Files
- `ansible/k8s/operators/` - Operator manifests
- `ansible/k8s/databases/` - PostgreSQL cluster
### Verification
```bash
kubectl get pods -n tailscale-system
kubectl get pods -n cnpg-system
kubectl get cluster -n databases
```
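Listing pods shows their state at one moment; a scripted check that blocks until everything is ready may be more useful in automation. The sketch below assumes the deployment names created by the default install methods (`operator` for Tailscale, `cnpg-controller-manager` for CloudNativePG); confirm them with `kubectl get deploy -A` before relying on it.

```bash
# Sketch: wait until both operators and the database cluster are ready.
# Deployment names and timeouts are assumptions, not verified values.
wait_for_phase1() {
  kubectl rollout status deployment/operator \
    -n tailscale-system --timeout=180s &&
  kubectl rollout status deployment/cnpg-controller-manager \
    -n cnpg-system --timeout=180s &&
  kubectl wait cluster.postgresql.cnpg.io/blumeops-pg \
    -n databases --for=condition=Ready --timeout=300s
}
```

The function stops at the first failing check, so its exit status can gate the start of Phase 2.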
---
## Phase 2: Grafana Migration (Pilot)
**Goal**: Migrate Grafana as lowest-risk pilot service
### Steps
1. **Deploy Grafana via Helm**
- Copy datasource config from existing role
- Copy dashboards from `ansible/roles/grafana/files/dashboards/`
- Point to indri Prometheus/Loki (http://indri:9090, http://indri:3100)
2. **Configure Tailscale LoadBalancer**
```yaml
service:
  type: LoadBalancer
  loadBalancerClass: tailscale
```
3. **Verify all dashboards work**
4. **Update tailscale_serve** - remove grafana entry
5. **Stop brew grafana**: `brew services stop grafana`
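Steps 1 and 2 could be captured in a single Helm values file. The following is a sketch for the `grafana/grafana` chart: the output path and datasource names are assumptions, and it presumes the hostname `indri` resolves from inside the cluster, which may require extra DNS configuration.

```bash
# Sketch of a values file for the grafana/grafana Helm chart.
# GRAFANA_VALUES is a hypothetical path used only by this example.
GRAFANA_VALUES="${GRAFANA_VALUES:-./grafana-values.yaml}"

cat > "$GRAFANA_VALUES" <<'EOF'
service:
  type: LoadBalancer
  loadBalancerClass: tailscale
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://indri:9090
        isDefault: true
      - name: Loki
        type: loki
        url: http://indri:3100
EOF
```

It would be applied with something like `helm install grafana grafana/grafana --namespace grafana --create-namespace -f grafana-values.yaml`, after adding the chart repository at `https://grafana.github.io/helm-charts`.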
### Verification
- https://grafana.tail8d86e.ts.net loads
- All dashboards functional
---
## Phase 3: PostgreSQL Migration
**Goal**: Migrate miniflux database to CloudNativePG
### Steps
1. **Create databases and users in k8s PostgreSQL**
- miniflux database/user
- borgmatic read-only user
2. **Export from brew PostgreSQL**
```bash
pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql
```
3. **Expose k8s PostgreSQL via Tailscale**
- Service with `loadBalancerClass: tailscale`
- Tag: `svc:pg-k8s`
4. **Import data**
```bash
psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql
```
5. **Update borgmatic config**
- Change hostname to k8s PostgreSQL
6. **Verify data integrity**
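One way to make the integrity check in step 6 concrete is to compare row counts between the old and new databases. This is a coarse sketch rather than a full verification: the table list is an assumption about the Miniflux schema, and authentication is expected to come from `~/.pgpass` or `PG*` environment variables.

```bash
# Sketch: compare row counts per table between two PostgreSQL hosts.
# Usage: compare_counts OLD_HOST NEW_HOST TABLE...
compare_counts() {
  local old_host="$1" new_host="$2" rc=0 table old new
  shift 2
  for table in "$@"; do
    # -A (unaligned) and -t (tuples only) make psql print just the number
    old="$(psql -h "$old_host" -U miniflux -d miniflux -At -c "SELECT count(*) FROM $table")"
    new="$(psql -h "$new_host" -U miniflux -d miniflux -At -c "SELECT count(*) FROM $table")"
    if [ "$old" = "$new" ]; then
      echo "ok $table $old"
    else
      echo "MISMATCH $table old=$old new=$new"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: `compare_counts localhost pg-k8s.tail8d86e.ts.net users feeds entries`. Matching counts do not prove the rows are identical, so a spot check in the Miniflux UI is still worthwhile.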
### Rollback
Keep brew PostgreSQL running until Phase 4 is verified.
---
## Phase 4: Miniflux Migration
**Goal**: Migrate Miniflux to k8s
### Steps
1. **Deploy Miniflux**
```yaml
image: ghcr.io/miniflux/miniflux:latest
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef: {name: miniflux-db, key: url}  # hypothetical Secret
  - name: RUN_MIGRATIONS
    value: "1"
```
2. **Configure Tailscale LoadBalancer** - tag: `svc:feed`
3. **Update Alloy log collection** - add k8s namespace
4. **Verify**: login, feeds refresh, API works
5. **Stop brew miniflux**: `brew services stop miniflux`
---
## Phase 5: devpi Migration
**Goal**: Migrate devpi to k8s
### Steps
1. **Build devpi container**
- Dockerfile with devpi-server + devpi-web
- Push to local Zot registry
2. **Deploy as StatefulSet**
- PVC for data (50Gi)
- Migrate existing data (excluding PyPI cache)
3. **Configure Tailscale LoadBalancer** - tag: `svc:pypi`
4. **Update pip.conf on gilbert**
5. **Stop mcquack devpi**
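The build-and-push part of step 1 might look like the sketch below. The image name and tag scheme are assumptions. Note also that on macOS Podman runs inside a VM, so the registry address reachable from it may differ from `localhost:5000`; the `REGISTRY` variable is left overridable for that reason.

```bash
# Sketch: build the devpi image and push it to the local Zot registry.
# REGISTRY and the blumeops/devpi image name are hypothetical.
# --tls-verify=false is used because Zot serves plain HTTP in this plan.
build_and_push_devpi() {
  local tag="$1" context="${2:-.}"
  local image="${REGISTRY:-localhost:5000}/blumeops/devpi:${tag}"
  podman build -t "$image" "$context" &&
  podman push --tls-verify=false "$image"
}
```

For example, `build_and_push_devpi 2026-01-17 ./devpi` would build from a Dockerfile in `./devpi`; the push only runs if the build succeeds.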
---
## Phase 6: Kiwix Migration
**Goal**: Migrate kiwix-serve to k8s
### Steps
1. **Create NFS/hostPath PV for ZIM files**
- Point to transmission download directory
- ReadOnlyMany access
2. **Deploy Kiwix**
```yaml
image: ghcr.io/kiwix/kiwix-serve:3.8.1
args: ["/data/*.zim"]
```
3. **Configure Tailscale LoadBalancer** - tag: `svc:kiwix`
4. **Stop mcquack kiwix-serve**
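The volume from step 1 could be declared as follows. This is a sketch assuming the hostPath variant: the PV name, capacity, and `/mnt/zim` path are placeholders, and with minikube a hostPath is resolved inside the minikube node, so the transmission directory would first have to be made visible there (for example with `minikube mount`).

```bash
# Sketch of a read-only PersistentVolume manifest for the ZIM files.
# KIWIX_PV is a hypothetical output path used only by this example.
KIWIX_PV="${KIWIX_PV:-./kiwix-zim-pv.yaml}"

cat > "$KIWIX_PV" <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kiwix-zim
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  hostPath:
    path: /mnt/zim
EOF
```

A matching PersistentVolumeClaim with `storageClassName: ""` and `ReadOnlyMany` would then bind to it and be mounted at `/data` in the Kiwix pod.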
---
## Phase 7: Forgejo Migration (Highest Risk)
**Goal**: Migrate Forgejo to k8s
### Pre-Migration Checklist
- [ ] Full borgmatic backup verified
- [ ] Manual backup of `/opt/homebrew/var/forgejo`
- [ ] Document SSH keys and webhooks
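The manual backup in the checklist could be scripted roughly as below. The destination directory and archive naming are assumptions, and Forgejo should ideally be stopped first so the archive is consistent.

```bash
# Sketch: archive the Forgejo data directory and confirm it is readable.
# Usage: backup_forgejo [SOURCE_DIR] [DEST_DIR]; prints the archive path.
backup_forgejo() {
  local src="${1:-/opt/homebrew/var/forgejo}" dest_dir="${2:-$HOME/backups}"
  local archive="$dest_dir/forgejo-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest_dir" &&
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" &&
  tar -tzf "$archive" > /dev/null &&   # listing fails if the archive is corrupt
  echo "$archive"
}
```

Called with no arguments, it archives `/opt/homebrew/var/forgejo` into a timestamped file under the hypothetical `~/backups` directory.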
### Steps
1. **Deploy Forgejo via Helm**
```bash
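# The Forgejo chart is published as an OCI artifact, so no `helm repo add` is needed
helm install forgejo oci://code.forgejo.org/forgejo-helm/forgejo \
  --namespace forgejo --create-namespace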
```
2. **Migrate data**
- Stop brew forgejo
- Copy data to PVC
- Start k8s forgejo
3. **Configure Tailscale services**
- HTTPS 443 via LoadBalancer
- SSH port 22 (TCP proxy)
4. **Verify all repositories accessible**
### Rollback
Restore brew forgejo and tailscale serve config
---
## Phase 8: CI/CD (Woodpecker)
**Goal**: Deploy Woodpecker CI integrated with Forgejo
### Steps
1. **Create Forgejo OAuth application**
- Callback: https://ci.tail8d86e.ts.net/authorize
- Store in 1Password
2. **Deploy Woodpecker Server + Agent**
3. **Configure Tailscale LoadBalancer** - tag: `svc:ci`
4. **Test pipeline** - create `.woodpecker.yaml` in test repo
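A minimal pipeline for step 4 might look like this. It is a smoke-test sketch only: the step name and image are assumptions, and `REPO_DIR` is a hypothetical variable pointing at a checkout of the test repository.

```bash
# Sketch: write a minimal Woodpecker pipeline into a test repository.
REPO_DIR="${REPO_DIR:-.}"

cat > "$REPO_DIR/.woodpecker.yaml" <<'EOF'
when:
  - event: push

steps:
  - name: smoke-test
    image: alpine:3.20
    commands:
      - echo "hello from woodpecker"
EOF
```

Committing and pushing this file to a repository enabled in Woodpecker should trigger one run; a successful run suggests the OAuth link, the agent, and image pulls are all working.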
---
## Phase 9: Cleanup
**Goal**: Remove deprecated services, harden system
### Steps
1. **Stop/remove unused brew services**
- postgresql@18, grafana, miniflux, forgejo
2. **Update ansible playbook**
- Remove migrated service roles
- Add k8s deployment references
3. **Configure Velero backups** (optional)
- Install with MinIO on sifaka
- Schedule daily cluster backups
4. **Update zk documentation**
- New architecture
- Runbooks
- DR procedures
---
## Critical Files
| File | Purpose |
|------|---------|
| `ansible/playbooks/indri.yml` | Main playbook - add k8s roles, remove migrated services |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Transition services to Tailscale operator |
| `pulumi/policy.hujson` | Add tags: k8s, registry, ci |
| `ansible/roles/borgmatic/defaults/main.yml` | Update PostgreSQL endpoint |
| `mise-tasks/indri-services-check` | Add k8s health checks |
## New Directory Structure
```
ansible/
  k8s/
    operators/
      tailscale-operator.yaml
      cloudnative-pg.yaml
    databases/
      blumeops-pg.yaml
    apps/
      grafana/
      miniflux/
      forgejo/
      devpi/
      kiwix/
      woodpecker/
  roles/
    zot/       # NEW
    podman/    # NEW
    minikube/  # NEW
```
## Risk Mitigation
- **Circular dependency prevention**: Zot registry runs outside k8s
- **Observability**: Prometheus/Loki stay on indri
- **Data loss prevention**: borgmatic + manual backups before each phase
- **Recovery**: Can manually push images, restore from backups
## Container Images (All ARM64)
| Service | Image |
|---------|-------|
| Miniflux | `ghcr.io/miniflux/miniflux:latest` |
| Forgejo | `codeberg.org/forgejo/forgejo:10` |
| Grafana | `grafana/grafana:latest` |
| Kiwix | `ghcr.io/kiwix/kiwix-serve:3.8.1` |
| Woodpecker | `woodpeckerci/woodpecker-server` |
| Zot | `ghcr.io/project-zot/zot-linux-arm64` |