diff --git a/plans/k8s-migration.md b/plans/k8s-migration.md
index cca23cf..5d3fd61 100644
--- a/plans/k8s-migration.md
+++ b/plans/k8s-migration.md
@@ -62,6 +62,23 @@ LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths

 This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors). `brew services` handles this automatically but those aren't tracked in ansible.

+### Backup Strategy
+
+Borgmatic remains on indri (outside k8s), writing to sifaka NAS via SMB at `/Volumes/backups`. This ensures backups continue even if k8s is down.
+
+| Service | Backup Approach |
+|---------|-----------------|
+| **Zot Registry** | No backup needed - pull-through cache is re-fetchable, private images rebuilt from source control |
+| **Minikube** | No backup of cluster state - declarative manifests in git, can recreate |
+| **PostgreSQL (k8s)** | CloudNativePG scheduled backups to sifaka (Phase 1) |
+| **Grafana (k8s)** | Dashboards in ansible source control, no runtime backup needed |
+| **Miniflux (k8s)** | Database backed up via CloudNativePG |
+| **Forgejo (k8s)** | Git repos are distributed, config in ansible; data dir backed up via borgmatic before migration |
+| **devpi (k8s)** | Private packages backed up, PyPI cache re-fetchable |
+| **Kiwix (k8s)** | ZIM files re-downloadable via torrent, no backup needed |
+
+**Borgmatic config changes:** None required for Phase 0. Future phases may add k8s PV paths if needed.
+
 ---

 ## Phase 0: Foundation
@@ -265,6 +282,23 @@ zot_sync_registries:

 ```

+**Handlers (handlers/main.yml):**
+```yaml
+- name: Restart zot
+  ansible.builtin.shell:  # shell, not command: $(id -u) needs shell expansion
+    cmd: launchctl kickstart -k gui/$(id -u)/mcquack.eblume.zot
+  listen: restart zot
+```
+
+**Tasks should notify handler on config change:**
+```yaml
+- name: Deploy zot config
+  ansible.builtin.template:
+    src: config.json.j2
+    dest: "{{ zot_config_dir }}/config.json"
+  notify: restart zot
+```
+
 **Testing (after deploying role):**
 ```bash
 # Check LaunchAgent is running
@@ -277,8 +311,9 @@ ssh indri 'curl -s http://localhost:5000/v2/_catalog'
 # Check logs for errors
 ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log'

-# Test pull-through cache (from indri, using localhost)
-ssh indri 'podman pull localhost:5000/docker.io/library/alpine:latest'
+# Test pull-through cache via curl (podman not installed until Step 0.8)
+ssh indri 'curl -s http://localhost:5000/v2/docker.io/library/alpine/manifests/latest -H "Accept: application/vnd.oci.image.manifest.v1+json"'
+# Should return manifest JSON (triggers cache fetch from Docker Hub)
 ssh indri 'curl -s http://localhost:5000/v2/_catalog'
 # Expected: {"repositories":["docker.io/library/alpine"]}
 ```
@@ -345,17 +380,13 @@ if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then
     echo "zot_up 1" > "$TEMP_FILE"
 else
     echo "zot_up 0" > "$TEMP_FILE"
-    mv "$TEMP_FILE" "$METRICS_FILE"
-    exit 0
 fi

-# Get metrics from zot's metrics endpoint (if enabled)
-# Add storage metrics, cache hits, etc.
-# ...
-
 mv "$TEMP_FILE" "$METRICS_FILE"
 ```

+**Note:** Start with just `zot_up` for now. Additional metrics (storage usage, cache stats) can be added later after reviewing zot's metrics endpoint.
+
 **Testing:**
 ```bash
 # Deploy metrics role
@@ -455,7 +486,7 @@ ansible/roles/podman/

 - name: Initialize podman machine (if not exists)
   ansible.builtin.command:
-    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 100
+    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 220
   register: podman_init
   changed_when: podman_init.rc == 0
   failed_when: podman_init.rc not in [0, 125]  # 125 = already exists
@@ -614,48 +645,7 @@ mise run indri-services-check

 ---

-### Step 0.12: Create Zot Grafana Dashboard
-
-**New files:**
-- `ansible/roles/grafana/files/dashboards/zot.json`
-
-**Dashboard panels:**
-- `zot_up` - Service availability
-- Storage usage (if zot exposes this metric)
-- Cache hit/miss rates
-- Pull/push request counts
-
-**Testing:**
-```bash
-# Deploy dashboard
-mise run provision-indri -- --tags grafana
-
-# Verify in Grafana UI
-# Navigate to Dashboards > Zot Registry
-```
-
----
-
-### Step 0.13: Create Minikube Grafana Dashboard
-
-**New files:**
-- `ansible/roles/grafana/files/dashboards/minikube.json`
-
-**Dashboard panels:**
-- Node CPU/Memory usage
-- Pod count by namespace
-- Container restart counts
-- API server request latency
-
-**Note:** This may require deploying kube-state-metrics in the cluster first:
-```bash
-ssh indri 'kubectl apply -f https://raw.githubusercontent.com/kubernetes/kube-state-metrics/main/examples/standard/cluster-role.yaml'
-# ... additional kube-state-metrics manifests
-```
-
----
-
-### Step 0.14: Create Zettelkasten Documentation
+### Step 0.12: Create Zettelkasten Documentation

 **New files:**
 - `~/code/personal/zk/zot.md`
@@ -723,7 +713,7 @@ tail -f ~/Library/Logs/mcquack.zot.err.log

 ---

-### Step 0.15: Update Main Playbook
+### Step 0.13: Update Main Playbook

 **Files to modify:**
 - `ansible/playbooks/indri.yml`
@@ -777,11 +767,7 @@ curl -s "http://indri:9090/api/v1/query?query=zot_up"
 # In Grafana Explore: {service="zot"}
 # Should see zot log entries

-# 7. Dashboards in Grafana
-# Navigate to Zot Registry dashboard - panels should have data
-# Navigate to Minikube dashboard - panels should have data
-
-# 8. k9s from gilbert
+# 7. k9s from gilbert
 k9s
 # Should connect and show minikube cluster
 ```
@@ -823,6 +809,23 @@ rm ~/code/personal/zk/{zot,minikube}.md

 ---

+### Phase 0 Follow-up: Grafana Dashboards
+
+After Phase 0 is running and stable, create monitoring dashboards:
+
+**Zot Dashboard** (`ansible/roles/grafana/files/dashboards/zot.json`):
+1. Check what metrics zot exposes: `ssh indri 'curl -s http://localhost:5000/metrics'`
+2. Review community dashboards for inspiration (copy permitted if license allows)
+3. Create dashboard with available metrics (at minimum: `zot_up`)
+
+**Minikube Dashboard** (`ansible/roles/grafana/files/dashboards/minikube.json`):
+1. Deploy kube-state-metrics if needed for additional cluster metrics
+2. Review what Prometheus can scrape from the cluster
+3. Review community dashboards for inspiration (copy permitted if license allows)
+4. Create dashboard with relevant panels (node usage, pod counts, etc.)
+
+---
+
 ### New Files Summary

 | File | Purpose |
@@ -831,8 +834,6 @@ rm ~/code/personal/zk/{zot,minikube}.md
 | `ansible/roles/zot_metrics/` | Metrics collection for Zot |
 | `ansible/roles/podman/` | Podman installation and setup |
 | `ansible/roles/minikube/` | Minikube cluster setup |
-| `ansible/roles/grafana/files/dashboards/zot.json` | Zot monitoring dashboard |
-| `ansible/roles/grafana/files/dashboards/minikube.json` | K8s monitoring dashboard |
 | `~/code/personal/zk/zot.md` | Zot management documentation |
 | `~/code/personal/zk/minikube.md` | Minikube management documentation |

@@ -840,7 +841,7 @@ rm ~/code/personal/zk/{zot,minikube}.md

 | File | Changes |
 |------|---------|
-| `pulumi/policy.hujson` | Add tag:registry, tag:k8s, ACL rules |
+| `pulumi/policy.hujson` | Add tag:registry |
 | `ansible/playbooks/indri.yml` | Add new roles |
 | `ansible/roles/tailscale_serve/defaults/main.yml` | Add svc:registry |
 | `ansible/roles/alloy/templates/config.alloy.j2` | Add zot log collection |
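
The simplified `zot_up` collector in the metrics hunk (one probe, one write to a temp file, one `mv`) could be wrapped in a function so the write-then-rename pattern can be exercised without a running registry. This is only a sketch: the function name `write_zot_up`, its two parameters, and the 5-second timeout are assumptions for illustration and do not appear in the plan, which defines `$TEMP_FILE` and `$METRICS_FILE` outside the lines shown here.

```bash
# Hypothetical helper (not part of the plan): probe a registry URL and
# publish zot_up as 1 or 0 via a temp file that is renamed into place,
# so a reader of the metrics file never sees a partial write.
write_zot_up() {
    local url="$1"           # e.g. http://localhost:5000/v2/_catalog
    local metrics_file="$2"  # assumed destination path for the metric
    local temp_file

    # Create the temp file next to the target so mv stays on one filesystem.
    temp_file="$(mktemp "${metrics_file}.XXXXXX")" || return 1

    if curl -sf --max-time 5 "$url" > /dev/null 2>&1; then
        echo "zot_up 1" > "$temp_file"
    else
        echo "zot_up 0" > "$temp_file"
    fi

    mv "$temp_file" "$metrics_file"
}

# Example call (both arguments are placeholders):
# write_zot_up "http://localhost:5000/v2/_catalog" "/tmp/zot.prom"
```

Both branches fall through to the same `mv`, matching the hunk's removal of the early `mv`/`exit 0` pair in the failure branch.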