Phase 0 review fixes

- Bump podman disk-size to 220G (> minikube's 200G)
- Fix Step 0.3 test to use curl instead of podman (not installed yet)
- Simplify Step 0.5 zot metrics to just zot_up for now
- Add Backup Strategy section to Technical Decisions
- Add zot restart handler to Step 0.3
- Move dashboard steps to Phase 0 Follow-up section
- Renumber steps (0.14->0.12, 0.15->0.13)
- Fix Modified Files Summary (tag:k8s deferred to Phase 1)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Erich Blume 2026-01-17 17:32:55 -08:00
commit bcd96d86f0


@ -62,6 +62,23 @@ LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths
This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors).
`brew services` handles this automatically but those aren't tracked in ansible.
### Backup Strategy

Borgmatic remains on indri (outside k8s), writing to sifaka NAS via SMB at `/Volumes/backups`. This ensures backups continue even if k8s is down.

| Service | Backup Approach |
|---------|-----------------|
| **Zot Registry** | No backup needed - pull-through cache is re-fetchable, private images rebuilt from source control |
| **Minikube** | No backup of cluster state - declarative manifests in git, can recreate |
| **PostgreSQL (k8s)** | CloudNativePG scheduled backups to sifaka (Phase 1) |
| **Grafana (k8s)** | Dashboards in ansible source control, no runtime backup needed |
| **Miniflux (k8s)** | Database backed up via CloudNativePG |
| **Forgejo (k8s)** | Git repos are distributed, config in ansible; data dir backed up via borgmatic before migration |
| **devpi (k8s)** | Private packages backed up, PyPI cache re-fetchable |
| **Kiwix (k8s)** | ZIM files re-downloadable via torrent, no backup needed |

**Borgmatic config changes:** None required for Phase 0. Future phases may add k8s PV paths if needed.

---
## Phase 0: Foundation
@ -265,6 +282,23 @@ zot_sync_registries:
</plist>
```
**Handlers (handlers/main.yml):**
```yaml
- name: Restart zot
  # shell, not command: the command module does not expand $(id -u)
  ansible.builtin.shell:
    cmd: launchctl kickstart -k gui/$(id -u)/mcquack.eblume.zot
  listen: restart zot
```
**Tasks should notify handler on config change:**
```yaml
- name: Deploy zot config
  ansible.builtin.template:
    src: config.json.j2
    dest: "{{ zot_config_dir }}/config.json"
  notify: restart zot
```
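
When debugging, the same restart can be run by hand on indri. The following is a minimal sketch, assuming the LaunchAgent label `mcquack.eblume.zot` used by the handler and the standard macOS `launchctl kickstart -k` subcommand; the function names are illustrative and not part of the role. `$(id -u)` is a shell substitution, so the target only resolves when the command is run through a shell.

```shell
#!/usr/bin/env bash

# Build the launchd service target for the current GUI user,
# e.g. gui/501/mcquack.eblume.zot (the label is taken from the handler).
zot_service_target() {
  printf 'gui/%s/%s\n' "$(id -u)" "mcquack.eblume.zot"
}

# Kill and restart the agent (macOS only; call this explicitly).
restart_zot() {
  /bin/launchctl kickstart -k "$(zot_service_target)"
}

# Print the target so it can be inspected before restarting.
zot_service_target
```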
**Testing (after deploying role):**
```bash
# Check LaunchAgent is running
@ -277,8 +311,9 @@ ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Check logs for errors
ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log'
# Test pull-through cache (from indri, using localhost)
ssh indri 'podman pull localhost:5000/docker.io/library/alpine:latest'
# Test pull-through cache via curl (podman not installed until Step 0.8)
ssh indri 'curl -s http://localhost:5000/v2/docker.io/library/alpine/manifests/latest -H "Accept: application/vnd.oci.image.manifest.v1+json"'
# Should return manifest JSON (triggers cache fetch from Docker Hub)
ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Expected: {"repositories":["docker.io/library/alpine"]}
```
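
The catalog check above can be turned into a pass/fail test instead of a visual comparison. This is a sketch only: `catalog_has_repo` is a hypothetical helper, and it does a fixed-string match on the quoted repository name rather than real JSON parsing, which is enough for the flat `{"repositories":[...]}` response shown above.

```shell
#!/usr/bin/env bash

# Return 0 if the catalog JSON lists the given repository name.
# Matches the name including its surrounding quotes, so
# "docker.io/library/alpine" does not match "docker.io/library/alpine2".
catalog_has_repo() {
  local catalog_json="$1" repo="$2"
  printf '%s' "$catalog_json" | grep -Fq "\"${repo}\""
}

# Example against the expected response from the test above.
sample='{"repositories":["docker.io/library/alpine"]}'
if catalog_has_repo "$sample" "docker.io/library/alpine"; then
  echo "alpine is cached"
fi
```

Against the live registry, the first argument would be the output of `ssh indri 'curl -s http://localhost:5000/v2/_catalog'`.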
@ -345,17 +380,13 @@ if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then
echo "zot_up 1" > "$TEMP_FILE"
else
echo "zot_up 0" > "$TEMP_FILE"
mv "$TEMP_FILE" "$METRICS_FILE"
exit 0
fi
# Get metrics from zot's metrics endpoint (if enabled)
# Add storage metrics, cache hits, etc.
# ...
mv "$TEMP_FILE" "$METRICS_FILE"
```
**Note:** Start with just `zot_up` for now. Additional metrics (storage usage, cache stats) can be added later after reviewing zot's metrics endpoint.
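
With only one metric, the write step is small enough to show in full. The sketch below assumes the metrics file is read by a consumer of the Prometheus text exposition format; `write_zot_up` and its parameter names are illustrative, not the role's actual script. It writes to a temporary file next to the target and then renames it, so a reader never sees a half-written file.

```shell
#!/usr/bin/env bash

# Write the zot_up gauge to metrics_file atomically.
# value is 1 (registry answered) or 0 (it did not).
write_zot_up() {
  local value="$1" metrics_file="$2"
  local temp_file
  # Temp file in the same directory so mv is a rename, not a copy.
  temp_file="$(mktemp "${metrics_file}.XXXXXX")" || return 1
  {
    echo "# HELP zot_up Whether the zot registry answered its health probe."
    echo "# TYPE zot_up gauge"
    echo "zot_up ${value}"
  } > "$temp_file"
  mv "$temp_file" "$metrics_file"
}

# Example: record a healthy registry in a scratch location.
scratch_dir="$(mktemp -d)"
write_zot_up 1 "${scratch_dir}/zot.prom"
cat "${scratch_dir}/zot.prom"
```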
**Testing:**
```bash
# Deploy metrics role
@ -455,7 +486,7 @@ ansible/roles/podman/
- name: Initialize podman machine (if not exists)
  ansible.builtin.command:
    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 100
    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 220
  register: podman_init
  changed_when: podman_init.rc == 0
  failed_when: podman_init.rc not in [0, 125] # 125 = already exists
@ -614,48 +645,7 @@ mise run indri-services-check
---
### Step 0.12: Create Zot Grafana Dashboard
**New files:**
- `ansible/roles/grafana/files/dashboards/zot.json`
**Dashboard panels:**
- `zot_up` - Service availability
- Storage usage (if zot exposes this metric)
- Cache hit/miss rates
- Pull/push request counts
**Testing:**
```bash
# Deploy dashboard
mise run provision-indri -- --tags grafana
# Verify in Grafana UI
# Navigate to Dashboards > Zot Registry
```
---
### Step 0.13: Create Minikube Grafana Dashboard
**New files:**
- `ansible/roles/grafana/files/dashboards/minikube.json`
**Dashboard panels:**
- Node CPU/Memory usage
- Pod count by namespace
- Container restart counts
- API server request latency
**Note:** This may require deploying kube-state-metrics in the cluster first:
```bash
ssh indri 'kubectl apply -f https://raw.githubusercontent.com/kubernetes/kube-state-metrics/main/examples/standard/cluster-role.yaml'
# ... additional kube-state-metrics manifests
```
---
### Step 0.14: Create Zettelkasten Documentation
### Step 0.12: Create Zettelkasten Documentation
**New files:**
- `~/code/personal/zk/zot.md`
@ -723,7 +713,7 @@ tail -f ~/Library/Logs/mcquack.zot.err.log
---
### Step 0.15: Update Main Playbook
### Step 0.13: Update Main Playbook
**Files to modify:**
- `ansible/playbooks/indri.yml`
@ -777,11 +767,7 @@ curl -s "http://indri:9090/api/v1/query?query=zot_up"
# In Grafana Explore: {service="zot"}
# Should see zot log entries
# 7. Dashboards in Grafana
# Navigate to Zot Registry dashboard - panels should have data
# Navigate to Minikube dashboard - panels should have data
# 8. k9s from gilbert
# 7. k9s from gilbert
k9s
# Should connect and show minikube cluster
```
@ -823,6 +809,23 @@ rm ~/code/personal/zk/{zot,minikube}.md
---
### Phase 0 Follow-up: Grafana Dashboards

After Phase 0 is running and stable, create monitoring dashboards:

**Zot Dashboard** (`ansible/roles/grafana/files/dashboards/zot.json`):

1. Check what metrics zot exposes: `ssh indri 'curl -s http://localhost:5000/metrics'`
2. Review community dashboards for inspiration (copy permitted if license allows)
3. Create dashboard with available metrics (at minimum: `zot_up`)

**Minikube Dashboard** (`ansible/roles/grafana/files/dashboards/minikube.json`):

1. Deploy kube-state-metrics if needed for additional cluster metrics
2. Review what Prometheus can scrape from the cluster
3. Review community dashboards for inspiration (copy permitted if license allows)
4. Create dashboard with relevant panels (node usage, pod counts, etc.)
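
For the "check what metrics are exposed" steps, raw endpoint output is easier to review once reduced to a list of metric names. The sketch below assumes the endpoint returns Prometheus text exposition format; `metric_names` is a hypothetical helper, and the sample input uses made-up metric names rather than zot's real ones. On indri, the input would come from `curl -s http://localhost:5000/metrics`.

```shell
#!/usr/bin/env bash

# Print the unique metric names found in Prometheus text-format
# input on stdin: drop comment and blank lines, then cut each
# sample line at the first "{" or whitespace.
metric_names() {
  grep -v '^#' | grep -v '^[[:space:]]*$' | sed -E 's/[{[:space:]].*$//' | sort -u
}

# Example with made-up metrics (not zot's actual names).
printf '%s\n' \
  '# TYPE zot_up gauge' \
  'zot_up 1' \
  'example_requests_total{code="200"} 5' \
  'example_requests_total{code="500"} 1' \
  | metric_names
```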
---
### New Files Summary
| File | Purpose |
@ -831,8 +834,6 @@ rm ~/code/personal/zk/{zot,minikube}.md
| `ansible/roles/zot_metrics/` | Metrics collection for Zot |
| `ansible/roles/podman/` | Podman installation and setup |
| `ansible/roles/minikube/` | Minikube cluster setup |
| `ansible/roles/grafana/files/dashboards/zot.json` | Zot monitoring dashboard |
| `ansible/roles/grafana/files/dashboards/minikube.json` | K8s monitoring dashboard |
| `~/code/personal/zk/zot.md` | Zot management documentation |
| `~/code/personal/zk/minikube.md` | Minikube management documentation |
@ -840,7 +841,7 @@ rm ~/code/personal/zk/{zot,minikube}.md
| File | Changes |
|------|---------|
| `pulumi/policy.hujson` | Add tag:registry, tag:k8s, ACL rules |
| `pulumi/policy.hujson` | Add tag:registry |
| `ansible/playbooks/indri.yml` | Add new roles |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Add svc:registry |
| `ansible/roles/alloy/templates/config.alloy.j2` | Add zot log collection |