Phase 0 review fixes

- Bump podman disk-size to 220G (> minikube's 200G)
- Fix Step 0.3 test to use curl instead of podman (not installed yet)
- Simplify Step 0.5 zot metrics to just zot_up for now
- Add Backup Strategy section to Technical Decisions
- Add zot restart handler to Step 0.3
- Move dashboard steps to Phase 0 Follow-up section
- Renumber steps (0.14->0.12, 0.15->0.13)
- Fix Modified Files Summary (tag:k8s deferred to Phase 1)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent a31e8935c9
commit bcd96d86f0
1 changed file with 61 additions and 60 deletions
@@ -62,6 +62,23 @@ LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths
 This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors).
 `brew services` handles this automatically but those aren't tracked in ansible.

+### Backup Strategy
+
+Borgmatic remains on indri (outside k8s), writing to sifaka NAS via SMB at `/Volumes/backups`. This ensures backups continue even if k8s is down.
+
+| Service | Backup Approach |
+|---------|-----------------|
+| **Zot Registry** | No backup needed - pull-through cache is re-fetchable, private images rebuilt from source control |
+| **Minikube** | No backup of cluster state - declarative manifests in git, can recreate |
+| **PostgreSQL (k8s)** | CloudNativePG scheduled backups to sifaka (Phase 1) |
+| **Grafana (k8s)** | Dashboards in ansible source control, no runtime backup needed |
+| **Miniflux (k8s)** | Database backed up via CloudNativePG |
+| **Forgejo (k8s)** | Git repos are distributed, config in ansible; data dir backed up via borgmatic before migration |
+| **devpi (k8s)** | Private packages backed up, PyPI cache re-fetchable |
+| **Kiwix (k8s)** | ZIM files re-downloadable via torrent, no backup needed |
+
+**Borgmatic config changes:** None required for Phase 0. Future phases may add k8s PV paths if needed.
+
 ---

 ## Phase 0: Foundation
@@ -265,6 +282,23 @@ zot_sync_registries:
 </plist>
 ```

+**Handlers (handlers/main.yml):**
+```yaml
+- name: Restart zot
+  ansible.builtin.command:
+    cmd: launchctl kickstart -k gui/$(id -u)/mcquack.eblume.zot
+  listen: restart zot
+```
+
+**Tasks should notify handler on config change:**
+```yaml
+- name: Deploy zot config
+  ansible.builtin.template:
+    src: config.json.j2
+    dest: "{{ zot_config_dir }}/config.json"
+  notify: restart zot
+```
+
 **Testing (after deploying role):**
 ```bash
 # Check LaunchAgent is running
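One caveat on the handler added in this hunk: `ansible.builtin.command` does not pass `cmd` through a shell, so `$(id -u)` would reach `launchctl` as literal text instead of the user's UID. A minimal shell sketch of the service target the handler is aiming for, assuming the label `mcquack.eblume.zot` from the hunk; the helper name `launchd_gui_target` is hypothetical and not part of the role:

```bash
# Build the launchd service target for a per-user (GUI domain) LaunchAgent.
# launchd_gui_target is a hypothetical helper used only for illustration.
launchd_gui_target() {
    label="$1"
    printf 'gui/%s/%s\n' "$(id -u)" "$label"
}

# Usage on indri (commented out so the sketch has no side effects):
# launchctl kickstart -k "$(launchd_gui_target mcquack.eblume.zot)"
```

In the role itself this probably means either switching the handler to `ansible.builtin.shell`, which does expand the substitution, or templating the UID from a variable.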
@@ -277,8 +311,9 @@ ssh indri 'curl -s http://localhost:5000/v2/_catalog'
 # Check logs for errors
 ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log'

-# Test pull-through cache (from indri, using localhost)
-ssh indri 'podman pull localhost:5000/docker.io/library/alpine:latest'
+# Test pull-through cache via curl (podman not installed until Step 0.8)
+ssh indri 'curl -s http://localhost:5000/v2/docker.io/library/alpine/manifests/latest -H "Accept: application/vnd.oci.image.manifest.v1+json"'
+# Should return manifest JSON (triggers cache fetch from Docker Hub)
 ssh indri 'curl -s http://localhost:5000/v2/_catalog'
 # Expected: {"repositories":["docker.io/library/alpine"]}
 ```
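The expected `_catalog` response in this hunk can be checked mechanically rather than by eye. A rough sketch, assuming the response is compact JSON as in the `# Expected:` comment; `catalog_has_repo` is a hypothetical helper and uses a fixed-string match rather than a real JSON parser:

```bash
# Succeed if the catalog JSON on stdin lists the given repository name.
# Hypothetical helper: a plain fixed-string match on the quoted name, so it
# assumes the {"repositories":[...]} form shown in the expected output.
catalog_has_repo() {
    repo="$1"
    grep -qF "\"$repo\""
}

# Usage (commented out: needs a running zot on indri):
# ssh indri 'curl -s http://localhost:5000/v2/_catalog' \
#     | catalog_has_repo docker.io/library/alpine && echo "alpine cached"
```

Matching on the quoted name keeps the check dependency-free, at the cost of not validating the JSON structure.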
@@ -345,17 +380,13 @@ if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then
     echo "zot_up 1" > "$TEMP_FILE"
 else
     echo "zot_up 0" > "$TEMP_FILE"
-    mv "$TEMP_FILE" "$METRICS_FILE"
-    exit 0
 fi

-# Get metrics from zot's metrics endpoint (if enabled)
-# Add storage metrics, cache hits, etc.
-# ...
-
 mv "$TEMP_FILE" "$METRICS_FILE"
 ```

+**Note:** Start with just `zot_up` for now. Additional metrics (storage usage, cache stats) can be added later after reviewing zot's metrics endpoint.
+
 **Testing:**
 ```bash
 # Deploy metrics role
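With the extra metrics removed, the collector reduces to one probe and one atomic file write. A minimal sketch of that shape, assuming the `TEMP_FILE`/`METRICS_FILE` pattern visible in the hunk; the function name `write_zot_up` and its arguments are hypothetical, and the real script's paths are not shown in this diff:

```bash
# Write a single zot_up sample to a metrics file.
# $1 = 1 (up) or 0 (down), $2 = destination metrics file.
# The temp file sits next to the destination, so mv is a rename within the
# same directory and readers never see a partially written file.
write_zot_up() {
    value="$1"
    metrics_file="$2"
    temp_file="${metrics_file}.$$"
    echo "zot_up ${value}" > "$temp_file"
    mv "$temp_file" "$metrics_file"
}

# Usage (commented out: needs zot listening on localhost:5000):
# if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then
#     write_zot_up 1 "$METRICS_FILE"
# else
#     write_zot_up 0 "$METRICS_FILE"
# fi
```

Keeping the write in one function means both branches end the same way, which is the effect of dropping the early `mv` and `exit 0` from the `else` branch.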
@@ -455,7 +486,7 @@ ansible/roles/podman/

 - name: Initialize podman machine (if not exists)
   ansible.builtin.command:
-    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 100
+    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 220
   register: podman_init
   changed_when: podman_init.rc == 0
   failed_when: podman_init.rc not in [0, 125]  # 125 = already exists
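The bump to 220 exists because the podman machine disk has to be larger than the 200G given to minikube, per the commit message. A small guard for that invariant, as a sketch only; `check_disk_headroom` is a hypothetical helper and both sizes are passed as plain integers in the same unit:

```bash
# Succeed only if the podman machine disk is strictly larger than the
# minikube disk. Hypothetical helper; sizes are integers in the same unit.
check_disk_headroom() {
    podman_gb="$1"
    minikube_gb="$2"
    if [ "$podman_gb" -gt "$minikube_gb" ]; then
        echo "ok: podman ${podman_gb}G > minikube ${minikube_gb}G"
    else
        echo "too small: podman ${podman_gb}G <= minikube ${minikube_gb}G" >&2
        return 1
    fi
}
```

`check_disk_headroom 220 200` passes, while the previous value (`check_disk_headroom 100 200`) fails.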
@@ -614,48 +645,7 @@ mise run indri-services-check

 ---

-### Step 0.12: Create Zot Grafana Dashboard
-
-**New files:**
-- `ansible/roles/grafana/files/dashboards/zot.json`
-
-**Dashboard panels:**
-- `zot_up` - Service availability
-- Storage usage (if zot exposes this metric)
-- Cache hit/miss rates
-- Pull/push request counts
-
-**Testing:**
-```bash
-# Deploy dashboard
-mise run provision-indri -- --tags grafana
-
-# Verify in Grafana UI
-# Navigate to Dashboards > Zot Registry
-```
-
----
-
-### Step 0.13: Create Minikube Grafana Dashboard
-
-**New files:**
-- `ansible/roles/grafana/files/dashboards/minikube.json`
-
-**Dashboard panels:**
-- Node CPU/Memory usage
-- Pod count by namespace
-- Container restart counts
-- API server request latency
-
-**Note:** This may require deploying kube-state-metrics in the cluster first:
-```bash
-ssh indri 'kubectl apply -f https://raw.githubusercontent.com/kubernetes/kube-state-metrics/main/examples/standard/cluster-role.yaml'
-# ... additional kube-state-metrics manifests
-```
-
----
-
-### Step 0.14: Create Zettelkasten Documentation
+### Step 0.12: Create Zettelkasten Documentation

 **New files:**
 - `~/code/personal/zk/zot.md`
@@ -723,7 +713,7 @@ tail -f ~/Library/Logs/mcquack.zot.err.log

 ---

-### Step 0.15: Update Main Playbook
+### Step 0.13: Update Main Playbook

 **Files to modify:**
 - `ansible/playbooks/indri.yml`
@@ -777,11 +767,7 @@ curl -s "http://indri:9090/api/v1/query?query=zot_up"
 # In Grafana Explore: {service="zot"}
 # Should see zot log entries

-# 7. Dashboards in Grafana
-# Navigate to Zot Registry dashboard - panels should have data
-# Navigate to Minikube dashboard - panels should have data
-
-# 8. k9s from gilbert
+# 7. k9s from gilbert
 k9s
 # Should connect and show minikube cluster
 ```
@@ -823,6 +809,23 @@ rm ~/code/personal/zk/{zot,minikube}.md

 ---

+### Phase 0 Follow-up: Grafana Dashboards
+
+After Phase 0 is running and stable, create monitoring dashboards:
+
+**Zot Dashboard** (`ansible/roles/grafana/files/dashboards/zot.json`):
+1. Check what metrics zot exposes: `ssh indri 'curl -s http://localhost:5000/metrics'`
+2. Review community dashboards for inspiration (copy permitted if license allows)
+3. Create dashboard with available metrics (at minimum: `zot_up`)
+
+**Minikube Dashboard** (`ansible/roles/grafana/files/dashboards/minikube.json`):
+1. Deploy kube-state-metrics if needed for additional cluster metrics
+2. Review what Prometheus can scrape from the cluster
+3. Review community dashboards for inspiration (copy permitted if license allows)
+4. Create dashboard with relevant panels (node usage, pod counts, etc.)
+
+---
+
 ### New Files Summary

 | File | Purpose |
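For step 1 of the Zot dashboard follow-up, the raw `/metrics` output is easier to review as a list of metric names. A sketch, assuming the endpoint returns Prometheus text exposition format (not confirmed in this commit); `metric_names` is a hypothetical helper:

```bash
# Print the unique metric names found in Prometheus text-format input on stdin.
# Drops comment lines (# HELP / # TYPE) and blank lines, then strips
# everything from the first "{" or space onward (labels and sample values).
metric_names() {
    grep -v -e '^#' -e '^[[:space:]]*$' \
        | sed -e 's/[{ ].*$//' \
        | sort -u
}

# Usage (commented out: needs zot running on indri with metrics enabled):
# ssh indri 'curl -s http://localhost:5000/metrics' | metric_names
```

The resulting list shows which dashboard panels are feasible beyond `zot_up`.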
@@ -831,8 +834,6 @@ rm ~/code/personal/zk/{zot,minikube}.md
 | `ansible/roles/zot_metrics/` | Metrics collection for Zot |
 | `ansible/roles/podman/` | Podman installation and setup |
 | `ansible/roles/minikube/` | Minikube cluster setup |
-| `ansible/roles/grafana/files/dashboards/zot.json` | Zot monitoring dashboard |
-| `ansible/roles/grafana/files/dashboards/minikube.json` | K8s monitoring dashboard |
 | `~/code/personal/zk/zot.md` | Zot management documentation |
 | `~/code/personal/zk/minikube.md` | Minikube management documentation |

@@ -840,7 +841,7 @@ rm ~/code/personal/zk/{zot,minikube}.md

 | File | Changes |
 |------|---------|
-| `pulumi/policy.hujson` | Add tag:registry, tag:k8s, ACL rules |
+| `pulumi/policy.hujson` | Add tag:registry |
 | `ansible/playbooks/indri.yml` | Add new roles |
 | `ansible/roles/tailscale_serve/defaults/main.yml` | Add svc:registry |
 | `ansible/roles/alloy/templates/config.alloy.j2` | Add zot log collection |