From 4d916a46d3a3205669b77cefde7851b84e4e1322 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 13:12:09 -0800 Subject: [PATCH 01/21] Add Kubernetes migration plan documentation Comprehensive phased plan for migrating blumeops services from direct hosting on indri to a minikube cluster. Documents technical decisions (Zot registry, Podman driver, CloudNativePG, Tailscale Operator) and 9 migration phases with verification and rollback procedures. Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 469 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 469 insertions(+) create mode 100644 docs/k8s-migration.md diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md new file mode 100644 index 0000000..140fb60 --- /dev/null +++ b/docs/k8s-migration.md @@ -0,0 +1,469 @@ +# Blumeops Minikube Migration Plan + +This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes. 
+ +## Architecture Overview + +### Services Staying on Indri (Outside K8s) +| Service | Reason | +|---------|--------| +| **Zot Registry** (NEW) | Avoid circular dependency - k8s needs images to start | +| **Prometheus** | Observability backbone must survive k8s failures | +| **Loki** | Log aggregation backbone | +| **Borgmatic** | Backup system | +| **Grafana-alloy** | Metrics/logs collector on host | +| **Plex** | Until Jellyfin replacement | +| **Transmission** | Downloads for kiwix ZIM files | + +### Services Moving to K8s +| Service | Complexity | Dependencies | +|---------|------------|--------------| +| Grafana | LOW | Phase 1 | +| Kiwix | LOW | Phase 1 | +| Miniflux | MEDIUM | PostgreSQL | +| devpi | MEDIUM | Registry | +| PostgreSQL | HIGH | Phase 1 | +| Forgejo | HIGH | PostgreSQL | +| Woodpecker CI | MEDIUM | Forgejo | + +## Technical Decisions + +### Container Registry: Zot +- OCI-native, lightweight +- Native support for proxying multiple registries (Docker Hub, GHCR, Quay) +- Single binary, ARM64 native +- Config at `/etc/zot/config.json` + +### Minikube Driver: Podman +- Rootless containers for better security +- Lighter than full VM (QEMU) +- Uses existing container ecosystem +- `minikube start --driver=podman --container-runtime=containerd` + +### PostgreSQL: CloudNativePG Operator +- Production-grade operator +- Built-in backup/restore +- Prometheus metrics +- PITR support + +### K8s Service Exposure: Tailscale Operator +- `loadBalancerClass: tailscale` on Services +- Automatic TLS and MagicDNS names +- ACL-controlled access + +### LaunchAgent Requirements (Critical) +LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths**: +- `/opt/homebrew/bin/zot` not `zot` +- `/opt/homebrew/opt/mise/bin/mise x --` for mise-managed tools +- `/opt/homebrew/opt/postgresql@18/bin/pg_dump` for postgres tools + +This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors). 
+`brew services` handles this automatically but those aren't tracked in ansible. + +--- + +## Phase 0: Foundation + +**Goal**: Container registry + minikube cluster without disrupting existing services + +### Steps + +1. **Install Podman on indri** + ```bash + # Add to Brewfile + brew "podman" + ``` + - Create ansible role `podman` for machine setup + +2. **Install and configure Zot registry** + - Create ansible role `zot` + - Deploy as mcquack LaunchAgent (like devpi pattern) + - Bind to `localhost:5000` + - Configure pull-through for Docker Hub + GHCR + - Add Tailscale serve: `svc:registry` + +3. **Install minikube** + ```bash + # Add to Brewfile + brew "minikube" + + # Start with podman driver + minikube start --driver=podman --container-runtime=containerd \ + --cpus=4 --memory=8192 --disk-size=100g + ``` + - Create ansible role `minikube` for initial setup + +4. **Update Pulumi ACLs** + - Add `tag:registry` for registry service + - Add `tag:k8s` for cluster services + +5. **Configure kubeconfig on gilbert** + - Add minikube context to `~/.kube/config` + - Keep work EKS config separate (already isolated) + - K9s will auto-discover contexts + +6. 
**Observability for new services** (follow existing patterns) + + **Zot Registry:** + - Create zk card `~/code/personal/zk/zot.md` (like devpi.md, forgejo.md) + - Add log collection to Alloy config (stdout/stderr from LaunchAgent) + - Create `zot_metrics` role with periodic script writing to textfile collector + - Create Grafana dashboard: cache hit rates, storage usage, pull/push counts + + **Minikube:** + - Create zk card `~/code/personal/zk/minikube.md` + - Metrics via kube-state-metrics (deployed in cluster) + - Node metrics already collected by Alloy + - Create Grafana dashboard: cluster health, resource usage + + **Note:** Backups not needed for these services: + - Zot cache is re-fetchable from upstream registries + - Minikube state is recreatable from ansible/k8s manifests + +### New Files +- `ansible/roles/zot/` - Registry role +- `ansible/roles/zot_metrics/` - Metrics collection +- `ansible/roles/podman/` - Podman setup +- `ansible/roles/minikube/` - Cluster setup +- `~/code/personal/zk/zot.md` - Registry management log +- `~/code/personal/zk/minikube.md` - Cluster management log + +### Verification +```bash +# Registry working +curl http://localhost:5000/v2/_catalog + +# Minikube running +minikube status +kubectl get nodes + +# Metrics flowing +ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/zot.prom' + +# Logs in Loki +# Query: {service="zot"} +``` + +### Rollback +```bash +minikube stop && minikube delete +launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist +``` + +--- + +## Phase 1: Kubernetes Infrastructure + +**Goal**: Tailscale operator + CloudNativePG operator + +### Steps + +1. **Create Tailscale OAuth client** + - Scopes: Devices Core, Auth Keys, Services write + - Tag: `tag:k8s-operator` + - Store in 1Password + +2. 
**Deploy Tailscale Kubernetes Operator** + ```bash + helm repo add tailscale https://pkgs.tailscale.com/helmcharts + helm install tailscale-operator tailscale/tailscale-operator \ + --namespace tailscale-system --create-namespace \ + --set oauth.clientId=$CLIENT_ID \ + --set oauth.clientSecret=$CLIENT_SECRET + ``` + +3. **Deploy CloudNativePG operator** + ```bash + kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml + ``` + +4. **Create PostgreSQL cluster** + ```yaml + apiVersion: postgresql.cnpg.io/v1 + kind: Cluster + metadata: + name: blumeops-pg + namespace: databases + spec: + instances: 1 + storage: + size: 10Gi + storageClass: standard + monitoring: + enablePodMonitor: true + ``` + +5. **Update Alloy config** + - Add kubernetes_sd_configs for k8s metrics + - Scrape operator metrics + +### New Files +- `ansible/k8s/operators/` - Operator manifests +- `ansible/k8s/databases/` - PostgreSQL cluster + +### Verification +```bash +kubectl get pods -n tailscale-system +kubectl get pods -n cnpg-system +kubectl get cluster -n databases +``` + +--- + +## Phase 2: Grafana Migration (Pilot) + +**Goal**: Migrate Grafana as lowest-risk pilot service + +### Steps + +1. **Deploy Grafana via Helm** + - Copy datasource config from existing role + - Copy dashboards from `ansible/roles/grafana/files/dashboards/` + - Point to indri Prometheus/Loki (http://indri:9090, http://indri:3100) + +2. **Configure Tailscale LoadBalancer** + ```yaml + service: + type: LoadBalancer + loadBalancerClass: tailscale + ``` + +3. **Verify all dashboards work** + +4. **Update tailscale_serve** - remove grafana entry + +5. **Stop brew grafana**: `brew services stop grafana` + +### Verification +- https://grafana.tail8d86e.ts.net loads +- All dashboards functional + +--- + +## Phase 3: PostgreSQL Migration + +**Goal**: Migrate miniflux database to CloudNativePG + +### Steps + +1. 
**Create databases and users in k8s PostgreSQL** + - miniflux database/user + - borgmatic read-only user + +2. **Export from brew PostgreSQL** + ```bash + pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql + ``` + +3. **Expose k8s PostgreSQL via Tailscale** + - Service with `loadBalancerClass: tailscale` + - Tag: `svc:pg-k8s` + +4. **Import data** + ```bash + psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql + ``` + +5. **Update borgmatic config** + - Change hostname to k8s PostgreSQL + +6. **Verify data integrity** + +### Rollback +Keep brew PostgreSQL running until Phase 4 verified + +--- + +## Phase 4: Miniflux Migration + +**Goal**: Migrate Miniflux to k8s + +### Steps + +1. **Deploy Miniflux** + ```yaml + image: ghcr.io/miniflux/miniflux:latest + env: + DATABASE_URL: from secret + RUN_MIGRATIONS: "1" + ``` + +2. **Configure Tailscale LoadBalancer** - tag: `svc:feed` + +3. **Update Alloy log collection** - add k8s namespace + +4. **Verify**: login, feeds refresh, API works + +5. **Stop brew miniflux**: `brew services stop miniflux` + +--- + +## Phase 5: devpi Migration + +**Goal**: Migrate devpi to k8s + +### Steps + +1. **Build devpi container** + - Dockerfile with devpi-server + devpi-web + - Push to local Zot registry + +2. **Deploy as StatefulSet** + - PVC for data (50Gi) + - Migrate existing data (excluding PyPI cache) + +3. **Configure Tailscale LoadBalancer** - tag: `svc:pypi` + +4. **Update pip.conf on gilbert** + +5. **Stop mcquack devpi** + +--- + +## Phase 6: Kiwix Migration + +**Goal**: Migrate kiwix-serve to k8s + +### Steps + +1. **Create NFS/hostPath PV for ZIM files** + - Point to transmission download directory + - ReadOnlyMany access + +2. **Deploy Kiwix** + ```yaml + image: ghcr.io/kiwix/kiwix-serve:3.8.1 + args: ["/data/*.zim"] + ``` + +3. **Configure Tailscale LoadBalancer** - tag: `svc:kiwix` + +4. 
**Stop mcquack kiwix-serve** + +--- + +## Phase 7: Forgejo Migration (Highest Risk) + +**Goal**: Migrate Forgejo to k8s + +### Pre-Migration Checklist +- [ ] Full borgmatic backup verified +- [ ] Manual backup of `/opt/homebrew/var/forgejo` +- [ ] Document SSH keys and webhooks + +### Steps + +1. **Deploy Forgejo via Helm** + ```bash + helm install forgejo forgejo/forgejo \ + --namespace forgejo --create-namespace + ``` + +2. **Migrate data** + - Stop brew forgejo + - Copy data to PVC + - Start k8s forgejo + +3. **Configure Tailscale services** + - HTTPS 443 via LoadBalancer + - SSH port 22 (TCP proxy) + +4. **Verify all repositories accessible** + +### Rollback +Restore brew forgejo and tailscale serve config + +--- + +## Phase 8: CI/CD (Woodpecker) + +**Goal**: Deploy Woodpecker CI integrated with Forgejo + +### Steps + +1. **Create Forgejo OAuth application** + - Callback: https://ci.tail8d86e.ts.net/authorize + - Store in 1Password + +2. **Deploy Woodpecker Server + Agent** + +3. **Configure Tailscale LoadBalancer** - tag: `svc:ci` + +4. **Test pipeline** - create `.woodpecker.yaml` in test repo + +--- + +## Phase 9: Cleanup + +**Goal**: Remove deprecated services, harden system + +### Steps + +1. **Stop/remove unused brew services** + - postgresql@18, grafana, miniflux, forgejo + +2. **Update ansible playbook** + - Remove migrated service roles + - Add k8s deployment references + +3. **Configure Velero backups** (optional) + - Install with MinIO on sifaka + - Schedule daily cluster backups + +4. 
**Update zk documentation** + - New architecture + - Runbooks + - DR procedures + +--- + +## Critical Files + +| File | Purpose | +|------|---------| +| `ansible/playbooks/indri.yml` | Main playbook - add k8s roles, remove migrated services | +| `ansible/roles/tailscale_serve/defaults/main.yml` | Transition services to Tailscale operator | +| `pulumi/policy.hujson` | Add tags: k8s, registry, ci | +| `ansible/roles/borgmatic/defaults/main.yml` | Update PostgreSQL endpoint | +| `mise-tasks/indri-services-check` | Add k8s health checks | + +## New Directory Structure + +``` +ansible/ + k8s/ + operators/ + tailscale-operator.yaml + cloudnative-pg.yaml + databases/ + blumeops-pg.yaml + apps/ + grafana/ + miniflux/ + forgejo/ + devpi/ + kiwix/ + woodpecker/ + roles/ + zot/ # NEW + podman/ # NEW + minikube/ # NEW +``` + +## Risk Mitigation + +- **Circular dependency prevention**: Zot registry runs outside k8s +- **Observability**: Prometheus/Loki stay on indri +- **Data loss prevention**: borgmatic + manual backups before each phase +- **Recovery**: Can manually push images, restore from backups + +## Container Images (All ARM64) + +| Service | Image | +|---------|-------| +| Miniflux | `ghcr.io/miniflux/miniflux:latest` | +| Forgejo | `codeberg.org/forgejo/forgejo:10` | +| Grafana | `grafana/grafana:latest` | +| Kiwix | `ghcr.io/kiwix/kiwix-serve:3.8.1` | +| Woodpecker | `woodpeckerci/woodpecker-server` | +| Zot | `ghcr.io/project-zot/zot-linux-arm64` | -- 2.50.1 (Apple Git-155) From ace4822305085bff98c07d6cc0f4da09c2ad0920 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 13:27:04 -0800 Subject: [PATCH 02/21] Expand Phase 0 with detailed implementation steps - Add 16 numbered steps with specific files, code, and testing commands - Add Tailscale service creation order warning (must create in admin console BEFORE running tailscale serve) - Add comprehensive verification checklist and rollback procedures - Document indri-services-check updates for zot and 
minikube - Include zk documentation templates Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 743 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 675 insertions(+), 68 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 140fb60..90e7ad9 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -66,92 +66,699 @@ This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics **Goal**: Container registry + minikube cluster without disrupting existing services -### Steps +### Important: Tailscale Service Creation Order -1. **Install Podman on indri** - ```bash - # Add to Brewfile - brew "podman" - ``` - - Create ansible role `podman` for machine setup +> **WARNING**: You MUST create services in the Tailscale admin console BEFORE running `tailscale serve` commands via ansible. If you run `tailscale serve --service svc:foo` before the service exists in the admin console, the local config will be in a bad state. +> +> To fix a misconfigured service: +> ```bash +> tailscale serve --service svc:foo reset +> ``` +> Then create the service in admin console and try again. -2. **Install and configure Zot registry** - - Create ansible role `zot` - - Deploy as mcquack LaunchAgent (like devpi pattern) - - Bind to `localhost:5000` - - Configure pull-through for Docker Hub + GHCR - - Add Tailscale serve: `svc:registry` +--- -3. **Install minikube** - ```bash - # Add to Brewfile - brew "minikube" +### Step 0.1: Update Brewfile and Install Dependencies - # Start with podman driver - minikube start --driver=podman --container-runtime=containerd \ - --cpus=4 --memory=8192 --disk-size=100g - ``` - - Create ansible role `minikube` for initial setup +**Files to modify:** +- `Brewfile` -4. 
**Update Pulumi ACLs** - - Add `tag:registry` for registry service - - Add `tag:k8s` for cluster services +**Changes:** +```ruby +# Add to Brewfile +brew "podman" +brew "minikube" +brew "zot" # Check if available, otherwise install binary manually +``` -5. **Configure kubeconfig on gilbert** - - Add minikube context to `~/.kube/config` - - Keep work EKS config separate (already isolated) - - K9s will auto-discover contexts - -6. **Observability for new services** (follow existing patterns) - - **Zot Registry:** - - Create zk card `~/code/personal/zk/zot.md` (like devpi.md, forgejo.md) - - Add log collection to Alloy config (stdout/stderr from LaunchAgent) - - Create `zot_metrics` role with periodic script writing to textfile collector - - Create Grafana dashboard: cache hit rates, storage usage, pull/push counts - - **Minikube:** - - Create zk card `~/code/personal/zk/minikube.md` - - Metrics via kube-state-metrics (deployed in cluster) - - Node metrics already collected by Alloy - - Create Grafana dashboard: cluster health, resource usage - - **Note:** Backups not needed for these services: - - Zot cache is re-fetchable from upstream registries - - Minikube state is recreatable from ansible/k8s manifests - -### New Files -- `ansible/roles/zot/` - Registry role -- `ansible/roles/zot_metrics/` - Metrics collection -- `ansible/roles/podman/` - Podman setup -- `ansible/roles/minikube/` - Cluster setup -- `~/code/personal/zk/zot.md` - Registry management log -- `~/code/personal/zk/minikube.md` - Cluster management log - -### Verification +**Testing:** ```bash -# Registry working -curl http://localhost:5000/v2/_catalog +# On gilbert +brew bundle --file=Brewfile -# Minikube running -minikube status -kubectl get nodes +# On indri (via ansible or manual) +ssh indri 'brew install podman minikube' -# Metrics flowing +# Verify installations +ssh indri 'podman --version' +ssh indri 'minikube version' +``` + +--- + +### Step 0.2: Update Pulumi ACLs (BEFORE Tailscale serve) + 
+**Files to modify:** +- `pulumi/policy.hujson` + +**Changes:** +Add new tags and ACL rules: +```hujson +// In tagOwners section +"tag:registry": ["autogroup:admin"], +"tag:k8s": ["autogroup:admin"], + +// In acls section - add registry access +{ + "action": "accept", + "src": ["autogroup:member"], + "dst": ["tag:registry:443"], +}, +``` + +**Testing:** +```bash +mise run tailnet-preview # Review changes +mise run tailnet-up # Apply changes +``` + +--- + +### Step 0.3: Create Tailscale Services in Admin Console (MANUAL) + +> **CRITICAL**: Do this BEFORE running any ansible that calls `tailscale serve` + +1. Go to https://login.tailscale.com/admin/services +2. Create service `registry` with: + - Port: 443 (HTTPS) + - Host: indri +3. Apply tag `tag:registry` to indri if not already tagged + +**Verification:** +```bash +# Service should appear (even if not yet serving) +tailscale status | grep registry +``` + +--- + +### Step 0.4: Create Zot Registry Ansible Role + +**New files:** +``` +ansible/roles/zot/ +├── defaults/main.yml +├── tasks/main.yml +├── templates/ +│ ├── config.json.j2 +│ └── zot.plist.j2 +└── handlers/main.yml +``` + +**Key configuration (defaults/main.yml):** +```yaml +zot_version: "2.1.0" +zot_data_dir: "/Users/erichblume/zot" +zot_config_dir: "/Users/erichblume/.config/zot" +zot_port: 5000 +zot_log_dir: "/Users/erichblume/Library/Logs" + +# Pull-through cache configuration +zot_registries: + - name: docker.io + url: https://registry-1.docker.io + - name: ghcr.io + url: https://ghcr.io + - name: quay.io + url: https://quay.io +``` + +**LaunchAgent template (zot.plist.j2):** +```xml +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> +<plist version="1.0"> +<dict> + <key>Label</key> + <string>mcquack.eblume.zot</string> + <key>ProgramArguments</key> + <array> + + <string>/opt/homebrew/bin/zot</string> + <string>serve</string> + <string>{{ zot_config_dir }}/config.json</string> + </array> + <key>RunAtLoad</key> + <true/> + <key>KeepAlive</key> + <true/> + <key>StandardOutPath</key> + <string>{{ zot_log_dir }}/mcquack.zot.out.log</string> + <key>StandardErrorPath</key> + <string>{{ zot_log_dir }}/mcquack.zot.err.log</string> +</dict> +</plist> +``` + +**Testing (after deploying role):** +```bash +# Check LaunchAgent is running +ssh indri 
'launchctl list | grep zot' + +# Check zot is responding +ssh indri 'curl -s http://localhost:5000/v2/_catalog' +# Expected: {"repositories":[]} + +# Check logs for errors +ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log' +``` + +--- + +### Step 0.5: Add Zot to Tailscale Serve + +**Files to modify:** +- `ansible/roles/tailscale_serve/defaults/main.yml` + +**Changes:** +```yaml +# Add to tailscale_serve_services list +- name: svc:registry + https: + port: 443 + upstream: http://localhost:5000 +``` + +**Testing:** +```bash +# Deploy tailscale serve config +mise run provision-indri -- --tags tailscale-serve + +# Verify from gilbert (not indri - hairpinning doesn't work) +curl -s https://registry.tail8d86e.ts.net/v2/_catalog +# Expected: {"repositories":[]} +``` + +--- + +### Step 0.6: Create Zot Metrics Role + +**New files:** +``` +ansible/roles/zot_metrics/ +├── defaults/main.yml +├── tasks/main.yml +├── templates/ +│ ├── zot-metrics.sh.j2 +│ └── zot-metrics.plist.j2 +└── handlers/main.yml +``` + +**Metrics script pattern (zot-metrics.sh.j2):** +```bash +#!/bin/bash +# Collect Zot registry metrics for Prometheus textfile collector +set -euo pipefail + +METRICS_FILE="/opt/homebrew/var/node_exporter/textfile/zot.prom" +TEMP_FILE="${METRICS_FILE}.tmp" + +# Check if zot is up +if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then + echo "zot_up 1" > "$TEMP_FILE" +else + echo "zot_up 0" > "$TEMP_FILE" + mv "$TEMP_FILE" "$METRICS_FILE" + exit 0 +fi + +# Get metrics from zot's metrics endpoint (if enabled) +# Add storage metrics, cache hits, etc. +# ... 
+ +mv "$TEMP_FILE" "$METRICS_FILE" +``` + +**Testing:** +```bash +# Deploy metrics role +mise run provision-indri -- --tags zot_metrics + +# Check metrics file exists and is updated ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/zot.prom' +# Expected: zot_up 1 -# Logs in Loki -# Query: {service="zot"} +# Verify metrics appear in Prometheus (after a scrape cycle) +curl -s "http://indri:9090/api/v1/query?query=zot_up" | jq '.data.result[0].value[1]' +# Expected: "1" ``` -### Rollback +--- + +### Step 0.7: Add Zot Log Collection to Alloy + +**Files to modify:** +- `ansible/roles/alloy/templates/config.alloy.j2` + +**Changes:** +Add to the mcquack services log collection section: +```alloy +// Zot registry logs +local.file_match "zot_logs" { + path_targets = [ + {__path__ = "/Users/erichblume/Library/Logs/mcquack.zot.out.log", service = "zot", stream = "stdout"}, + {__path__ = "/Users/erichblume/Library/Logs/mcquack.zot.err.log", service = "zot", stream = "stderr"}, + ] +} + +loki.source.file "zot_logs" { + targets = local.file_match.zot_logs.targets + forward_to = [loki.write.local.receiver] +} +``` + +**Testing:** ```bash -minikube stop && minikube delete -launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist +# Deploy alloy config +mise run provision-indri -- --tags alloy + +# Restart alloy to pick up changes +ssh indri 'brew services restart grafana-alloy' + +# Wait a minute, then check Loki for zot logs +# In Grafana Explore, query: {service="zot"} ``` --- +### Step 0.8: Update indri-services-check Script + +**Files to modify:** +- `mise-tasks/indri-services-check` + +**Changes to add:** +```bash +# Add after existing service checks (around line 55) +check_service "zot" "ssh indri 'launchctl list | grep zot | grep -v \"^-\"'" +check_service "zot-metrics" "ssh indri 'launchctl list | grep zot-metrics | grep -v \"^-\"'" + +# Add to HTTP endpoints section (around line 65) +check_http "Zot Registry" "http://indri:5000/v2/_catalog" + +# Add metrics 
file check +check_service "Zot metrics" "ssh indri 'test -f /opt/homebrew/var/node_exporter/textfile/zot.prom'" +``` + +**Testing:** +```bash +# Run the health check +mise run indri-services-check + +# Expected output includes: +# zot... OK +# zot-metrics... OK +# Zot Registry... OK +# Zot metrics... OK +``` + +--- + +### Step 0.9: Install and Configure Podman on Indri + +**New files:** +``` +ansible/roles/podman/ +├── tasks/main.yml +└── handlers/main.yml +``` + +**Tasks (tasks/main.yml):** +```yaml +- name: Install podman via homebrew + community.general.homebrew: + name: podman + state: present + +- name: Initialize podman machine (if not exists) + ansible.builtin.command: + cmd: podman machine init --cpus 4 --memory 8192 --disk-size 100 + register: podman_init + changed_when: podman_init.rc == 0 + failed_when: false # May already exist + +- name: Start podman machine + ansible.builtin.command: + cmd: podman machine start + register: podman_start + changed_when: "'started' in podman_start.stdout" + failed_when: false # May already be running +``` + +**Testing:** +```bash +# Deploy podman role +mise run provision-indri -- --tags podman + +# Verify podman is working +ssh indri 'podman info' +ssh indri 'podman run --rm hello-world' +``` + +--- + +### Step 0.10: Install and Configure Minikube + +**New files:** +``` +ansible/roles/minikube/ +├── defaults/main.yml +├── tasks/main.yml +└── handlers/main.yml +``` + +**Defaults:** +```yaml +minikube_cpus: 4 +minikube_memory: 8192 +minikube_disk_size: "100g" +minikube_driver: podman +minikube_container_runtime: containerd +``` + +**Tasks:** +```yaml +- name: Install minikube via homebrew + community.general.homebrew: + name: minikube + state: present + +- name: Check if minikube cluster exists + ansible.builtin.command: + cmd: minikube status --format='{% raw %}{{.Host}}{% endraw %}' + register: minikube_status + changed_when: false + failed_when: false + +- name: Start minikube cluster + ansible.builtin.command: + cmd: > + minikube start + 
--driver={{ minikube_driver }} + --container-runtime={{ minikube_container_runtime }} + --cpus={{ minikube_cpus }} + --memory={{ minikube_memory }} + --disk-size={{ minikube_disk_size }} + when: minikube_status.rc != 0 or 'Running' not in minikube_status.stdout +``` + +**Testing:** +```bash +# Deploy minikube role +mise run provision-indri -- --tags minikube + +# Verify cluster is running +ssh indri 'minikube status' +# Expected: host: Running, kubelet: Running, apiserver: Running + +# Test kubectl access from indri +ssh indri 'kubectl get nodes' +# Expected: minikube Ready control-plane ... +``` + +--- + +### Step 0.11: Configure Kubeconfig on Gilbert + +**Manual steps** (kubeconfig management is complex with work configs): + +```bash +# Copy minikube kubeconfig from indri +ssh indri 'cat ~/.kube/config' > /tmp/minikube-config.yaml + +# Merge into local kubeconfig (careful not to overwrite work configs!) +# Option A: Use KUBECONFIG env var to include multiple files +export KUBECONFIG=~/.kube/config:~/.kube/minikube.yaml + +# Option B: Manually merge contexts +kubectl config --kubeconfig=/tmp/minikube-config.yaml view --flatten > ~/.kube/minikube.yaml + +# Set minikube context +kubectl config use-context minikube + +# Verify connection from gilbert +kubectl get nodes +``` + +**Testing:** +```bash +# From gilbert, verify k8s access +kubectl cluster-info +kubectl get namespaces + +# Verify k9s can connect +k9s +# Should show the minikube cluster +``` + +--- + +### Step 0.12: Add Minikube to indri-services-check + +**Files to modify:** +- `mise-tasks/indri-services-check` + +**Changes:** +```bash +# Add new section for Kubernetes +echo "" +echo "Kubernetes cluster:" +check_service "minikube" "ssh indri 'minikube status --format={{.Host}} | grep -q Running'" +check_service "k8s-apiserver" "ssh indri 'kubectl get --raw /healthz'" +``` + +**Testing:** +```bash +mise run indri-services-check + +# Expected output includes: +# Kubernetes cluster: +# minikube... 
OK +# k8s-apiserver... OK +``` + +--- + +### Step 0.13: Create Zot Grafana Dashboard + +**New files:** +- `ansible/roles/grafana/files/dashboards/zot.json` + +**Dashboard panels:** +- `zot_up` - Service availability +- Storage usage (if zot exposes this metric) +- Cache hit/miss rates +- Pull/push request counts + +**Testing:** +```bash +# Deploy dashboard +mise run provision-indri -- --tags grafana + +# Verify in Grafana UI +# Navigate to Dashboards > Zot Registry +``` + +--- + +### Step 0.14: Create Minikube Grafana Dashboard + +**New files:** +- `ansible/roles/grafana/files/dashboards/minikube.json` + +**Dashboard panels:** +- Node CPU/Memory usage +- Pod count by namespace +- Container restart counts +- API server request latency + +**Note:** This may require deploying kube-state-metrics in the cluster first: +```bash +ssh indri 'kubectl apply -f https://raw.githubusercontent.com/kubernetes/kube-state-metrics/main/examples/standard/cluster-role.yaml' +# ... additional kube-state-metrics manifests +``` + +--- + +### Step 0.15: Create Zettelkasten Documentation + +**New files:** +- `~/code/personal/zk/zot.md` +- `~/code/personal/zk/minikube.md` + +**Template for zot.md:** +```markdown +--- +id: zot +aliases: + - zot + - container-registry +tags: + - blumeops +--- + +# Zot Registry Management Log + +Zot is an OCI-native container registry running on Indri, providing local caching and pull-through proxy for Docker Hub, GHCR, and Quay. 
+ +## Service Details + +- URL: https://registry.tail8d86e.ts.net +- Local port: 5000 +- Data directory: ~/zot +- Config: ~/.config/zot/config.json +- Managed via: mcquack LaunchAgent + +## Pull-Through Cache + +Configured to proxy: +- docker.io (Docker Hub) +- ghcr.io (GitHub Container Registry) +- quay.io (Red Hat Quay) + +## Useful Commands + +\`\`\`bash +# List cached images +curl -s http://localhost:5000/v2/_catalog | jq + +# Check service status +launchctl list | grep zot + +# View logs +tail -f ~/Library/Logs/mcquack.zot.err.log +\`\`\` + +## Log + +### [DATE] +- Initial setup for k8s migration Phase 0 +``` + +--- + +### Step 0.16: Update Main Playbook + +**Files to modify:** +- `ansible/playbooks/indri.yml` + +**Changes:** +```yaml +# Add new roles to the roles list +- role: podman + tags: podman +- role: zot + tags: zot +- role: zot_metrics + tags: zot_metrics +- role: minikube + tags: minikube +``` + +--- + +### Phase 0 Verification Checklist + +Run after completing all steps: + +```bash +# 1. Full service health check +mise run indri-services-check +# All services should show OK, including new ones + +# 2. Registry functionality +curl -s https://registry.tail8d86e.ts.net/v2/_catalog +# Expected: {"repositories":[]} + +# 3. Pull through registry (test caching) +ssh indri 'podman pull localhost:5000/library/alpine:latest' +curl -s https://registry.tail8d86e.ts.net/v2/_catalog +# Expected: {"repositories":["library/alpine"]} + +# 4. Kubernetes cluster +ssh indri 'minikube status' +ssh indri 'kubectl get nodes' +kubectl get nodes # from gilbert + +# 5. Metrics in Prometheus +curl -s "http://indri:9090/api/v1/query?query=zot_up" +# Expected: value = 1 + +# 6. Logs in Loki +# In Grafana Explore: {service="zot"} +# Should see zot log entries + +# 7. Dashboards in Grafana +# Navigate to Zot Registry dashboard - panels should have data +# Navigate to Minikube dashboard - panels should have data + +# 8. 
k9s from gilbert +k9s +# Should connect and show minikube cluster +``` + +--- + +### Phase 0 Rollback + +If something goes wrong: + +```bash +# Stop and remove minikube +ssh indri 'minikube stop && minikube delete' + +# Stop and remove zot +ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist' +ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.zot.plist' + +# Remove podman machine +ssh indri 'podman machine stop && podman machine rm' + +# Remove from tailscale serve +ssh indri 'tailscale serve --service svc:registry reset' + +# Remove tags from Pulumi (revert policy.hujson changes) +mise run tailnet-up + +# Revert ansible playbook changes +git checkout ansible/playbooks/indri.yml +git checkout ansible/roles/tailscale_serve/defaults/main.yml +git checkout ansible/roles/alloy/templates/config.alloy.j2 + +# Remove new roles +rm -rf ansible/roles/{zot,zot_metrics,podman,minikube} + +# Remove zk cards +rm ~/code/personal/zk/{zot,minikube}.md +``` + +--- + +### New Files Summary + +| File | Purpose | +|------|---------| +| `ansible/roles/zot/` | Zot registry deployment | +| `ansible/roles/zot_metrics/` | Metrics collection for Zot | +| `ansible/roles/podman/` | Podman installation and setup | +| `ansible/roles/minikube/` | Minikube cluster setup | +| `ansible/roles/grafana/files/dashboards/zot.json` | Zot monitoring dashboard | +| `ansible/roles/grafana/files/dashboards/minikube.json` | K8s monitoring dashboard | +| `~/code/personal/zk/zot.md` | Zot management documentation | +| `~/code/personal/zk/minikube.md` | Minikube management documentation | + +### Modified Files Summary + +| File | Changes | +|------|---------| +| `Brewfile` | Add podman, minikube | +| `pulumi/policy.hujson` | Add tag:registry, tag:k8s, ACL rules | +| `ansible/playbooks/indri.yml` | Add new roles | +| `ansible/roles/tailscale_serve/defaults/main.yml` | Add svc:registry | +| `ansible/roles/alloy/templates/config.alloy.j2` | Add zot log collection | +| 
`mise-tasks/indri-services-check` | Add zot and k8s checks | + +--- + ## Phase 1: Kubernetes Infrastructure **Goal**: Tailscale operator + CloudNativePG operator -- 2.50.1 (Apple Git-155) From 26113aee420cbc5e593193e57bcc95fa9ce74b4f Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 13:33:09 -0800 Subject: [PATCH 03/21] Remove Brewfile from Phase 0 (it's for gilbert tooling only) Brewfile is for development tooling on gilbert, not for indri services. Ansible roles handle homebrew installations on indri directly. Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 59 +++++++++++-------------------------------- 1 file changed, 15 insertions(+), 44 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 90e7ad9..bf0bdb8 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -78,35 +78,7 @@ This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics --- -### Step 0.1: Update Brewfile and Install Dependencies - -**Files to modify:** -- `Brewfile` - -**Changes:** -```ruby -# Add to Brewfile -brew "podman" -brew "minikube" -brew "zot" # Check if available, otherwise install binary manually -``` - -**Testing:** -```bash -# On gilbert -brew bundle --file=Brewfile - -# On indri (via ansible or manual) -ssh indri 'brew install podman minikube' - -# Verify installations -ssh indri 'podman --version' -ssh indri 'minikube version' -``` - ---- - -### Step 0.2: Update Pulumi ACLs (BEFORE Tailscale serve) +### Step 0.1: Update Pulumi ACLs (BEFORE Tailscale serve) **Files to modify:** - `pulumi/policy.hujson` @@ -134,7 +106,7 @@ mise run tailnet-up # Apply changes --- -### Step 0.3: Create Tailscale Services in Admin Console (MANUAL) +### Step 0.2: Create Tailscale Services in Admin Console (MANUAL) > **CRITICAL**: Do this BEFORE running any ansible that calls `tailscale serve` @@ -152,7 +124,7 @@ tailscale status | grep registry --- -### Step 0.4: Create Zot Registry Ansible Role +### Step 0.3: Create Zot 
Registry Ansible Role **New files:** ``` @@ -225,7 +197,7 @@ ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log' --- -### Step 0.5: Add Zot to Tailscale Serve +### Step 0.4: Add Zot to Tailscale Serve **Files to modify:** - `ansible/roles/tailscale_serve/defaults/main.yml` @@ -251,7 +223,7 @@ curl -s https://registry.tail8d86e.ts.net/v2/_catalog --- -### Step 0.6: Create Zot Metrics Role +### Step 0.5: Create Zot Metrics Role **New files:** ``` @@ -305,7 +277,7 @@ curl -s "http://indri:9090/api/v1/query?query=zot_up" | jq '.data.result[0].valu --- -### Step 0.7: Add Zot Log Collection to Alloy +### Step 0.6: Add Zot Log Collection to Alloy **Files to modify:** - `ansible/roles/alloy/templates/config.alloy.j2` @@ -341,7 +313,7 @@ ssh indri 'brew services restart grafana-alloy' --- -### Step 0.8: Update indri-services-check Script +### Step 0.7: Update indri-services-check Script **Files to modify:** - `mise-tasks/indri-services-check` @@ -373,7 +345,7 @@ mise run indri-services-check --- -### Step 0.9: Install and Configure Podman on Indri +### Step 0.8: Install and Configure Podman on Indri **New files:** ``` @@ -416,7 +388,7 @@ ssh indri 'podman run --rm hello-world' --- -### Step 0.10: Install and Configure Minikube +### Step 0.9: Install and Configure Minikube **New files:** ``` @@ -477,7 +449,7 @@ ssh indri 'kubectl get nodes' --- -### Step 0.11: Configure Kubeconfig on Gilbert +### Step 0.10: Configure Kubeconfig on Gilbert **Manual steps** (kubeconfig management is complex with work configs): @@ -512,7 +484,7 @@ k9s --- -### Step 0.12: Add Minikube to indri-services-check +### Step 0.11: Add Minikube to indri-services-check **Files to modify:** - `mise-tasks/indri-services-check` @@ -538,7 +510,7 @@ mise run indri-services-check --- -### Step 0.13: Create Zot Grafana Dashboard +### Step 0.12: Create Zot Grafana Dashboard **New files:** - `ansible/roles/grafana/files/dashboards/zot.json` @@ -560,7 +532,7 @@ mise run provision-indri -- --tags grafana --- 
-### Step 0.14: Create Minikube Grafana Dashboard +### Step 0.13: Create Minikube Grafana Dashboard **New files:** - `ansible/roles/grafana/files/dashboards/minikube.json` @@ -579,7 +551,7 @@ ssh indri 'kubectl apply -f https://raw.githubusercontent.com/kubernetes/kube-st --- -### Step 0.15: Create Zettelkasten Documentation +### Step 0.14: Create Zettelkasten Documentation **New files:** - `~/code/personal/zk/zot.md` @@ -636,7 +608,7 @@ tail -f ~/Library/Logs/mcquack.zot.err.log --- -### Step 0.16: Update Main Playbook +### Step 0.15: Update Main Playbook **Files to modify:** - `ansible/playbooks/indri.yml` @@ -750,7 +722,6 @@ rm ~/code/personal/zk/{zot,minikube}.md | File | Changes | |------|---------| -| `Brewfile` | Add podman, minikube | | `pulumi/policy.hujson` | Add tag:registry, tag:k8s, ACL rules | | `ansible/playbooks/indri.yml` | Add new roles | | `ansible/roles/tailscale_serve/defaults/main.yml` | Add svc:registry | -- 2.50.1 (Apple Git-155) From ee42f0f1a2e217d715e4cbc237d1f559a472f84d Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 13:42:19 -0800 Subject: [PATCH 04/21] Fix Step 0.1: Use correct policy.hujson structure - Use 'grants' not 'acls' (that's the newer format) - Show exact line numbers and locations for each change - Include tagOwners, grants, and tests sections - Follow existing pattern with tag:blumeops in tagOwners Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index bf0bdb8..c448a3d 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -84,23 +84,30 @@ This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics - `pulumi/policy.hujson` **Changes:** -Add new tags and ACL rules: -```hujson -// In tagOwners section -"tag:registry": ["autogroup:admin"], -"tag:k8s": ["autogroup:admin"], -// In acls section - add registry access +1. 
Add new tags to `tagOwners` section (around line 104, after `"tag:feed"`): +```hujson +"tag:registry": ["autogroup:admin", "tag:blumeops"], +"tag:k8s": ["autogroup:admin", "tag:blumeops"], +``` + +2. Add registry grant to `grants` section (around line 48, after the `tag:pg` grant): +```hujson { - "action": "accept", - "src": ["autogroup:member"], - "dst": ["tag:registry:443"], + "src": ["autogroup:member"], + "dst": ["tag:registry"], + "ip": ["tcp:443"], }, ``` +3. Add test case to `tests` section (update Erich's accept list around line 111): +```hujson +"accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443"], +``` + **Testing:** ```bash -mise run tailnet-preview # Review changes +mise run tailnet-preview # Review changes - should show new tags and grants mise run tailnet-up # Apply changes ``` -- 2.50.1 (Apple Git-155) From 97dce31171854b2ebaff518a33840c4d42f808a1 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 13:45:03 -0800 Subject: [PATCH 05/21] Remove member grant for registry - admins only Registry access restricted to admins (who already have full access). Members don't need to push/pull container images. K8s accesses registry locally on indri, not via Tailscale. Added note about Zot htpasswd auth for future reference. Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index c448a3d..4a84005 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -91,23 +91,23 @@ This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics "tag:k8s": ["autogroup:admin", "tag:blumeops"], ``` -2. Add registry grant to `grants` section (around line 48, after the `tag:pg` grant): -```hujson -{ - "src": ["autogroup:member"], - "dst": ["tag:registry"], - "ip": ["tcp:443"], -}, -``` +2. 
Add test cases to `tests` section: + - Update Erich's accept list (around line 111) to include registry: + ```hujson + "accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443"], + ``` + - Update Allison's deny list (around line 117) to deny registry: + ```hujson + "deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443"], + ``` -3. Add test case to `tests` section (update Erich's accept list around line 111): -```hujson -"accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443"], -``` +**Note:** No member grant needed for registry. Admins already have access via `"dst": ["*"]`. +K8s on indri accesses the registry locally (`localhost:5000`), not via Tailscale. +Zot supports htpasswd auth if we later need finer-grained control. **Testing:** ```bash -mise run tailnet-preview # Review changes - should show new tags and grants +mise run tailnet-preview # Review changes - should show new tags mise run tailnet-up # Apply changes ``` -- 2.50.1 (Apple Git-155) From 950604bf25e23c92d4e18a2db22d122f7b4f3444 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 13:47:24 -0800 Subject: [PATCH 06/21] Add tag:k8s grant for registry access (Woodpecker CI) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit K8s workloads (like Woodpecker CI) need to push/pull images from Zot. They'll get Tailscale identity via the operator (Phase 1) with tag:k8s. Added grant and test case for tag:k8s → tag:registry access. 
Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 4a84005..1d1dea5 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -91,7 +91,17 @@ This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics "tag:k8s": ["autogroup:admin", "tag:blumeops"], ``` -2. Add test cases to `tests` section: +2. Add k8s→registry grant to `grants` section (around line 62, in the Infrastructure section): +```hujson +// k8s workloads (e.g., Woodpecker CI) can push/pull from registry +{ + "src": ["tag:k8s"], + "dst": ["tag:registry"], + "ip": ["tcp:443"], +}, +``` + +3. Add test cases to `tests` section: - Update Erich's accept list (around line 111) to include registry: ```hujson "accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443"], @@ -100,14 +110,23 @@ This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics ```hujson "deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443"], ``` + - Add k8s test case: + ```hujson + { + "src": "tag:k8s", + "accept": ["tag:registry:443"], + }, + ``` -**Note:** No member grant needed for registry. Admins already have access via `"dst": ["*"]`. -K8s on indri accesses the registry locally (`localhost:5000`), not via Tailscale. -Zot supports htpasswd auth if we later need finer-grained control. 
+**Note:** +- No member grant needed - admins have full access, members don't need registry +- `tag:k8s` grant allows Woodpecker CI (and other k8s workloads) to push/pull images +- K8s pods get Tailscale identity via the Tailscale Kubernetes Operator (Phase 1) +- Zot supports htpasswd auth if we later need finer-grained control **Testing:** ```bash -mise run tailnet-preview # Review changes - should show new tags +mise run tailnet-preview # Review changes - should show new tags and k8s grant mise run tailnet-up # Apply changes ``` -- 2.50.1 (Apple Git-155) From adba123ad484c1bd7f28cfaedf5b7a316f195f89 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 13:51:46 -0800 Subject: [PATCH 07/21] Document both registry modes: pull-through cache + private images MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Added Zot config.json template showing sync extension for pull-through - Documented namespace convention: - registry.../docker.io/* → cached from Docker Hub - registry.../ghcr.io/* → cached from GHCR - registry.../blumeops/* → private images - Added testing steps for both pull-through and private push - Updated zk template with namespace table and build/push commands - Updated verification checklist Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 101 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 87 insertions(+), 14 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 1d1dea5..24527d5 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -171,8 +171,8 @@ zot_config_dir: "/Users/erichblume/.config/zot" zot_port: 5000 zot_log_dir: "/Users/erichblume/Library/Logs" -# Pull-through cache configuration -zot_registries: +# Pull-through cache registries (on-demand sync) +zot_sync_registries: - name: docker.io url: https://registry-1.docker.io - name: ghcr.io @@ -181,6 +181,53 @@ zot_registries: url: https://quay.io ``` +**Zot config.json template** (key 
sections): +```json +{ + "storage": { + "rootDirectory": "/Users/erichblume/zot" + }, + "http": { + "address": "0.0.0.0", + "port": "5000" + }, + "extensions": { + "sync": { + "enable": true, + "registries": [ + { + "urls": ["https://registry-1.docker.io"], + "content": [{"prefix": "**"}], + "onDemand": true, + "tlsVerify": true + }, + { + "urls": ["https://ghcr.io"], + "content": [{"prefix": "**"}], + "onDemand": true, + "tlsVerify": true + } + ] + } + } +} +``` + +**Two modes of operation:** + +1. **Pull-through cache** (automatic): When you pull `registry.tail8d86e.ts.net/docker.io/library/nginx:latest`, Zot fetches from Docker Hub and caches locally. Subsequent pulls are local. + +2. **Private images** (manual push): Push your own images to any path NOT matching a sync prefix: + ```bash + # From gilbert (after building) + podman push myapp:v1 registry.tail8d86e.ts.net/blumeops/myapp:v1 + ``` + +**Namespace convention:** +- `registry.tail8d86e.ts.net/docker.io/*` → cached from Docker Hub +- `registry.tail8d86e.ts.net/ghcr.io/*` → cached from GHCR +- `registry.tail8d86e.ts.net/blumeops/*` → private images (built by you/Woodpecker) + **LaunchAgent template (zot.plist.j2):** ```xml @@ -219,6 +266,18 @@ ssh indri 'curl -s http://localhost:5000/v2/_catalog' # Check logs for errors ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log' + +# Test pull-through cache (from indri) +ssh indri 'podman pull localhost:5000/docker.io/library/alpine:latest' +ssh indri 'curl -s http://localhost:5000/v2/_catalog' +# Expected: {"repositories":["docker.io/library/alpine"]} + +# Test private image push (from gilbert, after Step 0.4 tailscale serve) +podman pull alpine:latest +podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1 +podman push registry.tail8d86e.ts.net/blumeops/test:v1 +curl -s https://registry.tail8d86e.ts.net/v2/_catalog +# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]} ``` --- @@ -596,7 +655,9 @@ tags: # Zot Registry 
Management Log -Zot is an OCI-native container registry running on Indri, providing local caching and pull-through proxy for Docker Hub, GHCR, and Quay. +Zot is an OCI-native container registry running on Indri, providing: +1. Pull-through cache for Docker Hub, GHCR, Quay (avoids rate limits) +2. Private image storage for custom-built containers ## Service Details @@ -606,19 +667,28 @@ Zot is an OCI-native container registry running on Indri, providing local cachin - Config: ~/.config/zot/config.json - Managed via: mcquack LaunchAgent -## Pull-Through Cache +## Namespace Convention -Configured to proxy: -- docker.io (Docker Hub) -- ghcr.io (GitHub Container Registry) -- quay.io (Red Hat Quay) +| Path | Source | +|------|--------| +| `registry.../docker.io/*` | Cached from Docker Hub | +| `registry.../ghcr.io/*` | Cached from GHCR | +| `registry.../quay.io/*` | Cached from Quay | +| `registry.../blumeops/*` | Private images (yours) | ## Useful Commands \`\`\`bash -# List cached images +# List all images curl -s http://localhost:5000/v2/_catalog | jq +# Pull via cache (from indri or k8s) +podman pull localhost:5000/docker.io/library/nginx:latest + +# Build and push private image (from gilbert) +podman build -t registry.tail8d86e.ts.net/blumeops/myapp:v1 . +podman push registry.tail8d86e.ts.net/blumeops/myapp:v1 + # Check service status launchctl list | grep zot @@ -663,14 +733,17 @@ Run after completing all steps: mise run indri-services-check # All services should show OK, including new ones -# 2. Registry functionality +# 2. Registry functionality - pull-through cache +ssh indri 'podman pull localhost:5000/docker.io/library/alpine:latest' curl -s https://registry.tail8d86e.ts.net/v2/_catalog -# Expected: {"repositories":[]} +# Expected: {"repositories":["docker.io/library/alpine"]} -# 3. Pull through registry (test caching) -ssh indri 'podman pull localhost:5000/library/alpine:latest' +# 3. 
Registry functionality - private image push (from gilbert) +podman pull alpine:latest +podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1 +podman push registry.tail8d86e.ts.net/blumeops/test:v1 curl -s https://registry.tail8d86e.ts.net/v2/_catalog -# Expected: {"repositories":["library/alpine"]} +# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]} # 4. Kubernetes cluster ssh indri 'minikube status' -- 2.50.1 (Apple Git-155) From f064ba3afae4aec12802183ce27d004c38f247fe Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 13:56:17 -0800 Subject: [PATCH 08/21] Update Zot installation: clone to ~/code/3rd/ and build from source Zot isn't in homebrew. Following existing pattern (like kiwix-tools), clone to ~/code/3rd/zot on indri and build with 'make binary'. Updated defaults and LaunchAgent template to use built binary path. Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 24527d5..20ec9e0 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -152,6 +152,20 @@ tailscale status | grep registry ### Step 0.3: Create Zot Registry Ansible Role +**Note:** Zot is NOT in homebrew (no formula or tap). Clone to `~/code/3rd/` on indri and build from source (requires Go). 
+ +**Prerequisites on indri:** +```bash +# Ensure Go is installed (via mise or homebrew) +ssh indri 'go version' + +# Clone zot repository +ssh indri 'git clone https://github.com/project-zot/zot.git ~/code/3rd/zot' + +# Build (creates bin/zot-darwin-arm64) +ssh indri 'cd ~/code/3rd/zot && make binary' +``` + **New files:** ``` ansible/roles/zot/ @@ -165,7 +179,8 @@ ansible/roles/zot/ **Key configuration (defaults/main.yml):** ```yaml -zot_version: "2.1.0" +zot_repo_dir: "/Users/erichblume/code/3rd/zot" +zot_binary: "{{ zot_repo_dir }}/bin/zot-darwin-arm64" zot_data_dir: "/Users/erichblume/zot" zot_config_dir: "/Users/erichblume/.config/zot" zot_port: 5000 @@ -238,8 +253,8 @@ zot_sync_registries: mcquack.eblume.zot ProgramArguments - - /opt/homebrew/bin/zot + + {{ zot_binary }} serve {{ zot_config_dir }}/config.json -- 2.50.1 (Apple Git-155) From 6d84ff7bca9ede98c459ccab897fec028620417e Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 14:00:54 -0800 Subject: [PATCH 09/21] Use forge mirror for zot, add third-party project guidance - Updated Step 0.3 to clone zot from forge mirror instead of GitHub - Added "Third-Party Projects" section to CLAUDE.md explaining: - Ask user to mirror 3rd party repos to forge first - Clone from mirror to ~/code/3rd/ - Avoids external dependencies Co-Authored-By: Claude Opus 4.5 --- CLAUDE.md | 10 ++++++++++ docs/k8s-migration.md | 4 ++-- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 023c0a1..bb73139 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -53,6 +53,16 @@ Some important places you can look: ~/code/work # FORBIDDEN, never go here, avoid searching it ``` +## Third-Party Projects + +When a task requires cloning or using a third-party git repository (e.g., for building from source), **ask the user to mirror it on forge first**, then clone from the mirror: +- Mirror location: `https://forge.tail8d86e.ts.net/eblume/.git` +- Clone to: `~/code/3rd//` + +This avoids external 
dependencies and ensures the project is available even if the upstream is unreachable. Example mirrors: +- `https://forge.tail8d86e.ts.net/eblume/zot.git` (container registry) +- `https://forge.tail8d86e.ts.net/eblume/mcquack.git` (LaunchAgent helper) + ## Task Discovery To discover pending blumeops tasks, run: diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 20ec9e0..23990b3 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -159,8 +159,8 @@ tailscale status | grep registry # Ensure Go is installed (via mise or homebrew) ssh indri 'go version' -# Clone zot repository -ssh indri 'git clone https://github.com/project-zot/zot.git ~/code/3rd/zot' +# Clone zot from forge mirror (not GitHub - avoids external dependency) +ssh indri 'git clone https://forge.tail8d86e.ts.net/eblume/zot.git ~/code/3rd/zot' # Build (creates bin/zot-darwin-arm64) ssh indri 'cd ~/code/3rd/zot && make binary' -- 2.50.1 (Apple Git-155) From c9d7acfafef5c2d1beee78d34dec7816e7df9d66 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 14:01:36 -0800 Subject: [PATCH 10/21] Fix example: mcquack is not a mirror, use devpi instead mcquack is Erich's own project, not a third-party mirror. Co-Authored-By: Claude Opus 4.5 --- CLAUDE.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CLAUDE.md b/CLAUDE.md index bb73139..4ee5bb1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -61,7 +61,7 @@ When a task requires cloning or using a third-party git repository (e.g., for bu This avoids external dependencies and ensures the project is available even if the upstream is unreachable. 
Example mirrors: - `https://forge.tail8d86e.ts.net/eblume/zot.git` (container registry) -- `https://forge.tail8d86e.ts.net/eblume/mcquack.git` (LaunchAgent helper) +- `https://forge.tail8d86e.ts.net/eblume/devpi.git` (PyPI proxy) ## Task Discovery -- 2.50.1 (Apple Git-155) From 6df71537669a2fb1e89c13baca5cad43de640874 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 14:19:29 -0800 Subject: [PATCH 11/21] Update Step 0.3 with verified zot build process - Use localhost:3001 for forge clone (hairpinning limitation) - Document mise go@1.25 setup in repo directory - Correct build command: mise x -- make binary - Mark prerequisites as already completed with verification Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 23990b3..54dfec4 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -53,7 +53,7 @@ This plan details a phased migration of blumeops services from direct hosting on ### LaunchAgent Requirements (Critical) LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths**: -- `/opt/homebrew/bin/zot` not `zot` +- `/Users/erichblume/code/3rd/zot/bin/zot-darwin-arm64` for zot (built from source) - `/opt/homebrew/opt/mise/bin/mise x --` for mise-managed tools - `/opt/homebrew/opt/postgresql@18/bin/pg_dump` for postgres tools @@ -154,18 +154,23 @@ tailscale status | grep registry **Note:** Zot is NOT in homebrew (no formula or tap). Clone to `~/code/3rd/` on indri and build from source (requires Go). 
-**Prerequisites on indri:** +**Prerequisites on indri (ALREADY COMPLETED):** ```bash -# Ensure Go is installed (via mise or homebrew) -ssh indri 'go version' +# Clone zot from forge mirror (use localhost:3001 - hairpinning doesn't work on indri) +ssh indri 'git clone http://localhost:3001/eblume/zot.git ~/code/3rd/zot' -# Clone zot from forge mirror (not GitHub - avoids external dependency) -ssh indri 'git clone https://forge.tail8d86e.ts.net/eblume/zot.git ~/code/3rd/zot' +# Set up Go via mise (creates mise.toml in repo directory) +ssh indri 'cd ~/code/3rd/zot && mise use go@1.25' -# Build (creates bin/zot-darwin-arm64) -ssh indri 'cd ~/code/3rd/zot && make binary' +# Build (creates bin/zot-darwin-arm64, ~183MB) +ssh indri 'cd ~/code/3rd/zot && mise x -- make binary' + +# Verify binary exists +ssh indri 'ls -la ~/code/3rd/zot/bin/zot-darwin-arm64' ``` +**Build verified:** Binary at `~/code/3rd/zot/bin/zot-darwin-arm64` (183MB, ARM64 native). + **New files:** ``` ansible/roles/zot/ @@ -1158,4 +1163,5 @@ ansible/ | Grafana | `grafana/grafana:latest` | | Kiwix | `ghcr.io/kiwix/kiwix-serve:3.8.1` | | Woodpecker | `woodpeckerci/woodpecker-server` | -| Zot | `ghcr.io/project-zot/zot-linux-arm64` | + +Note: Zot runs as a native binary on indri (built from source at `~/code/3rd/zot`), not as a container. 
-- 2.50.1 (Apple Git-155) From 546fe08d9c68dd4473aa71d9b10217fbf3eaec01 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 14:23:22 -0800 Subject: [PATCH 12/21] Fix Zot paths in Technical Decisions section Update to match Phase 0 details: - Built from source, not homebrew - Config at ~/.config/zot/config.json - Data at ~/zot/ - Binary path documented Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 54dfec4..e691ce5 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -31,8 +31,10 @@ This plan details a phased migration of blumeops services from direct hosting on ### Container Registry: Zot - OCI-native, lightweight - Native support for proxying multiple registries (Docker Hub, GHCR, Quay) -- Single binary, ARM64 native -- Config at `/etc/zot/config.json` +- Built from source at `~/code/3rd/zot` (not in homebrew) +- Binary: `~/code/3rd/zot/bin/zot-darwin-arm64` +- Config: `~/.config/zot/config.json` +- Data: `~/zot/` ### Minikube Driver: Podman - Rootless containers for better security -- 2.50.1 (Apple Git-155) From 9edecf78ddc48958331613854514b6269e2779c5 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 14:29:09 -0800 Subject: [PATCH 13/21] Defer tag:k8s to Phase 1, clarify kubeconfig setup - Remove tag:k8s from Phase 0 Step 0.1 (not needed until Tailscale Kubernetes Operator is deployed) - Add tag:k8s ACL setup as new Step 1 in Phase 1 - Clarify Step 0.10: no special Tailscale service needed for K8s API (admin wildcard grant covers it) - Add sed commands to replace localhost with indri in kubeconfig Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 77 +++++++++++++++++++++++++++---------------- 1 file changed, 48 insertions(+), 29 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index e691ce5..5742bca 100644 --- a/docs/k8s-migration.md +++ 
b/docs/k8s-migration.md @@ -87,23 +87,12 @@ This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics **Changes:** -1. Add new tags to `tagOwners` section (around line 104, after `"tag:feed"`): +1. Add new tag to `tagOwners` section (around line 104, after `"tag:feed"`): ```hujson "tag:registry": ["autogroup:admin", "tag:blumeops"], -"tag:k8s": ["autogroup:admin", "tag:blumeops"], ``` -2. Add k8s→registry grant to `grants` section (around line 62, in the Infrastructure section): -```hujson -// k8s workloads (e.g., Woodpecker CI) can push/pull from registry -{ - "src": ["tag:k8s"], - "dst": ["tag:registry"], - "ip": ["tcp:443"], -}, -``` - -3. Add test cases to `tests` section: +2. Add test cases to `tests` section: - Update Erich's accept list (around line 111) to include registry: ```hujson "accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443"], @@ -112,23 +101,15 @@ This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics ```hujson "deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443"], ``` - - Add k8s test case: - ```hujson - { - "src": "tag:k8s", - "accept": ["tag:registry:443"], - }, - ``` **Note:** -- No member grant needed - admins have full access, members don't need registry -- `tag:k8s` grant allows Woodpecker CI (and other k8s workloads) to push/pull images -- K8s pods get Tailscale identity via the Tailscale Kubernetes Operator (Phase 1) +- No member grant needed - admins have full access via wildcard, members don't need registry +- `tag:k8s` is added later in Phase 1 when the Tailscale Kubernetes Operator is deployed - Zot supports htpasswd auth if we later need finer-grained control **Testing:** ```bash -mise run tailnet-preview # Review changes - should show new tags and k8s grant +mise run tailnet-preview # Review changes - should show new tag mise run tailnet-up # Apply changes ``` @@ -558,12 +539,19 @@ 
ssh indri 'kubectl get nodes' ### Step 0.10: Configure Kubeconfig on Gilbert +**No special Tailscale service needed** - admin users already have full access to indri via the `autogroup:admin → * → *` grant. Gilbert can reach the K8s API server on indri directly. + **Manual steps** (kubeconfig management is complex with work configs): ```bash # Copy minikube kubeconfig from indri ssh indri 'cat ~/.kube/config' > /tmp/minikube-config.yaml +# IMPORTANT: Replace localhost/127.0.0.1 with indri's hostname +# Minikube's kubeconfig points to localhost since it runs locally on indri +sed -i '' 's|https://127.0.0.1:|https://indri:|g' /tmp/minikube-config.yaml +sed -i '' 's|https://localhost:|https://indri:|g' /tmp/minikube-config.yaml + # Merge into local kubeconfig (careful not to overwrite work configs!) # Option A: Use KUBECONFIG env var to include multiple files export KUBECONFIG=~/.kube/config:~/.kube/minikube.yaml @@ -857,12 +845,43 @@ rm ~/code/personal/zk/{zot,minikube}.md ### Steps -1. **Create Tailscale OAuth client** +1. **Update Pulumi ACLs for k8s workloads** + + Add `tag:k8s` to `pulumi/policy.hujson` - this tag is for k8s workloads that need to access other services (e.g., Woodpecker CI pushing to registry). + + **Changes to tagOwners:** + ```hujson + "tag:k8s": ["autogroup:admin", "tag:blumeops"], + ``` + + **Add grant for k8s→registry access:** + ```hujson + // k8s workloads (e.g., Woodpecker CI) can push/pull from registry + { + "src": ["tag:k8s"], + "dst": ["tag:registry"], + "ip": ["tcp:443"], + }, + ``` + + **Add test case:** + ```hujson + { + "src": "tag:k8s", + "accept": ["tag:registry:443"], + }, + ``` + + ```bash + mise run tailnet-preview && mise run tailnet-up + ``` + +2. **Create Tailscale OAuth client** - Scopes: Devices Core, Auth Keys, Services write - Tag: `tag:k8s-operator` - Store in 1Password -2. **Deploy Tailscale Kubernetes Operator** +3. 
**Deploy Tailscale Kubernetes Operator** ```bash helm repo add tailscale https://pkgs.tailscale.com/helmcharts helm install tailscale-operator tailscale/tailscale-operator \ @@ -871,12 +890,12 @@ rm ~/code/personal/zk/{zot,minikube}.md --set oauth.clientSecret=$CLIENT_SECRET ``` -3. **Deploy CloudNativePG operator** +4. **Deploy CloudNativePG operator** ```bash kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml ``` -4. **Create PostgreSQL cluster** +5. **Create PostgreSQL cluster** ```yaml apiVersion: postgresql.cnpg.io/v1 kind: Cluster @@ -892,7 +911,7 @@ rm ~/code/personal/zk/{zot,minikube}.md enablePodMonitor: true ``` -5. **Update Alloy config** +6. **Update Alloy config** - Add kubernetes_sd_configs for k8s metrics - Scrape operator metrics -- 2.50.1 (Apple Git-155) From cff951a0f94af6e260db01e827e138a7761c62a4 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 16:11:43 -0800 Subject: [PATCH 14/21] Add quay.io to zot sync config and namespace convention Config template and namespace docs now match defaults/main.yml Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 5742bca..302af43 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -209,6 +209,12 @@ zot_sync_registries: "content": [{"prefix": "**"}], "onDemand": true, "tlsVerify": true + }, + { + "urls": ["https://quay.io"], + "content": [{"prefix": "**"}], + "onDemand": true, + "tlsVerify": true } ] } @@ -229,6 +235,7 @@ zot_sync_registries: **Namespace convention:** - `registry.tail8d86e.ts.net/docker.io/*` → cached from Docker Hub - `registry.tail8d86e.ts.net/ghcr.io/*` → cached from GHCR +- `registry.tail8d86e.ts.net/quay.io/*` → cached from Quay - `registry.tail8d86e.ts.net/blumeops/*` → private images (built by you/Woodpecker) **LaunchAgent template (zot.plist.j2):** -- 2.50.1 (Apple Git-155) 
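Note on the patch above: it hand-copies a third identical registry entry into the config.json template so that it matches `defaults/main.yml`. One way to keep the two from drifting is to render the `extensions.sync` section from the registry list. The following is a minimal sketch of that idea, not the role's actual template. `ZOT_SYNC_REGISTRIES`, `render_sync_extension`, and `image_source` are hypothetical names introduced here for illustration, and the ghcr.io URL is assumed from the config.json template because the defaults hunk does not show it.

```python
import json

# Hypothetical mirror of `zot_sync_registries` from defaults/main.yml.
# The ghcr.io URL is an assumption taken from the config.json template.
ZOT_SYNC_REGISTRIES = [
    {"name": "docker.io", "url": "https://registry-1.docker.io"},
    {"name": "ghcr.io", "url": "https://ghcr.io"},
    {"name": "quay.io", "url": "https://quay.io"},
]


def render_sync_extension(registries):
    """Build the `extensions.sync` section with one on-demand entry per registry."""
    return {
        "enable": True,
        "registries": [
            {
                "urls": [reg["url"]],
                "content": [{"prefix": "**"}],
                "onDemand": True,
                "tlsVerify": True,
            }
            for reg in registries
        ],
    }


def image_source(repository, registries):
    """Classify a repository path using the plan's namespace convention."""
    namespace = repository.split("/", 1)[0]
    cached = {reg["name"] for reg in registries}
    return f"cached from {namespace}" if namespace in cached else "private"


if __name__ == "__main__":
    sync = render_sync_extension(ZOT_SYNC_REGISTRIES)
    # Print the fragment that would sit under "extensions" in config.json.
    print(json.dumps({"extensions": {"sync": sync}}, indent=2))
    print(image_source("docker.io/library/alpine", ZOT_SYNC_REGISTRIES))
    print(image_source("blumeops/test", ZOT_SYNC_REGISTRIES))
```

In the Ansible role, the same effect would more likely come from a Jinja2 loop over `zot_sync_registries` in the config template. The Python form is used here only because it is easy to run and check in isolation.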
From bf1664d1172de265c916f32f1192aabf16eb55fc Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 16:31:35 -0800 Subject: [PATCH 15/21] Move tailscale URL tests from Step 0.3 to Step 0.4 registry.tail8d86e.ts.net isn't available until tailscale serve is configured in Step 0.4. Keep localhost tests in Step 0.3. Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 302af43..3941074 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -277,17 +277,10 @@ ssh indri 'curl -s http://localhost:5000/v2/_catalog' # Check logs for errors ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log' -# Test pull-through cache (from indri) +# Test pull-through cache (from indri, using localhost) ssh indri 'podman pull localhost:5000/docker.io/library/alpine:latest' ssh indri 'curl -s http://localhost:5000/v2/_catalog' # Expected: {"repositories":["docker.io/library/alpine"]} - -# Test private image push (from gilbert, after Step 0.4 tailscale serve) -podman pull alpine:latest -podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1 -podman push registry.tail8d86e.ts.net/blumeops/test:v1 -curl -s https://registry.tail8d86e.ts.net/v2/_catalog -# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]} ``` --- @@ -313,7 +306,14 @@ mise run provision-indri -- --tags tailscale-serve # Verify from gilbert (not indri - hairpinning doesn't work) curl -s https://registry.tail8d86e.ts.net/v2/_catalog -# Expected: {"repositories":[]} +# Expected: {"repositories":["docker.io/library/alpine"]} (from Step 0.3 test) + +# Test private image push from gilbert +podman pull alpine:latest +podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1 +podman push registry.tail8d86e.ts.net/blumeops/test:v1 +curl -s https://registry.tail8d86e.ts.net/v2/_catalog +# Expected: 
{"repositories":["blumeops/test","docker.io/library/alpine"]} ``` --- -- 2.50.1 (Apple Git-155) From b703abe4d1e8b0f9fda5f321691086eaa3b840ee Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 16:37:50 -0800 Subject: [PATCH 16/21] Remove manual alloy restart from Step 0.6 Ansible handler restarts alloy automatically when config changes Co-Authored-By: Claude Opus 4.5 --- docs/k8s-migration.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/docs/k8s-migration.md b/docs/k8s-migration.md index 3941074..939587a 100644 --- a/docs/k8s-migration.md +++ b/docs/k8s-migration.md @@ -396,12 +396,9 @@ loki.source.file "zot_logs" { **Testing:** ```bash -# Deploy alloy config +# Deploy alloy config (handler restarts alloy automatically if config changed) mise run provision-indri -- --tags alloy -# Restart alloy to pick up changes -ssh indri 'brew services restart grafana-alloy' - # Wait a minute, then check Loki for zot logs # In Grafana Explore, query: {service="zot"} ``` -- 2.50.1 (Apple Git-155) From b333b7ff2c905231b6d71103447d09f4aac8e194 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 16:47:29 -0800 Subject: [PATCH 17/21] Move plan to plans/ directory, add completion step - Rename docs/k8s-migration.md to plans/k8s-migration.md - Create plans/completed/ for finished plans - Add Plan Completion section with instructions to archive when done Co-Authored-By: Claude Opus 4.5 --- {docs => plans}/k8s-migration.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) rename {docs => plans}/k8s-migration.md (99%) diff --git a/docs/k8s-migration.md b/plans/k8s-migration.md similarity index 99% rename from docs/k8s-migration.md rename to plans/k8s-migration.md index 939587a..ff09b5f 100644 --- a/docs/k8s-migration.md +++ b/plans/k8s-migration.md @@ -1190,3 +1190,15 @@ ansible/ | Woodpecker | `woodpeckerci/woodpecker-server` | Note: Zot runs as a native binary on indri (built from source at `~/code/3rd/zot`), not as a container. 
+ +--- + +## Plan Completion + +When all phases are complete and verified: + +```bash +# Move plan to completed directory with completion date +git mv plans/k8s-migration.md plans/completed/k8s-migration.$(date +%Y-%m-%d).md +git commit -m "Complete k8s migration plan" +``` -- 2.50.1 (Apple Git-155) From 2c8ced07b471ce50496c4043c36e32d01470ef56 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 16:58:35 -0800 Subject: [PATCH 18/21] Tighten podman ansible tasks based on manual testing - Use 'started successfully' instead of just 'started' for changed_when - Use specific failed_when: rc not in [0, 125] instead of false - 125 = already exists (init) or already running (start) Tested manually on indri - podman machine initialized and running. Co-Authored-By: Claude Opus 4.5 --- plans/k8s-migration.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/plans/k8s-migration.md b/plans/k8s-migration.md index ff09b5f..f37a231 100644 --- a/plans/k8s-migration.md +++ b/plans/k8s-migration.md @@ -458,14 +458,14 @@ ansible/roles/podman/ cmd: podman machine init --cpus 4 --memory 8192 --disk-size 100 register: podman_init changed_when: podman_init.rc == 0 - failed_when: false # May already exist + failed_when: podman_init.rc not in [0, 125] # 125 = already exists - name: Start podman machine ansible.builtin.command: cmd: podman machine start register: podman_start - changed_when: "'started' in podman_start.stdout" - failed_when: false # May already be running + changed_when: "'started successfully' in podman_start.stdout" + failed_when: podman_start.rc not in [0, 125] # 125 = already running ``` **Testing:** -- 2.50.1 (Apple Git-155) From 5f350841769580059d185ff506730cdcb8f6a8c3 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 17:09:49 -0800 Subject: [PATCH 19/21] Add Step 0.16: Enable NFS on Sifaka, bump minikube disk to 200g - Add manual step for enabling NFS on Synology DSM - Document NFS permissions config for k8s-volumes 
share - Include verification commands for testing NFS mount - Bump minikube disk-size from 100g to 200g - Add note explaining storage options (hostPath, NFS, SMB) Co-Authored-By: Claude Opus 4.5 --- plans/k8s-migration.md | 44 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 43 insertions(+), 1 deletion(-) diff --git a/plans/k8s-migration.md b/plans/k8s-migration.md index f37a231..7c92e74 100644 --- a/plans/k8s-migration.md +++ b/plans/k8s-migration.md @@ -494,11 +494,16 @@ ansible/roles/minikube/ ```yaml minikube_cpus: 4 minikube_memory: 8192 -minikube_disk_size: "100g" +minikube_disk_size: "200g" minikube_driver: podman minikube_container_runtime: containerd ``` +**Note on storage:** The disk-size is for node-local storage only (container images, emptyDir, local PVs). Pods can also mount external storage: +- **hostPath** - indri filesystem (e.g., `~/transmission/` for kiwix ZIM files) +- **NFS** - sifaka volumes (Synology supports NFS natively, easiest for k8s) +- **SMB/CIFS** - requires csi-driver-smb; sifaka currently uses SMB for desktop mounts + **Tasks:** ```yaml - name: Install minikube via homebrew @@ -738,6 +743,43 @@ tail -f ~/Library/Logs/mcquack.zot.err.log --- +### Step 0.16: Enable NFS on Sifaka (MANUAL) + +Enable NFS file sharing on the Synology NAS for k8s workloads that need persistent storage. + +**Steps in DSM web UI** (http://sifaka:5000/): + +1. **Enable NFS service:** + - Control Panel → File Services → NFS + - Check "Enable NFS service" + - Maximum NFS protocol: NFSv4.1 + +2. **Configure NFS permissions on shared folder:** + - Control Panel → Shared Folder + - Select the folder to share (e.g., `backups` or create `k8s-volumes`) + - Edit → NFS Permissions → Create + - Hostname/IP: `indri` (or `100.64.0.0/10` for all tailnet) + - Privilege: Read/Write + - Squash: Map root to admin + - Security: sys + - Enable asynchronous: Yes (for performance) + +3. 
**Note the export path** (shown in NFS Permissions tab): + - Format: `/volume1/shared-folder-name` + +**Verification:** +```bash +# From indri, test NFS mount +ssh indri 'sudo mkdir -p /mnt/sifaka-test' +ssh indri 'sudo mount -t nfs sifaka:/volume1/k8s-volumes /mnt/sifaka-test' +ssh indri 'ls /mnt/sifaka-test' +ssh indri 'sudo umount /mnt/sifaka-test' +``` + +**Note:** This is for future use - no workloads in Phase 0 require NFS. Pods will use NFS PersistentVolumes in later phases. + +--- + ### Phase 0 Verification Checklist Run after completing all steps: -- 2.50.1 (Apple Git-155) From a31e8935c92b4f95c85797785b25e147866fcbdc Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 17:11:34 -0800 Subject: [PATCH 20/21] Remove Step 0.16 (NFS on Sifaka) Nothing in Phase 0 requires NFS, and it's per-share config anyway. Will add when actually needed. Co-Authored-By: Claude Opus 4.5 --- plans/k8s-migration.md | 37 ------------------------------------- 1 file changed, 37 deletions(-) diff --git a/plans/k8s-migration.md b/plans/k8s-migration.md index 7c92e74..cca23cf 100644 --- a/plans/k8s-migration.md +++ b/plans/k8s-migration.md @@ -743,43 +743,6 @@ tail -f ~/Library/Logs/mcquack.zot.err.log --- -### Step 0.16: Enable NFS on Sifaka (MANUAL) - -Enable NFS file sharing on the Synology NAS for k8s workloads that need persistent storage. - -**Steps in DSM web UI** (http://sifaka:5000/): - -1. **Enable NFS service:** - - Control Panel → File Services → NFS - - Check "Enable NFS service" - - Maximum NFS protocol: NFSv4.1 - -2. **Configure NFS permissions on shared folder:** - - Control Panel → Shared Folder - - Select the folder to share (e.g., `backups` or create `k8s-volumes`) - - Edit → NFS Permissions → Create - - Hostname/IP: `indri` (or `100.64.0.0/10` for all tailnet) - - Privilege: Read/Write - - Squash: Map root to admin - - Security: sys - - Enable asynchronous: Yes (for performance) - -3. 
**Note the export path** (shown in NFS Permissions tab): - - Format: `/volume1/shared-folder-name` - -**Verification:** -```bash -# From indri, test NFS mount -ssh indri 'sudo mkdir -p /mnt/sifaka-test' -ssh indri 'sudo mount -t nfs sifaka:/volume1/k8s-volumes /mnt/sifaka-test' -ssh indri 'ls /mnt/sifaka-test' -ssh indri 'sudo umount /mnt/sifaka-test' -``` - -**Note:** This is for future use - no workloads in Phase 0 require NFS. Pods will use NFS PersistentVolumes in later phases. - ---- - ### Phase 0 Verification Checklist Run after completing all steps: -- 2.50.1 (Apple Git-155) From bcd96d86f03028fd81dcef249466c78fa758205c Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sat, 17 Jan 2026 17:32:55 -0800 Subject: [PATCH 21/21] Phase 0 review fixes - Bump podman disk-size to 220G (> minikube's 200G) - Fix Step 0.3 test to use curl instead of podman (not installed yet) - Simplify Step 0.5 zot metrics to just zot_up for now - Add Backup Strategy section to Technical Decisions - Add zot restart handler to Step 0.3 - Move dashboard steps to Phase 0 Follow-up section - Renumber steps (0.14->0.12, 0.15->0.13) - Fix Modified Files Summary (tag:k8s deferred to Phase 1) Co-Authored-By: Claude Opus 4.5 --- plans/k8s-migration.md | 121 +++++++++++++++++++++-------------------- 1 file changed, 61 insertions(+), 60 deletions(-) diff --git a/plans/k8s-migration.md b/plans/k8s-migration.md index cca23cf..5d3fd61 100644 --- a/plans/k8s-migration.md +++ b/plans/k8s-migration.md @@ -62,6 +62,23 @@ LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors). `brew services` handles this automatically but those aren't tracked in ansible. +### Backup Strategy + +Borgmatic remains on indri (outside k8s), writing to sifaka NAS via SMB at `/Volumes/backups`. This ensures backups continue even if k8s is down. 
+ +| Service | Backup Approach | +|---------|-----------------| +| **Zot Registry** | No backup needed - pull-through cache is re-fetchable, private images rebuilt from source control | +| **Minikube** | No backup of cluster state - declarative manifests in git, can recreate | +| **PostgreSQL (k8s)** | CloudNativePG scheduled backups to sifaka (Phase 1) | +| **Grafana (k8s)** | Dashboards in ansible source control, no runtime backup needed | +| **Miniflux (k8s)** | Database backed up via CloudNativePG | +| **Forgejo (k8s)** | Git repos are distributed, config in ansible; data dir backed up via borgmatic before migration | +| **devpi (k8s)** | Private packages backed up, PyPI cache re-fetchable | +| **Kiwix (k8s)** | ZIM files re-downloadable via torrent, no backup needed | + +**Borgmatic config changes:** None required for Phase 0. Future phases may add k8s PV paths if needed. + --- ## Phase 0: Foundation @@ -265,6 +282,23 @@ zot_sync_registries: ``` +**Handlers (handlers/main.yml):** +```yaml +- name: Restart zot + ansible.builtin.shell: + cmd: launchctl kickstart -k gui/$(id -u)/mcquack.eblume.zot + listen: restart zot +``` + +**Tasks should notify handler on config change:** +```yaml +- name: Deploy zot config + ansible.builtin.template: + src: config.json.j2 + dest: "{{ zot_config_dir }}/config.json" + notify: restart zot +``` + **Testing (after deploying role):** ```bash # Check LaunchAgent is running @@ -277,8 +311,9 @@ ssh indri 'curl -s http://localhost:5000/v2/_catalog' # Check logs for errors ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log' -# Test pull-through cache (from indri, using localhost) -ssh indri 'podman pull localhost:5000/docker.io/library/alpine:latest' +# Test pull-through cache via curl (podman not installed until Step 0.8) +ssh indri 'curl -s http://localhost:5000/v2/docker.io/library/alpine/manifests/latest -H "Accept: application/vnd.oci.image.manifest.v1+json"' +# Should return manifest JSON (triggers cache fetch from Docker 
Hub) ssh indri 'curl -s http://localhost:5000/v2/_catalog' # Expected: {"repositories":["docker.io/library/alpine"]} ``` @@ -345,17 +380,13 @@ if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then echo "zot_up 1" > "$TEMP_FILE" else echo "zot_up 0" > "$TEMP_FILE" - mv "$TEMP_FILE" "$METRICS_FILE" - exit 0 fi -# Get metrics from zot's metrics endpoint (if enabled) -# Add storage metrics, cache hits, etc. -# ... - mv "$TEMP_FILE" "$METRICS_FILE" ``` +**Note:** Start with just `zot_up` for now. Additional metrics (storage usage, cache stats) can be added later after reviewing zot's metrics endpoint. + **Testing:** ```bash # Deploy metrics role @@ -455,7 +486,7 @@ ansible/roles/podman/ - name: Initialize podman machine (if not exists) ansible.builtin.command: - cmd: podman machine init --cpus 4 --memory 8192 --disk-size 100 + cmd: podman machine init --cpus 4 --memory 8192 --disk-size 220 register: podman_init changed_when: podman_init.rc == 0 failed_when: podman_init.rc not in [0, 125] # 125 = already exists @@ -614,48 +645,7 @@ mise run indri-services-check --- -### Step 0.12: Create Zot Grafana Dashboard - -**New files:** -- `ansible/roles/grafana/files/dashboards/zot.json` - -**Dashboard panels:** -- `zot_up` - Service availability -- Storage usage (if zot exposes this metric) -- Cache hit/miss rates -- Pull/push request counts - -**Testing:** -```bash -# Deploy dashboard -mise run provision-indri -- --tags grafana - -# Verify in Grafana UI -# Navigate to Dashboards > Zot Registry -``` - ---- - -### Step 0.13: Create Minikube Grafana Dashboard - -**New files:** -- `ansible/roles/grafana/files/dashboards/minikube.json` - -**Dashboard panels:** -- Node CPU/Memory usage -- Pod count by namespace -- Container restart counts -- API server request latency - -**Note:** This may require deploying kube-state-metrics in the cluster first: -```bash -ssh indri 'kubectl apply -f 
https://raw.githubusercontent.com/kubernetes/kube-state-metrics/main/examples/standard/cluster-role.yaml' -# ... additional kube-state-metrics manifests -``` - ---- - -### Step 0.14: Create Zettelkasten Documentation +### Step 0.12: Create Zettelkasten Documentation **New files:** - `~/code/personal/zk/zot.md` @@ -723,7 +713,7 @@ tail -f ~/Library/Logs/mcquack.zot.err.log --- -### Step 0.15: Update Main Playbook +### Step 0.13: Update Main Playbook **Files to modify:** - `ansible/playbooks/indri.yml` @@ -777,11 +767,7 @@ curl -s "http://indri:9090/api/v1/query?query=zot_up" # In Grafana Explore: {service="zot"} # Should see zot log entries -# 7. Dashboards in Grafana -# Navigate to Zot Registry dashboard - panels should have data -# Navigate to Minikube dashboard - panels should have data - -# 8. k9s from gilbert +# 7. k9s from gilbert k9s # Should connect and show minikube cluster ``` @@ -823,6 +809,23 @@ rm ~/code/personal/zk/{zot,minikube}.md --- +### Phase 0 Follow-up: Grafana Dashboards + +After Phase 0 is running and stable, create monitoring dashboards: + +**Zot Dashboard** (`ansible/roles/grafana/files/dashboards/zot.json`): +1. Check what metrics zot exposes: `ssh indri 'curl -s http://localhost:5000/metrics'` +2. Review community dashboards for inspiration (copy permitted if license allows) +3. Create dashboard with available metrics (at minimum: `zot_up`) + +**Minikube Dashboard** (`ansible/roles/grafana/files/dashboards/minikube.json`): +1. Deploy kube-state-metrics if needed for additional cluster metrics +2. Review what Prometheus can scrape from the cluster +3. Review community dashboards for inspiration (copy permitted if license allows) +4. Create dashboard with relevant panels (node usage, pod counts, etc.) 
+ +--- + ### New Files Summary | File | Purpose | @@ -831,8 +834,6 @@ rm ~/code/personal/zk/{zot,minikube}.md | `ansible/roles/zot_metrics/` | Metrics collection for Zot | | `ansible/roles/podman/` | Podman installation and setup | | `ansible/roles/minikube/` | Minikube cluster setup | -| `ansible/roles/grafana/files/dashboards/zot.json` | Zot monitoring dashboard | -| `ansible/roles/grafana/files/dashboards/minikube.json` | K8s monitoring dashboard | | `~/code/personal/zk/zot.md` | Zot management documentation | | `~/code/personal/zk/minikube.md` | Minikube management documentation | @@ -840,7 +841,7 @@ rm ~/code/personal/zk/{zot,minikube}.md | File | Changes | |------|---------| -| `pulumi/policy.hujson` | Add tag:registry, tag:k8s, ACL rules | +| `pulumi/policy.hujson` | Add tag:registry | | `ansible/playbooks/indri.yml` | Add new roles | | `ansible/roles/tailscale_serve/defaults/main.yml` | Add svc:registry | | `ansible/roles/alloy/templates/config.alloy.j2` | Add zot log collection | -- 2.50.1 (Apple Git-155)
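The simplified Step 0.5 collector boils down to one probe followed by an atomic rename, so a scraper never reads a half-written metrics file. A minimal sketch of that pattern follows. The function name `write_zot_up`, its two arguments, and the 5-second timeout are illustrative assumptions; the plan's actual script uses fixed `TEMP_FILE` and `METRICS_FILE` variables whose values are not shown in this hunk.

```bash
# Hypothetical wrapper around the Step 0.5 logic: probe the registry,
# then publish the result with an atomic rename.
write_zot_up() {
  local url
  local metrics_file
  local temp_file
  url="$1"            # e.g. http://localhost:5000/v2/_catalog
  metrics_file="$2"   # final metrics path (assumed to be passed in)

  # Create the temp file next to the target so mv stays on one filesystem.
  temp_file="$(mktemp "${metrics_file}.XXXXXX")" || return 1

  if curl -sf --max-time 5 "$url" > /dev/null 2>&1; then
    echo "zot_up 1" > "$temp_file"
  else
    echo "zot_up 0" > "$temp_file"
  fi

  # Readers see either the old file or the new one, never a partial write.
  mv "$temp_file" "$metrics_file"
}
```

Because both branches fall through to the same `mv`, the early `mv` and `exit 0` removed by this patch are no longer needed. Further metrics can later be appended to the temp file before the rename.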