K8s Migration Phase 0: Foundation Infrastructure #26

Merged
eblume merged 22 commits from feature/k8s-migration-phase0 into main 2026-01-18 12:06:28 -08:00

Author SHA1 Message Date
57ba61256c Fix zot dashboard: correct latency metric, replace memory with storage
- Use zot_http_method_latency_seconds_bucket (not zot_http_request_duration_*)
- Replace Memory Usage stat with Total Storage
- Replace Memory Over Time with Storage by Repository

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 12:05:09 -08:00
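
Note on 57ba61256c: a `_bucket` series such as `zot_http_method_latency_seconds_bucket` is a cumulative histogram, and a latency panel usually reduces it to a percentile. The sketch below approximates that interpolation in plain Python to show what the panel computes. It is a simplified illustration, not the dashboard query from this PR, and the bucket bounds and counts in the usage note are made-up values.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    `buckets` must be sorted by upper bound and end with the +Inf bucket,
    mirroring how Prometheus-style histograms are exposed.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

For example, `histogram_quantile(0.95, [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)])` interpolates inside the 0.5 to 1.0 second bucket.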
6dc698d4ef Fix zot dashboard job label filter
Alloy labels scraped metrics with job="prometheus.scrape.zot",
not just "zot". Updated all dashboard queries to match.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 12:02:27 -08:00
09983b3ae0 Fix Jinja2 escaping in minikube-metrics template
The {{.Host}} Go template syntax was being interpreted by Jinja2.
Added {% raw %} block to escape it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 11:58:38 -08:00
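
Note on 09983b3ae0: Jinja2 and Go templates share the `{{ ... }}` delimiters, so a Go expression inside an Ansible template has to be wrapped in `{% raw %}`. The sketch below shows what the escaped text looks like. `wrap_go_templates` is a hypothetical helper written for illustration; the commit itself added the raw block by hand in the template.

```python
import re

# Matches Go template actions that begin with a field reference, e.g. {{.Host}}.
# Jinja2 expressions such as {{ some_var }} do not start with a dot and are left alone.
GO_ACTION = re.compile(r"\{\{\s*\.[^{}]*\}\}")


def wrap_go_templates(text):
    """Wrap each Go template action in a Jinja2 raw block."""
    return GO_ACTION.sub(
        lambda match: "{% raw %}" + match.group(0) + "{% endraw %}", text
    )
```

Passing a line that contains `{{.Host}}` returns the same line with the expression enclosed in `{% raw %}` and `{% endraw %}`, which Jinja2 then emits verbatim.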
3337392b55 Add Grafana dashboards and metrics collection for zot and minikube
Phase 0 followup:
- Enable zot native metrics endpoint (/metrics)
- Add zot scraping to Alloy config
- Create zot Grafana dashboard (status, requests, latency, memory)
- Create minikube_metrics role (collects cluster health metrics)
- Create minikube Grafana dashboard (status, pod/namespace counts, logs)
- Update indri-services-check with minikube-metrics checks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 11:53:04 -08:00
67d3ba65f3 Add Step 0.13 implementation notes - Phase 0 complete
All Phase 0 steps completed:
- Zot registry with pull-through cache
- Podman container runtime
- Minikube Kubernetes cluster
- Remote kubectl access with 1Password credentials
- Health checks in indri-services-check
- Zettelkasten documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:53:28 -08:00
bb58898731 Add Step 0.12 implementation notes
Zettelkasten documentation created:
- zot.md: Container registry management log
- minikube.md: Kubernetes cluster management log
- Updated main blumeops card with new services, tags, and ports

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:52:39 -08:00
ce3e5fc1e0 Add minikube health checks to indri-services-check
Step 0.11 implementation:
- Check minikube status via SSH to indri
- Check k8s API server accessible from indri
- Check k8s API server accessible remotely from gilbert (via 1Password creds)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:37:12 -08:00
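
Note on ce3e5fc1e0: the "API server accessible" checks come down to a reachability probe. A minimal sketch of such a probe at the TCP level is shown below. It is not the code in indri-services-check, it does not authenticate, and the host and port passed to it are assumptions left to the caller.

```python
import socket


def port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A TCP probe only shows that something is listening. A fuller check would also make an authenticated request, as the remote check from gilbert does with the 1Password credentials.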
5ba2cf3861 Add fish abbreviations for minikube-indri to plan docs
Step 0.12 update:
- Document ki, k9i, k9 abbreviations for quick kubectl/k9s access
- These avoid accidentally triggering work SAML flow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:30:13 -08:00
e032e27b66 Configure remote kubectl access with 1Password credentials
Step 0.10 implementation:
- Recreate minikube with --apiserver-names=indri --listen-address=0.0.0.0
- Add kubectl-credential-1password exec plugin for 1Password integration
- Client certs fetched from 1Password on-demand (no private keys on disk)
- CA cert stored locally (not secret - public key for server verification)

Minikube role updates:
- Add minikube_apiserver_names and minikube_listen_address variables
- Update tasks to include remote access flags

This mirrors the 1Password SSH agent pattern - biometric auth required
for each kubectl command that needs credentials.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:12:58 -08:00
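
Note on e032e27b66: a kubectl exec credential plugin prints an `ExecCredential` JSON object on stdout, and kubectl uses the certificate and key from its `status` field. The sketch below shows that shape under stated assumptions. It is not the `kubectl-credential-1password` plugin from this PR, and the `op://VAULT/ITEM/...` secret references are hypothetical placeholders.

```python
import json
import subprocess


def read_with_op(reference):
    """Read one secret with the 1Password CLI (`op read <reference>`)."""
    result = subprocess.run(
        ["op", "read", reference], check=True, capture_output=True, text=True
    )
    return result.stdout


def exec_credential(cert_pem, key_pem):
    """Build the ExecCredential object that kubectl expects from a plugin."""
    return {
        "apiVersion": "client.authentication.k8s.io/v1",
        "kind": "ExecCredential",
        "status": {
            "clientCertificateData": cert_pem,
            "clientKeyData": key_pem,
        },
    }


def render(read_secret=read_with_op):
    """Fetch the client cert and key, then return the JSON for stdout."""
    # Hypothetical secret references; replace with the real vault, item, and fields.
    cert = read_secret("op://VAULT/ITEM/client-certificate")
    key = read_secret("op://VAULT/ITEM/client-key")
    return json.dumps(exec_credential(cert, key))
```

A plugin built this way would print `render()` and exit. Because the key is fetched on each invocation and only written to stdout, nothing sensitive needs to live in the kubeconfig or on disk.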
9950c8207f Update plan with Step 0.10 and 0.12 implementation details
Step 0.10 (kubeconfig on gilbert):
- Document research on kubectl remote access options
- Choose --apiserver-names + --listen-address approach
- Add references to sources

Step 0.12 (zettelkasten):
- Add instructions to update main blumeops card
- Fix zot port from 5000 to 5050
- Add minikube.md template with remote access docs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 09:10:53 -08:00
2c90dc01a4 Add minikube role for Kubernetes cluster on indri
- Create ansible/roles/minikube for minikube cluster setup
- Use podman driver with cri-o runtime
- Set memory to 7800MB (vs 8192MB for the podman machine) to account for VM overhead
- Add minikube role to indri playbook
- Update k8s-migration plan with implementation details

Deployed with Kubernetes v1.34.0 and CRI-O 1.24.6.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 22:39:38 -08:00
55f0335a1e Add podman role with known issue documentation
- Create ansible/roles/podman for podman machine setup on indri
- Document known reliability issue with podman machine init/start via SSH
  (race condition from containers/podman#16945)
- Role attempts init/start but doesn't fail if start hangs
- Workaround: manual init/start on indri if needed
- Update k8s-migration plan with implementation details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 22:23:45 -08:00
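
Note on 55f0335a1e: "attempts init/start but doesn't fail if start hangs" amounts to running the command under a timeout and treating a hang as a reportable outcome rather than an error. The role does this in Ansible; the sketch below shows the same idea in Python, with the timeout value left to the caller as an assumption.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return 'ok', 'failed', or 'timeout' without raising."""
    try:
        result = subprocess.run(
            cmd, timeout=timeout_s, capture_output=True, text=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing lingers.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"
```

A caller could run `run_with_timeout(["podman", "machine", "start"], 120)` and fall back to the manual workaround on indri when the result is `"timeout"`.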
8d96f91365 Add Step 0.7 implementation details to plan
Note use of Tailscale service URL for health check.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:18:08 -08:00
8c7a746227 Add zot checks to indri-services-check
- zot LaunchAgent running
- zot-metrics LaunchAgent running
- Zot registry HTTP endpoint (via Tailscale service URL)
- Zot metrics file exists

Also updated Grafana, Kiwix, Forgejo, Devpi to use Tailscale
service URLs for consistency with Miniflux and Zot.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:17:58 -08:00
8c01181e55 Add zot log collection to Alloy
Collects stdout and stderr logs from zot to Loki.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:13:30 -08:00
7d673cbee9 Add zot_metrics role for Prometheus monitoring
Collects zot_up metric via textfile collector every 60 seconds.
Additional metrics can be added later if zot exposes them.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:09:20 -08:00
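
Note on 7d673cbee9: a textfile-collector metric is a small file in the Prometheus exposition format that a scheduled job rewrites. The sketch below shows one way to produce a `zot_up` file. It is not the script shipped by the zot_metrics role, and the probe URL and output path are assumptions supplied by the caller.

```python
import os
import tempfile
import urllib.request


def probe(url, timeout_s=5.0):
    """Return 1 if the URL answers with a non-error status, else 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 1 if response.status < 400 else 0
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return 0


def render(up):
    """Format the metric in the Prometheus text exposition format."""
    return (
        "# HELP zot_up Whether the zot registry answered its health probe.\n"
        "# TYPE zot_up gauge\n"
        f"zot_up {up}\n"
    )


def write_atomic(path, text):
    """Write via a temp file and rename so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, path)
```

A 60-second job would call `write_atomic(path, render(probe(url)))`. The rename step matters because the collector may read the directory at any moment.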
e0e8c5449b Add podman to Brewfile for container operations on gilbert
Required for testing zot registry push from workstation.
Podman uses a Linux VM under the hood on macOS.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:00:55 -08:00
1ee2863fc7 Fix zot port and sync config, update plan with implementation details
- Change zot port from 5000 to 5050 (macOS ControlCenter uses 5000)
- Fix sync config: use destination for namespacing, prefix ** for matching
- Update tailscale_serve to use port 5050
- Add zot role to main playbook
- Document implementation details in plan

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:44:45 -08:00
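
Note on 1ee2863fc7: the sync fix pairs a `**` prefix (match every repository) with a per-upstream `destination` (namespace the cached copy). The sketch below generates a config of that shape. It is an assumption-laden illustration, not the config from the zot role: the upstream URLs, namespace names, bind address, and storage path are placeholders, and only port 5050 comes from this PR.

```python
import json


def sync_registry(url, namespace):
    """One on-demand upstream whose images are cached under /<namespace>."""
    return {
        "urls": [url],
        "onDemand": True,
        "tlsVerify": True,
        "content": [{"prefix": "**", "destination": f"/{namespace}"}],
    }


def zot_config(port, root_dir, upstreams):
    """Assemble a zot-style config dict; `upstreams` maps namespace to URL."""
    return {
        "storage": {"rootDirectory": root_dir},
        "http": {"address": "127.0.0.1", "port": str(port)},
        "extensions": {
            "sync": {
                "enable": True,
                "registries": [
                    sync_registry(url, namespace)
                    for namespace, url in upstreams.items()
                ],
            }
        },
    }


# Placeholder upstreams and storage path, chosen only for illustration.
EXAMPLE = zot_config(
    5050,
    "/tmp/zot",
    {
        "docker.io": "https://registry-1.docker.io",
        "ghcr.io": "https://ghcr.io",
        "quay.io": "https://quay.io",
    },
)
```

`json.dumps(EXAMPLE, indent=2)` yields the file contents. With this layout, an image pulled through the cache would be stored under the namespace of the upstream it came from.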
b62b10a60f Add registry to tailscale_serve configuration
Exposes zot registry (localhost:5000) as registry.tail8d86e.ts.net

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:31:28 -08:00
2bba28fc30 Add zot container registry ansible role
Phase 0: Creates zot role with:
- Config for pull-through cache (Docker Hub, GHCR, Quay)
- mcquack LaunchAgent for service management
- Sync registries configured for on-demand caching

Binary is built from source at ~/code/3rd/zot (not homebrew).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:24:58 -08:00
8defa0ef6e Apply tag:registry to indri via Pulumi
- Add tag:registry to indri DeviceTags in __main__.py
- Update plan with implementation details noting the tag is
  managed via Pulumi, not manually in admin console

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:16:06 -08:00
83956afe92 Add tag:registry for Zot container registry
Phase 0 of k8s migration: Add registry tag to ACLs.
- Admins get full access via wildcard grant
- Members denied access (infrastructure only)
- Enables tailscale serve for registry.tail8d86e.ts.net

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:10:36 -08:00