K8s Migration Phase 0: Foundation Infrastructure #26

Merged
eblume merged 22 commits from feature/k8s-migration-phase0 into main 2026-01-18 12:06:28 -08:00

Summary

  • Step 0.1: Update Pulumi ACLs with tag:registry
  • Step 0.3: Create Zot registry ansible role with mcquack LaunchAgent
  • Step 0.4: Add Zot to Tailscale Serve configuration
  • Step 0.5: Create Zot metrics role for Prometheus scraping
  • Step 0.6: Add Zot log collection to Alloy
  • Step 0.7: Update indri-services-check with zot checks
  • Step 0.8: Add podman role for container runtime
  • Step 0.9: Add minikube role for Kubernetes cluster
  • Step 0.10: Configure remote kubectl access with 1Password credentials

Remaining Steps

  • [x] Step 0.11: Add minikube to indri-services-check
  • [x] Step 0.12: Create zettelkasten documentation
  • [x] Step 0.13: Verify main playbook (already done - roles added)

Deployment and Testing

  • [x] Zot registry deployed and accessible at https://registry.tail8d86e.ts.net
  • [x] Podman machine running on indri
  • [x] Minikube cluster running on indri
  • [x] kubectl access from gilbert working with 1Password credentials
  • [x] indri-services-check passes all checks

🤖 Generated with Claude Code

Phase 0 of k8s migration: Add registry tag to ACLs.
- Admins get full access via wildcard grant
- Members denied access (infrastructure only)
- Enables tailscale serve for registry.tail8d86e.ts.net

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add tag:registry to indri DeviceTags in __main__.py
- Update plan with implementation details noting the tag is
  managed via Pulumi, not manually in admin console

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 0: Creates zot role with:
- Config for pull-through cache (Docker Hub, GHCR, Quay)
- mcquack LaunchAgent for service management
- Sync registries configured for on-demand caching

The zot binary is built from source at ~/code/3rd/zot (not installed via Homebrew).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Exposes zot registry (localhost:5000) as registry.tail8d86e.ts.net

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change zot port from 5000 to 5050 (macOS ControlCenter uses 5000)
- Fix sync config: use destination for namespacing and a ** prefix for matching
- Update tailscale_serve to use port 5050
- Add zot role to main playbook
- Document implementation details in plan
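
The port conflict described above can be detected before a service is configured. The following is a minimal sketch, not part of this PR: it assumes a listener on the wildcard address also blocks a bind on 127.0.0.1, and the helper names `port_is_free` and `first_free_port` are hypothetical.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process already holds the port.
            return False
    return True


def first_free_port(candidates):
    """Return the first candidate port that is free, or None if all are taken."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None
```

For example, `first_free_port([5000, 5050])` would be expected to skip 5000 on a Mac where ControlCenter is listening on it. The check is only a snapshot: another process can claim the port between the check and the service start.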

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Podman is required for testing pushes to the zot registry from a workstation.
On macOS, podman runs containers inside a Linux VM.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Collects zot_up metric via textfile collector every 60 seconds.
Additional metrics can be added later if zot exposes them.
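
A textfile-collector metric like this is usually a small file in Prometheus exposition format that is rewritten on a schedule. The sketch below shows one way to do that in Python. It is not the role's actual script: the probe URL (port 5050 and the `/v2/` path), the output path, and all function names are assumptions for illustration.

```python
import os
import tempfile
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 1 if the server answers at all, 0 if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return 1
    except urllib.error.HTTPError:
        # An HTTP error status still means the process is up and answering.
        return 1
    except (urllib.error.URLError, OSError):
        return 0


def render_zot_up(up: int) -> str:
    """Render the zot_up gauge in Prometheus text exposition format."""
    return (
        "# HELP zot_up Whether the zot registry answered a probe (1) or not (0).\n"
        "# TYPE zot_up gauge\n"
        f"zot_up {up}\n"
    )


def write_textfile(path: str, content: str) -> None:
    """Write content atomically so a scrape never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, path)  # atomic rename within the same directory
```

A scheduled job could then call `write_textfile("/path/to/textfile/zot.prom", render_zot_up(probe("http://localhost:5050/v2/")))` once per interval, where the output path is a placeholder for the collector's configured directory.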

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Collects stdout and stderr logs from zot to Loki.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- zot LaunchAgent running
- zot-metrics LaunchAgent running
- Zot registry HTTP endpoint (via Tailscale service URL)
- Zot metrics file exists

Also updated Grafana, Kiwix, Forgejo, Devpi to use Tailscale
service URLs for consistency with Miniflux and Zot.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Note use of Tailscale service URL for health check.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create ansible/roles/podman for podman machine setup on indri
- Document known reliability issue with podman machine init/start via SSH
  (race condition from containers/podman#16945)
- Role attempts init/start but doesn't fail if start hangs
- Workaround: manual init/start on indri if needed
- Update k8s-migration plan with implementation details
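
The "attempt the start but do not fail if it hangs" behaviour can be illustrated outside Ansible with a bounded subprocess call. This is a sketch of the idea only, not the role's tasks; `run_with_timeout` is a hypothetical helper and the timeout value in the usage note is an arbitrary assumption.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it exceeded timeout_s.

    subprocess.run kills the child process when the timeout expires, so a
    hung command does not block the caller indefinitely.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return done.returncode
```

A caller could run `run_with_timeout(["podman", "machine", "start"], 120)` and treat a `None` result as "needs the manual workaround" rather than as a hard failure.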

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create ansible/roles/minikube for minikube cluster setup
- Use podman driver with cri-o runtime
- Set memory to 7800 MB (vs. 8192 MB for the podman machine) to account for VM overhead
- Add minikube role to indri playbook
- Update k8s-migration plan with implementation details

Deployed with Kubernetes v1.34.0 and CRI-O 1.24.6.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Step 0.10 (kubeconfig on gilbert):
- Document research on kubectl remote access options
- Choose --apiserver-names + --listen-address approach
- Add references to sources

Step 0.12 (zettelkasten):
- Add instructions to update main blumeops card
- Fix zot port from 5000 to 5050
- Add minikube.md template with remote access docs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Step 0.10 implementation:
- Recreate minikube with --apiserver-names=indri --listen-address=0.0.0.0
- Add kubectl-credential-1password exec plugin for 1Password integration
- Client certs fetched from 1Password on-demand (no private keys on disk)
- CA cert stored locally (not secret - public key for server verification)

Minikube role updates:
- Add minikube_apiserver_names and minikube_listen_address variables
- Update tasks to include remote access flags

This mirrors the 1Password SSH agent pattern: biometric auth is required
for each kubectl command that needs credentials.
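
For readers unfamiliar with kubectl exec credential plugins: the plugin is a program that prints an `ExecCredential` JSON object on stdout, and kubectl uses the client certificate and key from its `status` field. The sketch below shows that shape under stated assumptions. It is not the actual kubectl-credential-1password plugin; the `op://` secret references are placeholders, and the `apiVersion` must match whatever the kubeconfig's exec section declares.

```python
import json
import subprocess


def op_read(secret_ref: str) -> str:
    """Fetch one secret with the 1Password CLI (`op read`); may prompt for auth."""
    result = subprocess.run(
        ["op", "read", secret_ref], check=True, capture_output=True, text=True
    )
    return result.stdout


def build_exec_credential(cert_pem: str, key_pem: str) -> dict:
    """Build the ExecCredential object kubectl expects on the plugin's stdout."""
    return {
        "apiVersion": "client.authentication.k8s.io/v1",
        "kind": "ExecCredential",
        "status": {
            "clientCertificateData": cert_pem,
            "clientKeyData": key_pem,
        },
    }


def main() -> None:
    # Placeholder references: the real vault, item, and field names are not
    # given in this PR, so these are hypothetical.
    cert = op_read("op://<vault>/<item>/client-cert")
    key = op_read("op://<vault>/<item>/client-key")
    print(json.dumps(build_exec_credential(cert, key)))
```

Because the key material only ever passes through stdout, nothing secret needs to be written to disk, which matches the "no private keys on disk" goal above.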

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Step 0.12 update:
- Document ki, k9i, k9 abbreviations for quick kubectl/k9s access
- These avoid accidentally triggering work SAML flow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Step 0.11 implementation:
- Check minikube status via SSH to indri
- Check k8s API server accessible from indri
- Check k8s API server accessible remotely from gilbert (via 1Password creds)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Zettelkasten documentation created:
- zot.md: Container registry management log
- minikube.md: Kubernetes cluster management log
- Updated main blumeops card with new services, tags, and ports

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All Phase 0 steps completed:
- Zot registry with pull-through cache
- Podman container runtime
- Minikube Kubernetes cluster
- Remote kubectl access with 1Password credentials
- Health checks in indri-services-check
- Zettelkasten documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 0 followup:
- Enable zot native metrics endpoint (/metrics)
- Add zot scraping to Alloy config
- Create zot Grafana dashboard (status, requests, latency, memory)
- Create minikube_metrics role (collects cluster health metrics)
- Create minikube Grafana dashboard (status, pod/namespace counts, logs)
- Update indri-services-check with minikube-metrics checks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The {{.Host}} Go template syntax was being interpreted by Jinja2.
Added {% raw %} block to escape it.
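
The clash arises because Go templates and Jinja2 both use `{{ ... }}` delimiters, so Jinja2 tries to parse `{{.Host}}` as its own expression. A minimal sketch of the escape follows; `jinja_raw` is a hypothetical helper, not something from the repository, and it assumes the wrapped text does not itself contain an endraw tag.

```python
def jinja_raw(text: str) -> str:
    """Wrap text in a Jinja2 raw block so its {{ ... }} is emitted literally."""
    return "{% raw %}" + text + "{% endraw %}"


go_template = "{{.Host}}"
escaped = jinja_raw(go_template)
# escaped == "{% raw %}{{.Host}}{% endraw %}"
```

Rendering `escaped` with Jinja2 yields the original `{{.Host}}` unchanged, leaving it for the Go template engine to evaluate later.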

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Alloy labels scraped metrics with job="prometheus.scrape.zot",
not just "zot". Updated all dashboard queries to match.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use zot_http_method_latency_seconds_bucket (not zot_http_request_duration_*)
- Replace Memory Usage stat with Total Storage
- Replace Memory Over Time with Storage by Repository

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
eblume merged commit 19a82373d5 into main 2026-01-18 12:06:28 -08:00

Reference
eblume/blumeops!26