blumeops/plans/k8s-migration.md
Erich Blume 3679124ebd Expose Kubernetes API as Tailscale service (Step 0.14) (#27)
## Summary
- Add `tag:k8s-api` to Pulumi ACLs and indri device tags
- Configure Tailscale serve with TCP passthrough for k8s API at `k8s.tail8d86e.ts.net`
- Update minikube role to include `k8s.tail8d86e.ts.net` in certificate SANs
- Add `apiserver_port` config option (internal port 6443, dynamic host port with podman driver)
- Document Step 0.14 in k8s-migration plan (added post-Phase 0 completion)

The Kubernetes API is now accessible at `https://k8s.tail8d86e.ts.net` using TCP passthrough to preserve mTLS authentication.

## Deployment and Testing
- [x] Pulumi ACLs applied
- [x] Tailscale service created and approved in admin console
- [x] Minikube cluster recreated with new cert SANs
- [x] tailscale serve configured with TCP passthrough
- [x] 1Password credentials updated with new certs
- [x] Kubeconfig updated on gilbert
- [x] `mise run indri-services-check` passes
- [x] `kubectl --context=minikube-indri get nodes` works via Tailscale

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/27
2026-01-18 12:49:20 -08:00


Blumeops Minikube Migration Plan

This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes.

Architecture Overview

Services Staying on Indri (Outside K8s)

| Service | Reason |
|---------|--------|
| Zot Registry (NEW) | Avoid circular dependency - k8s needs images to start |
| Prometheus | Observability backbone must survive k8s failures |
| Loki | Log aggregation backbone |
| Borgmatic | Backup system |
| Grafana-alloy | Metrics/logs collector on host |
| Plex | Until Jellyfin replacement |
| Transmission | Downloads for kiwix ZIM files |

Services Moving to K8s

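| Service | Complexity | Dependencies |
|---------|------------|--------------|
| Grafana | LOW | Phase 1 |
| Kiwix | LOW | Phase 1 |
| Miniflux | MEDIUM | PostgreSQL |
| devpi | MEDIUM | Registry |
| PostgreSQL | HIGH | Phase 1 |
| Forgejo | HIGH | PostgreSQL |
| Woodpecker CI | MEDIUM | Forgejo |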

Technical Decisions

Container Registry: Zot

  • OCI-native, lightweight
  • Native support for proxying multiple registries (Docker Hub, GHCR, Quay)
  • Built from source at ~/code/3rd/zot (not in homebrew)
  • Binary: ~/code/3rd/zot/bin/zot-darwin-arm64
  • Config: ~/.config/zot/config.json
  • Data: ~/zot/

Minikube Driver: Podman

  • Rootless containers for better security
  • Lighter than full VM (QEMU)
  • Uses existing container ecosystem
  • minikube start --driver=podman --container-runtime=cri-o

PostgreSQL: CloudNativePG Operator

  • Production-grade operator
  • Built-in backup/restore
  • Prometheus metrics
  • PITR support

K8s Service Exposure: Tailscale Operator

  • loadBalancerClass: tailscale on Services
  • Automatic TLS and MagicDNS names
  • ACL-controlled access

LaunchAgent Requirements (Critical)

LaunchAgents do NOT get homebrew on PATH. All commands must use absolute paths:

  • /Users/erichblume/code/3rd/zot/bin/zot-darwin-arm64 for zot (built from source)
  • /opt/homebrew/opt/mise/bin/mise x -- for mise-managed tools
  • /opt/homebrew/opt/postgresql@18/bin/pg_dump for postgres tools

This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors). Services managed by brew services handle this automatically, but they aren't tracked in ansible.
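
One way to catch a relative or missing program path before launchd does is to validate it when the plist is rendered. The sketch below is hypothetical: `require_abs_exec` is an illustrative name and is not part of the blumeops repo.

```shell
# Hypothetical pre-flight check for LaunchAgent program paths.
# Returns 0 only if $1 is an absolute path to an executable file.
require_abs_exec() {
    case "$1" in
        /*) ;;  # absolute path, keep checking
        *)  echo "not an absolute path: $1" >&2; return 1 ;;
    esac
    if [ ! -f "$1" ] || [ ! -x "$1" ]; then
        echo "not an executable file: $1" >&2
        return 1
    fi
}
```

Calling `require_abs_exec /opt/homebrew/opt/mise/bin/mise` on indri would then fail loudly if the path is wrong, instead of the agent silently exiting at load time.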

Backup Strategy

Borgmatic remains on indri (outside k8s), writing to sifaka NAS via SMB at /Volumes/backups. This ensures backups continue even if k8s is down.

| Service | Backup Approach |
|---------|-----------------|
| Zot Registry | No backup needed - pull-through cache is re-fetchable, private images rebuilt from source control |
| Minikube | No backup of cluster state - declarative manifests in git, can recreate |
| PostgreSQL (k8s) | CloudNativePG scheduled backups to sifaka (Phase 1) |
| Grafana (k8s) | Dashboards in ansible source control, no runtime backup needed |
| Miniflux (k8s) | Database backed up via CloudNativePG |
| Forgejo (k8s) | Git repos are distributed, config in ansible; data dir backed up via borgmatic before migration |
| devpi (k8s) | Private packages backed up, PyPI cache re-fetchable |
| Kiwix (k8s) | ZIM files re-downloadable via torrent, no backup needed |

Borgmatic config changes: None required for Phase 0. Future phases may add k8s PV paths if needed.


Phase 0: Foundation

Goal: Container registry + minikube cluster without disrupting existing services

Important: Tailscale Service Creation Order

WARNING: You MUST create services in the Tailscale admin console BEFORE running tailscale serve commands via ansible. If you run tailscale serve --service svc:foo before the service exists in the admin console, the local config will be in a bad state.

To fix a misconfigured service:

tailscale serve --service svc:foo reset

Then create the service in admin console and try again.


Step 0.1: Update Pulumi ACLs (BEFORE Tailscale serve)

Files to modify:

  • pulumi/policy.hujson

Changes:

  1. Add new tag to tagOwners section (around line 104, after "tag:feed"):
"tag:registry": ["autogroup:admin", "tag:blumeops"],
  2. Add test cases to tests section:
    • Update Erich's accept list (around line 111) to include registry:
    "accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443"],
    
    • Update Allison's deny list (around line 117) to deny registry:
    "deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443"],
    

Note:

  • No member grant needed - admins have full access via wildcard, members don't need registry
  • tag:k8s is added later in Phase 1 when the Tailscale Kubernetes Operator is deployed
  • Zot supports htpasswd auth if we later need finer-grained control

Testing:

mise run tailnet-preview   # Review changes - should show new tag
mise run tailnet-up        # Apply changes

Implementation Details:

  • Also need to add "tag:registry" to indri's tags in pulumi/__main__.py (the DeviceTags resource), not just define it in policy.hujson. The policy file defines the tag ownership rules, but the device tags are managed separately in the Python code.

Step 0.2: Create Tailscale Services in Admin Console (MANUAL)

CRITICAL: Do this BEFORE running any ansible that calls tailscale serve

  1. Go to https://login.tailscale.com/admin/services
  2. Create service registry with:
    • Port: 443 (HTTPS)
    • Host: indri

Implementation Details:

  • Tag is applied to indri via Pulumi in Step 0.1, not manually in admin console.

Verification:

# Service should appear (even if not yet serving)
tailscale status | grep registry

Step 0.3: Create Zot Registry Ansible Role

Note: Zot is NOT in homebrew (no formula or tap). Clone to ~/code/3rd/ on indri and build from source (requires Go).

Prerequisites on indri (ALREADY COMPLETED):

# Clone zot from forge mirror (use localhost:3001 - hairpinning doesn't work on indri)
ssh indri 'git clone http://localhost:3001/eblume/zot.git ~/code/3rd/zot'

# Set up Go via mise (creates mise.toml in repo directory)
ssh indri 'cd ~/code/3rd/zot && mise use go@1.25'

# Build (creates bin/zot-darwin-arm64, ~183MB)
ssh indri 'cd ~/code/3rd/zot && mise x -- make binary'

# Verify binary exists
ssh indri 'ls -la ~/code/3rd/zot/bin/zot-darwin-arm64'

Build verified: Binary at ~/code/3rd/zot/bin/zot-darwin-arm64 (183MB, ARM64 native).

New files:

ansible/roles/zot/
├── defaults/main.yml
├── tasks/main.yml
├── templates/
│   ├── config.json.j2
│   └── zot.plist.j2
└── handlers/main.yml

Key configuration (defaults/main.yml):

zot_repo_dir: "/Users/erichblume/code/3rd/zot"
zot_binary: "{{ zot_repo_dir }}/bin/zot-darwin-arm64"
zot_data_dir: "/Users/erichblume/zot"
zot_config_dir: "/Users/erichblume/.config/zot"
zot_port: 5000
zot_log_dir: "/Users/erichblume/Library/Logs"

# Pull-through cache registries (on-demand sync)
zot_sync_registries:
  - name: docker.io
    url: https://registry-1.docker.io
  - name: ghcr.io
    url: https://ghcr.io
  - name: quay.io
    url: https://quay.io

Zot config.json template (key sections):

{
  "storage": {
    "rootDirectory": "/Users/erichblume/zot"
  },
  "http": {
    "address": "0.0.0.0",
    "port": "5000"
  },
  "extensions": {
    "sync": {
      "enable": true,
      "registries": [
        {
          "urls": ["https://registry-1.docker.io"],
          "content": [{"prefix": "**"}],
          "onDemand": true,
          "tlsVerify": true
        },
        {
          "urls": ["https://ghcr.io"],
          "content": [{"prefix": "**"}],
          "onDemand": true,
          "tlsVerify": true
        },
        {
          "urls": ["https://quay.io"],
          "content": [{"prefix": "**"}],
          "onDemand": true,
          "tlsVerify": true
        }
      ]
    }
  }
}

Two modes of operation:

  1. Pull-through cache (automatic): When you pull registry.tail8d86e.ts.net/docker.io/library/nginx:latest, Zot fetches from Docker Hub and caches locally. Subsequent pulls are local.

  2. Private images (manual push): Push your own images to any path NOT matching a sync prefix:

    # From gilbert (after building)
    podman push myapp:v1 registry.tail8d86e.ts.net/blumeops/myapp:v1
    

Namespace convention:

  • registry.tail8d86e.ts.net/docker.io/* → cached from Docker Hub
  • registry.tail8d86e.ts.net/ghcr.io/* → cached from GHCR
  • registry.tail8d86e.ts.net/quay.io/* → cached from Quay
  • registry.tail8d86e.ts.net/blumeops/* → private images (built by you/Woodpecker)
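
The namespace convention amounts to prefixing a fully qualified upstream reference with the registry hostname. A minimal sketch, assuming references always carry their upstream host; `cache_ref` is an illustrative name, not something defined in the repo:

```shell
# Registry hostname taken from the plan; the function name is illustrative.
REGISTRY="registry.tail8d86e.ts.net"

# Map a fully qualified upstream image reference to its pull-through path.
# Only the three upstreams listed in the sync config are accepted.
cache_ref() {
    case "$1" in
        docker.io/*|ghcr.io/*|quay.io/*)
            echo "$REGISTRY/$1" ;;
        *)
            echo "no sync rule for: $1" >&2
            return 1 ;;
    esac
}
```

For example, `cache_ref docker.io/library/nginx:latest` prints the same path used in the pull-through example above, while a `blumeops/...` reference is rejected because it names a private image rather than an upstream.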

LaunchAgent template (zot.plist.j2):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "...">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>mcquack.eblume.zot</string>
    <key>ProgramArguments</key>
    <array>
        <!-- ABSOLUTE PATH to built binary in ~/code/3rd/zot -->
        <string>{{ zot_binary }}</string>
        <string>serve</string>
        <string>{{ zot_config_dir }}/config.json</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>{{ zot_log_dir }}/mcquack.zot.out.log</string>
    <key>StandardErrorPath</key>
    <string>{{ zot_log_dir }}/mcquack.zot.err.log</string>
</dict>
</plist>

Handlers (handlers/main.yml):

- name: Restart zot
  ansible.builtin.shell: |
    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist 2>/dev/null || true
    launchctl load ~/Library/LaunchAgents/mcquack.eblume.zot.plist
  changed_when: true

Tasks should notify handler on config change:

- name: Deploy zot config
  ansible.builtin.template:
    src: config.json.j2
    dest: "{{ zot_config_dir }}/config.json"
  notify: Restart zot

Testing (after deploying role):

# Check LaunchAgent is running
ssh indri 'launchctl list | grep zot'

# Check zot is responding
ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Expected: {"repositories":[]}

# Check logs for errors
ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log'

# Test pull-through cache via curl (podman not installed until Step 0.8)
ssh indri 'curl -s http://localhost:5000/v2/docker.io/library/alpine/manifests/latest -H "Accept: application/vnd.oci.image.manifest.v1+json"'
# Should return manifest JSON (triggers cache fetch from Docker Hub)
ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Expected: {"repositories":["docker.io/library/alpine"]}

Implementation Details:

  • Changed port from 5000 to 5050 because macOS ControlCenter (AirPlay Receiver) uses port 5000 by default.
  • Fixed sync config: use "content": [{"prefix": "**", "destination": "/{{ registry.name }}"}] instead of "prefix": "{{ registry.name }}/**". The destination rewrites the local path, while prefix ** matches all upstream repos.

Step 0.4: Add Zot to Tailscale Serve

Files to modify:

  • ansible/roles/tailscale_serve/defaults/main.yml

Changes:

# Add to tailscale_serve_services list
- name: svc:registry
  https:
    port: 443
    upstream: http://localhost:5000

Testing:

# Deploy tailscale serve config
mise run provision-indri -- --tags tailscale-serve

# Verify from gilbert (not indri - hairpinning doesn't work)
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["docker.io/library/alpine"]} (from Step 0.3 test)

# Test private image push from gilbert
podman pull alpine:latest
podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1
podman push registry.tail8d86e.ts.net/blumeops/test:v1
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]}

Implementation Details:

  • Changed upstream port from 5000 to 5050 (see Step 0.3 implementation details).
  • After running tailscale serve, the service must be approved in Tailscale admin console at https://login.tailscale.com/admin/services before it becomes accessible.
  • Podman needed on gilbert for testing - added to Brewfile. Requires podman machine init && podman machine start after install.

Step 0.5: Create Zot Metrics Role

New files:

ansible/roles/zot_metrics/
├── defaults/main.yml
├── tasks/main.yml
├── templates/
│   ├── zot-metrics.sh.j2
│   └── zot-metrics.plist.j2
└── handlers/main.yml

Metrics script pattern (zot-metrics.sh.j2):

#!/bin/bash
# Collect Zot registry metrics for Prometheus textfile collector
set -euo pipefail

METRICS_FILE="/opt/homebrew/var/node_exporter/textfile/zot.prom"
TEMP_FILE="${METRICS_FILE}.tmp"

# Check if zot is up
if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then
    echo "zot_up 1" > "$TEMP_FILE"
else
    echo "zot_up 0" > "$TEMP_FILE"
fi

mv "$TEMP_FILE" "$METRICS_FILE"

Note: Start with just zot_up for now. Additional metrics (storage usage, cache stats) can be added later after reviewing zot's metrics endpoint.
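
If more metrics are added later, the write-to-temp-then-rename step is worth factoring out so the textfile collector never reads a half-written file. The following is a sketch only; `write_gauge` and its argument order are assumptions, and the HELP/TYPE lines follow the standard Prometheus text exposition format.

```shell
# Hypothetical helper: write one gauge to a .prom file atomically.
# $1 = target .prom file, $2 = metric name, $3 = help text, $4 = value
write_gauge() {
    tmp="$1.tmp"
    {
        printf '# HELP %s %s\n' "$2" "$3"
        printf '# TYPE %s gauge\n' "$2"
        printf '%s %s\n' "$2" "$4"
    } > "$tmp"
    # Renaming within one directory is atomic, so readers see old or new content.
    mv "$tmp" "$1"
}
```

A call such as `write_gauge "$METRICS_FILE" zot_up "Whether zot answers on /v2/_catalog" 1` would replace the two echo branches in the script pattern.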

Testing:

# Deploy metrics role
mise run provision-indri -- --tags zot_metrics

# Check metrics file exists and is updated
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/zot.prom'
# Expected: zot_up 1

# Verify metrics appear in Prometheus (after a scrape cycle)
curl -s "http://indri:9090/api/v1/query?query=zot_up" | jq '.data.result[0].value[1]'
# Expected: "1"

Step 0.6: Add Zot Log Collection to Alloy

Files to modify:

  • ansible/roles/alloy/defaults/main.yml

Changes: Add to the alloy_mcquack_logs list:

  - path: /Users/erichblume/Library/Logs/mcquack.zot.out.log
    service: zot
    stream: stdout
  - path: /Users/erichblume/Library/Logs/mcquack.zot.err.log
    service: zot
    stream: stderr

Testing:

# Deploy alloy config (handler restarts alloy automatically if config changed)
mise run provision-indri -- --tags alloy

# Wait a minute, then check Loki for zot logs
# In Grafana Explore, query: {service="zot"}

Step 0.7: Update indri-services-check Script

Files to modify:

  • mise-tasks/indri-services-check

Changes to add:

# Add after existing service checks (around line 55)
check_service "zot" "ssh indri 'launchctl list | grep zot | grep -v \"^-\"'"
check_service "zot-metrics" "ssh indri 'launchctl list | grep zot-metrics | grep -v \"^-\"'"

# Add to HTTP endpoints section (around line 65)
check_http "Zot Registry" "http://indri:5000/v2/_catalog"

# Add metrics file check
check_service "Zot metrics" "ssh indri 'test -f /opt/homebrew/var/node_exporter/textfile/zot.prom'"

Testing:

# Run the health check
mise run indri-services-check

# Expected output includes:
# zot...               OK
# zot-metrics...       OK
# Zot Registry...      OK
# Zot metrics...       OK

Implementation Details:

  • Used Tailscale service URL (https://registry.tail8d86e.ts.net/v2/_catalog) instead of internal endpoint to verify full path works.
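
The plan calls `check_service` with a label and a command string but does not show the helper itself. A hypothetical stand-in, written only to illustrate the calling convention and the "label... OK" output shape (the real script may differ):

```shell
# Hypothetical stand-in for the script's check_service helper.
# $1 = label, $2 = command string evaluated by the shell.
check_service() {
    printf '%-22s' "$1..."
    if eval "$2" >/dev/null 2>&1; then
        echo "OK"
    else
        echo "FAIL"
        return 1
    fi
}
```

Because the command is passed as a single string and run through `eval`, nested quoting such as the escaped `\"^-\"` in the checks above has to survive one extra round of shell parsing.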

Step 0.8: Install and Configure Podman on Indri

New files:

ansible/roles/podman/
├── tasks/main.yml
└── handlers/main.yml

Tasks (tasks/main.yml):

- name: Install podman via homebrew
  community.general.homebrew:
    name: podman
    state: present

- name: Initialize podman machine (if not exists)
  ansible.builtin.command:
    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 220
  register: podman_init
  changed_when: podman_init.rc == 0
  failed_when: podman_init.rc not in [0, 125]  # 125 = already exists

- name: Start podman machine
  ansible.builtin.command:
    cmd: podman machine start
  register: podman_start
  changed_when: "'started successfully' in podman_start.stdout"
  failed_when: podman_start.rc not in [0, 125]  # 125 = already running

Testing:

# Deploy podman role
mise run provision-indri -- --tags podman

# Verify podman is working
ssh indri 'podman info'
ssh indri 'podman run --rm hello-world'

Implementation Details:

  • KNOWN ISSUE: podman machine init and podman machine start have reliability issues when run via Ansible/SSH. The machine sometimes gets stuck in "Starting" state due to a race condition (see https://github.com/containers/podman/issues/16945). Apple Hypervisor may also require GUI session context.
  • WORKAROUND: If the machine fails to start via Ansible, manually run on indri:
    podman machine rm -f podman-machine-default
    podman machine init --cpus 4 --memory 8192 --disk-size 220
    podman machine start
    
  • LaunchAgent approach was attempted but didn't resolve the issue reliably.
  • TODO: Investigate proper automation solution for reliable podman machine management.

Step 0.9: Install and Configure Minikube

New files:

ansible/roles/minikube/
├── defaults/main.yml
├── tasks/main.yml
└── handlers/main.yml

Defaults:

minikube_cpus: 4
minikube_memory: 8192
minikube_disk_size: "200g"
minikube_driver: podman
minikube_container_runtime: cri-o

Note on storage: The disk-size is for node-local storage only (container images, emptyDir, local PVs). Pods can also mount external storage:

  • hostPath - indri filesystem (e.g., ~/transmission/ for kiwix ZIM files)
  • NFS - sifaka volumes (Synology supports NFS natively, easiest for k8s)
  • SMB/CIFS - requires csi-driver-smb; sifaka currently uses SMB for desktop mounts

Tasks:

- name: Install minikube via homebrew
  community.general.homebrew:
    name: minikube
    state: present

- name: Check if minikube cluster exists
  ansible.builtin.command:
    cmd: "minikube status --format={% raw %}{{.Host}}{% endraw %}"  # raw block keeps Jinja2 from parsing the Go template
  register: minikube_status
  changed_when: false
  failed_when: false

- name: Start minikube cluster
  ansible.builtin.command:
    cmd: >
      minikube start
      --driver={{ minikube_driver }}
      --container-runtime={{ minikube_container_runtime }}
      --cpus={{ minikube_cpus }}
      --memory={{ minikube_memory }}
      --disk-size={{ minikube_disk_size }}
  when: minikube_status.rc != 0 or 'Running' not in minikube_status.stdout

Testing:

# Deploy minikube role
mise run provision-indri -- --tags minikube

# Verify cluster is running
ssh indri 'minikube status'
# Expected: host: Running, kubelet: Running, apiserver: Running

# Test kubectl access from indri
ssh indri 'kubectl get nodes'
# Expected: minikube   Ready    control-plane   ...

Implementation Details:

  • Changed minikube_memory from 8192 to 7800 because podman machine reports slightly less available memory (7908MB) due to VM overhead. Minikube rejects memory requests exceeding what podman reports.
  • Deployed with Kubernetes v1.34.0 and CRI-O 1.24.6.
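
The memory adjustment can be expressed as a clamp: request what you want, but never more than the podman machine reports minus some headroom. A minimal sketch, assuming all values are integers in MB; `clamp_memory` is an illustrative name, and the 108 MB margin in the example is chosen only because it reproduces the 7800 figure above.

```shell
# Clamp a requested memory size (MB) to what the podman machine reports,
# minus an optional safety margin.
# $1 = requested MB, $2 = available MB, $3 = margin MB (default 0)
clamp_memory() {
    requested=$1
    available=$2
    margin=${3:-0}
    limit=$((available - margin))
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}
```

With the numbers from this step, `clamp_memory 8192 7908 108` prints 7800, and a request that already fits (for example 4096) is returned unchanged.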

Step 0.10: Configure Kubeconfig on Gilbert

Goal: Enable kubectl and k9s on gilbert to connect to the minikube cluster running on indri.

Considerations:

  • Minikube runs inside a podman VM on indri, so the API server isn't directly exposed on indri's network interface
  • Admin users have full Tailscale access to indri via autogroup:admin → * → *
  • Be careful not to overwrite existing work kubeconfigs

Possible approaches:

  1. SSH tunneling to forward the API server port
  2. minikube tunnel running on indri (exposes LoadBalancer services)
  3. Configure minikube with --apiserver-names=indri at cluster creation time
  4. Use kubectl via SSH wrapper: ssh indri kubectl ...

Verification:

# From gilbert, these should work:
kubectl get nodes
kubectl get namespaces
k9s  # Should show the minikube cluster

The exact approach will be determined during implementation based on what works best with the podman driver.

Implementation Details:

Chose Option 3: Recreate cluster with --apiserver-names after researching alternatives:

  1. SSH tunneling - Requires keeping a tunnel running or complex on-demand setup
  2. SOCKS5 proxy with kubeconfig proxy-url - Kubeconfig supports proxy-url: socks5://localhost:1080 per-context, but still requires managing the proxy
  3. --apiserver-names + --listen-address - Native minikube support, cleanest solution

Cluster Setup: Recreated the minikube cluster with additional flags:

minikube delete
minikube start \
  --driver=podman \
  --container-runtime=cri-o \
  --cpus=4 --memory=7800 --disk-size=200g \
  --apiserver-names=indri \
  --listen-address=0.0.0.0

  • --apiserver-names=indri adds "indri" to the API server certificate SAN
  • --listen-address=0.0.0.0 tells podman to expose the API port on all interfaces
  • API server port is dynamic (check with kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}" on indri)
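
Because the port can change whenever the cluster is recreated, the server URL used on gilbert has to be rebuilt from whatever indri reports. A small sketch, assuming the reported URL has the form `https://host:port` with no path; `remote_server_url` is an illustrative name.

```shell
# Rewrite a local API server URL so it points at a remote hostname,
# keeping whatever port minikube assigned.
# $1 = server URL as printed on indri, $2 = hostname to use from gilbert
remote_server_url() {
    port=${1##*:}   # strip everything up to the last colon
    echo "https://$2:$port"
}
```

Feeding it the jsonpath output from indri together with the name `indri` yields the value to pass to `kubectl config set-cluster --server=...`.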

Credential Management with 1Password:

Rather than copying private keys between machines, credentials are stored in 1Password and fetched on-demand using kubectl's exec credential plugin. This mirrors the 1Password SSH agent pattern for biometric-protected key access.

  1. Store credentials in 1Password (vault: vg6xf6vvfmoh5hqjjhlhbeoaie, item: 3jo4f2hnzvwfmamudfsbbbec7e):

    • client-cert - Contents of ~/.minikube/profiles/minikube/client.crt (text field)
    • client-key - Contents of ~/.minikube/profiles/minikube/client.key (text field)
    • ca-cert - Contents of ~/.minikube/ca.crt (text field, not secret but stored for convenience)
  2. Created credential helper script at bin/kubectl-credential-1password:

    #!/bin/bash
    # Fetches client cert/key from 1Password, outputs ExecCredential JSON
    # Usage: kubectl-credential-1password <vault-id> <item-id> <cert-field> <key-field>
    

    Symlinked to ~/.local/bin/kubectl-credential-1password

  3. Kubeconfig setup on gilbert:

    # Store CA cert locally (not secret - public key for server verification)
    mkdir -p ~/.kube/minikube-indri
    op --vault <vault> item get <item> --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt
    
    # Configure cluster
    kubectl config set-cluster minikube-indri \
      --server=https://indri:<port> \
      --certificate-authority=/Users/eblume/.kube/minikube-indri/ca.crt
    
    # Configure credentials with exec plugin
    kubectl config set-credentials minikube-indri \
      --exec-api-version=client.authentication.k8s.io/v1beta1 \
      --exec-command=kubectl-credential-1password \
      --exec-arg=<vault-id> \
      --exec-arg=<item-id> \
      --exec-arg=client-cert \
      --exec-arg=client-key
    
    # Create context
    kubectl config set-context minikube-indri \
      --cluster=minikube-indri \
      --user=minikube-indri
    
  4. Usage:

    kubectl --context=minikube-indri get nodes
    # or
    kubectl config use-context minikube-indri
    kubectl get nodes
    

Security Notes:

  • Client private key never stored on disk - fetched from 1Password on each kubectl command
  • CA cert stored on disk (not secret - it's a public key for server verification)
  • 1Password biometric/password prompt required for credential access
  • op returns text fields wrapped in double quotes; sed 's/^"//; s/"$//' strips them
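
The helper script is shown above only as a header comment. The part that matters to kubectl is the JSON it prints: an ExecCredential object whose `status` carries the PEM client certificate and key. The sketch below covers just that formatting step and is not the repo's actual script; both function names are illustrative, and the apiVersion matches the `--exec-api-version` used in the kubeconfig setup.

```shell
# Escape a PEM blob for embedding in a JSON string.
# PEM is base64 plus dashes and spaces, so only newlines need escaping.
json_escape_pem() {
    printf '%s\n' "$1" | awk '{ printf "%s\\n", $0 }'
}

# Print an ExecCredential document for client-certificate auth.
# $1 = PEM client certificate, $2 = PEM client key
exec_credential_json() {
    printf '{"apiVersion":"client.authentication.k8s.io/v1beta1","kind":"ExecCredential","status":{"clientCertificateData":"%s","clientKeyData":"%s"}}\n' \
        "$(json_escape_pem "$1")" "$(json_escape_pem "$2")"
}
```

A complete helper would fetch each field with op (stripping the surrounding quotes as in the CA cert step) and pass the two values to `exec_credential_json`, so the key only ever exists in process memory.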


Step 0.11: Add Minikube to indri-services-check

Files to modify:

  • mise-tasks/indri-services-check

Changes:

# Add new section for Kubernetes
echo ""
echo "Kubernetes cluster:"
check_service "minikube" "ssh indri 'minikube status --format={{.Host}} | grep -q Running'"
check_service "k8s-apiserver" "ssh indri 'kubectl get --raw /healthz'"

Testing:

mise run indri-services-check

# Expected output includes:
# Kubernetes cluster:
# minikube...          OK
# k8s-apiserver...     OK

Implementation Notes:

  • Added a third check k8s-apiserver (remote) that verifies kubectl access from gilbert, not just via SSH to indri. This ensures the 1Password credential flow and remote API server access are working.
  • The remote check uses both --kubeconfig and --context flags explicitly since the script runs in bash (not fish) and doesn't inherit the KUBECONFIG environment variable from fish config.

Step 0.12: Create Zettelkasten Documentation

New files:

  • ~/code/personal/zk/zot.md
  • ~/code/personal/zk/minikube.md

Files to update:

  • ~/code/personal/zk/1767747119-YCPO.md (main blumeops card)

Updates to main blumeops card:

  1. Add to Device Tags table: | tag:registry | indri | Container registry access |

  2. Add to Services table:

    | Registry | https://registry.tail8d86e.ts.net | OCI container registry (Zot) | zot |
    | Kubernetes | https://indri: | Minikube cluster | minikube |

  3. Add to Port Map (Indri) table:

    | 5050 | Zot | HTTP | localhost | Container registry |
    | | K8s API | HTTPS | 0.0.0.0 | Minikube API server |

  4. Add new section Remote Kubernetes Access:

    ## Remote Kubernetes Access (from Gilbert)
    
    The minikube cluster on indri is accessible from gilbert via direct connection.
    Cluster was created with `--apiserver-names=indri --listen-address=0.0.0.0`.
    
    **Fish abbreviations** (in `~/.config/fish/config.fish`):
    - `ki` → `kubectl --context=minikube-indri`
    - `k9i` → `k9s --context=minikube-indri`
    - `k9` → `k9s`
    
    ```bash
    # Quick access via abbreviations
    ki get nodes
    k9i
    
    # Or explicitly set context
    kubectl config use-context minikube-indri
    kubectl get nodes
    ```
    
    

Template for zot.md:

---
id: zot
aliases:
  - zot
  - container-registry
tags:
  - blumeops
---

# Zot Registry Management Log

Zot is an OCI-native container registry running on Indri, providing:
1. Pull-through cache for Docker Hub, GHCR, Quay (avoids rate limits)
2. Private image storage for custom-built containers

## Service Details

- URL: https://registry.tail8d86e.ts.net
- Local port: 5050
- Data directory: ~/zot
- Config: ~/.config/zot/config.json
- Managed via: mcquack LaunchAgent

## Namespace Convention

| Path | Source |
|------|--------|
| `registry.../docker.io/*` | Cached from Docker Hub |
| `registry.../ghcr.io/*` | Cached from GHCR |
| `registry.../quay.io/*` | Cached from Quay |
| `registry.../blumeops/*` | Private images (yours) |

## Useful Commands

\`\`\`bash
# List all images
curl -s http://localhost:5050/v2/_catalog | jq

# Pull via cache (from indri or k8s)
podman pull localhost:5050/docker.io/library/nginx:latest

# Build and push private image (from gilbert)
podman build -t registry.tail8d86e.ts.net/blumeops/myapp:v1 .
podman push registry.tail8d86e.ts.net/blumeops/myapp:v1

# Check service status
launchctl list | grep zot

# View logs
tail -f ~/Library/Logs/mcquack.zot.err.log
\`\`\`

## Log

### [DATE]
- Initial setup for k8s migration Phase 0

Template for minikube.md:

---
id: minikube
aliases:
  - minikube
  - kubernetes
  - k8s
tags:
  - blumeops
---

# Minikube Management Log

Minikube provides a single-node Kubernetes cluster on Indri for running containerized services.

## Cluster Details

- Driver: podman (rootless)
- Container runtime: CRI-O
- Kubernetes version: v1.34.0
- Resources: 4 CPUs, 7800MB RAM, 200GB disk
- API server: https://indri:<port> (accessible from gilbert via Tailscale)

## Remote Access from Gilbert

Cluster was created with `--apiserver-names=indri --listen-address=0.0.0.0` to allow remote kubectl access.

\`\`\`bash
# Switch context
kubectl config use-context minikube-indri

# Verify
kubectl get nodes
kubectl get namespaces

# Use k9s
k9s --context minikube-indri
\`\`\`

## Useful Commands (on indri)

\`\`\`bash
# Cluster status
minikube status

# Start/stop cluster
minikube start
minikube stop

# Access dashboard
minikube dashboard

# SSH into node
minikube ssh

# View logs
minikube logs
\`\`\`

## Podman Machine (prerequisite)

Minikube uses podman as its driver (the container runtime inside the node is CRI-O). The podman machine must be running:

\`\`\`bash
# Check podman machine
podman machine list

# Start if needed
podman machine start
\`\`\`

## Log

### [DATE]
- Initial cluster setup for k8s migration Phase 0
- Configured for remote access with --apiserver-names=indri

Implementation Notes:

  • Created zot.md and minikube.md in ~/code/personal/zk/
  • Updated 1767747119-YCPO.md (main blumeops card) with all specified changes
  • Added 1Password credential plugin reference to minikube docs
  • K8s API port is 39535 (dynamically assigned by minikube, may change on cluster recreation)

Step 0.13: Update Main Playbook

Files to modify:

  • ansible/playbooks/indri.yml

Changes:

# Add new roles to the roles list
- role: podman
  tags: podman
- role: zot
  tags: zot
- role: zot_metrics
  tags: zot_metrics
- role: minikube
  tags: minikube

Implementation Notes:

  • Roles were added incrementally during Steps 0.3, 0.5, 0.8, and 0.9
  • All four roles (zot, zot_metrics, podman, minikube) confirmed present in indri.yml

Phase 0 Verification Checklist

Run after completing all steps:

# 1. Full service health check
mise run indri-services-check
# All services should show OK, including new ones

# 2. Registry functionality - pull-through cache
ssh indri 'podman pull localhost:5050/docker.io/library/alpine:latest'
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["docker.io/library/alpine"]}

# 3. Registry functionality - private image push (from gilbert)
podman pull alpine:latest
podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1
podman push registry.tail8d86e.ts.net/blumeops/test:v1
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]}

# 4. Kubernetes cluster
ssh indri 'minikube status'
ssh indri 'kubectl get nodes'
kubectl get nodes  # from gilbert

# 5. Metrics in Prometheus
curl -s "http://indri:9090/api/v1/query?query=zot_up"
# Expected: value = 1

# 6. Logs in Loki
# In Grafana Explore: {service="zot"}
# Should see zot log entries

# 7. k9s from gilbert
k9s
# Should connect and show minikube cluster

Phase 0 Rollback

If something goes wrong:

# Stop and remove minikube
ssh indri 'minikube stop && minikube delete'

# Stop and remove zot
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist'
ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.zot.plist'

# Remove podman machine
ssh indri 'podman machine stop && podman machine rm -f'

# Remove from tailscale serve
ssh indri 'tailscale serve --service svc:registry reset'

# Remove tags from Pulumi (first revert the policy.hujson and __main__.py changes)
mise run tailnet-up

# Revert ansible playbook changes
git checkout ansible/playbooks/indri.yml
git checkout ansible/roles/tailscale_serve/defaults/main.yml
git checkout ansible/roles/alloy/defaults/main.yml

# Remove new roles
rm -rf ansible/roles/{zot,zot_metrics,podman,minikube}

# Remove zk cards
rm ~/code/personal/zk/{zot,minikube}.md

Step 0.14: Expose K8s API as Tailscale Service (Added Post-Completion)

Note: This step was added after Phase 0 was otherwise complete, to provide a stable, named endpoint for the Kubernetes API server.

Goal: Expose the minikube API server as k8s.tail8d86e.ts.net instead of using indri:<dynamic-port>.

Current state:

  • Minikube API server on port 39535 (dynamic, could change on cluster recreation)
  • Accessed via https://indri:39535
  • Certificate SANs include "indri"

Target state:

  • Stable Tailscale service at k8s.tail8d86e.ts.net:443
  • Fixed API server port (6443, the k8s standard)
  • Certificate SANs include both hostnames for compatibility

Step 0.14.1: Update Pulumi ACLs

Files to modify:

  • pulumi/policy.hujson
  • pulumi/__main__.py

Changes to policy.hujson:

  1. Add tag to tagOwners:
"tag:k8s-api": ["autogroup:admin", "tag:blumeops"],
  2. Update Erich's test case accept list to include k8s-api:
"accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443", "tag:k8s-api:443"],
  3. Update Allison's deny list:
"deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443", "tag:k8s-api:443"],

Changes to __main__.py:

  • Add "tag:k8s-api" to indri's DeviceTags

Testing:

mise run tailnet-preview   # Review changes
mise run tailnet-up        # Apply changes

Step 0.14.2: Create Tailscale Service in Admin Console (MANUAL)

CRITICAL: Do this BEFORE running ansible that calls tailscale serve

  1. Go to https://login.tailscale.com/admin/services
  2. Create service k8s with:
    • Port: 443 (TCP)
    • Host: indri

Step 0.14.3: Recreate Minikube Cluster

The cluster needs to be recreated to:

  1. Add k8s.tail8d86e.ts.net to the API server certificate SANs
  2. Fix the API server port to 6443 (standard k8s port)

On indri:

# Stop and delete existing cluster
minikube stop
minikube delete

# Recreate with new settings
minikube start \
  --driver=podman \
  --container-runtime=cri-o \
  --cpus=4 --memory=7800 --disk-size=200g \
  --apiserver-names=k8s.tail8d86e.ts.net,indri \
  --apiserver-port=6443 \
  --listen-address=0.0.0.0

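# Check the API server URL recorded in kubeconfig
# (this shows the endpoint only; it does not display the certificate SANs)
kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
# Expected: https://127.0.0.1:6443 or similar (the host port may differ with the podman driver)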

# Verify cluster is running
minikube status
kubectl get nodes
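
The server URL alone does not prove that the new names made it into the certificate. A minimal sketch of a SAN check follows. It assumes minikube keeps the API server certificate at ~/.minikube/profiles/minikube/apiserver.crt (an assumption about the default profile layout; adjust if yours differs), and the helper name sans_include is hypothetical.

```shell
# sans_include NAME...: reads `openssl x509 -noout -text` output on stdin and
# succeeds only if every NAME is present as a DNS subject alternative name.
sans_include() {
  local text name
  text="$(cat)"
  for name in "$@"; do
    # Split the comma-separated SAN list into one trimmed entry per line,
    # then require an exact match so "indri" does not match "indri2".
    if ! printf '%s\n' "$text" | tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -Fxq "DNS:$name"; then
      echo "missing SAN: $name" >&2
      return 1
    fi
  done
}
```

Possible usage on indri: openssl x509 -noout -text -in ~/.minikube/profiles/minikube/apiserver.crt | sans_include k8s.tail8d86e.ts.net indri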

Update ansible role defaults (ansible/roles/minikube/defaults/main.yml):

minikube_apiserver_names:
  - k8s.tail8d86e.ts.net
  - indri
minikube_apiserver_port: 6443

Step 0.14.4: Add K8s Service to Tailscale Serve

Files to modify:

  • ansible/roles/tailscale_serve/defaults/main.yml

Add to services list:

- name: svc:k8s
  tcp:
    port: 443
    upstream: tcp://localhost:6443

Note: Using TCP passthrough (not HTTPS termination) because k8s uses mTLS authentication.

Deploy:

mise run provision-indri -- --tags tailscale-serve

Step 0.14.5: Update 1Password Credentials

After cluster recreation, the client certificates have changed.

On indri, get the new credentials:

# Display new certificates (copy to 1Password)
cat ~/.minikube/profiles/minikube/client.crt
cat ~/.minikube/profiles/minikube/client.key
cat ~/.minikube/ca.crt

In 1Password (vault: vg6xf6vvfmoh5hqjjhlhbeoaie, item: 3jo4f2hnzvwfmamudfsbbbec7e):

  • Update client-cert field with new certificate
  • Update client-key field with new key
  • Update ca-cert field with new CA certificate

Step 0.14.6: Update Kubeconfig on Gilbert

Update CA certificate:

# Fetch new CA cert from 1Password
op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 3jo4f2hnzvwfmamudfsbbbec7e --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt

Update kubeconfig (~/.kube/minikube-indri/config.yml):

clusters:
- cluster:
    certificate-authority: /Users/eblume/.kube/minikube-indri/ca.crt
    server: https://k8s.tail8d86e.ts.net  # Changed from https://indri:39535
  name: minikube-indri

Verification:

# Test connection via new hostname
kubectl --context=minikube-indri get nodes

# Test via abbreviation
ki get nodes
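
If kubectl fails here, it can help to first confirm that the kubeconfig edit actually took. A small sketch, assuming the file layout shown in this step; the helper name kubeconfig_servers is hypothetical, and the awk match is a plain-text scan rather than a real YAML parse.

```shell
# kubeconfig_servers FILE: prints every `server:` URL found in a kubeconfig.
# Plain-text scan (not a YAML parser); trailing comments are ignored because
# only the second whitespace-separated field is printed.
kubeconfig_servers() {
  awk '$1 == "server:" { print $2 }' "$1"
}
```

Running kubeconfig_servers ~/.kube/minikube-indri/config.yml should print https://k8s.tail8d86e.ts.net and nothing that still mentions indri:39535.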

Step 0.14.7: Update Documentation

Files to update:

  • ~/code/personal/zk/minikube.md - Update API server URL and port info
  • ~/code/personal/zk/1767747119-YCPO.md - Update Services table and Port Map

Changes to blumeops card:

  1. Update Services table: | Kubernetes | https://k8s.tail8d86e.ts.net | Minikube cluster | minikube |

  2. Update Port Map: | 6443 | K8s API | HTTPS/TCP | 0.0.0.0 | Minikube API server (via Tailscale) |

  3. Add tag:k8s-api to Device Tags table


Step 0.14.8: Update indri-services-check

Files to modify:

  • mise-tasks/indri-services-check

Changes:

# Remote k8s check: no edit needed, because it reads the kubeconfig,
# which now points to k8s.tail8d86e.ts.net
check_service "k8s-apiserver (remote)" "kubectl --kubeconfig=$HOME/.kube/minikube-indri/config.yml --context=minikube-indri get --raw /healthz"

Step 0.14 Verification

# 1. Service health check
mise run indri-services-check
# All services should be OK

# 2. Test k8s access via Tailscale hostname
curl -k https://k8s.tail8d86e.ts.net/healthz
# Expected: ok (a 401 or 403 response is also fine: the API server is reachable but rejects unauthenticated requests)

# 3. kubectl via Tailscale
ki get nodes
ki get namespaces

# 4. k9s via Tailscale
k9i
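
The unauthenticated curl check in item 2 has more than one acceptable outcome, which is easy to misread. A hedged sketch of how the HTTP status could be interpreted; classify_healthz is a hypothetical helper name, and 000 is the code curl reports when no HTTP response was received.

```shell
# classify_healthz CODE: interprets the HTTP status returned by /healthz.
# 200 means healthy; 401/403 mean the API server answered but refused the
# unauthenticated request, which still proves the Tailscale path works.
classify_healthz() {
  case "$1" in
    200)     echo "healthy" ;;
    401|403) echo "reachable (authentication required)" ;;
    000)     echo "unreachable"; return 1 ;;
    *)       echo "unexpected status $1"; return 1 ;;
  esac
}
```

Possible usage: classify_healthz "$(curl -sk -o /dev/null -w '%{http_code}' https://k8s.tail8d86e.ts.net/healthz)"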

Phase 0 Follow-up: Grafana Dashboards

After Phase 0 is running and stable, create monitoring dashboards:

Zot Dashboard (ansible/roles/grafana/files/dashboards/zot.json):

  1. Check what metrics zot exposes: ssh indri 'curl -s http://localhost:5000/metrics'
  2. Review community dashboards for inspiration (copy permitted if license allows)
  3. Create dashboard with available metrics (at minimum: zot_up)
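
For step 1, the raw /metrics output is long, so a list of distinct metric names may be a quicker starting point for choosing panels. A minimal sketch, assuming zot serves the standard Prometheus text exposition format; metric_names is a hypothetical helper name.

```shell
# metric_names: reads Prometheus text exposition format on stdin and prints
# the unique metric names, one per line, skipping HELP/TYPE comment lines.
metric_names() {
  grep -v '^#' | sed -E 's/[{ ].*$//' | grep -v '^$' | sort -u
}
```

Possible usage: ssh indri 'curl -s http://localhost:5000/metrics' | metric_names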

Minikube Dashboard (ansible/roles/grafana/files/dashboards/minikube.json):

  1. Deploy kube-state-metrics if needed for additional cluster metrics
  2. Review what Prometheus can scrape from the cluster
  3. Review community dashboards for inspiration (copy permitted if license allows)
  4. Create dashboard with relevant panels (node usage, pod counts, etc.)

New Files Summary

| File | Purpose |
|------|---------|
| ansible/roles/zot/ | Zot registry deployment |
| ansible/roles/zot_metrics/ | Metrics collection for Zot |
| ansible/roles/podman/ | Podman installation and setup |
| ansible/roles/minikube/ | Minikube cluster setup |
| ~/code/personal/zk/zot.md | Zot management documentation |
| ~/code/personal/zk/minikube.md | Minikube management documentation |

Modified Files Summary

| File | Changes |
|------|---------|
| pulumi/policy.hujson | Add tag:registry |
| ansible/playbooks/indri.yml | Add new roles |
| ansible/roles/tailscale_serve/defaults/main.yml | Add svc:registry |
| ansible/roles/alloy/templates/config.alloy.j2 | Add zot log collection |
| mise-tasks/indri-services-check | Add zot and k8s checks |

Phase 1: Kubernetes Infrastructure

Goal: Tailscale operator + CloudNativePG operator

Steps

  1. Update Pulumi ACLs for k8s workloads

    Add tag:k8s to pulumi/policy.hujson - this tag is for k8s workloads that need to access other services (e.g., Woodpecker CI pushing to registry).

    Changes to tagOwners:

    "tag:k8s": ["autogroup:admin", "tag:blumeops"],
    

    Add grant for k8s→registry access:

    // k8s workloads (e.g., Woodpecker CI) can push/pull from registry
    {
    	"src": ["tag:k8s"],
    	"dst": ["tag:registry"],
    	"ip":  ["tcp:443"],
    },
    

    Add test case:

    {
    	"src":    "tag:k8s",
    	"accept": ["tag:registry:443"],
    },
    
    Preview, then apply the changes:

    mise run tailnet-preview && mise run tailnet-up
    
  2. Create Tailscale OAuth client

    • Scopes: Devices Core, Auth Keys, Services write
    • Tag: tag:k8s-operator
    • Store in 1Password
  3. Deploy Tailscale Kubernetes Operator

    helm repo add tailscale https://pkgs.tailscale.com/helmcharts
    helm install tailscale-operator tailscale/tailscale-operator \
      --namespace tailscale-system --create-namespace \
      --set oauth.clientId=$CLIENT_ID \
      --set oauth.clientSecret=$CLIENT_SECRET
    
  4. Deploy CloudNativePG operator

    kubectl apply --server-side -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml
    
  5. Create PostgreSQL cluster

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: blumeops-pg
      namespace: databases
    spec:
      instances: 1
      storage:
        size: 10Gi
        storageClass: standard
      monitoring:
        enablePodMonitor: true
    
  6. Update Alloy config

    • Add kubernetes_sd_configs for k8s metrics
    • Scrape operator metrics

New Files

  • ansible/k8s/operators/ - Operator manifests
  • ansible/k8s/databases/ - PostgreSQL cluster

Verification

kubectl get pods -n tailscale-system
kubectl get pods -n cnpg-system
kubectl get cluster -n databases

Phase 2: Grafana Migration (Pilot)

Goal: Migrate Grafana as lowest-risk pilot service

Steps

  1. Deploy Grafana via Helm

    • Copy datasource config from existing role
    • Copy dashboards from ansible/roles/grafana/files/dashboards/
    • Point to indri Prometheus/Loki (http://indri:9090, http://indri:3100)
  2. Configure Tailscale LoadBalancer

    service:
      type: LoadBalancer
      loadBalancerClass: tailscale
    
  3. Verify all dashboards work

  4. Update tailscale_serve - remove grafana entry

  5. Stop brew grafana: brew services stop grafana

Verification


Phase 3: PostgreSQL Migration

Goal: Migrate miniflux database to CloudNativePG

Steps

  1. Create databases and users in k8s PostgreSQL

    • miniflux database/user
    • borgmatic read-only user
  2. Export from brew PostgreSQL

    pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql
    
  3. Expose k8s PostgreSQL via Tailscale

    • Service with loadBalancerClass: tailscale
    • Tag: svc:pg-k8s
  4. Import data

    psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql
    
  5. Update borgmatic config

    • Change hostname to k8s PostgreSQL
  6. Verify data integrity
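
Step 6 does not say how integrity is checked. One hedged option is to compare per-table row counts between the brew instance and the k8s instance. The table list below is an assumption about the Miniflux schema, and row_counts and counts_match are hypothetical helper names. Matching counts are a sanity check, not proof that the data is identical.

```shell
# row_counts HOST: prints "table|count" lines for a few Miniflux tables.
# The table names are an assumption about the Miniflux schema.
row_counts() {
  psql -h "$1" -U miniflux -d miniflux -At -c "
    SELECT 'users', count(*) FROM users
    UNION ALL SELECT 'categories', count(*) FROM categories
    UNION ALL SELECT 'feeds', count(*) FROM feeds
    UNION ALL SELECT 'entries', count(*) FROM entries
    ORDER BY 1;"
}

# counts_match OLD NEW: succeeds when both count listings are identical,
# otherwise prints both listings to stderr and fails.
counts_match() {
  if [ "$1" = "$2" ]; then
    return 0
  fi
  printf 'source:\n%s\ndestination:\n%s\n' "$1" "$2" >&2
  return 1
}
```

Possible usage: counts_match "$(row_counts localhost)" "$(row_counts pg-k8s.tail8d86e.ts.net)"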

Rollback

Keep brew PostgreSQL running until Phase 4 is verified.


Phase 4: Miniflux Migration

Goal: Migrate Miniflux to k8s

Steps

  1. Deploy Miniflux

    image: ghcr.io/miniflux/miniflux:latest
    env:
      DATABASE_URL: from secret
      RUN_MIGRATIONS: "1"
    
  2. Configure Tailscale LoadBalancer - tag: svc:feed

  3. Update Alloy log collection - add k8s namespace

  4. Verify: login, feeds refresh, API works

  5. Stop brew miniflux: brew services stop miniflux
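
Step 1 takes DATABASE_URL from a secret without showing its value. CloudNativePG exposes the primary through a service named after the cluster with an -rw suffix, so for the blumeops-pg cluster in the databases namespace the in-cluster host would be blumeops-pg-rw.databases.svc. A minimal sketch of building the URL; the helper name, the use of sslmode=require, and a password made only of URL-safe characters are all assumptions.

```shell
# miniflux_database_url PASSWORD: prints a PostgreSQL connection URL for the
# miniflux user and database on the CloudNativePG read-write service.
# Assumes the password contains only URL-safe characters (no escaping is done).
miniflux_database_url() {
  printf 'postgres://miniflux:%s@blumeops-pg-rw.databases.svc:5432/miniflux?sslmode=require\n' "$1"
}
```

The result could then be stored with something like kubectl create secret generic miniflux-db --namespace miniflux --from-literal=DATABASE_URL="$(miniflux_database_url "$PASSWORD")", where the secret name and namespace are placeholders.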


Phase 5: devpi Migration

Goal: Migrate devpi to k8s

Steps

  1. Build devpi container

    • Dockerfile with devpi-server + devpi-web
    • Push to local Zot registry
  2. Deploy as StatefulSet

    • PVC for data (50Gi)
    • Migrate existing data (excluding PyPI cache)
  3. Configure Tailscale LoadBalancer - tag: svc:pypi

  4. Update pip.conf on gilbert

  5. Stop mcquack devpi


Phase 6: Kiwix Migration

Goal: Migrate kiwix-serve to k8s

Steps

  1. Create NFS/hostPath PV for ZIM files

    • Point to transmission download directory
    • ReadOnlyMany access
  2. Deploy Kiwix

    image: ghcr.io/kiwix/kiwix-serve:3.8.1
    args: ["/data/*.zim"]
    
  3. Configure Tailscale LoadBalancer - tag: svc:kiwix

  4. Stop mcquack kiwix-serve


Phase 7: Forgejo Migration (Highest Risk)

Goal: Migrate Forgejo to k8s

Pre-Migration Checklist

  • Full borgmatic backup verified
  • Manual backup of /opt/homebrew/var/forgejo
  • Document SSH keys and webhooks

Steps

  1. Deploy Forgejo via Helm

    helm install forgejo oci://code.forgejo.org/forgejo-helm/forgejo \
      --namespace forgejo --create-namespace
    
  2. Migrate data

    • Stop brew forgejo
    • Copy data to PVC
    • Start k8s forgejo
  3. Configure Tailscale services

    • HTTPS 443 via LoadBalancer
    • SSH port 22 (TCP proxy)
  4. Verify all repositories accessible

Rollback

Restore brew forgejo and tailscale serve config


Phase 8: CI/CD (Woodpecker)

Goal: Deploy Woodpecker CI integrated with Forgejo

Steps

  1. Create Forgejo OAuth application

  2. Deploy Woodpecker Server + Agent

  3. Configure Tailscale LoadBalancer - tag: svc:ci

  4. Test pipeline - create .woodpecker.yaml in test repo


Phase 9: Cleanup

Goal: Remove deprecated services, harden system

Steps

  1. Stop/remove unused brew services

    • postgresql@18, grafana, miniflux, forgejo
  2. Update ansible playbook

    • Remove migrated service roles
    • Add k8s deployment references
  3. Configure Velero backups (optional)

    • Install with MinIO on sifaka
    • Schedule daily cluster backups
  4. Update zk documentation

    • New architecture
    • Runbooks
    • DR procedures

Critical Files

| File | Purpose |
|------|---------|
| ansible/playbooks/indri.yml | Main playbook - add k8s roles, remove migrated services |
| ansible/roles/tailscale_serve/defaults/main.yml | Transition services to Tailscale operator |
| pulumi/policy.hujson | Add tags: k8s, registry, ci |
| ansible/roles/borgmatic/defaults/main.yml | Update PostgreSQL endpoint |
| mise-tasks/indri-services-check | Add k8s health checks |

New Directory Structure

ansible/
  k8s/
    operators/
      tailscale-operator.yaml
      cloudnative-pg.yaml
    databases/
      blumeops-pg.yaml
    apps/
      grafana/
      miniflux/
      forgejo/
      devpi/
      kiwix/
      woodpecker/
  roles/
    zot/           # NEW
    podman/        # NEW
    minikube/      # NEW

Risk Mitigation

  • Circular dependency prevention: Zot registry runs outside k8s
  • Observability: Prometheus/Loki stay on indri
  • Data loss prevention: borgmatic + manual backups before each phase
  • Recovery: Can manually push images, restore from backups

Container Images (All ARM64)

| Service | Image |
|---------|-------|
| Miniflux | ghcr.io/miniflux/miniflux:latest |
| Forgejo | codeberg.org/forgejo/forgejo:10 |
| Grafana | grafana/grafana:latest |
| Kiwix | ghcr.io/kiwix/kiwix-serve:3.8.1 |
| Woodpecker | woodpeckerci/woodpecker-server |

Note: Zot runs as a native binary on indri (built from source at ~/code/3rd/zot), not as a container.


Plan Completion

When all phases are complete and verified:

# Move plan to completed directory with completion date
git mv plans/k8s-migration.md plans/completed/k8s-migration.$(date +%Y-%m-%d).md
git commit -m "Complete k8s migration plan"