Erich Blume 3679124ebd Expose Kubernetes API as Tailscale service (Step 0.14) (#27)

## Summary
- Add `tag:k8s-api` to Pulumi ACLs and indri device tags
- Configure Tailscale serve with TCP passthrough for k8s API at `k8s.tail8d86e.ts.net`
- Update minikube role to include `k8s.tail8d86e.ts.net` in certificate SANs
- Add `apiserver_port` config option (internal port 6443, dynamic host port with podman driver)
- Document Step 0.14 in k8s-migration plan (added post-Phase 0 completion)

The Kubernetes API is now accessible at `https://k8s.tail8d86e.ts.net` using TCP passthrough to preserve mTLS authentication.

## Deployment and Testing
- [x] Pulumi ACLs applied
- [x] Tailscale service created and approved in admin console
- [x] Minikube cluster recreated with new cert SANs
- [x] tailscale serve configured with TCP passthrough
- [x] 1Password credentials updated with new certs
- [x] Kubeconfig updated on gilbert
- [x] `mise run indri-services-check` passes
- [x] `kubectl --context=minikube-indri get nodes` works via Tailscale

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/27

2026-01-18 12:49:20 -08:00

Blumeops Minikube Migration Plan

This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes.

Architecture Overview

Services Staying on Indri (Outside K8s)

| Service | Reason |
|---------|--------|
| Zot Registry (NEW) | Avoid circular dependency - k8s needs images to start |
| Prometheus | Observability backbone must survive k8s failures |
| Loki | Log aggregation backbone |
| Borgmatic | Backup system |
| Grafana-alloy | Metrics/logs collector on host |
| Plex | Until Jellyfin replacement |
| Transmission | Downloads for kiwix ZIM files |

Services Moving to K8s

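| Service | Complexity | Dependencies |
|---------|------------|--------------|
| Grafana | LOW | Phase 1 |
| Kiwix | LOW | Phase 1 |
| Miniflux | MEDIUM | PostgreSQL |
| devpi | MEDIUM | Registry |
| PostgreSQL | HIGH | Phase 1 |
| Forgejo | HIGH | PostgreSQL |
| Woodpecker CI | MEDIUM | Forgejo |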

Technical Decisions

Container Registry: Zot

OCI-native, lightweight
Native support for proxying multiple registries (Docker Hub, GHCR, Quay)
Built from source at ~/code/3rd/zot (not in homebrew)
Binary: ~/code/3rd/zot/bin/zot-darwin-arm64
Config: ~/.config/zot/config.json
Data: ~/zot/

Minikube Driver: Podman

Rootless containers for better security
Lighter than full VM (QEMU)
Uses existing container ecosystem
minikube start --driver=podman --container-runtime=cri-o

PostgreSQL: CloudNativePG Operator

Production-grade operator
Built-in backup/restore
Prometheus metrics
PITR support

K8s Service Exposure: Tailscale Operator

loadBalancerClass: tailscale on Services
Automatic TLS and MagicDNS names
ACL-controlled access

LaunchAgent Requirements (Critical)

LaunchAgents do NOT get homebrew on PATH. All commands must use absolute paths:

/Users/erichblume/code/3rd/zot/bin/zot-darwin-arm64 for zot (built from source)
/opt/homebrew/opt/mise/bin/mise x -- for mise-managed tools
/opt/homebrew/opt/postgresql@18/bin/pg_dump for postgres tools

This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors). brew services sets up paths automatically, but services managed that way aren't tracked in ansible.
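
A small pre-deploy check can catch a relative command before launchd fails on it. The following is a minimal sketch under stated assumptions, not part of the mcquack roles: the helper names `is_absolute_path` and `check_launchagent_command` are hypothetical, and the check only confirms that a command starts with `/` and points at an executable file.

```bash
#!/bin/bash
# Hypothetical pre-deploy check (not part of the blumeops repo).

# Succeeds only when the given command is an absolute path.
is_absolute_path() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Prints a verdict for one LaunchAgent command and returns non-zero on failure.
check_launchagent_command() {
  local cmd="$1"
  if ! is_absolute_path "$cmd"; then
    echo "FAIL: '$cmd' is relative; homebrew is not on the LaunchAgent PATH"
    return 1
  fi
  if [ ! -x "$cmd" ]; then
    echo "FAIL: '$cmd' does not exist or is not executable"
    return 1
  fi
  echo "OK: $cmd"
}
```

For example, `check_launchagent_command /opt/homebrew/opt/mise/bin/mise` would pass on indri if mise is installed there, while `check_launchagent_command mise` fails regardless of what is installed.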

Backup Strategy

Borgmatic remains on indri (outside k8s), writing to sifaka NAS via SMB at /Volumes/backups. This ensures backups continue even if k8s is down.

| Service | Backup Approach |
|---------|-----------------|
| Zot Registry | No backup needed - pull-through cache is re-fetchable, private images rebuilt from source control |
| Minikube | No backup of cluster state - declarative manifests in git, can recreate |
| PostgreSQL (k8s) | CloudNativePG scheduled backups to sifaka (Phase 1) |
| Grafana (k8s) | Dashboards in ansible source control, no runtime backup needed |
| Miniflux (k8s) | Database backed up via CloudNativePG |
| Forgejo (k8s) | Git repos are distributed, config in ansible; data dir backed up via borgmatic before migration |
| devpi (k8s) | Private packages backed up, PyPI cache re-fetchable |
| Kiwix (k8s) | ZIM files re-downloadable via torrent, no backup needed |

Borgmatic config changes: None required for Phase 0. Future phases may add k8s PV paths if needed.

Phase 0: Foundation

Goal: Container registry + minikube cluster without disrupting existing services

Important: Tailscale Service Creation Order

WARNING: You MUST create services in the Tailscale admin console BEFORE running tailscale serve commands via ansible. If you run tailscale serve --service svc:foo before the service exists in the admin console, the local config will be in a bad state.

To fix a misconfigured service:
tailscale serve --service svc:foo reset
Then create the service in admin console and try again.

Step 0.1: Update Pulumi ACLs (BEFORE Tailscale serve)

Files to modify:

pulumi/policy.hujson

Changes:

Add new tag to tagOwners section (around line 104, after "tag:feed"):

"tag:registry": ["autogroup:admin", "tag:blumeops"],

Add test cases to tests section:

Update Erich's accept list (around line 111) to include registry:

"accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443"],

Update Allison's deny list (around line 117) to deny registry:

"deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443"],

Note:

No member grant needed - admins have full access via wildcard, members don't need registry
tag:k8s is added later in Phase 1 when the Tailscale Kubernetes Operator is deployed
Zot supports htpasswd auth if we later need finer-grained control

Testing:

mise run tailnet-preview   # Review changes - should show new tag
mise run tailnet-up        # Apply changes

Implementation Details:

Also need to add "tag:registry" to indri's tags in pulumi/__main__.py (the DeviceTags resource), not just define it in policy.hujson. The policy file defines the tag ownership rules, but the device tags are managed separately in the Python code.

Step 0.2: Create Tailscale Services in Admin Console (MANUAL)

CRITICAL: Do this BEFORE running any ansible that calls tailscale serve

Go to https://login.tailscale.com/admin/services
Create service registry with:
- Port: 443 (HTTPS)
- Host: indri

Implementation Details:

Tag is applied to indri via Pulumi in Step 0.1, not manually in admin console.

Verification:

# Service should appear (even if not yet serving)
tailscale status | grep registry

Step 0.3: Create Zot Registry Ansible Role

Note: Zot is NOT in homebrew (no formula or tap). Clone to ~/code/3rd/ on indri and build from source (requires Go).

Prerequisites on indri (ALREADY COMPLETED):

# Clone zot from forge mirror (use localhost:3001 - hairpinning doesn't work on indri)
ssh indri 'git clone http://localhost:3001/eblume/zot.git ~/code/3rd/zot'

# Set up Go via mise (creates mise.toml in repo directory)
ssh indri 'cd ~/code/3rd/zot && mise use go@1.25'

# Build (creates bin/zot-darwin-arm64, ~183MB)
ssh indri 'cd ~/code/3rd/zot && mise x -- make binary'

# Verify binary exists
ssh indri 'ls -la ~/code/3rd/zot/bin/zot-darwin-arm64'

Build verified: Binary at ~/code/3rd/zot/bin/zot-darwin-arm64 (183MB, ARM64 native).

New files:

ansible/roles/zot/
├── defaults/main.yml
├── tasks/main.yml
├── templates/
│   ├── config.json.j2
│   └── zot.plist.j2
└── handlers/main.yml

Key configuration (defaults/main.yml):

zot_repo_dir: "/Users/erichblume/code/3rd/zot"
zot_binary: "{{ zot_repo_dir }}/bin/zot-darwin-arm64"
zot_data_dir: "/Users/erichblume/zot"
zot_config_dir: "/Users/erichblume/.config/zot"
zot_port: 5000
zot_log_dir: "/Users/erichblume/Library/Logs"

# Pull-through cache registries (on-demand sync)
zot_sync_registries:
  - name: docker.io
    url: https://registry-1.docker.io
  - name: ghcr.io
    url: https://ghcr.io
  - name: quay.io
    url: https://quay.io

Zot config.json template (key sections):

{
  "storage": {
    "rootDirectory": "/Users/erichblume/zot"
  },
  "http": {
    "address": "0.0.0.0",
    "port": "5000"
  },
  "extensions": {
    "sync": {
      "enable": true,
      "registries": [
        {
          "urls": ["https://registry-1.docker.io"],
          "content": [{"prefix": "**"}],
          "onDemand": true,
          "tlsVerify": true
        },
        {
          "urls": ["https://ghcr.io"],
          "content": [{"prefix": "**"}],
          "onDemand": true,
          "tlsVerify": true
        },
        {
          "urls": ["https://quay.io"],
          "content": [{"prefix": "**"}],
          "onDemand": true,
          "tlsVerify": true
        }
      ]
    }
  }
}

Two modes of operation:

Pull-through cache (automatic): When you pull registry.tail8d86e.ts.net/docker.io/library/nginx:latest, Zot fetches from Docker Hub and caches locally. Subsequent pulls are local.

Private images (manual push): Push your own images to any path NOT matching a sync prefix:

# From gilbert (after building)
podman push myapp:v1 registry.tail8d86e.ts.net/blumeops/myapp:v1

Namespace convention:

registry.tail8d86e.ts.net/docker.io/* → cached from Docker Hub
registry.tail8d86e.ts.net/ghcr.io/* → cached from GHCR
registry.tail8d86e.ts.net/quay.io/* → cached from Quay
registry.tail8d86e.ts.net/blumeops/* → private images (built by you/Woodpecker)
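
The convention above can be sketched as a rewrite rule for image references. This is an illustration only: `mirror_ref` is a hypothetical helper, and it assumes the input is already fully qualified (a short name such as `nginx:latest` would be treated as a private image, which is probably not what you want).

```bash
#!/bin/bash
# Hypothetical helper: rewrite an image reference to go through the Zot registry.
REGISTRY_HOST="registry.tail8d86e.ts.net"

mirror_ref() {
  local ref="$1"
  case "$ref" in
    docker.io/*|ghcr.io/*|quay.io/*)
      # Upstream images are cached under a path named after their registry.
      printf '%s/%s\n' "$REGISTRY_HOST" "$ref" ;;
    *)
      # Anything else is treated as a private image under blumeops/.
      printf '%s/blumeops/%s\n' "$REGISTRY_HOST" "$ref" ;;
  esac
}
```

With this rule, `mirror_ref docker.io/library/nginx:latest` yields the pull-through path, and `mirror_ref myapp:v1` yields the private path used in the push example.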

LaunchAgent template (zot.plist.j2):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "...">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>mcquack.eblume.zot</string>
    <key>ProgramArguments</key>
    <array>
        <!-- ABSOLUTE PATH to built binary in ~/code/3rd/zot -->
        <string>{{ zot_binary }}</string>
        <string>serve</string>
        <string>{{ zot_config_dir }}/config.json</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>{{ zot_log_dir }}/mcquack.zot.out.log</string>
    <key>StandardErrorPath</key>
    <string>{{ zot_log_dir }}/mcquack.zot.err.log</string>
</dict>
</plist>

Handlers (handlers/main.yml):

- name: Restart zot
  ansible.builtin.shell: |
    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist 2>/dev/null || true
    launchctl load ~/Library/LaunchAgents/mcquack.eblume.zot.plist
  changed_when: true

Tasks should notify handler on config change:

- name: Deploy zot config
  ansible.builtin.template:
    src: config.json.j2
    dest: "{{ zot_config_dir }}/config.json"
  notify: Restart zot

Testing (after deploying role):

# Check LaunchAgent is running
ssh indri 'launchctl list | grep zot'

# Check zot is responding
ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Expected: {"repositories":[]}

# Check logs for errors
ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log'

# Test pull-through cache via curl (podman not installed until Step 0.8)
ssh indri 'curl -s http://localhost:5000/v2/docker.io/library/alpine/manifests/latest -H "Accept: application/vnd.oci.image.manifest.v1+json"'
# Should return manifest JSON (triggers cache fetch from Docker Hub)
ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Expected: {"repositories":["docker.io/library/alpine"]}

Implementation Details:

Changed port from 5000 to 5050 because macOS ControlCenter (AirPlay Receiver) uses port 5000 by default.
Fixed sync config: use "content": [{"prefix": "**", "destination": "/{{ registry.name }}"}] instead of "prefix": "{{ registry.name }}/**". The destination rewrites the local path, while prefix ** matches all upstream repos.
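
To make the corrected shape concrete, the sketch below prints one sync entry with the `destination` rewrite. It reuses only the fields shown in the config.json template plus the `destination` key described above; the `sync_entry` function itself is hypothetical and is not how the ansible template generates the file.

```bash
#!/bin/bash
# Prints one entry for extensions.sync.registries.
# $1 = local namespace (for example docker.io), $2 = upstream URL.
sync_entry() {
  local name="$1" url="$2"
  cat <<EOF
{
  "urls": ["$url"],
  "content": [{"prefix": "**", "destination": "/$name"}],
  "onDemand": true,
  "tlsVerify": true
}
EOF
}
```

Calling `sync_entry docker.io https://registry-1.docker.io` prints an entry that matches every upstream repository and stores it locally under `/docker.io`.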

Step 0.4: Add Zot to Tailscale Serve

Files to modify:

ansible/roles/tailscale_serve/defaults/main.yml

Changes:

# Add to tailscale_serve_services list
- name: svc:registry
  https:
    port: 443
    upstream: http://localhost:5000

Testing:

# Deploy tailscale serve config
mise run provision-indri -- --tags tailscale-serve

# Verify from gilbert (not indri - hairpinning doesn't work)
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["docker.io/library/alpine"]} (from Step 0.3 test)

# Test private image push from gilbert
podman pull alpine:latest
podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1
podman push registry.tail8d86e.ts.net/blumeops/test:v1
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]}

Implementation Details:

Changed upstream port from 5000 to 5050 (see Step 0.3 implementation details).
After running tailscale serve, the service must be approved in Tailscale admin console at https://login.tailscale.com/admin/services before it becomes accessible.
Podman needed on gilbert for testing - added to Brewfile. Requires podman machine init && podman machine start after install.

Step 0.5: Create Zot Metrics Role

New files:

ansible/roles/zot_metrics/
├── defaults/main.yml
├── tasks/main.yml
├── templates/
│   ├── zot-metrics.sh.j2
│   └── zot-metrics.plist.j2
└── handlers/main.yml

Metrics script pattern (zot-metrics.sh.j2):

#!/bin/bash
# Collect Zot registry metrics for Prometheus textfile collector
set -euo pipefail

METRICS_FILE="/opt/homebrew/var/node_exporter/textfile/zot.prom"
TEMP_FILE="${METRICS_FILE}.tmp"

# Check if zot is up
if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then
    echo "zot_up 1" > "$TEMP_FILE"
else
    echo "zot_up 0" > "$TEMP_FILE"
fi

mv "$TEMP_FILE" "$METRICS_FILE"

Note: Start with just zot_up for now. Additional metrics (storage usage, cache stats) can be added later after reviewing zot's metrics endpoint.
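
As one possible follow-up, a storage gauge could be written alongside `zot_up`. The sketch below is an assumption about how that might look: the metric name `zot_storage_bytes` and the function `write_zot_metrics` are hypothetical, and disk usage is taken from `du` rather than from zot's own metrics endpoint.

```bash
#!/bin/bash
# write_zot_metrics <up: 0|1> <data_dir> <metrics_file>
write_zot_metrics() {
  local up="$1" data_dir="$2" metrics_file="$3"
  local tmp="${metrics_file}.tmp"
  local kib
  # du -sk reports usage in 1024-byte blocks on both macOS and Linux.
  kib=$(du -sk "$data_dir" 2>/dev/null | awk '{print $1}')
  {
    echo "# HELP zot_up Whether the zot registry answered /v2/_catalog."
    echo "# TYPE zot_up gauge"
    echo "zot_up $up"
    if [ -n "$kib" ]; then
      echo "# HELP zot_storage_bytes Disk usage of the zot data directory."
      echo "# TYPE zot_storage_bytes gauge"
      echo "zot_storage_bytes $((kib * 1024))"
    fi
  } > "$tmp"
  # Write to a temp file and rename so the collector never reads a partial file.
  mv "$tmp" "$metrics_file"
}
```

The storage gauge is skipped when `du` produces no output, so a missing data directory still leaves a valid `zot_up` line.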

Testing:

# Deploy metrics role
mise run provision-indri -- --tags zot_metrics

# Check metrics file exists and is updated
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/zot.prom'
# Expected: zot_up 1

# Verify metrics appear in Prometheus (after a scrape cycle)
curl -s "http://indri:9090/api/v1/query?query=zot_up" | jq '.data.result[0].value[1]'
# Expected: "1"

Step 0.6: Add Zot Log Collection to Alloy

Files to modify:

ansible/roles/alloy/defaults/main.yml

Changes: Add to the alloy_mcquack_logs list:

  - path: /Users/erichblume/Library/Logs/mcquack.zot.out.log
    service: zot
    stream: stdout
  - path: /Users/erichblume/Library/Logs/mcquack.zot.err.log
    service: zot
    stream: stderr

Testing:

# Deploy alloy config (handler restarts alloy automatically if config changed)
mise run provision-indri -- --tags alloy

# Wait a minute, then check Loki for zot logs
# In Grafana Explore, query: {service="zot"}

Step 0.7: Update indri-services-check Script

Files to modify:

mise-tasks/indri-services-check

Changes to add:

# Add after existing service checks (around line 55)
check_service "zot" "ssh indri 'launchctl list | grep zot | grep -v \"^-\"'"
check_service "zot-metrics" "ssh indri 'launchctl list | grep zot-metrics | grep -v \"^-\"'"

# Add to HTTP endpoints section (around line 65)
check_http "Zot Registry" "http://indri:5000/v2/_catalog"

# Add metrics file check
check_service "Zot metrics" "ssh indri 'test -f /opt/homebrew/var/node_exporter/textfile/zot.prom'"

Testing:

# Run the health check
mise run indri-services-check

# Expected output includes:
# zot...               OK
# zot-metrics...       OK
# Zot Registry...      OK
# Zot metrics...       OK

Implementation Details:

Used Tailscale service URL (https://registry.tail8d86e.ts.net/v2/_catalog) instead of internal endpoint to verify full path works.
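
The plan calls `check_service` and `check_http` without showing them. The definitions below are hypothetical stand-ins, written only to illustrate what such helpers might do; the real functions in mise-tasks/indri-services-check may differ in output format and behavior.

```bash
#!/bin/bash
# Hypothetical stand-ins for the helpers used by indri-services-check.

# check_service <label> <shell command>: prints "<label>... OK" or "... FAIL".
check_service() {
  local name="$1" cmd="$2"
  printf '%-22s' "${name}..."
  if sh -c "$cmd" >/dev/null 2>&1; then
    echo "OK"
  else
    echo "FAIL"
    return 1
  fi
}

# check_http <label> <url>: passes when the URL answers with a non-error status.
check_http() {
  local name="$1" url="$2"
  # -f makes curl exit non-zero on HTTP errors; --max-time bounds a hung connection.
  check_service "$name" "curl -sf --max-time 10 '$url'"
}
```

Under these assumptions, the check described above would read `check_http "Zot Registry" "https://registry.tail8d86e.ts.net/v2/_catalog"`.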

Step 0.8: Install and Configure Podman on Indri

New files:

ansible/roles/podman/
├── tasks/main.yml
└── handlers/main.yml

Tasks (tasks/main.yml):

- name: Install podman via homebrew
  community.general.homebrew:
    name: podman
    state: present

- name: Initialize podman machine (if not exists)
  ansible.builtin.command:
    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 220
  register: podman_init
  changed_when: podman_init.rc == 0
  failed_when: podman_init.rc not in [0, 125]  # 125 = already exists

- name: Start podman machine
  ansible.builtin.command:
    cmd: podman machine start
  register: podman_start
  changed_when: "'started successfully' in podman_start.stdout"
  failed_when: podman_start.rc not in [0, 125]  # 125 = already running

Testing:

# Deploy podman role
mise run provision-indri -- --tags podman

# Verify podman is working
ssh indri 'podman info'
ssh indri 'podman run --rm hello-world'

Implementation Details:

KNOWN ISSUE: podman machine init and podman machine start have reliability issues when run via Ansible/SSH. The machine sometimes gets stuck in "Starting" state due to a race condition (see https://github.com/containers/podman/issues/16945). Apple Hypervisor may also require GUI session context.

WORKAROUND: If the machine fails to start via Ansible, manually run on indri:

podman machine rm -f podman-machine-default
podman machine init --cpus 4 --memory 8192 --disk-size 220
podman machine start

LaunchAgent approach was attempted but didn't resolve the issue reliably.
TODO: Investigate proper automation solution for reliable podman machine management.
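
Until a proper fix exists, a bounded wait can at least turn a machine stuck in "Starting" into an explicit failure. This sketch does not address the underlying race. `wait_until` and `machine_running` are hypothetical helpers, and the `{{.State}}` template passed to `podman machine inspect` is an assumption about its output that should be checked against the installed podman version.

```bash
#!/bin/bash
# wait_until <timeout_seconds> <interval_seconds> <command...>
# Re-runs the command until it succeeds or the timeout is reached.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  while ! "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Succeeds when the default podman machine reports the "running" state.
machine_running() {
  [ "$(podman machine inspect --format '{{.State}}' 2>/dev/null)" = "running" ]
}
```

A caller might run `podman machine start`, then `wait_until 120 5 machine_running`, and fall back to the manual workaround above if the wait returns non-zero.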

Step 0.9: Install and Configure Minikube

New files:

ansible/roles/minikube/
├── defaults/main.yml
├── tasks/main.yml
└── handlers/main.yml

Defaults:

minikube_cpus: 4
minikube_memory: 8192
minikube_disk_size: "200g"
minikube_driver: podman
minikube_container_runtime: cri-o

Note on storage: The disk-size is for node-local storage only (container images, emptyDir, local PVs). Pods can also mount external storage:

hostPath - indri filesystem (e.g., ~/transmission/ for kiwix ZIM files)
NFS - sifaka volumes (Synology supports NFS natively, easiest for k8s)
SMB/CIFS - requires csi-driver-smb; sifaka currently uses SMB for desktop mounts

Tasks:

- name: Install minikube via homebrew
  community.general.homebrew:
    name: minikube
    state: present

- name: Check if minikube cluster exists
  ansible.builtin.command:
    # raw/endraw stops Ansible from parsing minikube's Go template as Jinja2
    cmd: minikube status --format='{% raw %}{{.Host}}{% endraw %}'
  register: minikube_status
  changed_when: false
  failed_when: false

- name: Start minikube cluster
  ansible.builtin.command:
    cmd: >
      minikube start
      --driver={{ minikube_driver }}
      --container-runtime={{ minikube_container_runtime }}
      --cpus={{ minikube_cpus }}
      --memory={{ minikube_memory }}
      --disk-size={{ minikube_disk_size }}
  when: minikube_status.rc != 0 or 'Running' not in minikube_status.stdout

Testing:

# Deploy minikube role
mise run provision-indri -- --tags minikube

# Verify cluster is running
ssh indri 'minikube status'
# Expected: host: Running, kubelet: Running, apiserver: Running

# Test kubectl access from indri
ssh indri 'kubectl get nodes'
# Expected: minikube   Ready    control-plane   ...

Implementation Details:

Changed minikube_memory from 8192 to 7800 because podman machine reports slightly less available memory (7908MB) due to VM overhead. Minikube rejects memory requests exceeding what podman reports.
Deployed with Kubernetes v1.34.0 and CRI-O 1.24.6.
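
The memory adjustment is simple arithmetic: request what podman reports minus some headroom. The sketch below assumes a headroom of 108 MB, chosen only because it reproduces the 7800 value from the 7908 MB reported here; `minikube_memory_mb` is a hypothetical helper and the right margin may differ on other machines.

```bash
#!/bin/bash
# minikube_memory_mb <podman_reported_mb> [headroom_mb]
# Prints a memory request that stays below what podman reports.
minikube_memory_mb() {
  local reported="$1" headroom="${2:-108}"
  echo $((reported - headroom))
}
```

Here `minikube_memory_mb 7908` prints `7800`, the value now used for `minikube_memory`.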

Step 0.10: Configure Kubeconfig on Gilbert

Goal: Enable kubectl and k9s on gilbert to connect to the minikube cluster running on indri.

Considerations:

Minikube runs inside a podman VM on indri, so the API server isn't directly exposed on indri's network interface
Admin users have full Tailscale access to indri via autogroup:admin → * → *
Be careful not to overwrite existing work kubeconfigs

Possible approaches:

1. SSH tunneling to forward the API server port
2. minikube tunnel running on indri (exposes LoadBalancer services)
3. Configure minikube with --apiserver-names=indri at cluster creation time
4. Use kubectl via SSH wrapper: ssh indri kubectl ...

Verification:

# From gilbert, these should work:
kubectl get nodes
kubectl get namespaces
k9s  # Should show the minikube cluster

The exact approach will be determined during implementation based on what works best with the podman driver.

Implementation Details:

Chose Option 3: Recreate cluster with --apiserver-names after researching alternatives:

SSH tunneling - Requires keeping a tunnel running or complex on-demand setup
SOCKS5 proxy with kubeconfig proxy-url - Kubeconfig supports proxy-url: socks5://localhost:1080 per-context, but still requires managing the proxy
--apiserver-names + --listen-address - Native minikube support, cleanest solution

Cluster Setup: Recreated the minikube cluster with additional flags:

minikube delete
minikube start \
  --driver=podman \
  --container-runtime=cri-o \
  --cpus=4 --memory=7800 --disk-size=200g \
  --apiserver-names=indri \
  --listen-address=0.0.0.0

--apiserver-names=indri adds "indri" to the API server certificate SAN
--listen-address=0.0.0.0 tells podman to expose the API port on all interfaces
API server port is dynamic (check with kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}" on indri)

Credential Management with 1Password:

Rather than copying private keys between machines, credentials are stored in 1Password and fetched on-demand using kubectl's exec credential plugin. This mirrors the 1Password SSH agent pattern for biometric-protected key access.

Store credentials in 1Password (vault: vg6xf6vvfmoh5hqjjhlhbeoaie, item: 3jo4f2hnzvwfmamudfsbbbec7e):
- client-cert - Contents of ~/.minikube/profiles/minikube/client.crt (text field)
- client-key - Contents of ~/.minikube/profiles/minikube/client.key (text field)
- ca-cert - Contents of ~/.minikube/ca.crt (text field, not secret but stored for convenience)

Created credential helper script at bin/kubectl-credential-1password:

#!/bin/bash
# Fetches client cert/key from 1Password, outputs ExecCredential JSON
# Usage: kubectl-credential-1password <vault-id> <item-id> <cert-field> <key-field>

Symlinked to ~/.local/bin/kubectl-credential-1password
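
Only the header comments of the helper are shown above. As a rough sketch of what such a script could contain (the real bin/kubectl-credential-1password may differ), the version below reads both fields with the same `op` invocation used elsewhere in this plan and prints an ExecCredential object. The function names are hypothetical.

```bash
#!/bin/bash
# Sketch of an exec credential helper; not the actual script from the repo.

# Read one text field from 1Password, stripping the surrounding quotes.
fetch_field() {
  local vault="$1" item="$2" field="$3"
  op --vault "$vault" item get "$item" --fields "$field" | sed 's/^"//; s/"$//'
}

# Turn multi-line PEM on stdin into one JSON-safe string (newlines become \n).
pem_to_json_string() {
  awk 'BEGIN { ORS = "\\n" } { print }'
}

# Print an ExecCredential object for a PEM certificate ($1) and key ($2).
exec_credential() {
  local cert key
  cert=$(printf '%s\n' "$1" | pem_to_json_string)
  key=$(printf '%s\n' "$2" | pem_to_json_string)
  printf '{"apiVersion":"client.authentication.k8s.io/v1beta1","kind":"ExecCredential","status":{"clientCertificateData":"%s","clientKeyData":"%s"}}\n' "$cert" "$key"
}

# Usage: kubectl-credential-1password <vault-id> <item-id> <cert-field> <key-field>
if [ "$#" -eq 4 ]; then
  exec_credential "$(fetch_field "$1" "$2" "$3")" "$(fetch_field "$1" "$2" "$4")"
fi
```

The newline escaping relies on PEM data containing no quotes or backslashes; arbitrary text fields would need a real JSON encoder.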

Kubeconfig setup on gilbert:

# Store CA cert locally (not secret - public key for server verification)
mkdir -p ~/.kube/minikube-indri
op --vault <vault> item get <item> --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt

# Configure cluster
kubectl config set-cluster minikube-indri \
  --server=https://indri:<port> \
  --certificate-authority=/Users/eblume/.kube/minikube-indri/ca.crt

# Configure credentials with exec plugin
kubectl config set-credentials minikube-indri \
  --exec-api-version=client.authentication.k8s.io/v1beta1 \
  --exec-command=kubectl-credential-1password \
  --exec-arg=<vault-id> \
  --exec-arg=<item-id> \
  --exec-arg=client-cert \
  --exec-arg=client-key

# Create context
kubectl config set-context minikube-indri \
  --cluster=minikube-indri \
  --user=minikube-indri

Usage:

kubectl --context=minikube-indri get nodes
# or
kubectl config use-context minikube-indri
kubectl get nodes

Security Notes:

Client private key never stored on disk - fetched from 1Password on each kubectl command
CA cert stored on disk (not secret - it's a public key for server verification)
1Password biometric/password prompt required for credential access
op command strips quotes from text fields with sed 's/^"//; s/"$//'

Step 0.11: Add Minikube to indri-services-check

Files to modify:

mise-tasks/indri-services-check

Changes:

# Add new section for Kubernetes
echo ""
echo "Kubernetes cluster:"
check_service "minikube" "ssh indri 'minikube status --format={{.Host}} | grep -q Running'"
check_service "k8s-apiserver" "ssh indri 'kubectl get --raw /healthz'"

Testing:

mise run indri-services-check

# Expected output includes:
# Kubernetes cluster:
# minikube...          OK
# k8s-apiserver...     OK

Implementation Notes:

Added a third check k8s-apiserver (remote) that verifies kubectl access from gilbert, not just via SSH to indri. This ensures the 1Password credential flow and remote API server access are working.
The remote check uses both --kubeconfig and --context flags explicitly since the script runs in bash (not fish) and doesn't inherit the KUBECONFIG environment variable from fish config.

Step 0.12: Create Zettelkasten Documentation

New files:

~/code/personal/zk/zot.md
~/code/personal/zk/minikube.md

Files to update:

~/code/personal/zk/1767747119-YCPO.md (main blumeops card)

Updates to main blumeops card:

Add to Device Tags table:
| tag:registry | indri | Container registry access |

Add to Services table:
| Registry | https://registry.tail8d86e.ts.net | OCI container registry (Zot) | zot |
| Kubernetes | https://indri: | Minikube cluster | minikube |

Add to Port Map (Indri) table:
| 5050 | Zot | HTTP | localhost | Container registry |
| | K8s API | HTTPS | 0.0.0.0 | Minikube API server |

Add new section Remote Kubernetes Access:

## Remote Kubernetes Access (from Gilbert)

The minikube cluster on indri is accessible from gilbert via direct connection.
Cluster was created with `--apiserver-names=indri --listen-address=0.0.0.0`.

**Fish abbreviations** (in `~/.config/fish/config.fish`):
- `ki` → `kubectl --context=minikube-indri`
- `k9i` → `k9s --context=minikube-indri`
- `k9` → `k9s`

```bash
# Quick access via abbreviations
ki get nodes
k9i

# Or explicitly set context
kubectl config use-context minikube-indri
kubectl get nodes
```

Template for zot.md:

---
id: zot
aliases:
  - zot
  - container-registry
tags:
  - blumeops
---

# Zot Registry Management Log

Zot is an OCI-native container registry running on Indri, providing:
1. Pull-through cache for Docker Hub, GHCR, Quay (avoids rate limits)
2. Private image storage for custom-built containers

## Service Details

- URL: https://registry.tail8d86e.ts.net
- Local port: 5050
- Data directory: ~/zot
- Config: ~/.config/zot/config.json
- Managed via: mcquack LaunchAgent

## Namespace Convention

| Path | Source |
|------|--------|
| `registry.../docker.io/*` | Cached from Docker Hub |
| `registry.../ghcr.io/*` | Cached from GHCR |
| `registry.../quay.io/*` | Cached from Quay |
| `registry.../blumeops/*` | Private images (yours) |

## Useful Commands

\`\`\`bash
# List all images
curl -s http://localhost:5050/v2/_catalog | jq

# Pull via cache (from indri or k8s)
podman pull localhost:5050/docker.io/library/nginx:latest

# Build and push private image (from gilbert)
podman build -t registry.tail8d86e.ts.net/blumeops/myapp:v1 .
podman push registry.tail8d86e.ts.net/blumeops/myapp:v1

# Check service status
launchctl list | grep zot

# View logs
tail -f ~/Library/Logs/mcquack.zot.err.log
\`\`\`

## Log

### [DATE]
- Initial setup for k8s migration Phase 0

Template for minikube.md:

---
id: minikube
aliases:
  - minikube
  - kubernetes
  - k8s
tags:
  - blumeops
---

# Minikube Management Log

Minikube provides a single-node Kubernetes cluster on Indri for running containerized services.

## Cluster Details

- Driver: podman (rootless)
- Container runtime: CRI-O
- Kubernetes version: v1.34.0
- Resources: 4 CPUs, 7800MB RAM, 200GB disk
- API server: https://indri:<port> (accessible from gilbert via Tailscale)

## Remote Access from Gilbert

Cluster was created with `--apiserver-names=indri --listen-address=0.0.0.0` to allow remote kubectl access.

\`\`\`bash
# Switch context
kubectl config use-context minikube-indri

# Verify
kubectl get nodes
kubectl get namespaces

# Use k9s
k9s --context minikube-indri
\`\`\`

## Useful Commands (on indri)

\`\`\`bash
# Cluster status
minikube status

# Start/stop cluster
minikube start
minikube stop

# Access dashboard
minikube dashboard

# SSH into node
minikube ssh

# View logs
minikube logs
\`\`\`

## Podman Machine (prerequisite)

Minikube uses podman as its driver (the container runtime inside the node is CRI-O). The podman machine must be running:

\`\`\`bash
# Check podman machine
podman machine list

# Start if needed
podman machine start
\`\`\`

## Log

### [DATE]
- Initial cluster setup for k8s migration Phase 0
- Configured for remote access with --apiserver-names=indri

Implementation Notes:

Created zot.md and minikube.md in ~/code/personal/zk/
Updated 1767747119-YCPO.md (main blumeops card) with all specified changes
Added 1Password credential plugin reference to minikube docs
K8s API port is 39535 (dynamically assigned by minikube, may change on cluster recreation)

Step 0.13: Update Main Playbook

Files to modify:

ansible/playbooks/indri.yml

Changes:

# Add new roles to the roles list
- role: podman
  tags: podman
- role: zot
  tags: zot
- role: zot_metrics
  tags: zot_metrics
- role: minikube
  tags: minikube

Implementation Notes:

Roles were added incrementally during Steps 0.3, 0.5, 0.8, and 0.9
All four roles (zot, zot_metrics, podman, minikube) confirmed present in indri.yml

Phase 0 Verification Checklist

Run after completing all steps:

# 1. Full service health check
mise run indri-services-check
# All services should show OK, including new ones

# 2. Registry functionality - pull-through cache
ssh indri 'podman pull localhost:5050/docker.io/library/alpine:latest'
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["docker.io/library/alpine"]}

# 3. Registry functionality - private image push (from gilbert)
podman pull alpine:latest
podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1
podman push registry.tail8d86e.ts.net/blumeops/test:v1
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]}

# 4. Kubernetes cluster
ssh indri 'minikube status'
ssh indri 'kubectl get nodes'
kubectl get nodes  # from gilbert

# 5. Metrics in Prometheus
curl -s "http://indri:9090/api/v1/query?query=zot_up"
# Expected: value = 1

# 6. Logs in Loki
# In Grafana Explore: {service="zot"}
# Should see zot log entries

# 7. k9s from gilbert
k9s
# Should connect and show minikube cluster

Phase 0 Rollback

If something goes wrong:

# Stop and remove minikube
ssh indri 'minikube stop && minikube delete'

# Stop and remove zot
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist'
ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.zot.plist'

# Remove podman machine
ssh indri 'podman machine stop && podman machine rm -f'

# Remove from tailscale serve
ssh indri 'tailscale serve --service svc:registry reset'

# Remove tags from Pulumi (revert the policy.hujson and __main__.py changes first, then apply)
mise run tailnet-up

# Revert ansible playbook changes
git checkout ansible/playbooks/indri.yml
git checkout ansible/roles/tailscale_serve/defaults/main.yml
git checkout ansible/roles/alloy/defaults/main.yml

# Remove new roles
rm -rf ansible/roles/{zot,zot_metrics,podman,minikube}

# Remove zk cards
rm ~/code/personal/zk/{zot,minikube}.md

Step 0.14: Expose K8s API as Tailscale Service (Added Post-Completion)

Note: This step was added after Phase 0 was otherwise complete, to provide a stable, named endpoint for the Kubernetes API server.

Goal: Expose the minikube API server as k8s.tail8d86e.ts.net instead of using indri:<dynamic-port>.

Current state:

Minikube API server on port 39535 (dynamic, could change on cluster recreation)
Accessed via https://indri:39535
Certificate SANs include "indri"

Target state:

Stable Tailscale service at k8s.tail8d86e.ts.net:443
Fixed API server port (6443, the k8s standard)
Certificate SANs include both hostnames for compatibility

Step 0.14.1: Update Pulumi ACLs

Files to modify:

pulumi/policy.hujson
pulumi/__main__.py

Changes to policy.hujson:

Add tag to tagOwners:

"tag:k8s-api": ["autogroup:admin", "tag:blumeops"],

Update Erich's test case accept list to include k8s-api:

"accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443", "tag:k8s-api:443"],

Update Allison's deny list:

"deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443", "tag:k8s-api:443"],

Changes to __main__.py:

Add "tag:k8s-api" to indri's DeviceTags

Testing:

mise run tailnet-preview   # Review changes
mise run tailnet-up        # Apply changes

Step 0.14.2: Create Tailscale Service in Admin Console (MANUAL)

CRITICAL: Do this BEFORE running ansible that calls tailscale serve

Go to https://login.tailscale.com/admin/services
Create service k8s with:
- Port: 443 (TCP)
- Host: indri

Step 0.14.3: Recreate Minikube Cluster

The cluster needs to be recreated to:

Add k8s.tail8d86e.ts.net to the API server certificate SANs
Fix the API server port to 6443 (standard k8s port)

On indri:

# Stop and delete existing cluster
minikube stop
minikube delete

# Recreate with new settings
minikube start \
  --driver=podman \
  --container-runtime=cri-o \
  --cpus=4 --memory=7800 --disk-size=200g \
  --apiserver-names=k8s.tail8d86e.ts.net,indri \
  --apiserver-port=6443 \
  --listen-address=0.0.0.0

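# Check the API server URL in the generated kubeconfig
# (this prints the endpoint only; it does not show the certificate SANs)
kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
# Expected: https://127.0.0.1:6443 or similar (the host port may differ with the podman driver)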

# Verify cluster is running
minikube status
kubectl get nodes
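
The kubectl config view command above prints the server URL, so it does not confirm the SAN list by itself. A minimal sketch for inspecting the SANs directly follows, assuming openssl is installed and the API server is reachable at the given host:port. The function names fetch_cert_text and has_all_sans are illustrative and are not part of the blumeops repo.

```shell
# Hypothetical helpers; the names are illustrative, not part of blumeops.

# Print the certificate presented by host:port in text form.
# Assumes openssl is installed and the endpoint is reachable.
fetch_cert_text() {
  local endpoint="$1"
  openssl s_client -connect "$endpoint" </dev/null 2>/dev/null \
    | openssl x509 -noout -text
}

# Read `openssl x509 -text` output on stdin and succeed only if every
# name given as an argument appears as a DNS entry in the SAN list.
has_all_sans() {
  local sans name
  # SAN entries are comma-separated ("DNS:indri, DNS:..., IP Address:...").
  # Split them onto separate lines and trim surrounding whitespace.
  sans="$(tr ',' '\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
  for name in "$@"; do
    printf '%s\n' "$sans" | grep -Fxq "DNS:${name}" || return 1
  done
}

# Example (run on indri, or against port 443 once tailscale serve is configured):
#   fetch_cert_text 127.0.0.1:6443 | has_all_sans k8s.tail8d86e.ts.net indri \
#     && echo "SANs OK"
```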

Update ansible role defaults (ansible/roles/minikube/defaults/main.yml):

minikube_apiserver_names:
  - k8s.tail8d86e.ts.net
  - indri
minikube_apiserver_port: 6443

Step 0.14.4: Add K8s Service to Tailscale Serve

Files to modify:

ansible/roles/tailscale_serve/defaults/main.yml

Add to services list:

- name: svc:k8s
  tcp:
    port: 443
    upstream: tcp://localhost:6443

Note: Using TCP passthrough (not HTTPS termination) because k8s uses mTLS authentication.

Deploy:

mise run provision-indri -- --tags tailscale-serve

Step 0.14.5: Update 1Password Credentials

After cluster recreation, the client certificates have changed.

On indri, get the new credentials:

# Display new certificates (copy to 1Password)
cat ~/.minikube/profiles/minikube/client.crt
cat ~/.minikube/profiles/minikube/client.key
cat ~/.minikube/ca.crt

In 1Password (vault: vg6xf6vvfmoh5hqjjhlhbeoaie, item: 3jo4f2hnzvwfmamudfsbbbec7e):

Update client-cert field with new certificate
Update client-key field with new key
Update ca-cert field with new CA certificate

Step 0.14.6: Update Kubeconfig on Gilbert

Update CA certificate:

# Fetch new CA cert from 1Password
op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 3jo4f2hnzvwfmamudfsbbbec7e --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt

Update kubeconfig (~/.kube/minikube-indri/config.yml):

clusters:
- cluster:
    certificate-authority: /Users/eblume/.kube/minikube-indri/ca.crt
    server: https://k8s.tail8d86e.ts.net  # Changed from https://indri:39535
  name: minikube-indri
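
The server change can also be scripted instead of edited by hand. The sketch below is one possible approach, assuming the kubeconfig contains the old URL on a single `server:` line; retarget_server is a hypothetical helper name. Running `kubectl config set-cluster minikube-indri --server=https://k8s.tail8d86e.ts.net` against the same kubeconfig should achieve the same result.

```shell
# Hypothetical helper: rewrite the "server:" URL in a kubeconfig read
# from stdin, writing the result to stdout. Matching is literal (no regex),
# and lines that do not contain the old URL pass through unchanged.
retarget_server() {
  awk -v old="$1" -v new="$2" '
    {
      needle = "server: " old
      i = index($0, needle)
      if (i > 0) {
        $0 = substr($0, 1, i - 1) "server: " new substr($0, i + length(needle))
      }
      print
    }'
}

# Example (writes to a temporary file first so a failure cannot truncate the config):
#   cfg=~/.kube/minikube-indri/config.yml
#   retarget_server https://indri:39535 https://k8s.tail8d86e.ts.net <"$cfg" >"$cfg.new" \
#     && mv "$cfg.new" "$cfg"
```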

Verification:

# Test connection via new hostname
kubectl --context=minikube-indri get nodes

# Test via abbreviation
ki get nodes

Step 0.14.7: Update Documentation

Files to update:

~/code/personal/zk/minikube.md - Update API server URL and port info
~/code/personal/zk/1767747119-YCPO.md - Update Services table and Port Map

Changes to blumeops card:

Update Services table: | Kubernetes | https://k8s.tail8d86e.ts.net | Minikube cluster | minikube |
Update Port Map: | 6443 | K8s API | HTTPS/TCP | 0.0.0.0 | Minikube API server (via Tailscale) |
Add tag:k8s-api to Device Tags table

Step 0.14.8: Update indri-services-check

Files to modify:

mise-tasks/indri-services-check

Changes:

# Update remote k8s check to use new URL
check_service "k8s-apiserver (remote)" "kubectl --kubeconfig=$HOME/.kube/minikube-indri/config.yml --context=minikube-indri get --raw /healthz"
# (No change needed - uses kubeconfig which now points to k8s.tail8d86e.ts.net)
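
The check_service helper itself is not shown in this plan. For readers unfamiliar with the script, a minimal sketch of what such a helper might look like is given here; the real implementation in mise-tasks/indri-services-check may differ in output format and behaviour.

```shell
# Hypothetical sketch of a check_service helper. The actual helper in
# mise-tasks/indri-services-check is not shown in this plan and may differ.
# Usage: check_service "<label>" "<shell command>"
check_service() {
  local name="$1" cmd="$2"
  # Run the command in a child shell, discarding its output; only the
  # exit status decides whether the service counts as healthy.
  if sh -c "$cmd" >/dev/null 2>&1; then
    printf 'OK    %s\n' "$name"
  else
    printf 'FAIL  %s\n' "$name"
    return 1
  fi
}
```

Because the remote check goes through the kubeconfig, it picks up the new hostname without any change to the script.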

Step 0.14 Verification

# 1. Service health check
mise run indri-services-check
# All services should be OK

# 2. Test k8s access via Tailscale hostname
curl -k https://k8s.tail8d86e.ts.net/healthz
# Expected: ok (a 401 or 403 response means anonymous access is disabled, which is also fine)

# 3. kubectl via Tailscale
ki get nodes
ki get namespaces

# 4. k9s via Tailscale
k9i
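
Since curl -k skips server certificate verification and sends no client certificate, it only shows that the endpoint is reachable. A stricter check presents the client certificate and verifies the server against the minikube CA, which exercises the mTLS path that TCP passthrough is meant to preserve. The sketch below assumes the client certificate and key have been saved next to ca.crt; the CERT_DIR variable, the file names client.crt and client.key, and the function name are assumptions.

```shell
# Hypothetical helper. CERT_DIR and the client.crt / client.key file names
# are assumptions; only ca.crt is placed there by the steps in this plan.
k8s_mtls_get() {
  local path="$1" cert_dir="${CERT_DIR:-$HOME/.kube/minikube-indri}"
  # --cacert verifies the API server certificate (including its SANs);
  # --cert and --key present the client identity for mTLS.
  curl --silent --show-error --fail \
    --cacert "$cert_dir/ca.crt" \
    --cert "$cert_dir/client.crt" \
    --key "$cert_dir/client.key" \
    "https://k8s.tail8d86e.ts.net${path}"
}

# Example:
#   k8s_mtls_get /healthz
#   k8s_mtls_get /version
```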

Phase 0 Follow-up: Grafana Dashboards

After Phase 0 is running and stable, create monitoring dashboards:

Zot Dashboard (ansible/roles/grafana/files/dashboards/zot.json):

Check what metrics zot exposes: ssh indri 'curl -s http://localhost:5000/metrics'
Review community dashboards for inspiration (copy permitted if license allows)
Create dashboard with available metrics (at minimum: zot_up)
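
To see at a glance which metrics are available for panels, the raw /metrics output can be reduced to a list of unique metric names. This is a small sketch that assumes the endpoint returns the standard Prometheus text format; metric_names is an illustrative helper name.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the unique metric names, one per line.
metric_names() {
  # Drop "# HELP" / "# TYPE" comment lines, cut each sample line at the
  # first "{" (labels) or space (value), then remove blanks and duplicates.
  grep -v '^#' | sed -e 's/[{ ].*$//' | grep -v '^$' | sort -u
}

# Example:
#   ssh indri 'curl -s http://localhost:5000/metrics' | metric_names
```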

Minikube Dashboard (ansible/roles/grafana/files/dashboards/minikube.json):

Deploy kube-state-metrics if needed for additional cluster metrics
Review what Prometheus can scrape from the cluster
Review community dashboards for inspiration (copy permitted if license allows)
Create dashboard with relevant panels (node usage, pod counts, etc.)

New Files Summary

File	Purpose
`ansible/roles/zot/`	Zot registry deployment
`ansible/roles/zot_metrics/`	Metrics collection for Zot
`ansible/roles/podman/`	Podman installation and setup
`ansible/roles/minikube/`	Minikube cluster setup
`~/code/personal/zk/zot.md`	Zot management documentation
`~/code/personal/zk/minikube.md`	Minikube management documentation

Modified Files Summary

File	Changes
`pulumi/policy.hujson`	Add tag:registry and tag:k8s-api
`ansible/playbooks/indri.yml`	Add new roles
`ansible/roles/tailscale_serve/defaults/main.yml`	Add svc:registry and svc:k8s
`ansible/roles/alloy/templates/config.alloy.j2`	Add zot log collection
`mise-tasks/indri-services-check`	Add zot and k8s checks

Phase 1: Kubernetes Infrastructure

Goal: Tailscale operator + CloudNativePG operator

Steps

Update Pulumi ACLs for k8s workloads

Add tag:k8s to pulumi/policy.hujson - this tag is for k8s workloads that need to access other services (e.g., Woodpecker CI pushing to registry).

Changes to tagOwners:

"tag:k8s": ["autogroup:admin", "tag:blumeops"],

Add grant for k8s→registry access:

// k8s workloads (e.g., Woodpecker CI) can push/pull from registry
{
	"src": ["tag:k8s"],
	"dst": ["tag:registry"],
	"ip":  ["tcp:443"],
},

Add test case:

{
	"src":    "tag:k8s",
	"accept": ["tag:registry:443"],
},

mise run tailnet-preview && mise run tailnet-up

Create Tailscale OAuth client
- Scopes: Devices Core, Auth Keys, Services write
- Tag: tag:k8s-operator
- Store in 1Password

Deploy Tailscale Kubernetes Operator

helm repo add tailscale https://pkgs.tailscale.com/helmcharts
helm install tailscale-operator tailscale/tailscale-operator \
  --namespace tailscale-system --create-namespace \
  --set oauth.clientId=$CLIENT_ID \
  --set oauth.clientSecret=$CLIENT_SECRET

Deploy CloudNativePG operator

kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml

Create PostgreSQL cluster

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: blumeops-pg
  namespace: databases
spec:
  instances: 1
  storage:
    size: 10Gi
    storageClass: standard
  monitoring:
    enablePodMonitor: true

Update Alloy config
- Add kubernetes_sd_configs for k8s metrics
- Scrape operator metrics

New Files

ansible/k8s/operators/ - Operator manifests
ansible/k8s/databases/ - PostgreSQL cluster

Verification

kubectl get pods -n tailscale-system
kubectl get pods -n cnpg-system
kubectl get cluster -n databases
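
The get commands above give a snapshot only. If it is useful to block until the operators are actually ready, a sketch along these lines could be used, assuming every pod in the listed namespaces is expected to reach the Ready condition. The helper name and the 120 second timeout are arbitrary choices.

```shell
# Hypothetical helper: wait for all pods in each given namespace to become
# Ready, stopping at the first namespace that fails or times out.
# Note: kubectl wait reports an error if a namespace contains no pods.
wait_for_namespaces() {
  local ns
  for ns in "$@"; do
    kubectl wait --for=condition=Ready pod --all -n "$ns" --timeout=120s || return 1
  done
}

# Example:
#   wait_for_namespaces tailscale-system cnpg-system databases
```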

Phase 2: Grafana Migration (Pilot)

Goal: Migrate Grafana as lowest-risk pilot service

Steps

Deploy Grafana via Helm
- Copy datasource config from existing role
- Copy dashboards from ansible/roles/grafana/files/dashboards/
- Point to indri Prometheus/Loki (http://indri:9090, http://indri:3100)

Configure Tailscale LoadBalancer

service:
  type: LoadBalancer
  loadBalancerClass: tailscale

Verify all dashboards work
Update tailscale_serve - remove grafana entry
Stop brew grafana: brew services stop grafana

Verification

https://grafana.tail8d86e.ts.net loads
All dashboards functional

Phase 3: PostgreSQL Migration

Goal: Migrate miniflux database to CloudNativePG

Steps

Create databases and users in k8s PostgreSQL
- miniflux database/user
- borgmatic read-only user

Export from brew PostgreSQL

pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql

Expose k8s PostgreSQL via Tailscale
- Service with loadBalancerClass: tailscale
- Tag: svc:pg-k8s

Import data

psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql

Update borgmatic config
- Change hostname to k8s PostgreSQL
Verify data integrity
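
One inexpensive integrity check is to compare row counts between the old and new databases. The sketch below assumes psql can authenticate non-interactively against both hosts (for example via ~/.pgpass). The table names in the example (feeds, entries, users) are assumptions about the Miniflux schema and should be confirmed with \dt first. Matching counts do not prove the data is identical, but a mismatch is a clear signal to stop.

```shell
# Hypothetical helper: print "table count" lines for the given tables
# in the miniflux database on the given host.
# Usage: table_counts <host> <table>...
table_counts() {
  local host="$1" table
  shift
  for table in "$@"; do
    # -t: tuples only, -A: unaligned output, -c: run a single command
    printf '%s %s\n' "$table" \
      "$(psql -h "$host" -U miniflux -d miniflux -tAc "SELECT count(*) FROM ${table}")"
  done
}

# Example (table names are assumptions; confirm them with \dt):
#   old="$(table_counts localhost feeds entries users)"
#   new="$(table_counts pg-k8s.tail8d86e.ts.net feeds entries users)"
#   [ "$old" = "$new" ] && echo "row counts match" || echo "row counts differ"
```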

Rollback

Keep brew PostgreSQL running until Phase 4 verified

Phase 4: Miniflux Migration

Goal: Migrate Miniflux to k8s

Steps

Deploy Miniflux

image: ghcr.io/miniflux/miniflux:latest
env:
  DATABASE_URL: from secret
  RUN_MIGRATIONS: "1"

Configure Tailscale LoadBalancer - tag: svc:feed
Update Alloy log collection - add k8s namespace
Verify: login, feeds refresh, API works
Stop brew miniflux: brew services stop miniflux

Phase 5: devpi Migration

Goal: Migrate devpi to k8s

Steps

Build devpi container
- Dockerfile with devpi-server + devpi-web
- Push to local Zot registry
Deploy as StatefulSet
- PVC for data (50Gi)
- Migrate existing data (excluding PyPI cache)
Configure Tailscale LoadBalancer - tag: svc:pypi
Update pip.conf on gilbert
Stop mcquack devpi

Phase 6: Kiwix Migration

Goal: Migrate kiwix-serve to k8s

Steps

Create NFS/hostPath PV for ZIM files
- Point to transmission download directory
- ReadOnlyMany access

Deploy Kiwix

image: ghcr.io/kiwix/kiwix-serve:3.8.1
args: ["/data/*.zim"]

Configure Tailscale LoadBalancer - tag: svc:kiwix
Stop mcquack kiwix-serve

Phase 7: Forgejo Migration (Highest Risk)

Goal: Migrate Forgejo to k8s

Pre-Migration Checklist

Full borgmatic backup verified
Manual backup of /opt/homebrew/var/forgejo
Document SSH keys and webhooks
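
The manual backup can be as simple as a dated tarball. The following is a sketch only, assuming Forgejo is stopped first so the files are consistent and that the destination directory already exists; backup_dir is an illustrative helper name.

```shell
# Hypothetical helper: archive a directory into <dest_dir>/<name>-<date>.tar.gz
# and print the archive path on success.
# Usage: backup_dir <source_dir> <dest_dir>
backup_dir() {
  local src="$1" dest_dir="$2" archive
  archive="$dest_dir/$(basename "$src")-$(date +%Y-%m-%d).tar.gz"
  # -C switches to the parent directory so the archive holds relative paths.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Listing the archive is a cheap check that it is readable end to end.
  tar -tzf "$archive" >/dev/null || return 1
  printf '%s\n' "$archive"
}

# Example (the destination path is an assumption):
#   brew services stop forgejo
#   backup_dir /opt/homebrew/var/forgejo ~/backups
```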

Steps

Deploy Forgejo via Helm

helm install forgejo forgejo/forgejo \
  --namespace forgejo --create-namespace

Migrate data
- Stop brew forgejo
- Copy data to PVC
- Start k8s forgejo
Configure Tailscale services
- HTTPS 443 via LoadBalancer
- SSH port 22 (TCP proxy)
Verify all repositories accessible

Rollback

Restore brew forgejo and tailscale serve config

Phase 8: CI/CD (Woodpecker)

Goal: Deploy Woodpecker CI integrated with Forgejo

Steps

Create Forgejo OAuth application
- Callback: https://ci.tail8d86e.ts.net/authorize
- Store in 1Password
Deploy Woodpecker Server + Agent
Configure Tailscale LoadBalancer - tag: svc:ci
Test pipeline - create .woodpecker.yaml in test repo

Phase 9: Cleanup

Goal: Remove deprecated services, harden system

Steps

Stop/remove unused brew services
- postgresql@18, grafana, miniflux, forgejo
Update ansible playbook
- Remove migrated service roles
- Add k8s deployment references
Configure Velero backups (optional)
- Install with MinIO on sifaka
- Schedule daily cluster backups
Update zk documentation
- New architecture
- Runbooks
- DR procedures

Critical Files

File	Purpose
`ansible/playbooks/indri.yml`	Main playbook - add k8s roles, remove migrated services
`ansible/roles/tailscale_serve/defaults/main.yml`	Transition services to Tailscale operator
`pulumi/policy.hujson`	Add tags: k8s, registry, ci
`ansible/roles/borgmatic/defaults/main.yml`	Update PostgreSQL endpoint
`mise-tasks/indri-services-check`	Add k8s health checks

New Directory Structure

ansible/
  k8s/
    operators/
      tailscale-operator.yaml
      cloudnative-pg.yaml
    databases/
      blumeops-pg.yaml
    apps/
      grafana/
      miniflux/
      forgejo/
      devpi/
      kiwix/
      woodpecker/
  roles/
    zot/           # NEW
    podman/        # NEW
    minikube/      # NEW

Risk Mitigation

Circular dependency prevention: Zot registry runs outside k8s
Observability: Prometheus/Loki stay on indri
Data loss prevention: borgmatic + manual backups before each phase
Recovery: Can manually push images, restore from backups

Container Images (All ARM64)

Service	Image
Miniflux	`ghcr.io/miniflux/miniflux:latest`
Forgejo	`codeberg.org/forgejo/forgejo:10`
Grafana	`grafana/grafana:latest`
Kiwix	`ghcr.io/kiwix/kiwix-serve:3.8.1`
Woodpecker	`woodpeckerci/woodpecker-server`

Note: Zot runs as a native binary on indri (built from source at ~/code/3rd/zot), not as a container.

Plan Completion

When all phases are complete and verified:

# Move plan to completed directory with completion date
git mv plans/k8s-migration.md plans/completed/k8s-migration.$(date +%Y-%m-%d).md
git commit -m "Complete k8s migration plan"
