blumeops/docs/k8s-migration.md
Erich Blume 6d84ff7bca Use forge mirror for zot, add third-party project guidance
- Updated Step 0.3 to clone zot from forge mirror instead of GitHub
- Added "Third-Party Projects" section to CLAUDE.md explaining:
  - Ask user to mirror 3rd party repos to forge first
  - Clone from mirror to ~/code/3rd/
  - Avoids external dependencies

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 14:00:54 -08:00


Blumeops Minikube Migration Plan

This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes.

Architecture Overview

Services Staying on Indri (Outside K8s)

| Service | Reason |
|---------|--------|
| Zot Registry (NEW) | Avoid circular dependency - k8s needs images to start |
| Prometheus | Observability backbone must survive k8s failures |
| Loki | Log aggregation backbone |
| Borgmatic | Backup system |
| Grafana-alloy | Metrics/logs collector on host |
| Plex | Until Jellyfin replacement |
| Transmission | Downloads for kiwix ZIM files |

Services Moving to K8s

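| Service | Complexity | Dependencies |
|---------|------------|--------------|
| Grafana | LOW | Phase 1 |
| Kiwix | LOW | Phase 1 |
| Miniflux | MEDIUM | PostgreSQL |
| devpi | MEDIUM | Registry |
| PostgreSQL | HIGH | Phase 1 |
| Forgejo | HIGH | PostgreSQL |
| Woodpecker CI | MEDIUM | Forgejo |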

Technical Decisions

Container Registry: Zot

  • OCI-native, lightweight
  • Native support for proxying multiple registries (Docker Hub, GHCR, Quay)
  • Single binary, ARM64 native
  • Config at /etc/zot/config.json

Minikube Driver: Podman

  • Rootless containers for better security
  • Lighter than full VM (QEMU)
  • Uses existing container ecosystem
  • minikube start --driver=podman --container-runtime=containerd

PostgreSQL: CloudNativePG Operator

  • Production-grade operator
  • Built-in backup/restore
  • Prometheus metrics
  • PITR support

K8s Service Exposure: Tailscale Operator

  • loadBalancerClass: tailscale on Services
  • Automatic TLS and MagicDNS names
  • ACL-controlled access
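
As a rough illustration of the exposure pattern above, the following sketch prints a Service manifest that hands a workload to the Tailscale operator. The function name, the `monitoring` namespace, the `app` selector label, and port 3000 are assumptions made for the example; only `type: LoadBalancer` and `loadBalancerClass: tailscale` come from this plan.

```shell
#!/bin/sh
# Print a Service manifest exposed through the Tailscale operator.
# Arguments: <name> <namespace> <port>  (all illustrative, not from the plan)
render_tailscale_service() {
    name="$1"; namespace="$2"; port="$3"
    cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale
  selector:
    app: ${name}
  ports:
    - port: ${port}
      targetPort: ${port}
EOF
}

# Hypothetical example values
render_tailscale_service grafana monitoring 3000
```

The output could be piped to `kubectl apply -f -` once the operator from Phase 1 is running.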

LaunchAgent Requirements (Critical)

LaunchAgents do NOT get homebrew on PATH. All commands must use absolute paths:

  • /opt/homebrew/bin/zot not zot
  • /opt/homebrew/opt/mise/bin/mise x -- for mise-managed tools
  • /opt/homebrew/opt/postgresql@18/bin/pg_dump for postgres tools

This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors). Services started through brew services get a working PATH automatically, but those services aren't tracked in ansible.
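
One way to catch a relative command before it ends up in a plist is a small check like the one below. This is only a sketch: `is_absolute` and `check_commands` are hypothetical helper names, not part of the existing tooling.

```shell
#!/bin/sh
# Succeed only when the given command path is absolute.
is_absolute() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Report every non-absolute entry; return non-zero if any were found.
check_commands() {
    rc=0
    for cmd in "$@"; do
        if ! is_absolute "$cmd"; then
            echo "not absolute: $cmd"
            rc=1
        fi
    done
    return "$rc"
}

# The second entry is flagged because it relies on PATH
check_commands /opt/homebrew/bin/zot zot || true
```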


Phase 0: Foundation

Goal: Container registry + minikube cluster without disrupting existing services

Important: Tailscale Service Creation Order

WARNING: You MUST create services in the Tailscale admin console BEFORE running tailscale serve commands via ansible. If you run tailscale serve --service svc:foo before the service exists in the admin console, the local config will be in a bad state.

To fix a misconfigured service:

tailscale serve --service svc:foo reset

Then create the service in admin console and try again.


Step 0.1: Update Pulumi ACLs (BEFORE Tailscale serve)

Files to modify:

  • pulumi/policy.hujson

Changes:

  1. Add new tags to tagOwners section (around line 104, after "tag:feed"):
"tag:registry": ["autogroup:admin", "tag:blumeops"],
"tag:k8s":      ["autogroup:admin", "tag:blumeops"],
  2. Add k8s→registry grant to grants section (around line 62, in the Infrastructure section):
// k8s workloads (e.g., Woodpecker CI) can push/pull from registry
{
	"src": ["tag:k8s"],
	"dst": ["tag:registry"],
	"ip":  ["tcp:443"],
},
  3. Add test cases to tests section:
    • Update Erich's accept list (around line 111) to include registry:
    "accept": ["tag:grafana:443", "tag:kiwix:443", "tag:feed:443", "tag:loki:3100", "tag:pg:5432", "tag:homelab:22", "tag:registry:443"],
    
    • Update Allison's deny list (around line 117) to deny registry:
    "deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443"],
    
    • Add k8s test case:
    {
    	"src":    "tag:k8s",
    	"accept": ["tag:registry:443"],
    },
    

Note:

  • No member grant needed - admins have full access, members don't need registry
  • tag:k8s grant allows Woodpecker CI (and other k8s workloads) to push/pull images
  • K8s pods get Tailscale identity via the Tailscale Kubernetes Operator (Phase 1)
  • Zot supports htpasswd auth if we later need finer-grained control

Testing:

mise run tailnet-preview   # Review changes - should show new tags and k8s grant
mise run tailnet-up        # Apply changes

Step 0.2: Create Tailscale Services in Admin Console (MANUAL)

CRITICAL: Do this BEFORE running any ansible that calls tailscale serve

  1. Go to https://login.tailscale.com/admin/services
  2. Create service registry with:
    • Port: 443 (HTTPS)
    • Host: indri
  3. Apply tag tag:registry to indri if not already tagged

Verification:

# Service should appear (even if not yet serving)
tailscale status | grep registry

Step 0.3: Create Zot Registry Ansible Role

Note: Zot is NOT in homebrew (no formula or tap). Clone to ~/code/3rd/ on indri and build from source (requires Go).

Prerequisites on indri:

# Ensure Go is installed (via mise or homebrew)
ssh indri 'go version'

# Clone zot from forge mirror (not GitHub - avoids external dependency)
ssh indri 'git clone https://forge.tail8d86e.ts.net/eblume/zot.git ~/code/3rd/zot'

# Build (creates bin/zot-darwin-arm64)
ssh indri 'cd ~/code/3rd/zot && make binary'

New files:

ansible/roles/zot/
├── defaults/main.yml
├── tasks/main.yml
├── templates/
│   ├── config.json.j2
│   └── zot.plist.j2
└── handlers/main.yml

Key configuration (defaults/main.yml):

zot_repo_dir: "/Users/erichblume/code/3rd/zot"
zot_binary: "{{ zot_repo_dir }}/bin/zot-darwin-arm64"
zot_data_dir: "/Users/erichblume/zot"
zot_config_dir: "/Users/erichblume/.config/zot"
zot_port: 5000
zot_log_dir: "/Users/erichblume/Library/Logs"

# Pull-through cache registries (on-demand sync)
zot_sync_registries:
  - name: docker.io
    url: https://registry-1.docker.io
  - name: ghcr.io
    url: https://ghcr.io
  - name: quay.io
    url: https://quay.io

Zot config.json template (key sections; each destination places that upstream under its own namespace):

{
  "storage": {
    "rootDirectory": "{{ zot_data_dir }}"
  },
  "http": {
    "address": "0.0.0.0",
    "port": "{{ zot_port }}"
  },
  "extensions": {
    "sync": {
      "enable": true,
      "registries": [
        {
          "urls": ["https://registry-1.docker.io"],
          "content": [{"prefix": "**", "destination": "/docker.io"}],
          "onDemand": true,
          "tlsVerify": true
        },
        {
          "urls": ["https://ghcr.io"],
          "content": [{"prefix": "**", "destination": "/ghcr.io"}],
          "onDemand": true,
          "tlsVerify": true
        }
      ]
    }
  }
}

Two modes of operation:

  1. Pull-through cache (automatic): When you pull registry.tail8d86e.ts.net/docker.io/library/nginx:latest, Zot fetches from Docker Hub and caches locally. Subsequent pulls are local.

  2. Private images (manual push): Push your own images to any path NOT matching a sync prefix:

    # From gilbert (after building)
    podman push myapp:v1 registry.tail8d86e.ts.net/blumeops/myapp:v1
    

Namespace convention:

  • registry.tail8d86e.ts.net/docker.io/* → cached from Docker Hub
  • registry.tail8d86e.ts.net/ghcr.io/* → cached from GHCR
  • registry.tail8d86e.ts.net/blumeops/* → private images (built by you/Woodpecker)
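
The convention above can be expressed as a small helper that rewrites an upstream image reference into its path on the registry. This is a sketch under assumptions: `cache_ref` is a hypothetical name, and the Docker Hub shorthand handling (bare names go under `library/`) follows Docker's usual normalization rather than anything stated in this plan.

```shell
#!/bin/sh
REGISTRY="registry.tail8d86e.ts.net"

# Map an upstream image reference to its pull-through path on the registry.
cache_ref() {
    ref="$1"
    case "$ref" in
        */*)
            first="${ref%%/*}"
            case "$first" in
                *.*|*:*|localhost) ;;        # first component is a registry host
                *) ref="docker.io/$ref" ;;   # Docker Hub user/org image
            esac
            ;;
        *) ref="docker.io/library/$ref" ;;   # Docker Hub official image
    esac
    printf '%s/%s\n' "$REGISTRY" "$ref"
}

cache_ref alpine:latest
cache_ref grafana/grafana:latest
cache_ref ghcr.io/miniflux/miniflux:latest
```

Private images under `blumeops/` are pushed directly and should not be passed through this helper, since it would treat them as Docker Hub repositories.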

LaunchAgent template (zot.plist.j2):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "...">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>mcquack.eblume.zot</string>
    <key>ProgramArguments</key>
    <array>
        <!-- ABSOLUTE PATH to built binary in ~/code/3rd/zot -->
        <string>{{ zot_binary }}</string>
        <string>serve</string>
        <string>{{ zot_config_dir }}/config.json</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>{{ zot_log_dir }}/mcquack.zot.out.log</string>
    <key>StandardErrorPath</key>
    <string>{{ zot_log_dir }}/mcquack.zot.err.log</string>
</dict>
</plist>

Testing (after deploying role):

# Check LaunchAgent is running
ssh indri 'launchctl list | grep zot'

# Check zot is responding
ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Expected: {"repositories":[]}

# Check logs for errors
ssh indri 'tail -20 ~/Library/Logs/mcquack.zot.err.log'

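# Test pull-through cache (from indri; needs podman from Step 0.8)
# The local endpoint is plain HTTP, so TLS verification is disabled for this pull
ssh indri 'podman pull --tls-verify=false localhost:5000/docker.io/library/alpine:latest'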
ssh indri 'curl -s http://localhost:5000/v2/_catalog'
# Expected: {"repositories":["docker.io/library/alpine"]}

# Test private image push (from gilbert, after Step 0.4 tailscale serve)
podman pull alpine:latest
podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1
podman push registry.tail8d86e.ts.net/blumeops/test:v1
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]}

Step 0.4: Add Zot to Tailscale Serve

Files to modify:

  • ansible/roles/tailscale_serve/defaults/main.yml

Changes:

# Add to tailscale_serve_services list
- name: svc:registry
  https:
    port: 443
    upstream: http://localhost:5000

Testing:

# Deploy tailscale serve config
mise run provision-indri -- --tags tailscale-serve

# Verify from gilbert (not indri - hairpinning doesn't work)
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: a JSON catalog ({"repositories":[]} if nothing has been pulled or pushed yet)

Step 0.5: Create Zot Metrics Role

New files:

ansible/roles/zot_metrics/
├── defaults/main.yml
├── tasks/main.yml
├── templates/
│   ├── zot-metrics.sh.j2
│   └── zot-metrics.plist.j2
└── handlers/main.yml

Metrics script pattern (zot-metrics.sh.j2):

#!/bin/bash
# Collect Zot registry metrics for Prometheus textfile collector
set -euo pipefail

METRICS_FILE="/opt/homebrew/var/node_exporter/textfile/zot.prom"
TEMP_FILE="${METRICS_FILE}.tmp"

# Check if zot is up
if curl -sf http://localhost:5000/v2/_catalog > /dev/null 2>&1; then
    echo "zot_up 1" > "$TEMP_FILE"
else
    echo "zot_up 0" > "$TEMP_FILE"
    mv "$TEMP_FILE" "$METRICS_FILE"
    exit 0
fi

# Get metrics from zot's metrics endpoint (if enabled)
# Add storage metrics, cache hits, etc.
# ...

mv "$TEMP_FILE" "$METRICS_FILE"
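
The placeholder section of the script could be filled in along these lines. This is a sketch, not the final collector: `emit_gauge` and `write_metrics` are hypothetical helpers, and the demo writes to a temporary directory rather than the real textfile path. It keeps the write-then-rename step from the script so the collector never reads a partial file.

```shell
#!/bin/sh
# Print one gauge in Prometheus text exposition format.
# Arguments: <metric name> <help text> <value>
emit_gauge() {
    printf '# HELP %s %s\n' "$1" "$2"
    printf '# TYPE %s gauge\n' "$1"
    printf '%s %s\n' "$1" "$3"
}

# Write to a temp file next to the target, then rename it into place.
# Arguments: <target file> <zot_up value>
write_metrics() {
    target="$1"
    tmp="${target}.tmp"
    {
        emit_gauge zot_up "Whether the Zot registry answered the catalog request" "$2"
    } > "$tmp"
    mv "$tmp" "$target"
}

# Demo against a throwaway directory (assumption: mktemp -d is available)
demo_dir="$(mktemp -d)"
write_metrics "$demo_dir/zot.prom" 1
cat "$demo_dir/zot.prom"
```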

Testing:

# Deploy metrics role
mise run provision-indri -- --tags zot_metrics

# Check metrics file exists and is updated
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/zot.prom'
# Expected: zot_up 1

# Verify metrics appear in Prometheus (after a scrape cycle)
curl -s "http://indri:9090/api/v1/query?query=zot_up" | jq '.data.result[0].value[1]'
# Expected: "1"
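
Because the metric only shows up after a scrape cycle, a check run immediately after deployment can fail spuriously. A small polling helper is one way around that; `retry` is a hypothetical name and the timings in the commented example are assumptions.

```shell
#!/bin/sh
# Run a command until it succeeds, up to <attempts> times,
# sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
    attempts="$1"; delay="$2"; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (hypothetical timings): poll Prometheus for up to a minute.
# retry 6 10 sh -c 'curl -sf "http://indri:9090/api/v1/query?query=zot_up" | grep -q zot_up'
```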

Step 0.6: Add Zot Log Collection to Alloy

Files to modify:

  • ansible/roles/alloy/templates/config.alloy.j2

Changes: Add to the mcquack services log collection section:

// Zot registry logs
local.file_match "zot_logs" {
  path_targets = [
    {__path__ = "/Users/erichblume/Library/Logs/mcquack.zot.out.log", service = "zot", stream = "stdout"},
    {__path__ = "/Users/erichblume/Library/Logs/mcquack.zot.err.log", service = "zot", stream = "stderr"},
  ]
}

loki.source.file "zot_logs" {
  targets    = local.file_match.zot_logs.targets
  forward_to = [loki.write.local.receiver]
}

Testing:

# Deploy alloy config
mise run provision-indri -- --tags alloy

# Restart alloy to pick up changes
ssh indri 'brew services restart grafana-alloy'

# Wait a minute, then check Loki for zot logs
# In Grafana Explore, query: {service="zot"}

Step 0.7: Update indri-services-check Script

Files to modify:

  • mise-tasks/indri-services-check

Changes to add:

# Add after existing service checks (around line 55)
check_service "zot" "ssh indri 'launchctl list | grep zot | grep -v \"^-\"'"
check_service "zot-metrics" "ssh indri 'launchctl list | grep zot-metrics | grep -v \"^-\"'"

# Add to HTTP endpoints section (around line 65)
check_http "Zot Registry" "http://indri:5000/v2/_catalog"

# Add metrics file check
check_service "Zot metrics" "ssh indri 'test -f /opt/homebrew/var/node_exporter/textfile/zot.prom'"

Testing:

# Run the health check
mise run indri-services-check

# Expected output includes:
# zot...               OK
# zot-metrics...       OK
# Zot Registry...      OK
# Zot metrics...       OK
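
For readers without the script at hand, the lines above assume a helper that runs a command string and reports the result. The following is a hypothetical stand-in for illustration only; the real `check_service` in mise-tasks/indri-services-check may differ in output format and behaviour.

```shell
#!/bin/sh
# Hypothetical stand-in for the script's check_service helper:
# run a command string and print "<name>... OK" or "<name>... FAILED".
check_service() {
    name="$1"; cmd="$2"
    if sh -c "$cmd" >/dev/null 2>&1; then
        result="OK"
    else
        result="FAILED"
    fi
    printf '%-20s %s\n' "${name}..." "$result"
}

# Demo with commands that always succeed or always fail
check_service "always-up" "true"
check_service "always-down" "false"
```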

Step 0.8: Install and Configure Podman on Indri

New files:

ansible/roles/podman/
├── tasks/main.yml
└── handlers/main.yml

Tasks (tasks/main.yml):

- name: Install podman via homebrew
  community.general.homebrew:
    name: podman
    state: present

- name: Initialize podman machine (if not exists)
  ansible.builtin.command:
    cmd: podman machine init --cpus 4 --memory 8192 --disk-size 100
  register: podman_init
  changed_when: podman_init.rc == 0
  failed_when: false  # May already exist

- name: Start podman machine
  ansible.builtin.command:
    cmd: podman machine start
  register: podman_start
  changed_when: "'started' in podman_start.stdout"
  failed_when: false  # May already be running

Testing:

# Deploy podman role
mise run provision-indri -- --tags podman

# Verify podman is working
ssh indri 'podman info'
ssh indri 'podman run --rm hello-world'

Step 0.9: Install and Configure Minikube

New files:

ansible/roles/minikube/
├── defaults/main.yml
├── tasks/main.yml
└── handlers/main.yml

Defaults:

minikube_cpus: 4
minikube_memory: 8192
minikube_disk_size: "100g"
minikube_driver: podman
minikube_container_runtime: containerd

Tasks:

- name: Install minikube via homebrew
  community.general.homebrew:
    name: minikube
    state: present

- name: Check if minikube cluster exists
  ansible.builtin.command:
    # raw/endraw keeps Ansible from evaluating minikube's Go template as Jinja2
    cmd: "minikube status --format='{% raw %}{{.Host}}{% endraw %}'"
  register: minikube_status
  changed_when: false
  failed_when: false

- name: Start minikube cluster
  ansible.builtin.command:
    cmd: >
      minikube start
      --driver={{ minikube_driver }}
      --container-runtime={{ minikube_container_runtime }}
      --cpus={{ minikube_cpus }}
      --memory={{ minikube_memory }}
      --disk-size={{ minikube_disk_size }}
  when: minikube_status.rc != 0 or 'Running' not in minikube_status.stdout

Testing:

# Deploy minikube role
mise run provision-indri -- --tags minikube

# Verify cluster is running
ssh indri 'minikube status'
# Expected: host: Running, kubelet: Running, apiserver: Running

# Test kubectl access from indri
ssh indri 'kubectl get nodes'
# Expected: minikube   Ready    control-plane   ...

Step 0.10: Configure Kubeconfig on Gilbert

Manual steps (kubeconfig management is complex with work configs):

# Export a self-contained kubeconfig from indri. --flatten embeds the certificates,
# which otherwise point at files that exist only on indri.
ssh indri 'kubectl config view --flatten --minify' > ~/.kube/minikube.yaml

# Load it alongside the existing config via KUBECONFIG rather than merging files
# (keeps work configs untouched)
export KUBECONFIG=~/.kube/config:~/.kube/minikube.yaml

# Set minikube context
kubectl config use-context minikube

# Verify connection from gilbert. The server address in minikube.yaml is the one
# indri uses locally, so it may need to be adjusted or tunneled to be reachable here.
kubectl get nodes

Testing:

# From gilbert, verify k8s access
kubectl cluster-info
kubectl get namespaces

# Verify k9s can connect
k9s
# Should show the minikube cluster

Step 0.11: Add Minikube to indri-services-check

Files to modify:

  • mise-tasks/indri-services-check

Changes:

# Add new section for Kubernetes
echo ""
echo "Kubernetes cluster:"
check_service "minikube" "ssh indri 'minikube status --format={{.Host}} | grep -q Running'"
check_service "k8s-apiserver" "ssh indri 'kubectl get --raw /healthz'"

Testing:

mise run indri-services-check

# Expected output includes:
# Kubernetes cluster:
# minikube...          OK
# k8s-apiserver...     OK

Step 0.12: Create Zot Grafana Dashboard

New files:

  • ansible/roles/grafana/files/dashboards/zot.json

Dashboard panels:

  • zot_up - Service availability
  • Storage usage (if zot exposes this metric)
  • Cache hit/miss rates
  • Pull/push request counts

Testing:

# Deploy dashboard
mise run provision-indri -- --tags grafana

# Verify in Grafana UI
# Navigate to Dashboards > Zot Registry

Step 0.13: Create Minikube Grafana Dashboard

New files:

  • ansible/roles/grafana/files/dashboards/minikube.json

Dashboard panels:

  • Node CPU/Memory usage
  • Pod count by namespace
  • Container restart counts
  • API server request latency

Note: This may require deploying kube-state-metrics in the cluster first:

ssh indri 'kubectl apply -f https://raw.githubusercontent.com/kubernetes/kube-state-metrics/main/examples/standard/cluster-role.yaml'
# ... additional kube-state-metrics manifests

Step 0.14: Create Zettelkasten Documentation

New files:

  • ~/code/personal/zk/zot.md
  • ~/code/personal/zk/minikube.md

Template for zot.md:

---
id: zot
aliases:
  - zot
  - container-registry
tags:
  - blumeops
---

# Zot Registry Management Log

Zot is an OCI-native container registry running on Indri, providing:
1. Pull-through cache for Docker Hub, GHCR, Quay (avoids rate limits)
2. Private image storage for custom-built containers

## Service Details

- URL: https://registry.tail8d86e.ts.net
- Local port: 5000
- Data directory: ~/zot
- Config: ~/.config/zot/config.json
- Managed via: mcquack LaunchAgent

## Namespace Convention

| Path | Source |
|------|--------|
| `registry.../docker.io/*` | Cached from Docker Hub |
| `registry.../ghcr.io/*` | Cached from GHCR |
| `registry.../quay.io/*` | Cached from Quay |
| `registry.../blumeops/*` | Private images (yours) |

## Useful Commands

\`\`\`bash
# List all images
curl -s http://localhost:5000/v2/_catalog | jq

# Pull via cache (from indri or k8s; the local endpoint is plain HTTP)
podman pull --tls-verify=false localhost:5000/docker.io/library/nginx:latest

# Build and push private image (from gilbert)
podman build -t registry.tail8d86e.ts.net/blumeops/myapp:v1 .
podman push registry.tail8d86e.ts.net/blumeops/myapp:v1

# Check service status
launchctl list | grep zot

# View logs
tail -f ~/Library/Logs/mcquack.zot.err.log
\`\`\`

## Log

### [DATE]
- Initial setup for k8s migration Phase 0

Step 0.15: Update Main Playbook

Files to modify:

  • ansible/playbooks/indri.yml

Changes:

# Add new roles to the roles list
- role: podman
  tags: podman
- role: zot
  tags: zot
- role: zot_metrics
  tags: zot_metrics
- role: minikube
  tags: minikube

Phase 0 Verification Checklist

Run after completing all steps:

# 1. Full service health check
mise run indri-services-check
# All services should show OK, including new ones

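# 2. Registry functionality - pull-through cache
# (the local endpoint is plain HTTP, so TLS verification is disabled for this pull)
ssh indri 'podman pull --tls-verify=false localhost:5000/docker.io/library/alpine:latest'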
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["docker.io/library/alpine"]}

# 3. Registry functionality - private image push (from gilbert)
podman pull alpine:latest
podman tag alpine:latest registry.tail8d86e.ts.net/blumeops/test:v1
podman push registry.tail8d86e.ts.net/blumeops/test:v1
curl -s https://registry.tail8d86e.ts.net/v2/_catalog
# Expected: {"repositories":["blumeops/test","docker.io/library/alpine"]}

# 4. Kubernetes cluster
ssh indri 'minikube status'
ssh indri 'kubectl get nodes'
kubectl get nodes  # from gilbert

# 5. Metrics in Prometheus
curl -s "http://indri:9090/api/v1/query?query=zot_up"
# Expected: value = 1

# 6. Logs in Loki
# In Grafana Explore: {service="zot"}
# Should see zot log entries

# 7. Dashboards in Grafana
# Navigate to Zot Registry dashboard - panels should have data
# Navigate to Minikube dashboard - panels should have data

# 8. k9s from gilbert
k9s
# Should connect and show minikube cluster
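
The catalog checks in items 2 and 3 can be automated instead of compared by eye. The sketch below uses a hypothetical `catalog_has` helper and a sample response copied from the expected output above. It does a quoted-substring match rather than real JSON parsing, which is enough for repository names but not a general solution.

```shell
#!/bin/sh
# Succeed when the catalog JSON lists the given repository name.
# Arguments: <catalog json> <repository>
catalog_has() {
    catalog="$1"; repo="$2"
    case "$catalog" in
        *"\"${repo}\""*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample response taken from the expected output in the checklist
sample='{"repositories":["blumeops/test","docker.io/library/alpine"]}'
for repo in blumeops/test docker.io/library/alpine; do
    if catalog_has "$sample" "$repo"; then
        echo "found: $repo"
    else
        echo "missing: $repo"
    fi
done
```

In practice the first argument would be the output of the `curl -s https://registry.tail8d86e.ts.net/v2/_catalog` call.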

Phase 0 Rollback

If something goes wrong:

# Stop and remove minikube
ssh indri 'minikube stop && minikube delete'

# Stop and remove zot
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist'
ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.zot.plist'

# Remove podman machine (-f skips the confirmation prompt, which cannot be answered over non-interactive ssh)
ssh indri 'podman machine stop && podman machine rm -f'

# Remove from tailscale serve
ssh indri 'tailscale serve --service svc:registry reset'

# Remove tags from Pulumi (revert policy.hujson changes, then re-apply)
git checkout pulumi/policy.hujson
mise run tailnet-up

# Revert ansible playbook changes
git checkout ansible/playbooks/indri.yml
git checkout ansible/roles/tailscale_serve/defaults/main.yml
git checkout ansible/roles/alloy/templates/config.alloy.j2

# Remove new roles
rm -rf ansible/roles/{zot,zot_metrics,podman,minikube}

# Remove zk cards
rm ~/code/personal/zk/{zot,minikube}.md

New Files Summary

| File | Purpose |
|------|---------|
| ansible/roles/zot/ | Zot registry deployment |
| ansible/roles/zot_metrics/ | Metrics collection for Zot |
| ansible/roles/podman/ | Podman installation and setup |
| ansible/roles/minikube/ | Minikube cluster setup |
| ansible/roles/grafana/files/dashboards/zot.json | Zot monitoring dashboard |
| ansible/roles/grafana/files/dashboards/minikube.json | K8s monitoring dashboard |
| ~/code/personal/zk/zot.md | Zot management documentation |
| ~/code/personal/zk/minikube.md | Minikube management documentation |

Modified Files Summary

| File | Changes |
|------|---------|
| pulumi/policy.hujson | Add tag:registry, tag:k8s, ACL rules |
| ansible/playbooks/indri.yml | Add new roles |
| ansible/roles/tailscale_serve/defaults/main.yml | Add svc:registry |
| ansible/roles/alloy/templates/config.alloy.j2 | Add zot log collection |
| mise-tasks/indri-services-check | Add zot and k8s checks |

Phase 1: Kubernetes Infrastructure

Goal: Tailscale operator + CloudNativePG operator

Steps

  1. Create Tailscale OAuth client

    • Scopes: Devices Core, Auth Keys, Services write
    • Tag: tag:k8s-operator
    • Store in 1Password
  2. Deploy Tailscale Kubernetes Operator

    helm repo add tailscale https://pkgs.tailscale.com/helmcharts
    helm install tailscale-operator tailscale/tailscale-operator \
      --namespace tailscale-system --create-namespace \
      --set oauth.clientId=$CLIENT_ID \
      --set oauth.clientSecret=$CLIENT_SECRET
    
  3. Deploy CloudNativePG operator

    kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml
    
  4. Create PostgreSQL cluster

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: blumeops-pg
      namespace: databases
    spec:
      instances: 1
      storage:
        size: 10Gi
        storageClass: standard
      monitoring:
        enablePodMonitor: true
    
  5. Update Alloy config

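    • Add Kubernetes service discovery for k8s metrics (Alloy's discovery.kubernetes component, the counterpart of Prometheus kubernetes_sd_configs)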
    • Scrape operator metrics

New Files

  • ansible/k8s/operators/ - Operator manifests
  • ansible/k8s/databases/ - PostgreSQL cluster

Verification

kubectl get pods -n tailscale-system
kubectl get pods -n cnpg-system
kubectl get cluster -n databases

Phase 2: Grafana Migration (Pilot)

Goal: Migrate Grafana as lowest-risk pilot service

Steps

  1. Deploy Grafana via Helm

    • Copy datasource config from existing role
    • Copy dashboards from ansible/roles/grafana/files/dashboards/
    • Point to indri Prometheus/Loki (http://indri:9090, http://indri:3100)
  2. Configure Tailscale LoadBalancer

    service:
      type: LoadBalancer
      loadBalancerClass: tailscale
    
  3. Verify all dashboards work

  4. Update tailscale_serve - remove grafana entry

  5. Stop brew grafana: brew services stop grafana

Verification


Phase 3: PostgreSQL Migration

Goal: Migrate miniflux database to CloudNativePG

Steps

  1. Create databases and users in k8s PostgreSQL

    • miniflux database/user
    • borgmatic read-only user
  2. Export from brew PostgreSQL

    pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql
    
  3. Expose k8s PostgreSQL via Tailscale

    • Service with loadBalancerClass: tailscale
    • Tag: svc:pg-k8s
  4. Import data

    psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql
    
  5. Update borgmatic config

    • Change hostname to k8s PostgreSQL
  6. Verify data integrity

Rollback

Keep brew PostgreSQL running until Phase 4 is verified.


Phase 4: Miniflux Migration

Goal: Migrate Miniflux to k8s

Steps

  1. Deploy Miniflux

    image: ghcr.io/miniflux/miniflux:latest
    env:
      DATABASE_URL: from secret
      RUN_MIGRATIONS: "1"
    
  2. Configure Tailscale LoadBalancer - tag: svc:feed

  3. Update Alloy log collection - add k8s namespace

  4. Verify: login, feeds refresh, API works

  5. Stop brew miniflux: brew services stop miniflux


Phase 5: devpi Migration

Goal: Migrate devpi to k8s

Steps

  1. Build devpi container

    • Dockerfile with devpi-server + devpi-web
    • Push to local Zot registry
  2. Deploy as StatefulSet

    • PVC for data (50Gi)
    • Migrate existing data (excluding PyPI cache)
  3. Configure Tailscale LoadBalancer - tag: svc:pypi

  4. Update pip.conf on gilbert

  5. Stop mcquack devpi


Phase 6: Kiwix Migration

Goal: Migrate kiwix-serve to k8s

Steps

  1. Create NFS/hostPath PV for ZIM files

    • Point to transmission download directory
    • ReadOnlyMany access
  2. Deploy Kiwix

    image: ghcr.io/kiwix/kiwix-serve:3.8.1
    args: ["/data/*.zim"]
    
  3. Configure Tailscale LoadBalancer - tag: svc:kiwix

  4. Stop mcquack kiwix-serve


Phase 7: Forgejo Migration (Highest Risk)

Goal: Migrate Forgejo to k8s

Pre-Migration Checklist

  • Full borgmatic backup verified
  • Manual backup of /opt/homebrew/var/forgejo
  • Document SSH keys and webhooks

Steps

  1. Deploy Forgejo via Helm

    helm install forgejo forgejo/forgejo \
      --namespace forgejo --create-namespace
    
  2. Migrate data

    • Stop brew forgejo
    • Copy data to PVC
    • Start k8s forgejo
  3. Configure Tailscale services

    • HTTPS 443 via LoadBalancer
    • SSH port 22 (TCP proxy)
  4. Verify all repositories accessible

Rollback

Restore brew forgejo and tailscale serve config


Phase 8: CI/CD (Woodpecker)

Goal: Deploy Woodpecker CI integrated with Forgejo

Steps

  1. Create Forgejo OAuth application

  2. Deploy Woodpecker Server + Agent

  3. Configure Tailscale LoadBalancer - tag: svc:ci

  4. Test pipeline - create .woodpecker.yaml in test repo


Phase 9: Cleanup

Goal: Remove deprecated services, harden system

Steps

  1. Stop/remove unused brew services

    • postgresql@18, grafana, miniflux, forgejo
  2. Update ansible playbook

    • Remove migrated service roles
    • Add k8s deployment references
  3. Configure Velero backups (optional)

    • Install with MinIO on sifaka
    • Schedule daily cluster backups
  4. Update zk documentation

    • New architecture
    • Runbooks
    • DR procedures

Critical Files

| File | Purpose |
|------|---------|
| ansible/playbooks/indri.yml | Main playbook - add k8s roles, remove migrated services |
| ansible/roles/tailscale_serve/defaults/main.yml | Transition services to Tailscale operator |
| pulumi/policy.hujson | Add tags: k8s, registry, ci |
| ansible/roles/borgmatic/defaults/main.yml | Update PostgreSQL endpoint |
| mise-tasks/indri-services-check | Add k8s health checks |

New Directory Structure

ansible/
  k8s/
    operators/
      tailscale-operator.yaml
      cloudnative-pg.yaml
    databases/
      blumeops-pg.yaml
    apps/
      grafana/
      miniflux/
      forgejo/
      devpi/
      kiwix/
      woodpecker/
  roles/
    zot/           # NEW
    zot_metrics/   # NEW
    podman/        # NEW
    minikube/      # NEW

Risk Mitigation

  • Circular dependency prevention: Zot registry runs outside k8s
  • Observability: Prometheus/Loki stay on indri
  • Data loss prevention: borgmatic + manual backups before each phase
  • Recovery: Can manually push images, restore from backups

Container Images (All ARM64)

| Service | Image |
|---------|-------|
| Miniflux | ghcr.io/miniflux/miniflux:latest |
| Forgejo | codeberg.org/forgejo/forgejo:10 |
| Grafana | grafana/grafana:latest |
| Kiwix | ghcr.io/kiwix/kiwix-serve:3.8.1 |
| Woodpecker | woodpeckerci/woodpecker-server |
| Zot | ghcr.io/project-zot/zot-linux-arm64 |