blumeops/plans/k8s-migration/P5.1_docker_migration.md
Erich Blume 21848a7919 P5.1: Migrate minikube from podman to QEMU2 driver (#38)
## Summary
- Migrate minikube from podman driver to qemu2 driver for proper NFS/SMB volume mount support
- Update ansible minikube role with qemu installation and containerd runtime
- Remove podman role dependency from indri.yml
- Add synology user creation steps and post-migration zot reconfiguration notes

## Why
Phase 6 (Kiwix/Transmission migration) was blocked because the podman driver lacks kernel capabilities for filesystem mounts. QEMU2 creates an actual VM with full mount support.

## Deployment and Testing
- [ ] Create k8s-storage user on Synology DSM
- [ ] Store credentials in 1Password (synology-k8s-storage)
- [ ] Export current k8s state
- [ ] Stop and delete podman-based minikube cluster
- [ ] Run ansible to create QEMU2 cluster
- [ ] Test NFS volume mount with test pod
- [ ] Redeploy ArgoCD and all apps
- [ ] Verify all services healthy
- [ ] Reconfigure zot registry mirrors for containerd (post-migration)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/38
2026-01-21 16:03:37 -08:00

Phase 5.1: Migrate Minikube from QEMU2 to Docker Driver

Goal: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts

Status: Complete (2026-01-21) - Cluster running, ArgoCD deployed, apps synced

Prerequisites: Phase 5 complete


Background

Original Problem (Podman → QEMU2)

During Phase 6 (Kiwix/Transmission migration), we discovered that the podman driver has fundamental limitations that prevent mounting external volumes:

  1. SMB CSI driver fails with "Operation not permitted" - the rootless container lacks kernel-level mount capabilities
  2. minikube mount fails - 9p mount gets "permission denied" inside the podman VM
  3. hostPath volumes only work for paths inside the minikube container, not the macOS host

We migrated to QEMU2 to get a full VM with kernel capabilities.

New Problem (QEMU2 → Docker)

The QEMU2 driver introduced a new problem: the Kubernetes API server is inside the VM at 192.168.105.2:6443, and Tailscale's TCP proxy cannot forward to it properly:

  • TCP connections succeed (nc -zv works)
  • TLS handshake times out
  • Root cause unknown, but likely related to Tailscale serve's handling of non-localhost upstreams
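
The symptom above (TCP connects, TLS never completes) can be reproduced with a two-step probe. The following is a minimal sketch, not the procedure used at the time: the function name `probe_api` and the timeouts are illustrative assumptions, and the second step reports any failure after the TCP connect as `tls-failed`, since it cannot tell a hung handshake apart from a later error.

```shell
# Distinguish "TCP connects" from "TLS handshake completes" for an endpoint.
# Prints one of: tcp-failed, tls-failed, ok
probe_api() {
  host="$1"
  port="$2"

  # Step 1: plain TCP connect (the same check as `nc -zv`), 5 s timeout.
  if ! nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
    echo "tcp-failed"
    return 1
  fi

  # Step 2: TLS handshake plus one HTTP request. -k skips certificate
  # verification because only handshake completion matters here;
  # --max-time turns a hung handshake into a failure instead of a hang.
  if ! curl -sk --max-time 10 -o /dev/null "https://$host:$port/version"; then
    echo "tls-failed"
    return 2
  fi

  echo "ok"
}
```

For example, `probe_api k8s.tail8d86e.ts.net 443` would be expected to print `tls-failed` in the situation described above, and `ok` once forwarding works.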

Additionally, the volume mount solution with QEMU2 was complex:

  • Required NFS mount from sifaka → indri
  • Then minikube mount to pass through to VM
  • Two LaunchAgents/LaunchDaemons for persistence
  • macOS GUI approval required for network access

Why Docker?

The docker driver solves both problems:

  1. API Server on localhost: Docker Desktop handles port forwarding from container to localhost automatically, so tailscale serve --tcp=443 tcp://localhost:PORT works

  2. Simpler volume mounts: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers.

  3. Official Tailscale recommendation: Tailscale's own Kubernetes guide uses minikube with the docker driver.


Implementation Summary

Infrastructure Changes

  1. Docker Desktop installed (manual via brew install --cask docker)

    • Configured with 12GB memory in Docker Desktop settings
    • Kubernetes option disabled (using minikube instead)

  2. Docker minikube cluster created:

    minikube start \
      --driver=docker \
      --container-runtime=docker \
      --cpus=6 \
      --memory=11264 \
      --disk-size=200g \
      --apiserver-names=k8s.tail8d86e.ts.net,indri \
      --apiserver-port=6443 \
      --listen-address=0.0.0.0
    
  3. Tailscale serve configured for k8s API:

    • API server on localhost (port is dynamic with docker driver)
    • tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:<PORT>

  4. Remote kubectl access working from gilbert:

    • Created mise-tasks/ensure-minikube-indri-kubectl-config script
    • Fetches certs from indri and sets up ~/.kube/minikube-indri/config.yml
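
As a rough illustration of what such a script might do, here is a hypothetical reconstruction. It is not the contents of mise-tasks/ensure-minikube-indri-kubectl-config. It assumes the default `minikube` profile on indri (so the certificates live under `~/.minikube`), that `indri` is a working SSH host alias, and that the API is reached through the Tailscale service name passed to `--apiserver-names`.

```shell
# Hypothetical sketch: copy minikube's CA and client cert/key from the
# remote host and build a standalone kubeconfig that points at the
# Tailscale-served API endpoint.
setup_minikube_indri_kubeconfig() {
  remote="${1:-indri}"                         # SSH host running minikube
  server="${2:-https://k8s.tail8d86e.ts.net}"  # tailscale serve endpoint (443)
  dir="$HOME/.kube/minikube-indri"
  cfg="$dir/config.yml"

  mkdir -p "$dir" || return 1

  # Default minikube locations, relative to the remote home directory
  # (assumes the profile is named "minikube").
  scp "$remote:.minikube/ca.crt" "$dir/ca.crt" || return 1
  scp "$remote:.minikube/profiles/minikube/client.crt" "$dir/client.crt" || return 1
  scp "$remote:.minikube/profiles/minikube/client.key" "$dir/client.key" || return 1
  chmod 600 "$dir/client.key"

  kubectl config --kubeconfig="$cfg" set-cluster minikube-indri \
    --server="$server" --certificate-authority="$dir/ca.crt" || return 1
  kubectl config --kubeconfig="$cfg" set-credentials minikube-indri \
    --client-certificate="$dir/client.crt" --client-key="$dir/client.key" || return 1
  kubectl config --kubeconfig="$cfg" set-context minikube-indri \
    --cluster=minikube-indri --user=minikube-indri || return 1
  kubectl config --kubeconfig="$cfg" use-context minikube-indri
}
```

After running it, `kubectl --kubeconfig ~/.kube/minikube-indri/config.yml get nodes` should reach the cluster, provided the server name is covered by the API server certificate.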

Ansible Roles Updated

  • ansible/roles/minikube/ - docker driver, removed qemu2/NFS/socket_vmnet
  • ansible/roles/tailscale_serve/ - removed svc:k8s (minikube role handles dynamic port)
  • Containerd registry mirrors configured for zot pull-through cache

ArgoCD Bootstrap

All apps deployed and synced from feature/p5.1-qemu2-migration branch:

| App | Status | Notes |
|---|---|---|
| tailscale-operator | Healthy | Manages Tailscale ingresses |
| argocd | Healthy | Self-managed |
| cloudnative-pg | Healthy | PostgreSQL operator |
| blumeops-pg | Progressing | PostgreSQL cluster starting |
| grafana | Progressing | Needs grafana-admin secret |
| grafana-config | Healthy | Dashboards and ingress |
| miniflux | Progressing | Needs miniflux-config secret |
| devpi | Progressing | Starting up |

Secrets Still Needed

After PR merge, apply these secrets manually:

# Grafana admin password
op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | kubectl --context=minikube-indri apply -f -

# Miniflux config
op inject -i argocd/manifests/miniflux/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -

Technical Notes

API Server Port

With the docker driver, the API server's host port is dynamic: Docker publishes container port 6443 on a randomly assigned host port, so --apiserver-port=6443 fixes only the port inside the container.

The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
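
The lookup can be sketched in shell as follows. This is an illustration, not the Ansible role's actual task: the function names are made up, and the container name `minikube` (the default profile name) is an assumption. The `tailscale serve` invocation is the one from the implementation summary.

```shell
# Print the host port Docker mapped to the API server port (6443/tcp)
# of the minikube container.
apiserver_host_port() {
  container="${1:-minikube}"   # assumption: default profile/container name
  # `docker port` prints lines such as "0.0.0.0:32771" or "[::]:32771";
  # take the first line and keep only what follows the last colon.
  mapping="$(docker port "$container" 6443/tcp | head -n 1)"
  port="${mapping##*:}"
  case "$port" in
    ''|*[!0-9]*)
      echo "could not parse port from: $mapping" >&2
      return 1
      ;;
  esac
  echo "$port"
}

# Point the Tailscale service at whatever port was discovered.
configure_k8s_serve() {
  port="$(apiserver_host_port "$@")" || return 1
  tailscale serve --service=svc:k8s --tcp=443 "tcp://localhost:$port"
}
```

Because the mapping can change when the container is recreated, a step like `configure_k8s_serve` would need to run again after every cluster recreation.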

Registry Mirror Configuration

Containerd uses /etc/containerd/certs.d/<registry>/hosts.toml files. The ansible role configures mirrors for:

  • registry.tail8d86e.ts.net (private images)
  • docker.io
  • ghcr.io
  • quay.io
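
For reference, a hosts.toml of this kind can be generated with a small helper. This is a sketch under assumptions, not the role's template: the function name is hypothetical, and the mirror URL (including any path prefix) depends on how zot's pull-through sync is configured, which this document does not spell out.

```shell
# Write a containerd hosts.toml that tries a mirror before the upstream.
#   $1 registry name as used in image references (e.g. docker.io)
#   $2 upstream server URL (used when the mirror cannot serve the image)
#   $3 mirror URL (ASSUMPTION: wherever the zot cache listens)
#   $4 certs.d base directory (default: /etc/containerd/certs.d)
write_registry_mirror() {
  registry="$1"
  upstream="$2"
  mirror="$3"
  base="${4:-/etc/containerd/certs.d}"

  mkdir -p "$base/$registry" || return 1
  cat > "$base/$registry/hosts.toml" <<EOF
server = "$upstream"

[host."$mirror"]
  capabilities = ["pull", "resolve"]
EOF
}
```

A call such as `write_registry_mirror docker.io https://registry-1.docker.io https://registry.tail8d86e.ts.net` would produce `/etc/containerd/certs.d/docker.io/hosts.toml`. containerd only reads these files when its CRI registry `config_path` points at the certs.d directory.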

ProxyClass Renamed

Changed from crio-compat to default; the old name was misleading because we no longer use CRI-O.

Volume Mounts for P6 (Kiwix/Transmission)

Solution: Direct NFS from pods to sifaka (tested and working)

Docker NATs outbound traffic through indri's LAN IP (192.168.1.50), so sifaka's NFS exports need to allow 192.168.1.0/24.

Sifaka NFS exports configured:

  • 192.168.1.0/24 - Docker containers via indri NAT
  • 100.64.0.0/10 - Tailscale clients

Pods can mount NFS directly:

volumes:
  - name: torrents
    nfs:
      server: sifaka
      path: /volume1/torrents

No LaunchAgents, no minikube mount, no SMB CSI driver needed.
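
To re-check the mount after a change to the exports, a throwaway pod is enough. The sketch below prints a manifest for one. The pod name, the busybox image, and the mount path are illustrative assumptions; only the server and export path come from the snippet above.

```shell
# Print a one-shot pod manifest that mounts an NFS export read-only
# and lists its contents.
nfs_test_pod_manifest() {
  server="${1:-sifaka}"
  path="${2:-/volume1/torrents}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nfs-mount-test
spec:
  restartPolicy: Never
  containers:
    - name: ls
      image: busybox
      command: ["ls", "-la", "/mnt/nfs"]
      volumeMounts:
        - name: export
          mountPath: /mnt/nfs
          readOnly: true
  volumes:
    - name: export
      nfs:
        server: $server
        path: $path
        readOnly: true
EOF
}
```

Usage would look like `nfs_test_pod_manifest | kubectl --context=minikube-indri apply -f -`, followed by `kubectl --context=minikube-indri logs nfs-mount-test` and deleting the pod afterwards. A pod stuck in ContainerCreating usually points at the export's allowed-client list or at name resolution for the server.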


Verification Checklist

  • Docker Desktop installed and running on indri
  • QEMU2 minikube deleted
  • Docker minikube running (6 CPUs, 11GB RAM)
  • API server accessible on localhost
  • Tailscale serve configured for svc:k8s
  • Remote kubectl access working from gilbert
  • Ansible roles updated for docker driver
  • socket_vmnet stopped
  • ArgoCD deployed and synced
  • All apps synced to feature branch
  • Apply app secrets (grafana-admin, miniflux-db, devpi-root, eblume, borgmatic)
  • Verify all apps healthy after secrets applied
  • Miniflux database restored from borgmatic backup
  • Merge PR and reset apps to main branch
  • mise run indri-services-check passes

Post-Merge Steps

After PR is merged:

# Reset all blumeops apps to main branch
argocd app set apps --revision main
argocd app set argocd --revision main
argocd app set blumeops-pg --revision main
argocd app set devpi --revision main
argocd app set grafana-config --revision main
argocd app set miniflux --revision main
argocd app set tailscale-operator --revision main

# Sync all apps
argocd app sync apps
argocd app sync argocd
argocd app sync tailscale-operator
argocd app sync blumeops-pg
argocd app sync grafana-config
argocd app sync miniflux
argocd app sync devpi
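
The same steps can be written as a loop, with a health wait added at the end. This is a sketch rather than a tested runbook: the function name and the 300-second timeout are assumptions, and the wait only succeeds once the secrets listed earlier have been applied.

```shell
# Apps named in the post-merge steps above.
BLUMEOPS_APPS="apps argocd tailscale-operator blumeops-pg grafana-config miniflux devpi"

reset_apps_to_main() {
  # Point every app at main first, then sync, then wait for health.
  for app in $BLUMEOPS_APPS; do
    argocd app set "$app" --revision main || return 1
  done
  for app in $BLUMEOPS_APPS; do
    argocd app sync "$app" || return 1
  done
  # Block until each app reports Healthy, or fail after 300 s per app.
  for app in $BLUMEOPS_APPS; do
    argocd app wait "$app" --health --timeout 300 || return 1
  done
}
```

Stopping at the first failure keeps a broken app from being hidden by later successful syncs.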

Rollback Plan

If the docker driver doesn't work:

  1. Delete Docker minikube: minikube delete
  2. Recreate QEMU2 cluster (restore old ansible config from git)
  3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl
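
For step 3, the SSH tunnel could look like the sketch below. The VM address 192.168.105.2:6443 is the one noted in the background section. The function name, the local port, and the `indri` SSH host alias are assumptions.

```shell
# Forward a local port to the QEMU2 VM's API server through indri.
open_k8s_tunnel() {
  local_port="${1:-6443}"
  remote="${2:-indri}"
  # -f: go to background after authentication
  # -N: no remote command, forwarding only
  # ExitOnForwardFailure: fail if the local port cannot be bound
  ssh -f -N -o ExitOnForwardFailure=yes \
    -L "$local_port:192.168.105.2:6443" "$remote"
}
```

With the tunnel open, `kubectl --server=https://127.0.0.1:6443 get nodes` should work as long as the API server certificate covers that address. If it does not, kubectl's `--tls-server-name` flag can supply a name that the certificate does cover.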