# Phase 5.1: Migrate Minikube from QEMU2 to Docker Driver
**Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts
**Status**: Complete (2026-01-21) - Cluster running, ArgoCD deployed, apps synced
**Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
---
## Background
### Original Problem (Podman → QEMU2)
During Phase 6 (Kiwix/Transmission migration), we discovered that the **podman driver has fundamental limitations** that prevent mounting external volumes:
1. **SMB CSI driver fails** with "Operation not permitted" - the rootless container lacks kernel-level mount capabilities
2. **`minikube mount` fails** - 9p mount gets "permission denied" inside the podman VM
3. **hostPath volumes** only work for paths inside the minikube container, not the macOS host
We migrated to QEMU2 to get a full VM with kernel capabilities.
### New Problem (QEMU2 → Docker)
The QEMU2 driver introduced a **new problem**: the Kubernetes API server is inside the VM at `192.168.105.2:6443`, and Tailscale's TCP proxy cannot forward to it properly:
- TCP connections succeed (`nc -zv` works)
- TLS handshake times out
- Root cause unknown, but likely related to Tailscale serve's handling of non-localhost upstreams
Additionally, the volume mount solution with QEMU2 was complex:
- Required NFS mount from sifaka → indri
- Then `minikube mount` to pass through to VM
- Two LaunchAgents/LaunchDaemons for persistence
- macOS GUI approval required for network access
### Why Docker?
The **docker driver** solves both problems:
1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` works
2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers.
3. **Official Tailscale recommendation**: Tailscale's own [Kubernetes guide](https://tailscale.com/learn/managing-access-to-kubernetes-with-tailscale) uses minikube with the docker driver.
---
## Implementation Summary
### Infrastructure Changes
1. **Docker Desktop installed** (manually, via `brew install --cask docker`)
- Configured with 12GB memory in Docker Desktop settings
- Kubernetes option disabled (using minikube instead)
2. **Docker minikube cluster created**:
```bash
minikube start \
  --driver=docker \
  --container-runtime=docker \
  --cpus=6 \
  --memory=11264 \
  --disk-size=200g \
  --apiserver-names=k8s.tail8d86e.ts.net,indri \
  --apiserver-port=6443 \
  --listen-address=0.0.0.0
```
3. **Tailscale serve configured** for k8s API:
- API server on localhost (port is dynamic with docker driver)
- `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:<PORT>`
4. **Remote kubectl access working** from gilbert:
- Created `mise-tasks/ensure-minikube-indri-kubectl-config` script
- Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml`
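
The remote kubectl setup described in step 4 could look roughly like the sketch below. It is not the contents of `mise-tasks/ensure-minikube-indri-kubectl-config`. The function name, the `KUBECTL`/`SCP` overrides, the cert paths (minikube's defaults for a profile named `minikube`), and the server URL `https://k8s.tail8d86e.ts.net` are all assumptions made for illustration.

```bash
# Hypothetical sketch; the real mise task may differ.
# Both commands can be overridden (e.g. KUBECTL=echo SCP=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"
SCP="${SCP:-scp}"

# $1 = kubeconfig path to write, $2 = existing directory for the fetched certs.
setup_indri_kubeconfig() {
  local cfg="$1" certs="$2" name="minikube-indri"
  # Assumed cert locations on indri: minikube defaults, profile "minikube".
  "$SCP" indri:.minikube/ca.crt "$certs/ca.crt"
  "$SCP" indri:.minikube/profiles/minikube/client.crt "$certs/client.crt"
  "$SCP" indri:.minikube/profiles/minikube/client.key "$certs/client.key"
  # Assumed server URL: the Tailscale service name from --apiserver-names.
  "$KUBECTL" --kubeconfig "$cfg" config set-cluster "$name" \
    --server=https://k8s.tail8d86e.ts.net \
    --certificate-authority="$certs/ca.crt" --embed-certs=true
  "$KUBECTL" --kubeconfig "$cfg" config set-credentials "$name" \
    --client-certificate="$certs/client.crt" --client-key="$certs/client.key" \
    --embed-certs=true
  "$KUBECTL" --kubeconfig "$cfg" config set-context "$name" \
    --cluster="$name" --user="$name"
}
```

Calling `setup_indri_kubeconfig ~/.kube/minikube-indri/config.yml "$(mktemp -d)"` would write the config to the path named above. Embedding the certs means the temporary directory can be discarded afterwards.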
### Ansible Roles Updated
- `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
- `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
- Containerd registry mirrors configured for zot pull-through cache
### ArgoCD Bootstrap
All apps deployed and synced from the `feature/p5.1-qemu2-migration` branch:
| App | Status | Notes |
|-----|--------|-------|
| tailscale-operator | Healthy | Manages Tailscale ingresses |
| argocd | Healthy | Self-managed |
| cloudnative-pg | Healthy | PostgreSQL operator |
| blumeops-pg | Progressing | PostgreSQL cluster starting |
| grafana | Progressing | Needs grafana-admin secret |
| grafana-config | Healthy | Dashboards and ingress |
| miniflux | Progressing | Needs miniflux-config secret |
| devpi | Progressing | Starting up |
### Secrets Still Needed
After PR merge, apply these secrets manually:
```bash
# Grafana admin password
op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | kubectl --context=minikube-indri apply -f -
# Miniflux config
op inject -i argocd/manifests/miniflux/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -
```
---
## Technical Notes
### API Server Port
With the docker driver, the API server port is **dynamic** - Docker maps a random host port to 6443 inside the container.
The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
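
The port lookup could be sketched as follows. This is a minimal illustration, not the ansible role itself: it assumes the minikube container is named `minikube` (the default profile name) and that `docker port` prints `HOST:PORT` lines. The helper names `extract_host_port` and `configure_k8s_serve` are hypothetical.

```bash
# Hypothetical helpers; not taken from the ansible role.

# Extract the host port from `docker port` output such as "0.0.0.0:32771".
# Only the first line is used, since Docker may also print an IPv6 mapping.
extract_host_port() {
  printf '%s\n' "$1" | head -n 1 | sed 's/.*://'
}

# Look up the host port mapped to the API server (6443 inside the container)
# and point the svc:k8s Tailscale service at it.
configure_k8s_serve() {
  local mapping port
  mapping="$(docker port minikube 6443/tcp)"   # assumes container name "minikube"
  port="$(extract_host_port "$mapping")"
  tailscale serve --service=svc:k8s --tcp=443 "tcp://localhost:${port}"
}
```

Keeping the parsing in its own function makes it easy to check against sample output without a running cluster. `configure_k8s_serve` would need to be re-run whenever the container is recreated, since the mapped port can change.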
### Registry Mirror Configuration
Containerd uses `/etc/containerd/certs.d/<registry>/hosts.toml` files. The ansible role configures mirrors for:
- `registry.tail8d86e.ts.net` (private images)
- `docker.io`
- `ghcr.io`
- `quay.io`
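
As an illustration of what one of these files might contain, the sketch below renders a `hosts.toml` with a single mirror entry. The helper name `render_hosts_toml` and the mirror URL `https://mirror.example` are placeholders, not values from the ansible role; the actual template may set different capabilities or add more hosts.

```bash
# Hypothetical helper; the real ansible role may template this differently.
# $1 = upstream server URL, $2 = mirror (pull-through cache) URL.
render_hosts_toml() {
  cat <<EOF
server = "$1"

[host."$2"]
  capabilities = ["pull", "resolve"]
EOF
}

# Example (placeholder mirror URL): try the mirror first for docker.io pulls.
# render_hosts_toml "https://registry-1.docker.io" "https://mirror.example" \
#   > /etc/containerd/certs.d/docker.io/hosts.toml
```

With this layout containerd tries the `[host]` entry first and falls back to `server` if the mirror cannot serve the image.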
### ProxyClass Renamed
Changed from `crio-compat` to `default` - the old name was misleading since we're no longer using CRI-O.
### Volume Mounts for P6 (Kiwix/Transmission)
**Solution: Direct NFS from pods to sifaka** ✅ TESTED AND WORKING
Docker NATs outbound traffic through indri's LAN IP (192.168.1.50), so sifaka's NFS exports need to allow `192.168.1.0/24`.
Sifaka NFS exports configured:
- `192.168.1.0/24` - Docker containers via indri NAT
- `100.64.0.0/10` - Tailscale clients
Pods can mount NFS directly:
```yaml
volumes:
  - name: torrents
    nfs:
      server: sifaka
      path: /volume1/torrents
```
No LaunchAgents, no `minikube mount`, no SMB CSI driver needed.
---
## Verification Checklist
- [x] Docker Desktop installed and running on indri
- [x] QEMU2 minikube deleted
- [x] Docker minikube running (6 CPUs, 11GB RAM)
- [x] API server accessible on localhost
- [x] Tailscale serve configured for svc:k8s
- [x] Remote kubectl access working from gilbert
- [x] Ansible roles updated for docker driver
- [x] socket_vmnet stopped
- [x] ArgoCD deployed and synced
- [x] All apps synced to feature branch
- [x] Apply app secrets (grafana-admin, miniflux-db, devpi-root, eblume, borgmatic)
- [x] Verify all apps healthy after secrets applied
- [x] Miniflux database restored from borgmatic backup
- [ ] Merge PR and reset apps to main branch
- [ ] `mise run indri-services-check` passes
---
## Post-Merge Steps
After PR is merged:
```bash
# Reset all blumeops apps to main branch
argocd app set apps --revision main
argocd app set argocd --revision main
argocd app set blumeops-pg --revision main
argocd app set devpi --revision main
argocd app set grafana-config --revision main
argocd app set miniflux --revision main
argocd app set tailscale-operator --revision main
# Sync all apps
argocd app sync apps
argocd app sync argocd
argocd app sync tailscale-operator
argocd app sync blumeops-pg
argocd app sync grafana-config
argocd app sync miniflux
argocd app sync devpi
```
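
The same steps can be written as a loop, which is easier to keep in sync as apps are added. This is only a sketch: the function name `reset_and_sync` and the `ARGOCD` override are hypothetical, and the app list is copied from the commands above in their sync order.

```bash
# The argocd binary can be overridden (e.g. ARGOCD=echo) for a dry run.
ARGOCD="${ARGOCD:-argocd}"

# App names copied from the command list above, in sync order.
APPS="apps argocd tailscale-operator blumeops-pg grafana-config miniflux devpi"

# Point every app at the given revision, then sync each one.
reset_and_sync() {
  local revision="$1" app
  for app in $APPS; do
    "$ARGOCD" app set "$app" --revision "$revision"
  done
  for app in $APPS; do
    "$ARGOCD" app sync "$app"
  done
}
```

Running `ARGOCD=echo` before `reset_and_sync main` prints the commands instead of executing them, which is a cheap way to review the plan before touching the cluster.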
---
## Rollback Plan
If the docker driver doesn't work:
1. Delete Docker minikube: `minikube delete`
2. Recreate QEMU2 cluster (restore old ansible config from git)
3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl