P5.1: Migrate minikube from podman to QEMU2 driver #38

Merged
eblume merged 16 commits from feature/p5.1-qemu2-migration into main 2026-01-21 16:03:38 -08:00
Showing only changes of commit 75f945385c

Update P5.1 plan with completion status and P6 storage options

- Document completed steps (docker driver working, kubectl access, ansible updated)
- Add detailed analysis of volume mount options for P6
- Recommend hostPath via Docker Desktop file sharing as simplest approach
- Document why direct NFS won't work (Docker network isolation)
- Include sample LaunchDaemon for persistent NFS mount

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Erich Blume 2026-01-21 14:05:26 -08:00

@@ -2,7 +2,7 @@
 **Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts
-**Status**: In Progress (2026-01-21)
+**Status**: In Progress (2026-01-21) - Ansible roles updated, cluster running, awaiting ArgoCD redeploy
 **Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
@@ -38,269 +38,232 @@ Additionally, the volume mount solution with QEMU2 was complex:
 The **docker driver** solves both problems:
-1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` will work (like podman did)
-2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers, and minikube (running in Docker) can use those paths via hostPath.
+1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` works
+2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers.
 3. **Official Tailscale recommendation**: Tailscale's own [Kubernetes guide](https://tailscale.com/learn/managing-access-to-kubernetes-with-tailscale) uses minikube with the docker driver.
 ---
## Prerequisites ## Implementation Progress
### 1. Install Docker Desktop (Manual - Before Ansible) ### Completed ✅
Docker Desktop requires GUI setup, so install manually first: 1. **Docker Desktop installed** (manual via `brew install --cask docker`)
- Configured with 12GB memory in Docker Desktop settings
- Kubernetes option disabled (using minikube instead)
```bash 2. **QEMU2 minikube deleted** (`minikube stop && minikube delete`)
# On indri:
brew install --cask docker-desktop
# Then launch Docker Desktop from /Applications 3. **Docker minikube cluster created**:
# Complete the setup wizard (accept license, skip tutorial)
# Wait for Docker to be "Running" (green icon in menu bar)
# Verify:
docker version
docker run hello-world
```
**File Sharing Configuration** (in Docker Desktop → Settings → Resources → File sharing):
- Ensure `/Volumes` is shared (for future NFS mounts from sifaka)
- Or add specific paths as needed for P6
### 2. Stop Current QEMU2 Minikube
```bash
# On indri:
minikube stop
minikube delete
# Verify QEMU resources are cleaned up
ps aux | grep qemu
```
---
## Plan
### 1. Update Ansible Role for Docker Driver
**Changes to `ansible/roles/minikube/defaults/main.yml`:**
```yaml
# Change from:
minikube_driver: qemu2
minikube_network: socket_vmnet
minikube_container_runtime: containerd
# To:
minikube_driver: docker
minikube_container_runtime: docker # or containerd, both work
```
**Remove from defaults:**
- `minikube_network` (not needed for docker driver)
**Changes to `ansible/roles/minikube/tasks/main.yml`:**
- Remove qemu installation
- Remove socket_vmnet installation and service management
- Remove NFS mount point creation
- Remove NFS LaunchDaemon installation
- Remove minikube mount LaunchAgent installation
- Keep containerd registry mirror config (adapting for docker if needed)
**Remove files from `ansible/roles/minikube/files/`:**
- `com.blumeops.nfs-torrents.plist`
- `com.blumeops.minikube-mount.plist`
**Changes to `ansible/roles/minikube/handlers/main.yml`:**
- Remove `Load NFS mount LaunchDaemon`
- Remove `Load minikube mount LaunchAgent`
**Add to Brewfile:**
```ruby
cask "docker" # Docker Desktop
```
### 2. Update Tailscale Serve Configuration
**Changes to `ansible/roles/tailscale_serve/defaults/main.yml`:**
```yaml
# Change svc:k8s upstream from VM IP back to localhost:
- name: svc:k8s
tcp:
port: 443
upstream: tcp://localhost:PORT # PORT will be dynamic, see below
```
**Note on API server port**: With the docker driver, the API server port is dynamic (assigned by minikube). We need to either:
- Use `--apiserver-port=6443` to fix it
- Or query and update the config after cluster creation
### 3. Create Docker Minikube Cluster
```bash
# On indri (after Docker Desktop is running):
minikube start \
--driver=docker \
--cpus=6 \
--memory=12288 \
--disk-size=200g \
--apiserver-names=k8s.tail8d86e.ts.net,indri \
--apiserver-port=6443 \
--listen-address=0.0.0.0
# Verify cluster
minikube status
kubectl get nodes
```
### 4. Verify API Server is on Localhost
```bash
# Check what port the API server is on
kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
# Should show https://127.0.0.1:PORT or similar
# Verify local access works
curl -k https://localhost:6443/healthz
# Should return "ok"
```
### 5. Update 1Password Credentials
After cluster recreation, update the credentials in 1Password:
```bash
# On indri, get the new certificates:
cat ~/.minikube/profiles/minikube/client.crt
cat ~/.minikube/profiles/minikube/client.key
cat ~/.minikube/ca.crt
```
Update in 1Password (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `3jo4f2hnzvwfmamudfsbbbec7e`).
### 6. Update Kubeconfig on Gilbert
```bash
# Fetch new CA cert from 1Password
op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 3jo4f2hnzvwfmamudfsbbbec7e --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt
```
### 7. Configure Tailscale Serve for K8s
```bash
# On indri:
tailscale serve --service="svc:k8s" --tcp=443 tcp://localhost:6443
```
### 8. Verify Remote Access
```bash
# From gilbert:
curl -k --connect-timeout 5 https://k8s.tail8d86e.ts.net/healthz
# Should return "ok"
kubectl --context=minikube-indri get nodes
# Should show the minikube node
```
### 9. Redeploy ArgoCD and Apps
Since this is a cluster recreation, we need to re-bootstrap:
```bash
# On indri - apply secrets first
op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
# Create repo secret for ArgoCD
PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
kubectl create namespace argocd
kubectl create secret generic repo-forge -n argocd \
--from-literal=type=git \
--from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
--from-literal=insecure=true \
--from-literal=sshPrivateKey="$PRIV_KEY"
kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
# Bootstrap operators
kubectl create namespace tailscale
kubectl apply -k argocd/manifests/tailscale-operator/
kubectl apply -k argocd/manifests/argocd/
# Wait for ArgoCD
kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
# Login and sync apps
argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
argocd app sync apps
argocd app sync tailscale-operator
argocd app sync cloudnative-pg
argocd app sync blumeops-pg
argocd app sync grafana
argocd app sync grafana-config
argocd app sync miniflux
argocd app sync devpi
```
### 10. Verify All Services
```bash
mise run indri-services-check
argocd app list
kubectl get pods --all-namespaces
```
---
## Volume Mounts for P6 (Kiwix/Transmission)
With the docker driver, volume mounts work differently than QEMU2:
**Option A: Docker Desktop File Sharing + hostPath**
1. Mount sifaka NFS share on indri: `/Volumes/torrents`
2. Add `/Volumes/torrents` to Docker Desktop file sharing
3. Pods use hostPath pointing to that path
**Option B: NFS directly from pods**
- Docker containers can make NFS mounts (unlike podman's rootless containers)
- May need to test if sifaka allows connections from the Docker network
This will be fully tested in Phase 6.
---
## Cleanup
After successful migration:
1. **Remove QEMU2 artifacts:**
```bash ```bash
brew uninstall qemu socket_vmnet minikube start \
--driver=docker \
--container-runtime=docker \
--cpus=6 \
--memory=11264 \
--disk-size=200g \
--apiserver-names=k8s.tail8d86e.ts.net,indri \
--apiserver-port=6443 \
--listen-address=0.0.0.0
```
Note: Memory set to 11264MB (11GB) to leave headroom for Docker Desktop overhead.
4. **Tailscale serve configured** for k8s API:
- API server on localhost:50820 (port is dynamic with docker driver)
- `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:50820`
5. **Remote kubectl access working** from gilbert:
- Created `mise-tasks/ensure-minikube-indri-kubectl-config` script
- Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml`
- `kubectl --context=minikube-indri get nodes` works
6. **Ansible roles updated**:
- `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
- `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
- Containerd registry mirrors configured for zot pull-through cache
7. **QEMU2 artifacts cleaned up**:
- Stopped socket_vmnet service
- Removed NFS LaunchDaemon
- Removed minikube mount LaunchAgent
- kubectl still works after cleanup
### Remaining 📋
1. **Redeploy ArgoCD and apps** - bootstrap the cluster with:
```bash
# On indri - apply secrets first
op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
# Create repo secret for ArgoCD
PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
kubectl create namespace argocd
kubectl create secret generic repo-forge -n argocd \
--from-literal=type=git \
--from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
--from-literal=insecure=true \
--from-literal=sshPrivateKey="$PRIV_KEY"
kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
# Bootstrap operators
kubectl create namespace tailscale
kubectl apply -k argocd/manifests/tailscale-operator/
kubectl apply -k argocd/manifests/argocd/
# Wait for ArgoCD
kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
# Login and sync apps
argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
argocd app sync apps
argocd app sync tailscale-operator
argocd app sync cloudnative-pg
argocd app sync blumeops-pg
argocd app sync grafana
argocd app sync grafana-config
argocd app sync miniflux
argocd app sync devpi
``` ```
2. **Remove podman if no longer needed:** 2. **Verify all services** with `mise run indri-services-check`
```bash
podman machine stop 3. **Configure containerd registry mirrors** (will be done by ansible on next provision)
podman machine rm
brew uninstall podman ---
```
## Technical Notes
### API Server Port
With docker driver, the API server port is **dynamic** - Docker maps a random host port to 6443 inside the container. Current port: 50820.
The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
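As a rough illustration of that query-then-configure step, the sketch below pulls the port out of an API server URL and prints the matching `tailscale serve` command. It is not the ansible role's actual implementation; the function names `extract_port` and `serve_command` are hypothetical, and the URL is the one recorded in this plan.

```bash
# Sketch only: extract_port and serve_command are hypothetical helper names.

# Print the port from a URL such as https://127.0.0.1:50820
extract_port() {
  local url="$1"
  local hostport="${url#*://}"   # drop the scheme
  hostport="${hostport%%/*}"     # drop any path
  local port="${hostport##*:}"   # keep what follows the last colon
  # Reject URLs that carry no explicit numeric port
  case "$hostport" in
    *:*) ;;
    *) echo "no port found in: $url" >&2; return 1 ;;
  esac
  case "$port" in
    ''|*[!0-9]*) echo "no port found in: $url" >&2; return 1 ;;
  esac
  printf '%s\n' "$port"
}

# Print (do not run) the tailscale serve command for a given port
serve_command() {
  printf 'tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:%s\n' "$1"
}

# Example using the port recorded above. On indri the URL could instead come from:
#   kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
serve_command "$(extract_port 'https://127.0.0.1:50820')"
```

Printing the command rather than running it keeps the sketch safe to try on any machine; because the port can change when the cluster is recreated, it is looked up rather than hard-coded.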
### Registry Mirror Configuration
Containerd uses `/etc/containerd/certs.d/<registry>/hosts.toml` files:
```toml
# /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"
[host."http://host.minikube.internal:5050"]
capabilities = ["pull", "resolve"]
skip_verify = true
```
The ansible role configures mirrors for:
- `registry.tail8d86e.ts.net` (private images)
- `docker.io`
- `ghcr.io`
- `quay.io`
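A minimal sketch of generating those per-registry files is shown here, writing into a scratch directory instead of `/etc/containerd/certs.d`. The helper name `write_mirror_config` is hypothetical, and the upstream URLs for `ghcr.io` and `quay.io` are assumed to be `https://<registry>`; only the `docker.io` values come from the example above. The private registry is left out because its upstream is not stated here.

```bash
# Sketch only: write_mirror_config is a hypothetical helper, not the ansible role.
# Arguments: target dir, registry name, upstream server URL, mirror URL.
write_mirror_config() {
  local certs_dir="$1" registry="$2" upstream="$3" mirror="$4"
  mkdir -p "$certs_dir/$registry"
  cat > "$certs_dir/$registry/hosts.toml" <<EOF
server = "$upstream"
[host."$mirror"]
capabilities = ["pull", "resolve"]
skip_verify = true
EOF
}

# Use a scratch directory so nothing under /etc is touched
out_dir="$(mktemp -d)"
mirror="http://host.minikube.internal:5050"

# docker.io values are taken from the hosts.toml shown above
write_mirror_config "$out_dir" docker.io "https://registry-1.docker.io" "$mirror"

# Assumption: these registries are served at https://<registry>
for registry in ghcr.io quay.io; do
  write_mirror_config "$out_dir" "$registry" "https://$registry" "$mirror"
done

cat "$out_dir/docker.io/hosts.toml"
```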
### Volume Mounts for P6 (Kiwix/Transmission)
With the docker driver, volume mounts work differently than podman or qemu2. Here's the analysis:
**Current Network State:**
- Minikube container is on Docker network `192.168.49.0/24`
- Sifaka NFS exports `/volume1/torrents` to:
- `192.168.105.0/24` (old qemu2 VM network - no longer used)
- `100.64.0.0/10` (Tailscale CGNAT range)
- Minikube can resolve `sifaka` (192.168.1.203) but can't reach it (100% packet loss due to Docker network isolation)
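To make the export ranges concrete, the sketch below checks whether an address falls inside each CIDR block. It says nothing about the packet loss itself; it only shows that an address on the Docker network would sit outside both current exports. The address `192.168.49.2` is an assumed example on `192.168.49.0/24`, and `ip_to_int` and `in_cidr` are hypothetical helper names.

```bash
# Sketch only: ip_to_int and in_cidr are hypothetical helper names.

# Convert a dotted-quad IPv4 address to a single integer
ip_to_int() {
  local IFS=.
  set -- $1   # split the address into its four octets
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside CIDR block $2 (for example 10.0.0.0/8)
in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# 192.168.49.2 is an assumed example address on the minikube Docker network
for cidr in 192.168.105.0/24 100.64.0.0/10 192.168.49.0/24; do
  if in_cidr 192.168.49.2 "$cidr"; then
    echo "$cidr: match"
  else
    echo "$cidr: no match"
  fi
done
```

Only the third range, which is not currently exported, contains the example address.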
**Option A: hostPath via Docker Desktop File Sharing** ⭐ RECOMMENDED
1. Mount sifaka NFS share on indri macOS: `mount -t nfs sifaka:/volume1/torrents /Volumes/torrents`
2. Docker Desktop file sharing exposes `/Volumes` into the Docker VM
3. Pods use hostPath to access `/Volumes/torrents`
Pros:
- Simplest approach, uses native Docker file sharing
- No network reconfiguration needed on sifaka
- Path is stable and predictable
Cons:
- Requires persistent NFS mount on indri (LaunchDaemon)
- File sharing performance may be slower than direct NFS
Implementation:
```bash
# Manual mount test
ssh indri 'sudo mkdir -p /Volumes/torrents && sudo mount -t nfs -o resvport,rw sifaka:/volume1/torrents /Volumes/torrents'
# Verify Docker can see it
ssh indri 'docker run --rm -v /Volumes/torrents:/data alpine ls /data'
# Pod manifest uses hostPath:
# volumes:
# - name: torrents
# hostPath:
# path: /Volumes/torrents
# type: Directory
```
**Option B: Update sifaka NFS exports for Docker network**
1. Add `192.168.49.0/24` to sifaka's NFS exports
2. Pods mount NFS directly using kubernetes NFS volume type
Cons:
- Docker network might change (though `192.168.49.x` seems stable for minikube)
- Requires sifaka configuration change
- NFS mount from inside container may have permission issues
**Option C: Tailscale sidecar for NFS access**
1. Pods include a Tailscale sidecar that joins the tailnet
2. Mount NFS via Tailscale IP (sifaka is at 100.x.x.x)
Cons:
- Complex setup with sidecar containers
- Each pod needs Tailscale auth
- Overkill for this use case
**Recommendation for P6:**
Use **Option A** (hostPath via Docker Desktop file sharing). It's the simplest and most reliable approach. We'll need a LaunchDaemon for the NFS mount, but it's straightforward:
```xml
<!-- /Library/LaunchDaemons/com.blumeops.nfs-torrents.plist -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.blumeops.nfs-torrents</string>
<key>ProgramArguments</key>
<array>
<string>/sbin/mount</string>
<string>-t</string>
<string>nfs</string>
<string>-o</string>
<string>resvport,rw</string>
<string>sifaka:/volume1/torrents</string>
<string>/Volumes/torrents</string>
</array>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
```
This is simpler than the qemu2 approach because there's no intermediate `minikube mount` step - Docker Desktop handles the path passthrough automatically.
 ---
 ## Verification Checklist
-- [ ] Docker Desktop installed and running on indri
-- [ ] QEMU2 minikube deleted
-- [ ] Docker minikube running
-- [ ] API server accessible on localhost:6443
-- [ ] Tailscale serve configured for svc:k8s → localhost:6443
-- [ ] Remote kubectl access working from gilbert
+- [x] Docker Desktop installed and running on indri
+- [x] QEMU2 minikube deleted
+- [x] Docker minikube running (6 CPUs, 11GB RAM)
+- [x] API server accessible on localhost:50820
+- [x] Tailscale serve configured for svc:k8s → localhost:50820
+- [x] Remote kubectl access working from gilbert
+- [x] Ansible roles updated for docker driver
+- [x] socket_vmnet stopped
 - [ ] ArgoCD redeployed and synced
 - [ ] All existing apps healthy (grafana, miniflux, devpi, etc.)
 - [ ] PostgreSQL cluster healthy
+- [ ] Containerd registry mirrors configured
 - [ ] `mise run indri-services-check` passes
 ---
@@ -312,29 +275,3 @@ If Docker driver doesn't work:
 1. Delete Docker minikube: `minikube delete`
 2. Recreate QEMU2 cluster (restore old ansible config from git)
 3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl
----
-## Notes
-- Docker Desktop has resource overhead but provides better macOS integration
-- The docker driver is more widely used and tested than qemu2
-- File sharing permissions may need adjustment in Docker Desktop settings
-- First cluster start may be slow as Docker pulls the minikube base image
-## Implementation Notes (2026-01-21)
-### QEMU2 Cleanup Done
-Removed from indri:
-- `/Library/LaunchDaemons/com.blumeops.nfs-torrents.plist` - NFS mount daemon
-- `~/Library/LaunchAgents/com.blumeops.minikube-mount.plist` - minikube mount agent
-- Unmounted `/Volumes/torrents-nfs` NFS mount
-- Removed `/Volumes/torrents-nfs` mount point
-### Previous QEMU2 Issues
-The QEMU2 migration partially worked but had a critical issue:
-- Volume mounts worked via NFS → indri → minikube mount chain
-- But Tailscale TCP proxy to VM IP (192.168.105.2:6443) failed with TLS timeout
-- Root cause unknown - TCP connected but TLS handshake never completed