P5.1: Migrate minikube from podman to QEMU2 driver #38

Merged
eblume merged 16 commits from feature/p5.1-qemu2-migration into main 2026-01-21 16:03:38 -08:00
Showing only changes of commit 75f945385c

Update P5.1 plan with completion status and P6 storage options

- Document completed steps (docker driver working, kubectl access, ansible updated)
- Add detailed analysis of volume mount options for P6
- Recommend hostPath via Docker Desktop file sharing as simplest approach
- Document why direct NFS won't work (Docker network isolation)
- Include sample LaunchDaemon for persistent NFS mount

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Erich Blume 2026-01-21 14:05:26 -08:00

@@ -2,7 +2,7 @@
**Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts **Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts
**Status**: In Progress (2026-01-21) **Status**: In Progress (2026-01-21) - Ansible roles updated, cluster running, awaiting ArgoCD redeploy
**Prerequisites**: [Phase 5](P5_devpi.complete.md) complete **Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
@@ -38,269 +38,232 @@ Additionally, the volume mount solution with QEMU2 was complex:
The **docker driver** solves both problems: The **docker driver** solves both problems:
1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` will work (like podman did) 1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` works
2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers, and minikube (running in Docker) can use those paths via hostPath. 2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers.
3. **Official Tailscale recommendation**: Tailscale's own [Kubernetes guide](https://tailscale.com/learn/managing-access-to-kubernetes-with-tailscale) uses minikube with the docker driver. 3. **Official Tailscale recommendation**: Tailscale's own [Kubernetes guide](https://tailscale.com/learn/managing-access-to-kubernetes-with-tailscale) uses minikube with the docker driver.
--- ---
## Prerequisites ## Implementation Progress
### 1. Install Docker Desktop (Manual - Before Ansible) ### Completed ✅
Docker Desktop requires GUI setup, so install manually first: 1. **Docker Desktop installed** (manual via `brew install --cask docker`)
- Configured with 12GB memory in Docker Desktop settings
- Kubernetes option disabled (using minikube instead)
```bash 2. **QEMU2 minikube deleted** (`minikube stop && minikube delete`)
# On indri:
brew install --cask docker-desktop
# Then launch Docker Desktop from /Applications 3. **Docker minikube cluster created**:
# Complete the setup wizard (accept license, skip tutorial) ```bash
# Wait for Docker to be "Running" (green icon in menu bar) minikube start \
# Verify:
docker version
docker run hello-world
```
**File Sharing Configuration** (in Docker Desktop → Settings → Resources → File sharing):
- Ensure `/Volumes` is shared (for future NFS mounts from sifaka)
- Or add specific paths as needed for P6
### 2. Stop Current QEMU2 Minikube
```bash
# On indri:
minikube stop
minikube delete
# Verify QEMU resources are cleaned up
ps aux | grep qemu
```
---
## Plan
### 1. Update Ansible Role for Docker Driver
**Changes to `ansible/roles/minikube/defaults/main.yml`:**
```yaml
# Change from:
minikube_driver: qemu2
minikube_network: socket_vmnet
minikube_container_runtime: containerd
# To:
minikube_driver: docker
minikube_container_runtime: docker # or containerd, both work
```
**Remove from defaults:**
- `minikube_network` (not needed for docker driver)
**Changes to `ansible/roles/minikube/tasks/main.yml`:**
- Remove qemu installation
- Remove socket_vmnet installation and service management
- Remove NFS mount point creation
- Remove NFS LaunchDaemon installation
- Remove minikube mount LaunchAgent installation
- Keep containerd registry mirror config (adapting for docker if needed)
**Remove files from `ansible/roles/minikube/files/`:**
- `com.blumeops.nfs-torrents.plist`
- `com.blumeops.minikube-mount.plist`
**Changes to `ansible/roles/minikube/handlers/main.yml`:**
- Remove `Load NFS mount LaunchDaemon`
- Remove `Load minikube mount LaunchAgent`
**Add to Brewfile:**
```ruby
cask "docker" # Docker Desktop
```
### 2. Update Tailscale Serve Configuration
**Changes to `ansible/roles/tailscale_serve/defaults/main.yml`:**
```yaml
# Change svc:k8s upstream from VM IP back to localhost:
- name: svc:k8s
tcp:
port: 443
upstream: tcp://localhost:PORT # PORT will be dynamic, see below
```
**Note on API server port**: With the docker driver, the API server port is dynamic (assigned by minikube). We need to either:
- Use `--apiserver-port=6443` to fix it
- Or query and update the config after cluster creation
### 3. Create Docker Minikube Cluster
```bash
# On indri (after Docker Desktop is running):
minikube start \
--driver=docker \ --driver=docker \
--container-runtime=docker \
--cpus=6 \ --cpus=6 \
--memory=12288 \ --memory=11264 \
--disk-size=200g \ --disk-size=200g \
--apiserver-names=k8s.tail8d86e.ts.net,indri \ --apiserver-names=k8s.tail8d86e.ts.net,indri \
--apiserver-port=6443 \ --apiserver-port=6443 \
--listen-address=0.0.0.0 --listen-address=0.0.0.0
```
Note: Memory set to 11264MB (11GB) to leave headroom for Docker Desktop overhead.
# Verify cluster 4. **Tailscale serve configured** for k8s API:
minikube status - API server on localhost:50820 (port is dynamic with docker driver)
kubectl get nodes - `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:50820`
```
### 4. Verify API Server is on Localhost 5. **Remote kubectl access working** from gilbert:
- Created `mise-tasks/ensure-minikube-indri-kubectl-config` script
- Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml`
- `kubectl --context=minikube-indri get nodes` works
```bash 6. **Ansible roles updated**:
# Check what port the API server is on - `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}" - `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
# Should show https://127.0.0.1:PORT or similar - Containerd registry mirrors configured for zot pull-through cache
# Verify local access works 7. **QEMU2 artifacts cleaned up**:
curl -k https://localhost:6443/healthz - Stopped socket_vmnet service
# Should return "ok" - Removed NFS LaunchDaemon
``` - Removed minikube mount LaunchAgent
- kubectl still works after cleanup
### 5. Update 1Password Credentials ### Remaining 📋
After cluster recreation, update the credentials in 1Password: 1. **Redeploy ArgoCD and apps** - bootstrap the cluster with:
```bash
# On indri - apply secrets first
op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
```bash # Create repo secret for ArgoCD
# On indri, get the new certificates: PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
cat ~/.minikube/profiles/minikube/client.crt kubectl create namespace argocd
cat ~/.minikube/profiles/minikube/client.key kubectl create secret generic repo-forge -n argocd \
cat ~/.minikube/ca.crt
```
Update in 1Password (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `3jo4f2hnzvwfmamudfsbbbec7e`).
### 6. Update Kubeconfig on Gilbert
```bash
# Fetch new CA cert from 1Password
op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 3jo4f2hnzvwfmamudfsbbbec7e --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt
```
### 7. Configure Tailscale Serve for K8s
```bash
# On indri:
tailscale serve --service="svc:k8s" --tcp=443 tcp://localhost:6443
```
### 8. Verify Remote Access
```bash
# From gilbert:
curl -k --connect-timeout 5 https://k8s.tail8d86e.ts.net/healthz
# Should return "ok"
kubectl --context=minikube-indri get nodes
# Should show the minikube node
```
### 9. Redeploy ArgoCD and Apps
Since this is a cluster recreation, we need to re-bootstrap:
```bash
# On indri - apply secrets first
op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
# Create repo secret for ArgoCD
PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
kubectl create namespace argocd
kubectl create secret generic repo-forge -n argocd \
--from-literal=type=git \ --from-literal=type=git \
--from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \ --from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
--from-literal=insecure=true \ --from-literal=insecure=true \
--from-literal=sshPrivateKey="$PRIV_KEY" --from-literal=sshPrivateKey="$PRIV_KEY"
kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
# Bootstrap operators # Bootstrap operators
kubectl create namespace tailscale kubectl create namespace tailscale
kubectl apply -k argocd/manifests/tailscale-operator/ kubectl apply -k argocd/manifests/tailscale-operator/
kubectl apply -k argocd/manifests/argocd/ kubectl apply -k argocd/manifests/argocd/
# Wait for ArgoCD # Wait for ArgoCD
kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
# Login and sync apps # Login and sync apps
argocd login argocd.tail8d86e.ts.net --username admin --grpc-web argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
argocd app sync apps argocd app sync apps
argocd app sync tailscale-operator argocd app sync tailscale-operator
argocd app sync cloudnative-pg argocd app sync cloudnative-pg
argocd app sync blumeops-pg argocd app sync blumeops-pg
argocd app sync grafana argocd app sync grafana
argocd app sync grafana-config argocd app sync grafana-config
argocd app sync miniflux argocd app sync miniflux
argocd app sync devpi argocd app sync devpi
```
2. **Verify all services** with `mise run indri-services-check`
3. **Configure containerd registry mirrors** (will be done by ansible on next provision)
---
## Technical Notes
### API Server Port
With docker driver, the API server port is **dynamic** - Docker maps a random host port to 6443 inside the container. Current port: 50820.
The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
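The port lookup described above might be sketched as follows. This is a minimal illustration, not the role's actual task: `apiserver_port_from_url` is a hypothetical helper name, and the sketch assumes the kubeconfig server URL has the form `https://127.0.0.1:PORT`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the ansible role): print the port
# from an API server URL such as https://127.0.0.1:50820.
apiserver_port_from_url() {
  local url="$1"
  local port="${url##*:}"   # keep everything after the last colon
  port="${port%%/*}"        # drop any trailing path
  case "$port" in
    # Reject empty or non-numeric results (e.g. a URL without a port)
    ''|*[!0-9]*) echo "no port found in: $url" >&2; return 1 ;;
  esac
  printf '%s\n' "$port"
}

# Usage on indri (shown as comments, not run here):
#   server=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
#   port=$(apiserver_port_from_url "$server")
#   tailscale serve --service=svc:k8s --tcp=443 "tcp://localhost:${port}"
```

Reading the port from the kubeconfig avoids hard-coding a value like 50820, which may differ after the cluster is recreated.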
### Registry Mirror Configuration
Containerd uses `/etc/containerd/certs.d/<registry>/hosts.toml` files:
```toml
# /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"
[host."http://host.minikube.internal:5050"]
capabilities = ["pull", "resolve"]
skip_verify = true
``` ```
### 10. Verify All Services The ansible role configures mirrors for:
- `registry.tail8d86e.ts.net` (private images)
- `docker.io`
- `ghcr.io`
- `quay.io`
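As a rough sketch of how one such file per registry could be generated, assuming every registry uses the same mirror endpoint as the `docker.io` example above: `render_hosts_toml` is a hypothetical helper, and the upstream URLs passed to it are assumptions rather than values taken from the ansible role.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a hosts.toml for one registry.
#   $1 = upstream server URL (assumption: supplied by the caller)
#   $2 = mirror URL (defaults to the endpoint from the docker.io example)
render_hosts_toml() {
  local server="$1"
  local mirror="${2:-http://host.minikube.internal:5050}"
  cat <<EOF
server = "${server}"

[host."${mirror}"]
  capabilities = ["pull", "resolve"]
  skip_verify = true
EOF
}

# Usage inside the minikube node (shown as comments, not run here):
#   mkdir -p /etc/containerd/certs.d/ghcr.io
#   render_hosts_toml https://ghcr.io > /etc/containerd/certs.d/ghcr.io/hosts.toml
```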
### Volume Mounts for P6 (Kiwix/Transmission)
With the docker driver, volume mounts work differently than podman or qemu2. Here's the analysis:
**Current Network State:**
- Minikube container is on Docker network `192.168.49.0/24`
- Sifaka NFS exports `/volume1/torrents` to:
- `192.168.105.0/24` (old qemu2 VM network - no longer used)
- `100.64.0.0/10` (Tailscale CGNAT range)
- Minikube can resolve `sifaka` (192.168.1.203) but can't reach it (100% packet loss due to Docker network isolation)
**Option A: hostPath via Docker Desktop File Sharing** ⭐ RECOMMENDED
1. Mount sifaka NFS share on indri macOS: `mount -t nfs sifaka:/volume1/torrents /Volumes/torrents`
2. Docker Desktop file sharing exposes `/Volumes` into the Docker VM
3. Pods use hostPath to access `/Volumes/torrents`
Pros:
- Simplest approach, uses native Docker file sharing
- No network reconfiguration needed on sifaka
- Path is stable and predictable
Cons:
- Requires persistent NFS mount on indri (LaunchDaemon)
- File sharing performance may be slower than direct NFS
Implementation:
```bash ```bash
mise run indri-services-check # Manual mount test
argocd app list ssh indri 'sudo mkdir -p /Volumes/torrents && sudo mount -t nfs -o resvport,rw sifaka:/volume1/torrents /Volumes/torrents'
kubectl get pods --all-namespaces
# Verify Docker can see it
ssh indri 'docker run --rm -v /Volumes/torrents:/data alpine ls /data'
# Pod manifest uses hostPath:
# volumes:
# - name: torrents
# hostPath:
# path: /Volumes/torrents
# type: Directory
``` ```
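Because Option A depends on the NFS share already being mounted on indri, a guard along these lines could run before the Docker check. This is only a sketch: `is_mounted_at` is a hypothetical helper, and it assumes `mount` prints lines of the form `<source> on <path> ...`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if `mount`-style output on stdin lists
# something mounted at exactly the given path.
is_mounted_at() {
  local mount_point="$1"
  # Fixed-string match; the trailing space avoids matching longer
  # paths such as /Volumes/torrents-nfs.
  grep -qF " on ${mount_point} "
}

# Usage on indri (shown as comments, not run here):
#   mount | is_mounted_at /Volumes/torrents || echo "NFS share not mounted" >&2
```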
--- **Option B: Update sifaka NFS exports for Docker network**
1. Add `192.168.49.0/24` to sifaka's NFS exports
2. Pods mount NFS directly using kubernetes NFS volume type
## Volume Mounts for P6 (Kiwix/Transmission) Cons:
- Docker network might change (though `192.168.49.x` seems stable for minikube)
- Requires sifaka configuration change
- NFS mount from inside container may have permission issues
With the docker driver, volume mounts work differently than QEMU2: **Option C: Tailscale sidecar for NFS access**
1. Pods include a Tailscale sidecar that joins the tailnet
2. Mount NFS via Tailscale IP (sifaka is at 100.x.x.x)
**Option A: Docker Desktop File Sharing + hostPath** Cons:
1. Mount sifaka NFS share on indri: `/Volumes/torrents` - Complex setup with sidecar containers
2. Add `/Volumes/torrents` to Docker Desktop file sharing - Each pod needs Tailscale auth
3. Pods use hostPath pointing to that path - Overkill for this use case
**Option B: NFS directly from pods** **Recommendation for P6:**
- Docker containers can make NFS mounts (unlike podman's rootless containers) Use **Option A** (hostPath via Docker Desktop file sharing). It's the simplest and most reliable approach. We'll need a LaunchDaemon for the NFS mount, but it's straightforward:
- May need to test if sifaka allows connections from the Docker network
This will be fully tested in Phase 6. ```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- /Library/LaunchDaemons/com.blumeops.nfs-torrents.plist -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.blumeops.nfs-torrents</string>
<key>ProgramArguments</key>
<array>
<string>/sbin/mount</string>
<string>-t</string>
<string>nfs</string>
<string>-o</string>
<string>resvport,rw</string>
<string>sifaka:/volume1/torrents</string>
<string>/Volumes/torrents</string>
</array>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
```
--- This is simpler than the qemu2 approach because there's no intermediate `minikube mount` step - Docker Desktop handles the path passthrough automatically.
## Cleanup
After successful migration:
1. **Remove QEMU2 artifacts:**
```bash
brew uninstall qemu socket_vmnet
```
2. **Remove podman if no longer needed:**
```bash
podman machine stop
podman machine rm
brew uninstall podman
```
--- ---
## Verification Checklist ## Verification Checklist
- [ ] Docker Desktop installed and running on indri - [x] Docker Desktop installed and running on indri
- [ ] QEMU2 minikube deleted - [x] QEMU2 minikube deleted
- [ ] Docker minikube running - [x] Docker minikube running (6 CPUs, 11GB RAM)
- [ ] API server accessible on localhost:6443 - [x] API server accessible on localhost:50820
- [ ] Tailscale serve configured for svc:k8s → localhost:6443 - [x] Tailscale serve configured for svc:k8s → localhost:50820
- [ ] Remote kubectl access working from gilbert - [x] Remote kubectl access working from gilbert
- [x] Ansible roles updated for docker driver
- [x] socket_vmnet stopped
- [ ] ArgoCD redeployed and synced - [ ] ArgoCD redeployed and synced
- [ ] All existing apps healthy (grafana, miniflux, devpi, etc.) - [ ] All existing apps healthy (grafana, miniflux, devpi, etc.)
- [ ] PostgreSQL cluster healthy - [ ] PostgreSQL cluster healthy
- [ ] Containerd registry mirrors configured
- [ ] `mise run indri-services-check` passes - [ ] `mise run indri-services-check` passes
--- ---
@@ -312,29 +275,3 @@ If Docker driver doesn't work:
1. Delete Docker minikube: `minikube delete` 1. Delete Docker minikube: `minikube delete`
2. Recreate QEMU2 cluster (restore old ansible config from git) 2. Recreate QEMU2 cluster (restore old ansible config from git)
3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl 3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl
---
## Notes
- Docker Desktop has resource overhead but provides better macOS integration
- The docker driver is more widely used and tested than qemu2
- File sharing permissions may need adjustment in Docker Desktop settings
- First cluster start may be slow as Docker pulls the minikube base image
## Implementation Notes (2026-01-21)
### QEMU2 Cleanup Done
Removed from indri:
- `/Library/LaunchDaemons/com.blumeops.nfs-torrents.plist` - NFS mount daemon
- `~/Library/LaunchAgents/com.blumeops.minikube-mount.plist` - minikube mount agent
- Unmounted `/Volumes/torrents-nfs` NFS mount
- Removed `/Volumes/torrents-nfs` mount point
### Previous QEMU2 Issues
The QEMU2 migration partially worked but had a critical issue:
- Volume mounts worked via NFS → indri → minikube mount chain
- But Tailscale TCP proxy to VM IP (192.168.105.2:6443) failed with TLS timeout
- Root cause unknown - TCP connected but TLS handshake never completed