P5.1: Migrate minikube from podman to QEMU2 driver #38

Merged
eblume merged 16 commits from feature/p5.1-qemu2-migration into main 2026-01-21 16:03:38 -08:00
Showing only changes of commit 70357d247b

Update P5.1 plan to complete status

- ArgoCD deployed and all apps synced
- Document remaining steps (secrets, post-merge reset)
- Simplified and reorganized documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Erich Blume 2026-01-21 14:49:04 -08:00


@ -2,7 +2,7 @@
**Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts **Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts
**Status**: In Progress (2026-01-21) - Ansible roles updated, cluster running, awaiting ArgoCD redeploy **Status**: Complete (2026-01-21) - Cluster running, ArgoCD deployed, apps synced
**Prerequisites**: [Phase 5](P5_devpi.complete.md) complete **Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
@ -46,17 +46,15 @@ The **docker driver** solves both problems:
--- ---
## Implementation Progress ## Implementation Summary
### Completed ✅ ### Infrastructure Changes
1. **Docker Desktop installed** (manual via `brew install --cask docker`) 1. **Docker Desktop installed** (manual via `brew install --cask docker`)
- Configured with 12GB memory in Docker Desktop settings - Configured with 12GB memory in Docker Desktop settings
- Kubernetes option disabled (using minikube instead) - Kubernetes option disabled (using minikube instead)
2. **QEMU2 minikube deleted** (`minikube stop && minikube delete`) 2. **Docker minikube cluster created**:
3. **Docker minikube cluster created**:
```bash ```bash
minikube start \ minikube start \
--driver=docker \ --driver=docker \
@ -68,68 +66,47 @@ The **docker driver** solves both problems:
--apiserver-port=6443 \ --apiserver-port=6443 \
--listen-address=0.0.0.0 --listen-address=0.0.0.0
``` ```
Note: Memory set to 11264MB (11GB) to leave headroom for Docker Desktop overhead.
4. **Tailscale serve configured** for k8s API: 3. **Tailscale serve configured** for k8s API:
- API server on localhost:50820 (port is dynamic with docker driver) - API server on localhost (port is dynamic with docker driver)
- `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:50820` - `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:<PORT>`
5. **Remote kubectl access working** from gilbert: 4. **Remote kubectl access working** from gilbert:
- Created `mise-tasks/ensure-minikube-indri-kubectl-config` script - Created `mise-tasks/ensure-minikube-indri-kubectl-config` script
- Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml` - Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml`
- `kubectl --context=minikube-indri get nodes` works
6. **Ansible roles updated**: ### Ansible Roles Updated
- `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
- `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
- Containerd registry mirrors configured for zot pull-through cache
7. **QEMU2 artifacts cleaned up**: - `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
- Stopped socket_vmnet service - `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
- Removed NFS LaunchDaemon - Containerd registry mirrors configured for zot pull-through cache
- Removed minikube mount LaunchAgent
- kubectl still works after cleanup
### Remaining 📋 ### ArgoCD Bootstrap
1. **Redeploy ArgoCD and apps** - bootstrap the cluster with: All apps deployed and synced from `feature/p5.1-qemu2-migration` branch:
```bash
# On indri - apply secrets first
op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
# Create repo secret for ArgoCD | App | Status | Notes |
PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n' |-----|--------|-------|
kubectl create namespace argocd | tailscale-operator | Healthy | Manages Tailscale ingresses |
kubectl create secret generic repo-forge -n argocd \ | argocd | Healthy | Self-managed |
--from-literal=type=git \ | cloudnative-pg | Healthy | PostgreSQL operator |
--from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \ | blumeops-pg | Progressing | PostgreSQL cluster starting |
--from-literal=insecure=true \ | grafana | Progressing | Needs grafana-admin secret |
--from-literal=sshPrivateKey="$PRIV_KEY" | grafana-config | Healthy | Dashboards and ingress |
kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository | miniflux | Progressing | Needs miniflux-config secret |
| devpi | Progressing | Starting up |
# Bootstrap operators ### Secrets Still Needed
kubectl create namespace tailscale
kubectl apply -k argocd/manifests/tailscale-operator/
kubectl apply -k argocd/manifests/argocd/
# Wait for ArgoCD After PR merge, apply these secrets manually:
kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
# Login and sync apps ```bash
argocd login argocd.tail8d86e.ts.net --username admin --grpc-web # Grafana admin password
argocd app sync apps op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | kubectl --context=minikube-indri apply -f -
argocd app sync tailscale-operator
argocd app sync cloudnative-pg
argocd app sync blumeops-pg
argocd app sync grafana
argocd app sync grafana-config
argocd app sync miniflux
argocd app sync devpi
```
2. **Verify all services** with `mise run indri-services-check` # Miniflux config
op inject -i argocd/manifests/miniflux/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -
3. **Configure containerd registry mirrors** (will be done by ansible on next provision) ```
--- ---
@ -137,127 +114,34 @@ The **docker driver** solves both problems:
### API Server Port ### API Server Port
With docker driver, the API server port is **dynamic** - Docker maps a random host port to 6443 inside the container. Current port: 50820. With docker driver, the API server port is **dynamic** - Docker maps a random host port to 6443 inside the container.
The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly. The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
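
A minimal sketch of that port lookup. It assumes the minikube node container is named `minikube` (the default profile name) and that the host port is discovered with `docker port`; the actual ansible role may query it differently. The helper names `parse_host_port` and `api_server_port` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: assumes the node container is named "minikube" (default profile).

# Extract the host port from a `docker port` style line such as "0.0.0.0:50820".
parse_host_port() {
  # Strip everything up to and including the last colon.
  printf '%s\n' "${1##*:}"
}

# Look up the host port Docker mapped to the API server port (6443) in the container.
api_server_port() {
  parse_host_port "$(docker port minikube 6443/tcp | head -n 1)"
}
```

With these helpers, the serve command from the plan could be written as `tailscale serve --service=svc:k8s --tcp=443 "tcp://localhost:$(api_server_port)"`, so the mapping is refreshed whenever the cluster is recreated and Docker picks a new port.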
### Registry Mirror Configuration ### Registry Mirror Configuration
Containerd uses `/etc/containerd/certs.d/<registry>/hosts.toml` files: Containerd uses `/etc/containerd/certs.d/<registry>/hosts.toml` files. The ansible role configures mirrors for:
```toml
# /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"
[host."http://host.minikube.internal:5050"]
capabilities = ["pull", "resolve"]
skip_verify = true
```
The ansible role configures mirrors for:
- `registry.tail8d86e.ts.net` (private images) - `registry.tail8d86e.ts.net` (private images)
- `docker.io` - `docker.io`
- `ghcr.io` - `ghcr.io`
- `quay.io` - `quay.io`
### ProxyClass Renamed
Changed from `crio-compat` to `default` - the old name was misleading since we're no longer using CRI-O.
### Volume Mounts for P6 (Kiwix/Transmission) ### Volume Mounts for P6 (Kiwix/Transmission)
With the docker driver, volume mounts work differently than podman or qemu2. Here's the analysis: Two options available:
**Current Network State:**
- Minikube container is on Docker network `192.168.49.0/24`
- Sifaka NFS exports `/volume1/torrents` to:
- `192.168.105.0/24` (old qemu2 VM network - no longer used)
- `100.64.0.0/10` (Tailscale CGNAT range)
- Network connectivity: ✅ Works after approving macOS network access GUI prompt
- NFS access: ❌ Denied (sifaka doesn't allow `192.168.49.0/24`)
**Option A: hostPath via Docker Desktop File Sharing** ⭐ RECOMMENDED **Option A: hostPath via Docker Desktop File Sharing** ⭐ RECOMMENDED
1. Mount sifaka NFS share on indri macOS: `mount -t nfs sifaka:/volume1/torrents /Volumes/torrents` 1. Mount sifaka NFS share on indri macOS: `/Volumes/torrents`
2. Docker Desktop file sharing exposes `/Volumes` into the Docker VM 2. Docker Desktop file sharing exposes `/Volumes` into the Docker VM
3. Pods use hostPath to access `/Volumes/torrents` 3. Pods use hostPath to access `/Volumes/torrents`
Pros: **Option B: Update sifaka NFS exports for Docker network**
- Simplest approach, uses native Docker file sharing 1. Add `192.168.49.0/24` to sifaka's NFS exports
- No network reconfiguration needed on sifaka 2. Pods mount NFS directly (network connectivity works after macOS approval)
- Path is stable and predictable
Cons:
- Requires persistent NFS mount on indri (LaunchDaemon)
- File sharing performance may be slower than direct NFS
Implementation:
```bash
# Manual mount test
ssh indri 'sudo mkdir -p /Volumes/torrents && sudo mount -t nfs -o resvport,rw sifaka:/volume1/torrents /Volumes/torrents'
# Verify Docker can see it
ssh indri 'docker run --rm -v /Volumes/torrents:/data alpine ls /data'
# Pod manifest uses hostPath:
# volumes:
# - name: torrents
# hostPath:
# path: /Volumes/torrents
# type: Directory
```
**Option B: Update sifaka NFS exports for Docker network** ⭐ ALTERNATIVE
1. In Synology DSM: Control Panel → Shared Folder → torrents → Edit → NFS Permissions
2. Add `192.168.49.0/24` to allowed clients
3. Pods mount NFS directly using kubernetes NFS volume type
Pros:
- Simpler than Option A (no intermediate macOS mount)
- Direct path, better performance
- Network connectivity confirmed working (after macOS network access approval)
Cons:
- Requires sifaka configuration change (one-time)
- Docker network might change (though `192.168.49.x` seems stable for minikube)
Test command (after updating sifaka):
```bash
ssh indri 'minikube ssh "sudo mount -t nfs sifaka:/volume1/torrents /mnt/torrents && ls /mnt/torrents"'
```
**Option C: Tailscale sidecar for NFS access**
1. Pods include a Tailscale sidecar that joins the tailnet
2. Mount NFS via Tailscale IP (sifaka is at 100.x.x.x)
Cons:
- Complex setup with sidecar containers
- Each pod needs Tailscale auth
- Overkill for this use case
**Recommendation for P6:**
Use **Option A** (hostPath via Docker Desktop file sharing). It's the simplest and most reliable approach. We'll need a LaunchDaemon for the NFS mount, but it's straightforward:
```xml
<!-- /Library/LaunchDaemons/com.blumeops.nfs-torrents.plist -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.blumeops.nfs-torrents</string>
<key>ProgramArguments</key>
<array>
<string>/sbin/mount</string>
<string>-t</string>
<string>nfs</string>
<string>-o</string>
<string>resvport,rw</string>
<string>sifaka:/volume1/torrents</string>
<string>/Volumes/torrents</string>
</array>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
```
This is simpler than the qemu2 approach because there's no intermediate `minikube mount` step - Docker Desktop handles the path passthrough automatically.
--- ---
@ -266,19 +150,46 @@ This is simpler than the qemu2 approach because there's no intermediate `minikub
- [x] Docker Desktop installed and running on indri - [x] Docker Desktop installed and running on indri
- [x] QEMU2 minikube deleted - [x] QEMU2 minikube deleted
- [x] Docker minikube running (6 CPUs, 11GB RAM) - [x] Docker minikube running (6 CPUs, 11GB RAM)
- [x] API server accessible on localhost:50820 - [x] API server accessible on localhost
- [x] Tailscale serve configured for svc:k8s → localhost:50820 - [x] Tailscale serve configured for svc:k8s
- [x] Remote kubectl access working from gilbert - [x] Remote kubectl access working from gilbert
- [x] Ansible roles updated for docker driver - [x] Ansible roles updated for docker driver
- [x] socket_vmnet stopped - [x] socket_vmnet stopped
- [ ] ArgoCD redeployed and synced - [x] ArgoCD deployed and synced
- [ ] All existing apps healthy (grafana, miniflux, devpi, etc.) - [x] All apps synced to feature branch
- [ ] PostgreSQL cluster healthy - [ ] Apply app secrets (grafana-admin, miniflux-config)
- [ ] Containerd registry mirrors configured - [ ] Verify all apps healthy after secrets applied
- [ ] Merge PR and reset apps to main branch
- [ ] `mise run indri-services-check` passes - [ ] `mise run indri-services-check` passes
--- ---
## Post-Merge Steps
After PR is merged:
```bash
# Reset all blumeops apps to main branch
argocd app set apps --revision main
argocd app set argocd --revision main
argocd app set blumeops-pg --revision main
argocd app set devpi --revision main
argocd app set grafana-config --revision main
argocd app set miniflux --revision main
argocd app set tailscale-operator --revision main
# Sync all apps
argocd app sync apps
argocd app sync argocd
argocd app sync tailscale-operator
argocd app sync blumeops-pg
argocd app sync grafana-config
argocd app sync miniflux
argocd app sync devpi
```
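
The same reset and sync could also be expressed as loops. This is only a sketch: the app names are copied from the commands above, while the `run` wrapper and the `DRY_RUN` switch are hypothetical additions for previewing the commands before running them.

```bash
#!/usr/bin/env bash
# Sketch only: app names are copied from the post-merge commands above.
# DRY_RUN=1 (hypothetical switch) prints each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Point every blumeops app back at the main branch.
reset_apps_to_main() {
  for app in apps argocd blumeops-pg devpi grafana-config miniflux tailscale-operator; do
    run argocd app set "$app" --revision main
  done
}

# Sync the apps in the same order as the explicit commands.
sync_apps() {
  for app in apps argocd tailscale-operator blumeops-pg grafana-config miniflux devpi; do
    run argocd app sync "$app"
  done
}
```

Running `DRY_RUN=1 reset_apps_to_main` lists the commands without touching the cluster; dropping `DRY_RUN` executes them.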
---
## Rollback Plan ## Rollback Plan
If Docker driver doesn't work: If Docker driver doesn't work: