diff --git a/plans/k8s-migration/P5.1_qemu2_migration.md b/plans/k8s-migration/P5.1_qemu2_migration.md
index 63f12e9..e37b65b 100644
--- a/plans/k8s-migration/P5.1_qemu2_migration.md
+++ b/plans/k8s-migration/P5.1_qemu2_migration.md
@@ -2,7 +2,7 @@
 
 **Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts
 
-**Status**: In Progress (2026-01-21)
+**Status**: In Progress (2026-01-21) - Ansible roles updated, cluster running, awaiting ArgoCD redeploy
 
 **Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
 
@@ -38,269 +38,232 @@ Additionally, the volume mount solution with QEMU2 was complex:
 
 The **docker driver** solves both problems:
 
-1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` will work (like podman did)
+1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` works
 
-2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers, and minikube (running in Docker) can use those paths via hostPath.
+2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers.
 
 3. **Official Tailscale recommendation**: Tailscale's own [Kubernetes guide](https://tailscale.com/learn/managing-access-to-kubernetes-with-tailscale) uses minikube with the docker driver.
 
 ---
 
-## Prerequisites
+## Implementation Progress
 
-### 1. Install Docker Desktop (Manual - Before Ansible)
+### Completed ✅
 
-Docker Desktop requires GUI setup, so install manually first:
+1. **Docker Desktop installed** (manual via `brew install --cask docker`)
+   - Configured with 12GB memory in Docker Desktop settings
+   - Kubernetes option disabled (using minikube instead)
 
-```bash
-# On indri:
-brew install --cask docker-desktop
+2. **QEMU2 minikube deleted** (`minikube stop && minikube delete`)
 
-# Then launch Docker Desktop from /Applications
-# Complete the setup wizard (accept license, skip tutorial)
-# Wait for Docker to be "Running" (green icon in menu bar)
-
-# Verify:
-docker version
-docker run hello-world
-```
-
-**File Sharing Configuration** (in Docker Desktop → Settings → Resources → File sharing):
-- Ensure `/Volumes` is shared (for future NFS mounts from sifaka)
-- Or add specific paths as needed for P6
-
-### 2. Stop Current QEMU2 Minikube
-
-```bash
-# On indri:
-minikube stop
-minikube delete
-
-# Verify QEMU resources are cleaned up
-ps aux | grep qemu
-```
-
----
-
-## Plan
-
-### 1. Update Ansible Role for Docker Driver
-
-**Changes to `ansible/roles/minikube/defaults/main.yml`:**
-
-```yaml
-# Change from:
-minikube_driver: qemu2
-minikube_network: socket_vmnet
-minikube_container_runtime: containerd
-
-# To:
-minikube_driver: docker
-minikube_container_runtime: docker # or containerd, both work
-```
-
-**Remove from defaults:**
-- `minikube_network` (not needed for docker driver)
-
-**Changes to `ansible/roles/minikube/tasks/main.yml`:**
-- Remove qemu installation
-- Remove socket_vmnet installation and service management
-- Remove NFS mount point creation
-- Remove NFS LaunchDaemon installation
-- Remove minikube mount LaunchAgent installation
-- Keep containerd registry mirror config (adapting for docker if needed)
-
-**Remove files from `ansible/roles/minikube/files/`:**
-- `com.blumeops.nfs-torrents.plist`
-- `com.blumeops.minikube-mount.plist`
-
-**Changes to `ansible/roles/minikube/handlers/main.yml`:**
-- Remove `Load NFS mount LaunchDaemon`
-- Remove `Load minikube mount LaunchAgent`
-
-**Add to Brewfile:**
-```ruby
-cask "docker" # Docker Desktop
-```
-
-### 2. Update Tailscale Serve Configuration
-
-**Changes to `ansible/roles/tailscale_serve/defaults/main.yml`:**
-
-```yaml
-# Change svc:k8s upstream from VM IP back to localhost:
-- name: svc:k8s
-  tcp:
-    port: 443
-    upstream: tcp://localhost:PORT # PORT will be dynamic, see below
-```
-
-**Note on API server port**: With the docker driver, the API server port is dynamic (assigned by minikube). We need to either:
-- Use `--apiserver-port=6443` to fix it
-- Or query and update the config after cluster creation
-
-### 3. Create Docker Minikube Cluster
-
-```bash
-# On indri (after Docker Desktop is running):
-minikube start \
-  --driver=docker \
-  --cpus=6 \
-  --memory=12288 \
-  --disk-size=200g \
-  --apiserver-names=k8s.tail8d86e.ts.net,indri \
-  --apiserver-port=6443 \
-  --listen-address=0.0.0.0
-
-# Verify cluster
-minikube status
-kubectl get nodes
-```
-
-### 4. Verify API Server is on Localhost
-
-```bash
-# Check what port the API server is on
-kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
-# Should show https://127.0.0.1:PORT or similar
-
-# Verify local access works
-curl -k https://localhost:6443/healthz
-# Should return "ok"
-```
-
-### 5. Update 1Password Credentials
-
-After cluster recreation, update the credentials in 1Password:
-
-```bash
-# On indri, get the new certificates:
-cat ~/.minikube/profiles/minikube/client.crt
-cat ~/.minikube/profiles/minikube/client.key
-cat ~/.minikube/ca.crt
-```
-
-Update in 1Password (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `3jo4f2hnzvwfmamudfsbbbec7e`).
-
-### 6. Update Kubeconfig on Gilbert
-
-```bash
-# Fetch new CA cert from 1Password
-op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 3jo4f2hnzvwfmamudfsbbbec7e --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt
-```
-
-### 7. Configure Tailscale Serve for K8s
-
-```bash
-# On indri:
-tailscale serve --service="svc:k8s" --tcp=443 tcp://localhost:6443
-```
-
-### 8. Verify Remote Access
-
-```bash
-# From gilbert:
-curl -k --connect-timeout 5 https://k8s.tail8d86e.ts.net/healthz
-# Should return "ok"
-
-kubectl --context=minikube-indri get nodes
-# Should show the minikube node
-```
-
-### 9. Redeploy ArgoCD and Apps
-
-Since this is a cluster recreation, we need to re-bootstrap:
-
-```bash
-# On indri - apply secrets first
-op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
-
-# Create repo secret for ArgoCD
-PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
-kubectl create namespace argocd
-kubectl create secret generic repo-forge -n argocd \
-  --from-literal=type=git \
-  --from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
-  --from-literal=insecure=true \
-  --from-literal=sshPrivateKey="$PRIV_KEY"
-kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
-
-# Bootstrap operators
-kubectl create namespace tailscale
-kubectl apply -k argocd/manifests/tailscale-operator/
-kubectl apply -k argocd/manifests/argocd/
-
-# Wait for ArgoCD
-kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
-
-# Login and sync apps
-argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
-argocd app sync apps
-argocd app sync tailscale-operator
-argocd app sync cloudnative-pg
-argocd app sync blumeops-pg
-argocd app sync grafana
-argocd app sync grafana-config
-argocd app sync miniflux
-argocd app sync devpi
-```
-
-### 10. Verify All Services
-
-```bash
-mise run indri-services-check
-argocd app list
-kubectl get pods --all-namespaces
-```
-
----
-
-## Volume Mounts for P6 (Kiwix/Transmission)
-
-With the docker driver, volume mounts work differently than QEMU2:
-
-**Option A: Docker Desktop File Sharing + hostPath**
-1. Mount sifaka NFS share on indri: `/Volumes/torrents`
-2. Add `/Volumes/torrents` to Docker Desktop file sharing
-3. Pods use hostPath pointing to that path
-
-**Option B: NFS directly from pods**
-- Docker containers can make NFS mounts (unlike podman's rootless containers)
-- May need to test if sifaka allows connections from the Docker network
-
-This will be fully tested in Phase 6.
-
----
-
-## Cleanup
-
-After successful migration:
-
-1. **Remove QEMU2 artifacts:**
+3. **Docker minikube cluster created**:
    ```bash
-   brew uninstall qemu socket_vmnet
+   minikube start \
+     --driver=docker \
+     --container-runtime=docker \
+     --cpus=6 \
+     --memory=11264 \
+     --disk-size=200g \
+     --apiserver-names=k8s.tail8d86e.ts.net,indri \
+     --apiserver-port=6443 \
+     --listen-address=0.0.0.0
+   ```
+   Note: Memory set to 11264MB (11GB) to leave headroom for Docker Desktop overhead.
+
+4. **Tailscale serve configured** for k8s API:
+   - API server on localhost:50820 (port is dynamic with docker driver)
+   - `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:50820`
+
+5. **Remote kubectl access working** from gilbert:
+   - Created `mise-tasks/ensure-minikube-indri-kubectl-config` script
+   - Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml`
+   - `kubectl --context=minikube-indri get nodes` works
+
+6. **Ansible roles updated**:
+   - `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
+   - `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
+   - Containerd registry mirrors configured for zot pull-through cache
+
+7. **QEMU2 artifacts cleaned up**:
+   - Stopped socket_vmnet service
+   - Removed NFS LaunchDaemon
+   - Removed minikube mount LaunchAgent
+   - kubectl still works after cleanup
+
+### Remaining 📋
+
+1. **Redeploy ArgoCD and apps** - bootstrap the cluster with:
+   ```bash
+   # On indri - apply secrets first
+   op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
+
+   # Create repo secret for ArgoCD
+   PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
+   kubectl create namespace argocd
+   kubectl create secret generic repo-forge -n argocd \
+     --from-literal=type=git \
+     --from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
+     --from-literal=insecure=true \
+     --from-literal=sshPrivateKey="$PRIV_KEY"
+   kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
+
+   # Bootstrap operators
+   kubectl create namespace tailscale
+   kubectl apply -k argocd/manifests/tailscale-operator/
+   kubectl apply -k argocd/manifests/argocd/
+
+   # Wait for ArgoCD
+   kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
+
+   # Login and sync apps
+   argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
+   argocd app sync apps
+   argocd app sync tailscale-operator
+   argocd app sync cloudnative-pg
+   argocd app sync blumeops-pg
+   argocd app sync grafana
+   argocd app sync grafana-config
+   argocd app sync miniflux
+   argocd app sync devpi
    ```
 
-2. **Remove podman if no longer needed:**
-   ```bash
-   podman machine stop
-   podman machine rm
-   brew uninstall podman
-   ```
+2. **Verify all services** with `mise run indri-services-check`
+
+3. **Configure containerd registry mirrors** (will be done by ansible on next provision)
+
+---
+
+## Technical Notes
+
+### API Server Port
+
+With docker driver, the API server port is **dynamic** - Docker maps a random host port to 6443 inside the container. Current port: 50820.
+
+The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
+
+### Registry Mirror Configuration
+
+Containerd uses `/etc/containerd/certs.d/<registry>/hosts.toml` files:
+
+```toml
+# /etc/containerd/certs.d/docker.io/hosts.toml
+server = "https://registry-1.docker.io"
+
+[host."http://host.minikube.internal:5050"]
+  capabilities = ["pull", "resolve"]
+  skip_verify = true
+```
+
+The ansible role configures mirrors for:
+- `registry.tail8d86e.ts.net` (private images)
+- `docker.io`
+- `ghcr.io`
+- `quay.io`
+
+### Volume Mounts for P6 (Kiwix/Transmission)
+
+With the docker driver, volume mounts work differently than podman or qemu2. Here's the analysis:
+
+**Current Network State:**
+- Minikube container is on Docker network `192.168.49.0/24`
+- Sifaka NFS exports `/volume1/torrents` to:
+  - `192.168.105.0/24` (old qemu2 VM network - no longer used)
+  - `100.64.0.0/10` (Tailscale CGNAT range)
+- Minikube can resolve `sifaka` (192.168.1.203) but can't reach it (100% packet loss due to Docker network isolation)
+
+**Option A: hostPath via Docker Desktop File Sharing** ⭐ RECOMMENDED
+1. Mount sifaka NFS share on indri macOS: `mount -t nfs sifaka:/volume1/torrents /Volumes/torrents`
+2. Docker Desktop file sharing exposes `/Volumes` into the Docker VM
+3. Pods use hostPath to access `/Volumes/torrents`
+
+Pros:
+- Simplest approach, uses native Docker file sharing
+- No network reconfiguration needed on sifaka
+- Path is stable and predictable
+
+Cons:
+- Requires persistent NFS mount on indri (LaunchDaemon)
+- File sharing performance may be slower than direct NFS
+
+Implementation:
+```bash
+# Manual mount test
+ssh indri 'sudo mkdir -p /Volumes/torrents && sudo mount -t nfs -o resvport,rw sifaka:/volume1/torrents /Volumes/torrents'
+
+# Verify Docker can see it
+ssh indri 'docker run --rm -v /Volumes/torrents:/data alpine ls /data'
+
+# Pod manifest uses hostPath:
+# volumes:
+#   - name: torrents
+#     hostPath:
+#       path: /Volumes/torrents
+#       type: Directory
+```
+
+**Option B: Update sifaka NFS exports for Docker network**
+1. Add `192.168.49.0/24` to sifaka's NFS exports
+2. Pods mount NFS directly using kubernetes NFS volume type
+
+Cons:
+- Docker network might change (though `192.168.49.x` seems stable for minikube)
+- Requires sifaka configuration change
+- NFS mount from inside container may have permission issues
+
+**Option C: Tailscale sidecar for NFS access**
+1. Pods include a Tailscale sidecar that joins the tailnet
+2. Mount NFS via Tailscale IP (sifaka is at 100.x.x.x)
+
+Cons:
+- Complex setup with sidecar containers
+- Each pod needs Tailscale auth
+- Overkill for this use case
+
+**Recommendation for P6:**
+Use **Option A** (hostPath via Docker Desktop file sharing). It's the simplest and most reliable approach. We'll need a LaunchDaemon for the NFS mount, but it's straightforward:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
+  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>com.blumeops.nfs-torrents</string>
+    <key>ProgramArguments</key>
+    <array>
+        <string>/sbin/mount</string>
+        <string>-t</string>
+        <string>nfs</string>
+        <string>-o</string>
+        <string>resvport,rw</string>
+        <string>sifaka:/volume1/torrents</string>
+        <string>/Volumes/torrents</string>
+    </array>
+    <key>RunAtLoad</key>
+    <true/>
+</dict>
+</plist>
+```
+
+This is simpler than the qemu2 approach because there's no intermediate `minikube mount` step - Docker Desktop handles the path passthrough automatically.
 
 ---
 
 ## Verification Checklist
 
-- [ ] Docker Desktop installed and running on indri
-- [ ] QEMU2 minikube deleted
-- [ ] Docker minikube running
-- [ ] API server accessible on localhost:6443
-- [ ] Tailscale serve configured for svc:k8s → localhost:6443
-- [ ] Remote kubectl access working from gilbert
+- [x] Docker Desktop installed and running on indri
+- [x] QEMU2 minikube deleted
+- [x] Docker minikube running (6 CPUs, 11GB RAM)
+- [x] API server accessible on localhost:50820
+- [x] Tailscale serve configured for svc:k8s → localhost:50820
+- [x] Remote kubectl access working from gilbert
+- [x] Ansible roles updated for docker driver
+- [x] socket_vmnet stopped
 - [ ] ArgoCD redeployed and synced
 - [ ] All existing apps healthy (grafana, miniflux, devpi, etc.)
 - [ ] PostgreSQL cluster healthy
+- [ ] Containerd registry mirrors configured
 - [ ] `mise run indri-services-check` passes
 
 ---
@@ -312,29 +275,3 @@ If Docker driver doesn't work:
 1. Delete Docker minikube: `minikube delete`
 2. Recreate QEMU2 cluster (restore old ansible config from git)
 3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl
-
----
-
-## Notes
-
-- Docker Desktop has resource overhead but provides better macOS integration
-- The docker driver is more widely used and tested than qemu2
-- File sharing permissions may need adjustment in Docker Desktop settings
-- First cluster start may be slow as Docker pulls the minikube base image
-
-## Implementation Notes (2026-01-21)
-
-### QEMU2 Cleanup Done
-
-Removed from indri:
-- `/Library/LaunchDaemons/com.blumeops.nfs-torrents.plist` - NFS mount daemon
-- `~/Library/LaunchAgents/com.blumeops.minikube-mount.plist` - minikube mount agent
-- Unmounted `/Volumes/torrents-nfs` NFS mount
-- Removed `/Volumes/torrents-nfs` mount point
-
-### Previous QEMU2 Issues
-
-The QEMU2 migration partially worked but had a critical issue:
-- Volume mounts worked via NFS → indri → minikube mount chain
-- But Tailscale TCP proxy to VM IP (192.168.105.2:6443) failed with TLS timeout
-- Root cause unknown - TCP connected but TLS handshake never completed
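
Reviewer sketch for the "API Server Port" note in this change: the plan says the minikube ansible role queries the dynamic port after cluster start and points tailscale serve at it, but the diff does not show how. One possible way to do that lookup in shell is below. The helper name `apiserver_port` is hypothetical and not taken from the role. The `kubectl` and `tailscale` commands are the ones already used in this plan, kept as comments because they need a running cluster.

```shell
# Hypothetical helper (not from the ansible role): print the port of an
# API server URL such as https://127.0.0.1:50820.
# Assumes the URL carries an explicit port.
apiserver_port() {
  hostport="${1#*://}"             # drop the scheme
  hostport="${hostport%%/*}"       # drop any trailing path
  printf '%s\n' "${hostport##*:}"  # keep what follows the last colon
}

# Intended use on indri (requires a running cluster):
#   server=$(kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}")
#   tailscale serve --service=svc:k8s --tcp=443 "tcp://localhost:$(apiserver_port "$server")"
```

Parsing the kubeconfig URL avoids hard-coding a value like 50820, which the plan notes can change whenever the cluster is recreated.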