Update P5.1 plan with completion status and P6 storage options

- Document completed steps (docker driver working, kubectl access, ansible updated)
- Add detailed analysis of volume mount options for P6
- Recommend hostPath via Docker Desktop file sharing as simplest approach
- Document why direct NFS won't work (Docker network isolation)
- Include sample LaunchDaemon for persistent NFS mount

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 14:05:26 -08:00 · 2026-01-21 14:05:26 -08:00 · 75f945385c
commit 75f945385c
parent 9fac4439b1
1 changed file with 178 additions and 241 deletions
--- a/plans/k8s-migration/P5.1_qemu2_migration.md
+++ b/plans/k8s-migration/P5.1_qemu2_migration.md
@@ -2,7 +2,7 @@

 **Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts

-**Status**: In Progress (2026-01-21)
+**Status**: In Progress (2026-01-21) - Ansible roles updated, cluster running, awaiting ArgoCD redeploy

 **Prerequisites**: [Phase 5](P5_devpi.complete.md) complete

@@ -38,269 +38,232 @@ Additionally, the volume mount solution with QEMU2 was complex:

 The **docker driver** solves both problems:

-1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` will work (like podman did)
+1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` works

-2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers, and minikube (running in Docker) can use those paths via hostPath.
+2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers.

 3. **Official Tailscale recommendation**: Tailscale's own [Kubernetes guide](https://tailscale.com/learn/managing-access-to-kubernetes-with-tailscale) uses minikube with the docker driver.

 ---

-## Prerequisites
+## Implementation Progress

-### 1. Install Docker Desktop (Manual - Before Ansible)
+### Completed ✅

-Docker Desktop requires GUI setup, so install manually first:
+1. **Docker Desktop installed** (manual via `brew install --cask docker`)
+   - Configured with 12GB memory in Docker Desktop settings
+   - Kubernetes option disabled (using minikube instead)

-```bash
-# On indri:
-brew install --cask docker-desktop
+2. **QEMU2 minikube deleted** (`minikube stop && minikube delete`)

-# Then launch Docker Desktop from /Applications
-# Complete the setup wizard (accept license, skip tutorial)
-# Wait for Docker to be "Running" (green icon in menu bar)
-
-# Verify:
-docker version
-docker run hello-world
-```
-
-**File Sharing Configuration** (in Docker Desktop → Settings → Resources → File sharing):
- Ensure `/Volumes` is shared (for future NFS mounts from sifaka)
- Or add specific paths as needed for P6
-
-### 2. Stop Current QEMU2 Minikube
-
-```bash
-# On indri:
-minikube stop
-minikube delete
-
-# Verify QEMU resources are cleaned up
-ps aux | grep qemu
-```
-
---
-
-## Plan
-
-### 1. Update Ansible Role for Docker Driver
-
-**Changes to `ansible/roles/minikube/defaults/main.yml`:**
-
-```yaml
-# Change from:
-minikube_driver: qemu2
-minikube_network: socket_vmnet
-minikube_container_runtime: containerd
-
-# To:
-minikube_driver: docker
-minikube_container_runtime: docker  # or containerd, both work
-```
-
-**Remove from defaults:**
- `minikube_network` (not needed for docker driver)
-
-**Changes to `ansible/roles/minikube/tasks/main.yml`:**
- Remove qemu installation
- Remove socket_vmnet installation and service management
- Remove NFS mount point creation
- Remove NFS LaunchDaemon installation
- Remove minikube mount LaunchAgent installation
- Keep containerd registry mirror config (adapting for docker if needed)
-
-**Remove files from `ansible/roles/minikube/files/`:**
- `com.blumeops.nfs-torrents.plist`
- `com.blumeops.minikube-mount.plist`
-
-**Changes to `ansible/roles/minikube/handlers/main.yml`:**
- Remove `Load NFS mount LaunchDaemon`
- Remove `Load minikube mount LaunchAgent`
-
-**Add to Brewfile:**
-```ruby
-cask "docker"  # Docker Desktop
-```
-
-### 2. Update Tailscale Serve Configuration
-
-**Changes to `ansible/roles/tailscale_serve/defaults/main.yml`:**
-
-```yaml
-# Change svc:k8s upstream from VM IP back to localhost:
- name: svc:k8s
-  tcp:
-    port: 443
-    upstream: tcp://localhost:PORT  # PORT will be dynamic, see below
-```
-
-**Note on API server port**: With the docker driver, the API server port is dynamic (assigned by minikube). We need to either:
- Use `--apiserver-port=6443` to fix it
- Or query and update the config after cluster creation
-
-### 3. Create Docker Minikube Cluster
-
-```bash
-# On indri (after Docker Desktop is running):
-minikube start \
-  --driver=docker \
-  --cpus=6 \
-  --memory=12288 \
-  --disk-size=200g \
-  --apiserver-names=k8s.tail8d86e.ts.net,indri \
-  --apiserver-port=6443 \
-  --listen-address=0.0.0.0
-
-# Verify cluster
-minikube status
-kubectl get nodes
-```
-
-### 4. Verify API Server is on Localhost
-
-```bash
-# Check what port the API server is on
-kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
-# Should show https://127.0.0.1:PORT or similar
-
-# Verify local access works
-curl -k https://localhost:6443/healthz
-# Should return "ok"
-```
-
-### 5. Update 1Password Credentials
-
-After cluster recreation, update the credentials in 1Password:
-
-```bash
-# On indri, get the new certificates:
-cat ~/.minikube/profiles/minikube/client.crt
-cat ~/.minikube/profiles/minikube/client.key
-cat ~/.minikube/ca.crt
-```
-
-Update in 1Password (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `3jo4f2hnzvwfmamudfsbbbec7e`).
-
-### 6. Update Kubeconfig on Gilbert
-
-```bash
-# Fetch new CA cert from 1Password
-op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 3jo4f2hnzvwfmamudfsbbbec7e --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt
-```
-
-### 7. Configure Tailscale Serve for K8s
-
-```bash
-# On indri:
-tailscale serve --service="svc:k8s" --tcp=443 tcp://localhost:6443
-```
-
-### 8. Verify Remote Access
-
-```bash
-# From gilbert:
-curl -k --connect-timeout 5 https://k8s.tail8d86e.ts.net/healthz
-# Should return "ok"
-
-kubectl --context=minikube-indri get nodes
-# Should show the minikube node
-```
-
-### 9. Redeploy ArgoCD and Apps
-
-Since this is a cluster recreation, we need to re-bootstrap:
-
-```bash
-# On indri - apply secrets first
-op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
-
-# Create repo secret for ArgoCD
-PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
-kubectl create namespace argocd
-kubectl create secret generic repo-forge -n argocd \
-  --from-literal=type=git \
-  --from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
-  --from-literal=insecure=true \
-  --from-literal=sshPrivateKey="$PRIV_KEY"
-kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
-
-# Bootstrap operators
-kubectl create namespace tailscale
-kubectl apply -k argocd/manifests/tailscale-operator/
-kubectl apply -k argocd/manifests/argocd/
-
-# Wait for ArgoCD
-kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
-
-# Login and sync apps
-argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
-argocd app sync apps
-argocd app sync tailscale-operator
-argocd app sync cloudnative-pg
-argocd app sync blumeops-pg
-argocd app sync grafana
-argocd app sync grafana-config
-argocd app sync miniflux
-argocd app sync devpi
-```
-
-### 10. Verify All Services
-
-```bash
-mise run indri-services-check
-argocd app list
-kubectl get pods --all-namespaces
-```
-
---
-
-## Volume Mounts for P6 (Kiwix/Transmission)
-
-With the docker driver, volume mounts work differently than QEMU2:
-
-**Option A: Docker Desktop File Sharing + hostPath**
-1. Mount sifaka NFS share on indri: `/Volumes/torrents`
-2. Add `/Volumes/torrents` to Docker Desktop file sharing
-3. Pods use hostPath pointing to that path
-
-**Option B: NFS directly from pods**
- Docker containers can make NFS mounts (unlike podman's rootless containers)
- May need to test if sifaka allows connections from the Docker network
-
-This will be fully tested in Phase 6.
-
---
-
-## Cleanup
-
-After successful migration:
-
-1. **Remove QEMU2 artifacts:**
+3. **Docker minikube cluster created**:
   ```bash
-   brew uninstall qemu socket_vmnet
+   minikube start \
+     --driver=docker \
+     --container-runtime=docker \
+     --cpus=6 \
+     --memory=11264 \
+     --disk-size=200g \
+     --apiserver-names=k8s.tail8d86e.ts.net,indri \
+     --apiserver-port=6443 \
+     --listen-address=0.0.0.0
+   ```
+   Note: Memory set to 11264MB (11GB) to leave headroom for Docker Desktop overhead.
+
+4. **Tailscale serve configured** for k8s API:
+   - API server on localhost:50820 (port is dynamic with docker driver)
+   - `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:50820`
+
+5. **Remote kubectl access working** from gilbert:
+   - Created `mise-tasks/ensure-minikube-indri-kubectl-config` script
+   - Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml`
+   - `kubectl --context=minikube-indri get nodes` works
+
+6. **Ansible roles updated**:
+   - `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
+   - `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
+   - Containerd registry mirror config for the zot pull-through cache added to the role (applied on the next provision, see Remaining)
+
+7. **QEMU2 artifacts cleaned up**:
+   - Stopped socket_vmnet service
+   - Removed NFS LaunchDaemon
+   - Removed minikube mount LaunchAgent
+   - kubectl still works after cleanup
+
+### Remaining 📋
+
+1. **Redeploy ArgoCD and apps** - bootstrap the cluster with:
+   ```bash
+   # On indri - apply secrets first
+   op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
+
+   # Create repo secret for ArgoCD
+   PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
+   kubectl create namespace argocd
+   kubectl create secret generic repo-forge -n argocd \
+     --from-literal=type=git \
+     --from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
+     --from-literal=insecure=true \
+     --from-literal=sshPrivateKey="$PRIV_KEY"
+   kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
+
+   # Bootstrap operators
+   kubectl create namespace tailscale
+   kubectl apply -k argocd/manifests/tailscale-operator/
+   kubectl apply -k argocd/manifests/argocd/
+
+   # Wait for ArgoCD
+   kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
+
+   # Login and sync apps
+   argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
+   argocd app sync apps
+   argocd app sync tailscale-operator
+   argocd app sync cloudnative-pg
+   argocd app sync blumeops-pg
+   argocd app sync grafana
+   argocd app sync grafana-config
+   argocd app sync miniflux
+   argocd app sync devpi
   ```

-2. **Remove podman if no longer needed:**
-   ```bash
-   podman machine stop
-   podman machine rm
-   brew uninstall podman
-   ```
+2. **Verify all services** with `mise run indri-services-check`
+
+3. **Configure containerd registry mirrors** (will be done by ansible on next provision)
+
+---
+
+## Technical Notes
+
+### API Server Port
+
+With the docker driver, the API server's host port is **dynamic**: Docker publishes port 6443 inside the minikube container on a random host port, even though the cluster was started with `--apiserver-port=6443`. Current port: 50820.
+
+The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
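The port lookup described above could be sketched roughly as follows. This is an illustrative guess at the approach, not the role's actual tasks: the helper name `extract_port` is hypothetical, and it assumes the kubeconfig server URL has the form `https://HOST:PORT` with no trailing path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the port out of a URL such as https://127.0.0.1:50820.
# Assumes the URL ends in ":PORT" with no trailing path.
extract_port() {
  local url="$1"
  local port="${url##*:}"
  # Reject anything that is not purely numeric (e.g. a URL without a port).
  case "$port" in
    ''|*[!0-9]*) return 1 ;;
  esac
  printf '%s\n' "$port"
}

# Example usage on indri (commented out because it needs a running cluster):
# server=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
# port=$(extract_port "$server") &&
#   tailscale serve --service=svc:k8s --tcp=443 "tcp://localhost:${port}"
```

Parsing the kubeconfig rather than hard-coding 50820 means the serve target would follow the port if Docker assigns a different one after the cluster is recreated.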
+
+### Registry Mirror Configuration
+
+Containerd uses `/etc/containerd/certs.d/<registry>/hosts.toml` files:
+
+```toml
+# /etc/containerd/certs.d/docker.io/hosts.toml
+server = "https://registry-1.docker.io"
+
+[host."http://host.minikube.internal:5050"]
+  capabilities = ["pull", "resolve"]
+  skip_verify = true
+```
+
+The ansible role configures mirrors for:
+- `registry.tail8d86e.ts.net` (private images)
+- `docker.io`
+- `ghcr.io`
+- `quay.io`
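A minimal sketch of how one `hosts.toml` per registry might be generated is shown here. The helper `render_hosts_toml` is hypothetical, and using the same `host.minikube.internal:5050` mirror endpoint for every registry is an assumption based on the `docker.io` example above; the ansible role may template these files differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a hosts.toml for one registry.
#   $1 = upstream server URL, $2 = mirror URL (assumed to be the zot cache)
render_hosts_toml() {
  local server="$1" mirror="$2"
  cat <<EOF
server = "${server}"

[host."${mirror}"]
  capabilities = ["pull", "resolve"]
  skip_verify = true
EOF
}

# Example (commented out; the target directory exists only inside the minikube node):
# render_hosts_toml "https://ghcr.io" "http://host.minikube.internal:5050" \
#   > /etc/containerd/certs.d/ghcr.io/hosts.toml
```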
+
+### Volume Mounts for P6 (Kiwix/Transmission)
+
+With the docker driver, volume mounts work differently than podman or qemu2. Here's the analysis:
+
+**Current Network State:**
+- Minikube container is on Docker network `192.168.49.0/24`
+- Sifaka NFS exports `/volume1/torrents` to:
+  - `192.168.105.0/24` (old qemu2 VM network - no longer used)
+  - `100.64.0.0/10` (Tailscale CGNAT range)
+- Minikube can resolve `sifaka` (192.168.1.203) but can't reach it (100% packet loss due to Docker network isolation)
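The export mismatch in the list above can be checked with a little address arithmetic. The sketch below is illustrative only: `ip_to_int` and `ip_in_cidr` are hypothetical helpers, and `192.168.49.2` is an assumed example address for the minikube container on the `192.168.49.0/24` network.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if the IP ($1) falls inside the CIDR block ($2), else 1.
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Example: neither current export range covers the assumed minikube address.
# ip_in_cidr 192.168.49.2 192.168.105.0/24 || echo "not in old qemu2 range"
# ip_in_cidr 192.168.49.2 100.64.0.0/10    || echo "not in Tailscale range"
```

Even if routing out of the Docker network worked, a client on `192.168.49.0/24` would not match either export range, which is why Option B below needs a change on sifaka.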
+
+**Option A: hostPath via Docker Desktop File Sharing** ⭐ RECOMMENDED
+1. Mount sifaka NFS share on indri macOS: `mount -t nfs sifaka:/volume1/torrents /Volumes/torrents`
+2. Docker Desktop file sharing exposes `/Volumes` into the Docker VM
+3. Pods use hostPath to access `/Volumes/torrents`
+
+Pros:
+- Simplest approach, uses native Docker file sharing
+- No network reconfiguration needed on sifaka
+- Path is stable and predictable
+
+Cons:
+- Requires persistent NFS mount on indri (LaunchDaemon)
+- File sharing performance may be slower than direct NFS
+
+Implementation:
+```bash
+# Manual mount test
+ssh indri 'sudo mkdir -p /Volumes/torrents && sudo mount -t nfs -o resvport,rw sifaka:/volume1/torrents /Volumes/torrents'
+
+# Verify Docker can see it
+ssh indri 'docker run --rm -v /Volumes/torrents:/data alpine ls /data'
+
+# Pod manifest uses hostPath:
+# volumes:
+#   - name: torrents
+#     hostPath:
+#       path: /Volumes/torrents
+#       type: Directory
+```
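Because hostPath with `type: Directory` only checks that the directory exists, a pod could start against an empty mount point if the NFS share is not mounted. A small guard along these lines might help; `is_mounted_at` is a hypothetical helper, and it assumes the macOS `mount` output format `SOURCE on MOUNTPOINT (OPTIONS)`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if stdin (the output of `mount`) lists $1 as a mount point.
# Assumes the macOS format "SOURCE on MOUNTPOINT (OPTIONS)".
is_mounted_at() {
  grep -qF " on $1 ("
}

# Example usage on indri (commented out because it depends on the live mount table):
# mount | is_mounted_at /Volumes/torrents || echo "NFS share is not mounted" >&2
```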
+
+**Option B: Update sifaka NFS exports for Docker network**
+1. Add `192.168.49.0/24` to sifaka's NFS exports
+2. Pods mount NFS directly using kubernetes NFS volume type
+
+Cons:
+- Docker network might change (though `192.168.49.x` seems stable for minikube)
+- Requires sifaka configuration change
+- NFS mount from inside container may have permission issues
+
+**Option C: Tailscale sidecar for NFS access**
+1. Pods include a Tailscale sidecar that joins the tailnet
+2. Mount NFS via Tailscale IP (sifaka is at 100.x.x.x)
+
+Cons:
+- Complex setup with sidecar containers
+- Each pod needs Tailscale auth
+- Overkill for this use case
+
+**Recommendation for P6:**
+Use **Option A** (hostPath via Docker Desktop file sharing). It's the simplest and most reliable approach. We'll need a LaunchDaemon for the NFS mount, but it's straightforward:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- /Library/LaunchDaemons/com.blumeops.nfs-torrents.plist -->
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>com.blumeops.nfs-torrents</string>
+    <key>ProgramArguments</key>
+    <array>
+        <string>/sbin/mount</string>
+        <string>-t</string>
+        <string>nfs</string>
+        <string>-o</string>
+        <string>resvport,rw</string>
+        <string>sifaka:/volume1/torrents</string>
+        <string>/Volumes/torrents</string>
+    </array>
+    <key>RunAtLoad</key>
+    <true/>
+</dict>
+</plist>
+```
+
+This is simpler than the qemu2 approach because there's no intermediate `minikube mount` step - Docker Desktop handles the path passthrough automatically.

 ---

 ## Verification Checklist

- [ ] Docker Desktop installed and running on indri
- [ ] QEMU2 minikube deleted
- [ ] Docker minikube running
- [ ] API server accessible on localhost:6443
- [ ] Tailscale serve configured for svc:k8s → localhost:6443
- [ ] Remote kubectl access working from gilbert
+- [x] Docker Desktop installed and running on indri
+- [x] QEMU2 minikube deleted
+- [x] Docker minikube running (6 CPUs, 11GB RAM)
+- [x] API server accessible on localhost:50820
+- [x] Tailscale serve configured for svc:k8s → localhost:50820
+- [x] Remote kubectl access working from gilbert
+- [x] Ansible roles updated for docker driver
+- [x] socket_vmnet stopped
 - [ ] ArgoCD redeployed and synced
 - [ ] All existing apps healthy (grafana, miniflux, devpi, etc.)
 - [ ] PostgreSQL cluster healthy
+- [ ] Containerd registry mirrors configured
 - [ ] `mise run indri-services-check` passes

 ---
@@ -312,29 +275,3 @@ If Docker driver doesn't work:
 1. Delete Docker minikube: `minikube delete`
 2. Recreate QEMU2 cluster (restore old ansible config from git)
 3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl
-
---
-
-## Notes
-
- Docker Desktop has resource overhead but provides better macOS integration
- The docker driver is more widely used and tested than qemu2
- File sharing permissions may need adjustment in Docker Desktop settings
- First cluster start may be slow as Docker pulls the minikube base image
-
-## Implementation Notes (2026-01-21)
-
-### QEMU2 Cleanup Done
-
-Removed from indri:
- `/Library/LaunchDaemons/com.blumeops.nfs-torrents.plist` - NFS mount daemon
- `~/Library/LaunchAgents/com.blumeops.minikube-mount.plist` - minikube mount agent
- Unmounted `/Volumes/torrents-nfs` NFS mount
- Removed `/Volumes/torrents-nfs` mount point
-
-### Previous QEMU2 Issues
-
-The QEMU2 migration partially worked but had a critical issue:
- Volume mounts worked via NFS → indri → minikube mount chain
- But Tailscale TCP proxy to VM IP (192.168.105.2:6443) failed with TLS timeout
- Root cause unknown - TCP connected but TLS handshake never completed