save some work

2026-01-21 16:03:38 -08:00 · 2026-01-21 13:27:27 -08:00 · 2026-01-21 13:27:27 -08:00
commit 5724b61fb4
2 changed files with 228 additions and 235 deletions
--- a/bin/kubectl-credential-1password
+++ b/bin/kubectl-credential-1password
@@ -1,31 +0,0 @@
 #!/bin/bash
 # kubectl exec credential plugin for 1Password
 # Usage: kubectl-credential-1password <vault-id> <item-id> <cert-field> <key-field>
 #
 # Fetches client certificate and key from 1Password and outputs
 # ExecCredential JSON for kubectl authentication.
 set -euo pipefail
 VAULT_ID="$1"
 ITEM_ID="$2"
 CERT_FIELD="$3"
 KEY_FIELD="$4"
 # Fetch credentials from 1Password (strips surrounding quotes from text fields)
 CLIENT_CERT=$(op --vault "$VAULT_ID" item get "$ITEM_ID" --fields "$CERT_FIELD" | sed 's/^"//; s/"$//')
 CLIENT_KEY=$(op --vault "$VAULT_ID" item get "$ITEM_ID" --fields "$KEY_FIELD" | sed 's/^"//; s/"$//')
 # Output ExecCredential JSON
 # Note: jq is used to properly escape the PEM data for JSON
 jq -n \
  --arg cert "$CLIENT_CERT" \
  --arg key "$CLIENT_KEY" \
  '{
    "apiVersion": "client.authentication.k8s.io/v1beta1",
    "kind": "ExecCredential",
    "status": {
      "clientCertificateData": $cert,
      "clientKeyData": $key
    }
  }'
--- a/plans/k8s-migration/P5.1_qemu2_migration.md
+++ b/plans/k8s-migration/P5.1_qemu2_migration.md
@@ -1,8 +1,8 @@
-# Phase 5.1: Migrate Minikube from Podman to QEMU2 Driver
+# Phase 5.1: Migrate Minikube from QEMU2 to Docker Driver
-**Goal**: Replace the podman driver with qemu2 to enable proper volume mounts (hostPath, NFS, SMB CSI)
+**Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts
-**Status**: Complete (2026-01-21)
+**Status**: In Progress (2026-01-21)
 **Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
@@ -10,307 +10,331 @@
 ## Background
 ### Original Problem (Podman → QEMU2)
 During Phase 6 (Kiwix/Transmission migration), we discovered that the **podman driver has fundamental limitations** that prevent mounting external volumes:
 1. **SMB CSI driver fails** with "Operation not permitted" - the rootless container lacks kernel-level mount capabilities
 2. **`minikube mount` fails** - 9p mount gets "permission denied" inside the podman VM
 3. **hostPath volumes** only work for paths inside the minikube container, not the macOS host
-These are documented limitations of the podman driver, which is labeled "experimental" in the [minikube documentation](https://minikube.sigs.k8s.io/docs/drivers/podman/).
+We migrated to QEMU2 to get a full VM with kernel capabilities.
-### Failed P6 Attempt
+### New Problem (QEMU2 → Docker)
-Branch `feature/p6-kiwix-transmission` contains the P6 implementation that was blocked by these issues. The manifests are complete and tested, but couldn't mount the torrents volume.
+The QEMU2 driver introduced a **new problem**: the Kubernetes API server is inside the VM at `192.168.105.2:6443`, and Tailscale's TCP proxy cannot forward to it properly:
-**What was tried:**
+- TCP connections succeed (nc -zv works)
- NFS volume mounts - failed due to missing CAP_SYS_ADMIN in podman container
+- TLS handshake times out
- SMB CSI driver (v1.17.0) - mount fails with EPERM (same root cause)
+- Root cause unknown, but likely related to Tailscale serve's handling of non-localhost upstreams
 - `minikube mount /Volumes/torrents:/Volumes/torrents` - 9p mount permission denied
 - hostPath PV pointing to `/Volumes/torrents` - path doesn't exist inside minikube container
 - Installing cifs-utils in minikube VM - still fails at kernel level
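The QEMU2 symptom described above (TCP connects, TLS handshake stalls) can be told apart from a plain connection failure by probing both layers separately. A minimal sketch, assuming `nc` and `curl` are available on the probing host; `classify_probe` and `probe_apiserver` are illustrative names, not helpers that exist in this repo:

```bash
# Map the two probe exit codes to a short diagnosis.
# $1 = exit code of the TCP probe (nc), $2 = exit code of the HTTPS probe (curl)
classify_probe() {
  local tcp_rc="$1" tls_rc="$2"
  if [ "$tcp_rc" -ne 0 ]; then
    echo "tcp-unreachable"
  elif [ "$tls_rc" -eq 28 ]; then
    echo "tls-timeout"            # curl exit 28: operation timed out
  elif [ "$tls_rc" -eq 35 ]; then
    echo "tls-handshake-error"    # curl exit 35: SSL connect error
  elif [ "$tls_rc" -ne 0 ]; then
    echo "https-error-$tls_rc"
  else
    echo "ok"
  fi
}

# Probe TCP first, then HTTPS, and print the diagnosis.
probe_apiserver() {
  local host="$1" port="$2" tcp_rc=0 tls_rc=0
  nc -z -w 5 "$host" "$port" >/dev/null 2>&1 || tcp_rc=$?
  curl -k -s -o /dev/null --connect-timeout 5 --max-time 10 \
    "https://$host:$port/healthz" || tls_rc=$?
  classify_probe "$tcp_rc" "$tls_rc"
}
```

For example, `probe_apiserver k8s.tail8d86e.ts.net 443` from gilbert exercises the whole Tailscale path, while `probe_apiserver 192.168.105.2 6443` from indri exercises only the VM side. Comparing the two results narrows down where the handshake is lost.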
-All of these failures trace back to the same root cause: the podman driver runs minikube in a rootless container that lacks the kernel capabilities required for filesystem mounts.
+Additionally, the volume mount solution with QEMU2 was complex:
 - Required NFS mount from sifaka → indri
 - Then `minikube mount` to pass through to VM
 - Two LaunchAgents/LaunchDaemons for persistence
 - macOS GUI approval required for network access
-### Why QEMU2?
+### Why Docker?
-Multiple sources recommend QEMU2 as the best driver for Apple Silicon Macs:
+The **docker driver** solves both problems:
-> "Qemu emulator is the best option to run a Kubernetes Cluster using minikube on MAC arm64-based systems without any issues."
+1. **API Server on localhost**: Docker Desktop forwards the API server port from the minikube container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` should work, as it did with the podman driver
 > — [DevOpsCube](https://devopscube.com/minikube-mac/)
-QEMU2 creates an actual VM (not a container), which has:
+2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers, and minikube (running in Docker) can use those paths via hostPath.
- Full kernel capabilities for mounts
+
- Proper 9p/virtio filesystem support
+3. **Official Tailscale recommendation**: Tailscale's own [Kubernetes guide](https://tailscale.com/learn/managing-access-to-kubernetes-with-tailscale) uses minikube with the docker driver.
 - Native NFS client support
 ---
-## Prerequisites (Manual Steps)
+## Prerequisites
-### Create Synology User for Kubernetes Storage Access
+### 1. Install Docker Desktop (Manual - Before Ansible)
-Create a dedicated Synology user for k8s NFS/SMB access (do not use personal account):
+Docker Desktop requires GUI setup, so install manually first:
-On Synology DSM (Control Panel → User & Group):
+```bash
-1. Create new user: `k8s-storage`
+# On indri:
-   - Set a strong password
+brew install --cask docker-desktop
-   - No admin privileges needed
+
-   - Deny access to all applications (only needs file services)
+# Then launch Docker Desktop from /Applications
-2. Set permissions on the `torrents` share:
+# Complete the setup wizard (accept license, skip tutorial)
-   - Give `k8s-storage` user Read/Write access
+# Wait for Docker to be "Running" (green icon in menu bar)
-3. Store credentials in 1Password:
+
-   - Vault: `vg6xf6vvfmoh5hqjjhlhbeoaie` (blumeops vault)
+# Verify:
-   - Item name: `synology-k8s-storage`
+docker version
-   - Fields: `username` (k8s-storage), `password`
+docker run hello-world
 ```
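Later steps assume the Docker daemon is already answering, which can take a while after Docker Desktop launches. A minimal sketch of a readiness wait that could precede `minikube start`; `wait_for_docker` is a hypothetical helper, and the default attempt count and delay are arbitrary:

```bash
# Poll `docker info` until the daemon answers or the attempts run out.
# $1 = max attempts (default 30), $2 = seconds between attempts (default 2)
wait_for_docker() {
  local attempts="${1:-30}" delay="${2:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if docker info >/dev/null 2>&1; then
      echo "docker is ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "docker did not become ready after $attempts attempts" >&2
  return 1
}
```

Usage would look like `wait_for_docker 60 2 && minikube start --driver=docker ...`, so the cluster is only created once the daemon responds.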
 **File Sharing Configuration** (in Docker Desktop → Settings → Resources → File sharing):
 - Ensure `/Volumes` is shared (for future NFS mounts from sifaka)
 - Or add specific paths as needed for P6
 ### 2. Stop Current QEMU2 Minikube
 ```bash
 # On indri:
 minikube stop
 minikube delete
 # Verify QEMU resources are cleaned up (pgrep does not match itself, unlike `ps | grep`)
 pgrep -fl qemu || echo "no qemu processes running"
 ```
 ---
 ## Plan
-### 1. Export Current State
+### 1. Update Ansible Role for Docker Driver
-Before destroying the cluster, capture the current state:
+**Changes to `ansible/roles/minikube/defaults/main.yml`:**
-```bash
+```yaml
-# List all ArgoCD apps and their sync status
+# Change from:
-argocd app list
+minikube_driver: qemu2
 minikube_network: socket_vmnet
 minikube_container_runtime: containerd
-# Backup any runtime state that matters (should be minimal - everything is in git)
+# To:
-kubectl --context=minikube-indri get all --all-namespaces -o yaml > /tmp/k8s-backup.yaml
+minikube_driver: docker
 minikube_container_runtime: docker  # or containerd, both work
 ```
-### 2. Stop and Delete Podman Minikube
+**Remove from defaults:**
 - `minikube_network` (not needed for docker driver)
-```bash
+**Changes to `ansible/roles/minikube/tasks/main.yml`:**
-# Stop the cluster
+- Remove qemu installation
-minikube stop
+- Remove socket_vmnet installation and service management
 - Remove NFS mount point creation
 - Remove NFS LaunchDaemon installation
 - Remove minikube mount LaunchAgent installation
 - Keep containerd registry mirror config (adapting for docker if needed)
-# Delete the cluster and all data
+**Remove files from `ansible/roles/minikube/files/`:**
-minikube delete
+- `com.blumeops.nfs-torrents.plist`
 - `com.blumeops.minikube-mount.plist`
-# Verify podman VM is cleaned up
+**Changes to `ansible/roles/minikube/handlers/main.yml`:**
-podman machine list
+- Remove `Load NFS mount LaunchDaemon`
 - Remove `Load minikube mount LaunchAgent`
 **Add to Brewfile:**
 ```ruby
 cask "docker-desktop"  # Docker Desktop (same cask name as the manual install step)
 ```
-### 3. Update Ansible Roles for QEMU2
+### 2. Update Tailscale Serve Configuration
-The installation must be orchestrated via ansible, following the existing patterns for `podman` and `minikube` roles.
+**Changes to `ansible/roles/tailscale_serve/defaults/main.yml`:**
-**Changes needed:**
+```yaml
 # Change svc:k8s upstream from VM IP back to localhost:
 - name: svc:k8s
  tcp:
    port: 443
    upstream: tcp://localhost:PORT  # PORT will be dynamic, see below
 ```
-1. **Update `ansible/roles/minikube/` role:**
+**Note on API server port**: With the docker driver, the API server port is dynamic (assigned by minikube). We need to either:
-   - Change driver from `podman` to `qemu2`
+- Use `--apiserver-port=6443` to fix it
-   - Add QEMU as a dependency (via Brewfile or role)
+- Or query and update the config after cluster creation
   - Optionally add socket_vmnet for full networking support
   - Update any driver-specific configuration
-2. **Update `Brewfile`:**
+### 3. Create Docker Minikube Cluster
   ```ruby
   brew "qemu"
   # Optional: brew "socket_vmnet"
   ```
 3. **Update minikube start command in role:**
   ```bash
   minikube start \
     --driver=qemu2 \
     --cpus=4 \
     --memory=8192 \
     --disk-size=50g \
     --container-runtime=containerd \
     --kubernetes-version=stable
   ```
 4. **Remove or update podman role** (may still be useful for container builds)
 ### 4. Run Ansible to Create QEMU2 Cluster
 ```bash
-# Run the updated minikube role
+# On indri (after Docker Desktop is running):
-mise run provision-indri -- --tags minikube
+minikube start \
  --driver=docker \
  --cpus=6 \
  --memory=12288 \
  --disk-size=200g \
  --apiserver-names=k8s.tail8d86e.ts.net,indri \
  --apiserver-port=6443 \
  --listen-address=0.0.0.0
-# Verify cluster is running
+# Verify cluster
 minikube status
 kubectl get nodes
 ```
-### 5. Configure Host Path Access
+### 4. Verify API Server is on Localhost
 With QEMU2, we need to either:
 **Option A: Use `minikube mount` (9p)**
 ```bash
 # Start persistent mount (run in background or via launchd)
 minikube mount /Volumes/torrents:/Volumes/torrents &
 ```
 **Option B: Use NFS export from macOS**
 ```bash
 # Add NFS export on macOS
 echo "/Volumes/torrents -alldirs -mapall=$(id -u):$(id -g) -network 192.168.0.0 -mask 255.255.0.0" | sudo tee -a /etc/exports
 sudo nfsd restart
 # In k8s, use NFS volume type directly
 ```
 ### 6. Test Volume Mount with Test Pod
 Create a test pod that mounts the torrents volume:
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
  name: volume-test
  namespace: default
 spec:
  containers:
    - name: test
      image: busybox
      command: ["sh", "-c", "ls -la /data && sleep 3600"]
      volumeMounts:
        - name: torrents
          mountPath: /data
  volumes:
    - name: torrents
      hostPath:
        path: /Volumes/torrents
        type: Directory
 ```
 Verify:
 ```bash
 kubectl apply -f volume-test.yaml
 kubectl logs volume-test
 kubectl exec volume-test -- ls -la /data
 ```
 ### 7. Redeploy ArgoCD and Existing Apps
 ```bash
-# Re-add ArgoCD
+# Check what port the API server is on
 kubectl config view --minify -o jsonpath="{.clusters[0].cluster.server}"
 # Should show https://127.0.0.1:PORT or similar
 # Verify local access works
 curl -k https://localhost:6443/healthz
 # Should return "ok"
 ```
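Because the plan notes that the API server port is dynamic under the docker driver, the port in kubeconfig may not match the `--apiserver-port` value. A minimal sketch that reads the port from the server URL instead of assuming `6443`; `apiserver_port` and `current_apiserver_port` are illustrative names:

```bash
# Extract the port from a server URL such as https://127.0.0.1:32771
# Prints nothing and returns 1 if the URL carries no explicit numeric port.
apiserver_port() {
  local url="$1" hostport port
  hostport="${url#*://}"       # drop the scheme
  hostport="${hostport%%/*}"   # drop any path
  port="${hostport##*:}"       # text after the last colon
  case "$port" in
    "$hostport" | "" | *[!0-9]*) return 1 ;;
  esac
  echo "$port"
}

# Read the server URL of the current context and print its port.
current_apiserver_port() {
  apiserver_port "$(kubectl config view --minify \
    -o jsonpath='{.clusters[0].cluster.server}')"
}
```

The result can then feed the health check and the Tailscale upstream, for example `PORT="$(current_apiserver_port)" && curl -k "https://localhost:$PORT/healthz"`.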
 ### 5. Update 1Password Credentials
 After cluster recreation, update the credentials in 1Password:
 ```bash
 # On indri, get the new certificates:
 cat ~/.minikube/profiles/minikube/client.crt
 cat ~/.minikube/profiles/minikube/client.key
 cat ~/.minikube/ca.crt
 ```
 Update in 1Password (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `3jo4f2hnzvwfmamudfsbbbec7e`).
 ### 6. Update Kubeconfig on Gilbert
 ```bash
 # Fetch new CA cert from 1Password
 op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 3jo4f2hnzvwfmamudfsbbbec7e --fields ca-cert | sed 's/^"//; s/"$//' > ~/.kube/minikube-indri/ca.crt
 ```
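The `sed` filter in the command above removes the quotes that surround the text field in the `op` output. A slightly stricter variant, sketched here under the assumption that the value is a multi-line PEM block, touches only the first and last lines, so a quote inside the body is left alone. `strip_outer_quotes` is a hypothetical helper name:

```bash
# Remove one leading quote from the first line and one trailing quote
# from the last line of stdin, leaving the lines in between untouched.
strip_outer_quotes() {
  sed -e '1s/^"//' -e '$s/"$//'
}
```

It drops into the existing pipeline in place of the inline `sed` expression: `op ... --fields ca-cert | strip_outer_quotes > ~/.kube/minikube-indri/ca.crt`.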
 ### 7. Configure Tailscale Serve for K8s
 ```bash
 # On indri:
 tailscale serve --service="svc:k8s" --tcp=443 tcp://localhost:6443
 ```
 ### 8. Verify Remote Access
 ```bash
 # From gilbert:
 curl -k --connect-timeout 5 https://k8s.tail8d86e.ts.net/healthz
 # Should return "ok"
 kubectl --context=minikube-indri get nodes
 # Should show the minikube node
 ```
 ### 9. Redeploy ArgoCD and Apps
 Since this is a cluster recreation, we need to re-bootstrap:
 ```bash
 # On indri - apply secrets first
 op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
 # Create repo secret for ArgoCD
 PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
 kubectl create namespace argocd
-kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
+kubectl create secret generic repo-forge -n argocd \
  --from-literal=type=git \
  --from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
  --from-literal=insecure=true \
  --from-literal=sshPrivateKey="$PRIV_KEY"
 kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
-# Wait for ArgoCD to be ready
+# Bootstrap operators
 kubectl create namespace tailscale
 kubectl apply -k argocd/manifests/tailscale-operator/
 kubectl apply -k argocd/manifests/argocd/
 # Wait for ArgoCD
 kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
-# Re-configure ArgoCD (repo credentials, etc.)
+# Login and sync apps
-# ... follow P1 setup steps ...
+argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
 # Sync all apps
 argocd app sync apps
 argocd app sync tailscale-operator
 argocd app sync cloudnative-pg
 argocd app sync blumeops-pg
 argocd app sync grafana
 argocd app sync grafana-config
 argocd app sync miniflux
 argocd app sync devpi
 ```
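The sync sequence above can also be run as a loop that waits for each app to become healthy before moving on, which makes a failed dependency easier to spot. A minimal sketch using the app names from the list above, assuming `argocd login` has already succeeded; the 300-second timeout and the function name are arbitrary choices:

```bash
# App names as listed in the plan, in the order they are synced.
APPS="apps tailscale-operator cloudnative-pg blumeops-pg grafana grafana-config miniflux devpi"

# Sync each app, then wait for it to report healthy before the next one.
# Stops at the first failure so a broken dependency is not skipped over.
sync_apps_in_order() {
  local app
  for app in $APPS; do
    echo "syncing $app"
    argocd app sync "$app" || return 1
    argocd app wait "$app" --health --timeout 300 || return 1
  done
}
```

Calling `sync_apps_in_order` returns non-zero as soon as one app fails to sync or does not become healthy within the timeout.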
-### 8. Verify All Services
+### 10. Verify All Services
 ```bash
 # Run health check
 mise run indri-services-check
 # Verify each k8s service
 argocd app list
 kubectl get pods --all-namespaces
 ```
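Scanning the full pod list by eye is easy to get wrong after a rebuild. A minimal sketch that prints only pods whose phase is neither `Running` nor `Succeeded`; `check_pods_settled` is a hypothetical helper and is not part of `mise run indri-services-check`:

```bash
# List pods that are neither Running nor Succeeded, across all namespaces.
# Returns 0 if none are found, 1 if any are listed, 2 if kubectl itself fails.
check_pods_settled() {
  local bad
  bad="$(kubectl get pods --all-namespaces --no-headers \
    --field-selector=status.phase!=Running,status.phase!=Succeeded \
    2>/dev/null)" || return 2
  if [ -n "$bad" ]; then
    echo "$bad"
    return 1
  fi
  echo "all pods are Running or Succeeded"
}
```

Note that a pod stuck in a crash loop still has phase `Running`, so this complements rather than replaces `argocd app list` and the per-pod READY column.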
-### 9. Clean Up Test Pod
+---
-```bash
+## Volume Mounts for P6 (Kiwix/Transmission)
-kubectl delete pod volume-test
+
-```
+With the docker driver, volume mounts work differently than they did with QEMU2:
 **Option A: Docker Desktop File Sharing + hostPath**
 1. Mount sifaka NFS share on indri: `/Volumes/torrents`
 2. Add `/Volumes/torrents` to Docker Desktop file sharing
 3. Pods use hostPath pointing to that path
 **Option B: NFS directly from pods**
 - Docker containers can make NFS mounts (unlike podman's rootless containers)
 - May need to test if sifaka allows connections from the Docker network
 This will be fully tested in Phase 6.
 ---
 ## Cleanup
 After successful migration:
 1. **Remove QEMU2 artifacts:**
   ```bash
   brew uninstall qemu socket_vmnet
   ```
 2. **Remove podman if no longer needed:**
   ```bash
   podman machine stop
   podman machine rm
   brew uninstall podman
   ```
 ---
 ## Verification Checklist
- [ ] Podman minikube deleted
+- [ ] Docker Desktop installed and running on indri
- [ ] QEMU2 minikube running
+- [ ] QEMU2 minikube deleted
- [ ] `minikube mount` or NFS working
+- [ ] Docker minikube running
- [ ] Test pod can read `/Volumes/torrents`
+- [ ] API server accessible on localhost:6443
 - [ ] Tailscale serve configured for svc:k8s → localhost:6443
 - [ ] Remote kubectl access working from gilbert
 - [ ] ArgoCD redeployed and synced
 - [ ] All existing apps healthy (grafana, miniflux, devpi, etc.)
 - [ ] PostgreSQL cluster healthy
- [ ] Test pod deleted
+- [ ] `mise run indri-services-check` passes
 - [ ] `mise run indri-services-check` passes (except intentionally offline services)
 ---
 ## Rollback Plan
-If QEMU2 doesn't work:
+If the docker driver doesn't work:
-1. Delete QEMU2 cluster: `minikube delete`
+1. Delete Docker minikube: `minikube delete`
-2. Recreate podman cluster following P0/P1 steps
+2. Recreate QEMU2 cluster (restore old ansible config from git)
-3. Redeploy apps from git
+3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl
 All state is in git, so cluster recreation is straightforward.
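The SSH tunnel fallback mentioned in the rollback steps could look roughly like the following. This is a sketch only: it assumes `indri` is reachable over SSH from gilbert and that the QEMU2 API server is back at `192.168.105.2:6443`, as described in the background section. `tunnel_command` is an illustrative helper that prints the command rather than running it:

```bash
# Print the ssh command that forwards a local port to the API server
# as seen from the jump host. Nothing is executed here.
# $1 = ssh host, $2 = API server host:port on the far side, $3 = local port
tunnel_command() {
  local jump="$1" target="$2" local_port="${3:-6443}"
  echo "ssh -N -L ${local_port}:${target} ${jump}"
}
```

With the tunnel running, `kubectl --server=https://localhost:6443 get nodes` would go through it. Whether TLS verification passes depends on the names in the API server certificate, which this plan does not confirm.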
 ---
 ## Notes
- The QEMU2 VM will use more resources than podman (actual VM vs container)
+- Docker Desktop has resource overhead but provides better macOS integration
- First boot may be slower due to VM initialization
+- The docker driver is more widely used and tested than qemu2
- socket_vmnet provides better networking but requires sudo setup
+- File sharing permissions may need adjustment in Docker Desktop settings
- Consider creating a LaunchAgent for `minikube mount` if using that approach
+- First cluster start may be slow as Docker pulls the minikube base image
 ## Implementation Notes (2026-01-21)
-### What Actually Worked
+### QEMU2 Cleanup Done
-**Volume mounting solution**: NFS mount on indri (host) + `minikube mount` to pass through to VM
+Removed from indri:
 - `/Library/LaunchDaemons/com.blumeops.nfs-torrents.plist` - NFS mount daemon
 - `~/Library/LaunchAgents/com.blumeops.minikube-mount.plist` - minikube mount agent
 - Unmounted `/Volumes/torrents-nfs` NFS mount
 - Removed `/Volumes/torrents-nfs` mount point
-1. Mount sifaka's torrents share on indri via NFS: `sudo mount -t nfs sifaka:/volume1/torrents /Volumes/torrents-nfs`
+### Previous QEMU2 Issues
 2. Run `minikube mount /Volumes/torrents-nfs:/mnt/torrents` from indri console (GUI session required due to macOS security)
 3. Pods can access `/mnt/torrents` via hostPath
-**Why NFS from inside VM didn't work**: Despite allowing 192.168.105.0/24 in Synology NFS settings, the VM got "access denied". Root cause unknown - may be Synology NFS quirk.
+The QEMU2 migration partially worked but had a critical issue:
-
+- Volume mounts worked via NFS → indri → minikube mount chain
-**Why SMB didn't work**: The minikube containerd kernel doesn't include the CIFS module.
+- But Tailscale TCP proxy to VM IP (192.168.105.2:6443) failed with TLS timeout
-
+- Root cause unknown - TCP connected but TLS handshake never completed
 ### Zot Registry Mirror (Implemented)
 The ansible role now configures containerd to redirect `registry.tail8d86e.ts.net` to `host.minikube.internal:5050`:
 - Adds hosts file entry in VM
 - Creates containerd registry mirror config at `/etc/containerd/certs.d/registry.tail8d86e.ts.net/hosts.toml`
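For reference, a containerd `hosts.toml` for this kind of redirect generally has the shape rendered below. This is a sketch, not the file the ansible role actually writes: the `http://` scheme for the mirror and the capability list are assumptions, and `render_hosts_toml` is a hypothetical helper:

```bash
# Render a containerd hosts.toml that sends pulls for one registry to a mirror.
# $1 = registry name as written in image references
# $2 = mirror URL, including scheme (the scheme is an assumption here)
render_hosts_toml() {
  local registry="$1" mirror="$2"
  cat <<EOF
server = "https://${registry}"

[host."${mirror}"]
  capabilities = ["pull", "resolve"]
EOF
}
```

Running `render_hosts_toml registry.tail8d86e.ts.net http://host.minikube.internal:5050` prints a candidate file that can be compared against what the role places under `/etc/containerd/certs.d/`.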
 ### Passwordless Sudo on Indri
 Configured passwordless sudo for `erichblume` user to allow ansible `become: true` tasks to run without `-K` flag:
 ```bash
 echo "erichblume ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/erichblume
 ```
 This is acceptable given the security model - tailnet access is the trust boundary.
 ### macOS Network Permission
 The first time `minikube mount` runs, macOS will show a GUI popup asking to allow network access. This must be approved from the indri console (not SSH). Once approved, subsequent runs won't prompt.
 ### Manual Steps Still Required
 These steps cannot be fully automated via ansible and must be done manually:
 1. **socket_vmnet service (once per reboot)**:
   ```bash
   # On indri console:
   sudo brew services start socket_vmnet
   ```
 2. **NFS mount on indri (once per reboot)**:
   ```bash
   # On indri console:
   sudo mount -t nfs sifaka:/volume1/torrents /Volumes/torrents-nfs
   ```
 3. **minikube mount (must run in GUI session)**:
   ```bash
   # On indri console (not SSH - requires GUI session for macOS security):
   minikube mount /Volumes/torrents-nfs:/mnt/torrents
   # Keep this terminal open - the mount dies if process exits
   ```
 ### TODO: LaunchAgent for Persistent Mount
 Create a LaunchAgent to run `minikube mount` at login. Challenge: must run in GUI session context for macOS security model.