From 816443bfb58cdceaecc9654a7ed0194465f7137c Mon Sep 17 00:00:00 2001
From: Erich Blume
Date: Tue, 20 Jan 2026 20:48:35 -0800
Subject: [PATCH] Document P6 blocker and add P5.1 QEMU2 migration plan

P6 (Kiwix/Transmission) is blocked because the podman driver cannot
mount external volumes (NFS, SMB, hostPath all fail due to missing
CAP_SYS_ADMIN).

Changes:
- Add P5.1 plan to migrate minikube from podman to qemu2 driver
- Update P6 with blocker documentation and reference to feature branch
- Update overview with new phase and corrected driver info
- Mark P1-P5 as complete in overview

Implementation branch: feature/p6-kiwix-transmission (manifests complete, untested)

Co-Authored-By: Claude Opus 4.5
---
 plans/k8s-migration/00_overview.md          |  28 +--
 plans/k8s-migration/P5.1_qemu2_migration.md | 235 ++++++++++++++++++++
 plans/k8s-migration/P6_kiwix.md             |  42 +++-
 3 files changed, 283 insertions(+), 22 deletions(-)
 create mode 100644 plans/k8s-migration/P5.1_qemu2_migration.md

diff --git a/plans/k8s-migration/00_overview.md b/plans/k8s-migration/00_overview.md
index 514206b..643122c 100644
--- a/plans/k8s-migration/00_overview.md
+++ b/plans/k8s-migration/00_overview.md
@@ -7,12 +7,13 @@ This plan details a phased migration of blumeops services from direct hosting on
 | Phase | Name | Status | Description |
 |-------|------|--------|-------------|
 | 0 | [Foundation](P0_foundation.complete.md) | Complete | Container registry + minikube cluster |
-| 1 | [K8s Infrastructure](P1_k8s_infrastructure.md) | In Progress | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
-| 2 | [Grafana](P2_grafana.md) | Pending | Migrate Grafana (pilot) via ArgoCD |
-| 3 | [PostgreSQL](P3_postgresql.md) | Pending | Data migration to k8s PostgreSQL |
-| 4 | [Miniflux](P4_miniflux.md) | Pending | Migrate Miniflux via ArgoCD |
-| 5 | [devpi](P5_devpi.md) | Pending | Migrate devpi via ArgoCD |
-| 6 | [Kiwix](P6_kiwix.md) | Pending | Migrate Kiwix via ArgoCD |
+| 1 | [K8s Infrastructure](P1_k8s_infrastructure.md) | Complete | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
+| 2 | [Grafana](P2_grafana.complete.md) | Complete | Migrate Grafana (pilot) via ArgoCD |
+| 3 | [PostgreSQL](P3_postgresql.complete.md) | Complete | Data migration to k8s PostgreSQL |
+| 4 | [Miniflux](P4_miniflux.complete.md) | Complete | Migrate Miniflux via ArgoCD |
+| 5 | [devpi](P5_devpi.complete.md) | Complete | Migrate devpi via ArgoCD |
+| 5.1 | [QEMU2 Migration](P5.1_qemu2_migration.md) | Pending | Switch minikube from podman to qemu2 driver |
+| 6 | [Kiwix](P6_kiwix.md) | Blocked | Migrate Kiwix + Transmission via ArgoCD (blocked on P5.1) |
 | 7 | [Forgejo](P7_forgejo.md) | Pending | Migrate Forgejo (highest risk) via ArgoCD |
 | 8 | [Woodpecker](P8_woodpecker.md) | Pending | Deploy CI/CD via ArgoCD |
 | 9 | [Cleanup](P9_cleanup.md) | Pending | Remove deprecated services |
@@ -28,13 +29,13 @@ This plan details a phased migration of blumeops services from direct hosting on
 | **Borgmatic** | Backup system |
 | **Grafana-alloy** | Metrics/logs collector on host |
 | **Plex** | Until Jellyfin replacement |
-| **Transmission** | Downloads for kiwix ZIM files |
 
 ### Services Moving to K8s
 | Service | Complexity | Dependencies |
 |---------|------------|--------------|
 | Grafana | LOW | Phase 1 |
-| Kiwix | LOW | Phase 1 |
+| Kiwix | MEDIUM | Phase 5.1 (QEMU2), shared storage |
+| Transmission | MEDIUM | Phase 5.1 (QEMU2), shared storage |
 | Miniflux | MEDIUM | PostgreSQL |
 | devpi | MEDIUM | Registry |
 | PostgreSQL | HIGH | Phase 1 |
@@ -51,11 +52,12 @@ This plan details a phased migration of blumeops services from direct hosting on
 - Config: `~/.config/zot/config.json`
 - Data: `~/zot/`
 
-### Minikube Driver: Podman
-- Rootless containers for better security
-- Lighter than full VM (QEMU)
-- Uses existing container ecosystem
-- `minikube start --driver=podman --container-runtime=cri-o`
+### Minikube Driver: QEMU2 (migrating from Podman)
+- **Original choice (Podman)** proved unable to mount external volumes (NFS, SMB, hostPath)
+- Podman's rootless containers lack CAP_SYS_ADMIN for filesystem mounts
+- **QEMU2** creates an actual VM with full kernel capabilities
+- Phase 5.1 handles the migration from podman to qemu2
+- `minikube start --driver=qemu2 --container-runtime=containerd`
 
 ### PostgreSQL: CloudNativePG Operator
 - Production-grade operator
diff --git a/plans/k8s-migration/P5.1_qemu2_migration.md b/plans/k8s-migration/P5.1_qemu2_migration.md
new file mode 100644
index 0000000..73b47b8
--- /dev/null
+++ b/plans/k8s-migration/P5.1_qemu2_migration.md
@@ -0,0 +1,235 @@
+# Phase 5.1: Migrate Minikube from Podman to QEMU2 Driver
+
+**Goal**: Replace the podman driver with qemu2 to enable proper volume mounts (hostPath, NFS, SMB CSI)
+
+**Status**: Planning
+
+**Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
+
+---
+
+## Background
+
+During Phase 6 (Kiwix/Transmission migration), we discovered that the **podman driver has fundamental limitations** that prevent mounting external volumes:
+
+1. **SMB CSI driver fails** with "Operation not permitted" - the rootless container lacks kernel-level mount capabilities
+2. **`minikube mount` fails** - 9p mount gets "permission denied" inside the podman VM
+3. **hostPath volumes** only work for paths inside the minikube container, not the macOS host
+
+These are documented limitations of the podman driver, which is labeled "experimental" in the [minikube documentation](https://minikube.sigs.k8s.io/docs/drivers/podman/).
+
+### Failed P6 Attempt
+
+Branch `feature/p6-kiwix-transmission` contains the P6 implementation that was blocked by these issues. The manifests are complete and tested, but the pods could not mount the torrents volume.
+
+**What was tried:**
+- NFS volume mounts - failed due to missing CAP_SYS_ADMIN in the podman container
+- SMB CSI driver (v1.17.0) - mount fails with EPERM (same root cause)
+- `minikube mount /Volumes/torrents:/Volumes/torrents` - 9p mount permission denied
+- hostPath PV pointing to `/Volumes/torrents` - path doesn't exist inside the minikube container
+- Installing cifs-utils in the minikube VM - still fails at kernel level
+
+All of these failures trace back to the same root cause: the podman driver runs minikube in a rootless container that lacks the kernel capabilities required for filesystem mounts.
+
+### Why QEMU2?
+
+Multiple sources recommend QEMU2 as the best driver for Apple Silicon Macs:
+
+> "Qemu emulator is the best option to run a Kubernetes Cluster using minikube on MAC arm64-based systems without any issues."
+> — [DevOpsCube](https://devopscube.com/minikube-mac/)
+
+QEMU2 creates an actual VM (not a container), which has:
+- Full kernel capabilities for mounts
+- Proper 9p/virtio filesystem support
+- Native NFS client support
+
+---
+
+## Plan
+
+### 1. Export Current State
+
+Before destroying the cluster, capture the current state:
+
+```bash
+# List all ArgoCD apps and their sync status
+argocd app list
+
+# Back up any runtime state that matters (should be minimal - everything is in git)
+kubectl --context=minikube-indri get all --all-namespaces -o yaml > /tmp/k8s-backup.yaml
+```
+
+### 2. Stop and Delete Podman Minikube
+
+```bash
+# Stop the cluster
+minikube stop
+
+# Delete the cluster and all data
+minikube delete
+
+# Verify podman VM is cleaned up
+podman machine list
+```
+
+### 3. Update Ansible Roles for QEMU2
+
+The installation must be orchestrated via Ansible, following the existing patterns of the `podman` and `minikube` roles.
+
+**Changes needed:**
+
+1. **Update `ansible/roles/minikube/` role:**
+   - Change driver from `podman` to `qemu2`
+   - Add QEMU as a dependency (via Brewfile or role)
+   - Optionally add socket_vmnet for full networking support
+   - Update any driver-specific configuration
+
+2. **Update `Brewfile`:**
+   ```ruby
+   brew "qemu"
+   # Optional: brew "socket_vmnet"
+   ```
+
+3. **Update minikube start command in role:**
+   ```bash
+   minikube start \
+     --driver=qemu2 \
+     --cpus=4 \
+     --memory=8192 \
+     --disk-size=50g \
+     --container-runtime=containerd \
+     --kubernetes-version=stable
+   ```
+
+4. **Remove or update podman role** (may still be useful for container builds)
+
+### 4. Run Ansible to Create QEMU2 Cluster
+
+```bash
+# Run the updated minikube role
+mise run provision-indri -- --tags minikube
+
+# Verify cluster is running
+minikube status
+kubectl get nodes
+```
+
+### 5. Configure Host Path Access
+
+With QEMU2, we need to either:
+
+**Option A: Use `minikube mount` (9p)**
+```bash
+# Start persistent mount (run in background or via launchd)
+minikube mount /Volumes/torrents:/Volumes/torrents &
+```
+
+**Option B: Use NFS export from macOS**
+```bash
+# Add NFS export on macOS
+echo "/Volumes/torrents -alldirs -mapall=$(id -u):$(id -g) -network 192.168.0.0 -mask 255.255.0.0" | sudo tee -a /etc/exports
+sudo nfsd restart
+
+# In k8s, use NFS volume type directly
+```
+
+### 6. Test Volume Mount with Test Pod
+
+Create a test pod that mounts the torrents volume:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: volume-test
+  namespace: default
+spec:
+  containers:
+  - name: test
+    image: busybox
+    command: ["sh", "-c", "ls -la /data && sleep 3600"]
+    volumeMounts:
+    - name: torrents
+      mountPath: /data
+  volumes:
+  - name: torrents
+    hostPath:
+      path: /Volumes/torrents
+      type: Directory
+```
+
+Verify:
+```bash
+kubectl apply -f volume-test.yaml
+kubectl logs volume-test
+kubectl exec volume-test -- ls -la /data
+```
+
+### 7. Redeploy ArgoCD and Existing Apps
+
+```bash
+# Re-add ArgoCD
+kubectl create namespace argocd
+kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
+
+# Wait for ArgoCD to be ready
+kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
+
+# Re-configure ArgoCD (repo credentials, etc.)
+# ... follow P1 setup steps ...
+
+# Sync all apps
+argocd app sync apps
+```
+
+### 8. Verify All Services
+
+```bash
+# Run health check
+mise run indri-services-check
+
+# Verify each k8s service
+argocd app list
+kubectl get pods --all-namespaces
+```
+
+### 9. Clean Up Test Pod
+
+```bash
+kubectl delete pod volume-test
+```
+
+---
+
+## Verification Checklist
+
+- [ ] Podman minikube deleted
+- [ ] QEMU2 minikube running
+- [ ] `minikube mount` or NFS working
+- [ ] Test pod can read `/Volumes/torrents`
+- [ ] ArgoCD redeployed and synced
+- [ ] All existing apps healthy (grafana, miniflux, devpi, etc.)
+- [ ] PostgreSQL cluster healthy
+- [ ] Test pod deleted
+- [ ] `mise run indri-services-check` passes (except intentionally offline services)
+
+---
+
+## Rollback Plan
+
+If QEMU2 doesn't work:
+
+1. Delete QEMU2 cluster: `minikube delete`
+2. Recreate podman cluster following P0/P1 steps
+3. Redeploy apps from git
+
+All state is in git, so cluster recreation is straightforward.
+
+---
+
+## Notes
+
+- The QEMU2 VM will use more resources than podman (actual VM vs container)
+- First boot may be slower due to VM initialization
+- socket_vmnet provides better networking but requires sudo setup
+- Consider creating a LaunchAgent for `minikube mount` if using that approach
diff --git a/plans/k8s-migration/P6_kiwix.md b/plans/k8s-migration/P6_kiwix.md
index d368f8f..eeec827 100644
--- a/plans/k8s-migration/P6_kiwix.md
+++ b/plans/k8s-migration/P6_kiwix.md
@@ -1,10 +1,34 @@
 # Phase 6: Kiwix and Transmission Migration
 
-**Goal**: Migrate kiwix-serve and transmission torrent daemon to k8s with SMB storage on sifaka
+**Goal**: Migrate kiwix-serve and transmission torrent daemon to k8s with shared storage
 
-**Status**: Planning
+**Status**: BLOCKED - waiting for [Phase 5.1](P5.1_qemu2_migration.md) (QEMU2 migration)
 
-**Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
+**Prerequisites**: [Phase 5.1](P5.1_qemu2_migration.md) complete (minikube on QEMU2 driver)
+
+---
+
+## Blocker: Podman Driver Volume Mount Limitations
+
+**First attempt branch:** `feature/p6-kiwix-transmission`
+
+The initial implementation was completed and tested, but **all volume mount approaches failed** due to the podman driver's rootless container limitations:
+
+| Approach | Result |
+|----------|--------|
+| NFS volume | Failed - CAP_SYS_ADMIN required for NFS mounts |
+| SMB CSI driver | Failed - `mount.cifs` returns EPERM inside rootless container |
+| `minikube mount` (9p) | Failed - permission denied mounting into podman VM |
+| hostPath | Failed - path doesn't exist inside minikube container |
+
+**Root cause:** The podman driver runs minikube in a rootless container that lacks kernel capabilities for filesystem mounts. This is a [documented limitation](https://minikube.sigs.k8s.io/docs/drivers/podman/) of the experimental podman driver.
+
+**Solution:** Phase 5.1 migrates minikube from the podman driver to the QEMU2 driver, which creates an actual VM with full kernel capabilities.
+
+**What's preserved:**
+- All k8s manifests in `feature/p6-kiwix-transmission` are complete and tested
+- Prerequisites (SMB share, k8s-smb user, data rsync) are done
+- Can retry P6 immediately after P5.1 completes
 
 ---
 
@@ -38,14 +62,14 @@ New architecture in k8s:
 
 ## Architecture Decisions
 
-### Storage: SMB on Sifaka
+### Storage: SMB on Sifaka (or NFS after QEMU2 migration)
 
-**Why SMB instead of NFS:**
-- Minikube with podman driver lacks CAP_SYS_ADMIN required for NFS mounts
-- SMB already works reliably with Synology (used for other shares)
-- SMB CSI driver ([csi-driver-smb](https://github.com/kubernetes-csi/csi-driver-smb)) is well-maintained
-- Supports ReadWriteMany access mode for concurrent pod access
+**Note:** The original plan chose SMB over NFS, but both failed with the podman driver. After the QEMU2 migration, either should work. SMB is still preferred for:
 - Native Synology SMB support with good macOS compatibility
+- ReadWriteMany access mode for concurrent pod access
+- SMB CSI driver already mirrored to forge
+
+**Alternative after QEMU2:** A direct NFS volume, or a hostPath volume backed by `minikube mount` (9p, not NFS), may be simpler than the SMB CSI driver.
 
 **Storage path:** `/volume1/torrents/` on sifaka (SMB share name: `torrents`)
 - General-purpose torrent download directory
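
Step 6 of the P5.1 plan only tests Option A, a hostPath volume backed by `minikube mount`. If Option B (NFS export from macOS) is chosen instead, the same test pod could use Kubernetes' built-in `nfs` volume type. This is a sketch under assumptions and is not part of the plan. The pod name `volume-test-nfs` and the `NFS_SERVER_IP` placeholder are hypothetical. The address at which the macOS host is reachable from inside the QEMU2 VM depends on the network mode in use: builtin user networking or socket_vmnet.

```yaml
# Hypothetical NFS variant of the P5.1 step 6 volume-test pod (Option B).
# NFS_SERVER_IP is a placeholder, not a value from the plan; replace it with
# the macOS host address as seen from inside the QEMU2 VM before applying.
apiVersion: v1
kind: Pod
metadata:
  name: volume-test-nfs
  namespace: default
spec:
  containers:
  - name: test
    image: busybox
    command: ["sh", "-c", "ls -la /data && sleep 3600"]
    volumeMounts:
    - name: torrents
      mountPath: /data
  volumes:
  - name: torrents
    nfs:
      server: NFS_SERVER_IP      # placeholder host address
      path: /Volumes/torrents    # must match the path exported in /etc/exports
      readOnly: true             # the test only lists the directory
```

It would be applied and checked with the same `kubectl apply`, `kubectl logs` and `kubectl exec` commands as step 6, substituting the pod name. The volume is mounted read-only because the test only lists the directory. Transmission itself would need write access.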