Document P6 blocker and add P5.1 QEMU2 migration plan (#37)

## Summary
- Document P6 (Kiwix/Transmission) blocker: podman driver cannot mount external volumes
- Add P5.1 plan to migrate minikube from podman to QEMU2 driver
- Update overview with corrected phase statuses and driver information

## Background

P6 implementation (`feature/p6-kiwix-transmission`) was completed but blocked because **all volume mount approaches failed** with the podman driver:

| Approach | Result |
|----------|--------|
| NFS volume | Failed - CAP_SYS_ADMIN required |
| SMB CSI driver | Failed - EPERM in rootless container |
| `minikube mount` (9p) | Failed - permission denied |
| hostPath | Failed - path doesn't exist in container |

Root cause: Podman driver runs minikube in a rootless container lacking kernel capabilities for filesystem mounts.

## What's Next

1. Merge this documentation PR
2. Execute P5.1 (QEMU2 migration) in a fresh session
3. Retry P6 with the QEMU2 driver

## Deployment and Testing
- [x] No deployment needed - documentation only
- [x] ArgoCD apps reset to main
- [x] Cluster healthy (except kiwix/transmission intentionally offline)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/37
Erich Blume 2026-01-20 20:49:48 -08:00
commit 7b60cca31e
3 changed files with 283 additions and 22 deletions


@ -7,12 +7,13 @@ This plan details a phased migration of blumeops services from direct hosting on
| Phase | Name | Status | Description |
|-------|------|--------|-------------|
| 0 | [Foundation](P0_foundation.complete.md) | Complete | Container registry + minikube cluster |
| 1 | [K8s Infrastructure](P1_k8s_infrastructure.md) | In Progress | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
| 2 | [Grafana](P2_grafana.md) | Pending | Migrate Grafana (pilot) via ArgoCD |
| 3 | [PostgreSQL](P3_postgresql.md) | Pending | Data migration to k8s PostgreSQL |
| 4 | [Miniflux](P4_miniflux.md) | Pending | Migrate Miniflux via ArgoCD |
| 5 | [devpi](P5_devpi.md) | Pending | Migrate devpi via ArgoCD |
| 6 | [Kiwix](P6_kiwix.md) | Pending | Migrate Kiwix via ArgoCD |
| 1 | [K8s Infrastructure](P1_k8s_infrastructure.md) | Complete | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
| 2 | [Grafana](P2_grafana.complete.md) | Complete | Migrate Grafana (pilot) via ArgoCD |
| 3 | [PostgreSQL](P3_postgresql.complete.md) | Complete | Data migration to k8s PostgreSQL |
| 4 | [Miniflux](P4_miniflux.complete.md) | Complete | Migrate Miniflux via ArgoCD |
| 5 | [devpi](P5_devpi.complete.md) | Complete | Migrate devpi via ArgoCD |
| 5.1 | [QEMU2 Migration](P5.1_qemu2_migration.md) | Pending | Switch minikube from podman to qemu2 driver |
| 6 | [Kiwix](P6_kiwix.md) | Blocked | Migrate Kiwix + Transmission via ArgoCD (blocked on P5.1) |
| 7 | [Forgejo](P7_forgejo.md) | Pending | Migrate Forgejo (highest risk) via ArgoCD |
| 8 | [Woodpecker](P8_woodpecker.md) | Pending | Deploy CI/CD via ArgoCD |
| 9 | [Cleanup](P9_cleanup.md) | Pending | Remove deprecated services |
@ -28,13 +29,13 @@ This plan details a phased migration of blumeops services from direct hosting on
| **Borgmatic** | Backup system |
| **Grafana-alloy** | Metrics/logs collector on host |
| **Plex** | Until Jellyfin replacement |
| **Transmission** | Downloads for kiwix ZIM files |
### Services Moving to K8s
| Service | Complexity | Dependencies |
|---------|------------|--------------|
| Grafana | LOW | Phase 1 |
| Kiwix | LOW | Phase 1 |
| Kiwix | MEDIUM | Phase 5.1 (QEMU2), shared storage |
| Transmission | MEDIUM | Phase 5.1 (QEMU2), shared storage |
| Miniflux | MEDIUM | PostgreSQL |
| devpi | MEDIUM | Registry |
| PostgreSQL | HIGH | Phase 1 |
@ -51,11 +52,12 @@ This plan details a phased migration of blumeops services from direct hosting on
- Config: `~/.config/zot/config.json`
- Data: `~/zot/`
### Minikube Driver: Podman
- Rootless containers for better security
- Lighter than full VM (QEMU)
- Uses existing container ecosystem
- `minikube start --driver=podman --container-runtime=cri-o`
### Minikube Driver: QEMU2 (migrating from Podman)
- **Original choice (Podman)** proved unable to mount external volumes (NFS, SMB, hostPath)
- Podman's rootless containers lack CAP_SYS_ADMIN for filesystem mounts
- **QEMU2** creates an actual VM with full kernel capabilities
- Phase 5.1 handles the migration from podman to qemu2
- `minikube start --driver=qemu2 --container-runtime=containerd`
### PostgreSQL: CloudNativePG Operator
- Production-grade operator


@ -0,0 +1,235 @@
# Phase 5.1: Migrate Minikube from Podman to QEMU2 Driver
**Goal**: Replace the podman driver with qemu2 to enable proper volume mounts (hostPath, NFS, SMB CSI)
**Status**: Planning
**Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
---
## Background
During Phase 6 (Kiwix/Transmission migration), we discovered that the **podman driver has fundamental limitations** that prevent mounting external volumes:
1. **SMB CSI driver fails** with "Operation not permitted" - the rootless container lacks kernel-level mount capabilities
2. **`minikube mount` fails** - 9p mount gets "permission denied" inside the podman VM
3. **hostPath volumes** only work for paths inside the minikube container, not the macOS host
These are documented limitations of the podman driver, which is labeled "experimental" in the [minikube documentation](https://minikube.sigs.k8s.io/docs/drivers/podman/).
### Failed P6 Attempt
Branch `feature/p6-kiwix-transmission` contains the P6 implementation that was blocked by these issues. The manifests are complete and tested, but the pods could not mount the torrents volume.
**What was tried:**
- NFS volume mounts - failed due to missing CAP_SYS_ADMIN in podman container
- SMB CSI driver (v1.17.0) - mount fails with EPERM (same root cause)
- `minikube mount /Volumes/torrents:/Volumes/torrents` - 9p mount permission denied
- hostPath PV pointing to `/Volumes/torrents` - path doesn't exist inside minikube container
- Installing cifs-utils in minikube VM - still fails at kernel level
All of these failures trace back to the same root cause: the podman driver runs minikube in a rootless container that lacks the kernel capabilities required for filesystem mounts.
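Capabilities held inside a rootless container apply only within its user namespace, which is consistent with the mount failures listed above. A quick way to tell whether a node runs in such a namespace is to inspect `/proc/self/uid_map`. The sketch below is illustrative and not part of the original plan: the helper name `is_user_namespaced` is hypothetical, and it assumes the initial user namespace shows the identity mapping `0 0 4294967295`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original plan):
# succeed if the given uid_map contents describe a non-initial user namespace.
# The initial namespace maps the full UID range onto itself: "0 0 4294967295".
is_user_namespaced() {
  local uid_map="$1" inside outside count
  # Only the first mapping line is needed; read also strips the column padding.
  read -r inside outside count <<<"$uid_map"
  [[ "$inside $outside $count" != "0 0 4294967295" ]]
}

# Example usage inside the minikube node (e.g. after `minikube ssh`):
#   if is_user_namespaced "$(cat /proc/self/uid_map)"; then
#     echo "running in a user namespace; kernel-level mounts may be refused"
#   fi
```

If the check reports a user namespace on the podman-based node and not on the QEMU2 VM, that would support the root cause described above.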
### Why QEMU2?
Guides for Apple Silicon Macs recommend QEMU2 as the minikube driver, for example:
> "Qemu emulator is the best option to run a Kubernetes Cluster using minikube on MAC arm64-based systems without any issues."
> — [DevOpsCube](https://devopscube.com/minikube-mac/)
QEMU2 creates an actual VM (not a container), which has:
- Full kernel capabilities for mounts
- Proper 9p/virtio filesystem support
- Native NFS client support
---
## Plan
### 1. Export Current State
Before destroying the cluster, capture the current state:
```bash
# List all ArgoCD apps and their sync status
argocd app list
# Backup any runtime state that matters (should be minimal - everything is in git)
kubectl --context=minikube-indri get all --all-namespaces -o yaml > /tmp/k8s-backup.yaml
```
### 2. Stop and Delete Podman Minikube
```bash
# Stop the cluster
minikube stop
# Delete the cluster and all data
minikube delete
# Check the podman machine state (minikube delete removes the minikube container; the podman machine itself may remain)
podman machine list
```
### 3. Update Ansible Roles for QEMU2
The installation must be orchestrated via Ansible, following the existing patterns for the `podman` and `minikube` roles.
**Changes needed:**
1. **Update `ansible/roles/minikube/` role:**
- Change driver from `podman` to `qemu2`
- Add QEMU as a dependency (via Brewfile or role)
- Optionally add socket_vmnet for full networking support
- Update any driver-specific configuration
2. **Update `Brewfile`:**
```ruby
brew "qemu"
# Optional: brew "socket_vmnet"
```
3. **Update minikube start command in role:**
```bash
minikube start \
  --driver=qemu2 \
  --cpus=4 \
  --memory=8192 \
  --disk-size=50g \
  --container-runtime=containerd \
  --kubernetes-version=stable
```
4. **Remove or update podman role** (may still be useful for container builds)
### 4. Run Ansible to Create QEMU2 Cluster
```bash
# Run the updated minikube role
mise run provision-indri -- --tags minikube
# Verify cluster is running
minikube status
kubectl get nodes
```
### 5. Configure Host Path Access
With QEMU2, the host path must be exposed to the VM in one of two ways:
**Option A: Use `minikube mount` (9p)**
```bash
# Start persistent mount (run in background or via launchd)
minikube mount /Volumes/torrents:/Volumes/torrents &
```
**Option B: Use NFS export from macOS**
```bash
# Add NFS export on macOS
echo "/Volumes/torrents -alldirs -mapall=$(id -u):$(id -g) -network 192.168.0.0 -mask 255.255.0.0" | sudo tee -a /etc/exports
sudo nfsd restart
# In k8s, use NFS volume type directly
```
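For Option B, the NFS volume type mentioned in the last comment could look roughly like the following. This is a sketch under assumptions: `render_nfs_test_pod` is a hypothetical helper, `192.0.2.10` is a placeholder for the macOS host address as seen from the VM, and the pod name `nfs-volume-test` is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original plan): print a throwaway pod
# manifest that mounts an NFS export through the built-in `nfs` volume type.
#   $1 = NFS server address as reachable from the minikube VM (assumption)
#   $2 = exported path (defaults to the export added in Option B)
render_nfs_test_pod() {
  local server="$1" export_path="${2:-/Volumes/torrents}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nfs-volume-test
  namespace: default
spec:
  containers:
    - name: test
      image: busybox
      command: ["sh", "-c", "ls -la /data && sleep 3600"]
      volumeMounts:
        - name: torrents
          mountPath: /data
  volumes:
    - name: torrents
      nfs:
        server: ${server}
        path: ${export_path}
EOF
}

# Example usage (placeholder address):
#   render_nfs_test_pod 192.0.2.10 | kubectl apply -f -
```

Rendering the manifest from a function keeps the server address out of git until the VM's view of the host network is known.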
### 6. Test Volume Mount with Test Pod
Create a test pod that mounts the torrents volume:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  containers:
    - name: test
      image: busybox
      command: ["sh", "-c", "ls -la /data && sleep 3600"]
      volumeMounts:
        - name: torrents
          mountPath: /data
  volumes:
    - name: torrents
      hostPath:
        path: /Volumes/torrents
        type: Directory
```
```
Verify:
```bash
kubectl apply -f volume-test.yaml
kubectl logs volume-test
kubectl exec volume-test -- ls -la /data
```
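Because a `hostPath` volume with `type: Directory` keeps the pod from starting when the path is missing, it can help to wait for readiness before inspecting the mount. A minimal sketch, assuming the pod name from the manifest above; the helper name `check_volume_test` and the 60s default timeout are hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original plan): wait for the test pod
# to become Ready, then list the mounted directory.
#   $1 = pod name (defaults to volume-test)
#   $2 = timeout passed to `kubectl wait` (defaults to 60s)
check_volume_test() {
  local pod="${1:-volume-test}" timeout="${2:-60s}"
  if ! kubectl wait --for=condition=Ready "pod/${pod}" --timeout="${timeout}"; then
    echo "pod ${pod} did not become Ready; a failed mount is one possible cause" >&2
    # The Events section of describe usually names the failing volume.
    kubectl describe pod "${pod}" >&2
    return 1
  fi
  kubectl exec "${pod}" -- ls -la /data
}

# Example usage:
#   check_volume_test volume-test 120s
```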
### 7. Redeploy ArgoCD and Existing Apps
```bash
# Re-add ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Wait for ArgoCD to be ready
kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
# Re-configure ArgoCD (repo credentials, etc.)
# ... follow P1 setup steps ...
# Sync all apps
argocd app sync apps
```
### 8. Verify All Services
```bash
# Run health check
mise run indri-services-check
# Verify each k8s service
argocd app list
kubectl get pods --all-namespaces
```
### 9. Clean Up Test Pod
```bash
kubectl delete pod volume-test
```
---
## Verification Checklist
- [ ] Podman minikube deleted
- [ ] QEMU2 minikube running
- [ ] `minikube mount` or NFS working
- [ ] Test pod can read `/Volumes/torrents`
- [ ] ArgoCD redeployed and synced
- [ ] All existing apps healthy (grafana, miniflux, devpi, etc.)
- [ ] PostgreSQL cluster healthy
- [ ] Test pod deleted
- [ ] `mise run indri-services-check` passes (except intentionally offline services)
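Parts of this checklist can be scripted. The following is a rough sketch rather than an existing project script: `run_check` and the `failures` counter are hypothetical names, and the example commands are the ones already used in the plan steps.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original plan): run one check command,
# print PASS or FAIL with its description, and count failures.
failures=0
run_check() {
  local description="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: ${description}"
  else
    echo "FAIL: ${description}"
    failures=$((failures + 1))
  fi
}

# Example usage, reusing commands from the plan steps:
#   run_check "QEMU2 minikube running" minikube status
#   run_check "Test pod can read /Volumes/torrents" kubectl exec volume-test -- ls /data
#   run_check "Service health check" mise run indri-services-check
#   [ "$failures" -eq 0 ] || echo "${failures} check(s) failed"
```

Items that need judgment, such as which services are intentionally offline, still have to be reviewed by hand.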
---
## Rollback Plan
If QEMU2 doesn't work:
1. Delete QEMU2 cluster: `minikube delete`
2. Recreate podman cluster following P0/P1 steps
3. Redeploy apps from git
All state is in git, so cluster recreation is straightforward.
---
## Notes
- The QEMU2 VM will use more resources than podman (actual VM vs container)
- First boot may be slower due to VM initialization
- socket_vmnet provides better networking but requires sudo setup
- Consider creating a LaunchAgent for `minikube mount` if using that approach


@ -1,10 +1,34 @@
# Phase 6: Kiwix and Transmission Migration
**Goal**: Migrate kiwix-serve and transmission torrent daemon to k8s with SMB storage on sifaka
**Goal**: Migrate kiwix-serve and transmission torrent daemon to k8s with shared storage
**Status**: Planning
**Status**: BLOCKED - waiting for [Phase 5.1](P5.1_qemu2_migration.md) (QEMU2 migration)
**Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
**Prerequisites**: [Phase 5.1](P5.1_qemu2_migration.md) complete (minikube on QEMU2 driver)
---
## Blocker: Podman Driver Volume Mount Limitations
**First attempt branch:** `feature/p6-kiwix-transmission`
The initial implementation was completed and tested, but **all volume mount approaches failed** due to the podman driver's rootless container limitations:
| Approach | Result |
|----------|--------|
| NFS volume | Failed - CAP_SYS_ADMIN required for NFS mounts |
| SMB CSI driver | Failed - `mount.cifs` returns EPERM inside rootless container |
| `minikube mount` (9p) | Failed - permission denied mounting into podman VM |
| hostPath | Failed - path doesn't exist inside minikube container |
**Root cause:** The podman driver runs minikube in a rootless container that lacks kernel capabilities for filesystem mounts. This is a [documented limitation](https://minikube.sigs.k8s.io/docs/drivers/podman/) of the experimental podman driver.
**Solution:** Phase 5.1 migrates minikube from podman to QEMU2 driver, which creates an actual VM with full kernel capabilities.
**What's preserved:**
- All k8s manifests in `feature/p6-kiwix-transmission` are complete and tested
- Prerequisites (SMB share, k8s-smb user, data rsync) are done
- Can retry P6 immediately after P5.1 completes
---
@ -38,14 +62,14 @@ New architecture in k8s:
## Architecture Decisions
### Storage: SMB on Sifaka
### Storage: SMB on Sifaka (or NFS after QEMU2 migration)
**Why SMB instead of NFS:**
- Minikube with podman driver lacks CAP_SYS_ADMIN required for NFS mounts
- SMB already works reliably with Synology (used for other shares)
- SMB CSI driver ([csi-driver-smb](https://github.com/kubernetes-csi/csi-driver-smb)) is well-maintained
- Supports ReadWriteMany access mode for concurrent pod access
**Note:** The original plan chose SMB over NFS, but both failed with podman driver. After QEMU2 migration, either should work. SMB is still preferred for:
- Native Synology SMB support with good macOS compatibility
- ReadWriteMany access mode for concurrent pod access
- SMB CSI driver already mirrored to forge
**Alternative after QEMU2:** `minikube mount` (9p) or a direct NFS volume type may be simpler than SMB.
**Storage path:** `/volume1/torrents/` on sifaka (SMB share name: `torrents`)
- General-purpose torrent download directory