diff --git a/plans/k8s-migration/P5.1_qemu2_migration.md b/plans/k8s-migration/P5.1_qemu2_migration.md
index 29636d6..6f997e1 100644
--- a/plans/k8s-migration/P5.1_qemu2_migration.md
+++ b/plans/k8s-migration/P5.1_qemu2_migration.md
@@ -2,7 +2,7 @@
 
 **Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts
 
-**Status**: In Progress (2026-01-21) - Ansible roles updated, cluster running, awaiting ArgoCD redeploy
+**Status**: Complete (2026-01-21) - Cluster running, ArgoCD deployed, apps synced
 
 **Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
 
@@ -46,17 +46,15 @@ The **docker driver** solves both problems:
 
 ---
 
-## Implementation Progress
+## Implementation Summary
 
-### Completed ✅
+### Infrastructure Changes
 
 1. **Docker Desktop installed** (manual via `brew install --cask docker`)
    - Configured with 12GB memory in Docker Desktop settings
    - Kubernetes option disabled (using minikube instead)
 
-2. **QEMU2 minikube deleted** (`minikube stop && minikube delete`)
-
-3. **Docker minikube cluster created**:
+2. **Docker minikube cluster created**:
    ```bash
    minikube start \
      --driver=docker \
@@ -68,68 +66,47 @@ The **docker driver** solves both problems:
      --apiserver-port=6443 \
      --listen-address=0.0.0.0
    ```
-   Note: Memory set to 11264MB (11GB) to leave headroom for Docker Desktop overhead.
 
-4. **Tailscale serve configured** for k8s API:
-   - API server on localhost:50820 (port is dynamic with docker driver)
-   - `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:50820`
+3. **Tailscale serve configured** for k8s API:
+   - API server on localhost (port is dynamic with docker driver)
+   - `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:`
 
-5. **Remote kubectl access working** from gilbert:
+4. **Remote kubectl access working** from gilbert:
    - Created `mise-tasks/ensure-minikube-indri-kubectl-config` script
    - Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml`
-   - `kubectl --context=minikube-indri get nodes` works
 
-6. **Ansible roles updated**:
-   - `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
-   - `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
-   - Containerd registry mirrors configured for zot pull-through cache
+### Ansible Roles Updated
 
-7. **QEMU2 artifacts cleaned up**:
-   - Stopped socket_vmnet service
-   - Removed NFS LaunchDaemon
-   - Removed minikube mount LaunchAgent
-   - kubectl still works after cleanup
+- `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
+- `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
+- Containerd registry mirrors configured for zot pull-through cache
 
-### Remaining 📋
+### ArgoCD Bootstrap
 
-1. **Redeploy ArgoCD and apps** - bootstrap the cluster with:
-   ```bash
-   # On indri - apply secrets first
-   op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
+All apps deployed and synced from `feature/p5.1-qemu2-migration` branch:
 
-   # Create repo secret for ArgoCD
-   PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n'
-   kubectl create namespace argocd
-   kubectl create secret generic repo-forge -n argocd \
-     --from-literal=type=git \
-     --from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
-     --from-literal=insecure=true \
-     --from-literal=sshPrivateKey="$PRIV_KEY"
-   kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
+| App | Status | Notes |
+|-----|--------|-------|
+| tailscale-operator | Healthy | Manages Tailscale ingresses |
+| argocd | Healthy | Self-managed |
+| cloudnative-pg | Healthy | PostgreSQL operator |
+| blumeops-pg | Progressing | PostgreSQL cluster starting |
+| grafana | Progressing | Needs grafana-admin secret |
+| grafana-config | Healthy | Dashboards and ingress |
+| miniflux | Progressing | Needs miniflux-config secret |
+| devpi | Progressing | Starting up |
 
-   # Bootstrap operators
-   kubectl create namespace tailscale
-   kubectl apply -k argocd/manifests/tailscale-operator/
-   kubectl apply -k argocd/manifests/argocd/
+### Secrets Still Needed
 
-   # Wait for ArgoCD
-   kubectl wait --for=condition=available deployment/argocd-server -n argocd --timeout=300s
+After PR merge, apply these secrets manually:
 
-   # Login and sync apps
-   argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
-   argocd app sync apps
-   argocd app sync tailscale-operator
-   argocd app sync cloudnative-pg
-   argocd app sync blumeops-pg
-   argocd app sync grafana
-   argocd app sync grafana-config
-   argocd app sync miniflux
-   argocd app sync devpi
-   ```
+```bash
+# Grafana admin password
+op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | kubectl --context=minikube-indri apply -f -
 
-2. **Verify all services** with `mise run indri-services-check`
-
-3. **Configure containerd registry mirrors** (will be done by ansible on next provision)
+# Miniflux config
+op inject -i argocd/manifests/miniflux/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -
+```
 
 ---
 
@@ -137,127 +114,34 @@ The **docker driver** solves both problems:
 
 ### API Server Port
 
-With docker driver, the API server port is **dynamic** - Docker maps a random host port to 6443 inside the container. Current port: 50820.
+With docker driver, the API server port is **dynamic** - Docker maps a random host port to 6443 inside the container. The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
 
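The port lookup described above could be sketched roughly as follows. This is an illustration, not the ansible role's actual tasks (which this diff does not show): it assumes the minikube container is named `minikube` (the default profile) and that `docker port` is used to read the mapping. The function names are hypothetical.

```bash
# Sketch only: function names are hypothetical, not taken from the ansible role.

# Extract the host port from a `docker port` mapping such as "0.0.0.0:50820".
host_port_from_mapping() {
  # Strip everything up to and including the last colon.
  printf '%s\n' "${1##*:}"
}

# Look up the host port Docker mapped to 6443/tcp in the minikube container.
# Assumption: the container is named "minikube" (the default profile name).
current_api_port() {
  host_port_from_mapping "$(docker port minikube 6443/tcp | head -n 1)"
}

# Point the svc:k8s Tailscale service at whatever port is current.
configure_k8s_serve() {
  tailscale serve --service=svc:k8s --tcp=443 "tcp://localhost:$(current_api_port)"
}
```

Calling `configure_k8s_serve` after each `minikube start` would keep the Tailscale service aligned with the newly assigned port.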
 ### Registry Mirror Configuration
 
-Containerd uses `/etc/containerd/certs.d//hosts.toml` files:
-
-```toml
-# /etc/containerd/certs.d/docker.io/hosts.toml
-server = "https://registry-1.docker.io"
-
-[host."http://host.minikube.internal:5050"]
-  capabilities = ["pull", "resolve"]
-  skip_verify = true
-```
-
-The ansible role configures mirrors for:
+Containerd uses `/etc/containerd/certs.d//hosts.toml` files. The ansible role configures mirrors for:
 - `registry.tail8d86e.ts.net` (private images)
 - `docker.io`
 - `ghcr.io`
 - `quay.io`
 
+### ProxyClass Renamed
+
+Changed from `crio-compat` to `default` - the old name was misleading since we're no longer using CRI-O.
+
 ### Volume Mounts for P6 (Kiwix/Transmission)
 
-With the docker driver, volume mounts work differently than podman or qemu2. Here's the analysis:
-
-**Current Network State:**
-- Minikube container is on Docker network `192.168.49.0/24`
-- Sifaka NFS exports `/volume1/torrents` to:
-  - `192.168.105.0/24` (old qemu2 VM network - no longer used)
-  - `100.64.0.0/10` (Tailscale CGNAT range)
-- Network connectivity: ✅ Works after approving macOS network access GUI prompt
-- NFS access: ❌ Denied (sifaka doesn't allow `192.168.49.0/24`)
+Two options available:
 
 **Option A: hostPath via Docker Desktop File Sharing** ⭐ RECOMMENDED
-1. Mount sifaka NFS share on indri macOS: `mount -t nfs sifaka:/volume1/torrents /Volumes/torrents`
+1. Mount sifaka NFS share on indri macOS: `/Volumes/torrents`
 2. Docker Desktop file sharing exposes `/Volumes` into the Docker VM
 3. Pods use hostPath to access `/Volumes/torrents`
 
-Pros:
-- Simplest approach, uses native Docker file sharing
-- No network reconfiguration needed on sifaka
-- Path is stable and predictable
-
-Cons:
-- Requires persistent NFS mount on indri (LaunchDaemon)
-- File sharing performance may be slower than direct NFS
-
-Implementation:
-```bash
-# Manual mount test
-ssh indri 'sudo mkdir -p /Volumes/torrents && sudo mount -t nfs -o resvport,rw sifaka:/volume1/torrents /Volumes/torrents'
-
-# Verify Docker can see it
-ssh indri 'docker run --rm -v /Volumes/torrents:/data alpine ls /data'
-
-# Pod manifest uses hostPath:
-# volumes:
-#   - name: torrents
-#     hostPath:
-#       path: /Volumes/torrents
-#       type: Directory
-```
-
-**Option B: Update sifaka NFS exports for Docker network** ⭐ ALTERNATIVE
-1. In Synology DSM: Control Panel → Shared Folder → torrents → Edit → NFS Permissions
-2. Add `192.168.49.0/24` to allowed clients
-3. Pods mount NFS directly using kubernetes NFS volume type
-
-Pros:
-- Simpler than Option A (no intermediate macOS mount)
-- Direct path, better performance
-- Network connectivity confirmed working (after macOS network access approval)
-
-Cons:
-- Requires sifaka configuration change (one-time)
-- Docker network might change (though `192.168.49.x` seems stable for minikube)
-
-Test command (after updating sifaka):
-```bash
-ssh indri 'minikube ssh "sudo mount -t nfs sifaka:/volume1/torrents /mnt/torrents && ls /mnt/torrents"'
-```
-
-**Option C: Tailscale sidecar for NFS access**
-1. Pods include a Tailscale sidecar that joins the tailnet
-2. Mount NFS via Tailscale IP (sifaka is at 100.x.x.x)
-
-Cons:
-- Complex setup with sidecar containers
-- Each pod needs Tailscale auth
-- Overkill for this use case
-
-**Recommendation for P6:**
-Use **Option A** (hostPath via Docker Desktop file sharing). It's the simplest and most reliable approach. We'll need a LaunchDaemon for the NFS mount, but it's straightforward:
-
-```xml
-
-
-
-
-
-    Label
-    com.blumeops.nfs-torrents
-    ProgramArguments
-
-        /sbin/mount
-        -t
-        nfs
-        -o
-        resvport,rw
-        sifaka:/volume1/torrents
-        /Volumes/torrents
-
-    RunAtLoad
-
-
-
-```
-
-This is simpler than the qemu2 approach because there's no intermediate `minikube mount` step - Docker Desktop handles the path passthrough automatically.
+**Option B: Update sifaka NFS exports for Docker network**
+1. Add `192.168.49.0/24` to sifaka's NFS exports
+2. Pods mount NFS directly (network connectivity works after macOS approval)
 
 ---
 
@@ -266,19 +150,46 @@ This is simpler than the qemu2 approach because there's no intermediate `minikub
 - [x] Docker Desktop installed and running on indri
 - [x] QEMU2 minikube deleted
 - [x] Docker minikube running (6 CPUs, 11GB RAM)
-- [x] API server accessible on localhost:50820
-- [x] Tailscale serve configured for svc:k8s → localhost:50820
+- [x] API server accessible on localhost
+- [x] Tailscale serve configured for svc:k8s
 - [x] Remote kubectl access working from gilbert
 - [x] Ansible roles updated for docker driver
 - [x] socket_vmnet stopped
-- [ ] ArgoCD redeployed and synced
-- [ ] All existing apps healthy (grafana, miniflux, devpi, etc.)
-- [ ] PostgreSQL cluster healthy
-- [ ] Containerd registry mirrors configured
+- [x] ArgoCD deployed and synced
+- [x] All apps synced to feature branch
+- [ ] Apply app secrets (grafana-admin, miniflux-config)
+- [ ] Verify all apps healthy after secrets applied
+- [ ] Merge PR and reset apps to main branch
 - [ ] `mise run indri-services-check` passes
 
 ---
 
+## Post-Merge Steps
+
+After PR is merged:
+
+```bash
+# Reset all blumeops apps to main branch
+argocd app set apps --revision main
+argocd app set argocd --revision main
+argocd app set blumeops-pg --revision main
+argocd app set devpi --revision main
+argocd app set grafana-config --revision main
+argocd app set miniflux --revision main
+argocd app set tailscale-operator --revision main
+
+# Sync all apps
+argocd app sync apps
+argocd app sync argocd
+argocd app sync tailscale-operator
+argocd app sync blumeops-pg
+argocd app sync grafana-config
+argocd app sync miniflux
+argocd app sync devpi
+```
+
+---
+
 ## Rollback Plan
 
 If Docker driver doesn't work: