## Summary

Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). It moves to indri as a native LaunchAgent.

## What changed

- **New ansible role** `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run via 1Password root password, runs as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. Bootstraps from upstream PyPI (so devpi can install itself on a fresh box).
- **Caddy**: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`.
- **Playbook**: `indri.yml` gains pre_tasks for the root password and the new role.
- **service-versions.yaml**: devpi flipped from `type: argocd` to `type: ansible`.
- **ArgoCD**: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted.
- **Docs**: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list.

## Already deployed (live on indri)

- Service running: `launchctl list mcquack.eblume.devpi` → PID 53888
- `curl https://pypi.ops.eblu.me/+api` returns 200 ✅
- `mise run docs-mikado` works again ✅
- 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/`
- Minikube namespace and PVC fully reclaimed

## Test plan

- [ ] `mise run services-check` (after merge)
- [ ] CI workflows that use devpi succeed
- [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines)

## Context

This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent (it was breaking the toolchain) and (b) it shakes out the migration recipe before we commit to a multi-leaf chain.

Reviewed-on: #341
| title | modified | last-reviewed | tags |
|---|---|---|---|
| Restart Indri | 2026-03-14 | 2026-03-14 | |
Restart Indri
How to safely shut down and restart indri, the primary BlumeOps server.
Prerequisites
- SSH access to indri
- Tailscale connected
Shutdown Procedure
1. Stop Kubernetes Gracefully
Minikube runs on the Docker driver, so stopping it cleanly ensures pods terminate gracefully and persistent volumes are properly unmounted.
ssh indri 'minikube stop'
This may take a minute as pods receive termination signals. You can verify it stopped:
ssh indri 'minikube status'
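When scripting this check, the exit code of minikube status can be used instead of reading the output. The sketch below assumes the bit encoding described in the minikube status help text (1 for host not OK, 2 for cluster not OK, 4 for Kubernetes not OK); the function name describe_minikube_status is hypothetical, and the encoding should be confirmed against the installed minikube version.

```shell
# Hypothetical helper: translate a `minikube status` exit code into words.
# Assumes the documented bit encoding: 1 = host NOK, 2 = cluster NOK, 4 = Kubernetes NOK.
describe_minikube_status() {
  out=""
  [ $(( $1 & 1 )) -ne 0 ] && out="$out host-not-ok"
  [ $(( $1 & 2 )) -ne 0 ] && out="$out cluster-not-ok"
  [ $(( $1 & 4 )) -ne 0 ] && out="$out kubernetes-not-ok"
  out=${out# }                      # drop the leading separator, if any
  printf '%s\n' "${out:-all-ok}"
}

# Usage against the live host (ssh passes through the remote exit code):
#   ssh indri 'minikube status' >/dev/null; describe_minikube_status $?
```

After a clean minikube stop, all three components would be expected to report as not OK.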
2. Stop Native Services (Optional)
Native services managed by launchd will stop automatically during macOS shutdown. However, if you want to stop them explicitly first:
# LaunchAgent services
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.caddy.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.devpi.plist' # see [[devpi-on-indri]]
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.jellyfin.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.alloy.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.borgmatic.plist'
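The same list can be generated from the labels rather than typed out by hand. This is a minimal dry-run sketch: the function name launchagent_unload_commands is hypothetical, the labels are copied from the commands above, and nothing is executed until the output is deliberately run.

```shell
# Hypothetical dry-run helper: prints one unload command per LaunchAgent label.
# Labels are copied from the list above; nothing is executed here.
launchagent_unload_commands() {
  for label in \
    mcquack.eblume.forgejo \
    mcquack.eblume.caddy \
    mcquack.eblume.zot \
    mcquack.eblume.devpi \
    mcquack.jellyfin \
    mcquack.eblume.alloy \
    mcquack.eblume.borgmatic
  do
    printf "ssh indri 'launchctl unload ~/Library/LaunchAgents/%s.plist'\n" "$label"
  done
}

# Review the commands first:
#   launchagent_unload_commands
# Then, if they look right, run them:
#   launchagent_unload_commands | sh
```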
3. Quit GUI Applications
These apps should be quit cleanly before reboot. Amphetamine and AutoMounter do not autostart and must be relaunched afterwards:
- Docker Desktop - Quit from menubar or:
  ssh indri 'osascript -e "quit app \"Docker\""'
- Amphetamine - Quit from menubar (prevents sleep; will need restart)
- AutoMounter - Quit from menubar (mounts sifaka SMB shares)
4. Reboot
ssh indri 'sudo shutdown -r now'
Or if you're at the console, use the Apple menu.
Startup Procedure
After indri boots, most services recover automatically. Only a few things need manual attention.
What autostarts: Docker Desktop and all mcquack LaunchAgent services (Forgejo, Caddy, Zot, Devpi, Jellyfin, Alloy, Borgmatic, metrics collectors).
What needs manual action: Amphetamine, AutoMounter, and minikube (including its Tailscale serve port).
Warning: Do NOT run minikube delete. It destroys all PersistentVolumes and etcd state, and requires a full DR rebuild. Use minikube stop / minikube start instead. If minikube is stuck, see Troubleshooting: CNI Conflict After Unclean Shutdown. For a full cluster rebuild, see rebuild-minikube-cluster.
0. Dismiss macOS Permission Dialogs
After a cold boot, the first inbound Tailscale SSH connection to indri triggers a macOS GUI permission dialog from tailscaled. This blocks the SSH session (and anything downstream like ansible) until dismissed at the console. You must be logged in to indri (via Screen Sharing or physically) to approve it before running any remote commands.
1. Log In and Start GUI Apps
Log in to indri (via Screen Sharing or physically) and launch:
| App | Purpose | Launch Method |
|---|---|---|
| Amphetamine | Prevents sleep | Spotlight or App Store apps |
| AutoMounter | Mounts sifaka SMB shares to /Volumes/ | Spotlight or App Store apps |
Docker Desktop autostarts on login. Wait for it to finish starting (whale icon in menubar stops animating) before proceeding.
2. Verify Sifaka Mounts
AutoMounter should automatically mount the sifaka shares. Verify:
ssh indri 'ls /Volumes/'
You should see: allisonflix, backups, music, photos, torrents (or similar).
If mounts are missing, open AutoMounter and trigger a reconnect.
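The mount check can also be scripted so that missing shares are named explicitly. The sketch below is one way to do it; the function name missing_mounts is hypothetical, and the share names passed to it are the ones listed above, which may differ on your system.

```shell
# Hypothetical helper: reads a directory listing on stdin and prints every
# expected share (given as arguments) that does not appear as an exact line.
missing_mounts() {
  listing=$(cat)
  for share in "$@"; do
    printf '%s\n' "$listing" | grep -qxF -- "$share" || printf '%s\n' "$share"
  done
}

# Usage against the live host; no output means every share is mounted:
#   ssh indri 'ls /Volumes/' | missing_mounts allisonflix backups music photos torrents
```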
3. Fix Minikube Remote Access
Minikube uses the Docker driver, which assigns a random API server port on each start. After a reboot, the Tailscale serve proxy (k8s.tail8d86e.ts.net) will still point to the old port, breaking remote kubectl access.
Run the minikube ansible role to detect the new port and update Tailscale serve:
mise run provision-indri -- --tags minikube
Note: Do NOT run the full mise run provision-indri without tags during startup. The forgejo_actions_secrets role will time out because the Forgejo API routes through Caddy → k8s, which isn't up yet. Use --tags minikube (or --tags minikube,minikube_metrics) to target just the minikube role.
This will:
- Start minikube if it hasn't started yet
- Detect the current API server port
- Update tailscale serve to forward to the correct port
You can verify remote access works:
kubectl --context=minikube-indri get nodes
4. Run Health Check
Once everything is up, verify all services:
mise run services-check
All checks should pass. If any fail, see troubleshooting.
Troubleshooting: CNI Conflict After Unclean Shutdown
After a power loss or unclean reboot, minikube may come up with broken pod networking. The symptom is that new pods cannot reach CoreDNS — services crash-loop with DNS errors (EAI_AGAIN, connection timed out; no servers could be reached) or fail liveness probes because their event loops hang on blocked network calls.
Existing pods that were restarted (not recreated) may appear healthy because the kubelet reuses their cached network namespaces.
Cause
During minikube recovery from a bad state, the CRI-O / Docker networking bootstrap can regenerate a default CNI config file (1-k8s.conflist) that conflicts with kindnet's config (10-kindnet.conflist). Since 1- sorts before 10-, the stale bridge+firewall config takes precedence, and new pods get attached to a different network topology than existing pods.
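The ordering claim is easy to confirm locally. This small sketch sorts the two filenames in plain byte order, on the assumption that this mirrors how the container runtime orders the files in /etc/cni/net.d/; the function name first_cni_config is hypothetical.

```shell
# Hypothetical helper: reads filenames on stdin and prints the one that
# sorts first in byte order (the hyphen sorts before the digit 0).
first_cni_config() {
  LC_ALL=C sort | head -n 1
}

printf '%s\n' 10-kindnet.conflist 1-k8s.conflist | first_cni_config
# prints: 1-k8s.conflist
```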
Diagnosis
1. Check if new pods can resolve DNS:
kubectl --context=minikube-indri run dns-test --image=alpine:3.21 --restart=Never \
--command -- sh -c 'nslookup kubernetes.default.svc.cluster.local'
sleep 10
kubectl --context=minikube-indri logs dns-test
kubectl --context=minikube-indri delete pod dns-test
If this shows connection timed out; no servers could be reached, pod networking is broken.
2. Check for conflicting CNI configs:
ssh indri 'minikube ssh "ls -la /etc/cni/net.d/"'
You should see only 10-kindnet.conflist (plus 200-loopback.conf and disabled .mk_disabled files). If 1-k8s.conflist or any other active config exists alongside 10-kindnet.conflist, that's the conflict.
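The same check can be automated by filtering the directory listing for active configs other than the two expected ones. This is a sketch under the assumptions stated above (10-kindnet.conflist and 200-loopback.conf are the only legitimate active files); the function name unexpected_cni_configs is hypothetical, and it expects bare filenames from ls rather than the long ls -la format.

```shell
# Hypothetical helper: reads bare filenames on stdin and prints any active
# CNI config that is not one of the two expected files.
# Files ending in .mk_disabled are skipped because they do not end in
# .conf or .conflist. Exit status is 0 when a conflict is found, 1 otherwise.
unexpected_cni_configs() {
  tr -d '\r' \
    | grep -E '\.(conf|conflist)$' \
    | grep -vxE '10-kindnet\.conflist|200-loopback\.conf'
}

# Usage against the live host:
#   ssh indri 'minikube ssh "ls /etc/cni/net.d/"' | unexpected_cni_configs
```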
3. Confirm the conflict by inspecting the stale config:
ssh indri 'minikube ssh "cat /etc/cni/net.d/1-k8s.conflist"'
If it uses a bridge plugin with a firewall plugin (instead of kindnet's ptp plugin), it's the culprit.
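To read the plugin types without scanning the JSON by eye, the type values can be pulled out with grep and sed. This is a rough sketch rather than a JSON parser: the function name cni_plugin_types is hypothetical, and it prints every "type" value it finds, including nested ones such as the ipam type.

```shell
# Hypothetical helper: reads a CNI conflist on stdin and prints each
# "type" value on its own line (text matching only, not a JSON parser).
cni_plugin_types() {
  grep -oE '"type"[[:space:]]*:[[:space:]]*"[^"]+"' \
    | sed -E 's/.*"([^"]+)"$/\1/'
}

# Usage against the live host:
#   ssh indri 'minikube ssh "cat /etc/cni/net.d/1-k8s.conflist"' | cni_plugin_types
```

Seeing bridge and firewall in the output, and no ptp, matches the stale config described above.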
Fix
1. Remove the stale CNI config:
ssh indri 'minikube ssh "sudo rm /etc/cni/net.d/1-k8s.conflist"'
2. Delete all pods that were created while the bad config was active. The simplest approach is to restart all deployments:
kubectl --context=minikube-indri get deployments -A --no-headers | \
awk '{print "-n " $1 " " $2}' | \
xargs -L1 kubectl --context=minikube-indri rollout restart deployment
StatefulSets managed by operators (CNPG, Tailscale) generally survive because the kubelet restarts their containers in-place rather than creating new pods.
3. Verify with the DNS test above, then run mise run services-check.
Related
- indri - Server specifications
- troubleshooting - Diagnose issues
- cluster - Kubernetes details
- sifaka - NAS storage