Migrate devpi from minikube to indri (launchd) (#341)

## Summary

Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). This PR moves it to indri as a native LaunchAgent.

## What changed

- **New ansible role** `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run using the root password from 1Password, and runs the server as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. The venv bootstraps from upstream PyPI (so devpi can install itself on a fresh box).
- **Caddy**: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`.
- **Playbook**: `indri.yml` gains `pre_tasks` that fetch the root password, plus the new role.
- **service-versions.yaml**: devpi flipped from `type: argocd` to `type: ansible`.
- **ArgoCD**: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted.
- **Docs**: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list.

## Already deployed (live on indri)

- Service running: `launchctl list mcquack.eblume.devpi` → PID 53888
- `curl https://pypi.ops.eblu.me/+api` returns 200 
- `mise run docs-mikado` works again 
- 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/`
- Minikube namespace and PVC fully reclaimed

## Test plan

- [ ] `mise run services-check` (after merge)
- [ ] CI workflows that use devpi succeed
- [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines)

## Context

This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent — it was breaking the toolchain — and (b) it shakes out the migration recipe before we commit to a multi-leaf chain.

Reviewed-on: #341
Erich Blume 2026-04-29 13:38:36 -07:00
commit 14ca0160ba
24 changed files with 260 additions and 289 deletions

@@ -0,0 +1,74 @@
---
title: Devpi on Indri
modified: 2026-04-29
last-reviewed: 2026-04-29
tags:
- how-to
- operations
---
# Devpi on Indri
How devpi (the PyPI caching mirror at `pypi.ops.eblu.me`) is deployed on indri as a launchd-managed native service. Replaces the prior minikube StatefulSet.
## Why native, not Kubernetes
Devpi has no runtime dependencies beyond a Python interpreter, a writable directory, and outbound HTTPS to upstream PyPI. Running it on indri natively removes a layer of operational complexity, frees minikube resources, and decouples this critical-path tooling (used by every Python build, including `mise run docs-mikado` itself) from cluster health.
## Layout
| Concern | Path / detail |
|---|---|
| Service binary | `/Users/erichblume/devpi/venv/bin/devpi-server` |
| Server-dir (data) | `/Users/erichblume/devpi/server-dir/` |
| Logs | `/Users/erichblume/Library/Logs/mcquack.devpi.{out,err}.log` |
| LaunchAgent label | `mcquack.eblume.devpi` |
| LaunchAgent plist | `~/Library/LaunchAgents/mcquack.eblume.devpi.plist` |
| Listen address | `127.0.0.1:3141` (loopback only) |
| Public URL | `https://pypi.ops.eblu.me` (via Caddy reverse proxy) |
| Root password secret | 1Password item `devpi`, field `root password` |
The venv is built fresh by ansible from a pinned `devpi-server` and `devpi-web` version; bumping versions is a config change in `ansible/roles/devpi/defaults/main.yml`.
## Deploy
```fish
mise run provision-indri -- --tags devpi
```
Ansible will:
1. Fetch the root password from 1Password (in playbook `pre_tasks`)
2. Create the venv at `~/devpi/venv` if absent and install/upgrade `devpi-server` + `devpi-web` to the pinned versions
3. Initialize the server-dir (only on first run, when `.serverversion` is missing)
4. Render and load the LaunchAgent plist
5. Restart the service if the plist or config changed
Caddy already proxies `pypi.ops.eblu.me` → `127.0.0.1:3141`; nothing else routes traffic.
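The first-run guard in step 3 can be sketched roughly as follows. This is a minimal illustration, not the role's actual task: the function names `needs_init` and `init_server_dir` are hypothetical, the `DEVPI_ROOT_PASSWORD` variable is made up for this sketch, and it assumes the venv ships `devpi-init` with the `--serverdir` and `--root-passwd` options.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "initialize only on first run" guard.

# Succeeds (exit 0) when the server-dir has not been initialized yet,
# i.e. the .serverversion marker file is missing.
needs_init() {
    local server_dir="$1"
    [ ! -f "${server_dir}/.serverversion" ]
}

# Runs devpi-init once; later calls only print a message.
# Assumption: devpi-init lives next to devpi-server in the venv, and the
# root password arrives in DEVPI_ROOT_PASSWORD (a name invented here).
init_server_dir() {
    local venv="$1" server_dir="$2"
    if needs_init "${server_dir}"; then
        "${venv}/bin/devpi-init" \
            --serverdir "${server_dir}" \
            --root-passwd "${DEVPI_ROOT_PASSWORD:?root password not set}"
    else
        echo "server-dir already initialized, skipping"
    fi
}
```

Keying the guard on the marker file rather than on the directory's existence means an empty or partially created server-dir still gets initialized, while an existing cache is never touched.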
## Verify
```fish
ssh indri 'launchctl list mcquack.eblume.devpi'
curl -fsS https://pypi.ops.eblu.me/+api | jq
uv pip install --index-url https://pypi.ops.eblu.me/root/pypi/+simple/ requests
```
## Logs
```fish
ssh indri 'tail -f ~/Library/Logs/mcquack.devpi.err.log'
```
## Bumping devpi versions
Edit `devpi_server_version` / `devpi_web_version` in `ansible/roles/devpi/defaults/main.yml`, then re-run the playbook with `--tags devpi`. The role rebuilds the venv in-place; the server-dir survives.
## Backup
The server-dir is **not** in `borgmatic_source_directories` and is not backed up. The PyPI cache (`+files/`) is re-fetchable from upstream on first request; the local `eblume/dev` index can be republished from source. If retention becomes important, add `/Users/erichblume/devpi/server-dir/` to the borgmatic source list.
## Related
- [[restart-indri]] — devpi is one of the LaunchAgents to stop on graceful shutdown
- [[connect-to-postgres]] — pattern for indri-native services (different stack, similar shape)

@@ -235,25 +235,7 @@ mise run services-check
## Post-Rebuild: Cold Cache Failures
### Devpi (PyPI Cache)
After a rebuild, devpi's package cache is empty. The first Dagger-based container build will trigger a flood of concurrent package downloads. Devpi uses lazy caching — it serves package metadata (simple index) immediately from upstream PyPI but fetches wheel files on demand. Under heavy concurrent load with a cold cache, the upstream fetch can race with the client request, causing devpi to return `no such file` (HTTP 404) for packages it knows about but hasn't finished downloading yet.
**Why devpi, not PyPI?** The repo's `uv.lock` was generated with devpi as the index, so every package source URL points at `pypi.ops.eblu.me`. Dagger's Python SDK runtime does a locked install (`uv sync`), not fresh resolution — it fetches from whatever URLs are in the lockfile. This is intentional (supply chain control), but means all builds — local and CI — depend on devpi being available and warm.
**Symptoms:** Forgejo Actions Dagger builds fail during module initialization with errors like:
```
Failed to download `googleapis-common-protos==1.74.0`
HTTP status client error (404 Not Found) for url (https://pypi.ops.eblu.me/root/pypi/+f/...)
```
**Fix:** Re-run the failed build. The first attempt warms the cache; subsequent builds succeed. Alternatively, warm the cache manually before triggering CI builds:
```bash
# From any machine that can reach pypi.ops.eblu.me, install the Dagger SDK
# to pre-populate the most common packages:
pip install --dry-run --index-url https://pypi.ops.eblu.me/root/pypi/+simple/ dagger-io
```
Devpi runs natively on indri (see [[devpi-on-indri]]) and is unaffected by minikube rebuilds, so the historical "devpi cold cache after rebuild" failure mode no longer applies. If devpi itself goes cold (fresh server-dir), the same lazy-cache race can still cause `404` on the first Dagger build under concurrent load — re-run the build to warm the cache, or pre-warm with `uv pip install --dry-run --index-url https://pypi.ops.eblu.me/root/pypi/+simple/ dagger-io`.
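Re-running until the cache is warm can be scripted. The helper below is a generic retry sketch and not part of the repo; the name `retry` and its arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run a command until it succeeds or attempts run out.
# usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    until "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "giving up after ${attempt} attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
}
```

For example, `retry 3 5 <build command>` would run the build up to three times with a five-second pause between attempts, which matches the "first attempt warms the cache" behavior described above.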
## Related

@@ -41,6 +41,7 @@ Native services managed by launchd will stop automatically during macOS shutdown
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.caddy.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.zot.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.devpi.plist' # see [[devpi-on-indri]]
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.jellyfin.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.alloy.plist'
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.borgmatic.plist'