blumeops/docs/reference/infrastructure/tailscale.md
Erich Blume 14ca0160ba Migrate devpi from minikube to indri (launchd) (#341)
2026-04-29 13:38:36 -07:00


| title | modified | last-reviewed | tags |
|---|---|---|---|
| Tailscale | 2026-04-18 | 2026-04-18 | infrastructure, networking |

Tailscale

Tailnet tail8d86e.ts.net provides secure networking for all BlumeOps infrastructure.

ACL Management

ACLs are managed via Pulumi in pulumi/tailscale/policy.hujson.

Groups

| Group | Members | Purpose |
|---|---|---|
| group:allisonflix | admin, member | jellyfin media access |

Device Tags

| Tag | Devices | Purpose |
|---|---|---|
| tag:homelab | indri, ringtail | Server infrastructure |
| tag:nas | sifaka | Network-attached storage |
| tag:blumeops | indri, sifaka, ringtail | Pulumi IaC managed resources |
| tag:registry | indri | Container registry (Zot) |
| tag:forge | indri | Forgejo git hosting |
| tag:loki | indri | Loki log aggregation |
| tag:k8s-api | indri | Kubernetes API server (minikube) |
| tag:k8s-operator | (operator pod) | Tailscale operator for k8s; see tailscale-operator |
| tag:k8s | (Ingress proxy pods) | Kubernetes Tailscale Ingress nodes; each also carries a per-service tag (tag:grafana, tag:kiwix, tag:feed, tag:pg) |
| tag:ci-gateway | (ephemeral CI containers) | CI containers pushing images to registry |
| tag:flyio-proxy | (Fly.io proxy container) | Public reverse proxy |
| tag:flyio-target | indri, designated Ingress endpoints | Endpoints reachable by the Fly.io proxy (indri for Caddy routing, Ingress pods for Alloy metrics/logs) |

Important: Don't tag user-owned devices (like gilbert) via Pulumi. Tagging converts them to "tagged devices" which lose user identity and break user-based SSH rules. Gilbert is referenced as tag:workstation in tagOwners for ownership purposes but remains user-owned so blume.erich@gmail.com identity is preserved.

Access Matrix

| Source | Kiwix | Forge | DevPI | Miniflux | PostgreSQL | NAS | Grafana | Loki |
|---|---|---|---|---|---|---|---|---|
| autogroup:admin | Y | Y | Y | Y | Y | Y | Y | Y |
| autogroup:member | Y | Y (443, SSH) | Y | Y | Y (5432) | - | - | - |
| tag:homelab | - | - | - | - | Y (5432) | Y | - | Y (3100) |
| tag:k8s | - | Y (3001, 2200) | - | - | - | - | - | - |

  • Admins: full access to all services
  • Members: user-facing services only; no Grafana, Loki, or NAS
  • Homelab: server-to-server; full mutual access between homelab peers (including SSH), full NAS access, and k8s service access (443, 5432, 9187)
  • K8s: can reach registry (443) and forge on indri (HTTP 3001, SSH 2200) for GitOps

Additional grants not shown in the matrix:

  • tag:flyio-proxy → tag:flyio-target on tcp:443 only
  • tag:ci-gateway → tag:registry on tcp:443
  • tag:k8s → tag:registry on tcp:443
  • tag:homelab → tag:k8s on tcp:443, tcp:5432, tcp:9187

See pulumi/tailscale/policy.hujson for the full grant definitions.
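
As a quick way to reason about the additional grants listed above, the following is a minimal Python sketch that transcribes those four grants by hand and checks whether a given source, destination, and TCP port combination is covered. It is an illustration only: the `Grant` type and `is_allowed` helper are hypothetical names, the data is copied from this page rather than read from policy.hujson, and the matching is far simpler than Tailscale's real policy evaluation.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    """One source-tag to destination-tag grant over a set of TCP ports."""
    src: str
    dst: str
    ports: frozenset[int]


# The four additional grants from this page, transcribed by hand.
# This is NOT parsed from policy.hujson and may drift from it.
GRANTS = [
    Grant("tag:flyio-proxy", "tag:flyio-target", frozenset({443})),
    Grant("tag:ci-gateway", "tag:registry", frozenset({443})),
    Grant("tag:k8s", "tag:registry", frozenset({443})),
    Grant("tag:homelab", "tag:k8s", frozenset({443, 5432, 9187})),
]


def is_allowed(src_tags: set[str], dst_tags: set[str], port: int) -> bool:
    """Return True if any listed grant matches a source tag,
    a destination tag, and the TCP port."""
    return any(
        g.src in src_tags and g.dst in dst_tags and port in g.ports
        for g in GRANTS
    )
```

For example, `is_allowed({"tag:k8s"}, {"tag:registry"}, 443)` is true under this model, while `is_allowed({"tag:flyio-proxy"}, {"tag:flyio-target"}, 80)` is false because that grant covers tcp:443 only.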

SSH Access

| Source | Destinations | Auth |
|---|---|---|
| autogroup:member | autogroup:self | check |
| autogroup:admin | tag:homelab | check (12h) |
| autogroup:admin | tag:nas | check (12h) |
| tag:homelab | tag:homelab | accept (tagged devices cannot perform interactive auth) |

Auto Approvers

ProxyGroup pods (tag:k8s) can auto-approve their own VIP Services. This is required for multi-cluster Tailscale Ingress routing — without it, advertised ProxyGroup routes are not approved. See tailscale-operator for ProxyGroup configuration details.

OAuth Credentials

Pulumi uses an OAuth client whose credentials are stored in 1Password (blumeops vault):

  • Scopes: acl, dns, devices, services
  • Auto-applies tag:blumeops to IaC-managed resources

Direct Peering vs DERP Relay

Just because Tailscale can route traffic does not mean it routes it efficiently. DERP relay servers are a fallback for when direct WireGuard connections cannot be established — they add significant latency (20+ seconds observed under load) because every packet bounces through a relay server.

Direct peering is critical for any production-like traffic path. Check with tailscale ping <host> — it should say via <ip>:<port>, not via DERP(<region>).
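
The check described above can be scripted. The sketch below is a minimal Python helper that classifies a single line of `tailscale ping` output as direct or relayed, assuming reply lines contain `via <ip>:<port>` or `via DERP(<region>)` as described here. The function name `classify_ping_line` and the sample addresses in the comments are hypothetical, and the exact output format may vary between Tailscale versions.

```python
import re

# Matches the path segment of a reply line, for example (hypothetical samples):
#   "pong from indri (100.64.0.1) via DERP(sfo) in 25ms"
#   "pong from indri (100.64.0.1) via 203.0.113.7:41641 in 3ms"
_VIA = re.compile(r"\bvia\s+(DERP\((?P<region>[^)]+)\)|(?P<endpoint>\S+:\d+))")


def classify_ping_line(line: str) -> str:
    """Return "derp", "direct", or "unknown" for one line of ping output."""
    match = _VIA.search(line)
    if match is None:
        return "unknown"  # no recognizable "via ..." segment
    if match.group("region"):
        return "derp"     # relayed through a DERP region
    return "direct"       # direct path via an ip:port endpoint
```

A wrapper could feed this function the lines captured from running `tailscale ping <host>` and alert when every reply is classified as `"derp"`.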

Common reasons direct peering fails:

  • k8s pods: Tailscale Ingress pods behind pod-network NAT cannot hole-punch. Route through a host-level Tailscale node (e.g., Caddy on indri) instead.
  • Cloud VMs: Some cloud providers block incoming UDP. Pin the WireGuard port (tailscaled --port=41641) and expose it as a UDP service if possible.
  • Double NAT / CGNAT: Multiple NAT layers make hole punching unreliable.

The flyio-proxy pins its WireGuard port with --port=41641 to enable direct peering with indri, and routes through Caddy (a host-level Tailscale node) to avoid the DERP bottleneck of k8s-hosted Tailscale Ingress pods.