blumeops/docs/how-to/operations/shower-on-ringtail.md
Erich Blume 292d354902
C1: deploy adelaide-baby-shower-app to ringtail k3s (#349)
## Summary

Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).

### What's included

- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.

### Defense layers on the public surface

1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)

## Prerequisites for the user to do (NOT in this PR)

Halted on these per request — they touch shared/manual systems:

- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging

## Test plan

- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes

Reviewed-on: #349
2026-05-11 13:47:18 -07:00


| title | modified | last-reviewed | tags |
|---|---|---|---|
| Shower App on Ringtail | 2026-05-10 | 2026-05-10 | how-to, operations, kubernetes, django |

Shower App on Ringtail

How the Adelaide / Heidi / Addie baby shower app is deployed. The app is a Django project (adelaide-baby-shower-app) released as a wheel to the Forgejo Packages PyPI index and run on ringtail's k3s cluster. Public landing page at shower.eblu.me, staff console + admin UI at shower.ops.eblu.me (tailnet only).

The contract this deploy implements is defined in the app repo's docs/how-to/hosting.md — read that for the env-var contract, security model, and storage requirements before changing anything here.

Routing

```
Internet → shower.eblu.me
            │ (Fly.io nginx, public)
            ▼
        Caddy on indri (shower.ops.eblu.me)
            │
            ▼
        Tailscale ProxyGroup ingress (shower.tail8d86e.ts.net)
            │
            ▼
        Service shower:8000 → Pod (Django + gunicorn)
```
| Hostname | Reachable from | Notes |
|---|---|---|
| `shower.eblu.me` | Public internet | Guest surface only: splash, `/prizes/<token>/`, `/static/`, `/media/`. Everything that requires authentication returns 403 with a pointer to the tailnet host. |
| `shower.ops.eblu.me` | Tailnet | Full app surface: `/host/`, `/admin/`, the works |
| `shower.tail8d86e.ts.net` | Tailnet | Bare ProxyGroup endpoint that Caddy proxies to |
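When something on this chain breaks, probing each hop from a tailnet machine, innermost first, shows where. A minimal sketch: `shower_probe_hops` is a hypothetical helper, not an existing mise task, and it only reports the HTTP status each hop returns.

```shell
# Hypothetical helper (not a repo task): probe each hop of the routing chain,
# innermost first, printing "STATUS URL" per hop. Run it from a tailnet member,
# since the first two hops are not reachable from WAN.
shower_probe_hops() {
  local url code
  for url in \
    https://shower.tail8d86e.ts.net/ \
    https://shower.ops.eblu.me/ \
    https://shower.eblu.me/
  do
    # -o /dev/null drops the body and -w prints only the status code;
    # curl reports 000 when it cannot connect at all.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
    printf '%s %s\n' "$code" "$url"
  done
}
```

The innermost hop that reports `000` or a 5xx is the place to start looking; outer hops usually fail as a consequence of it.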

Defense layers (public side)

The public surface is guest-only, so the threat model collapses: there is no credential-accepting endpoint reachable from WAN, and nothing on WAN that requires authentication.

  1. edge auth lockout: fly nginx returns 403 for /admin/, /host/, and anything that would redirect into them. Anyone hitting an auth URL on WAN gets a "tailnet only" message.
  2. fly nginx limit_req zone=general: a 10 r/s cushion per Fly-Client-IP, protecting the splash form.
  3. django-axes: 5 failed attempts trigger a 1-hour lockout per (username, ip_address) on the tailnet-side login. This is the only credential defense, and since the login is unreachable from WAN, only tailnet members can attempt a brute force.
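Layer 1 is easy to spot-check from any machine on the internet. A minimal sketch, assuming every path under the listed prefixes should get a 403 at the Fly edge; the helper name and the specific paths probed are illustrative, not taken from the nginx config.

```shell
# Hypothetical check: authenticated prefixes on the public host must be refused
# at the Fly edge (403) before any request reaches Django.
shower_check_edge_lockout() {
  local failures=0 path code
  for path in /admin/ /admin/login/ /host/; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
      "https://shower.eblu.me${path}")
    if [ "$code" != "403" ]; then
      printf 'FAIL %s returned %s (expected 403)\n' "$path" "$code" >&2
      failures=$((failures + 1))
    fi
  done
  # Succeeds only if every probe was blocked.
  [ "$failures" -eq 0 ]
}
```

A non-zero exit from `shower_check_edge_lockout` means some authenticated path is reachable from WAN.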

The QR codes that /host/ (on tailnet) generates for guests embed https://shower.eblu.me/... even though the QR view is served from the tailnet host. The app's PUBLIC_URL_BASE setting (added in v1.0.1) overrides Django's request.build_absolute_uri() for those URLs.
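To confirm the running pod actually has the override (so new QR codes point at the public host), the setting can be read from inside the pod. A sketch, assuming `PUBLIC_URL_BASE` is exposed on `django.conf.settings` under that name and that `python -m django` in the image picks up the production settings, as the `createsuperuser` invocation in step 7 relies on. The helper name is hypothetical.

```shell
# Hypothetical helper: print PUBLIC_URL_BASE as the live pod sees it.
# It should be the public origin (https://shower.eblu.me), never the tailnet host.
shower_public_url_base() {
  kubectl --context=k3s-ringtail -n shower exec deploy/shower -- \
    python -m django shell -c \
    'from django.conf import settings; print(getattr(settings, "PUBLIC_URL_BASE", ""))'
}
```

Empty output means the setting is missing or blank in the pod's settings.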

Persistent storage

| Mount | PVC | Type | Why |
|---|---|---|---|
| `/app/media` | `shower-media` | NFS RWX on sifaka (`/volume1/shower`) | Prize photos survive pod rescheduling |
| `/app/data` | `shower-data` | k3s local-path RWO | SQLite DB; NFS file locking can't be trusted for WAL/journal |
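Both claims have to be `Bound` before the pod can start. A sketch of a quick check (helper names hypothetical). Note that k3s's local-path StorageClass binds on first consumer by default, so `shower-data` can legitimately show `Pending` until the pod is scheduled.

```shell
# Hypothetical helpers: report the phase of both shower PVCs, and succeed only
# when both exist and are Bound.
shower_pvc_status() {
  kubectl --context=k3s-ringtail -n shower get pvc shower-media shower-data \
    -o jsonpath='{range .items[*]}{.metadata.name} {.status.phase}{"\n"}{end}'
}

shower_pvcs_bound() {
  # Exactly two lines are expected; any non-Bound phase (or a missing PVC,
  # which shortens the output) fails the check.
  shower_pvc_status | awk '$2 != "Bound" { bad = 1 }
    END { if (NR == 2 && !bad) exit 0; exit 1 }'
}
```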

The container has the app and its Python dependencies baked in at nix build time (buildPythonPackage against the wheel fetched from forge PyPI). The entrypoint runs migrations, runs collectstatic, and execs gunicorn; nothing is pip-installed at boot. A local_settings.py shim overrides DATABASES.NAME, MEDIA_ROOT, and STATIC_ROOT with absolute paths under /app/. Without it, those paths would be derived from the wheel's BASE_DIR (the parent of the parent of the settings module), which resolves to a directory inside site-packages.
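The entrypoint's order of operations can be sketched as a shell function. This is an illustration, not the wrapper the nix expression generates: `SHOWER_WSGI_APP` is a hypothetical variable standing in for the app's real WSGI callable, and `DJANGO_SETTINGS_MODULE` is assumed to be set in the image environment already.

```shell
# Sketch of the startup order only; the real wrapper is generated by nix.
# SHOWER_WSGI_APP is hypothetical (something like "<project>.wsgi:application").
shower_entrypoint() {
  : "${SHOWER_WSGI_APP:?SHOWER_WSGI_APP must name the WSGI callable}"
  # Schema first, so an empty /app/data/db.sqlite3 is usable on first boot.
  python -m django migrate --noinput || return 1
  # Rebuild STATIC_ROOT from scratch on every start.
  python -m django collectstatic --noinput --clear || return 1
  # exec hands this process over to gunicorn, so the pod's SIGTERM reaches
  # gunicorn directly instead of stopping at a wrapper shell.
  exec gunicorn --bind 0.0.0.0:8000 "$SHOWER_WSGI_APP"
}
```

Using `exec` instead of starting gunicorn as a child process is the usual way to make sure signals sent to the container reach the server.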

Backups

borgmatic (running on indri) captures both halves of the persistent state on its daily 2 a.m. run:

  • /app/data/db.sqlite3: dumped with sqlite3.backup() run through kubectl exec against the live pod (entry in borgmatic_k8s_sqlite_dumps, context k3s-ringtail). The dump lands in borgmatic_k8s_dump_dir on indri and is picked up by the main source-directory sweep.
  • /app/media: read via /Volumes/shower, the SMB mount of sifaka:/volume1/shower on indri. The same Synology share is exported over SMB and NFS simultaneously; ringtail's pod uses the NFS export, while indri reads the SMB side as a borgmatic source.

Both archive to sifaka (borg-backups) and BorgBase offsite, with retention keep_daily=7 / keep_monthly=12 / keep_yearly=1000.
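For an ad-hoc copy outside the nightly run (say, before a risky migration), the same SQLite online-backup API can be driven by hand. A sketch, not the borgmatic hook itself: the helper name is hypothetical, and it assumes the image's Python ships the standard `sqlite3` module. The temporary file is created under `/app/data` because that volume is known to be writable, while a nix-built image may not provide `/tmp`.

```shell
# Hypothetical helper: stream a consistent snapshot of the live DB to a local
# file. Usage: shower_sqlite_dump ./shower-db.sqlite3
shower_sqlite_dump() {
  local out=${1:?usage: shower_sqlite_dump OUTFILE}
  kubectl --context=k3s-ringtail -n shower exec deploy/shower -- python -c '
import sqlite3, sys, tempfile
src = sqlite3.connect("/app/data/db.sqlite3")
with tempfile.NamedTemporaryFile(dir="/app/data", suffix=".sqlite3") as tmp:
    dst = sqlite3.connect(tmp.name)
    src.backup(dst)  # online backup: consistent even while gunicorn holds the DB open
    dst.close()
    tmp.seek(0)
    sys.stdout.buffer.write(tmp.read())
src.close()
' > "$out" || { rm -f "$out"; return 1; }
}
```

Running `sqlite3 ./shower-db.sqlite3 'PRAGMA integrity_check;'` on the copy should print `ok`.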

The SMB mount on indri is set up manually once via Finder (Cmd-K → smb://sifaka/shower, save credentials, "Always log in" so it reconnects after reboot). If /Volumes/shower is missing at backup time, borgmatic fails loudly, because source_directories_must_exist: true applies to all entries.
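Because a missing mount only surfaces when the 2 a.m. run fails, a pre-flight check on indri can catch it earlier. A sketch for macOS (the helper name is hypothetical). It consults the mount table instead of testing for the directory, because an empty leftover directory at /Volumes/shower would pass a plain `-d` test.

```shell
# Hypothetical pre-flight check for indri (macOS): succeed only if
# /Volumes/shower is an active mount, not just a directory.
# macOS `mount` lines look like: //user@host/share on /Volumes/x (smbfs, ...)
shower_media_mounted() {
  mount | grep -q ' on /Volumes/shower ('
}
```

If it fails, `open smb://sifaka/shower` should ask Finder to remount the share with the saved Keychain credentials.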

One-time setup steps

These steps are required the first time the service is deployed and are not encoded in the manifests.

1. NFS + SMB share on sifaka

On the Synology DSM web UI:

  1. Control Panel → Shared Folder → Create. Name: shower, Location: Volume 1. Leave the rest at default.
  2. Control Panel → File Services → NFS → NFS Rules (on the shower row's Permissions tab). Add two rules mirroring the other shares' pattern, one with Hostname/IP=192.168.1.0/24 and one with 100.64.0.0/10, each with Privilege=Read/Write, Squash=Map all users to admin (= all_squash), and Allow connections from non-privileged ports ticked. (See sifaka#NFS Exports; the existing frigate, paperless, etc. shares use this exact pattern.)
  3. Control Panel → File Services → SMB: leave SMB enabled globally. No per-share rule required — the share inherits the default eblume access.
  4. The directory ownership at /volume1/shower will end up root:root, mode 0777 (DSM default) — which is fine because all_squash rewrites every NFS write to admin:users, and the 0777 lets pods read what other pods wrote. No chown needed.
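Once the rules are saved, ringtail should see the export. A sketch of a check to run on ringtail; it assumes `showmount` (from the NFS client utilities) is installed there and that `sifaka` resolves by name. The helper name is hypothetical.

```shell
# Hypothetical check, run on ringtail: does sifaka export /volume1/shower?
# showmount -e prints a header line, then one "PATH CLIENTS" line per export.
shower_nfs_exported() {
  showmount -e sifaka |
    awk '$1 == "/volume1/shower" { found = 1 } END { if (found) exit 0; exit 1 }'
}
```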

After the share exists, mount it on indri for borgmatic:

  • In Finder, Cmd-K → smb://sifaka/shower, sign in as eblume, and tick Remember in Keychain + Always log in so it reconnects on reboot. This produces /Volumes/shower, which the borgmatic source-directory list points at.

2. 1Password item

Item name: Shower (blumeops) in the blumeops vault. Required property:

| Field | Value |
|---|---|
| `secret-key` | Output of `openssl rand -base64 48` |

The ExternalSecret shower-app-secrets syncs this into the shower namespace as a Secret, and the Deployment's envFrom exposes it to the container as DJANGO_SECRET_KEY.

Never reuse a key that has ever been in git history. Per the app's hosting.md, an early dev key was committed before being replaced with the django-insecure-... placeholder; the production key must be freshly generated.
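After the item exists, the sync can be confirmed without ever printing the key. A sketch (helper name hypothetical) that assumes the ExternalSecret writes a Secret of the same name, which is the External Secrets Operator default when no target name is set, and that the key inside it is `DJANGO_SECRET_KEY`, as the `envFrom` wiring implies.

```shell
# Hypothetical check: the ExternalSecret reports Ready and the synced Secret
# carries a non-empty DJANGO_SECRET_KEY. The value itself is never printed.
shower_secret_synced() {
  local ready len
  ready=$(kubectl --context=k3s-ringtail -n shower get externalsecret shower-app-secrets \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  [ "$ready" = "True" ] || { echo "ExternalSecret not Ready: '$ready'" >&2; return 1; }
  # .data values are base64-encoded; only their length is inspected.
  len=$(kubectl --context=k3s-ringtail -n shower get secret shower-app-secrets \
    -o jsonpath='{.data.DJANGO_SECRET_KEY}' | wc -c | tr -d ' ')
  [ "$len" -gt 0 ]
}
```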

3. Container image

Built by the build-container Forgejo Actions workflow on the nix-container-builder runner (ringtail, amd64). The wheel is fetched from forge PyPI at nix build time and baked into the image — no pip-at-runtime. To bump the version, change version in containers/shower/default.nix and update wheelHash (or set it to pkgs.lib.fakeHash and let the next build print the correct one).

Trigger with:

```shell
mise run container-build-and-release shower
```

After the workflow finishes, update images[].newTag in argocd/manifests/shower/kustomization.yaml to the resulting vX.Y.Z-<sha>-nix tag, then commit (C0).
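The tag edit is the easiest step to fat-finger. A sketch of a helper that checks the tag's shape before rewriting `newTag` in place; it assumes the kustomization has a single `images[]` entry (so exactly one `newTag:` line) and a lowercase hex SHA. The helper name is hypothetical.

```shell
# Hypothetical helper: validate a vX.Y.Z-<sha>-nix tag and write it into the
# single images[].newTag line of a kustomization.yaml.
# Usage: shower_set_tag TAG [FILE]
shower_set_tag() {
  local tag=${1:?usage: shower_set_tag TAG [FILE]}
  local file=${2:-argocd/manifests/shower/kustomization.yaml}
  if ! printf '%s\n' "$tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+-[0-9a-f]+-nix$'; then
    echo "refusing malformed tag: $tag" >&2
    return 1
  fi
  # Write via a temp file so this behaves the same with GNU sed (ringtail)
  # and BSD sed (indri), whose -i flags differ.
  sed -E "s/^([[:space:]]*newTag:[[:space:]]*).*/\1${tag}/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

Run from the repo root, e.g. `shower_set_tag v1.0.1-abc1234-nix`, then review the diff before committing.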

4. DNS

pulumi/gandi/__main__.py declares the shower-public CNAME pointing at `blumeops-proxy.fly.dev.` (trailing dot included). Apply with:

```shell
mise run dns-preview
mise run dns-up
```
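Once `dns-up` has applied, the public name should resolve to the Fly proxy. A sketch using `dig` (helper name hypothetical); the expected target is the CNAME declared in pulumi/gandi/__main__.py.

```shell
# Hypothetical check: shower.eblu.me should be a CNAME to the Fly proxy app.
shower_dns_ok() {
  local target
  target=$(dig +short CNAME shower.eblu.me | head -n 1)
  [ "$target" = "blumeops-proxy.fly.dev." ] || {
    echo "unexpected CNAME target: '${target}'" >&2
    return 1
  }
}
```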

5. Fly.io certificate

```shell
fly certs add shower.eblu.me -a blumeops-proxy
```

(Add to mise-tasks/fly-setup so re-runs of the one-time setup pick it up.)

6. Caddy on indri

The shower route is defined in ansible/roles/caddy/defaults/main.yml. Push it with:

```shell
mise run provision-indri -- --tags caddy
```

7. Create the admin user

The container's entrypoint runs migrate --noinput and collectstatic --noinput --clear before gunicorn, so a fresh db.sqlite3 is schema-ready as soon as the pod boots. It does not create a Django superuser; that has to be done once, interactively, after the first pod is up:

```shell
kubectl --context=k3s-ringtail -n shower exec -it deploy/shower -- \
    python -m django createsuperuser
```

Use the username erich and your usual email. The same account doubles as the @staff_member_required login for /host/. Further staff accounts can be created from /admin/auth/user/ once you're signed in.
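Because this step is manual and one-off, it can be unclear later whether it has been done, for example after restoring db.sqlite3 from a backup. A sketch (helper name hypothetical) that counts superusers through Django's `shell -c`, with the same settings assumption as the `createsuperuser` command above.

```shell
# Hypothetical check: print how many superusers exist in the live database.
# 0 means the createsuperuser step still needs to be run.
shower_superuser_count() {
  kubectl --context=k3s-ringtail -n shower exec deploy/shower -- \
    python -m django shell -c \
    'from django.contrib.auth import get_user_model; print(get_user_model().objects.filter(is_superuser=True).count())'
}
```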

Deploying a new version

  1. Bump the wheel version in the app repo (adelaide-baby-shower-app) and release it to Forgejo PyPI.
  2. Bump appVersion in containers/shower/default.nix to match.
  3. mise run container-build-and-release shower. Verify the build with mise run runner-logs.
  4. Update the newTag in argocd/manifests/shower/kustomization.yaml to the new [main] SHA tag.
  5. Commit (C0 after PR merge — see build-container-image#Squash-merge and container tags).
  6. argocd app sync shower.
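To block explicitly until ArgoCD reports the app both Synced and Healthy, `argocd app wait` can follow the sync. Because the Deployment uses the Recreate strategy (SQLite is single-writer), expect a brief outage between the old pod stopping and the new one becoming ready. A sketch; the helper name is hypothetical and the 300-second timeout is an arbitrary choice.

```shell
# Hypothetical wrapper: sync, then wait until ArgoCD reports the app both
# Synced and Healthy, giving up after 300 seconds.
shower_sync_and_wait() {
  argocd app sync shower &&
    argocd app wait shower --sync --health --timeout 300
}
```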

Verifying after a deploy

```shell
kubectl --context=k3s-ringtail -n shower get pods
kubectl --context=k3s-ringtail -n shower logs deploy/shower
curl -sf https://shower.ops.eblu.me/  # tailnet
curl -sf https://shower.eblu.me/      # public
curl -I https://shower.eblu.me/admin/users/  # expect 403 (edge block)
curl -I https://shower.ops.eblu.me/admin/    # expect 200 / 302 (login)
```
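Those checks confirm the service answers, but not that it is running the tag just committed. A sketch (helper name hypothetical) that waits for the rollout and then compares the Deployment's image tag, assuming the app is the first container in the pod spec.

```shell
# Hypothetical check: the rollout has finished and the Deployment's first
# container runs the expected tag. Usage: shower_running_tag_is TAG
shower_running_tag_is() {
  local want=${1:?usage: shower_running_tag_is TAG} image
  kubectl --context=k3s-ringtail -n shower rollout status deploy/shower \
    --timeout=300s >/dev/null || return 1
  image=$(kubectl --context=k3s-ringtail -n shower get deploy/shower \
    -o jsonpath='{.spec.template.spec.containers[0].image}')
  case "$image" in
    *":$want") return 0 ;;
    *) echo "running $image, expected tag $want" >&2; return 1 ;;
  esac
}
```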