C1: deploy adelaide-baby-shower-app to ringtail k3s (#349)
## Summary
Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
### What's included
- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.
### Defense layers on the public surface
1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)
## Prerequisites for the user to do (NOT in this PR)
Halted on these per request — they touch shared/manual systems:
- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging
## Test plan
- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/349
2026-05-11 13:47:18 -07:00
|
|
|
# Nix-built shower app container — Adelaide / Heidi / Addie baby shower.
|
|
|
|
|
#
|
|
|
|
|
# The app is published as a wheel to the Forgejo PyPI index at
|
C1: deploy shower v1.1.0 (phases + guest memories) (#354)
## Summary
Deploys `adelaide-baby-shower-app` **v1.1.0** to ringtail k3s.
### App changes (since v1.0.2)
- **Four-phase `ShowerState`** replaces the boolean `locked` flag — `pre_event` → `party` → `prizes_locked` → `event_locked` — with a backfill migration that maps `locked=True → pre_event`, `locked=False → party`.
- **Guest memories**: append-only photos + comments panel where guests can leave notes for the baby. Adds `GuestPhoto` + `GuestComment` models with file-extension validators and a max-size validator; new `shower.imaging` module for thumbnail generation.
- **Admin + QR polish**: configurable host link, fixed "View Site" URL, guest-facing QR copy improvements, contest tweaks.
Three Django migrations run automatically in the entrypoint against the SQLite PV:
- `0009_shower_phase`
- `0010_guest_memories`
- `0011_book_description`
No ConfigMap / env-var changes. The deploy uses `strategy: Recreate` with a single replica, so the old pod releases the data PVC before the new one mounts it and runs migrations.
### Container build changes
The v1.1.0 tag exposed a latent issue with the Forgejo PyPI install path:
- The recent commit [2d38418e](https://forge.eblu.me/eblume/blumeops/commit/2d38418e) closed the forge package leak at the Fly edge by blocking `/api/packages/*` publicly.
- Forgejo's PyPI simple index returns absolute file URLs hardcoded to its public `ROOT_URL` (`forge.eblu.me`), so pip-installing from the tailnet index URL still tries to download from `forge.eblu.me` → 403.
- Previous shower builds escaped this because their FOD outputs were already in the nix store; bumping to a new version forced a fresh pip run that hit the block.
Fix mirrors what we already do for the sdist: both wheel and sdist are pulled via direct `fetchurl` against `forge.ops.eblu.me`, then the wheel is copied to TMPDIR under its clean filename (nix store path's hash prefix breaks pip's wheel-filename parser) and handed to pip as a local path. The forge `--extra-index-url` is no longer needed.
FOD outputHash pinned to `sha256-kTNOswobtkgyQmmqbQM8XO4vvaGg57nCuuZGbNXb0NM=` from run 547. Image: `registry.ops.eblu.me/blumeops/shower:v1.1.0-444ff91-nix`.
### Adjacent finding (already handled)
The ringtail `gitea-runner-nix_container_builder` systemd unit was left `inactive` after the recent `provision-ringtail` (matches the known `sshd-restart-hangs-mux` lesson — the rebuild changed the unit's PATH closure + config.yaml, systemd stopped it, then the playbook hung before the activation could restart it). Manually started; the existing memory `lesson_provision_ringtail_ssh_hang.md` was extended to mention the runner as the canary service to check after provisions.
## Test plan
- [ ] `argocd app diff shower --revision shower-v1.1.0` — review the manifest change
- [ ] `argocd app set shower --revision shower-v1.1.0 && argocd app sync shower`
- [ ] `kubectl --context=k3s-ringtail logs -n shower deploy/shower` — confirm migrations 0009/0010/0011 applied, no errors
- [ ] Hit `https://shower.ops.eblu.me/` (tailnet) — splash page renders, phase indicator visible
- [ ] Hit `https://shower.ops.eblu.me/host/` — host console loads, phase dropdown shows the four states
- [ ] Hit `https://shower.eblu.me/` (public via Fly) — splash page still served
- [ ] After merge: `argocd app set shower --revision main && argocd app sync shower`
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/354
2026-05-11 20:08:03 -07:00
|
|
|
# https://forge.ops.eblu.me/api/packages/eblume/pypi/ (tailnet-only — the
|
|
|
|
|
# public forge.eblu.me /api/packages/* surface is blocked at the Fly edge).
|
|
|
|
|
# We can't point pip at Forgejo's simple index even from the tailnet,
|
|
|
|
|
# because Forgejo's index returns absolute file URLs hardcoded to its
|
|
|
|
|
# public ROOT_URL (forge.eblu.me), which then 403s. So both the wheel and
|
|
|
|
|
# the sdist are pulled by direct `fetchurl` against forge.ops.eblu.me, and
|
|
|
|
|
# the wheel is then handed to `pip install` as a local path; transitive
|
|
|
|
|
# deps come from pypi.ops.eblu.me. Build runs on the nix-container-builder
|
|
|
|
|
# runner (ringtail, amd64) so the image is native.
|
C1: deploy adelaide-baby-shower-app to ringtail k3s (#349)
## Summary
Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
### What's included
- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.
### Defense layers on the public surface
1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)
## Prerequisites for the user to do (NOT in this PR)
Halted on these per request — they touch shared/manual systems:
- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging
## Test plan
- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/349
2026-05-11 13:47:18 -07:00
|
|
|
#
|
|
|
|
|
# Going through pip-install-target rather than nixpkgs Python packages
|
|
|
|
|
# sidesteps two issues we hit going through `python.pkgs.buildPythonPackage`:
|
|
|
|
|
# 1. python314Packages.django still aliases to Django 4.2 LTS, which
|
|
|
|
|
# doesn't support Python 3.14 at all.
|
|
|
|
|
# 2. django-axes pulls selenium + browser fonts into its check phase
|
|
|
|
|
# and the nix sandbox can't provide those.
|
|
|
|
|
#
|
|
|
|
|
# To bump the version:
|
|
|
|
|
# 1. Update `version` below.
|
|
|
|
|
# 2. Set `outputHash` to `pkgs.lib.fakeHash`, run the build, copy the
|
|
|
|
|
# real hash out of the error, and commit it.
|
|
|
|
|
{ pkgs ? import <nixpkgs> { } }:
|
|
|
|
|
|
|
|
|
|
let
|
2026-05-13 20:12:00 -07:00
|
|
|
version = "1.1.1";
|
C1: deploy adelaide-baby-shower-app to ringtail k3s (#349)
## Summary
Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
### What's included
- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.
### Defense layers on the public surface
1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)
## Prerequisites for the user to do (NOT in this PR)
Halted on these per request — they touch shared/manual systems:
- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging
## Test plan
- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/349
2026-05-11 13:47:18 -07:00
|
|
|
|
|
|
|
|
python = pkgs.python314;
|
|
|
|
|
|
|
|
|
|
# The repo's top-level static/ directory (vendored Sortable + cropper
|
|
|
|
|
# JS/CSS, prize placeholder SVG) isn't shipped in the wheel — hatchling
|
|
|
|
|
# only packages config/ and shower/, leaving the repo-root static/
|
|
|
|
|
# behind. Pull the sdist (which contains the full source tree) and
|
|
|
|
|
# extract just the static/ subtree into the image as /app/static.
|
|
|
|
|
# local_settings adds it to STATICFILES_DIRS so collectstatic at boot
|
|
|
|
|
# picks it up alongside the Django admin's static files.
|
|
|
|
|
#
|
|
|
|
|
# Fetched from forge.ops.eblu.me (tailnet) because /api/packages/* is
|
|
|
|
|
# blocked at the fly edge — see fly/nginx.conf forge.eblu.me block.
|
|
|
|
|
# Hash is the upstream sha256 from forge PyPI's simple index.
|
|
|
|
|
showerSdist = pkgs.fetchurl {
|
|
|
|
|
name = "adelaide_baby_shower_app-${version}.tar.gz";
|
|
|
|
|
url = "https://forge.ops.eblu.me/api/packages/eblume/pypi/files/adelaide-baby-shower-app/${version}/adelaide_baby_shower_app-${version}.tar.gz";
|
2026-05-13 20:12:00 -07:00
|
|
|
hash = "sha256-muvjkcKnLrrQTb8HZ4cH9SD0pab05JSFSgwheqb0AyM=";
|
C1: deploy shower v1.1.0 (phases + guest memories) (#354)
## Summary
Deploys `adelaide-baby-shower-app` **v1.1.0** to ringtail k3s.
### App changes (since v1.0.2)
- **Four-phase `ShowerState`** replaces the boolean `locked` flag — `pre_event` → `party` → `prizes_locked` → `event_locked` — with a backfill migration that maps `locked=True → pre_event`, `locked=False → party`.
- **Guest memories**: append-only photos + comments panel where guests can leave notes for the baby. Adds `GuestPhoto` + `GuestComment` models with file-extension validators and a max-size validator; new `shower.imaging` module for thumbnail generation.
- **Admin + QR polish**: configurable host link, fixed "View Site" URL, guest-facing QR copy improvements, contest tweaks.
Three Django migrations run automatically in the entrypoint against the SQLite PV:
- `0009_shower_phase`
- `0010_guest_memories`
- `0011_book_description`
No ConfigMap / env-var changes. The deploy uses `strategy: Recreate` with a single replica, so the old pod releases the data PVC before the new one mounts it and runs migrations.
### Container build changes
The v1.1.0 tag exposed a latent issue with the Forgejo PyPI install path:
- The recent commit [2d38418e](https://forge.eblu.me/eblume/blumeops/commit/2d38418e) closed the forge package leak at the Fly edge by blocking `/api/packages/*` publicly.
- Forgejo's PyPI simple index returns absolute file URLs hardcoded to its public `ROOT_URL` (`forge.eblu.me`), so pip-installing from the tailnet index URL still tries to download from `forge.eblu.me` → 403.
- Previous shower builds escaped this because their FOD outputs were already in the nix store; bumping to a new version forced a fresh pip run that hit the block.
Fix mirrors what we already do for the sdist: both wheel and sdist are pulled via direct `fetchurl` against `forge.ops.eblu.me`, then the wheel is copied to TMPDIR under its clean filename (nix store path's hash prefix breaks pip's wheel-filename parser) and handed to pip as a local path. The forge `--extra-index-url` is no longer needed.
FOD outputHash pinned to `sha256-kTNOswobtkgyQmmqbQM8XO4vvaGg57nCuuZGbNXb0NM=` from run 547. Image: `registry.ops.eblu.me/blumeops/shower:v1.1.0-444ff91-nix`.
### Adjacent finding (already handled)
The ringtail `gitea-runner-nix_container_builder` systemd unit was left `inactive` after the recent `provision-ringtail` (matches the known `sshd-restart-hangs-mux` lesson — the rebuild changed the unit's PATH closure + config.yaml, systemd stopped it, then the playbook hung before the activation could restart it). Manually started; the existing memory `lesson_provision_ringtail_ssh_hang.md` was extended to mention the runner as the canary service to check after provisions.
## Test plan
- [ ] `argocd app diff shower --revision shower-v1.1.0` — review the manifest change
- [ ] `argocd app set shower --revision shower-v1.1.0 && argocd app sync shower`
- [ ] `kubectl --context=k3s-ringtail logs -n shower deploy/shower` — confirm migrations 0009/0010/0011 applied, no errors
- [ ] Hit `https://shower.ops.eblu.me/` (tailnet) — splash page renders, phase indicator visible
- [ ] Hit `https://shower.ops.eblu.me/host/` — host console loads, phase dropdown shows the four states
- [ ] Hit `https://shower.eblu.me/` (public via Fly) — splash page still served
- [ ] After merge: `argocd app set shower --revision main && argocd app sync shower`
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/354
2026-05-11 20:08:03 -07:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
# Wheel pulled from forge.ops.eblu.me (tailnet) for the same reason the
|
|
|
|
|
# sdist is: Forgejo's PyPI simple index would return forge.eblu.me URLs
|
|
|
|
|
# that the Fly edge 403s on /api/packages/*. We hand this path to pip
|
|
|
|
|
# below so it never touches the forge index at all.
|
|
|
|
|
showerWheel = pkgs.fetchurl {
|
|
|
|
|
name = "adelaide_baby_shower_app-${version}-py3-none-any.whl";
|
|
|
|
|
url = "https://forge.ops.eblu.me/api/packages/eblume/pypi/files/adelaide-baby-shower-app/${version}/adelaide_baby_shower_app-${version}-py3-none-any.whl";
|
2026-05-13 20:12:00 -07:00
|
|
|
hash = "sha256-dorrwHhZhOn9Qq6Wk3Su24HckgaWtWbkMY7RtAvomv4=";
|
C1: deploy adelaide-baby-shower-app to ringtail k3s (#349)
## Summary
Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
### What's included
- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.
### Defense layers on the public surface
1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)
## Prerequisites for the user to do (NOT in this PR)
Halted on these per request — they touch shared/manual systems:
- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging
## Test plan
- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/349
2026-05-11 13:47:18 -07:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
staticAssets = pkgs.runCommand "shower-static-assets-${version}" { } ''
|
|
|
|
|
${pkgs.gnutar}/bin/tar -xzf ${showerSdist} -C $TMPDIR
|
|
|
|
|
cp -r $TMPDIR/adelaide_baby_shower_app-${version}/static $out
|
|
|
|
|
'';
|
|
|
|
|
|
|
|
|
|
# Fixed-output derivation: pip-installs the app wheel + every transitive
|
|
|
|
|
# dep into a single target dir. FODs get network access in exchange for
|
|
|
|
|
# a pinned output hash, which means the whole dependency closure is
|
|
|
|
|
# immutable across rebuilds.
|
|
|
|
|
pyDepsFOD = pkgs.stdenv.mkDerivation {
|
|
|
|
|
pname = "shower-python-deps-fod";
|
|
|
|
|
inherit version;
|
|
|
|
|
|
|
|
|
|
dontUnpack = true;
|
|
|
|
|
|
|
|
|
|
nativeBuildInputs = [ python pkgs.cacert pkgs.removeReferencesTo ];
|
|
|
|
|
|
|
|
|
|
buildPhase = ''
|
|
|
|
|
runHook preBuild
|
|
|
|
|
|
|
|
|
|
export HOME=$TMPDIR
|
|
|
|
|
export SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt
|
|
|
|
|
export PIP_DISABLE_PIP_VERSION_CHECK=1
|
|
|
|
|
|
|
|
|
|
${python}/bin/python -m venv "$TMPDIR/venv"
|
|
|
|
|
"$TMPDIR/venv/bin/pip" install --upgrade pip
|
C1: deploy shower v1.1.0 (phases + guest memories) (#354)
## Summary
Deploys `adelaide-baby-shower-app` **v1.1.0** to ringtail k3s.
### App changes (since v1.0.2)
- **Four-phase `ShowerState`** replaces the boolean `locked` flag — `pre_event` → `party` → `prizes_locked` → `event_locked` — with a backfill migration that maps `locked=True → pre_event`, `locked=False → party`.
- **Guest memories**: append-only photos + comments panel where guests can leave notes for the baby. Adds `GuestPhoto` + `GuestComment` models with file-extension validators and a max-size validator; new `shower.imaging` module for thumbnail generation.
- **Admin + QR polish**: configurable host link, fixed "View Site" URL, guest-facing QR copy improvements, contest tweaks.
Three Django migrations run automatically in the entrypoint against the SQLite PV:
- `0009_shower_phase`
- `0010_guest_memories`
- `0011_book_description`
No ConfigMap / env-var changes. The deploy uses `strategy: Recreate` with a single replica, so the old pod releases the data PVC before the new one mounts it and runs migrations.
### Container build changes
The v1.1.0 tag exposed a latent issue with the Forgejo PyPI install path:
- The recent commit [2d38418e](https://forge.eblu.me/eblume/blumeops/commit/2d38418e) closed the forge package leak at the Fly edge by blocking `/api/packages/*` publicly.
- Forgejo's PyPI simple index returns absolute file URLs hardcoded to its public `ROOT_URL` (`forge.eblu.me`), so pip-installing from the tailnet index URL still tries to download from `forge.eblu.me` → 403.
- Previous shower builds escaped this because their FOD outputs were already in the nix store; bumping to a new version forced a fresh pip run that hit the block.
Fix mirrors what we already do for the sdist: both wheel and sdist are pulled via direct `fetchurl` against `forge.ops.eblu.me`, then the wheel is copied to TMPDIR under its clean filename (nix store path's hash prefix breaks pip's wheel-filename parser) and handed to pip as a local path. The forge `--extra-index-url` is no longer needed.
FOD outputHash pinned to `sha256-kTNOswobtkgyQmmqbQM8XO4vvaGg57nCuuZGbNXb0NM=` from run 547. Image: `registry.ops.eblu.me/blumeops/shower:v1.1.0-444ff91-nix`.
### Adjacent finding (already handled)
The ringtail `gitea-runner-nix_container_builder` systemd unit was left `inactive` after the recent `provision-ringtail` (matches the known `sshd-restart-hangs-mux` lesson — the rebuild changed the unit's PATH closure + config.yaml, systemd stopped it, then the playbook hung before the activation could restart it). Manually started; the existing memory `lesson_provision_ringtail_ssh_hang.md` was extended to mention the runner as the canary service to check after provisions.
## Test plan
- [ ] `argocd app diff shower --revision shower-v1.1.0` — review the manifest change
- [ ] `argocd app set shower --revision shower-v1.1.0 && argocd app sync shower`
- [ ] `kubectl --context=k3s-ringtail logs -n shower deploy/shower` — confirm migrations 0009/0010/0011 applied, no errors
- [ ] Hit `https://shower.ops.eblu.me/` (tailnet) — splash page renders, phase indicator visible
- [ ] Hit `https://shower.ops.eblu.me/host/` — host console loads, phase dropdown shows the four states
- [ ] Hit `https://shower.eblu.me/` (public via Fly) — splash page still served
- [ ] After merge: `argocd app set shower --revision main && argocd app sync shower`
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/354
2026-05-11 20:08:03 -07:00
|
|
|
|
|
|
|
|
# Nix store paths embed a 32-char hash prefix, which pip's wheel
|
|
|
|
|
# filename parser rejects ("Invalid wheel filename"). Copy to a
|
|
|
|
|
# clean filename in TMPDIR before installing.
|
|
|
|
|
cp ${showerWheel} "$TMPDIR/${showerWheel.name}"
|
|
|
|
|
|
C1: deploy adelaide-baby-shower-app to ringtail k3s (#349)
## Summary
Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
### What's included
- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.
### Defense layers on the public surface
1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)
## Prerequisites for the user to do (NOT in this PR)
Halted on these per request — they touch shared/manual systems:
- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging
## Test plan
- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/349
2026-05-11 13:47:18 -07:00
|
|
|
"$TMPDIR/venv/bin/pip" install \
|
|
|
|
|
--no-cache-dir \
|
|
|
|
|
--index-url=https://pypi.ops.eblu.me/root/pypi/+simple/ \
|
C1: deploy shower v1.1.0 (phases + guest memories) (#354)
## Summary
Deploys `adelaide-baby-shower-app` **v1.1.0** to ringtail k3s.
### App changes (since v1.0.2)
- **Four-phase `ShowerState`** replaces the boolean `locked` flag — `pre_event` → `party` → `prizes_locked` → `event_locked` — with a backfill migration that maps `locked=True → pre_event`, `locked=False → party`.
- **Guest memories**: append-only photos + comments panel where guests can leave notes for the baby. Adds `GuestPhoto` + `GuestComment` models with file-extension validators and a max-size validator; new `shower.imaging` module for thumbnail generation.
- **Admin + QR polish**: configurable host link, fixed "View Site" URL, guest-facing QR copy improvements, contest tweaks.
Three Django migrations run automatically in the entrypoint against the SQLite PV:
- `0009_shower_phase`
- `0010_guest_memories`
- `0011_book_description`
No ConfigMap / env-var changes. The deploy uses `strategy: Recreate` with a single replica, so the old pod releases the data PVC before the new one mounts it and runs migrations.
### Container build changes
The v1.1.0 tag exposed a latent issue with the Forgejo PyPI install path:
- The recent commit [2d38418e](https://forge.eblu.me/eblume/blumeops/commit/2d38418e) closed the forge package leak at the Fly edge by blocking `/api/packages/*` publicly.
- Forgejo's PyPI simple index returns absolute file URLs hardcoded to its public `ROOT_URL` (`forge.eblu.me`), so pip-installing from the tailnet index URL still tries to download from `forge.eblu.me` → 403.
- Previous shower builds escaped this because their FOD outputs were already in the nix store; bumping to a new version forced a fresh pip run that hit the block.
Fix mirrors what we already do for the sdist: both wheel and sdist are pulled via direct `fetchurl` against `forge.ops.eblu.me`, then the wheel is copied to TMPDIR under its clean filename (nix store path's hash prefix breaks pip's wheel-filename parser) and handed to pip as a local path. The forge `--extra-index-url` is no longer needed.
FOD outputHash pinned to `sha256-kTNOswobtkgyQmmqbQM8XO4vvaGg57nCuuZGbNXb0NM=` from run 547. Image: `registry.ops.eblu.me/blumeops/shower:v1.1.0-444ff91-nix`.
### Adjacent finding (already handled)
The ringtail `gitea-runner-nix_container_builder` systemd unit was left `inactive` after the recent `provision-ringtail` (matches the known `sshd-restart-hangs-mux` lesson — the rebuild changed the unit's PATH closure + config.yaml, systemd stopped it, then the playbook hung before the activation could restart it). Manually started; the existing memory `lesson_provision_ringtail_ssh_hang.md` was extended to mention the runner as the canary service to check after provisions.
## Test plan
- [ ] `argocd app diff shower --revision shower-v1.1.0` — review the manifest change
- [ ] `argocd app set shower --revision shower-v1.1.0 && argocd app sync shower`
- [ ] `kubectl --context=k3s-ringtail logs -n shower deploy/shower` — confirm migrations 0009/0010/0011 applied, no errors
- [ ] Hit `https://shower.ops.eblu.me/` (tailnet) — splash page renders, phase indicator visible
- [ ] Hit `https://shower.ops.eblu.me/host/` — host console loads, phase dropdown shows the four states
- [ ] Hit `https://shower.eblu.me/` (public via Fly) — splash page still served
- [ ] After merge: `argocd app set shower --revision main && argocd app sync shower`
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/354
2026-05-11 20:08:03 -07:00
|
|
|
"$TMPDIR/${showerWheel.name}" \
|
C1: deploy adelaide-baby-shower-app to ringtail k3s (#349)
## Summary
Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
### What's included
- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.
### Defense layers on the public surface
1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)
## Prerequisites for the user to do (NOT in this PR)
Halted on these per request — they touch shared/manual systems:
- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging
## Test plan
- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/349
2026-05-11 13:47:18 -07:00
|
|
|
gunicorn
|
|
|
|
|
|
|
|
|
|
runHook postBuild
|
|
|
|
|
'';
|
|
|
|
|
|
|
|
|
|
installPhase = ''
|
|
|
|
|
runHook preInstall
|
|
|
|
|
|
|
|
|
|
mkdir -p $out/lib/python3.14 $out/bin
|
|
|
|
|
cp -r "$TMPDIR/venv/lib/python3.14/site-packages" $out/lib/python3.14/site-packages
|
|
|
|
|
|
|
|
|
|
for script in "$TMPDIR/venv/bin/"*; do
|
|
|
|
|
[ -f "$script" ] || continue
|
|
|
|
|
name=$(basename "$script")
|
|
|
|
|
case "$name" in
|
|
|
|
|
python*|pip*|activate*) continue ;;
|
|
|
|
|
esac
|
|
|
|
|
cp "$script" "$out/bin/$name"
|
|
|
|
|
chmod +x "$out/bin/$name"
|
|
|
|
|
done
|
|
|
|
|
|
|
|
|
|
# --- Strip Nix store references (FOD outputs must be self-contained) ---
|
|
|
|
|
# The wrapper derivation below restores them via autoPatchelfHook + a
|
|
|
|
|
# python wrapper that points pyc-less imports at the on-image python.
|
|
|
|
|
|
|
|
|
|
# Strip bytecode entirely — pyc files embed compile-time paths.
|
|
|
|
|
find $out -type f -name '*.pyc' -delete
|
|
|
|
|
find $out -type d -name '__pycache__' -exec rm -rf {} + 2>/dev/null || true
|
|
|
|
|
|
|
|
|
|
# Dynamically discover all nix store references and strip them. We
|
|
|
|
|
# don't have a static list because pip pulls in stdenv via Python's
|
|
|
|
|
# build env (gcc-lib, libstdc++, etc.) and the closure is opaque.
|
|
|
|
|
{ find $out -type f -print0 \
|
|
|
|
|
| xargs -0 grep -aohE '/nix/store/[a-z0-9]{32}-[^/"[:space:]]+' 2>/dev/null \
|
|
|
|
|
|| true; } | sort -u > $TMPDIR/store-refs.txt
|
|
|
|
|
echo "Found $(wc -l < $TMPDIR/store-refs.txt) unique store path references to strip"
|
|
|
|
|
|
|
|
|
|
refs_args=""
|
|
|
|
|
while IFS= read -r ref; do
|
|
|
|
|
refs_args="$refs_args -t $ref"
|
|
|
|
|
done < $TMPDIR/store-refs.txt
|
|
|
|
|
|
|
|
|
|
if [ -n "$refs_args" ]; then
|
|
|
|
|
find $out -type f -exec remove-references-to $refs_args {} + 2>/dev/null || true
|
|
|
|
|
fi
|
|
|
|
|
|
|
|
|
|
remaining=$({ find $out -type f -print0 | xargs -0 grep -cl '/nix/store/' 2>/dev/null || true; } | wc -l)
|
|
|
|
|
echo "Files with remaining store references: $remaining"
|
|
|
|
|
|
|
|
|
|
runHook postInstall
|
|
|
|
|
'';
|
|
|
|
|
|
|
|
|
|
outputHashMode = "recursive";
|
|
|
|
|
outputHashAlgo = "sha256";
|
|
|
|
|
# Pinned dep closure — reproducible until version bumps. To recompute,
|
|
|
|
|
# set to pkgs.lib.fakeHash and read the failure.
|
2026-05-13 20:40:22 -07:00
|
|
|
outputHash = "sha256-HTTmAldIijG03pYZNyO72LBNPCrjmyJQKgW+gU9NplI=";
|
C1: deploy adelaide-baby-shower-app to ringtail k3s (#349)
## Summary
Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
### What's included
- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.
### Defense layers on the public surface
1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)
## Prerequisites for the user to do (NOT in this PR)
Halted on these per request — they touch shared/manual systems:
- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging
## Test plan
- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/349
2026-05-11 13:47:18 -07:00
|
|
|
|
|
|
|
|
dontFixup = true;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
# Non-FOD wrapper: re-applies RPATHs to pre-built .so files (pillow,
|
|
|
|
|
# scipy) so they find libstdc++ / libz / etc. at runtime. autoPatchelfHook
|
|
|
|
|
# discovers needed libraries from buildInputs.
|
|
|
|
|
pyDeps = pkgs.stdenv.mkDerivation {
|
|
|
|
|
pname = "shower-python-deps";
|
|
|
|
|
inherit version;
|
|
|
|
|
|
|
|
|
|
dontUnpack = true;
|
|
|
|
|
|
|
|
|
|
nativeBuildInputs = [ pkgs.autoPatchelfHook ];
|
|
|
|
|
|
|
|
|
|
buildInputs = with pkgs; [
|
|
|
|
|
python
|
|
|
|
|
stdenv.cc.cc.lib # libstdc++, libgcc_s
|
|
|
|
|
zlib
|
|
|
|
|
libjpeg
|
|
|
|
|
libwebp
|
|
|
|
|
libtiff
|
|
|
|
|
openjpeg
|
|
|
|
|
lcms2
|
|
|
|
|
freetype
|
|
|
|
|
];
|
|
|
|
|
|
|
|
|
|
installPhase = ''
|
|
|
|
|
cp -r ${pyDepsFOD} $out
|
|
|
|
|
chmod -R u+w $out
|
|
|
|
|
'';
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
sitePackages = "${pyDeps}/lib/python3.14/site-packages";
|
|
|
|
|
|
|
|
|
|
# Settings shim — config/settings.py's `BASE_DIR = parent.parent` would
|
|
|
|
|
# otherwise resolve to site-packages, scattering db.sqlite3 / media /
|
|
|
|
|
# staticfiles into the venv. Pin them to /app/{data,media,data/staticfiles}.
|
|
|
|
|
localSettings = pkgs.writeText "local_settings.py" ''
|
|
|
|
|
from pathlib import Path
|
|
|
|
|
|
|
|
|
|
from config.settings import * # noqa: F401,F403
|
|
|
|
|
|
|
|
|
|
DATABASES["default"]["NAME"] = "/app/data/db.sqlite3"
|
|
|
|
|
MEDIA_ROOT = "/app/media"
|
|
|
|
|
STATIC_ROOT = "/app/data/staticfiles"
|
|
|
|
|
# /app/static comes from the repo-root static/ subtree of the sdist
|
|
|
|
|
# (see default.nix staticAssets). Added because the wheel doesn't
|
|
|
|
|
# ship vendored Sortable/cropper assets.
|
|
|
|
|
STATICFILES_DIRS = [Path("/app/static")]
|
|
|
|
|
'';
|
|
|
|
|
|
|
|
|
|
# PYTHONPATH, DJANGO_SETTINGS_MODULE, PATH, and HOME live in the image's
|
|
|
|
|
# `Env` block below — that way `kubectl exec deploy/shower -- python -m
|
|
|
|
|
# django <subcommand>` Just Works without an inline `env` ceremony.
|
|
|
|
|
# The entrypoint just changes directory and runs the boot sequence.
|
|
|
|
|
entrypoint = pkgs.writeShellScript "shower-entrypoint" ''
|
|
|
|
|
set -eu
|
|
|
|
|
|
|
|
|
|
cd /app
|
|
|
|
|
|
|
|
|
|
mkdir -p /app/data /app/media
|
|
|
|
|
|
|
|
|
|
echo "shower: running migrations"
|
|
|
|
|
python -m django migrate --noinput
|
|
|
|
|
|
|
|
|
|
echo "shower: collecting static files"
|
|
|
|
|
python -m django collectstatic --noinput --clear
|
|
|
|
|
|
|
|
|
|
echo "shower: starting gunicorn"
|
|
|
|
|
exec gunicorn \
|
|
|
|
|
--bind 0.0.0.0:8000 \
|
|
|
|
|
--workers 2 \
|
|
|
|
|
--forwarded-allow-ips='*' \
|
|
|
|
|
config.wsgi:application
|
|
|
|
|
'';
|
|
|
|
|
in
|
|
|
|
|
|
|
|
|
|
pkgs.dockerTools.buildLayeredImage {
|
|
|
|
|
name = "blumeops/shower";
|
|
|
|
|
contents = [
|
|
|
|
|
python
|
|
|
|
|
pyDeps
|
|
|
|
|
pkgs.cacert
|
|
|
|
|
pkgs.tzdata
|
|
|
|
|
pkgs.bashInteractive
|
|
|
|
|
pkgs.coreutils
|
|
|
|
|
];
|
|
|
|
|
|
|
|
|
|
extraCommands = ''
|
|
|
|
|
mkdir -p app/data app/media tmp
|
|
|
|
|
chmod 1777 tmp
|
|
|
|
|
cp ${localSettings} app/local_settings.py
|
|
|
|
|
cp -r ${staticAssets} app/static
|
|
|
|
|
chmod -R u+w app/static
|
|
|
|
|
'';
|
|
|
|
|
|
|
|
|
|
fakeRootCommands = ''
|
|
|
|
|
chown -R 1000:1000 app
|
|
|
|
|
'';
|
|
|
|
|
enableFakechroot = true;
|
|
|
|
|
|
|
|
|
|
config = {
|
|
|
|
|
Entrypoint = [ "${entrypoint}" ];
|
|
|
|
|
Env = [
|
|
|
|
|
"SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt"
|
|
|
|
|
"TZDIR=${pkgs.tzdata}/share/zoneinfo"
|
|
|
|
|
"TZ=America/Los_Angeles"
|
|
|
|
|
"TMPDIR=/tmp"
|
|
|
|
|
"LANG=C.UTF-8"
|
|
|
|
|
"LC_ALL=C.UTF-8"
|
|
|
|
|
"PYTHONDONTWRITEBYTECODE=1"
|
|
|
|
|
"HOME=/app/data"
|
|
|
|
|
"PATH=${pyDeps}/bin:${python}/bin:/bin"
|
|
|
|
|
# /app first so local_settings.py is importable; sitePackages second so
|
|
|
|
|
# django, gunicorn, etc. resolve. Inherited by entrypoint + any
|
|
|
|
|
# `kubectl exec` so manual django subcommands work without ceremony.
|
|
|
|
|
"PYTHONPATH=/app:${sitePackages}"
|
|
|
|
|
"DJANGO_SETTINGS_MODULE=local_settings"
|
|
|
|
|
];
|
|
|
|
|
ExposedPorts = {
|
|
|
|
|
"8000/tcp" = { };
|
|
|
|
|
};
|
|
|
|
|
User = "1000";
|
|
|
|
|
WorkingDir = "/app";
|
|
|
|
|
};
|
|
|
|
|
}
|