C1: deploy adelaide-baby-shower-app to ringtail k3s #349
Summary
Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for:
`shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index; source lives at adelaide-baby-shower-app.

What's included
- `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG` / `DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- Fly proxy (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- fail2ban: filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5` / `findtime=600` / `bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- Caddy route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- Pulumi Gandi CNAME: `shower.eblu.me` → `blumeops-proxy.fly.dev.`
- Grafana APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- `containers/shower/default.nix`: `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME` / `MEDIA_ROOT` / `STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- Runbook `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.

Defense layers on the public surface
- `$shower_banned` (per-service deny list)
- `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
- `/admin/` block (returns 403 for anything that isn't login/logout)

Prerequisites for the user to do (NOT in this PR)
Halted on these per request — they touch shared/manual systems:
- NFS share on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- 1Password item `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`); do NOT reuse anything that has lived in git
- `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- `mise run dns-up` after merge
- `fly certs add shower.eblu.me -a blumeops-proxy`
- `mise run provision-indri -- --tags caddy`
- `mise run fly-deploy`
- `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging

Test plan
- `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- `mise run services-check` passes

Adds the Adelaide / Heidi / Addie baby shower app (a Django guest splash, raffle picker, and prize-assignment console) on ringtail k3s. Public landing at shower.eblu.me (via fly proxy), tailnet admin at shower.ops.eblu.me.

App source: forge.eblu.me/eblume/adelaide-baby-shower-app, wheel-published to the Forgejo Packages PyPI index.

Manifests under argocd/manifests/shower/: NFS-backed PVC for /app/media, local-path PVC for SQLite, ExternalSecret pulling DJANGO_SECRET_KEY from 1Password (item "Shower (blumeops)"), Tailscale ProxyGroup ingress.

Defense-in-depth for the public surface:
- /admin/ blocked at the fly edge except /admin/login/ and /admin/logout/
- shower_auth rate limit on the login path
- new fail2ban filter+jail with a per-service shower-deny.conf (nginx-deny action generalized to accept nginx_deny_file)
- django-axes (5 / 1h) keyed on (username, ip_address)

Plus: Caddy route on indri, Pulumi gandi CNAME, Grafana APM dashboard mirroring docs-apm.json, runbook at how-to/operations/shower-app.md, and a service-versions entry.

X-Clacks-Overhead set on the new server block. GNU Terry Pratchett.

Build: containers/shower/default.nix uses dockerTools to ship a nixpkgs Python plus a startup wrapper that installs the wheel into /app/data/.venv on first boot and execs gunicorn. Lets the wheel come from forge PyPI without pinning hashes for every transitive dep.
Prerequisites tracked in the runbook (not yet executed):
- NFS share sifaka:/volume1/shower (manual Synology step)
- 1Password item "Shower (blumeops)" with secret-key field
- container build via `mise run container-build-and-release shower`
- Pulumi dns-up after merge
- fly certs add shower.eblu.me

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three follow-ups on the shower deployment branch:

1. containers/shower/default.nix now uses buildPythonPackage to install the adelaide-baby-shower-app wheel + its deps at nix build time. The wheel comes from the forge PyPI index with a pinned SRI hash. The entrypoint no longer does pip-at-boot; it just runs migrations, collectstatic, and execs gunicorn.
2. ansible/roles/borgmatic/defaults/main.yml:
   - Adds shower to borgmatic_k8s_sqlite_dumps (context k3s-ringtail) so /app/data/db.sqlite3 is dumped via kubectl exec on every run.
   - Adds /Volumes/shower (sifaka SMB mount on indri) to borgmatic_source_directories so prize-photo media gets archived.
3. NFS share docs corrected to match the real on-sifaka pattern: exports allowlist 192.168.1.0/24 + 100.64.0.0/10 with all_squash to admin (matching frigate/paperless/etc.), not "Squash=No mapping". The pod's runAsUser doesn't need to match an on-disk uid because all_squash rewrites every write to admin:users.

Also adds a missing service-versions entry for the tailscale container introduced in PR #347, a pre-existing gap surfaced by the container-version-check hook on this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The buildPythonPackage approach with `propagatedBuildInputs = [ python.pkgs.django ... ]` doesn't work:

1. nixpkgs python314Packages.django still aliases to Django 4.2 LTS, which doesn't support Python 3.14.
2. django-axes from nixpkgs pulls selenium + browser fonts into its check phase, and the nix sandbox can't provide those (fontconfig errors, then build dep tree collapses).
Switching to authentik's FOD pattern instead: a single fixed-output derivation that pip-installs the adelaide-baby-shower-app wheel + every transitive dep from forge PyPI into a target dir. FODs get network access in exchange for a pinned output hash, so the closure stays reproducible. outputHash is set to fakeHash for the first build; the runner will print the real hash on failure, and a follow-up commit will pin it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@ -0,0 +6,4 @@
data:
  DJANGO_DEBUG: "0"
  # Admin lives behind the tailnet; the public proxy blocks /admin/ except
  # /admin/login/ and /admin/logout/. /host/'s "Django admin" link follows

Hmm, can you please remind me why /admin/login and /admin/logout need to be accessible on WAN? Can't we just forward any logins/logouts to the tailnet hostname as well, and thus not expose admin login on WAN at all?
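To make the two options in this review thread concrete, here is a minimal Python model of the edge decision, not the actual nginx config. The function name `edge_status`, the `allow_login_on_wan` flag, and the plain prefix matching are all assumptions for illustration; the real `location` rules in `fly/` may match differently.

```python
from http import HTTPStatus
from typing import Optional

# Paths the original design carved out of the /admin/ block (from the PR text).
ADMIN_CARVE_OUTS = ("/admin/login/", "/admin/logout/")


def edge_status(path: str, allow_login_on_wan: bool) -> Optional[HTTPStatus]:
    """Return the status the public edge answers with, or None if the
    request would be proxied upstream to the tailnet host.

    Hypothetical model: prefix matching stands in for nginx locations.
    """
    if path.startswith("/admin/"):
        # Original design: login/logout pass through (rate-limited upstream).
        if allow_login_on_wan and path.startswith(ADMIN_CARVE_OUTS):
            return None
        # Everything else under /admin/ is refused at the edge.
        return HTTPStatus.FORBIDDEN
    # Guest-facing pages are proxied as usual.
    return None
```

With `allow_login_on_wan=False` the model matches the reviewer's suggestion: no admin credential surface is reachable from WAN, and logins happen only on the tailnet hostname.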
@ -0,0 +1,224 @@
---
title: Shower App on Ringtail

This is a good how-to article, but let's also have a reference page for this app too - aim for a 30s read time, just basic facts and links to other cards.
PR review caught that we didn't need an admin login surface on WAN. App v1.0.1 adds DJANGO_PUBLIC_URL_BASE so QR codes generated from /host/ (now tailnet-only) still point at shower.eblu.me for guest phones. That closes the loop and lets us strip the WAN admin surface entirely.

Container:
- bump version to 1.0.1
- outputHash → fakeHash (build will print the real one)
- entrypoint still does migrate + collectstatic before gunicorn; the app is small enough that auto-migration is fine

Manifests:
- configmap adds DJANGO_PUBLIC_URL_BASE=https://shower.eblu.me

Fly nginx (shower.eblu.me):
- drop the /admin/(login|logout) carveout
- 403 anything under /admin/ AND /host/ with a "tailnet only" pointer
- drop the shower_auth limit_req zone and $shower_banned geo
- drop the shower-admin-login fail2ban filter + jail
- drop the shower-deny.conf touch from start.sh

Docs:
- rename how-to docs/how-to/operations/shower-app.md → shower-on-ringtail.md (mirrors cv-on-indri / docs-on-indri)
- new reference card docs/reference/services/shower-app.md per PR review comment 2 (≈30s read; quick facts + cross-links)
- rewrite Defense layers section: collapses to general rate limit + django-axes on the tailnet-side login (the only credential surface)
- rewrite the .infra.md changelog fragment to match
- add a 'Create the admin user' step (kubectl exec createsuperuser) so first-time deploys aren't locked out

The nginx-deny action's per-jail `nginx_deny_file` generalization stays: harmless future-proofing for the next public service.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Doc said "Store the auth key in 1Password as well for the `fly-setup` mise task" right next to the description of fly-setup, which reads the key from Pulumi state, not 1Password. No code path anywhere reads this key from 1P. The instruction is vestigial from an earlier design and confused us during the v1.0.1 rotation when the flyio-proxy-key expired.
Rewrite the section to:
- point at `mise run fly-setup` as the canonical path
- state explicitly that Pulumi state is the only source of truth
- document the rotation recipe (tailnet-up --replace=<urn> + fly-setup + fly-deploy) for the next time this 90-day key lapses

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

App v1.0.2 ships WhiteNoise for /static/ and /media/, so the blumeops-side workaround is no longer needed:
- containers/shower/default.nix: drop the WhiteNoise pip dep + the middleware-injection block from local_settings. The shim is back to just path overrides (DATABASES.NAME, MEDIA_ROOT, STATIC_ROOT).
- version → 1.0.2, outputHash → fakeHash for re-pinning.
- service-versions.yaml mirrored.

fly/nginx.conf: cache /static/ (1y) and /media/ (1d) per location for shower.eblu.me. /static/ filenames are content-hashed thanks to CompressedManifestStaticFilesStorage, so a year is safe and invalidation is automatic on the next collectstatic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

forge.eblu.me's package registry (/api/packages/* and /api/v1/packages/*) served anonymous reads to the world even for private-repo releases. Forgejo's per-user visibility treats packages as world-readable when the owner's Visibility is Public, and we keep eblume Public so the profile page stays open. The sdist downloads include full source trees of private repos; that's the leak.

The fix is to keep the user public but block /api/packages/* and /api/v1/packages/* at the proxy edge. forge.ops.eblu.me (tailnet) is untouched, so CI workflows + gilbert's uv + the nix-container-builder still work; they just need to use the tailnet hostname.
Three consumers updated to forge.ops.eblu.me:
- containers/shower/default.nix (the FOD pip --extra-index-url)
- ansible/roles/cv/defaults/main.yml (cv_release_url for generic package)
- chezmoi-tracked fish dotfiles (devpi.fish + conf.d/pypi.fish), edited in chezmoi source; user will apply separately

The blumeops repo had no other forge-pypi consumers (audited: workers, runner-job-image, ansible roles, container builds). Doc references in changelog fragments + comments left as-is; they describe history.

The proper long-term fix is to move private packages to a Limited-visibility Forgejo org instead of relying on a proxy-side block (see queued Todoist for the migration plan). Edge block stays as defense in depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The wheel ships config/ and shower/ only (per pyproject hatchling config), leaving the repo's top-level static/ dir (Sortable.min.js, cropper.min.js, cropper.min.css, prize-placeholder.svg) behind. At runtime, host_dashboard.html's {% static 'css/cropper.min.css' %} hits the manifest, CompressedManifestStaticFilesStorage raises ValueError on the missing entry, and /host/ returns 500.

Fix on the deploy side: fetch the sdist via fetchurl (pinned SRI hash from forge PyPI), extract its top-level static/ subtree into a non-FOD derivation, and lay it down at /app/static in the image. The local_settings shim adds /app/static to STATICFILES_DIRS so collectstatic at boot picks the vendored assets up alongside the Django admin's own static files.

Sdist URL is forge.ops.eblu.me/api/packages/... (tailnet), which matches the just-landed edge block on forge.eblu.me/api/packages/*. The nix-container-builder runner on ringtail is on the tailnet, so the FOD fetch works.

App doesn't change. v1.0.3 is no longer needed for the static gap: the wheel's "packages = [config, shower]" pattern stays as-is, and we treat the sdist as the canonical bundle for the assets the wheel intentionally omits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
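
The path overrides described for the local_settings shim (absolute DATABASES.NAME, MEDIA_ROOT, STATIC_ROOT, plus /app/static in STATICFILES_DIRS) could look roughly like the Python sketch below. It is not the shim from containers/shower/default.nix: `pin_paths` is a hypothetical helper that works on a plain dict so the example runs without Django, and the STATIC_ROOT location is an assumption because the PR never states it. /app/data/db.sqlite3, /app/media, and /app/static are taken from the commit messages.

```python
import copy
from pathlib import PurePosixPath

DATA_DIR = PurePosixPath("/app/data")           # local-path PVC, holds SQLite
MEDIA_DIR = PurePosixPath("/app/media")         # NFS-backed PVC
VENDORED_STATIC = PurePosixPath("/app/static")  # assets extracted from the sdist
# Assumption: the PR does not say where STATIC_ROOT points.
STATIC_ROOT_DIR = DATA_DIR / "staticfiles"


def pin_paths(base: dict) -> dict:
    """Return a copy of `base` with storage paths pinned to absolute
    locations, so nothing resolves relative to site-packages."""
    settings = copy.deepcopy(base)
    # Override only NAME; the engine and other DB options are left untouched.
    default_db = settings.setdefault("DATABASES", {}).setdefault("default", {})
    default_db["NAME"] = str(DATA_DIR / "db.sqlite3")
    settings["MEDIA_ROOT"] = str(MEDIA_DIR)
    settings["STATIC_ROOT"] = str(STATIC_ROOT_DIR)
    # Append the vendored assets dir once so collectstatic can find it.
    dirs = [str(d) for d in settings.get("STATICFILES_DIRS", [])]
    if str(VENDORED_STATIC) not in dirs:
        dirs.append(str(VENDORED_STATIC))
    settings["STATICFILES_DIRS"] = dirs
    return settings
```

In a real settings module the same assignments would presumably be made at module level after importing the app's base settings; how the shim is wired into the image is not shown in this PR.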