The wheel ships config/ and shower/ only (per pyproject hatchling
config), leaving the repo's top-level static/ dir — Sortable.min.js,
cropper.min.js, cropper.min.css, prize-placeholder.svg — behind. At
runtime, host_dashboard.html's {% static 'css/cropper.min.css' %}
hits the manifest, CompressedManifestStaticFilesStorage raises
ValueError on the missing entry, /host/ returns 500.
Fix on the deploy side: fetch the sdist via fetchurl (pinned SRI hash
from forge PyPI), extract its top-level static/ subtree into a
non-FOD derivation, lay it down at /app/static in the image. The
local_settings shim adds /app/static to STATICFILES_DIRS so
collectstatic at boot picks the vendored assets up alongside the
Django admin's own static files.
Sdist URL is forge.ops.eblu.me/api/packages/... (tailnet) — matches
the just-landed edge block on forge.eblu.me/api/packages/*. The
nix-container-builder runner on ringtail is on the tailnet, so the
FOD fetch works.
App doesn't change. v1.0.3 is no longer needed for the static gap —
the wheel's "packages = [config, shower]" pattern stays as-is, and we
treat the sdist as the canonical bundle for the assets the wheel
intentionally omits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
forge.eblu.me's package registry (/api/packages/* and /api/v1/packages/*)
served anonymous reads to the world even for private-repo releases —
Forgejo's per-user visibility treats packages as world-readable when
the owner's Visibility is Public, and we keep eblume Public so the
profile page stays open. The sdist downloads include full source
trees of private repos; that's the leak.
The fix is to keep the user public but block /api/packages/* and
/api/v1/packages/* at the proxy edge. forge.ops.eblu.me (tailnet) is
untouched, so CI workflows + gilbert's uv + the nix-container-builder
still work — they just need to use the tailnet hostname.
Three consumers updated to forge.ops.eblu.me:
- containers/shower/default.nix (the FOD pip --extra-index-url)
- ansible/roles/cv/defaults/main.yml (cv_release_url for generic package)
- chezmoi-tracked fish dotfiles (devpi.fish + conf.d/pypi.fish) —
edited in chezmoi source, user will apply separately
The blumeops repo had no other forge-pypi consumers (audited: workers,
runner-job-image, ansible roles, container builds). Doc references in
changelog fragments + comments left as-is — they describe history.
The proper long-term fix is to move private packages to a Limited-
visibility Forgejo org instead of relying on a proxy-side block (see
queued Todoist for the migration plan). Edge block stays as
defense in depth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
App v1.0.2 ships WhiteNoise for /static/ and /media/, so the
blumeops-side workaround is no longer needed:
- containers/shower/default.nix: drop the WhiteNoise pip dep + the
middleware-injection block from local_settings. The shim is back
to just path overrides (DATABASES.NAME, MEDIA_ROOT, STATIC_ROOT).
- version → 1.0.2, outputHash → fakeHash for re-pinning.
- service-versions.yaml mirrored.
fly/nginx.conf: cache /static/ (1y) and /media/ (1d) per location for
shower.eblu.me. /static/ filenames are content-hashed thanks to
CompressedManifestStaticFilesStorage so a year is safe and invalidation
is automatic on the next collectstatic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Doc said "Store the auth key in 1Password as well for the \`fly-setup\`
mise task" right next to the description of fly-setup, which reads
the key from Pulumi state, not 1Password. No code path anywhere reads
this key from 1P — the instruction is vestigial from an earlier
design and confused us during the v1.0.1 rotation when the
flyio-proxy-key expired.
Rewrite the section to:
- point at \`mise run fly-setup\` as the canonical path
- state explicitly that Pulumi state is the only source of truth
- document the rotation recipe (tailnet-up --replace=<urn> +
fly-setup + fly-deploy) for the next time this 90-day key lapses
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two complementary fixes for the deploy that just landed:
1. Pod was 0/1 Running because the readiness probe sends
`Host: shower.ops.eblu.me` and the app's hardcoded ALLOWED_HOSTS
only includes `shower.eblu.me`. settings.py exposes a
DJANGO_ALLOWED_HOSTS env-var extras hook for exactly this case —
wired into the configmap.
2. `kubectl exec deploy/shower -- python -m django <cmd>` returned
"No module named django" because PYTHONPATH lived only inside the
entrypoint script. Moved PYTHONPATH, DJANGO_SETTINGS_MODULE, PATH,
and HOME into the image's Env block so exec'd shells inherit them.
The entrypoint now just runs the boot sequence; the exports are
redundant (image Env covers them) and gone.
FOD inputs are unchanged so outputHash stays valid; no fakeHash dance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR review caught that we didn't need an admin login surface on WAN.
App v1.0.1 adds DJANGO_PUBLIC_URL_BASE so QR codes generated from
/host/ (now tailnet-only) still point at shower.eblu.me for guest
phones — that closes the loop and lets us strip the WAN admin surface
entirely.
Container:
- bump version to 1.0.1
- outputHash → fakeHash (build will print the real one)
- entrypoint still does migrate + collectstatic before gunicorn —
the app is small enough that auto-migration is fine
Manifests:
- configmap adds DJANGO_PUBLIC_URL_BASE=https://shower.eblu.me
Fly nginx (shower.eblu.me):
- drop the /admin/(login|logout) carveout
- 403 anything under /admin/ AND /host/ with a "tailnet only" pointer
- drop the shower_auth limit_req zone and \$shower_banned geo
- drop the shower-admin-login fail2ban filter + jail
- drop the shower-deny.conf touch from start.sh
Docs:
- rename how-to docs/how-to/operations/shower-app.md →
shower-on-ringtail.md (mirrors cv-on-indri / docs-on-indri)
- new reference card docs/reference/services/shower-app.md per PR
review comment 2 (≈30s read; quick facts + cross-links)
- rewrite Defense layers section: collapses to general rate limit +
django-axes on the tailnet-side login (the only credential surface)
- rewrite the .infra.md changelog fragment to match
- add a 'Create the admin user' step (kubectl exec createsuperuser)
so first-time deploys aren't locked out
The nginx-deny action's per-jail \`nginx_deny_file\` generalization
stays — harmless future-proofing for the next public service.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets a re-run of `mise run fly-setup` (e.g. after a fly-app rebuild or
when bootstrapping fresh) re-issue the cert without remembering the
ad-hoc `fly certs add` we did during this deployment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build 536 finished cleanly with the strip-refs FOD + autopatchelf
wrapper. The [branch] tag is fine for ArgoCD branch-revision testing;
a follow-up C0 will rebuild from main and re-pin to the [main] SHA tag
after merge, per docs/how-to/deployment/build-container-image.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run 534 failed with 'fixed-output derivations must not reference store
paths: ... gcc-14.3.0-lib' because pip-installed wheels pulled stdenv
into the venv (Python's setup, gcc-lib runtime references).
Adapts authentik's two-stage pattern:
- pyDepsFOD: pip-installs into the venv, then strips every nix store
ref it can find (find+remove-references-to). Output is fully
self-contained — pinned by outputHash.
- pyDeps (non-FOD wrapper): copies the FOD output and runs
autoPatchelfHook against runtime buildInputs (libstdc++, zlib, image
libs for pillow). This restores RPATHs on the .so files that pillow
and scipy ship, against the real on-image library locations.
outputHash still fakeHash — next build prints the real one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The buildPythonPackage approach with `propagatedBuildInputs = [ python.pkgs.django ... ]` doesn't work:
1. nixpkgs python314Packages.django still aliases to Django 4.2 LTS,
which doesn't support Python 3.14.
2. django-axes from nixpkgs pulls selenium + browser fonts into its
check phase, and the nix sandbox can't provide those (fontconfig
errors, then build dep tree collapses).
Switching to authentik's FOD pattern instead: a single fixed-output
derivation that pip-installs the adelaide-baby-shower-app wheel + every
transitive dep from forge PyPI into a target dir. FODs get network
access in exchange for a pinned output hash, so the closure stays
reproducible.
outputHash is set to fakeHash for the first build — the runner will
print the real hash on failure; a follow-up commit will pin it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups on the shower deployment branch:
1. containers/shower/default.nix now uses buildPythonPackage to install
the adelaide-baby-shower-app wheel + its deps at nix build time. The
wheel comes from the forge PyPI index with a pinned SRI hash. The
entrypoint no longer does pip-at-boot — it just runs migrations,
collectstatic, and execs gunicorn.
2. ansible/roles/borgmatic/defaults/main.yml:
- Adds shower to borgmatic_k8s_sqlite_dumps (context k3s-ringtail)
so /app/data/db.sqlite3 is dumped via kubectl exec on every run.
- Adds /Volumes/shower (sifaka SMB mount on indri) to
borgmatic_source_directories so prize-photo media gets archived.
3. NFS share docs corrected to match the real on-sifaka pattern:
exports allowlist 192.168.1.0/24 + 100.64.0.0/10 with all_squash to
admin (matching frigate/paperless/etc.), not "Squash=No mapping".
The pod's runAsUser doesn't need to match an on-disk uid because
all_squash rewrites every write to admin:users.
Also adds a missing service-versions entry for the tailscale container
introduced in PR #347 — pre-existing gap surfaced by the
container-version-check hook on this commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the Adelaide / Heidi / Addie baby shower app — a Django guest
splash, raffle picker, and prize-assignment console — on ringtail k3s.
Public landing at shower.eblu.me (via fly proxy), tailnet admin at
shower.ops.eblu.me. App source: forge.eblu.me/eblume/adelaide-baby-shower-app,
wheel-published to the Forgejo Packages PyPI index.
Manifests under argocd/manifests/shower/: NFS-backed PVC for /app/media,
local-path PVC for SQLite, ExternalSecret pulling DJANGO_SECRET_KEY from
1Password (item "Shower (blumeops)"), Tailscale ProxyGroup ingress.
Defense-in-depth for the public surface:
- /admin/ blocked at the fly edge except /admin/login/ and /admin/logout/
- shower_auth rate limit on the login path
- new fail2ban filter+jail with a per-service shower-deny.conf
(nginx-deny action generalized to accept nginx_deny_file)
- django-axes (5 / 1h) keyed on (username, ip_address)
Plus: Caddy route on indri, Pulumi gandi CNAME, Grafana APM dashboard
mirroring docs-apm.json, runbook at how-to/operations/shower-app.md,
and a service-versions entry. X-Clacks-Overhead set on the new server
block — GNU Terry Pratchett.
Build: containers/shower/default.nix uses dockerTools to ship a
nixpkgs Python plus a startup wrapper that installs the wheel into
/app/data/.venv on first boot and execs gunicorn. Lets the wheel come
from forge PyPI without pinning hashes for every transitive dep.
Prerequisites tracked in the runbook (not yet executed):
- NFS share sifaka:/volume1/shower (manual Synology step)
- 1Password item "Shower (blumeops)" with secret-key field
- container build via `mise run container-build-and-release shower`
- Pulumi dns-up after merge
- fly certs add shower.eblu.me
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous Dockerfile chowned /app/config to 1000:1000 so the runtime
user could seed missing skeleton configs (e.g. proxmox.yaml) and write
/app/config/logs. The nix derivation didn't replicate that, so the new
amd64 image crashed with EACCES on cold start (fixed-forward — caught
during ringtail cutover, ArgoCD #348).
Add fakeRootCommands to dockerTools to create /app and /app/config and
chown them at build time. The deployment's ConfigMap subPath mounts
leave the parent directory as image filesystem, so its ownership has to
be set at build time, not at runtime.
Repoint the ArgoCD Application destination from minikube to ringtail and
bump the image tag to the new amd64 nix-built v1.11.0-b87f62e-nix.
Rework services.yaml for the autodiscovery shift: 11 services that
previously auto-populated via minikube Ingress annotations (ArgoCD,
Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus, Navidrome,
Paperless, TeslaMate, Transmission) become explicit static entries with
their widget configs preserved. Conversely, the ringtail services that
will now auto-populate (Frigate/NVR, Authentik, Ntfy) are removed from
the static list to avoid duplicates; Ollama becomes newly visible.
Add a Content group for Immich/Kiwix/Miniflux which previously lived
under the autodiscovered "Content" group from annotations.
Replace Dockerfile (arm64-only, indri-built) with a nix derivation
adapted from nixpkgs pkgs/by-name/ho/homepage-dashboard. Built via the
nix-container-builder runner on ringtail, producing an amd64 image
suitable for k3s.
Includes the upstream Next.js file-system-cache patch to avoid
prerender cache write failures on a read-only nix store path
(nixpkgs issues #328621 and #458494).
Pinned to v1.11.0 (current production version).
Routine post-squash-merge cleanup. Bumps the ProxyClass image tag from
the now-orphaned PR branch SHA (67af7a8) to the merge commit SHA
(0108b68) so the deployed image stays traceable after branch cleanup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
Adds the first cut of a local nix build for `docker.io/tailscale/tailscale` and rewires only the ringtail tailscale-operator overlay to use it. Indri's overlay continues pulling upstream — minikube on indri is being decommissioned in favor of ringtail's k3s, so investing in dual-cluster routing here would be wasted churn.
## Changes
- `containers/tailscale/default.nix` — `buildGoModule` over `cmd/tailscale`, `cmd/tailscaled`, `cmd/containerboot`; packaged via `dockerTools.buildLayeredImage` with `cacert`, `iptables` (legacy symlink to match upstream Synology compat), `iproute2`, `tzdata`, `busybox`.
- `argocd/manifests/tailscale-operator-ringtail/kustomization.yaml` — kustomize `images:` rewrite swapping `docker.io/tailscale/tailscale` → `registry.ops.eblu.me/blumeops/tailscale:v1.94.2-67af7a8-nix`.
- `docs/changelog.d/mirror-tailscale-container.infra.md` — fragment.
## Pin rationale
v1.94.2 matches `service-versions.yaml:96` and the current ProxyClass exactly — this PR is "make it local," not "upgrade tailscale." Version bumps come as follow-up C0/C1 changes once we decide to test newer (v1.96.x had a Fly-side MagicDNS regression; v1.98.0 is current upstream stable).
## Test plan
- [x] Image built successfully on ringtail nix-container-builder (run #528).
- [x] Image visible in registry: `registry.ops.eblu.me/blumeops/tailscale:v1.94.2-67af7a8-nix`.
- [ ] Deploy from branch: `argocd app set tailscale-operator-ringtail --revision mirror-tailscale-container && argocd app sync tailscale-operator-ringtail`.
- [ ] Verify proxy pods restart with new image and existing tailnet ingresses (e.g., authentik, immich, tempo) keep resolving.
- [ ] After merge: rebuild on main SHA, update kustomization, run `services-check`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #347
Runtime grafana pod matches the manifest and the CC's claim; bumped
last-reviewed. Noted that retiring init-chown-data in favor of fsGroup
alone should wait until grafana migrates to ringtail's k3s, since the
storage backend will change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the CC vs NA vs RA distinction surfaced during the 2026-05-03
weekly compliance review (CVE-2026-31789), and the image-scan mutelist
gap that blocks acting on it. Links the new article from the
review-compensating-controls how-to so it isn't orphaned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Security fixes only (TLS verification on metrics client, CORS
Allow-Credentials suppression on wildcard origin, manifest/API-key
body-size limits, dependabot bumps). No config changes required;
re-built from source on indri and bounced launchagent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Combine mint+store into a single command with both fish and bash
forms (the doc previously only showed manual paste). Document the
1Password CLI "Password item requires ps value" validator error and
the placeholder-password workaround for Password-category items with
empty primary password fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routine post-merge follow-up after #346. Branch SHA tag (946fa75) replaced
with the main-SHA-built tag (fabca04) so paperless and immich reference an
image traceable to a commit on main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
- Add native Dagger build of valkey 8.1.6-r0 on Alpine 3.22 at `containers/valkey/`
- Swap paperless redis sidecar and immich-valkey from `docker.io/valkey/valkey:8.1-alpine` to `registry.ops.eblu.me/blumeops/valkey:v8.1.6-r0-946fa75`
- Resolves the DR-2026-04 TODO in paperless kustomization about multi-arch redis
## Why
Move toward fully locally-built containers for supply chain control. Paperless and immich both pulled the same upstream tag — one mirror serves both. Authentik's nix-built Redis stays separate (different image entirely).
## Risk
Low. Both sidecars are stateless caches:
- paperless redis: no volumeMount (in-pod localhost, pure memory)
- immich-valkey: `emptyDir` (cache only)
Pod restart rebuilds the cache. Smoke-tested locally (PING/SET/GET roundtrip on `valkey 8.1.6` with `--bind 0.0.0.0 --protected-mode no`).
## Test plan
- [ ] After merge: `mise run container-build-and-release valkey` to rebuild with main SHA
- [ ] Update kustomizations to the `[main]` SHA tag (C0 follow-up)
- [ ] `argocd app sync paperless` and `argocd app sync immich`
- [ ] Verify pods come up healthy (paperless OCR queue functional, immich job queue functional)
- [ ] `mise run services-check`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #346
Verified Forgejo runner is registered only to forge.ops.eblu.me and the
forge has registration disabled, so no untrusted users can trigger
privileged CI. Tightened notes to reflect the closed-forge mechanism
(not a per-repo allow-list).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumped documented image tag to 0.20.4 (matches kustomization newTag),
added the two qwen3.5 models from models.txt, and stamped the card.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the v1.16.0 fleet upgrade for the fourth alloy service
(type: ansible, built from source on indri). Binary built on gilbert
with Go 1.26.2 + CGO, scp'd to indri, codesigned, LaunchAgent reloaded.
Service reports clean WAL replay and resumed metric/log shipping.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the build-container-image squash-merge convention, rebuild alloy v1.16.0
container images from the main SHA (9564435) and update the three alloy
kustomizations to reference :v1.16.0-9564435[-nix] instead of the branch
SHA :v1.16.0-26a3ab5[-nix] left over from #345.
Both images were rebuilt locally on gilbert (dagger) and ringtail (nix)
because indri is still under heavy macOS memory-compressor pressure (see
separate ticket); CI on indri can't reliably run the dagger publish step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump Grafana Alloy v1.14.0 → v1.16.0 across all four services (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail; alloy native ansible). Also migrate the indri build path from `Dockerfile` to a native Dagger `container.py` per the build-container-image migration playbook.
## Highlights from upstream
- v1.15: database observability promoted to stable, OTel Collector → v0.147.0
- v1.16: clustering for `loki.source.kubernetes_events`, MySQL exporter 0.19.0
- One pre-existing breaking change in v1.15 (`loki.source.awsfirehose` undocumented metric prefix rename) — not used here.
## Build infra
Alloy v1.16.0's go.mod requires Go 1.26.2. The nix derivation now uses `pkgs.go_1_26` with `GOTOOLCHAIN=local` to avoid auto-downloading a toolchain blob that violated the fixed-output rule.
## Test plan
- [ ] CI: `mise run container-build-and-release alloy --ref alloy-v1.16.0` (dispatched as run 522; nix job to be re-triggered with the v1.16.0 goModules outputHash once the local ringtail build surfaces it)
- [ ] After CI green, bump `images[].newTag` in three kustomizations to the new `-<sha>` and `-<sha>-nix` tags, deploy from this branch via `argocd app set <app> --revision alloy-v1.16.0 && argocd app sync <app>`
- [ ] Manual rebuild of macOS native binary on gilbert (per ansible/roles/alloy README) and `mise run provision-indri -- --tags alloy --check --diff`
- [ ] `mise run services-check` after merge & redeploy
Reviewed-on: #345
## Summary
Monthly tooling dependency refresh, with a one-time conversion from version-tag pins (`rev = "vX.Y.Z"`, `image:tag`, `>=`) to SHA / digest pins everywhere.
## Changes
- **prek hooks**: all `rev = "vX.Y.Z"` → commit SHA + `# vX.Y.Z` comment. Bumped trufflehog (3.94.0→3.95.2), kingfisher (1.91.0→1.97.0), ruff (0.15.7→0.15.12), shfmt (3.13.0→3.13.1), prettier (3.8.1→3.8.3), actionlint (1.7.11→1.7.12).
- **fly/Dockerfile**: tag pins → `image@sha256:...` digest pins. Bumped nginx (1.29.6→1.30.0-alpine), tailscale (v1.94.1→v1.94.2 — still inside the safe pre-1.96.5 range), alloy (v1.14.1→v1.16.0).
- **mise-tasks**: PEP 723 inline deps converted from `>=` to `==` (PEP 508 doesn't support hashes inline). All scripts pinned to current latest: rich 15.0.0, typer 0.25.0, pyyaml 6.0.3, httpx 0.28.1.
- **prek `additional_dependencies`**: ansible-lint==26.4.0, ansible-core==2.20.5.
- **taplo-lint**: pass `--no-schema`. Upstream's `--default-schema-catalogs` returns a format taplo v0.9.3 can't parse — we don't validate against TOML schemas anyway, so this turns off the broken catalog fetch.
- **docs/update-tooling-dependencies**: documents the SHA-pin convention, `docker buildx imagetools inspect` for digest lookup, and `prek clean` before re-verifying (cache grows to several GiB).
Forgejo workflow `actions/checkout@v6.0.2` was already at the latest SHA — no change.
## Test plan
- [x] `prek run --all-files` passes after `prek clean`
- [x] `deploy-fly` workflow builds and deploys the new fly image on merge
- [x] `fly status -a blumeops-proxy` healthy after deploy
- [x] Spot-check a few mise tasks (`mise run blumeops-tasks`, `mise run docs-check-links`) to confirm pinned deps resolve cleanly
Reviewed-on: #344
## Summary
Follow-up to #342. The cv and docs services are now live on indri (Caddy file_server backed by ansible-managed tarball extraction) and verified working. This PR removes the dead minikube artifacts and the tooling shims that referenced them.
## Changes
**Deletions:**
- ``argocd/apps/{cv,docs}.yaml``
- ``argocd/manifests/{cv,docs}/`` (deployment, service, ingress, pdb, kustomization)
- ``containers/{cv,quartz}/`` (Dockerfiles + start scripts)
**Tooling:**
- ``mise-tasks/container-version-check``: remove the ``quartz``→``docs`` CONTAINER_TO_SERVICE mapping (containers/quartz no longer exists)
- ``service-versions.yaml``: bump ``docs.current-version`` to ``v1.16.0`` (the blumeops docs release tag) and trim the migration-window comment
## Live state context
The argocd Applications ``cv`` and ``docs`` were already deleted from the cluster manually as part of the cutover; this PR just removes the YAML files that the ``apps`` app-of-apps was still ingesting. After merge, ``argocd app sync apps`` will reconcile and the ``apps`` Application returns to Synced.
The Caddyfile ``handle_errors`` bug that briefly crashed all ``*.ops.eblu.me`` services during cutover is fixed in a separate C0 (``2ee53fe``) on main, not here.
## Test plan
- [x] ``mise run container-version-check --all-files`` clean
- [x] ``mise run service-review --type ansible`` shows cv at 1.0.3, docs at v1.16.0
- [ ] After merge: ``argocd app sync apps`` returns clean (cv/docs entries gone, no children to reconcile)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #343
The kind=static branch added in #342 put handle_errors inside the
@host handle{} block. handle_errors is a top-level site-block directive,
not an ordered HTTP handler, so Caddy refuses to load the config:
parsing caddyfile tokens for 'handle': directive 'handle_errors'
is not an ordered HTTP handler
This crash-loops the whole reverse proxy and takes down every
*.ops.eblu.me service. Tripped today during the live cv/docs cutover.
Fix: drop handle_errors and append /404.html as the final try_files
candidate. The 404 page is served with status 200 instead of 404, but
that's acceptable for a human-facing curated 404 — the page renders
correctly. Documented inline.
The running Caddy on indri already has the fixed config (deployed
manually during the cutover); this lands the fix in main so future
provision-indri --tags caddy runs don't re-break it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
Replace the cv (`cv.eblu.me`) and docs (`docs.eblu.me`) minikube Deployments with indri-native ansible roles. Caddy serves the extracted release tarballs directly via a new `kind: static` service-block — no daemon, no nginx pod, no ProxyGroup ingress on the request path. Mirrors the rationale of the recent devpi migration; part of the broader minikube wind-down.
## What's in this commit
- `ansible/roles/{cv,docs}` — sentinel-gated tarball download + extract into `~/{cv,docs}/content/`
- `ansible/roles/caddy/` — new `kind: static` branch in the Caddyfile template (encoded gzip, immutable cache headers for fingerprinted assets, optional `try_html` for Quartz-style clean URLs, optional per-path `download_paths` for the resume PDF's `Content-Disposition`)
- `ansible/playbooks/indri.yml` — wires `cv` and `docs` roles before `caddy`
- `service-versions.yaml` — both services flip to `type: ansible`. `docs.current-version` stays at `1.28.2` for this commit so `container-version-check` keeps passing while `containers/quartz/Dockerfile` still exists; it moves to the docs release tag in the cleanup commit
- `.forgejo/workflows/{cv-deploy,build-blumeops}.yaml` — deploy step now bumps `cv_version`/`docs_version` in the role defaults and pushes; running ansible + purging the Fly cache is manual from gilbert (matches devpi)
- Docs: `docs/how-to/operations/{cv,docs}-on-indri.md`, updated `docs/reference/services/{cv,docs}.md`, changelog fragment
## What is not in this commit
The dead artifacts. After PR review and successful cutover, a follow-up commit deletes:
- `argocd/apps/{cv,docs}.yaml` and `argocd/manifests/{cv,docs}/`
- `containers/cv/`, `containers/quartz/`
- `CONTAINER_TO_SERVICE['quartz']` mapping in `mise-tasks/container-version-check`
- bumps `docs.current-version` in `service-versions.yaml` to the release tag
## Cutover plan (manual, from gilbert, after review)
1. **Take down old:**
- Remove the cv and docs Applications: `argocd app delete cv --cascade && argocd app delete docs --cascade`
- Verify k8s namespaces gone: `kubectl --context=minikube-indri get ns | grep -E '^(cv|docs)\\b'` (should be empty)
- Verify tailnet MagicDNS no longer advertises the VIPs: `nslookup cv.tail8d86e.ts.net` and `nslookup docs.tail8d86e.ts.net` should both fail
2. **Bring up new:**
- `mise run provision-indri -- --tags cv,docs,caddy --check --diff` (already validated on branch)
- `mise run provision-indri -- --tags cv,docs,caddy`
- `fly ssh console -a blumeops-proxy -C "sh -c 'rm -rf /tmp/cache && nginx -s reload'"`
3. **Verify:** `mise run services-check` and the curl checks listed in `docs/how-to/operations/{cv,docs}-on-indri.md`
4. **Cleanup commit + merge.**
Total expected downtime: minutes (not the few-hour budget you authorized).
## Test plan
- [ ] `mise run provision-indri -- --tags cv,docs --check --diff` clean
- [ ] `mise run provision-indri -- --tags caddy --check --diff` shows only the cv + docs blocks changing as previewed in the PR thread
- [ ] After cutover: `cv.eblu.me`, `cv.ops.eblu.me`, `docs.eblu.me`, `docs.ops.eblu.me` all return 200
- [ ] `cv.eblu.me/resume.pdf` includes `Content-Disposition: attachment`
- [ ] A clean Quartz URL (e.g. `docs.eblu.me/explanation/agent-change-process`) resolves to the right page
- [ ] `mise run services-check` clean
- [ ] `mise run service-review --type ansible` shows cv and docs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #342
Devpi now runs natively on indri (uv venv via ansible role), so the
Dagger container build at containers/devpi/ is unused. Removing it.
Also updated dagger.md examples to use 'miniflux' as the example
container-name argument.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). It moves to indri as a native LaunchAgent.
## What changed
- **New ansible role** `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run via 1Password root password, runs as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. Bootstraps from upstream PyPI (so devpi can install itself on a fresh box).
- **Caddy**: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`.
- **Playbook**: `indri.yml` gains pre_tasks for the root password and the new role.
- **service-versions.yaml**: devpi flipped from `type: argocd` to `type: ansible`.
- **ArgoCD**: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted.
- **Docs**: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list.
## Already deployed (live on indri)
- Service running: `launchctl list mcquack.eblume.devpi` → PID 53888
- `curl https://pypi.ops.eblu.me/+api` returns 200 ✅
- `mise run docs-mikado` works again ✅
- 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/`
- Minikube namespace and PVC fully reclaimed
## Test plan
- [ ] `mise run services-check` (after merge)
- [ ] CI workflows that use devpi succeed
- [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines)
## Context
This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent — it was breaking the toolchain — and (b) it shakes out the migration recipe before we commit to a multi-leaf chain.
Reviewed-on: #341
Verified TTL=604800s and hostPID limited to ephemeral Prowler CronJob
on indri. Noted that alloy-tracing on ringtail also uses hostPID but
is out of scope until Prowler scans ringtail (tracked in Todoist).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marked last-reviewed: 2026-04-29. Fixed the storage layout table —
`/config/` is an emptyDir (ephemeral), not NFS, and the watch directory
is disabled. Documented the transmission-exporter sidecar that exposes
Prometheus metrics on port 19091.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Squash-merge of #340 changed the SHA. Bump prowler tag from
v5.23.0-2daf629 (PR branch) to v5.23.0-495e45d (main HEAD) so the
Dockerfile changes are present in the image deployed off main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
The weekly Prowler IaC scan reported 6 critical findings against `argocd/manifests/`. They split cleanly into two patterns:
- **Legitimate-by-design RBAC → mute with new compensating controls**
- `external-secrets-controller`, `external-secrets-cert-controller` manage `secrets` (KSV-0041) and the cert-controller mutates its own webhook configurations (KSV-0114). This is what the operator is *for*. New CC: `operator-purpose-bound-rbac`.
- `kube-state-metrics` (both `minikube-indri` and `k3s-ringtail`) holds `list/watch` on secrets to expose `kube_secret_info` and `kube_secret_labels` metrics. KSM's metric schema only reads metadata, never the `data:` field. New CC: `kube-state-metrics-metadata-only`.
- **Over-broad RBAC → fix**
- `grafana-clusterrole` had `get/watch/list` on `secrets` because the dashboard-sidecar config used `RESOURCE=both` (ConfigMaps + Secrets). Nothing in the cluster labels Secrets with `grafana_dashboard=1`, so this was unused power. Switched both sidecar instances to `RESOURCE=configmap` and removed `secrets` from the ClusterRole.
The IaC cronjob also did not previously pass `--mutelist-file`, which is why every IaC finding reported as unmuted regardless of mutelist configuration. The new `mutelist/iac.yaml` is bundled into the existing `prowler-mutelist` ConfigMap and mounted via `items:` selector.
## Test plan
- [ ] `kubectl --context=minikube-indri kustomize argocd/manifests/prowler/` — already passes locally
- [ ] `kubectl --context=minikube-indri kustomize argocd/manifests/grafana/` — already passes locally
- [ ] Deploy from this branch via `argocd app set prowler --revision prowler-iac-mutelist && argocd app sync prowler` and same for `grafana`
- [ ] Manually trigger the IaC cronjob and verify `MUTED=True` on the 6 critical findings (`kubectl --context=minikube-indri -n prowler create job --from=cronjob/prowler-iac-scan prowler-iac-test`)
- [ ] Restart grafana pod and confirm dashboards still render (sidecar still finds them via ConfigMap watch)
- [ ] After verify, `argocd app set <app> --revision main && argocd app sync <app>` post-merge
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #340