• v1.17.0 29e0f012cd

    eblume released this 2026-06-03 21:52:18 -07:00 | 13 commits to main since this release

    BlumeOps release v1.17.0

    What's Changed

    Features

    • Deploy the Adelaide / Heidi / Addie baby shower app — guest splash, raffle
      picker, and prize assignment console — on ringtail k3s with shower.eblu.me
      as the public entry and shower.ops.eblu.me as the tailnet admin host. App
      source: adelaide-baby-shower-app.

    • Deploy adelaide-baby-shower-app v1.1.0 to ringtail k3s. Replaces the
      boolean lock with a four-phase ShowerState (pre_event → party →
      prizes_locked → event_locked), adds an append-only "guest memories"
      panel where guests can leave photos and comments for the baby, and
      polishes the admin and QR views. Three Django migrations
      (0009_shower_phase, 0010_guest_memories, 0011_book_description)
      run automatically in the entrypoint against the SQLite PV. No config
      or env-var changes.

      Container build also gains a Forgejo-PyPI workaround: Forgejo's simple
      index returns absolute file URLs hardcoded to the public ROOT_URL
      (forge.eblu.me), which the Fly edge 403s on /api/packages/*. The
      wheel and sdist are now both pulled via direct fetchurl against
      forge.ops.eblu.me (tailnet-only) and the wheel is handed to pip as
      a local path.

    • review-compliance-reports now also fetches and summarizes the weekly Prowler container-image and IaC scans (previously only the K8s CIS in-cluster scan was processed). For each scan it shows status counts, severity breakdown, week-over-week delta, and — for the high-volume image/IaC scans — top-N tables grouped by check ID and resource instead of per-finding listings.

    • runner-logs now authenticates with a Forgejo API token and auto-detects the repo from the git remote. Job logs are fetched via SSH to indri (reading Forgejo's on-disk zstd log files) instead of the web endpoint, which doesn't support token auth for private repos.
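
    The review-compliance-reports entry above describes status counts, a
    severity breakdown, and top-N tables grouped by check ID and resource. A
    minimal sketch of that kind of grouping follows. The record fields
    (check_id, resource, status, severity) and the summarize function are
    assumptions for illustration and do not reflect the actual Prowler report
    schema or the task's implementation.

```python
from collections import Counter


def summarize(findings, top_n=3):
    """Return status counts, a severity breakdown, and top-N failing groups.

    `findings` is a list of dicts with hypothetical keys:
    check_id, resource, status, severity.
    """
    failed = [f for f in findings if f["status"] == "FAIL"]
    return {
        # How many findings landed in each status (PASS, FAIL, ...).
        "status": Counter(f["status"] for f in findings),
        # Severity breakdown of the failing findings only.
        "severity": Counter(f["severity"] for f in failed),
        # Grouped tables instead of one row per finding.
        "top_checks": Counter(f["check_id"] for f in failed).most_common(top_n),
        "top_resources": Counter(f["resource"] for f in failed).most_common(top_n),
    }


# Made-up sample data, not real scan output.
sample = [
    {"check_id": "check-a", "resource": "image-1", "status": "FAIL", "severity": "high"},
    {"check_id": "check-a", "resource": "image-2", "status": "FAIL", "severity": "high"},
    {"check_id": "check-b", "resource": "image-1", "status": "FAIL", "severity": "critical"},
    {"check_id": "check-c", "resource": "image-3", "status": "PASS", "severity": "low"},
]
summary = summarize(sample, top_n=2)
```

    Grouping by check ID and by resource keeps the output bounded by top_n
    even when a scan produces thousands of findings.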

    Bug Fixes

    • Fix nightly borgmatic backups failing for 2 days. The shower SQLite
      dump hook referenced kubectl --context=k3s-ringtail, but indri's
      kubeconfig deliberately doesn't carry the ringtail credentials. The
      before_backup hook's failure aborted the entire run, taking out
      both the local sifaka repo and the BorgBase offsite. Replaced
      the inline-shell dump with a ~/bin/borgmatic-k8s-sqlite-dump
      helper deployed by the ansible role. Each dump entry now declares a
      target of either local:<context> (mealie — kubectl uses indri's
      kubeconfig) or ssh:<user@host> (shower — ssh into ringtail and
      run k3s kubectl there, no indri-side kubeconfig needed; k3s.yaml
      on ringtail is mode 644 so no sudo required). Bytes stream back via
      kubectl exec ... -- cat rather than kubectl cp, since kubectl cp
      requires tar inside the pod and nix-built images like shower don't
      bundle it.

    • Shower app container now bakes the wheel + Python deps into the image
      at build time via buildPythonPackage instead of pip-installing on
      first boot. Boots are deterministic and don't depend on forge PyPI
      being reachable from the pod. The wheelHash in
      containers/shower/default.nix is the sha256 sourced from the
      forge PyPI simple index; bumping the version means bumping that hash
      too.

      Borgmatic now covers the shower app: SQLite is dumped from the live
      pod via kubectl exec (mirroring the existing mealie entry, with
      context: k3s-ringtail), and the prize-photo media share is picked up
      through /Volumes/shower (sifaka SMB mount on indri, same pattern as
      /Volumes/photos).

    • Disabled adaptive sync (VRR) on ringtail's DP-1 output. The OMEN 27i IPS panel pumps brightness when its refresh rate swings into the low VRR range during low-framerate content (e.g. game cutscenes), producing a flicker that worsened over a session until a reboot. Pinning the panel to a fixed 165Hz eliminates it.

    • Fixed forge.eblu.me static assets (CSS, JS, images, fonts) not loading — the proxy's static asset cache block was missing the Host header, so Caddy couldn't route the requests.

    • Fixed homepage container EACCES on cold start: the nix-built image now chowns
      /app/config to uid 1000 at build time via fakeRootCommands, matching the
      behavior of the old Dockerfile. Without this, homepage couldn't seed missing
      skeleton configs (proxmox.yaml etc.) or create /app/config/logs, crashing on
      its first uncached request. Caught during the ringtail cutover.

    • Fixed sway keybindings on ringtail: the home-manager keybindings block was replacing the module's defaults entirely, leaving only explicit overrides (no workspace switching, focus, move, splits, resize mode, etc.). Switched to lib.mkOptionDefault with lib.mkForce on the conflicting custom binds (Mod+Return, Mod+d, Mod+space, Mod+l) so defaults merge back in. Also added Mod+F1 to show a filterable fuzzel list of current keybindings.

      Fixed fuzzel config errors on launch — border-radius and border-width were under [main], but fuzzel expects them as radius/width under a [border] section.

    • Pin the Quartz docs build to v4.5.2. The Dagger build_docs pipeline cloned Quartz from the default branch unpinned; Quartz v5.0.0 restructured its config layout (.quartz/plugins, ../quartz imports) and broke the docs build against our existing quartz.config.ts/quartz.layout.ts.
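
    The borgmatic fix above gives each dump entry a target of either
    local:<context> or ssh:<user@host> and streams bytes back with
    kubectl exec ... -- cat. The sketch below shows one way such a target
    string could be turned into a command line. It is not the
    ~/bin/borgmatic-k8s-sqlite-dump helper itself; the function name and the
    namespace, pod, and path arguments are hypothetical.

```python
import shlex


def build_dump_command(target, namespace, pod, db_path):
    """Build the argv that streams a file out of a pod via `cat`.

    `target` is "local:<context>" or "ssh:<user@host>", as in the release note.
    All other arguments are illustrative placeholders.
    """
    kind, _, value = target.partition(":")
    exec_args = ["exec", "-n", namespace, pod, "--", "cat", db_path]
    if kind == "local":
        # kubectl runs on the backup host and uses its own kubeconfig context.
        return ["kubectl", f"--context={value}", *exec_args]
    if kind == "ssh":
        # Run k3s's bundled kubectl on the remote host, so the backup host
        # needs no kubeconfig for that cluster.
        remote = shlex.join(["k3s", "kubectl", *exec_args])
        return ["ssh", value, remote]
    raise ValueError(f"unsupported dump target: {target!r}")


# Hypothetical context, host, pod, and path names.
local_cmd = build_dump_command("local:some-context", "mealie", "mealie-0", "/data/mealie.db")
ssh_cmd = build_dump_command("ssh:user@ringtail", "shower", "shower-0", "/data/db.sqlite3")
```

    Using cat rather than kubectl cp means the pod only needs cat on its PATH,
    which matches the note that nix-built images don't bundle tar.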

    Infrastructure

    • Wire the ringtail blumeops-pg cluster (which holds the wave-1-migrated
      paperless + teslamate databases) into backups and Grafana. Adds a Tailscale
      LoadBalancer Service (blumeops-pg-ringtail.tail8d86e.ts.net) and a Caddy L4
      route (pg.ops.eblu.me:5434), then repoints borgmatic's teslamate +
      paperless postgres dumps and the mealie SQLite dump at ringtail, and the
      Grafana TeslaMate datasource at the ringtail DB. Closes the backup gap that
      opened at cutover (the migrated live data was still being backed up from the
      now-frozen minikube copies) and unblocks the wave-1 decommission.

    • Migrated homepage dashboard from minikube (indri/arm64) to k3s (ringtail/amd64).
      The container is now built via nix (containers/homepage/default.nix), adapted
      from nixpkgs homepage-dashboard with the upstream Next.js cache patches and
      wrapped with dockerTools.buildLayeredImage. Autodiscovery shifts: services on
      minikube (ArgoCD, Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus,
      Navidrome, Paperless, TeslaMate, Transmission) become explicit static entries
      in services.yaml; ringtail services (Authentik, Frigate/NVR, Ntfy, Ollama)
      auto-populate via Ingress annotations.

    • Migrated CV (cv.eblu.me) and Docs (docs.eblu.me) from minikube Deployments to indri-native ansible roles. Caddy now serves the extracted release tarballs directly via a new kind: static service-block in the Caddy template — no daemon, no container — replacing the prior nginx-in-a-pod layer. Removes a network hop on every request and shrinks minikube's footprint. See cv-on-indri and docs-on-indri. Part of the broader minikube wind-down.

    • Migrated devpi (PyPI mirror at pypi.ops.eblu.me) from a minikube StatefulSet to a launchd-managed service on indri. devpi-server now runs in a uv-managed venv with pinned devpi-server and devpi-web versions, listens on 127.0.0.1:3141, and is fronted by Caddy. The minikube StatefulSet was crash-looping under memory pressure (and breaking the Python toolchain everywhere); the new layout removes a layer of dependency on cluster health for critical-path tooling. See devpi-on-indri.

    • Move the entire Immich stack — server, machine-learning, valkey,
      and the PostgreSQL+VectorChord cluster — off minikube-indri and
      onto k3s-ringtail. Postgres data migrated zero-loss via CNPG
      pg_basebackup (replica catch-up then promote); row counts on
      asset, user, album, smart_search, activity, asset_face
      verified equal between source and replica before cutover. The ML
      pod now uses ringtail's RTX 4080 via the nvidia-device-plugin
      (time-slicing bumped 2 → 4 to share with frigate + ollama). Caddy
      routing at photos.ops.eblu.me is unchanged (still
      photos.tail8d86e.ts.net, the device just lives on ringtail now).
      Borgmatic backups continue against the same immich-pg tailnet
      hostname. First concrete chain in the broader indri-k8s
      decommission effort.

    • Add local nix container build for tailscale (containers/tailscale/default.nix) so ringtail's tailscale-operator ProxyClass proxy pods pull from the forge mirror instead of docker.io/tailscale/tailscale. Pinned at v1.94.2 to match service-versions.yaml. Indri's tailscale-operator continues to use upstream during the k8s-to-ringtail migration.

    • Address the 6 critical Prowler IaC findings against argocd/manifests/. Prowler's IaC provider hardcodes self._mutelist = None and delegates filtering to Trivy, but doesn't plumb --ignorefile through — so the documented "use Trivy filtering" path is actually broken. Added a shim around trivy in the Prowler image that injects --ignorefile $TRIVY_IGNOREFILE for trivy fs invocations when the env var points at a real file. The IaC cronjob now mounts mutelist/trivyignore.yaml (Trivy's per-path schema) and sets the env var, muting the external-secrets and kube-state-metrics Secret-access findings (KSV-0041, KSV-0114). Separately, grafana-clusterrole is tightened to remove secrets access entirely: the dashboard sidecar already only consumes ConfigMap-labeled dashboards, so its RESOURCE env var is now configmap instead of both.

    • Pin ringtail's wired IP to 192.168.1.21 via NixOS scripted networking; NetworkManager no longer manages enp5s0. Removes DHCP lease renewal as a failure mode after a silent lease teardown took ringtail offline. Also explicitly enables net.ipv4.ip_forward (previously set implicitly by scripted-DHCP) so k3s pod networking and Tailscale routing continue to work with static networking.

    • Ripped out the compensating-controls (CC) framework: deleted compensating-controls.yaml, the review-compensating-controls mise task, and the associated how-to / explanation docs. Prowler and Kingfisher continue to run weekly and produce reports; the Prowler mutelist YAML files remain in place but no longer carry CC: <id> prefixes — each entry just keeps a free-form Description of why the finding is muted. The CC review cadence proved to be more overhead than this single-operator homelab needed.

    • Wire the shower app for public exposure: a Fly nginx shower.eblu.me server
      block serves as a guest-only surface (splash page, /prizes/<token>/,
      static assets, media). Everything authenticated (/admin/, /host/,
      /accounts/) returns 403 with a "tailnet only" pointer. Staff hit
      shower.ops.eblu.me for the operator console + admin; the app's
      v1.0.1 DJANGO_PUBLIC_URL_BASE setting makes QR codes generated on
      the tailnet point back at the WAN host for guests. Plus a Caddy route
      on indri, Pulumi Gandi CNAME, and a Grafana APM dashboard tracking
      request rate, error rate, latency, bandwidth, and access logs.

    • Mirror Valkey 8.1 locally as registry.ops.eblu.me/blumeops/valkey. Replaces direct pulls of docker.io/valkey/valkey:8.1-alpine for paperless and immich sidecars. Built via native Dagger pipeline on Alpine 3.22. Stateless swap — no data migration. Authentik's nix-built Redis remains separate.

    • Add nix-built amd64 valkey for ringtail (containers/valkey/default.nix) so immich-ringtail can stop pulling the upstream multi-arch docker.io/valkey/valkey image. Existing container.py continues to build Alpine arm64 for paperless on indri. Both bump to valkey 8.1.7 (Alpine 3.22 8.1.7-r0 / nixpkgs 8.1.7).

    • Upgrade Grafana Alloy v1.14.0 → v1.16.0 across all four service deployments
      (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail on k8s; alloy native on
      indri). Pulls in stable database observability (v1.15) and the OTel Collector
      v0.147.0 bump. Container build also migrated from Dockerfile to native Dagger
      container.py per the build-container-image migration playbook.

    • Upgraded Dagger from v0.20.1 to v0.20.6 (engine, CLI pin, and SDK regen) and migrated runner-job-image from a Debian-based Dockerfile to a native Dagger container.py on Alpine 3.23, reusing the shared alpine_runtime helper.

    • Decommission the wave-1 services on minikube-indri now that paperless,
      teslamate, and mealie run on ringtail with their data backed up. Removes the
      minikube paperless/teslamate/mealie manifest dirs + ArgoCD app
      definitions (pruning the parked Deployments, Services, and the redundant
      minikube mealie/paperless PVCs), and drops the paperless/teslamate roles
      from the minikube blumeops-pg cluster. The paperless and teslamate
      databases are dropped from indri's blumeops-pg as the finalization step.
      miniflux + authentik remain on the minikube cluster (later waves).

    • Upgraded the k8s Forgejo runner to the v12.8 line, switched it from first-boot registration to declarative server.connections credentials from 1Password, and consolidated the supporting runner how-to documentation.

    • Move paperless, teslamate, and mealie off minikube-indri onto
      k3s-ringtail, shedding ~1.1 GiB of resident load from the
      OOM-thrashing 8 GiB minikube node (the kernel OOM killer had been
      killing kube-apiserver/dockerd/argocd, flapping every
      minikube-hosted service at once). paperless + teslamate databases
      move into a fresh CNPG blumeops-pg cluster on ringtail via a cold
      pg_dump/pg_restore from the quiesced source — row counts verified
      equal before any routing flip; source DBs dropped only after the
      ringtail side serves traffic. mealie's SQLite PVC is copied as-is.
      paperless media stays on sifaka NFS. Downtime-tolerant cold cutover
      (no streaming replication); rollback is repoint-and-scale-up with the
      source untouched. Second chain in the indri-k8s decommission after
      migrate-immich-to-ringtail.

    • Recurring maintenance batch:

      • Ringtail flake inputs refreshed (disko, home-manager, nixpkgs).
      • Tooling deps bumped: prek hooks (trufflehog v3.95.3, kingfisher v1.101.0, ruff v0.15.14, ansible-core 2.21.0); fly proxy base images (nginx 1.30.1-alpine, alloy v1.16.1); typer==0.26.2 in mise tasks.
    • Updated nixos/ringtail/flake.lock (weekly cadence): disko, home-manager, and nixpkgs inputs refreshed. nixpkgs-services skipped per overlay convention.

    • Reviewed mealie service version freshness; upstream is 5 minor versions ahead (v3.17.0 vs deployed v3.12.0). Marked reviewed; upgrade deferred.

    • Deploy shower v1.1.2 — bump container build to new app release.

    • Upgrade unpoller v2.34.0 → v3.2.0 and migrate container build from Dockerfile to native Dagger (container.py). v3.0.0 carries breaking UniFi API changes; v3.2.0 introduces a 60s background poll (cached scrapes) by default — set interval = 0 in up.conf to restore on-demand polling.

    • Monthly tooling dependency refresh: prek hooks (trufflehog, kingfisher, ruff, shfmt, prettier, actionlint, ansible-lint), fly proxy base images (nginx 1.30.0, tailscale v1.94.2, alloy v1.16.0), normalize pyyaml lower bound in mise-tasks.

    • Add GE-Proton (pkgs.proton-ge-bin) to programs.steam.extraCompatPackages
      on ringtail. Subnautica 2 hangs at Mercuna plugin init under Proton
      Experimental + DXVK D3D12; GE-Proton is available as a Steam per-game
      compatibility option to work around it.

    • Add sn2-prelaunch Steam launch wrapper on ringtail that removes
      Subnautica 2's stale Saved/running.dat and Saved/beforelobby.dat
      lockfiles before each launch. SN2 pops up an invisible (0×0-sized)
      Error dialog when it detects an unclean exit, blocking GameThread
      forever; this is observable only as a black screen with a spinning
      loader. Use via Steam launch option: sn2-prelaunch %command%.

    • Add local nix container build for frigate-notify (containers/frigate-notify/default.nix) so the Frigate→ntfy bridge is rebuilt on ringtail from the forge mirror instead of pulled from ghcr.io/0x2142/frigate-notify.

    • Add resource limits to all ArgoCD pods to prevent unbounded resource consumption during node-wide pressure events.

    • Black-hole the /mirrors/* repositories at the Fly proxy edge (return 403 → forge.ops.eblu.me). A surprise $29.60 Fly bill was traced to ~1.24 TB/30d of egress on forge.eblu.me (99.95% of all proxy egress), of which ~71% was AI scrapers (Meta meta-externalagent, OpenAI GPTBot, Amazonbot) crawling the near-infinite git-history URL space of the public mirror repos and timing out Forgejo in the process. Mirrors exist for supply-chain control and are consumed over the tailnet, so their public web UI had no legitimate audience. robots.txt already disallowed /mirrors/, but the offending agents ignore it. Tier-2 mitigations (user-agent denylist, Anubis proof-of-work gateway) are documented in docs/explanation/ai-scraper-mitigation.md.

    • Bump paperless and immich kustomizations to the main-SHA-built valkey tag (v8.1.6-r0-fabca04). Routine post-merge follow-up to keep production manifests pointing at images built from a commit on main.

    • Bump shower container to v1.1.1 (probe FOD hash).

    • Bumped shower app to v1.1.3 (wheel/sdist + FOD hashes probed on ringtail).

    • Cap systemd-coredump on ringtail (ProcessSizeMax/ExternalSizeMax 1G, MaxUse 2G) so multi-GB Wine/Proton game crash dumps no longer thrash the disk and lock up the desktop.

    • Deploy shower v1.1.1 to ringtail (kustomize newTag bump).

    • Deployed shower v1.1.3 to ringtail (image built and pushed from ringtail; runner bypassed due to indri overload).

    • Fix three follow-ups from the wave-1 decommission: grant the local
      break-glass admin account ArgoCD admin rights (g, admin, role:admin;
      previously only the Authentik admins group had access, so admin was
      locked out whenever its token expired), and repoint the alloy blackbox
      probe for teslamate from the deleted minikube service to
      https://tesla.ops.eblu.me/ (through Caddy over Tailscale). The orphaned
      paperless/teslamate roles + ExternalSecrets left on the minikube
      blumeops-pg are also cleaned up.

    • Moved the Immich blackbox health probe from indri's alloy to ringtail's alloy. After the immich migration to ringtail, the probe still targeted immich-server.immich.svc.cluster.local on indri's cluster where the service no longer exists, causing a persistent ServiceProbeFailure alert.

    • Pin shower v1.1.1 FOD outputHash (probed locally on ringtail).

    • Rebuild Prowler container against main HEAD (v5.23.0-495e45d) after merging the IaC mutelist Dockerfile changes.

    • Rebuild and retag alloy v1.16.0 container images from the main-branch SHA
      following the squash-merge of #345, per the build-container-image
      squash-merge convention. Both images (registry.ops.eblu.me/blumeops/alloy)
      now reference 9564435 rather than the branch SHA 26a3ab5, restoring
      source traceability after branch cleanup.

    • Rebuild shower from the post-merge commit on main so the container's
      SHA tag points at a commit that will still exist after the 30-day
      branch-cleanup window. Functionally identical to the branch-tag image
      already deployed, just preserves source traceability per
      build-container-image#Squash-merge and container tags.

    • Rebuild unpoller container from squashed main commit so the image SHA tag matches a commit in main's history (was tagged with the pre-squash branch SHA).

    • Rebuild valkey container from squashed main commit (both arm64 dagger and amd64 nix variants), and update paperless + immich-ringtail kustomizations to the main-SHA tags v8.1.7-ecded30 and v8.1.7-ecded30-nix.

    • Retired the blumeops-tasks mise task (Todoist API) in favor of heph list --project Blumeops --json from the self-hosted hephaestus system. Updated docs to point task discovery and rotation reminders at heph, and noted that the ~/code/personal/zk zettelkasten is migrating into heph docs.

    • Switch the Fly proxy deploy strategy from bluegreen to immediate in fly/fly.toml. With a single proxy machine, bluegreen offers little benefit: the green machine routinely failed to reach "started" inside Fly's default 5-minute deploy timeout (the cold-start sequence of tailscaled → tailscale up → wait-for-MagicDNS → nginx startup eats most of the budget), and the failed deploys would roll back. immediate replaces the machine in place with a brief downtime (~5–10s) but actually completes.

    • Switch the ringtail provisioning playbook's blumeops clone URL from forge.eblu.me (public, via Fly proxy) to forge.ops.eblu.me (tailnet, direct via Caddy on indri). Ringtail is always on the tailnet, so the WAN round-trip is pure overhead — it also made provision-ringtail brittle whenever the Fly proxy was slow or down.

    • Switched Grafana's deployment strategy from RollingUpdate to Recreate. With an RWO PVC holding the SQLite database and Bleve search index, RollingUpdate reliably crashloops the new pod on the index lock until rollout timeout. Recreate terminates the old pod first so the new one acquires the lock cleanly.

    • Update tailscale-operator-ringtail ProxyClass to reference the 0108b68 main-SHA build of the tailscale container. Routine post-merge cleanup so the deployed image traces to a commit that survives PR branch cleanup.

    • Update the ringtail NixOS flake lockfile (nixos/ringtail/flake.lock): bump
      nixpkgs (b77b3de → 25f5383) and disko (5ba0c95 → 115e521) to latest.
      nixpkgs-services was intentionally left pinned (skipped by the
      flake-update pipeline). Routine recurring maintenance per manage-lockfile.

    • Upgrade native macOS Alloy on indri to v1.16.0. Built on gilbert with Go
      1.26.2 + CGO (required for the macOS native DNS resolver, which Tailscale
      MagicDNS depends on), scp'd to ~/.local/bin/alloy on indri, codesigned,
      and the LaunchAgent reloaded. Completes the v1.16.0 fleet upgrade started
      in #345 — all four Alloy services (alloy-k8s, alloy-ringtail,
      alloy-tracing-ringtail, alloy ansible) now run v1.16.0.

    • Upgraded zot on indri from v2.1.15 to v2.1.16 (security fixes: TLS verification on metrics client, CORS Allow-Credentials suppression on wildcard origins, manifest/API-key body size limits).
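
    The Prowler IaC entry above describes a shim around trivy that injects
    --ignorefile $TRIVY_IGNOREFILE for trivy fs invocations when the variable
    points at a real file. The argument rewriting could look roughly like
    this. The sketch assumes the subcommand is the first argument and skips
    injection if the flag is already present; both are assumptions, and the
    actual shim in the Prowler image may differ.

```python
import os


def shim_args(argv, environ, is_file=os.path.isfile):
    """Return trivy arguments, adding --ignorefile for `trivy fs` runs only.

    `argv` excludes the program name. `is_file` is injectable so the logic
    can be exercised without touching the filesystem.
    """
    ignorefile = environ.get("TRIVY_IGNOREFILE", "")
    # Assumption: the subcommand comes first (no global flags before it).
    is_fs_scan = bool(argv) and argv[0] == "fs"
    if (
        is_fs_scan
        and ignorefile
        and is_file(ignorefile)
        and "--ignorefile" not in argv
    ):
        return ["fs", "--ignorefile", ignorefile, *argv[1:]]
    # Every other invocation passes through untouched.
    return list(argv)


# Hypothetical mount path for the mutelist file.
env = {"TRIVY_IGNOREFILE": "/mutelist/trivyignore.yaml"}
patched = shim_args(["fs", "--format", "json", "."], env, is_file=lambda p: True)
```

    A real wrapper would then exec the genuine trivy binary with the returned
    arguments, leaving non-fs subcommands unaffected.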

    Documentation

    • Reviewed replicating-blumeops tutorial: fixed "BluemeOps" typos (also in contributing.md) and added last-reviewed frontmatter.
    • Reviewed indri reference card: added devpi, cv, and docs to the native-services list; widened the k8s note to reflect the growing set of apps now on ringtail and the planned indri-minikube decommission; added CPU/RAM specs.
    • New how-to: rotate-fly-deploy-token. Documents the 75-day rotation cadence, why we use org-scoped tokens (silences the cosmetic metrics-token warning on fly status with marginal blast-radius cost given the single-app personal org), and the procedure for rotation + Forgejo Actions secret sync.
    • Add docs/explanation/ai-scraper-mitigation.md — the egress-cost / AI-crawler threat model for the public Fly proxy, the tiered mitigation plan (Tier 1: mirror black-hole, shipped; Tier 2: user-agent denylist + Anubis; Tier 3: Cloudflare, rejected on principle), and the data behind it.
    • Fix manage-forgejo-mirrors verify step — sync button is on the repo settings page ("Synchronize now"), not the main repo page.
    • Fixed the op item edit invocation in the zot API-key rotation procedure: the previous pbpaste | op item edit ... "field[password]=-" stdin syntax is rejected by op 2.34 as "invalid JSON" (recent op versions treat piped input as a full JSON template, not a single field value). Procedure now reads the clipboard into a local fish variable and passes it as an inline assignment.
    • Fixed the export-filename step in run-1password-backup: 1Password's desktop app names the export 1PasswordExport-<account-uuid>-<timestamp>.1pux automatically rather than letting you save to a fixed name, so the procedure now points the task at that glob instead of pretending the default name is 1Password-export.1pux.
    • Refresh the contributing tutorial: add last-reviewed, include the .ai.md changelog fragment type, and clarify that prek is pinned via mise.
    • Review and refresh the Navidrome reference card: add last-reviewed, correct the scanner env var name, document the current image/version, and record routing and runtime details from the manifests.
    • Review and refresh the Ollama reference card: add last-reviewed, bump the documented image tag to 0.20.4, and add the two qwen3.5 models now declared in models.txt.
    • Reviewed 1password reference card: added the blumeops vs Personal vault split, noted that onepassword-connect runs on both indri and ringtail (not just one cluster), and pulled the op read vs op item get --fields guidance up from agent memory into the card.
    • Reviewed index.md; added ringtail to the infrastructure overview and stamped last-reviewed.
    • Reviewed transmission card: corrected storage layout (/config/ is emptyDir, watch dir disabled) and noted the Prometheus exporter sidecar.
    • rotate-fly-deploy-token: combine mint+store into one command with both fish and bash forms; document the op item edit "Password item requires ps value" validator gotcha and the placeholder-password workaround.
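
    The run-1password-backup fix above points the task at a glob because the
    desktop app names exports 1PasswordExport-<account-uuid>-<timestamp>.1pux.
    A small sketch of resolving such a glob is shown below. Picking the most
    recently modified match is an assumption made here, and latest_export is
    a hypothetical helper, not part of the documented procedure.

```python
from pathlib import Path


def latest_export(directory):
    """Return the newest 1PasswordExport-*.1pux file in `directory`, or None."""
    candidates = sorted(
        Path(directory).glob("1PasswordExport-*.1pux"),
        # Assumption: "newest" means most recently modified.
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None
```

    Returning None when nothing matches lets the caller fail with a clear
    message instead of passing a nonexistent fixed filename along.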

    AI Assistance

    • Adopt AGENTS.md as the canonical agent instruction file, keep CLAUDE.md as a compatibility shim, and update docs to reference the neutral file and the correct agent-change-process path.
    • CLAUDE.md now imports AGENTS.md via @AGENTS.md instead of telling agents to go read it. Claude Code only auto-loads CLAUDE.md, so the prose shim was easy to skip; the import inlines AGENTS.md into the session prompt unconditionally.

    Miscellaneous

    • Removed the dead minikube manifests, container builds, and tooling shims left behind after the cv + docs migration to indri-native (#342). Deletes argocd/{apps,manifests}/{cv,docs}/, containers/{cv,quartz}/, and the quartz → docs mapping in mise-tasks/container-version-check. Bumps docs.current-version to v1.16.0 (the blumeops release tag) now that the legacy nginx-base version pin is gone.
    • Rebuild shower v1.1.0 container from main HEAD (3c7967e) and bump the
      kustomization tag to v1.1.0-3c7967e-nix. The PR was squash-merged, so
      the branch commit 444ff91 baked into the prior tag isn't reachable
      from main's history. The new tag points at a commit that exists on
      main; image content is byte-identical because the FOD output is content
      addressed and the inputs didn't change.
    • Rebuild shower v1.1.2 from main HEAD (a33fa47) and retag — PR #358 was squash-merged so the branch SHA baked into the prior image tag isn't reachable from main. FOD is content-addressed, so image bytes are identical; only provenance changes.
    • Remove the duplicate Homepage tiles for Mealie, Paperless, Immich, and
      TeslaMate. Homepage runs on ringtail and autodiscovers ringtail Ingresses via
      gethomepage.dev/* annotations; once these services migrated to ringtail they
      were discovered automatically, making their leftover static services.yaml
      entries (needed only while they lived on minikube) redundant.
    • Removed the now-unused containers/devpi/ Dagger build artifact. Devpi runs natively on indri via uv venv; the container image is no longer referenced anywhere. Doc examples in docs/reference/tools/dagger.md updated to use miniflux as the example container name.
    • container-build-and-release now prints the specific mise run runner-logs <N> command after dispatching, polling the Forgejo API to resolve the run number for the commit it just triggered.
    • mise run runner-logs <run> -j <n> now reports a clear error when the log file doesn't exist on indri (e.g. a runner crash that left action_task.log_in_storage = 0). Previously it printed only the header and exited 0, because zstdcat exits 0 with a "can't stat … -- ignored" stderr message and ssh+fish on indri swallows the remote exit code.
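
    The runner-logs entry above notes that zstdcat exits 0 with a
    "can't stat … -- ignored" message on stderr when the log file is missing,
    so the exit code alone cannot be trusted. One way to detect that case is
    sketched below. check_log_fetch is a hypothetical helper and the actual
    task may detect the condition differently.

```python
def check_log_fetch(returncode, stdout, stderr):
    """Return the log text, or raise if the remote fetch produced no log.

    Treats "exit 0, empty stdout, can't-stat message on stderr" as a
    missing file rather than a successful empty log.
    """
    if returncode != 0:
        raise RuntimeError(f"log fetch failed (exit {returncode}): {stderr.strip()}")
    if not stdout and "can't stat" in stderr:
        raise FileNotFoundError(f"log file missing on remote host: {stderr.strip()}")
    return stdout
```

    Checking stderr in addition to the exit status covers the case where the
    remote shell swallows the real exit code.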

    Documentation

    Download docs-v1.17.0.tar.gz directly, or bump docs_version
    in ansible/roles/docs/defaults/main.yml and run:

    mise run provision-indri -- --tags docs
    
  • v1.16.0 d26a6ae3b2

    eblume released this 2026-04-18 10:00:51 -07:00 | 134 commits to main since this release

    BlumeOps release v1.16.0

    What's Changed

    Infrastructure

    • Route Fly.io proxy through Caddy on indri with direct WireGuard peering, reducing public-facing latency from 20+ seconds (DERP relay) to sub-second. Fixed Beyla eBPF tracing on ringtail (memlock rlimit + BPF permissions). Restored trace collection to Tempo.

    Documentation

    Download docs-v1.16.0.tar.gz and configure the quartz container with:

    DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/v1.16.0/docs-v1.16.0.tar.gz
    
  • v1.15.7 9bafe85b2b

    eblume released this 2026-04-18 08:14:51 -07:00 | 141 commits to main since this release

    BlumeOps release v1.15.7

    What's Changed

    Bug Fixes

    • Fix borgmatic LaunchAgent failing silently due to macOS TCC permission dialogs. LaunchAgents now call borgmatic directly instead of routing through mise x, which triggered "wants to access Documents" dialogs that hung headless sessions. The ansible role now also manages borgmatic installation via mise install.

    Infrastructure

    • Automate verification of Prowler MANUAL findings (kubelet file perms, kubelet config, etcd CA, RBAC cluster-admin) in review-compliance-reports and mute them with the node-config-automated-verification compensating control.
    • Migrate transmission and transmission-exporter containers from Dockerfile to native Dagger builds (container.py). Updates base images to Alpine 3.23 and Python 3.14, pins uv to 0.11.6.
    • Switched Fly proxy to upstream keepalive pools, reducing forge.eblu.me latency from 35s+ p50 to sub-second. Added mise run fly-reload for DNS re-resolution without redeploy.
    • Upgrade Prowler from 5.22.0 to 5.23.0; remove init container workaround for broken --registry flag (upstream fix in PR #10470).
    • Added robots.txt to forge.eblu.me blocking crawlers from /mirrors/ to reduce load from Facebook scraping.
    • Container builds are now manual-only via mise run container-build-and-release. Removed auto-trigger on push to main — shared Dagger helpers made path-based detection unreliable.
    • Migrate devpi container from Dockerfile to native Dagger build; bump devpi-server 6.19.1→6.19.3 and devpi-web 5.0.1→5.0.2.
    • Migrated kiwix-serve container from Dockerfile to native Dagger build, bumping Alpine base from 3.22 to 3.23.
    • Mitigated Forgejo archive endpoint DoS: redirect public archive requests to tailnet, expanded robots.txt, enabled archive cleanup cron, cached release downloads at proxy.
    • Refactored Dagger container pipelines: extended go_build() helper with buildmode and extra_env params, migrated miniflux and forgejo-runner to use it, and standardized all Alpine bases from 3.22 to 3.23.

    Miscellaneous

    • Review compensating control sso-gated-admin-tools: tightened scope to ArgoCD only, removed Grafana reference.
    • container-build-and-release now verifies the commit exists on the remote before dispatching a build.

    Documentation

    Download docs-v1.15.7.tar.gz and configure the quartz container with:

    DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/v1.15.7/docs-v1.15.7.tar.gz
    
  • v1.15.6 04b44b350b

    eblume released this 2026-04-14 11:46:28 -07:00 | 174 commits to main since this release

    BlumeOps release v1.15.6

    What's Changed

    Bug Fixes

    • Rotate ArgoCD workflow-bot token and admin password after DR rebuild invalidated signing keys, fixing build-blumeops workflow failures.

    Documentation

    Download docs-v1.15.6.tar.gz and configure the quartz container with:

    DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/v1.15.6/docs-v1.15.6.tar.gz
    
  • v1.15.5 9d85c97b9b

    eblume released this 2026-04-14 11:29:22 -07:00 | 176 commits to main since this release

    BlumeOps release v1.15.5

    What's Changed

    Features

    • Deploy Paperless-ngx document management system at paperless.ops.eblu.me with OCR, Authentik SSO, and NFS storage on sifaka.
    • Add ty (Astral) Python typechecker to prek hooks, configured for Dagger SDK and container.py modules. Add type: mise to service-versions.yaml for tracking development tool versions (dagger, ansible-core, prek, pulumi, ty) through the standard service review process.
    • Upgrade grafana-sidecar from 1.28.0 to 2.6.0, adding health probes and porting build to native Dagger container.py.
    • Upgrade Navidrome to v0.61.1 — major artwork overhaul with per-disc cover art, rebuilt search engine (SQLite FTS5), server-managed transcoding, and WebP performance fix.
    • Add mise run review-compliance-reports task for weekly compliance report review with muted/unmuted distinction and week-over-week delta
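
The muted/unmuted distinction and week-over-week delta mentioned above can be sketched as follows. The finding shape (dicts with check_id, status, and muted keys) is a simplification assumed for this example and is not Prowler's actual report schema.

```python
from collections import Counter


def summarize(findings: list[dict]) -> dict:
    """Count findings by status, splitting FAILs into muted and unmuted."""
    statuses = Counter(f["status"] for f in findings)
    fails = [f for f in findings if f["status"] == "FAIL"]
    muted = sum(1 for f in fails if f.get("muted", False))
    return {
        "statuses": dict(statuses),
        "fail_muted": muted,
        "fail_unmuted": len(fails) - muted,
    }


def week_over_week(current: list[dict], previous: list[dict]) -> dict:
    """Compare unmuted FAIL check IDs between two weekly reports."""

    def unmuted_fail_ids(findings: list[dict]) -> set:
        return {
            f["check_id"]
            for f in findings
            if f["status"] == "FAIL" and not f.get("muted", False)
        }

    now, before = unmuted_fail_ids(current), unmuted_fail_ids(previous)
    # "new" appeared this week; "resolved" failed last week but not this week.
    return {"new": sorted(now - before), "resolved": sorted(before - now)}
```

Restricting the delta to unmuted failures keeps accepted, muted findings from showing up as noise in each weekly review.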

    Bug Fixes

    • Add the paperless database to the borgmatic backup configuration. It was previously the only service DB not included in nightly pg_dump backups.
    • Fix Fly.io proxy rate limiting to key on real client IP instead of Fly's internal proxy IP, so crawlers no longer consume the shared rate limit bucket for all clients.
    • Fix UnPoller (UniFi) Grafana dashboards failing to load due to UID exceeding Grafana 12's 40-character limit.
    • Fix blumeops-tasks swallowing wiki-link brackets in task descriptions (rich markup escaping)
    • Fix dagger flake-update pipeline: replace nonexistent --exclude flag with dynamic input discovery
    • Fix services-check to display all firing alerts for a given alert name, not just the first one.
    • Pin Fly.io proxy Tailscale to v1.94.1 — the :stable tag pulled v1.96.5 which has a MagicDNS regression (SERVFAIL on tailnet names), breaking all public routing through forge.eblu.me, docs.eblu.me, and cv.eblu.me.
    • Rewrite mise run runner-logs CLI: list runs by run number (not task ID), drill into jobs per run, fetch logs via Forgejo web API instead of SSH+filesystem. Fixes broken log retrieval caused by incorrect hex path calculation and stale data directory. Added --repo to query any forge repo (e.g. sporks) and --limit/-n to control listing size (0 for all).
    • Route Dagger build telemetry to Tempo, fixing OTEL metrics exporter warnings.
    • Switch paperless redis sidecar from amd64-only nix-built authentik-redis image to upstream valkey:8.1-alpine (multi-arch). The nix image was previously running under QEMU emulation on arm64 minikube.
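
A small pre-commit style check could guard against the dashboard UID length problem fixed above. This is a sketch under assumptions: the 40-character limit is the one cited in the release note, the function name is hypothetical, and dashboards are assumed to be JSON documents with a top-level uid field.

```python
from __future__ import annotations

import json

MAX_UID_LENGTH = 40  # limit cited in the release note for Grafana 12


def oversized_uids(dashboards: dict[str, str], limit: int = MAX_UID_LENGTH) -> dict[str, int]:
    """Map dashboard name -> UID length for every UID longer than `limit`.

    `dashboards` maps a name (for example a file name) to dashboard JSON text.
    """
    problems = {}
    for name, text in dashboards.items():
        # Dashboards without a uid are treated as having an empty one.
        uid = json.loads(text).get("uid") or ""
        if len(uid) > limit:
            problems[name] = len(uid)
    return problems
```

Running this over the provisioned dashboard files and failing on a non-empty result would surface an oversized UID before Grafana refuses to load the dashboard.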

    Infrastructure

    • Build forgejo-runner container locally via native Dagger pipeline instead of pulling from upstream.
    • Build kube-state-metrics container locally (Dockerfile + nix) from forge mirror, replacing upstream registry.k8s.io image on both indri and ringtail.
    • Upgrade miniflux from 2.2.17 to 2.2.19 and migrate from Dockerfile to native Dagger container.py build (second container after navidrome). Refactor alpine_runtime() with create_user parameter to support Alpine's built-in nobody user. Pin all mise.toml tool versions to explicit versions instead of "latest".
    • Migrate Dagger module from .dagger/ to repo root (src/blumeops/) and replace docker_build() with native Dagger pipelines for container builds. Navidrome is the first container migrated, with full build error visibility.
    • Migrate teslamate container build from legacy Dockerfile to native Dagger container.py.
    • Add seccomp RuntimeDefault profiles to alloy-k8s and immich pods, resolving 4 unmuted Prowler findings
    • Completed a full disaster recovery (DR) after a power loss and minikube cluster rebuild. Validated the bootstrap procedure, identified circular dependencies (forge.eblu.me, Zot/Authentik OIDC) and Tailscale device name collision issues, and documented recovery steps for restart-indri.
    • Set Frigate preview quality to CRF 8 (from default 1) to reduce preview file sizes and improve review timeline loading over NFS.
    • Track Fly.io proxy component versions (Tailscale, nginx, Alloy) in service-versions.yaml with new fly service type.
    • Upgrade ArgoCD from v3.3.2 to v3.3.6 (bug-fix patches), SHA-pin install manifest
    • Upgrade authentik 2026.2.0 → 2026.2.2 (bug-fix patch release)
    • Upgrade ollama from 0.17.5 to 0.20.4 (adds Gemma 4 support, benchmark tooling, Apple Silicon perf improvements)

    Documentation

    • Delete outdated install-dagger-on-nix-runner card; add service-versions reference card; clean up zot.md and review-services.md links.
    • Enhanced the adding-a-service tutorial with kustomization setup, corrected Tailscale ingress format, updated ArgoCD repoURL, and added a step for creating service reference cards.
    • Review gandi.md: add missing forge.eblu.me CNAME, fix program description, stamp review date.

    Documentation

    Download docs-v1.15.5.tar.gz and configure the quartz container with:

    DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/v1.15.5/docs-v1.15.5.tar.gz
    
  • v1.15.4 0eaf8680fd

    eblume released this 2026-04-06 07:53:51 -07:00 | 230 commits to main since this release

    BlumeOps release v1.15.4

    What's Changed

    Infrastructure

    • Migrate 1Password Connect from Helm to kustomize (1.8.1 → 1.8.2), completing the no-helm-policy migration.

    Documentation

    • Rewrite observability stack tutorial: replace Helm instructions with actual kustomize/ArgoCD patterns, fix typos, document Alloy as core component

    Documentation

    Download docs-v1.15.4.tar.gz and configure the quartz container with:

    DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/v1.15.4/docs-v1.15.4.tar.gz
    
  • v1.15.3 f9397b7fa0

    eblume released this 2026-04-05 21:24:21 -07:00 | 234 commits to main since this release

    BlumeOps release v1.15.3

    What's Changed

    Infrastructure

    • Build Tempo container from source via forge mirror; bump 2.10.1 → 2.10.3
    • Pin NixOS service versions (forgejo-runner, snowflake, k3s) via nixpkgs-services overlay in ringtail flake, preventing silent upgrades from nix flake update. Add k3s and minikube to service-versions.yaml tracking. Fix stale nix-container-builder version (was 12.6.4, actually running 12.7.2).
    • Migrate Immich from Helm chart to kustomize manifests and upgrade from v2.5.6 to v2.6.3
    • Upgrade Grafana from 12.3.3 to 12.4.2 — patches 7 CVEs including an unauthenticated DoS (CVE-2026-27880).
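
The stale-version fix above suggests a simple drift check between tracked and running versions. The sketch below assumes both sides have already been loaded into plain dicts; how service-versions.yaml is parsed and how running versions are discovered is not shown, and the function name is hypothetical.

```python
from __future__ import annotations


def find_drift(
    tracked: dict[str, str], running: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return {service: (tracked, running)} for every mismatched version."""
    drift = {}
    for service, tracked_version in tracked.items():
        running_version = running.get(service)
        # Services with no observed running version are skipped, not flagged.
        if running_version is not None and running_version != tracked_version:
            drift[service] = (tracked_version, running_version)
    return drift
```

With the values from the note, find_drift({"nix-container-builder": "12.6.4"}, {"nix-container-builder": "12.7.2"}) reports the builder as drifted.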

    Documentation

    • First compensating control review: verified single-user-cluster still in effect. Added aspirational how-to card for PCI DSS evidence collection.
    • Prowler --registry fix merged upstream (PR #10470); initContainer workaround documented as pending release.

    Documentation

    Download docs-v1.15.3.tar.gz and configure the quartz container with:

    DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/v1.15.3/docs-v1.15.3.tar.gz
    
  • v1.15.2 4059b3d27b

    eblume released this 2026-03-30 17:48:36 -07:00 | 253 commits to main since this release

    BlumeOps release v1.15.2

    What's Changed

    Features

    • Build custom Kingfisher container from sporked deploy branch, replacing upstream image with locally-built version including --clone-url-base patch.
    • Add Kingfisher secret scanner as a weekly CronJob scanning all Forgejo repos, with HTML and JSON reports written to sifaka NFS.
    • Add MongoDB Kingfisher secret scanner as a prek hook alongside TruffleHog for comparative coverage evaluation.
    • Add spork strategy: floating-branch soft-fork tooling (mise run spork-create) and documentation for maintaining local patches against upstream projects.

    Infrastructure

    • Add compensating controls framework: tracking file, review mise task, and how-to doc. Map all Prowler mutelist entries to named controls with CC: prefixes.
    • Add Prowler mutelist to suppress expected findings from system components, operator-managed pods, and accepted operational needs. Fix missing seccomp profile on kube-state-metrics.
    • Borgmatic photos backup: restrict to library/ and upload/ (skip regenerable dirs), add SSH keepalives and checkpoint interval to prevent broken pipe failures on large initial syncs.
    • Upgrade forgejo-runner from 12.7.0 to 12.7.3 (bug fixes, security dep update). Add service reference card.

    Documentation

    • Add service reference documentation for Kingfisher secret scanner.
    • Review and update Ansible reference doc: add missing roles, sibling playbooks, and clarify Ansible's role in the IaC stack.

    Documentation

    Download docs-v1.15.2.tar.gz and configure the quartz container with:

    DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/v1.15.2/docs-v1.15.2.tar.gz
    
  • v1.15.1 2bd1611ac1

    eblume released this 2026-03-28 09:15:18 -07:00 | 285 commits to main since this release

    BlumeOps release v1.15.1

    What's Changed

    Features

    • Add Tor Snowflake proxy on ringtail as a systemd service to support anti-censorship efforts.
    • Add offsite backup for immich photo library to BorgBase, running daily at 4 AM from indri via sifaka SMB mount.
    • Add QArt Tuner — a Go tool that generates QR codes whose data modules form a recognizable image, with an interactive web UI for parameter tuning. Based on the QArt technique by Russ Cox. Lives in utils/qart/.

    Infrastructure

    • Migrate Forgejo from Homebrew to source build with mcquack LaunchAgent, matching the pattern used by zot, caddy, and alloy. Upgrades to v14.0.3 (7 security fixes including PKCE bypass and OAuth scope bypass).
    • Add borgmatic pg_dump backups for authentik and immich databases. Authentik uses the existing blumeops-pg cluster on port 5432. Immich requires a new borgmatic role on the immich-pg cluster, a Tailscale service, and Caddy L4 proxy on port 5433.
    • Upgrade External Secrets Operator from v1.3.2 to v2.2.0 and migrate from Helm chart to static kustomize manifests.
    • Add post-deploy maintenance docs and generation pruning task for ringtail.
    • Fix Immich Helm values: resource limits and probe timeouts were silently ignored due to wrong value keys. Resources now actually apply to pods, and liveness/readiness probe timeouts increased from 1s to 5s to prevent kubelet from killing pods during ML inference.
    • Reduce PodNotReady alert lookback window from 5m to 60s to clear faster after rollouts.
    • Tighten ArgoCDAppOutOfSync alert: reduce pending duration from 30m to 5m and lookback window from 5m to 1m so alerts clear faster after sync.
    • Update ringtail flake inputs (nixpkgs, home-manager).
    • Upgrade Homepage dashboard from v1.10.1 to v1.11.0
    • Upgrade nvidia-device-plugin from v0.18.2 to v0.19.0

    Documentation

    • Review and fix CV service doc (correct URL, forge domain, container tag link) and add private forge repo review guidance to review-services process.
    • Review tailscale-setup tutorial: fix macOS install steps, add --accept-routes tip, correct tag name, add ACL apply instructions, add [[tailscale-operator]] cross-reference.

    Miscellaneous

    • Add preserve/* branch prefix exclusion to branch-cleanup task; document Pyroscope profiling work and blockers in observability reference.
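
The preserve/* exclusion amounts to a prefix filter over the branches a cleanup task would otherwise delete. A minimal sketch, assuming the task works from a list of merged branch names; the constant and function names are hypothetical, and protecting main is an added assumption.

```python
PROTECTED_PREFIXES = ("preserve/",)
PROTECTED_BRANCHES = {"main"}  # assumption: the default branch is never deleted


def cleanup_candidates(merged_branches: list[str]) -> list[str]:
    """Return merged branches that are safe to delete."""
    return [
        branch
        for branch in merged_branches
        # str.startswith accepts a tuple, so more prefixes can be added later.
        if branch not in PROTECTED_BRANCHES
        and not branch.startswith(PROTECTED_PREFIXES)
    ]
```

For example, a branch named preserve/pyroscope would be kept while ordinary merged feature branches remain candidates for deletion.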

    Documentation

    Download docs-v1.15.1.tar.gz and configure the quartz container with:

    DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/v1.15.1/docs-v1.15.1.tar.gz
    
  • v1.15.0 a1c2e0833d

    eblume released this 2026-03-24 19:50:58 -07:00 | 307 commits to main since this release

    BlumeOps release v1.15.0

    What's Changed

    Features

    • Deploy Prowler CIS scanner as a weekly CronJob on minikube-indri, with reports written to sifaka NFS share.
    • Add Grafana "Alerts" dashboard showing currently firing alerts and recent state changes.
    • Add IaC scanning via Prowler IaC provider (Saturday 2am, Dockerfiles and K8s manifests).
    • Add container image vulnerability scanning via Prowler image provider (Saturday 3am, all blumeops/* images).

    Bug Fixes

    • Fix authentik worker OOMKill by setting AUTHENTIK_WORKER_CONCURRENCY=2 (was defaulting to 16 based on CPU count).
    • Remove group: "" from tailscale-operator ignoreDifferences — ArgoCD normalizes away the empty string, causing permanent OutOfSync on the apps app.

    Infrastructure

    • Decommission JobSync service — removed ArgoCD app, k8s manifests, container build, Caddy proxy, Homepage entry, docs, and forge mirror. Replaced by datasette-based job tracking (coming soon).
    • Localize authentik-redis container: replace upstream redis:7-alpine with nix-built image from nixpkgs (Redis 8.2.3). Introduces attached service pattern with parent field in service-versions.yaml and version assertion in default.nix to prevent silent version drift.
    • Unified Dockerfile and Nix container build workflows into a single workflow that auto-classifies containers by build type and routes to the correct runner (k8s for Dockerfile, nix-container-builder for Nix). Removed nettest container (outgrown). Nix builds now require an explicit version = "..." declaration — no implicit nixpkgs fallback.
    • Monthly tooling dependency update: bump prek hooks (trufflehog 3.94.0, ruff 0.15.7, shfmt 3.13.0), Fly.io images (nginx 1.29.6, Alloy 1.14.1), actions/checkout v4.3.1→v6.0.2, tighten mise task Python lower bounds (rich 14, typer 0.24, httpx 0.28.1, pyyaml 6.0.2), and bump ansible-lint/ansible-core floors.
    • Upgrade ntfy v2.17.0 → v2.19.2 (adds experimental PostgreSQL support, read replicas, web push fixes)
    • Revert Tailscale operator to v1.94.2 (v1.96.3 images not yet published); keep Fly proxy tailscale wait improvement
    • Add RuntimeDefault seccomp profiles to all managed deployments, statefulsets, and cronjobs.
    • Upgrade Frigate from 0.17.0-rc2 to 0.17.1 (security fixes, bugfixes). Add motion retention tier (365 days), reduce continuous retention from 180 to 30 days.
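
The auto-classification described in the unified build workflow can be sketched as a lookup from build files to runners. The runner names come from the note above; the marker file names (Dockerfile, default.nix) and the function name are assumptions, and the real workflow may detect build types differently.

```python
from __future__ import annotations

# Runner names are taken from the release note; marker file names are assumed.
RUNNER_FOR_MARKER = {
    "Dockerfile": "k8s",
    "default.nix": "nix-container-builder",
}


def runners_for(files: set[str]) -> list[str]:
    """Return the runner for each build type found in a container directory."""
    runners = [
        runner for marker, runner in RUNNER_FOR_MARKER.items() if marker in files
    ]
    if not runners:
        # Refuse to guess when the directory has no recognized build file.
        raise ValueError(f"no build file found among: {sorted(files)}")
    return runners
```

Returning a list rather than a single runner allows for a container that carries both build types, in which case each build is routed to its own runner.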

    Documentation

    • Review and fix ArgoCD config tutorial: correct sync policy example, fix typo, add missing cross-references and frontmatter.
    • Review and update 12 reference docs: fix stale image references to point at kustomization manifests instead of hardcoded tags, correct Prometheus scrape target, expand external-secrets stub, add cross-references between backup/disaster-recovery docs, and remove misleading .ts.net URLs from Quick Reference tables.

    Documentation

    Download docs-v1.15.0.tar.gz and configure the quartz container with:

    DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/v1.15.0/docs-v1.15.0.tar.gz
    