diff --git a/CHANGELOG.md b/CHANGELOG.md index 7ae5f8e..0499154 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,259 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). +## [v1.17.0] - 2026-06-03 + +### Features + +- Deploy the Adelaide / Heidi / Addie baby shower app — guest splash, raffle + picker, and prize assignment console — on ringtail k3s with `shower.eblu.me` + as the public entry and `shower.ops.eblu.me` as the tailnet admin host. App + source: [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app). +- Deploy adelaide-baby-shower-app v1.1.0 to ringtail k3s. Replaces the + boolean lock with a four-phase `ShowerState` (`pre_event` → `party` → + `prizes_locked` → `event_locked`), adds an append-only "guest memories" + panel where guests can leave photos and comments for the baby, and + polishes the admin and QR views. Three Django migrations + (`0009_shower_phase`, `0010_guest_memories`, `0011_book_description`) + run automatically in the entrypoint against the SQLite PV. No config + or env-var changes. + + Container build also gains a Forgejo-PyPI workaround: Forgejo's simple + index returns absolute file URLs hardcoded to the public ROOT_URL + (`forge.eblu.me`), which the Fly edge 403s on `/api/packages/*`. The + wheel and sdist are now both pulled via direct `fetchurl` against + `forge.ops.eblu.me` (tailnet-only) and the wheel is handed to pip as + a local path. +- `review-compliance-reports` now also fetches and summarizes the weekly Prowler container-image and IaC scans (previously only the K8s CIS in-cluster scan was processed). For each scan it shows status counts, severity breakdown, week-over-week delta, and — for the high-volume image/IaC scans — top-N tables grouped by check ID and resource instead of per-finding listings. +- runner-logs now authenticates with Forgejo API token and auto-detects the repo from git remote. 
Job logs are fetched via SSH to indri (reading Forgejo's on-disk zstd log files) instead of the web endpoint, which doesn't support token auth for private repos. + +### Bug Fixes + +- Fix nightly borgmatic backups failing for 2 days. The shower SQLite + dump hook referenced `kubectl --context=k3s-ringtail`, but indri's + kubeconfig deliberately doesn't carry the ringtail credentials. The + `before_backup` hook's failure aborted the entire run, taking out + *both* the local sifaka repo and the BorgBase offsite. Replaced + the inline-shell dump with a `~/bin/borgmatic-k8s-sqlite-dump` + helper deployed by the ansible role. Each dump entry now declares a + `target` of either `local:` (mealie — kubectl uses indri's + kubeconfig) or `ssh:` (shower — ssh into ringtail and + run `k3s kubectl` there, no indri-side kubeconfig needed; k3s.yaml + on ringtail is mode 644 so no sudo required). Bytes stream back via + `kubectl exec ... -- cat` rather than `kubectl cp`, since `kubectl + cp` requires `tar` inside the pod and nix-built images like shower + don't bundle it. +- Shower app container now bakes the wheel + Python deps into the image + at build time via `buildPythonPackage` instead of pip-installing on + first boot. Boots are deterministic and don't depend on forge PyPI + being reachable from the pod. The `wheelHash` in + `containers/shower/default.nix` is the sha256 sourced from the + [forge PyPI simple index](https://forge.eblu.me/api/packages/eblume/pypi/simple/adelaide-baby-shower-app/); + bumping the version means bumping that hash too. + + Borgmatic now covers the shower app: SQLite is dumped from the live + pod via `kubectl exec` (mirroring the existing mealie entry, with + `context: k3s-ringtail`), and the prize-photo media share is picked up + through `/Volumes/shower` (sifaka SMB mount on indri, same pattern as + `/Volumes/photos`). +- Disabled adaptive sync (VRR) on ringtail's DP-1 output. 
The OMEN 27i IPS panel pumps brightness when its refresh rate swings into the low VRR range during low-framerate content (e.g. game cutscenes), producing a flicker that worsened over a session until a reboot. Pinning the panel to a fixed 165Hz eliminates it. +- Fixed forge.eblu.me static assets (CSS, JS, images, fonts) not loading — the proxy's static asset cache block was missing the `Host` header, so Caddy couldn't route the requests. +- Fixed homepage container EACCES on cold start: the nix-built image now chowns + `/app/config` to uid 1000 at build time via `fakeRootCommands`, matching the + behavior of the old Dockerfile. Without this, homepage couldn't seed missing + skeleton configs (proxmox.yaml etc.) or create `/app/config/logs`, crashing on + its first uncached request. Caught during the ringtail cutover. +- Fixed sway keybindings on ringtail — the home-manager `keybindings` block was replacing the module's defaults entirely, leaving only explicit overrides (no workspace switching, focus, move, splits, resize mode, etc). Switched to `lib.mkOptionDefault` with `lib.mkForce` on the conflicting custom binds (`Mod+Return`, `Mod+d`, `Mod+space`, `Mod+l`) so defaults merge back in. Also added `Mod+F1` to show a filterable fuzzel list of current keybindings. + + Fixed fuzzel config errors on launch — `border-radius` and `border-width` were under `[main]`, but fuzzel expects them as `radius`/`width` under a `[border]` section. +- Pin the Quartz docs build to v4.5.2. The Dagger `build_docs` pipeline cloned Quartz from the default branch unpinned; Quartz v5.0.0 restructured its config layout (`.quartz/plugins`, `../quartz` imports) and broke the docs build against our existing `quartz.config.ts`/`quartz.layout.ts`. + +### Infrastructure + +- Wire the ringtail `blumeops-pg` cluster (which holds the wave-1-migrated + paperless + teslamate databases) into backups and Grafana. 
Adds a Tailscale + LoadBalancer Service (`blumeops-pg-ringtail.tail8d86e.ts.net`) and a Caddy L4 + route (`pg.ops.eblu.me:5434`), then repoints borgmatic's `teslamate` + + `paperless` postgres dumps and the `mealie` SQLite dump at ringtail, and the + Grafana TeslaMate datasource at the ringtail DB. Closes the backup gap that + opened at cutover (the migrated live data was still being backed up from the + now-frozen minikube copies) and unblocks the wave-1 decommission. +- Migrated homepage dashboard from minikube (indri/arm64) to k3s (ringtail/amd64). + The container is now built via nix (`containers/homepage/default.nix`), adapted + from nixpkgs `homepage-dashboard` with the upstream Next.js cache patches and + wrapped with `dockerTools.buildLayeredImage`. Autodiscovery shifts: services on + minikube (ArgoCD, Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus, + Navidrome, Paperless, TeslaMate, Transmission) become explicit static entries + in `services.yaml`; ringtail services (Authentik, Frigate/NVR, Ntfy, Ollama) + auto-populate via Ingress annotations. +- Migrated CV (`cv.eblu.me`) and Docs (`docs.eblu.me`) from minikube Deployments to indri-native ansible roles. Caddy now serves the extracted release tarballs directly via a new `kind: static` service-block in the Caddy template — no daemon, no container — replacing the prior nginx-in-a-pod layer. Removes a network hop on every request and shrinks minikube's footprint. See [[cv-on-indri]] and [[docs-on-indri]]. Part of the broader minikube wind-down. +- Migrated devpi (PyPI mirror at `pypi.ops.eblu.me`) from a minikube StatefulSet to a launchd-managed service on indri. devpi-server now runs in a uv-managed venv with pinned `devpi-server` and `devpi-web` versions, listens on `127.0.0.1:3141`, and is fronted by Caddy. 
The minikube StatefulSet was crash-looping under memory pressure (and breaking the Python toolchain everywhere); the new layout removes a layer of dependency on cluster health for critical-path tooling. See [[devpi-on-indri]]. +- Move the entire Immich stack — server, machine-learning, valkey, + and the PostgreSQL+VectorChord cluster — off `minikube-indri` and + onto `k3s-ringtail`. Postgres data migrated zero-loss via CNPG + `pg_basebackup` (replica catch-up then promote); row counts on + `asset`, `user`, `album`, `smart_search`, `activity`, `asset_face` + verified equal between source and replica before cutover. The ML + pod now uses ringtail's RTX 4080 via the nvidia-device-plugin + (time-slicing bumped 2 → 4 to share with frigate + ollama). Caddy + routing at `photos.ops.eblu.me` is unchanged (still + `photos.tail8d86e.ts.net`, the device just lives on ringtail now). + Borgmatic backups continue against the same `immich-pg` tailnet + hostname. First concrete chain in the broader indri-k8s + decommission effort. +- Add local nix container build for `tailscale` (`containers/tailscale/default.nix`) so ringtail's tailscale-operator ProxyClass proxy pods pull from the forge mirror instead of `docker.io/tailscale/tailscale`. Pinned at v1.94.2 to match `service-versions.yaml`. Indri's tailscale-operator continues to use upstream during the k8s-to-ringtail migration. +- Address the 6 critical Prowler IaC findings against `argocd/manifests/`. Prowler's IaC provider hardcodes `self._mutelist = None` and delegates filtering to Trivy, but doesn't plumb `--ignorefile` through — so the documented "use Trivy filtering" path is actually broken. Added a shim around `trivy` in the Prowler image that injects `--ignorefile $TRIVY_IGNOREFILE` for `trivy fs` invocations when the env var points at a real file. 
The IaC cronjob now mounts `mutelist/trivyignore.yaml` (Trivy's per-path schema) and sets the env var, muting the `external-secrets` and `kube-state-metrics` Secret-access findings (KSV-0041, KSV-0114). Separately, `grafana-clusterrole` is tightened to remove `secrets` access entirely: the dashboard sidecar already only consumes ConfigMap-labeled dashboards, so its `RESOURCE` env var is now `configmap` instead of `both`. +- Pin ringtail's wired IP to `192.168.1.21` via NixOS scripted networking; NetworkManager no longer manages `enp5s0`. Removes DHCP lease renewal as a failure mode after a silent lease teardown took ringtail offline. Also explicitly enables `net.ipv4.ip_forward` (previously set implicitly by scripted-DHCP) so k3s pod networking and Tailscale routing continue to work with static networking. +- Ripped out the compensating-controls (CC) framework: deleted `compensating-controls.yaml`, the `review-compensating-controls` mise task, and the associated how-to / explanation docs. Prowler and Kingfisher continue to run weekly and produce reports; the Prowler mutelist YAML files remain in place but no longer carry `CC: ` prefixes — each entry just keeps a free-form `Description` of why the finding is muted. The CC review cadence proved to be more overhead than this single-operator homelab needed. +- Wire shower app for public exposure: fly nginx `shower.eblu.me` server + block as a guest-only surface — splash page, `/prizes//`, static + assets, media. Everything authenticated (`/admin/`, `/host/`, + `/accounts/`) returns 403 with a "tailnet only" pointer. Staff hit + `shower.ops.eblu.me` for the operator console + admin; the app's + v1.0.1 `DJANGO_PUBLIC_URL_BASE` setting makes QR codes generated on + the tailnet point back at the WAN host for guests. Plus a Caddy route + on indri, Pulumi Gandi CNAME, and a Grafana APM dashboard tracking + request rate, error rate, latency, bandwidth, and access logs. 
+- Mirror Valkey 8.1 locally as `registry.ops.eblu.me/blumeops/valkey`. Replaces direct pulls of `docker.io/valkey/valkey:8.1-alpine` for paperless and immich sidecars. Built via native Dagger pipeline on Alpine 3.22. Stateless swap — no data migration. Authentik's nix-built Redis remains separate. +- Add nix-built amd64 valkey for ringtail (`containers/valkey/default.nix`) so immich-ringtail can stop pulling the upstream multi-arch `docker.io/valkey/valkey` image. Existing `container.py` continues to build Alpine arm64 for paperless on indri. Both bump to valkey 8.1.7 (Alpine 3.22 8.1.7-r0 / nixpkgs 8.1.7). +- Upgrade Grafana Alloy v1.14.0 → v1.16.0 across all four service deployments + (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail on k8s; alloy native on + indri). Pulls in stable database observability (v1.15) and the OTel Collector + v0.147.0 bump. Container build also migrated from Dockerfile to native Dagger + `container.py` per the build-container-image migration playbook. +- Upgraded Dagger from v0.20.1 to v0.20.6 (engine, CLI pin, and SDK regen) and migrated `runner-job-image` from a Debian-based Dockerfile to a native Dagger `container.py` on Alpine 3.23, reusing the shared `alpine_runtime` helper. +- Decommission the wave-1 services on minikube-indri now that paperless, + teslamate, and mealie run on ringtail with their data backed up. Removes the + minikube `paperless`/`teslamate`/`mealie` manifest dirs + ArgoCD app + definitions (pruning the parked Deployments, Services, and the redundant + minikube mealie/paperless PVCs), and drops the `paperless`/`teslamate` roles + from the minikube `blumeops-pg` cluster. The `paperless` and `teslamate` + databases are dropped from indri's blumeops-pg as the finalization step. + miniflux + authentik remain on the minikube cluster (later waves). 
+- Upgraded the k8s Forgejo runner to the v12.8 line, switched it from first-boot registration to declarative `server.connections` credentials from 1Password, and consolidated the supporting runner how-to documentation. +- Move paperless, teslamate, and mealie off `minikube-indri` onto + `k3s-ringtail`, shedding ~1.1 GiB of resident load from the + OOM-thrashing 8 GiB minikube node (the kernel OOM killer had been + killing `kube-apiserver`/`dockerd`/argocd, flapping every + minikube-hosted service at once). paperless + teslamate databases + move into a fresh CNPG `blumeops-pg` cluster on ringtail via a cold + `pg_dump`/`pg_restore` from the quiesced source — row counts verified + equal before any routing flip; source DBs dropped only after the + ringtail side serves traffic. mealie's SQLite PVC is copied as-is. + paperless media stays on sifaka NFS. Downtime-tolerant cold cutover + (no streaming replication); rollback is repoint-and-scale-up with the + source untouched. Second chain in the indri-k8s decommission after + [[migrate-immich-to-ringtail]]. +- Recurring maintenance batch: + + - Ringtail flake inputs refreshed (`disko`, `home-manager`, `nixpkgs`). + - Tooling deps bumped: prek hooks (trufflehog v3.95.3, kingfisher v1.101.0, ruff v0.15.14, `ansible-core` 2.21.0); fly proxy base images (nginx 1.30.1-alpine, alloy v1.16.1); `typer==0.26.2` in mise tasks. +- Updated `nixos/ringtail/flake.lock` (weekly cadence): `disko`, `home-manager`, and `nixpkgs` inputs refreshed. `nixpkgs-services` skipped per overlay convention. +- Reviewed `mealie` service version freshness; upstream is 5 minor versions ahead (v3.17.0 vs deployed v3.12.0). Marked reviewed; upgrade deferred. +- Deploy shower v1.1.2 — bump container build to new app release. +- Upgrade unpoller v2.34.0 → v3.2.0 and migrate container build from Dockerfile to native Dagger (container.py). 
v3.0.0 carries breaking UniFi API changes; v3.2.0 introduces a 60s background poll (cached scrapes) by default — set `interval = 0` in `up.conf` to restore on-demand polling. +- Monthly tooling dependency refresh: prek hooks (trufflehog, kingfisher, ruff, shfmt, prettier, actionlint, ansible-lint), fly proxy base images (nginx 1.30.0, tailscale v1.94.2, alloy v1.16.0), normalize pyyaml lower bound in mise-tasks. +- Add GE-Proton (`pkgs.proton-ge-bin`) to `programs.steam.extraCompatPackages` + on ringtail. Subnautica 2 hangs at Mercuna plugin init under Proton + Experimental + DXVK D3D12; GE-Proton is available as a Steam per-game + compatibility option to work around it. +- Add `sn2-prelaunch` Steam launch wrapper on ringtail that removes + Subnautica 2's stale `Saved/running.dat` and `Saved/beforelobby.dat` + lockfiles before each launch. SN2 pops up an invisible (0×0-sized) + Error dialog when it detects an unclean exit, blocking GameThread + forever; this is observable only as a black screen with a spinning + loader. Use via Steam launch option: `sn2-prelaunch %command%`. +- Add local nix container build for `frigate-notify` (`containers/frigate-notify/default.nix`) so the Frigate→ntfy bridge is rebuilt on ringtail from the forge mirror instead of pulled from `ghcr.io/0x2142/frigate-notify`. +- Add resource limits to all ArgoCD pods to prevent unbounded resource consumption during node-wide pressure events. +- Black-hole the `/mirrors/*` repositories at the Fly proxy edge (`return 403` → `forge.ops.eblu.me`). A surprise $29.60 Fly bill traced to ~1.24 TB/30d of egress on `forge.eblu.me`, 99.95% of all proxy egress — of which ~71% was AI scrapers (Meta `meta-externalagent`, OpenAI `GPTBot`, Amazonbot) crawling the near-infinite git-history URL space of the public mirror repos and timing out Forgejo in the process. Mirrors exist for supply-chain control and are consumed over the tailnet, so their public web UI had no legitimate audience. 
`robots.txt` already disallowed `/mirrors/`, but the offending agents ignore it. Tier-2 mitigations (user-agent denylist, Anubis proof-of-work gateway) are documented in `docs/explanation/ai-scraper-mitigation.md`. +- Bump paperless and immich kustomizations to the main-SHA-built valkey tag (`v8.1.6-r0-fabca04`). Routine post-merge follow-up to keep production manifests pointing at images built from a commit on main. +- Bump shower container to v1.1.1 (probe FOD hash). +- Bumped shower app to v1.1.3 (wheel/sdist + FOD hashes probed on ringtail). +- Cap systemd-coredump on ringtail (ProcessSizeMax/ExternalSizeMax 1G, MaxUse 2G) so multi-GB Wine/Proton game crash dumps no longer thrash the disk and lock up the desktop. +- Deploy shower v1.1.1 to ringtail (kustomize newTag bump). +- Deployed shower v1.1.3 to ringtail (image built and pushed from ringtail; runner bypassed due to indri overload). +- Fix three follow-ups from the wave-1 decommission: grant the local + break-glass `admin` account ArgoCD admin rights (`g, admin, role:admin` — + previously only the Authentik `admins` group had access, so admin was + locked out whenever its token expired), and repoint the alloy blackbox + probe for teslamate from the deleted minikube service to + `https://tesla.ops.eblu.me/` (through Caddy over Tailscale). The orphaned + paperless/teslamate roles + ExternalSecrets left on the minikube + blumeops-pg are also cleaned up. +- Moved the Immich blackbox health probe from indri's alloy to ringtail's alloy. After the immich migration to ringtail, the probe still targeted `immich-server.immich.svc.cluster.local` on indri's cluster where the service no longer exists, causing a persistent `ServiceProbeFailure` alert. +- Pin shower v1.1.1 FOD outputHash (probed locally on ringtail). +- Rebuild Prowler container against main HEAD (v5.23.0-495e45d) after merging the IaC mutelist Dockerfile changes. 
+- Rebuild and retag alloy v1.16.0 container images from the main-branch SHA + following the squash-merge of #345, per the build-container-image + squash-merge convention. Both images (`registry.ops.eblu.me/blumeops/alloy`) + now reference `9564435` rather than the branch SHA `26a3ab5`, restoring + source traceability after branch cleanup. +- Rebuild shower from the post-merge commit on main so the container's + SHA tag points at a commit that will still exist after the 30-day + branch-cleanup window. Functionally identical to the branch-tag image + already deployed, just preserves source traceability per + [[build-container-image#Squash-merge and container tags]]. +- Rebuild unpoller container from squashed main commit so the image SHA tag matches a commit in main's history (was tagged with the pre-squash branch SHA). +- Rebuild valkey container from squashed main commit (both arm64 dagger and amd64 nix variants), and update paperless + immich-ringtail kustomizations to the main-SHA tags `v8.1.7-ecded30` and `v8.1.7-ecded30-nix`. +- Retired the `blumeops-tasks` mise task (Todoist API) in favor of `heph list --project Blumeops --json` from the self-hosted [hephaestus](https://github.com/eblume/hephaestus) system. Updated docs to point task discovery and rotation reminders at heph, and noted that the `~/code/personal/zk` zettelkasten is migrating into heph docs. +- Switch the Fly proxy deploy strategy from `bluegreen` to `immediate` in `fly/fly.toml`. With a single proxy machine, bluegreen offers little benefit — the green machine routinely failed to reach "started" inside Fly's default 5-minute deploy timeout (the cold-start sequence of `tailscaled` → `tailscale up` → wait-for-MagicDNS → nginx startup eats most of the budget), and the failed deploys would roll back. `immediate` replaces the machine in place with a brief downtime (~5–10s) but actually completes. 
+- Switch the ringtail provisioning playbook's blumeops clone URL from `forge.eblu.me` (public, via Fly proxy) to `forge.ops.eblu.me` (tailnet, direct via Caddy on indri). Ringtail is always on the tailnet, so the WAN round-trip is pure overhead — it also made `provision-ringtail` brittle whenever the Fly proxy was slow or down. +- Switched Grafana's deployment strategy from `RollingUpdate` to `Recreate`. With an RWO PVC holding the SQLite database and Bleve search index, `RollingUpdate` reliably crashloops the new pod on the index lock until rollout timeout. `Recreate` terminates the old pod first so the new one acquires the lock cleanly. +- Update `tailscale-operator-ringtail` ProxyClass to reference the `0108b68` main-SHA build of the tailscale container. Routine post-merge cleanup so the deployed image traces to a commit that survives PR branch cleanup. +- Update the ringtail NixOS flake lockfile (`nixos/ringtail/flake.lock`): bump + `nixpkgs` (b77b3de → 25f5383) and `disko` (5ba0c95 → 115e521) to latest. + `nixpkgs-services` was intentionally left pinned (skipped by the + `flake-update` pipeline). Routine recurring maintenance per [[manage-lockfile]]. +- Upgrade native macOS Alloy on indri to v1.16.0. Built on gilbert with Go + 1.26.2 + CGO (required for the macOS native DNS resolver, which Tailscale + MagicDNS depends on), scp'd to `~/.local/bin/alloy` on indri, codesigned, + and the LaunchAgent reloaded. Completes the v1.16.0 fleet upgrade started + in #345 — all four Alloy services (alloy-k8s, alloy-ringtail, + alloy-tracing-ringtail, alloy ansible) now run v1.16.0. +- Upgraded zot on indri from v2.1.15 to v2.1.16 (security fixes: TLS verification on metrics client, CORS Allow-Credentials suppression on wildcard origins, manifest/API-key body size limits). + +### Documentation + +- Reviewed `replicating-blumeops` tutorial: fixed "BluemeOps" typos (also in `contributing.md`) and added `last-reviewed` frontmatter. 
+- Reviewed [[indri]] reference card: added `devpi`, `cv`, and `docs` to the native-services list; widened the k8s note to reflect the growing set of apps now on ringtail and the planned indri-minikube decommission; added CPU/RAM specs. +- New how-to: rotate-fly-deploy-token. Documents the 75-day rotation cadence, why we use `org`-scoped tokens (silences the cosmetic metrics-token warning on `fly status` with marginal blast-radius cost given the single-app personal org), and the procedure for rotation + Forgejo Actions secret sync. +- Add `docs/explanation/ai-scraper-mitigation.md` — the egress-cost / AI-crawler threat model for the public Fly proxy, the tiered mitigation plan (Tier 1: mirror black-hole, shipped; Tier 2: user-agent denylist + Anubis; Tier 3: Cloudflare, rejected on principle), and the data behind it. +- Fix manage-forgejo-mirrors verify step — sync button is on the repo settings page ("Synchronize now"), not the main repo page. +- Fixed the `op item edit` invocation in the [[zot]] API-key rotation procedure: the previous `pbpaste | op item edit ... "field[password]=-"` stdin syntax is rejected by op 2.34 as "invalid JSON" (recent op versions treat piped input as a full JSON template, not a single field value). Procedure now reads the clipboard into a local fish variable and passes it as an inline assignment. +- Fixed the export-filename step in [[run-1password-backup]]: 1Password's desktop app names the export `1PasswordExport--.1pux` automatically rather than letting you save to a fixed name, so the procedure now points the task at that glob instead of pretending the default name is `1Password-export.1pux`. +- Refresh the contributing tutorial: add `last-reviewed`, include the `.ai.md` changelog fragment type, and clarify that `prek` is pinned via `mise`. 
+- Review and refresh the Navidrome reference card: add `last-reviewed`, correct the scanner env var name, document the current image/version, and record routing and runtime details from the manifests. +- Review and refresh the Ollama reference card: add `last-reviewed`, bump the documented image tag to 0.20.4, and add the two `qwen3.5` models now declared in `models.txt`. +- Reviewed [[1password]] reference card: added the `blumeops` vs `Personal` vault split, noted that `onepassword-connect` runs on both indri and ringtail (not just one cluster), and pulled the `op read` vs `op item get --fields` guidance up from agent memory into the card. +- Reviewed `index.md`; added ringtail to the infrastructure overview and stamped `last-reviewed`. +- Reviewed transmission card: corrected storage layout (`/config/` is emptyDir, watch dir disabled) and noted the Prometheus exporter sidecar. +- rotate-fly-deploy-token: combine mint+store into one command with both fish and bash forms; document the `op item edit` "Password item requires ps value" validator gotcha and the placeholder-password workaround. + +### AI Assistance + +- Adopt `AGENTS.md` as the canonical agent instruction file, keep `CLAUDE.md` as a compatibility shim, and update docs to reference the neutral file and the correct agent-change-process path. +- CLAUDE.md now imports AGENTS.md via `@AGENTS.md` instead of telling agents to go read it. Claude Code only auto-loads CLAUDE.md, so the prose shim was easy to skip; the import inlines AGENTS.md into the session prompt unconditionally. + +### Miscellaneous + +- Removed the dead minikube manifests, container builds, and tooling shims left behind after the cv + docs migration to indri-native (#342). Deletes `argocd/{apps,manifests}/{cv,docs}/`, `containers/{cv,quartz}/`, and the `quartz`→`docs` mapping in `mise-tasks/container-version-check`. Bumps `docs.current-version` to `v1.16.0` (the blumeops release tag) now that the legacy nginx-base version pin is gone. 
+- Rebuild shower v1.1.0 container from main HEAD (`3c7967e`) and bump the + kustomization tag to `v1.1.0-3c7967e-nix`. The PR was squash-merged, so + the branch commit `444ff91` baked into the prior tag isn't reachable + from main's history. The new tag points at a commit that exists on + main; image content is byte-identical because the FOD output is content + addressed and the inputs didn't change. +- Rebuild shower v1.1.2 from main HEAD (a33fa47) and retag — PR #358 was squash-merged so the branch SHA baked into the prior image tag isn't reachable from main. FOD is content-addressed, so image bytes are identical; only provenance changes. +- Remove the duplicate Homepage tiles for Mealie, Paperless, Immich, and + TeslaMate. Homepage runs on ringtail and autodiscovers ringtail Ingresses via + `gethomepage.dev/*` annotations; once these services migrated to ringtail they + were discovered automatically, making their leftover static `services.yaml` + entries (needed only while they lived on minikube) redundant. +- Removed the now-unused `containers/devpi/` Dagger build artifact. Devpi runs natively on indri via uv venv; the container image is no longer referenced anywhere. Doc examples in `docs/reference/tools/dagger.md` updated to use `miniflux` as the example container name. +- `container-build-and-release` now prints the specific `mise run runner-logs ` command after dispatching, polling the Forgejo API to resolve the run number for the commit it just triggered. +- `mise run runner-logs -j ` now reports a clear error when the log file doesn't exist on indri (e.g. a runner crash that left `action_task.log_in_storage = 0`). Previously it printed only the header and exited 0, because `zstdcat` exits 0 with a "can't stat … -- ignored" stderr message and ssh+fish on indri swallows the remote exit code. 
+
+
 ## [v1.16.0] - 2026-04-18
 
 ### Infrastructure
diff --git a/ansible/playbooks/indri.yml b/ansible/playbooks/indri.yml
index ddb57f8..1e33bb1 100644
--- a/ansible/playbooks/indri.yml
+++ b/ansible/playbooks/indri.yml
@@ -260,5 +260,7 @@
       tags: cv
     - role: docs
       tags: docs
+    - role: heph
+      tags: heph
     - role: caddy
       tags: caddy
diff --git a/ansible/roles/caddy/defaults/main.yml b/ansible/roles/caddy/defaults/main.yml
index 363d09e..e6d7385 100644
--- a/ansible/roles/caddy/defaults/main.yml
+++ b/ansible/roles/caddy/defaults/main.yml
@@ -52,6 +52,9 @@ caddy_services:
   - name: devpi
     host: "pypi.{{ caddy_domain }}"
     backend: "http://localhost:3141"
+  - name: heph
+    host: "heph.{{ caddy_domain }}"
+    backend: "http://localhost:8787" # hephaestus hub (server mode) + PWA shell
   - name: kiwix
     host: "kiwix.{{ caddy_domain }}"
     backend: "https://kiwix.tail8d86e.ts.net"
diff --git a/ansible/roles/docs/defaults/main.yml b/ansible/roles/docs/defaults/main.yml
index f09221b..a5a1a8a 100644
--- a/ansible/roles/docs/defaults/main.yml
+++ b/ansible/roles/docs/defaults/main.yml
@@ -3,9 +3,8 @@
 # Caddy serves docs_content_dir directly via the static-kind service block,
 # with Quartz-style try_files (path → path/ → path.html → 404).
 
-docs_version: "v1.16.0"
+docs_version: "v1.17.0"
 docs_release_url: "https://forge.eblu.me/eblume/blumeops/releases/download/{{ docs_version }}/docs-{{ docs_version }}.tar.gz"
-
 docs_home: /Users/erichblume/blumeops/docs
 docs_content_dir: "{{ docs_home }}/content"
 docs_version_sentinel: "{{ docs_home }}/.installed-version"
diff --git a/ansible/roles/heph/defaults/main.yml b/ansible/roles/heph/defaults/main.yml
new file mode 100644
index 0000000..88d2240
--- /dev/null
+++ b/ansible/roles/heph/defaults/main.yml
@@ -0,0 +1,49 @@
+---
+# hephaestus hub — the canonical heph replica (server mode) on indri.
+# Other devices (e.g. gilbert) are spokes that sync against this hub.
+# See [[set-up-sync-hub]] and [[host-heph-pwa]] in the hephaestus repo.
+
+# Pinned release used for the initial `cargo install` and the PWA shell.
+# After bootstrap, hephd's own --self-update keeps the binary current; this
+# pin only governs the first install and the bundled PWA shell version.
+heph_version: v1.2.1
+
+# Anonymous public HTTPS clone — matches hephd's INSTALL_GIT_URL so the initial
+# install and unattended self-update build from the same source (no ssh-agent).
+heph_repo_url: https://forge.eblu.me/eblume/hephaestus.git
+
+heph_bin_dir: /Users/erichblume/.cargo/bin
+heph_binary: "{{ heph_bin_dir }}/hephd"
+
+# rustc/cargo here are rustup shims. The bare (non-mise) environment that the
+# launchagent and ansible run in falls back to rustup's *default* toolchain,
+# which can lag behind heph's rust-version floor (Cargo.toml: 1.89). Pin the
+# channel explicitly so both the bootstrap build and unattended self-update
+# always use a current toolchain regardless of the host's rustup default.
+heph_rust_toolchain: stable
+
+heph_data_dir: /Users/erichblume/.local/share/heph
+heph_db: "{{ heph_data_dir }}/heph.db"
+heph_socket: "{{ heph_data_dir }}/hephd.sock"
+heph_log_dir: /Users/erichblume/Library/Logs
+
+# Version-pinned source checkout; the PWA static shell is served directly from
+# its heph-pwa/ subdir (no copy), keeping shell and hub in lockstep at heph_version.
+heph_pwa_src_dir: /Users/erichblume/.cache/heph-pwa-src
+heph_web_root: "{{ heph_pwa_src_dir }}/heph-pwa"
+
+# Hub listens on all interfaces so tailnet spokes can reach it directly
+# (http://indri.tail8d86e.ts.net:8787) and Caddy can proxy heph.ops.eblu.me.
+# Access is gated by Authentik OIDC regardless — tailnet reachability is not
+# enough (this is the owner's most sensitive data).
+heph_http_addr: 0.0.0.0:8787
+heph_port: 8787
+heph_external_url: https://heph.ops.eblu.me
+
+# Authentik OIDC — issuer + audience together turn hub auth on. The audience is
+# the device-code client id (see argocd/manifests/authentik heph blueprint).
+heph_oidc_issuer: https://authentik.ops.eblu.me/application/o/heph/ +heph_oidc_audience: heph + +# Self-update poll interval (seconds). 10 minutes. +heph_self_update_interval_secs: 600 diff --git a/ansible/roles/heph/handlers/main.yml b/ansible/roles/heph/handlers/main.yml new file mode 100644 index 0000000..92fe9d7 --- /dev/null +++ b/ansible/roles/heph/handlers/main.yml @@ -0,0 +1,6 @@ +--- +- name: Restart heph + ansible.builtin.shell: | + launchctl unload ~/Library/LaunchAgents/mcquack.eblume.heph.plist 2>/dev/null || true + launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist + changed_when: true diff --git a/ansible/roles/heph/tasks/main.yml b/ansible/roles/heph/tasks/main.yml new file mode 100644 index 0000000..7a45fe3 --- /dev/null +++ b/ansible/roles/heph/tasks/main.yml @@ -0,0 +1,82 @@ +--- +# hephaestus hub (server mode) on indri. +# +# DATA SEEDING (one-time, Path A — do this BEFORE the first provision so the hub +# adopts gilbert's existing data instead of being born empty): +# +# 1. On the seed device (gilbert): heph daemon stop +# 2. Copy its store to indri: scp ~/.local/share/heph/heph.db \ +# indri:~/.local/share/heph/heph.db +# 3. On indri, give the hub its OWN device origin (keeps gilbert's owner_id + +# data; hephd regenerates a fresh origin on next start when it is missing): +# sqlite3 ~/.local/share/heph/heph.db "DELETE FROM meta WHERE key='origin';" +# 4. Run this role (installs hephd, stages the PWA, loads the launchagent). +# +# hephd auto-creates an empty store on first start if none exists, so seeding is +# optional — skip it only if you intend a fresh, empty hub. + +- name: Ensure heph data directory exists + ansible.builtin.file: + path: "{{ heph_data_dir }}" + state: directory + mode: '0700' + +- name: Check for installed hephd binary + ansible.builtin.stat: + path: "{{ heph_binary }}" + register: heph_binary_stat + +# Bootstrap install only when hephd is absent. 
Thereafter hephd's own +# --self-update keeps it current; ansible must not fight (or downgrade) it. +# This builds from source and can take several minutes on a cold cargo cache. +- name: Bootstrap-install heph + hephd from the forge ({{ heph_version }}) + ansible.builtin.command: + cmd: >- + {{ heph_bin_dir }}/cargo install --locked + --git {{ heph_repo_url }} + --tag {{ heph_version }} + heph hephd + environment: + PATH: "{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin" + RUSTUP_TOOLCHAIN: "{{ heph_rust_toolchain }}" + when: not heph_binary_stat.stat.exists + changed_when: true + notify: Restart heph + +# Checkout provides the PWA shell at {{ heph_web_root }} (heph-pwa/ subdir), +# served directly by hephd. Static files are read from disk per request, so a +# version bump needs no restart; the service worker (CACHE = "heph-pwa-vN") +# evicts stale assets on next load. +- name: Ensure heph cache parent directory exists + ansible.builtin.file: + path: "{{ heph_pwa_src_dir | dirname }}" + state: directory + mode: '0755' + +- name: Stage heph-pwa source at {{ heph_version }} + ansible.builtin.git: + repo: "{{ heph_repo_url }}" + dest: "{{ heph_pwa_src_dir }}" + version: "{{ heph_version }}" + depth: 1 + single_branch: true + force: true + +- name: Deploy heph LaunchAgent plist + ansible.builtin.template: + src: heph.plist.j2 + dest: ~/Library/LaunchAgents/mcquack.eblume.heph.plist + mode: '0644' + notify: Restart heph + +- name: Check if heph LaunchAgent is loaded + ansible.builtin.command: launchctl list mcquack.eblume.heph + register: heph_launchctl_check + changed_when: false + failed_when: false + +- name: Load heph LaunchAgent if not loaded + ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist + when: heph_launchctl_check.rc != 0 + changed_when: true + failed_when: false diff --git a/ansible/roles/heph/templates/heph.plist.j2 b/ansible/roles/heph/templates/heph.plist.j2 new file mode 100644 index 
0000000..19a2367 --- /dev/null +++ b/ansible/roles/heph/templates/heph.plist.j2 @@ -0,0 +1,50 @@ + + + + + + Label + mcquack.eblume.heph + ProgramArguments + + {{ heph_binary }} + --mode + server + --http-addr + {{ heph_http_addr }} + --db + {{ heph_db }} + --socket + {{ heph_socket }} + --web-root + {{ heph_web_root }} + --oidc-issuer + {{ heph_oidc_issuer }} + --oidc-audience + {{ heph_oidc_audience }} + --self-update + --self-update-interval-secs + {{ heph_self_update_interval_secs }} + + RunAtLoad + + KeepAlive + + EnvironmentVariables + + + PATH + {{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin + HOME + /Users/erichblume + + RUSTUP_TOOLCHAIN + {{ heph_rust_toolchain }} + + StandardOutPath + {{ heph_log_dir }}/mcquack.heph.out.log + StandardErrorPath + {{ heph_log_dir }}/mcquack.heph.err.log + + diff --git a/argocd/apps/external-secrets-ringtail.yaml b/argocd/apps/external-secrets-ringtail.yaml index e2f5898..0bb8bd7 100644 --- a/argocd/apps/external-secrets-ringtail.yaml +++ b/argocd/apps/external-secrets-ringtail.yaml @@ -15,7 +15,7 @@ spec: source: repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git targetRevision: main - path: argocd/manifests/external-secrets + path: argocd/manifests/external-secrets-ringtail destination: server: https://ringtail.tail8d86e.ts.net:6443 namespace: external-secrets diff --git a/argocd/manifests/authentik/configmap-blueprint.yaml b/argocd/manifests/authentik/configmap-blueprint.yaml index fcbb99b..cc97dea 100644 --- a/argocd/manifests/authentik/configmap-blueprint.yaml +++ b/argocd/manifests/authentik/configmap-blueprint.yaml @@ -434,3 +434,93 @@ data: provider: !KeyOf mealie-provider meta_launch_url: https://meals.ops.eblu.me policy_engine_mode: all + + heph.yaml: | + version: 1 + metadata: + name: BlumeOps Heph SSO + labels: + blueprints.goauthentik.io/description: "Hephaestus hub OIDC (device-code) provider, application, and device-code flow" + entries: + # Device-code flow 
(RFC 8628). authentik ships no default for this, so we + # create one and bind it to the brand below. An empty stage_configuration + # flow is sufficient: the already-authenticated user just confirms the code. + - model: authentik_flows.flow + id: device-code-flow + identifiers: + slug: default-device-code-flow + attrs: + name: Device code flow + title: Device code flow + slug: default-device-code-flow + designation: stage_configuration + authentication: require_authenticated + + # Enable the device-code grant globally by binding the flow to the default + # brand (domain authentik-default). Partial update — only sets this field. + - model: authentik_brands.brand + identifiers: + domain: authentik-default + attrs: + flow_device_code: !KeyOf device-code-flow + + # OAuth2 provider for heph — PUBLIC client (device-code + PKCE, no secret). + # client_id doubles as the token audience the hub verifies (--oidc-audience heph), + # and the app slug 'heph' is the issuer path (/application/o/heph/). + - model: authentik_providers_oauth2.oauth2provider + id: heph-provider + identifiers: + name: Heph + attrs: + name: Heph + authorization_flow: !Find [authentik_flows.flow, [slug, default-provider-authorization-implicit-consent]] + invalidation_flow: !Find [authentik_flows.flow, [slug, default-provider-invalidation-flow]] + client_type: public + client_id: heph + # CLI/TUI use the device-code grant (no redirect). The heph-pwa browser + # login uses Authorization Code + PKCE, which DOES redirect back to the + # app's origin — register those here (Authentik also keys token-endpoint + # CORS off these origins). Trailing slash matters: the PWA's redirect_uri + # is its base dir, e.g. https://heph.ops.eblu.me/. 
+ redirect_uris: + - matching_mode: strict + url: https://heph.ops.eblu.me/ + - matching_mode: strict + url: http://localhost:8787/ # local dev (hephd --web-root) + signing_key: !Find [authentik_crypto.certificatekeypair, [name, authentik Self-signed Certificate]] + property_mappings: + - !Find [authentik_providers_oauth2.scopemapping, [scope_name, openid]] + - !Find [authentik_providers_oauth2.scopemapping, [scope_name, email]] + - !Find [authentik_providers_oauth2.scopemapping, [scope_name, profile]] + # offline_access: heph CLI requests "openid offline_access"; without + # this mapping the refresh token is session-bound and hephd's + # refresh_token grant 400s once the session lapses (spoke sync dies). + - !Find [authentik_providers_oauth2.scopemapping, [scope_name, offline_access]] + sub_mode: hashed_user_id + include_claims_in_id_token: true + + # Heph application — linked to the OAuth2 provider + - model: authentik_core.application + id: heph-app + identifiers: + slug: heph + attrs: + name: Hephaestus + slug: heph + provider: !KeyOf heph-provider + meta_launch_url: https://heph.ops.eblu.me + policy_engine_mode: any + + # Policy binding — restrict heph to admins group (single-owner, sensitive data) + - model: authentik_policies.policybinding + identifiers: + order: 0 + target: !KeyOf heph-app + group: !Find [authentik_core.group, [name, admins]] + attrs: + target: !KeyOf heph-app + group: !Find [authentik_core.group, [name, admins]] + order: 0 + enabled: true + negate: false + timeout: 30 diff --git a/argocd/manifests/external-secrets-ringtail/kustomization.yaml b/argocd/manifests/external-secrets-ringtail/kustomization.yaml new file mode 100644 index 0000000..9fd4e2f --- /dev/null +++ b/argocd/manifests/external-secrets-ringtail/kustomization.yaml @@ -0,0 +1,16 @@ +# Ringtail (amd64) overlay for external-secrets. +# +# Reuses the shared indri manifest as a base and only overrides the controller +# image to the nix-built amd64 variant (`-nix` tag). 
The base sets the arm64 +# image (built via containers/external-secrets/container.py on indri's Dagger +# runner); ringtail's k3s is amd64 and needs the image built by +# containers/external-secrets/default.nix on the nix-container-builder. +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +resources: + - ../external-secrets + +images: + - name: registry.ops.eblu.me/blumeops/external-secrets + newTag: v2.2.0-13895bb-nix diff --git a/argocd/manifests/external-secrets/kustomization.yaml b/argocd/manifests/external-secrets/kustomization.yaml index 574aaa7..639db66 100644 --- a/argocd/manifests/external-secrets/kustomization.yaml +++ b/argocd/manifests/external-secrets/kustomization.yaml @@ -12,4 +12,5 @@ resources: images: - name: ghcr.io/external-secrets/external-secrets - newTag: v2.2.0 + newName: registry.ops.eblu.me/blumeops/external-secrets + newTag: v2.2.0-13895bb diff --git a/argocd/manifests/nvidia-device-plugin/kustomization.yaml b/argocd/manifests/nvidia-device-plugin/kustomization.yaml index a46edf6..f5a33ae 100644 --- a/argocd/manifests/nvidia-device-plugin/kustomization.yaml +++ b/argocd/manifests/nvidia-device-plugin/kustomization.yaml @@ -10,4 +10,4 @@ resources: images: - name: nvcr.io/nvidia/k8s-device-plugin - newTag: v0.19.0 + newTag: v0.19.2 diff --git a/argocd/manifests/tailscale-operator-base/kustomization.yaml b/argocd/manifests/tailscale-operator-base/kustomization.yaml index 4519af6..9d117ef 100644 --- a/argocd/manifests/tailscale-operator-base/kustomization.yaml +++ b/argocd/manifests/tailscale-operator-base/kustomization.yaml @@ -6,8 +6,11 @@ namespace: tailscale # Upstream Tailscale operator manifest from forge mirror. # To upgrade: update the ref in the URL AND the newTag below. +# Must use the tailnet host forge.ops.eblu.me — the public forge.eblu.me +# black-holes /mirrors/ at the Fly edge (AI-scraper mitigation), which the +# in-cluster ArgoCD repo-server would otherwise hit and fail with a 403. 
resources: - - https://forge.eblu.me/mirrors/tailscale/raw/tag/v1.94.2/cmd/k8s-operator/deploy/manifests/operator.yaml + - https://forge.ops.eblu.me/mirrors/tailscale/raw/tag/v1.94.2/cmd/k8s-operator/deploy/manifests/operator.yaml - proxyclass.yaml - dnsconfig.yaml diff --git a/containers/external-secrets/container.py b/containers/external-secrets/container.py new file mode 100644 index 0000000..6be5765 --- /dev/null +++ b/containers/external-secrets/container.py @@ -0,0 +1,51 @@ +"""External Secrets Operator — native Dagger build. + +Two-stage build: Go binary (all providers), Alpine runtime. +Source cloned from forge mirror. + +A single binary serves as the controller, webhook, and cert-controller; the +Deployments select the role via a subcommand passed in `args:`, so the image +ENTRYPOINT must be the binary itself (matching upstream's distroless image). +""" + +import dagger + +from blumeops.containers import ( + alpine_runtime, + clone_from_forge, + go_build, + oci_labels, +) + +VERSION = "v2.2.0" + + +async def build(src: dagger.Directory) -> dagger.Container: + source = clone_from_forge("external-secrets", VERSION) + + # Upstream `make build` compiles every secret provider into a single + # static binary (`-tags all_providers`, CGO disabled). Mirror that so the + # local image is functionally identical to ghcr.io/.../external-secrets. 
+ backend = go_build( + source, + "/external-secrets", + tags="all_providers", + ) + + runtime = alpine_runtime( + extra_apk=["ca-certificates"], + create_user=False, + ) + runtime = oci_labels( + runtime, + title="External Secrets Operator", + description=( + "Kubernetes operator that integrates external secret management systems" + ), + version=VERSION, + ) + return ( + runtime.with_file("/bin/external-secrets", backend.file("/external-secrets")) + .with_user("65534") + .with_entrypoint(["/bin/external-secrets"]) + ) diff --git a/containers/external-secrets/default.nix b/containers/external-secrets/default.nix new file mode 100644 index 0000000..eabe03d --- /dev/null +++ b/containers/external-secrets/default.nix @@ -0,0 +1,56 @@ +# Nix-built External Secrets Operator (amd64, for ringtail k3s). +# Builds v2.2.0 from the forge mirror with all secret providers compiled in, +# faithful to upstream's `make build` (-tags all_providers). The container.py +# sibling builds the arm64 image for indri's minikube; this default.nix builds +# the amd64 image on ringtail's nix-container-builder. +{ pkgs ? import <nixpkgs> { } }: + +let + version = "2.2.0"; + + src = pkgs.fetchgit { + url = "https://forge.ops.eblu.me/mirrors/external-secrets.git"; + rev = "v${version}"; + hash = "sha256-eAocOAp5s4CFRrpKfQr2lf3Ji+6nQQ1A5/eTw5B7v9U="; + }; + + # external-secrets v2.2.0 requires Go >= 1.26.1; nixpkgs default go is 1.25.x. + external-secrets = (pkgs.buildGoModule.override { go = pkgs.go_1_26; }) { + inherit src version; + pname = "external-secrets"; + vendorHash = "sha256-0xuBK3fjAplPLAElHvKB6d+2lDz+De/s91fV4dPZwjE="; + + doCheck = false; + + subPackages = [ "."
]; + + tags = [ "all_providers" ]; + + ldflags = [ "-s" "-w" ]; + + meta = with pkgs.lib; { + description = "Kubernetes operator that integrates external secret management systems"; + homepage = "https://github.com/external-secrets/external-secrets"; + license = licenses.asl20; + mainProgram = "external-secrets"; + }; + }; +in + +pkgs.dockerTools.buildLayeredImage { + name = "blumeops/external-secrets"; + contents = [ + external-secrets + pkgs.cacert + pkgs.tzdata + ]; + + config = { + Entrypoint = [ "${external-secrets}/bin/external-secrets" ]; + Env = [ + "SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt" + "TZDIR=${pkgs.tzdata}/share/zoneinfo" + ]; + User = "65534"; + }; +} diff --git a/docs/changelog.d/+1password-backup-doc-export-name.doc.md b/docs/changelog.d/+1password-backup-doc-export-name.doc.md deleted file mode 100644 index 6c4d262..0000000 --- a/docs/changelog.d/+1password-backup-doc-export-name.doc.md +++ /dev/null @@ -1 +0,0 @@ -Fixed the export-filename step in [[run-1password-backup]]: 1Password's desktop app names the export `1PasswordExport--.1pux` automatically rather than letting you save to a fixed name, so the procedure now points the task at that glob instead of pretending the default name is `1Password-export.1pux`. diff --git a/docs/changelog.d/+agent-file-neutralization.ai.md b/docs/changelog.d/+agent-file-neutralization.ai.md deleted file mode 100644 index da16fba..0000000 --- a/docs/changelog.d/+agent-file-neutralization.ai.md +++ /dev/null @@ -1 +0,0 @@ -Adopt `AGENTS.md` as the canonical agent instruction file, keep `CLAUDE.md` as a compatibility shim, and update docs to reference the neutral file and the correct agent-change-process path. 
diff --git a/docs/changelog.d/+ai-scraper-mitigation-doc.doc.md b/docs/changelog.d/+ai-scraper-mitigation-doc.doc.md deleted file mode 100644 index 246fedb..0000000 --- a/docs/changelog.d/+ai-scraper-mitigation-doc.doc.md +++ /dev/null @@ -1 +0,0 @@ -Add `docs/explanation/ai-scraper-mitigation.md` — the egress-cost / AI-crawler threat model for the public Fly proxy, the tiered mitigation plan (Tier 1: mirror black-hole, shipped; Tier 2: user-agent denylist + Anubis; Tier 3: Cloudflare, rejected on principle), and the data behind it. diff --git a/docs/changelog.d/+alloy-main-sha-rebuild.infra.md b/docs/changelog.d/+alloy-main-sha-rebuild.infra.md deleted file mode 100644 index 42a7b37..0000000 --- a/docs/changelog.d/+alloy-main-sha-rebuild.infra.md +++ /dev/null @@ -1,5 +0,0 @@ -Rebuild and retag alloy v1.16.0 container images from the main-branch SHA -following the squash-merge of #345, per the build-container-image -squash-merge convention. Both images (`registry.ops.eblu.me/blumeops/alloy`) -now reference `9564435` rather than the branch SHA `26a3ab5`, restoring -source traceability after branch cleanup. diff --git a/docs/changelog.d/+alloy-native-macos-v1.16.0.infra.md b/docs/changelog.d/+alloy-native-macos-v1.16.0.infra.md deleted file mode 100644 index 471990f..0000000 --- a/docs/changelog.d/+alloy-native-macos-v1.16.0.infra.md +++ /dev/null @@ -1,6 +0,0 @@ -Upgrade native macOS Alloy on indri to v1.16.0. Built on gilbert with Go -1.26.2 + CGO (required for the macOS native DNS resolver, which Tailscale -MagicDNS depends on), scp'd to `~/.local/bin/alloy` on indri, codesigned, -and the LaunchAgent reloaded. Completes the v1.16.0 fleet upgrade started -in #345 — all four Alloy services (alloy-k8s, alloy-ringtail, -alloy-tracing-ringtail, alloy ansible) now run v1.16.0. 
diff --git a/docs/changelog.d/+argocd-resource-limits.infra.md b/docs/changelog.d/+argocd-resource-limits.infra.md deleted file mode 100644 index ba24a5a..0000000 --- a/docs/changelog.d/+argocd-resource-limits.infra.md +++ /dev/null @@ -1 +0,0 @@ -Add resource limits to all ArgoCD pods to prevent unbounded resource consumption during node-wide pressure events. diff --git a/docs/changelog.d/+claude-md-import-agents.ai.md b/docs/changelog.d/+claude-md-import-agents.ai.md deleted file mode 100644 index f63231e..0000000 --- a/docs/changelog.d/+claude-md-import-agents.ai.md +++ /dev/null @@ -1 +0,0 @@ -CLAUDE.md now imports AGENTS.md via `@AGENTS.md` instead of telling agents to go read it. Claude Code only auto-loads CLAUDE.md, so the prose shim was easy to skip; the import inlines AGENTS.md into the session prompt unconditionally. diff --git a/docs/changelog.d/+container-build-suggest-runner-logs.misc.md b/docs/changelog.d/+container-build-suggest-runner-logs.misc.md deleted file mode 100644 index d10ea51..0000000 --- a/docs/changelog.d/+container-build-suggest-runner-logs.misc.md +++ /dev/null @@ -1 +0,0 @@ -`container-build-and-release` now prints the specific `mise run runner-logs ` command after dispatching, polling the Forgejo API to resolve the run number for the commit it just triggered. diff --git a/docs/changelog.d/+external-secrets-main-sha-rebuild.infra.md b/docs/changelog.d/+external-secrets-main-sha-rebuild.infra.md new file mode 100644 index 0000000..2e931d4 --- /dev/null +++ b/docs/changelog.d/+external-secrets-main-sha-rebuild.infra.md @@ -0,0 +1 @@ +Rebuilt the locally-built external-secrets image from the `main` branch so the deployed tag (`v2.2.0-0e70a1b`) traces to a `main` commit rather than the now-merged feature branch, giving a stable provenance reference. 
diff --git a/docs/changelog.d/+external-secrets-stable-main-sha.infra.md b/docs/changelog.d/+external-secrets-stable-main-sha.infra.md new file mode 100644 index 0000000..fbe3c21 --- /dev/null +++ b/docs/changelog.d/+external-secrets-stable-main-sha.infra.md @@ -0,0 +1 @@ +Rebuilt the external-secrets images off `main` and repointed both clusters to the stable main-sha tags (`v2.2.0-13895bb` arm64 / `v2.2.0-13895bb-nix` amd64), so the deployed images on indri and ringtail trace to the same `main` commit rather than earlier feature-branch builds. diff --git a/docs/changelog.d/+fix-forge-static-assets.bugfix.md b/docs/changelog.d/+fix-forge-static-assets.bugfix.md deleted file mode 100644 index de0517e..0000000 --- a/docs/changelog.d/+fix-forge-static-assets.bugfix.md +++ /dev/null @@ -1 +0,0 @@ -Fixed forge.eblu.me static assets (CSS, JS, images, fonts) not loading — the proxy's static asset cache block was missing the `Host` header, so Caddy couldn't route the requests. diff --git a/docs/changelog.d/+fly-deploy-immediate-strategy.infra.md b/docs/changelog.d/+fly-deploy-immediate-strategy.infra.md deleted file mode 100644 index 205bd6a..0000000 --- a/docs/changelog.d/+fly-deploy-immediate-strategy.infra.md +++ /dev/null @@ -1 +0,0 @@ -Switch the Fly proxy deploy strategy from `bluegreen` to `immediate` in `fly/fly.toml`. With a single proxy machine, bluegreen offers little benefit — the green machine routinely failed to reach "started" inside Fly's default 5-minute deploy timeout (the cold-start sequence of `tailscaled` → `tailscale up` → wait-for-MagicDNS → nginx startup eats most of the budget), and the failed deploys would roll back. `immediate` replaces the machine in place with a brief downtime (~5–10s) but actually completes. 
diff --git a/docs/changelog.d/+forge-mirrors-blackhole.infra.md b/docs/changelog.d/+forge-mirrors-blackhole.infra.md deleted file mode 100644 index 29a5e6a..0000000 --- a/docs/changelog.d/+forge-mirrors-blackhole.infra.md +++ /dev/null @@ -1 +0,0 @@ -Black-hole the `/mirrors/*` repositories at the Fly proxy edge (`return 403` → `forge.ops.eblu.me`). A surprise $29.60 Fly bill traced to ~1.24 TB/30d of egress on `forge.eblu.me`, 99.95% of all proxy egress — of which ~71% was AI scrapers (Meta `meta-externalagent`, OpenAI `GPTBot`, Amazonbot) crawling the near-infinite git-history URL space of the public mirror repos and timing out Forgejo in the process. Mirrors exist for supply-chain control and are consumed over the tailnet, so their public web UI had no legitimate audience. `robots.txt` already disallowed `/mirrors/`, but the offending agents ignore it. Tier-2 mitigations (user-agent denylist, Anubis proof-of-work gateway) are documented in `docs/explanation/ai-scraper-mitigation.md`. diff --git a/docs/changelog.d/+frigate-notify-local.infra.md b/docs/changelog.d/+frigate-notify-local.infra.md deleted file mode 100644 index 120f915..0000000 --- a/docs/changelog.d/+frigate-notify-local.infra.md +++ /dev/null @@ -1 +0,0 @@ -Add local nix container build for `frigate-notify` (`containers/frigate-notify/default.nix`) so the Frigate→ntfy bridge is rebuilt on ringtail from the forge mirror instead of pulled from `ghcr.io/0x2142/frigate-notify`. diff --git a/docs/changelog.d/+grafana-recreate-strategy.infra.md b/docs/changelog.d/+grafana-recreate-strategy.infra.md deleted file mode 100644 index 3662e10..0000000 --- a/docs/changelog.d/+grafana-recreate-strategy.infra.md +++ /dev/null @@ -1 +0,0 @@ -Switched Grafana's deployment strategy from `RollingUpdate` to `Recreate`. With an RWO PVC holding the SQLite database and Bleve search index, `RollingUpdate` reliably crashloops the new pod on the index lock until rollout timeout. 
`Recreate` terminates the old pod first so the new one acquires the lock cleanly. diff --git a/docs/changelog.d/+heph-hub-v1.2.1.infra.md b/docs/changelog.d/+heph-hub-v1.2.1.infra.md new file mode 100644 index 0000000..c203323 --- /dev/null +++ b/docs/changelog.d/+heph-hub-v1.2.1.infra.md @@ -0,0 +1 @@ +Bumped the indri heph hub to v1.2.1, which adds the hub `GET /config` endpoint and ships the heph-pwa **Login with Authentik** flow (Authorization Code + PKCE). Pairs with the Authentik `heph` provider redirect URIs registered earlier. diff --git a/docs/changelog.d/+homepage-config-perms-fix.bugfix.md b/docs/changelog.d/+homepage-config-perms-fix.bugfix.md deleted file mode 100644 index 20e1135..0000000 --- a/docs/changelog.d/+homepage-config-perms-fix.bugfix.md +++ /dev/null @@ -1,5 +0,0 @@ -Fixed homepage container EACCES on cold start: the nix-built image now chowns -`/app/config` to uid 1000 at build time via `fakeRootCommands`, matching the -behavior of the old Dockerfile. Without this, homepage couldn't seed missing -skeleton configs (proxmox.yaml etc.) or create `/app/config/logs`, crashing on -its first uncached request. Caught during the ringtail cutover. diff --git a/docs/changelog.d/+homepage-dedup-migrated.misc.md b/docs/changelog.d/+homepage-dedup-migrated.misc.md deleted file mode 100644 index 9efc5ba..0000000 --- a/docs/changelog.d/+homepage-dedup-migrated.misc.md +++ /dev/null @@ -1,5 +0,0 @@ -Remove the duplicate Homepage tiles for Mealie, Paperless, Immich, and -TeslaMate. Homepage runs on ringtail and autodiscovers ringtail Ingresses via -`gethomepage.dev/*` annotations; once these services migrated to ringtail they -were discovered automatically, making their leftover static `services.yaml` -entries (needed only while they lived on minikube) redundant. 
diff --git a/docs/changelog.d/+immich-probe-ringtail.infra.md b/docs/changelog.d/+immich-probe-ringtail.infra.md deleted file mode 100644 index f2d3dee..0000000 --- a/docs/changelog.d/+immich-probe-ringtail.infra.md +++ /dev/null @@ -1 +0,0 @@ -Moved the Immich blackbox health probe from indri's alloy to ringtail's alloy. After the immich migration to ringtail, the probe still targeted `immich-server.immich.svc.cluster.local` on indri's cluster where the service no longer exists, causing a persistent `ServiceProbeFailure` alert. diff --git a/docs/changelog.d/+manage-forgejo-mirrors-sync-location.doc.md b/docs/changelog.d/+manage-forgejo-mirrors-sync-location.doc.md deleted file mode 100644 index f71fc81..0000000 --- a/docs/changelog.d/+manage-forgejo-mirrors-sync-location.doc.md +++ /dev/null @@ -1 +0,0 @@ -Fix manage-forgejo-mirrors verify step — sync button is on the repo settings page ("Synchronize now"), not the main repo page. diff --git a/docs/changelog.d/+pin-quartz-v4.bugfix.md b/docs/changelog.d/+pin-quartz-v4.bugfix.md deleted file mode 100644 index e073bbb..0000000 --- a/docs/changelog.d/+pin-quartz-v4.bugfix.md +++ /dev/null @@ -1 +0,0 @@ -Pin the Quartz docs build to v4.5.2. The Dagger `build_docs` pipeline cloned Quartz from the default branch unpinned; Quartz v5.0.0 restructured its config layout (`.quartz/plugins`, `../quartz` imports) and broke the docs build against our existing `quartz.config.ts`/`quartz.layout.ts`. diff --git a/docs/changelog.d/+prowler-rebuild-on-main.infra.md b/docs/changelog.d/+prowler-rebuild-on-main.infra.md deleted file mode 100644 index 107b687..0000000 --- a/docs/changelog.d/+prowler-rebuild-on-main.infra.md +++ /dev/null @@ -1 +0,0 @@ -Rebuild Prowler container against main HEAD (v5.23.0-495e45d) after merging the IaC mutelist Dockerfile changes. 
diff --git a/docs/changelog.d/+remove-devpi-container-build.misc.md b/docs/changelog.d/+remove-devpi-container-build.misc.md deleted file mode 100644 index 8ebec54..0000000 --- a/docs/changelog.d/+remove-devpi-container-build.misc.md +++ /dev/null @@ -1 +0,0 @@ -Removed the now-unused `containers/devpi/` Dagger build artifact. Devpi runs natively on indri via uv venv; the container image is no longer referenced anywhere. Doc examples in `docs/reference/tools/dagger.md` updated to use `miniflux` as the example container name. diff --git a/docs/changelog.d/+retire-todoist-for-heph.infra.md b/docs/changelog.d/+retire-todoist-for-heph.infra.md deleted file mode 100644 index f6284d0..0000000 --- a/docs/changelog.d/+retire-todoist-for-heph.infra.md +++ /dev/null @@ -1 +0,0 @@ -Retired the `blumeops-tasks` mise task (Todoist API) in favor of `heph list --project Blumeops --json` from the self-hosted [hephaestus](https://github.com/eblume/hephaestus) system. Updated docs to point task discovery and rotation reminders at heph, and noted that the `~/code/personal/zk` zettelkasten is migrating into heph docs. diff --git a/docs/changelog.d/+review-1password-doc.doc.md b/docs/changelog.d/+review-1password-doc.doc.md deleted file mode 100644 index bba9591..0000000 --- a/docs/changelog.d/+review-1password-doc.doc.md +++ /dev/null @@ -1 +0,0 @@ -Reviewed [[1password]] reference card: added the `blumeops` vs `Personal` vault split, noted that `onepassword-connect` runs on both indri and ringtail (not just one cluster), and pulled the `op read` vs `op item get --fields` guidance up from agent memory into the card. 
diff --git a/docs/changelog.d/+review-compliance-image-iac.feature.md b/docs/changelog.d/+review-compliance-image-iac.feature.md deleted file mode 100644 index 1125359..0000000 --- a/docs/changelog.d/+review-compliance-image-iac.feature.md +++ /dev/null @@ -1 +0,0 @@ -`review-compliance-reports` now also fetches and summarizes the weekly Prowler container-image and IaC scans (previously only the K8s CIS in-cluster scan was processed). For each scan it shows status counts, severity breakdown, week-over-week delta, and — for the high-volume image/IaC scans — top-N tables grouped by check ID and resource instead of per-finding listings. diff --git a/docs/changelog.d/+review-contributing-doc.doc.md b/docs/changelog.d/+review-contributing-doc.doc.md deleted file mode 100644 index c394a01..0000000 --- a/docs/changelog.d/+review-contributing-doc.doc.md +++ /dev/null @@ -1 +0,0 @@ -Refresh the contributing tutorial: add `last-reviewed`, include the `.ai.md` changelog fragment type, and clarify that `prek` is pinned via `mise`. diff --git a/docs/changelog.d/+review-index-doc.doc.md b/docs/changelog.d/+review-index-doc.doc.md deleted file mode 100644 index 7016a7a..0000000 --- a/docs/changelog.d/+review-index-doc.doc.md +++ /dev/null @@ -1 +0,0 @@ -Reviewed `index.md`; added ringtail to the infrastructure overview and stamped `last-reviewed`. diff --git a/docs/changelog.d/+review-navidrome-doc.doc.md b/docs/changelog.d/+review-navidrome-doc.doc.md deleted file mode 100644 index fbe5e79..0000000 --- a/docs/changelog.d/+review-navidrome-doc.doc.md +++ /dev/null @@ -1 +0,0 @@ -Review and refresh the Navidrome reference card: add `last-reviewed`, correct the scanner env var name, document the current image/version, and record routing and runtime details from the manifests. 
diff --git a/docs/changelog.d/+review-ollama-doc.doc.md b/docs/changelog.d/+review-ollama-doc.doc.md deleted file mode 100644 index 05ef23e..0000000 --- a/docs/changelog.d/+review-ollama-doc.doc.md +++ /dev/null @@ -1 +0,0 @@ -Review and refresh the Ollama reference card: add `last-reviewed`, bump the documented image tag to 0.20.4, and add the two `qwen3.5` models now declared in `models.txt`. diff --git a/docs/changelog.d/+ringtail-clone-via-tailnet.infra.md b/docs/changelog.d/+ringtail-clone-via-tailnet.infra.md deleted file mode 100644 index d664163..0000000 --- a/docs/changelog.d/+ringtail-clone-via-tailnet.infra.md +++ /dev/null @@ -1 +0,0 @@ -Switch the ringtail provisioning playbook's blumeops clone URL from `forge.eblu.me` (public, via Fly proxy) to `forge.ops.eblu.me` (tailnet, direct via Caddy on indri). Ringtail is always on the tailnet, so the WAN round-trip is pure overhead — it also made `provision-ringtail` brittle whenever the Fly proxy was slow or down. diff --git a/docs/changelog.d/+ringtail-coredump-size-cap.infra.md b/docs/changelog.d/+ringtail-coredump-size-cap.infra.md deleted file mode 100644 index 824b2df..0000000 --- a/docs/changelog.d/+ringtail-coredump-size-cap.infra.md +++ /dev/null @@ -1 +0,0 @@ -Cap systemd-coredump on ringtail (ProcessSizeMax/ExternalSizeMax 1G, MaxUse 2G) so multi-GB Wine/Proton game crash dumps no longer thrash the disk and lock up the desktop. diff --git a/docs/changelog.d/+ringtail-flake-update-2026-06-01.infra.md b/docs/changelog.d/+ringtail-flake-update-2026-06-01.infra.md deleted file mode 100644 index dd488b6..0000000 --- a/docs/changelog.d/+ringtail-flake-update-2026-06-01.infra.md +++ /dev/null @@ -1,4 +0,0 @@ -Update the ringtail NixOS flake lockfile (`nixos/ringtail/flake.lock`): bump -`nixpkgs` (b77b3de → 25f5383) and `disko` (5ba0c95 → 115e521) to latest. -`nixpkgs-services` was intentionally left pinned (skipped by the -`flake-update` pipeline). Routine recurring maintenance per [[manage-lockfile]]. 
diff --git a/docs/changelog.d/+ringtail-proton-ge.infra.md b/docs/changelog.d/+ringtail-proton-ge.infra.md deleted file mode 100644 index 0d8bc04..0000000 --- a/docs/changelog.d/+ringtail-proton-ge.infra.md +++ /dev/null @@ -1,4 +0,0 @@ -Add GE-Proton (`pkgs.proton-ge-bin`) to `programs.steam.extraCompatPackages` -on ringtail. Subnautica 2 hangs at Mercuna plugin init under Proton -Experimental + DXVK D3D12; GE-Proton is available as a Steam per-game -compatibility option to work around it. diff --git a/docs/changelog.d/+ringtail-sn2-prelaunch.infra.md b/docs/changelog.d/+ringtail-sn2-prelaunch.infra.md deleted file mode 100644 index f9c68e2..0000000 --- a/docs/changelog.d/+ringtail-sn2-prelaunch.infra.md +++ /dev/null @@ -1,6 +0,0 @@ -Add `sn2-prelaunch` Steam launch wrapper on ringtail that removes -Subnautica 2's stale `Saved/running.dat` and `Saved/beforelobby.dat` -lockfiles before each launch. SN2 pops up an invisible (0×0-sized) -Error dialog when it detects an unclean exit, blocking GameThread -forever; this is observable only as a black screen with a spinning -loader. Use via Steam launch option: `sn2-prelaunch %command%`. diff --git a/docs/changelog.d/+ringtail-sway-fuzzel.bugfix.md b/docs/changelog.d/+ringtail-sway-fuzzel.bugfix.md deleted file mode 100644 index 6801040..0000000 --- a/docs/changelog.d/+ringtail-sway-fuzzel.bugfix.md +++ /dev/null @@ -1,3 +0,0 @@ -Fixed sway keybindings on ringtail — the home-manager `keybindings` block was replacing the module's defaults entirely, leaving only explicit overrides (no workspace switching, focus, move, splits, resize mode, etc). Switched to `lib.mkOptionDefault` with `lib.mkForce` on the conflicting custom binds (`Mod+Return`, `Mod+d`, `Mod+space`, `Mod+l`) so defaults merge back in. Also added `Mod+F1` to show a filterable fuzzel list of current keybindings. 
- -Fixed fuzzel config errors on launch — `border-radius` and `border-width` were under `[main]`, but fuzzel expects them as `radius`/`width` under a `[border]` section. diff --git a/docs/changelog.d/+ringtail-vrr-flicker.bugfix.md b/docs/changelog.d/+ringtail-vrr-flicker.bugfix.md deleted file mode 100644 index cb23344..0000000 --- a/docs/changelog.d/+ringtail-vrr-flicker.bugfix.md +++ /dev/null @@ -1 +0,0 @@ -Disabled adaptive sync (VRR) on ringtail's DP-1 output. The OMEN 27i IPS panel pumps brightness when its refresh rate swings into the low VRR range during low-framerate content (e.g. game cutscenes), producing a flicker that worsened over a session until a reboot. Pinning the panel to a fixed 165Hz eliminates it. diff --git a/docs/changelog.d/+rotate-fly-deploy-token-shell-examples.doc.md b/docs/changelog.d/+rotate-fly-deploy-token-shell-examples.doc.md deleted file mode 100644 index 24ffcb9..0000000 --- a/docs/changelog.d/+rotate-fly-deploy-token-shell-examples.doc.md +++ /dev/null @@ -1 +0,0 @@ -rotate-fly-deploy-token: combine mint+store into one command with both fish and bash forms; document the `op item edit` "Password item requires ps value" validator gotcha and the placeholder-password workaround. diff --git a/docs/changelog.d/+runner-logs-auth.feature.md b/docs/changelog.d/+runner-logs-auth.feature.md deleted file mode 100644 index 9ee6fa1..0000000 --- a/docs/changelog.d/+runner-logs-auth.feature.md +++ /dev/null @@ -1 +0,0 @@ -runner-logs now authenticates with Forgejo API token and auto-detects the repo from git remote. Job logs are fetched via SSH to indri (reading Forgejo's on-disk zstd log files) instead of the web endpoint, which doesn't support token auth for private repos. 
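Reviewer note on the `runner-logs` fragment above: "auto-detects the repo from git remote" can be sketched as below. This is a minimal illustration, not the task's actual code; the function name `repo_from_remote` and the example host `forge.example` are hypothetical, and it assumes the remote is either an scp-style SSH URL or a regular URL ending in `owner/name`.

```python
import re


def repo_from_remote(url: str) -> str:
    """Return 'owner/name' from an SSH or HTTPS git remote URL (hypothetical helper)."""
    cleaned = url.strip().rstrip("/")
    if cleaned.endswith(".git"):
        cleaned = cleaned[: -len(".git")]
    # Take the last two path components, which may follow either ':' (scp-style
    # remotes such as git@host:owner/name) or '/' (scheme://host/owner/name).
    match = re.search(r"[:/]([^/:]+)/([^/:]+)$", cleaned)
    if match is None:
        raise ValueError(f"cannot detect repo from remote: {url!r}")
    return f"{match.group(1)}/{match.group(2)}"
```

In practice the input would come from something like `git remote get-url origin`; a remote with no owner component raises instead of guessing.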
diff --git a/docs/changelog.d/+runner-logs-missing-log.misc.md b/docs/changelog.d/+runner-logs-missing-log.misc.md deleted file mode 100644 index c06704a..0000000 --- a/docs/changelog.d/+runner-logs-missing-log.misc.md +++ /dev/null @@ -1 +0,0 @@ -`mise run runner-logs -j ` now reports a clear error when the log file doesn't exist on indri (e.g. a runner crash that left `action_task.log_in_storage = 0`). Previously it printed only the header and exited 0, because `zstdcat` exits 0 with a "can't stat … -- ignored" stderr message and ssh+fish on indri swallows the remote exit code. diff --git a/docs/changelog.d/+shower-1.1.1-deploy.infra.md b/docs/changelog.d/+shower-1.1.1-deploy.infra.md deleted file mode 100644 index 61244ac..0000000 --- a/docs/changelog.d/+shower-1.1.1-deploy.infra.md +++ /dev/null @@ -1 +0,0 @@ -Deploy shower v1.1.1 to ringtail (kustomize newTag bump). diff --git a/docs/changelog.d/+shower-1.1.1-fod-pin.infra.md b/docs/changelog.d/+shower-1.1.1-fod-pin.infra.md deleted file mode 100644 index a19b578..0000000 --- a/docs/changelog.d/+shower-1.1.1-fod-pin.infra.md +++ /dev/null @@ -1 +0,0 @@ -Pin shower v1.1.1 FOD outputHash (probed locally on ringtail). diff --git a/docs/changelog.d/+shower-1.1.1.infra.md b/docs/changelog.d/+shower-1.1.1.infra.md deleted file mode 100644 index eb9476c..0000000 --- a/docs/changelog.d/+shower-1.1.1.infra.md +++ /dev/null @@ -1 +0,0 @@ -Bump shower container to v1.1.1 (probe FOD hash). diff --git a/docs/changelog.d/+shower-1.1.3-deploy.infra.md b/docs/changelog.d/+shower-1.1.3-deploy.infra.md deleted file mode 100644 index 833fac6..0000000 --- a/docs/changelog.d/+shower-1.1.3-deploy.infra.md +++ /dev/null @@ -1 +0,0 @@ -Deployed shower v1.1.3 to ringtail (image built and pushed from ringtail; runner bypassed due to indri overload). 
diff --git a/docs/changelog.d/+shower-1.1.3.infra.md b/docs/changelog.d/+shower-1.1.3.infra.md deleted file mode 100644 index 33ee49d..0000000 --- a/docs/changelog.d/+shower-1.1.3.infra.md +++ /dev/null @@ -1 +0,0 @@ -Bumped shower app to v1.1.3 (wheel/sdist + FOD hashes probed on ringtail). diff --git a/docs/changelog.d/+shower-main-sha-rebuild.infra.md b/docs/changelog.d/+shower-main-sha-rebuild.infra.md deleted file mode 100644 index f1751b5..0000000 --- a/docs/changelog.d/+shower-main-sha-rebuild.infra.md +++ /dev/null @@ -1,5 +0,0 @@ -Rebuild shower from the post-merge commit on main so the container's -SHA tag points at a commit that will still exist after the 30-day -branch-cleanup window. Functionally identical to the branch-tag image -already deployed, just preserves source traceability per -[[build-container-image#Squash-merge and container tags]]. diff --git a/docs/changelog.d/+shower-rebuild-from-main-sha.misc.md b/docs/changelog.d/+shower-rebuild-from-main-sha.misc.md deleted file mode 100644 index a9495cd..0000000 --- a/docs/changelog.d/+shower-rebuild-from-main-sha.misc.md +++ /dev/null @@ -1,6 +0,0 @@ -Rebuild shower v1.1.0 container from main HEAD (`3c7967e`) and bump the -kustomization tag to `v1.1.0-3c7967e-nix`. The PR was squash-merged, so -the branch commit `444ff91` baked into the prior tag isn't reachable -from main's history. The new tag points at a commit that exists on -main; image content is byte-identical because the FOD output is content -addressed and the inputs didn't change. diff --git a/docs/changelog.d/+shower-v1.1.2-rebuild-from-main-sha.misc.md b/docs/changelog.d/+shower-v1.1.2-rebuild-from-main-sha.misc.md deleted file mode 100644 index 9355a54..0000000 --- a/docs/changelog.d/+shower-v1.1.2-rebuild-from-main-sha.misc.md +++ /dev/null @@ -1 +0,0 @@ -Rebuild shower v1.1.2 from main HEAD (a33fa47) and retag — PR #358 was squash-merged so the branch SHA baked into the prior image tag isn't reachable from main. 
FOD is content-addressed, so image bytes are identical; only provenance changes. diff --git a/docs/changelog.d/+tailscale-main-sha-rebuild.infra.md b/docs/changelog.d/+tailscale-main-sha-rebuild.infra.md deleted file mode 100644 index 24bb81c..0000000 --- a/docs/changelog.d/+tailscale-main-sha-rebuild.infra.md +++ /dev/null @@ -1 +0,0 @@ -Update `tailscale-operator-ringtail` ProxyClass to reference the `0108b68` main-SHA build of the tailscale container. Routine post-merge cleanup so the deployed image traces to a commit that survives PR branch cleanup. diff --git a/docs/changelog.d/+tailscale-operator-mirror-tailnet-url.bugfix.md b/docs/changelog.d/+tailscale-operator-mirror-tailnet-url.bugfix.md new file mode 100644 index 0000000..cc29cf7 --- /dev/null +++ b/docs/changelog.d/+tailscale-operator-mirror-tailnet-url.bugfix.md @@ -0,0 +1 @@ +Fixed the `tailscale-operator` and `tailscale-operator-ringtail` ArgoCD apps showing `Unknown` sync status. Their shared base kustomization fetched the upstream operator manifest from the public `forge.eblu.me/mirrors/...`, which the AI-scraper mitigation now black-holes (403). Pointed the remote resource at the tailnet host `forge.ops.eblu.me` instead, which the in-cluster repo-server can reach. diff --git a/docs/changelog.d/+transmission-doc-review.doc.md b/docs/changelog.d/+transmission-doc-review.doc.md deleted file mode 100644 index 418504f..0000000 --- a/docs/changelog.d/+transmission-doc-review.doc.md +++ /dev/null @@ -1 +0,0 @@ -Reviewed transmission card: corrected storage layout (`/config/` is emptyDir, watch dir disabled) and noted the Prometheus exporter sidecar. 
diff --git a/docs/changelog.d/+unpoller-rebuild-on-main.infra.md b/docs/changelog.d/+unpoller-rebuild-on-main.infra.md deleted file mode 100644 index 60ae8fa..0000000 --- a/docs/changelog.d/+unpoller-rebuild-on-main.infra.md +++ /dev/null @@ -1 +0,0 @@ -Rebuild unpoller container from squashed main commit so the image SHA tag matches a commit in main's history (was tagged with the pre-squash branch SHA). diff --git a/docs/changelog.d/+valkey-main-tag-bump.infra.md b/docs/changelog.d/+valkey-main-tag-bump.infra.md deleted file mode 100644 index cd19f60..0000000 --- a/docs/changelog.d/+valkey-main-tag-bump.infra.md +++ /dev/null @@ -1 +0,0 @@ -Bump paperless and immich kustomizations to the main-SHA-built valkey tag (`v8.1.6-r0-fabca04`). Routine post-merge follow-up to keep production manifests pointing at images built from a commit on main. diff --git a/docs/changelog.d/+valkey-rebuild-on-main.infra.md b/docs/changelog.d/+valkey-rebuild-on-main.infra.md deleted file mode 100644 index c743e61..0000000 --- a/docs/changelog.d/+valkey-rebuild-on-main.infra.md +++ /dev/null @@ -1 +0,0 @@ -Rebuild valkey container from squashed main commit (both arm64 dagger and amd64 nix variants), and update paperless + immich-ringtail kustomizations to the main-SHA tags `v8.1.7-ecded30` and `v8.1.7-ecded30-nix`. diff --git a/docs/changelog.d/+wave1-decommission-followups.infra.md b/docs/changelog.d/+wave1-decommission-followups.infra.md deleted file mode 100644 index 7b54d52..0000000 --- a/docs/changelog.d/+wave1-decommission-followups.infra.md +++ /dev/null @@ -1,8 +0,0 @@ -Fix three follow-ups from the wave-1 decommission: grant the local -break-glass `admin` account ArgoCD admin rights (`g, admin, role:admin` — -previously only the Authentik `admins` group had access, so admin was -locked out whenever its token expired), and repoint the alloy blackbox -probe for teslamate from the deleted minikube service to -`https://tesla.ops.eblu.me/` (through Caddy over Tailscale). 
The orphaned -paperless/teslamate roles + ExternalSecrets left on the minikube -blumeops-pg are also cleaned up. diff --git a/docs/changelog.d/+zot-ci-rotation-op-syntax.doc.md b/docs/changelog.d/+zot-ci-rotation-op-syntax.doc.md deleted file mode 100644 index ec8834f..0000000 --- a/docs/changelog.d/+zot-ci-rotation-op-syntax.doc.md +++ /dev/null @@ -1 +0,0 @@ -Fixed the `op item edit` invocation in the [[zot]] API-key rotation procedure: the previous `pbpaste | op item edit ... "field[password]=-"` stdin syntax is rejected by op 2.34 as "invalid JSON" (recent op versions treat piped input as a full JSON template, not a single field value). Procedure now reads the clipboard into a local fish variable and passes it as an inline assignment. diff --git a/docs/changelog.d/+zot-v2.1.16.infra.md b/docs/changelog.d/+zot-v2.1.16.infra.md deleted file mode 100644 index f007164..0000000 --- a/docs/changelog.d/+zot-v2.1.16.infra.md +++ /dev/null @@ -1 +0,0 @@ -Upgraded zot on indri from v2.1.15 to v2.1.16 (security fixes: TLS verification on metrics client, CORS Allow-Credentials suppression on wildcard origins, manifest/API-key body size limits). diff --git a/docs/changelog.d/alloy-v1.16.0.infra.md b/docs/changelog.d/alloy-v1.16.0.infra.md deleted file mode 100644 index cd9a1ef..0000000 --- a/docs/changelog.d/alloy-v1.16.0.infra.md +++ /dev/null @@ -1,5 +0,0 @@ -Upgrade Grafana Alloy v1.14.0 → v1.16.0 across all four service deployments -(alloy-k8s, alloy-ringtail, alloy-tracing-ringtail on k8s; alloy native on -indri). Pulls in stable database observability (v1.15) and the OTel Collector -v0.147.0 bump. Container build also migrated from Dockerfile to native Dagger -`container.py` per the build-container-image migration playbook. 
diff --git a/docs/changelog.d/backup-grafana-ringtail-blumeops-pg.infra.md b/docs/changelog.d/backup-grafana-ringtail-blumeops-pg.infra.md deleted file mode 100644 index 33b041f..0000000 --- a/docs/changelog.d/backup-grafana-ringtail-blumeops-pg.infra.md +++ /dev/null @@ -1,8 +0,0 @@ -Wire the ringtail `blumeops-pg` cluster (which holds the wave-1-migrated -paperless + teslamate databases) into backups and Grafana. Adds a Tailscale -LoadBalancer Service (`blumeops-pg-ringtail.tail8d86e.ts.net`) and a Caddy L4 -route (`pg.ops.eblu.me:5434`), then repoints borgmatic's `teslamate` + -`paperless` postgres dumps and the `mealie` SQLite dump at ringtail, and the -Grafana TeslaMate datasource at the ringtail DB. Closes the backup gap that -opened at cutover (the migrated live data was still being backed up from the -now-frozen minikube copies) and unblocks the wave-1 decommission. diff --git a/docs/changelog.d/cleanup-cv-docs-minikube-artifacts.misc.md b/docs/changelog.d/cleanup-cv-docs-minikube-artifacts.misc.md deleted file mode 100644 index 79a81cf..0000000 --- a/docs/changelog.d/cleanup-cv-docs-minikube-artifacts.misc.md +++ /dev/null @@ -1 +0,0 @@ -Removed the dead minikube manifests, container builds, and tooling shims left behind after the cv + docs migration to indri-native (#342). Deletes `argocd/{apps,manifests}/{cv,docs}/`, `containers/{cv,quartz}/`, and the `quartz`→`docs` mapping in `mise-tasks/container-version-check`. Bumps `docs.current-version` to `v1.16.0` (the blumeops release tag) now that the legacy nginx-base version pin is gone. 
diff --git a/docs/changelog.d/dagger-0-20-6-runner-image-alpine.infra.md b/docs/changelog.d/dagger-0-20-6-runner-image-alpine.infra.md deleted file mode 100644 index 35f77c2..0000000 --- a/docs/changelog.d/dagger-0-20-6-runner-image-alpine.infra.md +++ /dev/null @@ -1 +0,0 @@ -Upgraded Dagger from v0.20.1 to v0.20.6 (engine, CLI pin, and SDK regen) and migrated `runner-job-image` from a Debian-based Dockerfile to a native Dagger `container.py` on Alpine 3.23, reusing the shared `alpine_runtime` helper. diff --git a/docs/changelog.d/decommission-wave1-minikube.infra.md b/docs/changelog.d/decommission-wave1-minikube.infra.md deleted file mode 100644 index 63b3ab5..0000000 --- a/docs/changelog.d/decommission-wave1-minikube.infra.md +++ /dev/null @@ -1,8 +0,0 @@ -Decommission the wave-1 services on minikube-indri now that paperless, -teslamate, and mealie run on ringtail with their data backed up. Removes the -minikube `paperless`/`teslamate`/`mealie` manifest dirs + ArgoCD app -definitions (pruning the parked Deployments, Services, and the redundant -minikube mealie/paperless PVCs), and drops the `paperless`/`teslamate` roles -from the minikube `blumeops-pg` cluster. The `paperless` and `teslamate` -databases are dropped from indri's blumeops-pg as the finalization step. -miniflux + authentik remain on the minikube cluster (later waves). diff --git a/docs/changelog.d/doc-review-replicating-blumeops.doc.md b/docs/changelog.d/doc-review-replicating-blumeops.doc.md deleted file mode 100644 index e9e6d0f..0000000 --- a/docs/changelog.d/doc-review-replicating-blumeops.doc.md +++ /dev/null @@ -1 +0,0 @@ -Reviewed `replicating-blumeops` tutorial: fixed "BluemeOps" typos (also in `contributing.md`) and added `last-reviewed` frontmatter. 
diff --git a/docs/changelog.d/external-secrets-ringtail-nix.infra.md b/docs/changelog.d/external-secrets-ringtail-nix.infra.md new file mode 100644 index 0000000..9ce3f85 --- /dev/null +++ b/docs/changelog.d/external-secrets-ringtail-nix.infra.md @@ -0,0 +1 @@ +Completed the external-secrets localization for the ringtail (amd64) cluster. The indri Dagger build (`container.py`) only produces an arm64 image; added `containers/external-secrets/default.nix` to build the amd64 variant on ringtail's nix-container-builder, and gave `external-secrets-ringtail` a thin kustomize overlay that reuses the shared manifest and points at the `-nix` image. Both clusters now run the locally-built external-secrets binary on their native architecture. diff --git a/docs/changelog.d/fix-borgmatic-shower-via-ssh.bugfix.md b/docs/changelog.d/fix-borgmatic-shower-via-ssh.bugfix.md deleted file mode 100644 index e18272c..0000000 --- a/docs/changelog.d/fix-borgmatic-shower-via-ssh.bugfix.md +++ /dev/null @@ -1,14 +0,0 @@ -Fix nightly borgmatic backups failing for 2 days. The shower SQLite -dump hook referenced `kubectl --context=k3s-ringtail`, but indri's -kubeconfig deliberately doesn't carry the ringtail credentials. The -`before_backup` hook's failure aborted the entire run, taking out -*both* the local sifaka repo and the BorgBase offsite. Replaced -the inline-shell dump with a `~/bin/borgmatic-k8s-sqlite-dump` -helper deployed by the ansible role. Each dump entry now declares a -`target` of either `local:` (mealie — kubectl uses indri's -kubeconfig) or `ssh:` (shower — ssh into ringtail and -run `k3s kubectl` there, no indri-side kubeconfig needed; k3s.yaml -on ringtail is mode 644 so no sudo required). Bytes stream back via -`kubectl exec ... -- cat` rather than `kubectl cp`, since `kubectl -cp` requires `tar` inside the pod and nix-built images like shower -don't bundle it. 
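Reviewer note on the borgmatic fragment above: the `target` dispatch it describes might look roughly like the sketch below. This is an illustration under stated assumptions, not the deployed `borgmatic-k8s-sqlite-dump` helper. The function name, the argument names, and the reading of the text after `local:` as a kubeconfig context and after `ssh:` as an SSH host are all assumptions, since the fragment leaves those values out.

```python
import shlex


def build_dump_command(target: str, namespace: str, pod: str, db_path: str) -> list[str]:
    """Build the argv that streams a file out of a pod (hypothetical helper)."""
    kind, _, value = target.partition(":")
    # `kubectl exec ... -- cat` streams bytes on stdout and, unlike `kubectl cp`,
    # does not need `tar` inside the pod.
    exec_args = ["exec", "-n", namespace, pod, "--", "cat", db_path]
    if kind == "local":
        # Assumption: `value` names a context in the local kubeconfig.
        return ["kubectl", f"--context={value}", *exec_args]
    if kind == "ssh":
        # Assumption: `value` is an SSH host. ssh hands the remote side a single
        # string, so quote the remote command with shlex.join.
        return ["ssh", value, shlex.join(["k3s", "kubectl", *exec_args])]
    raise ValueError(f"unknown target kind: {target!r}")
```

The caller would run the returned argv and redirect stdout to the dump file. Failing loudly on an unknown target kind matters here, because a failing `before_backup` hook aborts the whole run.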
diff --git a/docs/changelog.d/forgejo-runner-v12-8-server-connections.infra.md b/docs/changelog.d/forgejo-runner-v12-8-server-connections.infra.md deleted file mode 100644 index cc35684..0000000 --- a/docs/changelog.d/forgejo-runner-v12-8-server-connections.infra.md +++ /dev/null @@ -1 +0,0 @@ -Upgraded the k8s Forgejo runner to the v12.8 line, switched it from first-boot registration to declarative `server.connections` credentials from 1Password, and consolidated the supporting runner how-to documentation. diff --git a/docs/changelog.d/heph-indri-hub.infra.md b/docs/changelog.d/heph-indri-hub.infra.md new file mode 100644 index 0000000..6761cb7 --- /dev/null +++ b/docs/changelog.d/heph-indri-hub.infra.md @@ -0,0 +1 @@ +Added the [[hephaestus]] (`heph`) sync hub to indri as a self-updating LaunchAgent managed by Ansible (`ansible/roles/heph`, tag `heph`). The hub runs `hephd --mode server` behind `heph.ops.eblu.me` (Caddy TLS), with self-update on a 10-minute interval and the heph-pwa mobile shell served from `--web-root`. Access is gated by a new Authentik device-code (RFC 8628) OIDC application. Indri is now the canonical hub; other devices (e.g. gilbert) attach as offline-capable spokes. The hub's store was seeded from gilbert via the data-safe Path A bring-up (copy store, reset `meta.origin`). diff --git a/docs/changelog.d/heph-offline-access.bugfix.md b/docs/changelog.d/heph-offline-access.bugfix.md new file mode 100644 index 0000000..e9721bc --- /dev/null +++ b/docs/changelog.d/heph-offline-access.bugfix.md @@ -0,0 +1 @@ +Granted the `offline_access` scope on the Authentik `heph` OAuth2 provider so hephaestus spokes receive a durable 30-day refresh token. Previously the refresh token was session-bound, so spoke sync would silently fail with a `400 Bad Request` on the `refresh_token` grant once the Authentik session lapsed. 
diff --git a/docs/changelog.d/heph-pwa-redirect-uris.infra.md b/docs/changelog.d/heph-pwa-redirect-uris.infra.md new file mode 100644 index 0000000..f887eed --- /dev/null +++ b/docs/changelog.d/heph-pwa-redirect-uris.infra.md @@ -0,0 +1 @@ +Registered the heph-pwa redirect URIs (`https://heph.ops.eblu.me/`, plus `http://localhost:8787/` for dev) on the Authentik `heph` OAuth2 provider, enabling the PWA's new Authorization Code + PKCE "Login with Authentik" flow (and the token-endpoint CORS it needs). Pairs with hephaestus PR #9. diff --git a/docs/changelog.d/homepage-to-ringtail.infra.md b/docs/changelog.d/homepage-to-ringtail.infra.md deleted file mode 100644 index 1e3e795..0000000 --- a/docs/changelog.d/homepage-to-ringtail.infra.md +++ /dev/null @@ -1,8 +0,0 @@ -Migrated homepage dashboard from minikube (indri/arm64) to k3s (ringtail/amd64). -The container is now built via nix (`containers/homepage/default.nix`), adapted -from nixpkgs `homepage-dashboard` with the upstream Next.js cache patches and -wrapped with `dockerTools.buildLayeredImage`. Autodiscovery shifts: services on -minikube (ArgoCD, Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus, -Navidrome, Paperless, TeslaMate, Transmission) become explicit static entries -in `services.yaml`; ringtail services (Authentik, Frigate/NVR, Ntfy, Ollama) -auto-populate via Ingress annotations. diff --git a/docs/changelog.d/local-external-secrets.infra.md b/docs/changelog.d/local-external-secrets.infra.md new file mode 100644 index 0000000..13cbb05 --- /dev/null +++ b/docs/changelog.d/local-external-secrets.infra.md @@ -0,0 +1 @@ +Localized the external-secrets controller image. It now builds from the forge mirror via a native Dagger `container.py` (single `all_providers` static Go binary, faithful to upstream's `make build`) and is served from `registry.ops.eblu.me/blumeops/external-secrets` instead of `ghcr.io`, bringing another platform component under local supply-chain control. 
diff --git a/docs/changelog.d/migrate-cv-docs-to-indri.infra.md b/docs/changelog.d/migrate-cv-docs-to-indri.infra.md deleted file mode 100644 index 608a6b9..0000000 --- a/docs/changelog.d/migrate-cv-docs-to-indri.infra.md +++ /dev/null @@ -1 +0,0 @@ -Migrated CV (`cv.eblu.me`) and Docs (`docs.eblu.me`) from minikube Deployments to indri-native ansible roles. Caddy now serves the extracted release tarballs directly via a new `kind: static` service-block in the Caddy template — no daemon, no container — replacing the prior nginx-in-a-pod layer. Removes a network hop on every request and shrinks minikube's footprint. See [[cv-on-indri]] and [[docs-on-indri]]. Part of the broader minikube wind-down. diff --git a/docs/changelog.d/migrate-devpi-to-indri.infra.md b/docs/changelog.d/migrate-devpi-to-indri.infra.md deleted file mode 100644 index 418db70..0000000 --- a/docs/changelog.d/migrate-devpi-to-indri.infra.md +++ /dev/null @@ -1 +0,0 @@ -Migrated devpi (PyPI mirror at `pypi.ops.eblu.me`) from a minikube StatefulSet to a launchd-managed service on indri. devpi-server now runs in a uv-managed venv with pinned `devpi-server` and `devpi-web` versions, listens on `127.0.0.1:3141`, and is fronted by Caddy. The minikube StatefulSet was crash-looping under memory pressure (and breaking the Python toolchain everywhere); the new layout removes a layer of dependency on cluster health for critical-path tooling. See [[devpi-on-indri]]. diff --git a/docs/changelog.d/migrate-immich-to-ringtail.infra.md b/docs/changelog.d/migrate-immich-to-ringtail.infra.md deleted file mode 100644 index b47742f..0000000 --- a/docs/changelog.d/migrate-immich-to-ringtail.infra.md +++ /dev/null @@ -1,13 +0,0 @@ -Move the entire Immich stack — server, machine-learning, valkey, -and the PostgreSQL+VectorChord cluster — off `minikube-indri` and -onto `k3s-ringtail`. 
Postgres data migrated zero-loss via CNPG -`pg_basebackup` (replica catch-up then promote); row counts on -`asset`, `user`, `album`, `smart_search`, `activity`, `asset_face` -verified equal between source and replica before cutover. The ML -pod now uses ringtail's RTX 4080 via the nvidia-device-plugin -(time-slicing bumped 2 → 4 to share with frigate + ollama). Caddy -routing at `photos.ops.eblu.me` is unchanged (still -`photos.tail8d86e.ts.net`, the device just lives on ringtail now). -Borgmatic backups continue against the same `immich-pg` tailnet -hostname. First concrete chain in the broader indri-k8s -decommission effort. diff --git a/docs/changelog.d/migrate-wave1-ringtail.infra.md b/docs/changelog.d/migrate-wave1-ringtail.infra.md deleted file mode 100644 index c44263a..0000000 --- a/docs/changelog.d/migrate-wave1-ringtail.infra.md +++ /dev/null @@ -1,13 +0,0 @@ -Move paperless, teslamate, and mealie off `minikube-indri` onto -`k3s-ringtail`, shedding ~1.1 GiB of resident load from the -OOM-thrashing 8 GiB minikube node (the kernel OOM killer had been -killing `kube-apiserver`/`dockerd`/argocd, flapping every -minikube-hosted service at once). paperless + teslamate databases -move into a fresh CNPG `blumeops-pg` cluster on ringtail via a cold -`pg_dump`/`pg_restore` from the quiesced source — row counts verified -equal before any routing flip; source DBs dropped only after the -ringtail side serves traffic. mealie's SQLite PVC is copied as-is. -paperless media stays on sifaka NFS. Downtime-tolerant cold cutover -(no streaming replication); rollback is repoint-and-scale-up with the -source untouched. Second chain in the indri-k8s decommission after -[[migrate-immich-to-ringtail]]. 
diff --git a/docs/changelog.d/mirror-tailscale-container.infra.md b/docs/changelog.d/mirror-tailscale-container.infra.md deleted file mode 100644 index 54ca3ba..0000000 --- a/docs/changelog.d/mirror-tailscale-container.infra.md +++ /dev/null @@ -1 +0,0 @@ -Add local nix container build for `tailscale` (`containers/tailscale/default.nix`) so ringtail's tailscale-operator ProxyClass proxy pods pull from the forge mirror instead of `docker.io/tailscale/tailscale`. Pinned at v1.94.2 to match `service-versions.yaml`. Indri's tailscale-operator continues to use upstream during the k8s-to-ringtail migration. diff --git a/docs/changelog.d/prowler-iac-mutelist.infra.md b/docs/changelog.d/prowler-iac-mutelist.infra.md deleted file mode 100644 index 077cfa8..0000000 --- a/docs/changelog.d/prowler-iac-mutelist.infra.md +++ /dev/null @@ -1 +0,0 @@ -Address the 6 critical Prowler IaC findings against `argocd/manifests/`. Prowler's IaC provider hardcodes `self._mutelist = None` and delegates filtering to Trivy, but doesn't plumb `--ignorefile` through — so the documented "use Trivy filtering" path is actually broken. Added a shim around `trivy` in the Prowler image that injects `--ignorefile $TRIVY_IGNOREFILE` for `trivy fs` invocations when the env var points at a real file. The IaC cronjob now mounts `mutelist/trivyignore.yaml` (Trivy's per-path schema) and sets the env var, muting the `external-secrets` and `kube-state-metrics` Secret-access findings (KSV-0041, KSV-0114). Separately, `grafana-clusterrole` is tightened to remove `secrets` access entirely: the dashboard sidecar already only consumes ConfigMap-labeled dashboards, so its `RESOURCE` env var is now `configmap` instead of `both`. 
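Reviewer note on the Prowler fragment above: the argument rewriting done by the `trivy` shim can be sketched as follows. This is a hedged illustration, not the shim shipped in the Prowler image. The function name `inject_ignorefile` and the `file_exists` parameter are hypothetical, and it assumes the subcommand is the first argument (no global flags before `fs`).

```python
import os
from typing import Callable, Mapping


def inject_ignorefile(
    argv: list[str],
    env: Mapping[str, str],
    file_exists: Callable[[str], bool] = os.path.isfile,
) -> list[str]:
    """Return the trivy arguments, adding --ignorefile for `fs` scans (hypothetical helper)."""
    ignorefile = env.get("TRIVY_IGNOREFILE", "")
    # Assumption: the subcommand is argv[0], as in `trivy fs <path>`.
    is_fs_scan = bool(argv) and argv[0] == "fs"
    if is_fs_scan and ignorefile and file_exists(ignorefile):
        return ["fs", "--ignorefile", ignorefile, *argv[1:]]
    # Any other invocation is passed through untouched.
    return list(argv)
```

A real wrapper would then exec the original `trivy` binary with the returned arguments. The existence check is injectable only so the logic can be exercised without touching the filesystem.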
diff --git a/docs/changelog.d/recurring-maintenance-2026-05-27.doc.md b/docs/changelog.d/recurring-maintenance-2026-05-27.doc.md deleted file mode 100644 index af30489..0000000 --- a/docs/changelog.d/recurring-maintenance-2026-05-27.doc.md +++ /dev/null @@ -1 +0,0 @@ -Reviewed [[indri]] reference card: added `devpi`, `cv`, and `docs` to the native-services list; widened the k8s note to reflect the growing set of apps now on ringtail and the planned indri-minikube decommission; added CPU/RAM specs. diff --git a/docs/changelog.d/recurring-maintenance-2026-05-27.infra.md b/docs/changelog.d/recurring-maintenance-2026-05-27.infra.md deleted file mode 100644 index f2d48ad..0000000 --- a/docs/changelog.d/recurring-maintenance-2026-05-27.infra.md +++ /dev/null @@ -1,4 +0,0 @@ -Recurring maintenance batch: - -- Ringtail flake inputs refreshed (`disko`, `home-manager`, `nixpkgs`). -- Tooling deps bumped: prek hooks (trufflehog v3.95.3, kingfisher v1.101.0, ruff v0.15.14, `ansible-core` 2.21.0); fly proxy base images (nginx 1.30.1-alpine, alloy v1.16.1); `typer==0.26.2` in mise tasks. diff --git a/docs/changelog.d/review-ringtail-flake-2026-05-11.infra.md b/docs/changelog.d/review-ringtail-flake-2026-05-11.infra.md deleted file mode 100644 index f39f9f4..0000000 --- a/docs/changelog.d/review-ringtail-flake-2026-05-11.infra.md +++ /dev/null @@ -1 +0,0 @@ -Updated `nixos/ringtail/flake.lock` (weekly cadence): `disko`, `home-manager`, and `nixpkgs` inputs refreshed. `nixpkgs-services` skipped per overlay convention. 
diff --git a/docs/changelog.d/reviews-jun4.doc.md b/docs/changelog.d/reviews-jun4.doc.md new file mode 100644 index 0000000..f1aeaa8 --- /dev/null +++ b/docs/changelog.d/reviews-jun4.doc.md @@ -0,0 +1 @@ +Reviewed four never-reviewed reference cards (`cluster`, `ntfy`, `tempo`, `alloy`) and corrected drift: minikube is now Kubernetes v1.35.0; ntfy, tempo, and alloy-k8s images are now locally-built `registry.ops.eblu.me/blumeops/*` nix containers (v2.19.2, v2.10.3, v1.16.0) rather than upstream Docker Hub; the Fly.io alloy binary is v1.16.1; and the ringtail workload list reflects the in-progress minikube→k3s migration. diff --git a/docs/changelog.d/reviews-jun4.infra.md b/docs/changelog.d/reviews-jun4.infra.md new file mode 100644 index 0000000..c128e70 --- /dev/null +++ b/docs/changelog.d/reviews-jun4.infra.md @@ -0,0 +1 @@ +Upgraded the nvidia-device-plugin on ringtail from v0.19.0 to v0.19.2 (upstream patch release: CDI/Tegra fixes and dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup). diff --git a/docs/changelog.d/ringtail-static-ip.infra.md b/docs/changelog.d/ringtail-static-ip.infra.md deleted file mode 100644 index 8474b0a..0000000 --- a/docs/changelog.d/ringtail-static-ip.infra.md +++ /dev/null @@ -1 +0,0 @@ -Pin ringtail's wired IP to `192.168.1.21` via NixOS scripted networking; NetworkManager no longer manages `enp5s0`. Removes DHCP lease renewal as a failure mode after a silent lease teardown took ringtail offline. Also explicitly enables `net.ipv4.ip_forward` (previously set implicitly by scripted-DHCP) so k3s pod networking and Tailscale routing continue to work with static networking. 
diff --git a/docs/changelog.d/rip-out-compensating-controls.infra.md b/docs/changelog.d/rip-out-compensating-controls.infra.md deleted file mode 100644 index d41fd1a..0000000 --- a/docs/changelog.d/rip-out-compensating-controls.infra.md +++ /dev/null @@ -1 +0,0 @@ -Ripped out the compensating-controls (CC) framework: deleted `compensating-controls.yaml`, the `review-compensating-controls` mise task, and the associated how-to / explanation docs. Prowler and Kingfisher continue to run weekly and produce reports; the Prowler mutelist YAML files remain in place but no longer carry `CC: ` prefixes — each entry just keeps a free-form `Description` of why the finding is muted. The CC review cadence proved to be more overhead than this single-operator homelab needed. diff --git a/docs/changelog.d/service-review-mealie-2026-05-11.infra.md b/docs/changelog.d/service-review-mealie-2026-05-11.infra.md deleted file mode 100644 index 074cd21..0000000 --- a/docs/changelog.d/service-review-mealie-2026-05-11.infra.md +++ /dev/null @@ -1 +0,0 @@ -Reviewed `mealie` service version freshness; upstream is 5 minor versions ahead (v3.17.0 vs deployed v3.12.0). Marked reviewed; upgrade deferred. diff --git a/docs/changelog.d/shower-app-deploy.bugfix.md b/docs/changelog.d/shower-app-deploy.bugfix.md deleted file mode 100644 index 91d2b3b..0000000 --- a/docs/changelog.d/shower-app-deploy.bugfix.md +++ /dev/null @@ -1,13 +0,0 @@ -Shower app container now bakes the wheel + Python deps into the image -at build time via `buildPythonPackage` instead of pip-installing on -first boot. Boots are deterministic and don't depend on forge PyPI -being reachable from the pod. The `wheelHash` in -`containers/shower/default.nix` is the sha256 sourced from the -[forge PyPI simple index](https://forge.eblu.me/api/packages/eblume/pypi/simple/adelaide-baby-shower-app/); -bumping the version means bumping that hash too. 
- -Borgmatic now covers the shower app: SQLite is dumped from the live -pod via `kubectl exec` (mirroring the existing mealie entry, with -`context: k3s-ringtail`), and the prize-photo media share is picked up -through `/Volumes/shower` (sifaka SMB mount on indri, same pattern as -`/Volumes/photos`). diff --git a/docs/changelog.d/shower-app-deploy.feature.md b/docs/changelog.d/shower-app-deploy.feature.md deleted file mode 100644 index 96218be..0000000 --- a/docs/changelog.d/shower-app-deploy.feature.md +++ /dev/null @@ -1,4 +0,0 @@ -Deploy the Adelaide / Heidi / Addie baby shower app — guest splash, raffle -picker, and prize assignment console — on ringtail k3s with `shower.eblu.me` -as the public entry and `shower.ops.eblu.me` as the tailnet admin host. App -source: [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app). diff --git a/docs/changelog.d/shower-app-deploy.infra.md b/docs/changelog.d/shower-app-deploy.infra.md deleted file mode 100644 index 157a068..0000000 --- a/docs/changelog.d/shower-app-deploy.infra.md +++ /dev/null @@ -1,9 +0,0 @@ -Wire shower app for public exposure: fly nginx `shower.eblu.me` server -block as a guest-only surface — splash page, `/prizes//`, static -assets, media. Everything authenticated (`/admin/`, `/host/`, -`/accounts/`) returns 403 with a "tailnet only" pointer. Staff hit -`shower.ops.eblu.me` for the operator console + admin; the app's -v1.0.1 `DJANGO_PUBLIC_URL_BASE` setting makes QR codes generated on -the tailnet point back at the WAN host for guests. Plus a Caddy route -on indri, Pulumi Gandi CNAME, and a Grafana APM dashboard tracking -request rate, error rate, latency, bandwidth, and access logs. diff --git a/docs/changelog.d/shower-v1.1.0.feature.md b/docs/changelog.d/shower-v1.1.0.feature.md deleted file mode 100644 index d2c3400..0000000 --- a/docs/changelog.d/shower-v1.1.0.feature.md +++ /dev/null @@ -1,15 +0,0 @@ -Deploy adelaide-baby-shower-app v1.1.0 to ringtail k3s. 
Replaces the -boolean lock with a four-phase `ShowerState` (`pre_event` → `party` → -`prizes_locked` → `event_locked`), adds an append-only "guest memories" -panel where guests can leave photos and comments for the baby, and -polishes the admin and QR views. Three Django migrations -(`0009_shower_phase`, `0010_guest_memories`, `0011_book_description`) -run automatically in the entrypoint against the SQLite PV. No config -or env-var changes. - -Container build also gains a Forgejo-PyPI workaround: Forgejo's simple -index returns absolute file URLs hardcoded to the public ROOT_URL -(`forge.eblu.me`), which the Fly edge 403s on `/api/packages/*`. The -wheel and sdist are now both pulled via direct `fetchurl` against -`forge.ops.eblu.me` (tailnet-only) and the wheel is handed to pip as -a local path. diff --git a/docs/changelog.d/shower-v1.1.2.infra.md b/docs/changelog.d/shower-v1.1.2.infra.md deleted file mode 100644 index aa2db0d..0000000 --- a/docs/changelog.d/shower-v1.1.2.infra.md +++ /dev/null @@ -1 +0,0 @@ -Deploy shower v1.1.2 — bump container build to new app release. diff --git a/docs/changelog.d/unpoller-v3.infra.md b/docs/changelog.d/unpoller-v3.infra.md deleted file mode 100644 index fa6eaf9..0000000 --- a/docs/changelog.d/unpoller-v3.infra.md +++ /dev/null @@ -1 +0,0 @@ -Upgrade unpoller v2.34.0 → v3.2.0 and migrate container build from Dockerfile to native Dagger (container.py). v3.0.0 carries breaking UniFi API changes; v3.2.0 introduces a 60s background poll (cached scrapes) by default — set `interval = 0` in `up.conf` to restore on-demand polling. diff --git a/docs/changelog.d/update-tooling-deps-2026-04.doc.md b/docs/changelog.d/update-tooling-deps-2026-04.doc.md deleted file mode 100644 index 141e975..0000000 --- a/docs/changelog.d/update-tooling-deps-2026-04.doc.md +++ /dev/null @@ -1 +0,0 @@ -New how-to: rotate-fly-deploy-token. 
Documents the 75-day rotation cadence, why we use `org`-scoped tokens (silences the cosmetic metrics-token warning on `fly status` with marginal blast-radius cost given the single-app personal org), and the procedure for rotation + Forgejo Actions secret sync. diff --git a/docs/changelog.d/update-tooling-deps-2026-04.infra.md b/docs/changelog.d/update-tooling-deps-2026-04.infra.md deleted file mode 100644 index 4731eca..0000000 --- a/docs/changelog.d/update-tooling-deps-2026-04.infra.md +++ /dev/null @@ -1 +0,0 @@ -Monthly tooling dependency refresh: prek hooks (trufflehog, kingfisher, ruff, shfmt, prettier, actionlint, ansible-lint), fly proxy base images (nginx 1.30.0, tailscale v1.94.2, alloy v1.16.0), normalize pyyaml lower bound in mise-tasks. diff --git a/docs/changelog.d/valkey-mirror.infra.md b/docs/changelog.d/valkey-mirror.infra.md deleted file mode 100644 index 06f8d98..0000000 --- a/docs/changelog.d/valkey-mirror.infra.md +++ /dev/null @@ -1 +0,0 @@ -Mirror Valkey 8.1 locally as `registry.ops.eblu.me/blumeops/valkey`. Replaces direct pulls of `docker.io/valkey/valkey:8.1-alpine` for paperless and immich sidecars. Built via native Dagger pipeline on Alpine 3.22. Stateless swap — no data migration. Authentik's nix-built Redis remains separate. diff --git a/docs/changelog.d/valkey-nix.infra.md b/docs/changelog.d/valkey-nix.infra.md deleted file mode 100644 index e41eb63..0000000 --- a/docs/changelog.d/valkey-nix.infra.md +++ /dev/null @@ -1 +0,0 @@ -Add nix-built amd64 valkey for ringtail (`containers/valkey/default.nix`) so immich-ringtail can stop pulling the upstream multi-arch `docker.io/valkey/valkey` image. Existing `container.py` continues to build Alpine arm64 for paperless on indri. Both bump to valkey 8.1.7 (Alpine 3.22 8.1.7-r0 / nixpkgs 8.1.7). 
diff --git a/docs/reference/infrastructure/indri.md b/docs/reference/infrastructure/indri.md index 67652ca..8364ba0 100644 --- a/docs/reference/infrastructure/indri.md +++ b/docs/reference/infrastructure/indri.md @@ -33,6 +33,7 @@ Primary BlumeOps server. Mac Mini M1 (2020). - [[alloy|Alloy]] - Metrics/logs collector - [[caddy]] - Reverse proxy for `*.ops.eblu.me` - [[devpi]] - PyPI mirror (LaunchAgent) +- [[hephaestus]] - heph task/context sync hub (LaunchAgent, self-updating) - [[cv]] - Static CV site, served by Caddy - [[docs]] - Quartz-built docs site, served by Caddy diff --git a/docs/reference/kubernetes/cluster.md b/docs/reference/kubernetes/cluster.md index 9b632bd..07c14af 100644 --- a/docs/reference/kubernetes/cluster.md +++ b/docs/reference/kubernetes/cluster.md @@ -1,6 +1,7 @@ --- title: Cluster -modified: 2026-02-19 +modified: 2026-06-04 +last-reviewed: 2026-06-04 tags: - kubernetes --- @@ -15,7 +16,7 @@ BlumeOps runs two Kubernetes clusters: a Minikube cluster on [[indri]] (most ser |----------|-------| | **Driver** | docker | | **Container Runtime** | docker | -| **Kubernetes Version** | v1.34.0 | +| **Kubernetes Version** | v1.35.0 | | **CPUs** | 6 | | **Memory** | 11GB | | **Disk** | 200GB | @@ -41,7 +42,9 @@ Single-node k3s cluster for workloads requiring amd64 or GPU access. See [[ringt |----------|-------| | **Context** | `k3s-ringtail` | | **API Server** | `https://ringtail.tail8d86e.ts.net:6443` | -| **Workloads** | Frigate (GPU), ntfy, frigate-notify, nvidia-device-plugin | +| **Workloads** | GPU workloads (Frigate, Ollama), notifications (ntfy, frigate-notify), [[authentik]], and services migrated off indri minikube (Immich, Mealie, Paperless, TeslaMate). See [[ringtail]] for the authoritative list. | + +Services are being progressively migrated from indri's minikube to ringtail's k3s; the split above reflects an in-progress state, not a fixed boundary. 
## Related diff --git a/docs/reference/services/alloy.md b/docs/reference/services/alloy.md index d781f2f..97d1e77 100644 --- a/docs/reference/services/alloy.md +++ b/docs/reference/services/alloy.md @@ -1,6 +1,7 @@ --- title: Alloy -modified: 2026-03-13 +modified: 2026-06-04 +last-reviewed: 2026-06-04 tags: - service - observability @@ -20,10 +21,10 @@ Unified observability collector for metrics and logs with three deployments: | **Indri Binary** | `~/.local/bin/alloy` | | **Indri Config** | `~/.config/grafana-alloy/config.alloy` | | **K8s Namespace** | `alloy` | -| **K8s Image** | `grafana/alloy:v1.14.0` | +| **K8s Image** | `registry.ops.eblu.me/blumeops/alloy:v1.16.0-9564435` (locally built) | | **ArgoCD App** | `alloy-k8s` | | **Fly.io Config** | `fly/alloy.river` | -| **Fly.io Image** | `grafana/alloy:v1.5.1` (binary copied into nginx container) | +| **Fly.io Image** | `grafana/alloy:v1.16.1` (binary copied into nginx container, sha-pinned) | ## Metrics Collected diff --git a/docs/reference/services/hephaestus.md b/docs/reference/services/hephaestus.md new file mode 100644 index 0000000..7abc35b --- /dev/null +++ b/docs/reference/services/hephaestus.md @@ -0,0 +1,141 @@ +--- +title: Hephaestus +modified: 2026-06-04 +last-reviewed: 2026-06-04 +tags: + - service + - hephaestus +--- + +# Hephaestus + +[hephaestus](https://github.com/eblume/hephaestus) (`heph`) is the user's +self-hosted task + context/knowledge system. It is **hub-and-spoke**: each device +runs a full local SQLite replica (`hephd --mode local`) and background-syncs +against one canonical **hub**. Indri runs that hub. 
+ +## Quick Reference + +| Property | Value | +|----------|-------| +| **PWA URL** | https://heph.ops.eblu.me (browser PWA, Caddy TLS) | +| **Spoke sync URL** | http://indri.tail8d86e.ts.net:8787 (direct, tailnet) | +| **Local Port** | 8787 (`hephd --mode server`, bound `0.0.0.0`) | +| **Binary** | `~/.cargo/bin/hephd` (self-updating) | +| **Data** | `~/.local/share/heph/heph.db` | +| **PWA shell** | `~/.local/share/heph/web` | +| **Logs** | `~/Library/Logs/mcquack.heph.{out,err}.log` | +| **LaunchAgent** | `mcquack.eblume.heph` | +| **Ansible role** | `ansible/roles/heph` (tag `heph`) | + +## What runs on indri + +The launchagent runs the hub in server mode with three features enabled: + +``` +hephd --mode server --http-addr 0.0.0.0:8787 --db ~/.local/share/heph/heph.db + --web-root ~/.local/share/heph/web + --oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ + --oidc-audience heph + --self-update --self-update-interval-secs 600 +``` + +- **Server mode** exposes the HTTP sync endpoint (`/rpc`, `/sync/*`) that spokes + reconcile their op-log against. +- **Self-update** (10-minute poll) rebuilds `hephd` from the forge when a newer + release tag appears (`cargo install --git https://forge.eblu.me/eblume/hephaestus.git`). + Indri's Rust toolchain (`~/.cargo/bin`) is on the agent's `PATH` for this, and + the plist pins `RUSTUP_TOOLCHAIN=stable` — the + launchagent runs without mise, so a bare `cargo` shim would otherwise fall back + to rustup's *default* toolchain, which can lag behind heph's `rust-version` floor + (1.89) and silently fail the build. +- **PWA** (`--web-root`) serves the [heph-pwa] mobile shell; Caddy terminates TLS + at `heph.ops.eblu.me` so the PWA runs in a secure context (service worker, + install-to-home-screen, voice capture). 
+ +[heph-pwa]: https://github.com/eblume/hephaestus + +The hub binds `0.0.0.0` so tailnet spokes can also sync directly +(`http://indri.tail8d86e.ts.net:8787`); access is gated by Authentik OIDC either +way — tailnet reachability alone is not enough. + +## Authentication (Authentik OIDC, device-code) + +The hub verifies an OIDC bearer token on every sync. The `heph` application is a +**public** OAuth2 client using the **device-code flow** (RFC 8628), provisioned +in the [[authentik]] blueprint (`argocd/manifests/authentik/configmap-blueprint.yaml`): + +- Issuer: `https://authentik.ops.eblu.me/application/o/heph/` +- Audience / client id: `heph` +- Restricted to the `admins` group (single-owner, sensitive data). +- Scope mappings: `openid`, `email`, `profile`, **`offline_access`**. + +> **`offline_access` is required for durable sync.** The `heph` CLI requests +> `scope = "openid offline_access"`, and a refresh token is only issued for the +> 30-day refresh-token window when the provider actually grants `offline_access`. +> Without that scope mapping the refresh token is bound to the login **session**; +> once the session lapses, hephd's `refresh_token` grant returns `400 Bad +> Request`, the bearer can't be refreshed, and spoke sync silently degrades +> (`heph sync --status` → `auth_failure: true`). `heph auth login` papers over it +> until the next session expiry. Keep `offline_access` in the provider's +> `property_mappings`. + +Because no Authentik instance ships a device-code flow by default, the blueprint +also creates `default-device-code-flow` and binds it to the default brand's +`flow_device_code`. Devices obtain a token with `heph auth login`; the PWA +currently takes a pasted token (in-app device-code login is upstream follow-up). + +## Data seeding (Path A, one-time) + +The hub was seeded from the existing `gilbert` device so no task history was +lost. 
heph's data-safe bring-up ("Path A") has the hub **adopt the device's +identity** rather than rewriting the device: + +1. Quiesce the seed device: `heph daemon stop` (on gilbert). +2. Copy its store to indri: `scp ~/.local/share/heph/heph.db indri:~/.local/share/heph/heph.db`. +3. Give the hub its **own device origin** (keeps gilbert's `owner_id` + data; + `hephd` regenerates a fresh `origin` on next start when it is missing): + ```fish + ssh indri "sqlite3 ~/.local/share/heph/heph.db \"DELETE FROM meta WHERE key='origin';\"" + ``` +4. `mise run provision-indri -- --tags heph` (installs hephd, stages the PWA, + loads the launchagent → hub starts on the seeded store). + +Only `meta.origin` changes; `owner_id`, nodes, op-log, and links are copied +untouched. A clean `hephd --owner-id` / seed command is tracked upstream as +hephaestus follow-up — until then this manual reset is the documented path. + +## Connecting a spoke (e.g. gilbert) + +A device joins by running its local daemon with the hub URL + OIDC client and +logging in once: + +```bash +hephd --mode local --hub-url http://indri.tail8d86e.ts.net:8787 \ + --oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ \ + --oidc-client-id heph +heph auth login --hub-url http://indri.tail8d86e.ts.net:8787 \ + --issuer https://authentik.ops.eblu.me/application/o/heph/ --client-id heph +``` + +> **Use the direct `http://…:8787` tailnet URL for sync, not the Caddy HTTPS +> URL.** hephd's sync client is plain-HTTP-only; pointing `--hub-url` at +> `https://heph.ops.eblu.me` fails with a confusing `error sending request` +> (the HTTP connector rejects the `https` scheme before connecting). Tailscale +> encrypts the transport, and the OIDC bearer token still gates every request. +> `heph.ops.eblu.me` (Caddy TLS) exists only for the browser PWA, which needs a +> secure context. The cached token is keyed by the exact `--hub-url`, so use the +> same value for `hephd` and `heph auth login`. 
+ +> **Caveat:** `heph daemon` cannot yet bake hub/spoke flags into the generated +> launchd plist (upstream gap). On a spoke whose plist is managed by `heph +> daemon`, the hub/OIDC flags must be hand-added — and a later `heph daemon +> start/restart` will regenerate the plist and drop them. Avoid `heph daemon` +> subcommands on a configured spoke until that gap is closed; reload via +> `launchctl` instead. + +## Related + +- [[indri]] — host +- [[authentik]] — OIDC provider +- [[caddy]] — TLS termination for `heph.ops.eblu.me` diff --git a/docs/reference/services/ntfy.md b/docs/reference/services/ntfy.md index b549a6d..1bf45af 100644 --- a/docs/reference/services/ntfy.md +++ b/docs/reference/services/ntfy.md @@ -1,6 +1,7 @@ --- title: Ntfy -modified: 2026-02-17 +modified: 2026-06-04 +last-reviewed: 2026-06-04 tags: - service - notifications @@ -17,7 +18,7 @@ Self-hosted push notification service. Ntfy receives HTTP POST messages and deli | **URL** | https://ntfy.ops.eblu.me | | **Tailscale URL** | https://ntfy.tail8d86e.ts.net | | **Namespace** | `ntfy` | -| **Image** | `binwiederhier/ntfy:v2.17.0` | +| **Image** | `registry.ops.eblu.me/blumeops/ntfy:v2.19.2-fd0bebb-nix` (locally built) | | **Upstream** | https://github.com/binwiederhier/ntfy | | **Manifests** | `argocd/manifests/ntfy/` | diff --git a/docs/reference/services/tempo.md b/docs/reference/services/tempo.md index 771b97f..5eb5d87 100644 --- a/docs/reference/services/tempo.md +++ b/docs/reference/services/tempo.md @@ -1,6 +1,7 @@ --- title: Tempo -modified: 2026-03-05 +modified: 2026-06-04 +last-reviewed: 2026-06-04 tags: - service - observability @@ -18,7 +19,7 @@ Distributed tracing backend for BlumeOps infrastructure. 
Receives traces via OTL | **Tailscale URL** | https://tempo.tail8d86e.ts.net | | **OTLP Endpoint** | https://tempo-otlp.tail8d86e.ts.net | | **Namespace** | `monitoring` | -| **Image** | `grafana/tempo:2.10.1` | +| **Image** | `registry.ops.eblu.me/blumeops/tempo:v2.10.3-75f9ba4` (locally built) | | **Storage** | 10Gi PVC (local filesystem) | | **Retention** | 7 days | diff --git a/service-versions.yaml b/service-versions.yaml index 699f89c..866c687 100644 --- a/service-versions.yaml +++ b/service-versions.yaml @@ -56,8 +56,8 @@ services: - name: nvidia-device-plugin type: argocd - last-reviewed: 2026-03-27 - current-version: "v0.19.0" + last-reviewed: 2026-06-04 + current-version: "v0.19.2" upstream-source: https://github.com/NVIDIA/k8s-device-plugin/releases notes: DaemonSet + RuntimeClass on ringtail for GPU workloads @@ -159,10 +159,13 @@ services: - name: external-secrets type: argocd - last-reviewed: 2026-03-25 + last-reviewed: 2026-06-04 current-version: "v2.2.0" upstream-source: https://github.com/external-secrets/external-secrets/releases - notes: Static kustomize manifests rendered from upstream Helm chart + notes: >- + Static kustomize manifests rendered from upstream Helm chart. Controller + image is locally built from the forge mirror via containers/external-secrets/container.py + (single all_providers static Go binary). - name: 1password-connect type: argocd @@ -411,6 +414,23 @@ services: upstream-source: https://github.com/caddyserver/caddy/releases notes: Built from source with Gandi DNS and Layer 4 plugins + - name: heph + type: ansible + last-reviewed: 2026-06-05 + current-version: "v1.2.1" + upstream-source: https://forge.eblu.me/eblume/hephaestus/releases + notes: >- + hephaestus task/context sync hub on indri (server-mode launchagent, + ansible/roles/heph; cargo-built from the forge). 
SELF-UPDATING: hephd + polls the forge for newer releases every 10 min and rebuilds + restarts + itself, so the running version drifts AHEAD of the ansible heph_version + pin. current-version here is the last observed/deployed tag, not a hard + pin. To check the hub, confirm that `curl https://heph.ops.eblu.me/config` + is served (hub up), and read the live version from the hub log's `current=` line. Reconciling this + self-update vs IaC-pin drift is tracked in the heph "Hephaestus" project: + "Reconcile hephd self-update with ansible-pinned version (drift on indri + hub)" (node 01KTBXWT6XTHNDH92CVJY88E5K). + - name: borgmatic type: ansible last-reviewed: 2026-04-15