Compare commits

..

13 commits

Author SHA1 Message Date
bc34b601be Merge pull request 'heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)' (#371) from heph-offline-access into main 2026-06-06 18:29:47 -07:00
50a36ff93a heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)
The heph CLI requests scope "openid offline_access", but the Authentik
heph OAuth2 provider only mapped openid/email/profile. Without the
offline_access mapping the issued refresh token is bound to the login
session rather than the 30-day refresh-token window; once the session
lapses, hephd's refresh_token grant returns 400 Bad Request and spoke
sync silently degrades (heph sync --status -> auth_failure: true).

Add the built-in offline_access scope mapping to the provider's
property_mappings and document the requirement in the service reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:07:13 -07:00
cf63fcb5b5 C0: track heph in service-versions (self-updating; note drift task)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 08:22:46 -07:00
3abe80523a C0: bump indri heph hub to v1.2.1 (PWA Authentik login + /config)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:40:51 -07:00
6576880b0e heph Authentik: register heph-pwa redirect URIs (PKCE login) (#370)
Adds the heph-pwa redirect URIs to the Authentik `heph` OAuth2 provider so the new browser **Login with Authentik** flow (Authorization Code + PKCE, hephaestus PR #9) can redirect back and exchange the code:

- `https://heph.ops.eblu.me/` (the PWA origin)
- `http://localhost:8787/` (local dev: `hephd --web-root`)

Authentik also keys token-endpoint CORS off these origins, so they're required for the browser token exchange. Additive (the provider was `redirect_uris: []`); harmless until the PWA feature deploys.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #370
2026-06-05 07:30:31 -07:00
a2f1e06224 Add hephaestus sync hub to indri (launchagent, PWA, device-code OIDC) (#369)
Makes indri the canonical **heph** hub for the hub-and-spoke task/context system, deployed as a self-updating LaunchAgent managed by Ansible. Other devices (gilbert) attach as offline-capable spokes.

## What's here
- **`ansible/roles/heph`** (tag `heph`) — bootstrap `cargo install hephd` (only if absent; `--self-update` keeps it current after), version-pinned `heph-pwa` checkout served via `--web-root`, launchagent `mcquack.eblume.heph`:
  ```
  hephd --mode server --http-addr 0.0.0.0:8787 --db … --web-root …
        --oidc-issuer …/o/heph/ --oidc-audience heph
        --self-update --self-update-interval-secs 600
  ```
  `~/.cargo/bin` is on the agent `PATH` so self-update's `cargo install` works.
- **Caddy** — `heph.ops.eblu.me → localhost:8787` (TLS for the PWA secure context).
- **Authentik** — new `heph` **public device-code** OIDC app + `default-device-code-flow` bound to the default brand's `flow_device_code` (verified live: brand `authentik-default`, field currently unset → additive).
- **Docs** — `services/hephaestus.md` (Path-A seeding runbook + spoke caveat), `indri.md`, changelog fragment.

## Three features requested
- **Autoupdate** — 10-min interval (`--self-update-interval-secs 600`).
- **PWA** — `--web-root` (confirmed shipped in v1.2.0).
- **Spoke** — gilbert reconfig documented (post-merge step).

## Deploy plan (not done yet — awaiting review)
1. Seed from gilbert (Path A): `heph daemon stop` → copy `heph.db` → `DELETE FROM meta WHERE key='origin'`.
2. Sync Authentik `apps`/blueprint; verify blueprint status via API (not just logs).
3. `provision-indri --tags heph,caddy` from this branch.
4. Point gilbert at the hub + `heph auth login`.

## Known follow-ups (heph-side, tracked in the Hephaestus project)
- `heph daemon` can't bake hub/spoke config or pass `--self-update-interval-secs` → worked around by the ansible plist.
- Path-A seeding lacks a clean `hephd --owner-id`/seed command → manual `meta.origin` reset for now.
- Self-update moves hephd ahead of the ansible-pinned PWA shell over time (drift; tolerated by the SW cache, revisit on next release).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #369
2026-06-05 06:46:58 -07:00
f6c926f1f5 C0: rebuild external-secrets off main, repoint both clusters to stable tags
indri -> v2.2.0-13895bb (arm64), ringtail -> v2.2.0-13895bb-nix (amd64).
Both deployed images now trace to main commit 13895bb instead of earlier
branch builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:19:20 -07:00
13895bb04a Localize external-secrets on ringtail (amd64 nix build) (#368)
Follow-up to #367. That PR localized external-secrets but the Dagger build (on indri's Apple Silicon runner) only produces an **arm64** image — and external-secrets also runs on **ringtail (amd64)** via the same shared manifest. This completes the localization so both clusters run the local binary on their native arch.

## Approach (matches the kube-state-metrics dual-build pattern)
- **`containers/external-secrets/default.nix`** (new) — builds the **amd64** image on ringtail's nix-container-builder. `buildGoModule` with Go 1.26 (v2.2.0 requires ≥1.26.1; nixpkgs default is 1.25.x) and `-tags all_providers`, faithful to upstream. Same v2.2.0 source from the forge mirror.
- **`argocd/manifests/external-secrets-ringtail/`** (new) — thin kustomize overlay that reuses the shared indri manifest as a base and overrides **only** the image to the `-nix` (amd64) tag. No manifest duplication.
- **`argocd/apps/external-secrets-ringtail.yaml`** — repointed at the new overlay.

Result: indri → `v2.2.0-…` (arm64, Dagger), ringtail → `v2.2.0-…-nix` (amd64, nix).

## Build
Run #581 built both arches at the branch commit. Verified the nix image is `linux/amd64`, entrypoint = the binary, user 65534.

## Deployed from branch & verified on ringtail (k3s, amd64)
- All 3 pods rolled to the nix amd64 image, `1/1 Running` (no exec-format error → arch correct)
- Controller logs clean
- **Live secret fetch proven:** force-synced `homepage/homepage-grafana` → `refreshTime` advanced, `Ready=True`
- **All 20** ringtail ExternalSecrets remain `SecretSynced=True`

## Post-merge
The `external-secrets-ringtail` app is temporarily pointed at this branch + overlay path (apps app left on `main`, manual-sync, untouched). After merge:
```
argocd app sync apps                       # picks up the new Application path on main
argocd app set external-secrets-ringtail --revision main && argocd app sync external-secrets-ringtail
```
I'll also rebuild off `main` so both clusters land on stable main-sha tags (as done for indri in #367).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #368
2026-06-04 15:37:42 -07:00
30c82079b9 C0: rebuild external-secrets image off main (v2.2.0-0e70a1b)
Repoint to the main-branch-built image so the deployed tag traces to a main
commit rather than the merged feature branch. Same v2.2.0 source, stable
provenance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:59:17 -07:00
0e70a1b524 Localize external-secrets container (native container.py build) (#367)
Knocks out the weekly "pick one non-local container and make it local" task by moving **external-secrets** off `ghcr.io` onto a locally-built image, under our own supply-chain control. Doubles as its overdue service review.

## What changed
- **`containers/external-secrets/container.py`** (new) — native Dagger build (the Dockerfile→container.py migration pattern). Clones the forge mirror at `v2.2.0` and builds the single `all_providers` static Go binary, faithful to upstream's `make build` (CGO off, no version ldflags upstream). ENTRYPOINT is `/bin/external-secrets` so the controller/webhook/cert-controller Deployments select their role via `args:` exactly as before.
- **`argocd/manifests/external-secrets/kustomization.yaml`** — image swapped to `registry.ops.eblu.me/blumeops/external-secrets:v2.2.0-2985007`. **Like-for-like (v2.2.0)**, not an upgrade.
- **`service-versions.yaml`** — marked reviewed (2026-06-04), noted the local build.

## Build
Built on the indri forge runner (run #579, ~4 min) → pushed to Zot. Image config verified: `Entrypoint=/bin/external-secrets`, `User=65534`, version label `v2.2.0`.

## Deployed from branch & verified
- All 3 pods (controller / webhook / cert-controller) rolled to the local image, `1/1 Running`
- Controller + webhook logs clean (no errors; webhook serving TLS)
- **End-to-end secret fetch proven:** force-synced `monitoring/grafana-admin` → `refreshTime` advanced to now, `Ready=True`
- All 10 ExternalSecrets cluster-wide remain `SecretSynced=True` — no collateral damage
- App `Healthy`

## Post-merge
`external-secrets` currently points at this branch (so `apps` reads OutOfSync — expected). After merge:
```
argocd app set external-secrets --revision main && argocd app sync external-secrets
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #367
2026-06-04 14:55:55 -07:00
bb55fa9566 Recurring review sweep: 4 doc cards + nvidia-device-plugin v0.19.2 (#366)
Knocks out the two daily recurring review tasks (doc review + service review) in one PR.

## Doc review (4 never-reviewed reference cards, `last-reviewed: 2026-06-04`)
- **cluster.md** — Kubernetes version v1.34.0 → **v1.35.0**; refreshed the stale ringtail workload list and noted the in-progress minikube→k3s migration (points to `[[ringtail]]` as the canonical list).
- **ntfy.md / tempo.md / alloy.md** — corrected image references: these are now **locally-built `registry.ops.eblu.me/blumeops/*` nix containers** (ntfy v2.19.2, tempo v2.10.3, alloy-k8s v1.16.0), not upstream Docker Hub. Fly.io alloy binary bumped to v1.16.1.

## Service review
- **nvidia-device-plugin** (ringtail GPU): v0.19.0 → **v0.19.2**. Upstream patch releases — CDI/Tegra fixes + dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup (the service-account change in the notes is helm-only).

## Not in this PR (need container rebuilds, deferred)
The other stale services are locally-built nix images, so upgrading them is a forge-runner rebuild rather than a clean tag bump — left untouched (not date-bumped, so they resurface): **prometheus** (v3.10.0→v3.12.0), **loki** (3.6.7→3.7.2), **kube-state-metrics**, **homepage**. Happy to do these as a follow-up rebuild PR.

## Deploy / verify
Not yet deployed — `nvidia-device-plugin` still points at `main`. After review:
```
argocd app set nvidia-device-plugin --revision reviews-jun4 && argocd app sync nvidia-device-plugin
# after merge:
argocd app set nvidia-device-plugin --revision main && argocd app sync nvidia-device-plugin
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #366
2026-06-04 13:37:02 -07:00
02ea1cc72a C0: point tailscale-operator base mirror fetch at tailnet forge
The public forge.eblu.me now black-holes /mirrors/ at the Fly edge
(AI-scraper mitigation), so the in-cluster ArgoCD repo-server got a 403
fetching the upstream operator manifest — leaving tailscale-operator and
tailscale-operator-ringtail in Unknown sync. Use forge.ops.eblu.me.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 12:40:21 -07:00
Forgejo Actions
8f72f04d5c Update docs release to v1.17.0
- Built changelog from towncrier fragments

[skip ci]
2026-06-03 21:52:22 -07:00
118 changed files with 860 additions and 252 deletions

View file

@ -12,6 +12,259 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
<!-- towncrier release notes start -->
## [v1.17.0] - 2026-06-03
### Features
- Deploy the Adelaide / Heidi / Addie baby shower app — guest splash, raffle
picker, and prize assignment console — on ringtail k3s with `shower.eblu.me`
as the public entry and `shower.ops.eblu.me` as the tailnet admin host. App
source: [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
- Deploy adelaide-baby-shower-app v1.1.0 to ringtail k3s. Replaces the
boolean lock with a four-phase `ShowerState` (`pre_event``party`
`prizes_locked``event_locked`), adds an append-only "guest memories"
panel where guests can leave photos and comments for the baby, and
polishes the admin and QR views. Three Django migrations
(`0009_shower_phase`, `0010_guest_memories`, `0011_book_description`)
run automatically in the entrypoint against the SQLite PV. No config
or env-var changes.
Container build also gains a Forgejo-PyPI workaround: Forgejo's simple
index returns absolute file URLs hardcoded to the public ROOT_URL
(`forge.eblu.me`), which the Fly edge 403s on `/api/packages/*`. The
wheel and sdist are now both pulled via direct `fetchurl` against
`forge.ops.eblu.me` (tailnet-only) and the wheel is handed to pip as
a local path.
- `review-compliance-reports` now also fetches and summarizes the weekly Prowler container-image and IaC scans (previously only the K8s CIS in-cluster scan was processed). For each scan it shows status counts, severity breakdown, week-over-week delta, and — for the high-volume image/IaC scans — top-N tables grouped by check ID and resource instead of per-finding listings.
- runner-logs now authenticates with Forgejo API token and auto-detects the repo from git remote. Job logs are fetched via SSH to indri (reading Forgejo's on-disk zstd log files) instead of the web endpoint, which doesn't support token auth for private repos.
### Bug Fixes
- Fix nightly borgmatic backups failing for 2 days. The shower SQLite
dump hook referenced `kubectl --context=k3s-ringtail`, but indri's
kubeconfig deliberately doesn't carry the ringtail credentials. The
`before_backup` hook's failure aborted the entire run, taking out
*both* the local sifaka repo and the BorgBase offsite. Replaced
the inline-shell dump with a `~/bin/borgmatic-k8s-sqlite-dump`
helper deployed by the ansible role. Each dump entry now declares a
`target` of either `local:<context>` (mealie — kubectl uses indri's
kubeconfig) or `ssh:<user@host>` (shower — ssh into ringtail and
run `k3s kubectl` there, no indri-side kubeconfig needed; k3s.yaml
on ringtail is mode 644 so no sudo required). Bytes stream back via
`kubectl exec ... -- cat` rather than `kubectl cp`, since `kubectl
cp` requires `tar` inside the pod and nix-built images like shower
don't bundle it.
- Shower app container now bakes the wheel + Python deps into the image
at build time via `buildPythonPackage` instead of pip-installing on
first boot. Boots are deterministic and don't depend on forge PyPI
being reachable from the pod. The `wheelHash` in
`containers/shower/default.nix` is the sha256 sourced from the
[forge PyPI simple index](https://forge.eblu.me/api/packages/eblume/pypi/simple/adelaide-baby-shower-app/);
bumping the version means bumping that hash too.
Borgmatic now covers the shower app: SQLite is dumped from the live
pod via `kubectl exec` (mirroring the existing mealie entry, with
`context: k3s-ringtail`), and the prize-photo media share is picked up
through `/Volumes/shower` (sifaka SMB mount on indri, same pattern as
`/Volumes/photos`).
- Disabled adaptive sync (VRR) on ringtail's DP-1 output. The OMEN 27i IPS panel pumps brightness when its refresh rate swings into the low VRR range during low-framerate content (e.g. game cutscenes), producing a flicker that worsened over a session until a reboot. Pinning the panel to a fixed 165Hz eliminates it.
- Fixed forge.eblu.me static assets (CSS, JS, images, fonts) not loading — the proxy's static asset cache block was missing the `Host` header, so Caddy couldn't route the requests.
- Fixed homepage container EACCES on cold start: the nix-built image now chowns
`/app/config` to uid 1000 at build time via `fakeRootCommands`, matching the
behavior of the old Dockerfile. Without this, homepage couldn't seed missing
skeleton configs (proxmox.yaml etc.) or create `/app/config/logs`, crashing on
its first uncached request. Caught during the ringtail cutover.
- Fixed sway keybindings on ringtail — the home-manager `keybindings` block was replacing the module's defaults entirely, leaving only explicit overrides (no workspace switching, focus, move, splits, resize mode, etc). Switched to `lib.mkOptionDefault` with `lib.mkForce` on the conflicting custom binds (`Mod+Return`, `Mod+d`, `Mod+space`, `Mod+l`) so defaults merge back in. Also added `Mod+F1` to show a filterable fuzzel list of current keybindings.
Fixed fuzzel config errors on launch — `border-radius` and `border-width` were under `[main]`, but fuzzel expects them as `radius`/`width` under a `[border]` section.
- Pin the Quartz docs build to v4.5.2. The Dagger `build_docs` pipeline cloned Quartz from the default branch unpinned; Quartz v5.0.0 restructured its config layout (`.quartz/plugins`, `../quartz` imports) and broke the docs build against our existing `quartz.config.ts`/`quartz.layout.ts`.
### Infrastructure
- Wire the ringtail `blumeops-pg` cluster (which holds the wave-1-migrated
paperless + teslamate databases) into backups and Grafana. Adds a Tailscale
LoadBalancer Service (`blumeops-pg-ringtail.tail8d86e.ts.net`) and a Caddy L4
route (`pg.ops.eblu.me:5434`), then repoints borgmatic's `teslamate` +
`paperless` postgres dumps and the `mealie` SQLite dump at ringtail, and the
Grafana TeslaMate datasource at the ringtail DB. Closes the backup gap that
opened at cutover (the migrated live data was still being backed up from the
now-frozen minikube copies) and unblocks the wave-1 decommission.
- Migrated homepage dashboard from minikube (indri/arm64) to k3s (ringtail/amd64).
The container is now built via nix (`containers/homepage/default.nix`), adapted
from nixpkgs `homepage-dashboard` with the upstream Next.js cache patches and
wrapped with `dockerTools.buildLayeredImage`. Autodiscovery shifts: services on
minikube (ArgoCD, Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus,
Navidrome, Paperless, TeslaMate, Transmission) become explicit static entries
in `services.yaml`; ringtail services (Authentik, Frigate/NVR, Ntfy, Ollama)
auto-populate via Ingress annotations.
- Migrated CV (`cv.eblu.me`) and Docs (`docs.eblu.me`) from minikube Deployments to indri-native ansible roles. Caddy now serves the extracted release tarballs directly via a new `kind: static` service-block in the Caddy template — no daemon, no container — replacing the prior nginx-in-a-pod layer. Removes a network hop on every request and shrinks minikube's footprint. See [[cv-on-indri]] and [[docs-on-indri]]. Part of the broader minikube wind-down.
- Migrated devpi (PyPI mirror at `pypi.ops.eblu.me`) from a minikube StatefulSet to a launchd-managed service on indri. devpi-server now runs in a uv-managed venv with pinned `devpi-server` and `devpi-web` versions, listens on `127.0.0.1:3141`, and is fronted by Caddy. The minikube StatefulSet was crash-looping under memory pressure (and breaking the Python toolchain everywhere); the new layout removes a layer of dependency on cluster health for critical-path tooling. See [[devpi-on-indri]].
- Move the entire Immich stack — server, machine-learning, valkey,
and the PostgreSQL+VectorChord cluster — off `minikube-indri` and
onto `k3s-ringtail`. Postgres data migrated zero-loss via CNPG
`pg_basebackup` (replica catch-up then promote); row counts on
`asset`, `user`, `album`, `smart_search`, `activity`, `asset_face`
verified equal between source and replica before cutover. The ML
pod now uses ringtail's RTX 4080 via the nvidia-device-plugin
(time-slicing bumped 2 → 4 to share with frigate + ollama). Caddy
routing at `photos.ops.eblu.me` is unchanged (still
`photos.tail8d86e.ts.net`, the device just lives on ringtail now).
Borgmatic backups continue against the same `immich-pg` tailnet
hostname. First concrete chain in the broader indri-k8s
decommission effort.
- Add local nix container build for `tailscale` (`containers/tailscale/default.nix`) so ringtail's tailscale-operator ProxyClass proxy pods pull from the forge mirror instead of `docker.io/tailscale/tailscale`. Pinned at v1.94.2 to match `service-versions.yaml`. Indri's tailscale-operator continues to use upstream during the k8s-to-ringtail migration.
- Address the 6 critical Prowler IaC findings against `argocd/manifests/`. Prowler's IaC provider hardcodes `self._mutelist = None` and delegates filtering to Trivy, but doesn't plumb `--ignorefile` through — so the documented "use Trivy filtering" path is actually broken. Added a shim around `trivy` in the Prowler image that injects `--ignorefile $TRIVY_IGNOREFILE` for `trivy fs` invocations when the env var points at a real file. The IaC cronjob now mounts `mutelist/trivyignore.yaml` (Trivy's per-path schema) and sets the env var, muting the `external-secrets` and `kube-state-metrics` Secret-access findings (KSV-0041, KSV-0114). Separately, `grafana-clusterrole` is tightened to remove `secrets` access entirely: the dashboard sidecar already only consumes ConfigMap-labeled dashboards, so its `RESOURCE` env var is now `configmap` instead of `both`.
- Pin ringtail's wired IP to `192.168.1.21` via NixOS scripted networking; NetworkManager no longer manages `enp5s0`. Removes DHCP lease renewal as a failure mode after a silent lease teardown took ringtail offline. Also explicitly enables `net.ipv4.ip_forward` (previously set implicitly by scripted-DHCP) so k3s pod networking and Tailscale routing continue to work with static networking.
- Ripped out the compensating-controls (CC) framework: deleted `compensating-controls.yaml`, the `review-compensating-controls` mise task, and the associated how-to / explanation docs. Prowler and Kingfisher continue to run weekly and produce reports; the Prowler mutelist YAML files remain in place but no longer carry `CC: <id>` prefixes — each entry just keeps a free-form `Description` of why the finding is muted. The CC review cadence proved to be more overhead than this single-operator homelab needed.
- Wire shower app for public exposure: fly nginx `shower.eblu.me` server
block as a guest-only surface — splash page, `/prizes/<token>/`, static
assets, media. Everything authenticated (`/admin/`, `/host/`,
`/accounts/`) returns 403 with a "tailnet only" pointer. Staff hit
`shower.ops.eblu.me` for the operator console + admin; the app's
v1.0.1 `DJANGO_PUBLIC_URL_BASE` setting makes QR codes generated on
the tailnet point back at the WAN host for guests. Plus a Caddy route
on indri, Pulumi Gandi CNAME, and a Grafana APM dashboard tracking
request rate, error rate, latency, bandwidth, and access logs.
- Mirror Valkey 8.1 locally as `registry.ops.eblu.me/blumeops/valkey`. Replaces direct pulls of `docker.io/valkey/valkey:8.1-alpine` for paperless and immich sidecars. Built via native Dagger pipeline on Alpine 3.22. Stateless swap — no data migration. Authentik's nix-built Redis remains separate.
- Add nix-built amd64 valkey for ringtail (`containers/valkey/default.nix`) so immich-ringtail can stop pulling the upstream multi-arch `docker.io/valkey/valkey` image. Existing `container.py` continues to build Alpine arm64 for paperless on indri. Both bump to valkey 8.1.7 (Alpine 3.22 8.1.7-r0 / nixpkgs 8.1.7).
- Upgrade Grafana Alloy v1.14.0 → v1.16.0 across all four service deployments
(alloy-k8s, alloy-ringtail, alloy-tracing-ringtail on k8s; alloy native on
indri). Pulls in stable database observability (v1.15) and the OTel Collector
v0.147.0 bump. Container build also migrated from Dockerfile to native Dagger
`container.py` per the build-container-image migration playbook.
- Upgraded Dagger from v0.20.1 to v0.20.6 (engine, CLI pin, and SDK regen) and migrated `runner-job-image` from a Debian-based Dockerfile to a native Dagger `container.py` on Alpine 3.23, reusing the shared `alpine_runtime` helper.
- Decommission the wave-1 services on minikube-indri now that paperless,
teslamate, and mealie run on ringtail with their data backed up. Removes the
minikube `paperless`/`teslamate`/`mealie` manifest dirs + ArgoCD app
definitions (pruning the parked Deployments, Services, and the redundant
minikube mealie/paperless PVCs), and drops the `paperless`/`teslamate` roles
from the minikube `blumeops-pg` cluster. The `paperless` and `teslamate`
databases are dropped from indri's blumeops-pg as the finalization step.
miniflux + authentik remain on the minikube cluster (later waves).
- Upgraded the k8s Forgejo runner to the v12.8 line, switched it from first-boot registration to declarative `server.connections` credentials from 1Password, and consolidated the supporting runner how-to documentation.
- Move paperless, teslamate, and mealie off `minikube-indri` onto
`k3s-ringtail`, shedding ~1.1 GiB of resident load from the
OOM-thrashing 8 GiB minikube node (the kernel OOM killer had been
killing `kube-apiserver`/`dockerd`/argocd, flapping every
minikube-hosted service at once). paperless + teslamate databases
move into a fresh CNPG `blumeops-pg` cluster on ringtail via a cold
`pg_dump`/`pg_restore` from the quiesced source — row counts verified
equal before any routing flip; source DBs dropped only after the
ringtail side serves traffic. mealie's SQLite PVC is copied as-is.
paperless media stays on sifaka NFS. Downtime-tolerant cold cutover
(no streaming replication); rollback is repoint-and-scale-up with the
source untouched. Second chain in the indri-k8s decommission after
[[migrate-immich-to-ringtail]].
- Recurring maintenance batch:
- Ringtail flake inputs refreshed (`disko`, `home-manager`, `nixpkgs`).
- Tooling deps bumped: prek hooks (trufflehog v3.95.3, kingfisher v1.101.0, ruff v0.15.14, `ansible-core` 2.21.0); fly proxy base images (nginx 1.30.1-alpine, alloy v1.16.1); `typer==0.26.2` in mise tasks.
- Updated `nixos/ringtail/flake.lock` (weekly cadence): `disko`, `home-manager`, and `nixpkgs` inputs refreshed. `nixpkgs-services` skipped per overlay convention.
- Reviewed `mealie` service version freshness; upstream is 5 minor versions ahead (v3.17.0 vs deployed v3.12.0). Marked reviewed; upgrade deferred.
- Deploy shower v1.1.2 — bump container build to new app release.
- Upgrade unpoller v2.34.0 → v3.2.0 and migrate container build from Dockerfile to native Dagger (container.py). v3.0.0 carries breaking UniFi API changes; v3.2.0 introduces a 60s background poll (cached scrapes) by default — set `interval = 0` in `up.conf` to restore on-demand polling.
- Monthly tooling dependency refresh: prek hooks (trufflehog, kingfisher, ruff, shfmt, prettier, actionlint, ansible-lint), fly proxy base images (nginx 1.30.0, tailscale v1.94.2, alloy v1.16.0), normalize pyyaml lower bound in mise-tasks.
- Add GE-Proton (`pkgs.proton-ge-bin`) to `programs.steam.extraCompatPackages`
on ringtail. Subnautica 2 hangs at Mercuna plugin init under Proton
Experimental + DXVK D3D12; GE-Proton is available as a Steam per-game
compatibility option to work around it.
- Add `sn2-prelaunch` Steam launch wrapper on ringtail that removes
Subnautica 2's stale `Saved/running.dat` and `Saved/beforelobby.dat`
lockfiles before each launch. SN2 pops up an invisible (0×0-sized)
Error dialog when it detects an unclean exit, blocking GameThread
forever; this is observable only as a black screen with a spinning
loader. Use via Steam launch option: `sn2-prelaunch %command%`.
- Add local nix container build for `frigate-notify` (`containers/frigate-notify/default.nix`) so the Frigate→ntfy bridge is rebuilt on ringtail from the forge mirror instead of pulled from `ghcr.io/0x2142/frigate-notify`.
- Add resource limits to all ArgoCD pods to prevent unbounded resource consumption during node-wide pressure events.
- Black-hole the `/mirrors/*` repositories at the Fly proxy edge (`return 403``forge.ops.eblu.me`). A surprise $29.60 Fly bill traced to ~1.24 TB/30d of egress on `forge.eblu.me`, 99.95% of all proxy egress — of which ~71% was AI scrapers (Meta `meta-externalagent`, OpenAI `GPTBot`, Amazonbot) crawling the near-infinite git-history URL space of the public mirror repos and timing out Forgejo in the process. Mirrors exist for supply-chain control and are consumed over the tailnet, so their public web UI had no legitimate audience. `robots.txt` already disallowed `/mirrors/`, but the offending agents ignore it. Tier-2 mitigations (user-agent denylist, Anubis proof-of-work gateway) are documented in `docs/explanation/ai-scraper-mitigation.md`.
- Bump paperless and immich kustomizations to the main-SHA-built valkey tag (`v8.1.6-r0-fabca04`). Routine post-merge follow-up to keep production manifests pointing at images built from a commit on main.
- Bump shower container to v1.1.1 (probe FOD hash).
- Bumped shower app to v1.1.3 (wheel/sdist + FOD hashes probed on ringtail).
- Cap systemd-coredump on ringtail (ProcessSizeMax/ExternalSizeMax 1G, MaxUse 2G) so multi-GB Wine/Proton game crash dumps no longer thrash the disk and lock up the desktop.
- Deploy shower v1.1.1 to ringtail (kustomize newTag bump).
- Deployed shower v1.1.3 to ringtail (image built and pushed from ringtail; runner bypassed due to indri overload).
- Fix three follow-ups from the wave-1 decommission: grant the local
break-glass `admin` account ArgoCD admin rights (`g, admin, role:admin`
previously only the Authentik `admins` group had access, so admin was
locked out whenever its token expired), and repoint the alloy blackbox
probe for teslamate from the deleted minikube service to
`https://tesla.ops.eblu.me/` (through Caddy over Tailscale). The orphaned
paperless/teslamate roles + ExternalSecrets left on the minikube
blumeops-pg are also cleaned up.
- Moved the Immich blackbox health probe from indri's alloy to ringtail's alloy. After the immich migration to ringtail, the probe still targeted `immich-server.immich.svc.cluster.local` on indri's cluster where the service no longer exists, causing a persistent `ServiceProbeFailure` alert.
- Pin shower v1.1.1 FOD outputHash (probed locally on ringtail).
- Rebuild Prowler container against main HEAD (v5.23.0-495e45d) after merging the IaC mutelist Dockerfile changes.
- Rebuild and retag alloy v1.16.0 container images from the main-branch SHA
following the squash-merge of #345, per the build-container-image
squash-merge convention. Both images (`registry.ops.eblu.me/blumeops/alloy`)
now reference `9564435` rather than the branch SHA `26a3ab5`, restoring
source traceability after branch cleanup.
- Rebuild shower from the post-merge commit on main so the container's
SHA tag points at a commit that will still exist after the 30-day
branch-cleanup window. Functionally identical to the branch-tag image
already deployed, just preserves source traceability per
[[build-container-image#Squash-merge and container tags]].
- Rebuild unpoller container from squashed main commit so the image SHA tag matches a commit in main's history (was tagged with the pre-squash branch SHA).
- Rebuild valkey container from squashed main commit (both arm64 dagger and amd64 nix variants), and update paperless + immich-ringtail kustomizations to the main-SHA tags `v8.1.7-ecded30` and `v8.1.7-ecded30-nix`.
- Retired the `blumeops-tasks` mise task (Todoist API) in favor of `heph list --project Blumeops --json` from the self-hosted [hephaestus](https://github.com/eblume/hephaestus) system. Updated docs to point task discovery and rotation reminders at heph, and noted that the `~/code/personal/zk` zettelkasten is migrating into heph docs.
- Switch the Fly proxy deploy strategy from `bluegreen` to `immediate` in `fly/fly.toml`. With a single proxy machine, bluegreen offers little benefit — the green machine routinely failed to reach "started" inside Fly's default 5-minute deploy timeout (the cold-start sequence of `tailscaled``tailscale up` → wait-for-MagicDNS → nginx startup eats most of the budget), and the failed deploys would roll back. `immediate` replaces the machine in place with a brief downtime (~510s) but actually completes.
- Switch the ringtail provisioning playbook's blumeops clone URL from `forge.eblu.me` (public, via Fly proxy) to `forge.ops.eblu.me` (tailnet, direct via Caddy on indri). Ringtail is always on the tailnet, so the WAN round-trip is pure overhead — it also made `provision-ringtail` brittle whenever the Fly proxy was slow or down.
- Switched Grafana's deployment strategy from `RollingUpdate` to `Recreate`. With an RWO PVC holding the SQLite database and Bleve search index, `RollingUpdate` reliably crashloops the new pod on the index lock until rollout timeout. `Recreate` terminates the old pod first so the new one acquires the lock cleanly.
- Update `tailscale-operator-ringtail` ProxyClass to reference the `0108b68` main-SHA build of the tailscale container. Routine post-merge cleanup so the deployed image traces to a commit that survives PR branch cleanup.
- Update the ringtail NixOS flake lockfile (`nixos/ringtail/flake.lock`): bump
`nixpkgs` (b77b3de → 25f5383) and `disko` (5ba0c95 → 115e521) to latest.
`nixpkgs-services` was intentionally left pinned (skipped by the
`flake-update` pipeline). Routine recurring maintenance per [[manage-lockfile]].
- Upgrade native macOS Alloy on indri to v1.16.0. Built on gilbert with Go
1.26.2 + CGO (required for the macOS native DNS resolver, which Tailscale
MagicDNS depends on), scp'd to `~/.local/bin/alloy` on indri, codesigned,
and the LaunchAgent reloaded. Completes the v1.16.0 fleet upgrade started
in #345 — all four Alloy services (alloy-k8s, alloy-ringtail,
alloy-tracing-ringtail, alloy ansible) now run v1.16.0.
- Upgraded zot on indri from v2.1.15 to v2.1.16 (security fixes: TLS verification on metrics client, CORS Allow-Credentials suppression on wildcard origins, manifest/API-key body size limits).
### Documentation
- Reviewed `replicating-blumeops` tutorial: fixed "BluemeOps" typos (also in `contributing.md`) and added `last-reviewed` frontmatter.
- Reviewed [[indri]] reference card: added `devpi`, `cv`, and `docs` to the native-services list; widened the k8s note to reflect the growing set of apps now on ringtail and the planned indri-minikube decommission; added CPU/RAM specs.
- New how-to: rotate-fly-deploy-token. Documents the 75-day rotation cadence, why we use `org`-scoped tokens (silences the cosmetic metrics-token warning on `fly status` with marginal blast-radius cost given the single-app personal org), and the procedure for rotation + Forgejo Actions secret sync.
- Add `docs/explanation/ai-scraper-mitigation.md` — the egress-cost / AI-crawler threat model for the public Fly proxy, the tiered mitigation plan (Tier 1: mirror black-hole, shipped; Tier 2: user-agent denylist + Anubis; Tier 3: Cloudflare, rejected on principle), and the data behind it.
- Fix manage-forgejo-mirrors verify step — sync button is on the repo settings page ("Synchronize now"), not the main repo page.
- Fixed the `op item edit` invocation in the [[zot]] API-key rotation procedure: the previous `pbpaste | op item edit ... "field[password]=-"` stdin syntax is rejected by op 2.34 as "invalid JSON" (recent op versions treat piped input as a full JSON template, not a single field value). Procedure now reads the clipboard into a local fish variable and passes it as an inline assignment.
- Fixed the export-filename step in [[run-1password-backup]]: 1Password's desktop app names the export `1PasswordExport-<account-uuid>-<timestamp>.1pux` automatically rather than letting you save to a fixed name, so the procedure now points the task at that glob instead of pretending the default name is `1Password-export.1pux`.
- Refresh the contributing tutorial: add `last-reviewed`, include the `.ai.md` changelog fragment type, and clarify that `prek` is pinned via `mise`.
- Review and refresh the Navidrome reference card: add `last-reviewed`, correct the scanner env var name, document the current image/version, and record routing and runtime details from the manifests.
- Review and refresh the Ollama reference card: add `last-reviewed`, bump the documented image tag to 0.20.4, and add the two `qwen3.5` models now declared in `models.txt`.
- Reviewed [[1password]] reference card: added the `blumeops` vs `Personal` vault split, noted that `onepassword-connect` runs on both indri and ringtail (not just one cluster), and pulled the `op read` vs `op item get --fields` guidance up from agent memory into the card.
- Reviewed `index.md`; added ringtail to the infrastructure overview and stamped `last-reviewed`.
- Reviewed transmission card: corrected storage layout (`/config/` is emptyDir, watch dir disabled) and noted the Prometheus exporter sidecar.
- rotate-fly-deploy-token: combine mint+store into one command with both fish and bash forms; document the `op item edit` "Password item requires ps value" validator gotcha and the placeholder-password workaround.
### AI Assistance
- Adopt `AGENTS.md` as the canonical agent instruction file, keep `CLAUDE.md` as a compatibility shim, and update docs to reference the neutral file and the correct agent-change-process path.
- CLAUDE.md now imports AGENTS.md via `@AGENTS.md` instead of telling agents to go read it. Claude Code only auto-loads CLAUDE.md, so the prose shim was easy to skip; the import inlines AGENTS.md into the session prompt unconditionally.
### Miscellaneous
- Removed the dead minikube manifests, container builds, and tooling shims left behind after the cv + docs migration to indri-native (#342). Deletes `argocd/{apps,manifests}/{cv,docs}/`, `containers/{cv,quartz}/`, and the `quartz``docs` mapping in `mise-tasks/container-version-check`. Bumps `docs.current-version` to `v1.16.0` (the blumeops release tag) now that the legacy nginx-base version pin is gone.
- Rebuild shower v1.1.0 container from main HEAD (`3c7967e`) and bump the
kustomization tag to `v1.1.0-3c7967e-nix`. The PR was squash-merged, so
the branch commit `444ff91` baked into the prior tag isn't reachable
from main's history. The new tag points at a commit that exists on
main; image content is byte-identical because the FOD output is content
addressed and the inputs didn't change.
- Rebuild shower v1.1.2 from main HEAD (a33fa47) and retag — PR #358 was squash-merged so the branch SHA baked into the prior image tag isn't reachable from main. FOD is content-addressed, so image bytes are identical; only provenance changes.
- Remove the duplicate Homepage tiles for Mealie, Paperless, Immich, and
TeslaMate. Homepage runs on ringtail and autodiscovers ringtail Ingresses via
`gethomepage.dev/*` annotations; once these services migrated to ringtail they
were discovered automatically, making their leftover static `services.yaml`
entries (needed only while they lived on minikube) redundant.
- Removed the now-unused `containers/devpi/` Dagger build artifact. Devpi runs natively on indri via uv venv; the container image is no longer referenced anywhere. Doc examples in `docs/reference/tools/dagger.md` updated to use `miniflux` as the example container name.
- `container-build-and-release` now prints the specific `mise run runner-logs <N>` command after dispatching, polling the Forgejo API to resolve the run number for the commit it just triggered.
- `mise run runner-logs <run> -j <n>` now reports a clear error when the log file doesn't exist on indri (e.g. a runner crash that left `action_task.log_in_storage = 0`). Previously it printed only the header and exited 0, because `zstdcat` exits 0 with a "can't stat … -- ignored" stderr message and ssh+fish on indri swallows the remote exit code.
## [v1.16.0] - 2026-04-18
### Infrastructure

View file

@ -260,5 +260,7 @@
tags: cv
- role: docs
tags: docs
- role: heph
tags: heph
- role: caddy
tags: caddy

View file

@ -52,6 +52,9 @@ caddy_services:
- name: devpi
host: "pypi.{{ caddy_domain }}"
backend: "http://localhost:3141"
- name: heph
host: "heph.{{ caddy_domain }}"
backend: "http://localhost:8787" # hephaestus hub (server mode) + PWA shell
- name: kiwix
host: "kiwix.{{ caddy_domain }}"
backend: "https://kiwix.tail8d86e.ts.net"

View file

@ -3,9 +3,8 @@
# Caddy serves docs_content_dir directly via the static-kind service block,
# with Quartz-style try_files (path → path/ → path.html → 404).
docs_version: "v1.16.0"
docs_version: "v1.17.0"
docs_release_url: "https://forge.eblu.me/eblume/blumeops/releases/download/{{ docs_version }}/docs-{{ docs_version }}.tar.gz"
docs_home: /Users/erichblume/blumeops/docs
docs_content_dir: "{{ docs_home }}/content"
docs_version_sentinel: "{{ docs_home }}/.installed-version"

View file

@ -0,0 +1,49 @@
---
# hephaestus hub — the canonical heph replica (server mode) on indri.
# Other devices (e.g. gilbert) are spokes that sync against this hub.
# See [[set-up-sync-hub]] and [[host-heph-pwa]] in the hephaestus repo.
# Pinned release used for the initial `cargo install` and the PWA shell.
# After bootstrap, hephd's own --self-update keeps the binary current; this
# pin only governs the first install and the bundled PWA shell version.
heph_version: v1.2.1
# Anonymous public HTTPS clone — matches hephd's INSTALL_GIT_URL so the initial
# install and unattended self-update build from the same source (no ssh-agent).
heph_repo_url: https://forge.eblu.me/eblume/hephaestus.git
heph_bin_dir: /Users/erichblume/.cargo/bin
heph_binary: "{{ heph_bin_dir }}/hephd"
# rustc/cargo here are rustup shims. The bare (non-mise) environment that the
# launchagent and ansible run in falls back to rustup's *default* toolchain,
# which can lag behind heph's rust-version floor (Cargo.toml: 1.89). Pin the
# channel explicitly so both the bootstrap build and unattended self-update
# always use a current toolchain regardless of the host's rustup default.
heph_rust_toolchain: stable
heph_data_dir: /Users/erichblume/.local/share/heph
heph_db: "{{ heph_data_dir }}/heph.db"
heph_socket: "{{ heph_data_dir }}/hephd.sock"
heph_log_dir: /Users/erichblume/Library/Logs
# Version-pinned source checkout; the PWA static shell is served directly from
# its heph-pwa/ subdir (no copy), keeping shell and hub in lockstep at heph_version.
heph_pwa_src_dir: /Users/erichblume/.cache/heph-pwa-src
heph_web_root: "{{ heph_pwa_src_dir }}/heph-pwa"
# Hub listens on all interfaces so tailnet spokes can reach it directly
# (http://indri.tail8d86e.ts.net:8787) and Caddy can proxy heph.ops.eblu.me.
# Access is gated by Authentik OIDC regardless — tailnet reachability is not
# enough (this is the owner's most sensitive data).
heph_http_addr: 0.0.0.0:8787
heph_port: 8787
heph_external_url: https://heph.ops.eblu.me
# Authentik OIDC — issuer + audience together turn hub auth on. The audience is
# the device-code client id (see argocd/manifests/authentik heph blueprint).
heph_oidc_issuer: https://authentik.ops.eblu.me/application/o/heph/
heph_oidc_audience: heph
# Self-update poll interval (seconds). 10 minutes.
heph_self_update_interval_secs: 600

View file

@ -0,0 +1,6 @@
---
- name: Restart heph
ansible.builtin.shell: |
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.heph.plist 2>/dev/null || true
launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
changed_when: true

View file

@ -0,0 +1,82 @@
---
# hephaestus hub (server mode) on indri.
#
# DATA SEEDING (one-time, Path A — do this BEFORE the first provision so the hub
# adopts gilbert's existing data instead of being born empty):
#
# 1. On the seed device (gilbert): heph daemon stop
# 2. Copy its store to indri: scp ~/.local/share/heph/heph.db \
# indri:~/.local/share/heph/heph.db
# 3. On indri, give the hub its OWN device origin (keeps gilbert's owner_id +
# data; hephd regenerates a fresh origin on next start when it is missing):
# sqlite3 ~/.local/share/heph/heph.db "DELETE FROM meta WHERE key='origin';"
# 4. Run this role (installs hephd, stages the PWA, loads the launchagent).
#
# hephd auto-creates an empty store on first start if none exists, so seeding is
# optional — skip it only if you intend a fresh, empty hub.
- name: Ensure heph data directory exists
ansible.builtin.file:
path: "{{ heph_data_dir }}"
state: directory
mode: '0700'
- name: Check for installed hephd binary
ansible.builtin.stat:
path: "{{ heph_binary }}"
register: heph_binary_stat
# Bootstrap install only when hephd is absent. Thereafter hephd's own
# --self-update keeps it current; ansible must not fight (or downgrade) it.
# This builds from source and can take several minutes on a cold cargo cache.
- name: Bootstrap-install heph + hephd from the forge ({{ heph_version }})
ansible.builtin.command:
cmd: >-
{{ heph_bin_dir }}/cargo install --locked
--git {{ heph_repo_url }}
--tag {{ heph_version }}
heph hephd
environment:
PATH: "{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin"
RUSTUP_TOOLCHAIN: "{{ heph_rust_toolchain }}"
when: not heph_binary_stat.stat.exists
changed_when: true
notify: Restart heph
# Checkout provides the PWA shell at {{ heph_web_root }} (heph-pwa/ subdir),
# served directly by hephd. Static files are read from disk per request, so a
# version bump needs no restart; the service worker (CACHE = "heph-pwa-vN")
# evicts stale assets on next load.
- name: Ensure heph cache parent directory exists
ansible.builtin.file:
path: "{{ heph_pwa_src_dir | dirname }}"
state: directory
mode: '0755'
- name: Stage heph-pwa source at {{ heph_version }}
ansible.builtin.git:
repo: "{{ heph_repo_url }}"
dest: "{{ heph_pwa_src_dir }}"
version: "{{ heph_version }}"
depth: 1
single_branch: true
force: true
- name: Deploy heph LaunchAgent plist
ansible.builtin.template:
src: heph.plist.j2
dest: ~/Library/LaunchAgents/mcquack.eblume.heph.plist
mode: '0644'
notify: Restart heph
- name: Check if heph LaunchAgent is loaded
ansible.builtin.command: launchctl list mcquack.eblume.heph
register: heph_launchctl_check
changed_when: false
failed_when: false
- name: Load heph LaunchAgent if not loaded
ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
when: heph_launchctl_check.rc != 0
changed_when: true
failed_when: false

View file

@ -0,0 +1,50 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- {{ ansible_managed }} -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mcquack.eblume.heph</string>
<key>ProgramArguments</key>
<array>
<string>{{ heph_binary }}</string>
<string>--mode</string>
<string>server</string>
<string>--http-addr</string>
<string>{{ heph_http_addr }}</string>
<string>--db</string>
<string>{{ heph_db }}</string>
<string>--socket</string>
<string>{{ heph_socket }}</string>
<string>--web-root</string>
<string>{{ heph_web_root }}</string>
<string>--oidc-issuer</string>
<string>{{ heph_oidc_issuer }}</string>
<string>--oidc-audience</string>
<string>{{ heph_oidc_audience }}</string>
<string>--self-update</string>
<string>--self-update-interval-secs</string>
<string>{{ heph_self_update_interval_secs }}</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>EnvironmentVariables</key>
<dict>
<!-- cargo + toolchain on PATH so --self-update can run `cargo install`. -->
<key>PATH</key>
<string>{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
<key>HOME</key>
<string>/Users/erichblume</string>
<!-- Pin the rustup channel: the launchagent runs without mise, so a bare
cargo shim would otherwise use rustup's (stale) default toolchain. -->
<key>RUSTUP_TOOLCHAIN</key>
<string>{{ heph_rust_toolchain }}</string>
</dict>
<key>StandardOutPath</key>
<string>{{ heph_log_dir }}/mcquack.heph.out.log</string>
<key>StandardErrorPath</key>
<string>{{ heph_log_dir }}/mcquack.heph.err.log</string>
</dict>
</plist>

View file

@ -15,7 +15,7 @@ spec:
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/external-secrets
path: argocd/manifests/external-secrets-ringtail
destination:
server: https://ringtail.tail8d86e.ts.net:6443
namespace: external-secrets

View file

@ -434,3 +434,93 @@ data:
provider: !KeyOf mealie-provider
meta_launch_url: https://meals.ops.eblu.me
policy_engine_mode: all
heph.yaml: |
version: 1
metadata:
name: BlumeOps Heph SSO
labels:
blueprints.goauthentik.io/description: "Hephaestus hub OIDC (device-code) provider, application, and device-code flow"
entries:
# Device-code flow (RFC 8628). authentik ships no default for this, so we
# create one and bind it to the brand below. An empty stage_configuration
# flow is sufficient: the already-authenticated user just confirms the code.
- model: authentik_flows.flow
id: device-code-flow
identifiers:
slug: default-device-code-flow
attrs:
name: Device code flow
title: Device code flow
slug: default-device-code-flow
designation: stage_configuration
authentication: require_authenticated
# Enable the device-code grant globally by binding the flow to the default
# brand (domain authentik-default). Partial update — only sets this field.
- model: authentik_brands.brand
identifiers:
domain: authentik-default
attrs:
flow_device_code: !KeyOf device-code-flow
# OAuth2 provider for heph — PUBLIC client (device-code + PKCE, no secret).
# client_id doubles as the token audience the hub verifies (--oidc-audience heph),
# and the app slug 'heph' is the issuer path (/application/o/heph/).
- model: authentik_providers_oauth2.oauth2provider
id: heph-provider
identifiers:
name: Heph
attrs:
name: Heph
authorization_flow: !Find [authentik_flows.flow, [slug, default-provider-authorization-implicit-consent]]
invalidation_flow: !Find [authentik_flows.flow, [slug, default-provider-invalidation-flow]]
client_type: public
client_id: heph
# CLI/TUI use the device-code grant (no redirect). The heph-pwa browser
# login uses Authorization Code + PKCE, which DOES redirect back to the
# app's origin — register those here (Authentik also keys token-endpoint
# CORS off these origins). Trailing slash matters: the PWA's redirect_uri
# is its base dir, e.g. https://heph.ops.eblu.me/.
redirect_uris:
- matching_mode: strict
url: https://heph.ops.eblu.me/
- matching_mode: strict
url: http://localhost:8787/ # local dev (hephd --web-root)
signing_key: !Find [authentik_crypto.certificatekeypair, [name, authentik Self-signed Certificate]]
property_mappings:
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, openid]]
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, email]]
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, profile]]
# offline_access: heph CLI requests "openid offline_access"; without
# this mapping the refresh token is session-bound and hephd's
# refresh_token grant 400s once the session lapses (spoke sync dies).
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, offline_access]]
sub_mode: hashed_user_id
include_claims_in_id_token: true
# Heph application — linked to the OAuth2 provider
- model: authentik_core.application
id: heph-app
identifiers:
slug: heph
attrs:
name: Hephaestus
slug: heph
provider: !KeyOf heph-provider
meta_launch_url: https://heph.ops.eblu.me
policy_engine_mode: any
# Policy binding — restrict heph to admins group (single-owner, sensitive data)
- model: authentik_policies.policybinding
identifiers:
order: 0
target: !KeyOf heph-app
group: !Find [authentik_core.group, [name, admins]]
attrs:
target: !KeyOf heph-app
group: !Find [authentik_core.group, [name, admins]]
order: 0
enabled: true
negate: false
timeout: 30

View file

@ -0,0 +1,16 @@
# Ringtail (amd64) overlay for external-secrets.
#
# Reuses the shared indri manifest as a base and only overrides the controller
# image to the nix-built amd64 variant (`-nix` tag). The base sets the arm64
# image (built via containers/external-secrets/container.py on indri's Dagger
# runner); ringtail's k3s is amd64 and needs the image built by
# containers/external-secrets/default.nix on the nix-container-builder.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../external-secrets
images:
- name: registry.ops.eblu.me/blumeops/external-secrets
newTag: v2.2.0-13895bb-nix

View file

@ -12,4 +12,5 @@ resources:
images:
- name: ghcr.io/external-secrets/external-secrets
newTag: v2.2.0
newName: registry.ops.eblu.me/blumeops/external-secrets
newTag: v2.2.0-13895bb

View file

@ -10,4 +10,4 @@ resources:
images:
- name: nvcr.io/nvidia/k8s-device-plugin
newTag: v0.19.0
newTag: v0.19.2

View file

@ -6,8 +6,11 @@ namespace: tailscale
# Upstream Tailscale operator manifest from forge mirror.
# To upgrade: update the ref in the URL AND the newTag below.
# Must use the tailnet host forge.ops.eblu.me — the public forge.eblu.me
# black-holes /mirrors/ at the Fly edge (AI-scraper mitigation), which the
# in-cluster ArgoCD repo-server would otherwise hit and fail with a 403.
resources:
- https://forge.eblu.me/mirrors/tailscale/raw/tag/v1.94.2/cmd/k8s-operator/deploy/manifests/operator.yaml
- https://forge.ops.eblu.me/mirrors/tailscale/raw/tag/v1.94.2/cmd/k8s-operator/deploy/manifests/operator.yaml
- proxyclass.yaml
- dnsconfig.yaml

View file

@ -0,0 +1,51 @@
"""External Secrets Operator — native Dagger build.
Two-stage build: Go binary (all providers), Alpine runtime.
Source cloned from forge mirror.
A single binary serves as the controller, webhook, and cert-controller; the
Deployments select the role via a subcommand passed in `args:`, so the image
ENTRYPOINT must be the binary itself (matching upstream's distroless image).
"""
import dagger
from blumeops.containers import (
alpine_runtime,
clone_from_forge,
go_build,
oci_labels,
)
VERSION = "v2.2.0"
async def build(src: dagger.Directory) -> dagger.Container:
source = clone_from_forge("external-secrets", VERSION)
# Upstream `make build` compiles every secret provider into a single
# static binary (`-tags all_providers`, CGO disabled). Mirror that so the
# local image is functionally identical to ghcr.io/.../external-secrets.
backend = go_build(
source,
"/external-secrets",
tags="all_providers",
)
runtime = alpine_runtime(
extra_apk=["ca-certificates"],
create_user=False,
)
runtime = oci_labels(
runtime,
title="External Secrets Operator",
description=(
"Kubernetes operator that integrates external secret management systems"
),
version=VERSION,
)
return (
runtime.with_file("/bin/external-secrets", backend.file("/external-secrets"))
.with_user("65534")
.with_entrypoint(["/bin/external-secrets"])
)

View file

@ -0,0 +1,56 @@
# Nix-built External Secrets Operator (amd64, for ringtail k3s).
# Builds v2.2.0 from the forge mirror with all secret providers compiled in,
# faithful to upstream's `make build` (-tags all_providers). The container.py
# sibling builds the arm64 image for indri's minikube; this default.nix builds
# the amd64 image on ringtail's nix-container-builder.
{ pkgs ? import <nixpkgs> { } }:
let
version = "2.2.0";
src = pkgs.fetchgit {
url = "https://forge.ops.eblu.me/mirrors/external-secrets.git";
rev = "v${version}";
hash = "sha256-eAocOAp5s4CFRrpKfQr2lf3Ji+6nQQ1A5/eTw5B7v9U=";
};
# external-secrets v2.2.0 requires Go >= 1.26.1; nixpkgs default go is 1.25.x.
external-secrets = (pkgs.buildGoModule.override { go = pkgs.go_1_26; }) {
inherit src version;
pname = "external-secrets";
vendorHash = "sha256-0xuBK3fjAplPLAElHvKB6d+2lDz+De/s91fV4dPZwjE=";
doCheck = false;
subPackages = [ "." ];
tags = [ "all_providers" ];
ldflags = [ "-s" "-w" ];
meta = with pkgs.lib; {
description = "Kubernetes operator that integrates external secret management systems";
homepage = "https://github.com/external-secrets/external-secrets";
license = licenses.asl20;
mainProgram = "external-secrets";
};
};
in
pkgs.dockerTools.buildLayeredImage {
name = "blumeops/external-secrets";
contents = [
external-secrets
pkgs.cacert
pkgs.tzdata
];
config = {
Entrypoint = [ "${external-secrets}/bin/external-secrets" ];
Env = [
"SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt"
"TZDIR=${pkgs.tzdata}/share/zoneinfo"
];
User = "65534";
};
}

View file

@ -1 +0,0 @@
Fixed the export-filename step in [[run-1password-backup]]: 1Password's desktop app names the export `1PasswordExport-<account-uuid>-<timestamp>.1pux` automatically rather than letting you save to a fixed name, so the procedure now points the task at that glob instead of pretending the default name is `1Password-export.1pux`.

View file

@ -1 +0,0 @@
Adopt `AGENTS.md` as the canonical agent instruction file, keep `CLAUDE.md` as a compatibility shim, and update docs to reference the neutral file and the correct agent-change-process path.

View file

@ -1 +0,0 @@
Add `docs/explanation/ai-scraper-mitigation.md` — the egress-cost / AI-crawler threat model for the public Fly proxy, the tiered mitigation plan (Tier 1: mirror black-hole, shipped; Tier 2: user-agent denylist + Anubis; Tier 3: Cloudflare, rejected on principle), and the data behind it.

View file

@ -1,5 +0,0 @@
Rebuild and retag alloy v1.16.0 container images from the main-branch SHA
following the squash-merge of #345, per the build-container-image
squash-merge convention. Both images (`registry.ops.eblu.me/blumeops/alloy`)
now reference `9564435` rather than the branch SHA `26a3ab5`, restoring
source traceability after branch cleanup.

View file

@ -1,6 +0,0 @@
Upgrade native macOS Alloy on indri to v1.16.0. Built on gilbert with Go
1.26.2 + CGO (required for the macOS native DNS resolver, which Tailscale
MagicDNS depends on), scp'd to `~/.local/bin/alloy` on indri, codesigned,
and the LaunchAgent reloaded. Completes the v1.16.0 fleet upgrade started
in #345 — all four Alloy services (alloy-k8s, alloy-ringtail,
alloy-tracing-ringtail, alloy ansible) now run v1.16.0.

View file

@ -1 +0,0 @@
Add resource limits to all ArgoCD pods to prevent unbounded resource consumption during node-wide pressure events.

View file

@ -1 +0,0 @@
CLAUDE.md now imports AGENTS.md via `@AGENTS.md` instead of telling agents to go read it. Claude Code only auto-loads CLAUDE.md, so the prose shim was easy to skip; the import inlines AGENTS.md into the session prompt unconditionally.

View file

@ -1 +0,0 @@
`container-build-and-release` now prints the specific `mise run runner-logs <N>` command after dispatching, polling the Forgejo API to resolve the run number for the commit it just triggered.

View file

@ -0,0 +1 @@
Rebuilt the locally-built external-secrets image from the `main` branch so the deployed tag (`v2.2.0-0e70a1b`) traces to a `main` commit rather than the now-merged feature branch, giving a stable provenance reference.

View file

@ -0,0 +1 @@
Rebuilt the external-secrets images off `main` and repointed both clusters to the stable main-sha tags (`v2.2.0-13895bb` arm64 / `v2.2.0-13895bb-nix` amd64), so the deployed images on indri and ringtail trace to the same `main` commit rather than earlier feature-branch builds.

View file

@ -1 +0,0 @@
Fixed forge.eblu.me static assets (CSS, JS, images, fonts) not loading — the proxy's static asset cache block was missing the `Host` header, so Caddy couldn't route the requests.

View file

@ -1 +0,0 @@
Switch the Fly proxy deploy strategy from `bluegreen` to `immediate` in `fly/fly.toml`. With a single proxy machine, bluegreen offers little benefit — the green machine routinely failed to reach "started" inside Fly's default 5-minute deploy timeout (the cold-start sequence of `tailscaled``tailscale up` → wait-for-MagicDNS → nginx startup eats most of the budget), and the failed deploys would roll back. `immediate` replaces the machine in place with a brief downtime (~510s) but actually completes.

View file

@ -1 +0,0 @@
Black-hole the `/mirrors/*` repositories at the Fly proxy edge (`return 403``forge.ops.eblu.me`). A surprise $29.60 Fly bill traced to ~1.24 TB/30d of egress on `forge.eblu.me`, 99.95% of all proxy egress — of which ~71% was AI scrapers (Meta `meta-externalagent`, OpenAI `GPTBot`, Amazonbot) crawling the near-infinite git-history URL space of the public mirror repos and timing out Forgejo in the process. Mirrors exist for supply-chain control and are consumed over the tailnet, so their public web UI had no legitimate audience. `robots.txt` already disallowed `/mirrors/`, but the offending agents ignore it. Tier-2 mitigations (user-agent denylist, Anubis proof-of-work gateway) are documented in `docs/explanation/ai-scraper-mitigation.md`.

View file

@ -1 +0,0 @@
Add local nix container build for `frigate-notify` (`containers/frigate-notify/default.nix`) so the Frigate→ntfy bridge is rebuilt on ringtail from the forge mirror instead of pulled from `ghcr.io/0x2142/frigate-notify`.

View file

@ -1 +0,0 @@
Switched Grafana's deployment strategy from `RollingUpdate` to `Recreate`. With an RWO PVC holding the SQLite database and Bleve search index, `RollingUpdate` reliably crashloops the new pod on the index lock until rollout timeout. `Recreate` terminates the old pod first so the new one acquires the lock cleanly.

View file

@ -0,0 +1 @@
Bumped the indri heph hub to v1.2.1, which adds the hub `GET /config` endpoint and ships the heph-pwa **Login with Authentik** flow (Authorization Code + PKCE). Pairs with the Authentik `heph` provider redirect URIs registered earlier.

View file

@ -1,5 +0,0 @@
Fixed homepage container EACCES on cold start: the nix-built image now chowns
`/app/config` to uid 1000 at build time via `fakeRootCommands`, matching the
behavior of the old Dockerfile. Without this, homepage couldn't seed missing
skeleton configs (proxmox.yaml etc.) or create `/app/config/logs`, crashing on
its first uncached request. Caught during the ringtail cutover.

View file

@ -1,5 +0,0 @@
Remove the duplicate Homepage tiles for Mealie, Paperless, Immich, and
TeslaMate. Homepage runs on ringtail and autodiscovers ringtail Ingresses via
`gethomepage.dev/*` annotations; once these services migrated to ringtail they
were discovered automatically, making their leftover static `services.yaml`
entries (needed only while they lived on minikube) redundant.

View file

@ -1 +0,0 @@
Moved the Immich blackbox health probe from indri's alloy to ringtail's alloy. After the immich migration to ringtail, the probe still targeted `immich-server.immich.svc.cluster.local` on indri's cluster where the service no longer exists, causing a persistent `ServiceProbeFailure` alert.

View file

@ -1 +0,0 @@
Fix manage-forgejo-mirrors verify step — sync button is on the repo settings page ("Synchronize now"), not the main repo page.

View file

@ -1 +0,0 @@
Pin the Quartz docs build to v4.5.2. The Dagger `build_docs` pipeline cloned Quartz from the default branch unpinned; Quartz v5.0.0 restructured its config layout (`.quartz/plugins`, `../quartz` imports) and broke the docs build against our existing `quartz.config.ts`/`quartz.layout.ts`.

View file

@ -1 +0,0 @@
Rebuild Prowler container against main HEAD (v5.23.0-495e45d) after merging the IaC mutelist Dockerfile changes.

View file

@ -1 +0,0 @@
Removed the now-unused `containers/devpi/` Dagger build artifact. Devpi runs natively on indri via uv venv; the container image is no longer referenced anywhere. Doc examples in `docs/reference/tools/dagger.md` updated to use `miniflux` as the example container name.

View file

@ -1 +0,0 @@
Retired the `blumeops-tasks` mise task (Todoist API) in favor of `heph list --project Blumeops --json` from the self-hosted [hephaestus](https://github.com/eblume/hephaestus) system. Updated docs to point task discovery and rotation reminders at heph, and noted that the `~/code/personal/zk` zettelkasten is migrating into heph docs.

View file

@ -1 +0,0 @@
Reviewed [[1password]] reference card: added the `blumeops` vs `Personal` vault split, noted that `onepassword-connect` runs on both indri and ringtail (not just one cluster), and pulled the `op read` vs `op item get --fields` guidance up from agent memory into the card.

View file

@ -1 +0,0 @@
`review-compliance-reports` now also fetches and summarizes the weekly Prowler container-image and IaC scans (previously only the K8s CIS in-cluster scan was processed). For each scan it shows status counts, severity breakdown, week-over-week delta, and — for the high-volume image/IaC scans — top-N tables grouped by check ID and resource instead of per-finding listings.

View file

@ -1 +0,0 @@
Refresh the contributing tutorial: add `last-reviewed`, include the `.ai.md` changelog fragment type, and clarify that `prek` is pinned via `mise`.

View file

@ -1 +0,0 @@
Reviewed `index.md`; added ringtail to the infrastructure overview and stamped `last-reviewed`.

View file

@ -1 +0,0 @@
Review and refresh the Navidrome reference card: add `last-reviewed`, correct the scanner env var name, document the current image/version, and record routing and runtime details from the manifests.

View file

@ -1 +0,0 @@
Review and refresh the Ollama reference card: add `last-reviewed`, bump the documented image tag to 0.20.4, and add the two `qwen3.5` models now declared in `models.txt`.

View file

@ -1 +0,0 @@
Switch the ringtail provisioning playbook's blumeops clone URL from `forge.eblu.me` (public, via Fly proxy) to `forge.ops.eblu.me` (tailnet, direct via Caddy on indri). Ringtail is always on the tailnet, so the WAN round-trip is pure overhead — it also made `provision-ringtail` brittle whenever the Fly proxy was slow or down.

View file

@ -1 +0,0 @@
Cap systemd-coredump on ringtail (ProcessSizeMax/ExternalSizeMax 1G, MaxUse 2G) so multi-GB Wine/Proton game crash dumps no longer thrash the disk and lock up the desktop.

View file

@ -1,4 +0,0 @@
Update the ringtail NixOS flake lockfile (`nixos/ringtail/flake.lock`): bump
`nixpkgs` (b77b3de → 25f5383) and `disko` (5ba0c95 → 115e521) to latest.
`nixpkgs-services` was intentionally left pinned (skipped by the
`flake-update` pipeline). Routine recurring maintenance per [[manage-lockfile]].

View file

@ -1,4 +0,0 @@
Add GE-Proton (`pkgs.proton-ge-bin`) to `programs.steam.extraCompatPackages`
on ringtail. Subnautica 2 hangs at Mercuna plugin init under Proton
Experimental + DXVK D3D12; GE-Proton is available as a Steam per-game
compatibility option to work around it.

View file

@ -1,6 +0,0 @@
Add `sn2-prelaunch` Steam launch wrapper on ringtail that removes
Subnautica 2's stale `Saved/running.dat` and `Saved/beforelobby.dat`
lockfiles before each launch. SN2 pops up an invisible (0×0-sized)
Error dialog when it detects an unclean exit, blocking GameThread
forever; this is observable only as a black screen with a spinning
loader. Use via Steam launch option: `sn2-prelaunch %command%`.

View file

@ -1,3 +0,0 @@
Fixed sway keybindings on ringtail — the home-manager `keybindings` block was replacing the module's defaults entirely, leaving only explicit overrides (no workspace switching, focus, move, splits, resize mode, etc). Switched to `lib.mkOptionDefault` with `lib.mkForce` on the conflicting custom binds (`Mod+Return`, `Mod+d`, `Mod+space`, `Mod+l`) so defaults merge back in. Also added `Mod+F1` to show a filterable fuzzel list of current keybindings.
Fixed fuzzel config errors on launch — `border-radius` and `border-width` were under `[main]`, but fuzzel expects them as `radius`/`width` under a `[border]` section.

View file

@ -1 +0,0 @@
Disabled adaptive sync (VRR) on ringtail's DP-1 output. The OMEN 27i IPS panel pumps brightness when its refresh rate swings into the low VRR range during low-framerate content (e.g. game cutscenes), producing a flicker that worsened over a session until a reboot. Pinning the panel to a fixed 165Hz eliminates it.

View file

@ -1 +0,0 @@
rotate-fly-deploy-token: combine mint+store into one command with both fish and bash forms; document the `op item edit` "Password item requires ps value" validator gotcha and the placeholder-password workaround.

View file

@ -1 +0,0 @@
runner-logs now authenticates with Forgejo API token and auto-detects the repo from git remote. Job logs are fetched via SSH to indri (reading Forgejo's on-disk zstd log files) instead of the web endpoint, which doesn't support token auth for private repos.

View file

@ -1 +0,0 @@
`mise run runner-logs <run> -j <n>` now reports a clear error when the log file doesn't exist on indri (e.g. a runner crash that left `action_task.log_in_storage = 0`). Previously it printed only the header and exited 0, because `zstdcat` exits 0 with a "can't stat … -- ignored" stderr message and ssh+fish on indri swallows the remote exit code.

View file

@ -1 +0,0 @@
Deploy shower v1.1.1 to ringtail (kustomize newTag bump).

View file

@ -1 +0,0 @@
Pin shower v1.1.1 FOD outputHash (probed locally on ringtail).

View file

@ -1 +0,0 @@
Bump shower container to v1.1.1 (probe FOD hash).

View file

@ -1 +0,0 @@
Deployed shower v1.1.3 to ringtail (image built and pushed from ringtail; runner bypassed due to indri overload).

View file

@ -1 +0,0 @@
Bumped shower app to v1.1.3 (wheel/sdist + FOD hashes probed on ringtail).

View file

@ -1,5 +0,0 @@
Rebuild shower from the post-merge commit on main so the container's
SHA tag points at a commit that will still exist after the 30-day
branch-cleanup window. Functionally identical to the branch-tag image
already deployed, just preserves source traceability per
[[build-container-image#Squash-merge and container tags]].

View file

@ -1,6 +0,0 @@
Rebuild shower v1.1.0 container from main HEAD (`3c7967e`) and bump the
kustomization tag to `v1.1.0-3c7967e-nix`. The PR was squash-merged, so
the branch commit `444ff91` baked into the prior tag isn't reachable
from main's history. The new tag points at a commit that exists on
main; image content is byte-identical because the FOD output is content
addressed and the inputs didn't change.

View file

@ -1 +0,0 @@
Rebuild shower v1.1.2 from main HEAD (a33fa47) and retag — PR #358 was squash-merged so the branch SHA baked into the prior image tag isn't reachable from main. FOD is content-addressed, so image bytes are identical; only provenance changes.

View file

@ -1 +0,0 @@
Update `tailscale-operator-ringtail` ProxyClass to reference the `0108b68` main-SHA build of the tailscale container. Routine post-merge cleanup so the deployed image traces to a commit that survives PR branch cleanup.

View file

@ -0,0 +1 @@
Fixed the `tailscale-operator` and `tailscale-operator-ringtail` ArgoCD apps showing `Unknown` sync status. Their shared base kustomization fetched the upstream operator manifest from the public `forge.eblu.me/mirrors/...`, which the AI-scraper mitigation now black-holes (403). Pointed the remote resource at the tailnet host `forge.ops.eblu.me` instead, which the in-cluster repo-server can reach.

View file

@ -1 +0,0 @@
Reviewed transmission card: corrected storage layout (`/config/` is emptyDir, watch dir disabled) and noted the Prometheus exporter sidecar.

View file

@ -1 +0,0 @@
Rebuild unpoller container from squashed main commit so the image SHA tag matches a commit in main's history (was tagged with the pre-squash branch SHA).

View file

@ -1 +0,0 @@
Bump paperless and immich kustomizations to the main-SHA-built valkey tag (`v8.1.6-r0-fabca04`). Routine post-merge follow-up to keep production manifests pointing at images built from a commit on main.

View file

@ -1 +0,0 @@
Rebuild valkey container from squashed main commit (both arm64 dagger and amd64 nix variants), and update paperless + immich-ringtail kustomizations to the main-SHA tags `v8.1.7-ecded30` and `v8.1.7-ecded30-nix`.

View file

@ -1,8 +0,0 @@
Fix three follow-ups from the wave-1 decommission: grant the local
break-glass `admin` account ArgoCD admin rights (`g, admin, role:admin`
previously only the Authentik `admins` group had access, so admin was
locked out whenever its token expired), and repoint the alloy blackbox
probe for teslamate from the deleted minikube service to
`https://tesla.ops.eblu.me/` (through Caddy over Tailscale). The orphaned
paperless/teslamate roles + ExternalSecrets left on the minikube
blumeops-pg are also cleaned up.

View file

@ -1 +0,0 @@
Fixed the `op item edit` invocation in the [[zot]] API-key rotation procedure: the previous `pbpaste | op item edit ... "field[password]=-"` stdin syntax is rejected by op 2.34 as "invalid JSON" (recent op versions treat piped input as a full JSON template, not a single field value). Procedure now reads the clipboard into a local fish variable and passes it as an inline assignment.

View file

@ -1 +0,0 @@
Upgraded zot on indri from v2.1.15 to v2.1.16 (security fixes: TLS verification on metrics client, CORS Allow-Credentials suppression on wildcard origins, manifest/API-key body size limits).

View file

@ -1,5 +0,0 @@
Upgrade Grafana Alloy v1.14.0 → v1.16.0 across all four service deployments
(alloy-k8s, alloy-ringtail, alloy-tracing-ringtail on k8s; alloy native on
indri). Pulls in stable database observability (v1.15) and the OTel Collector
v0.147.0 bump. Container build also migrated from Dockerfile to native Dagger
`container.py` per the build-container-image migration playbook.

View file

@ -1,8 +0,0 @@
Wire the ringtail `blumeops-pg` cluster (which holds the wave-1-migrated
paperless + teslamate databases) into backups and Grafana. Adds a Tailscale
LoadBalancer Service (`blumeops-pg-ringtail.tail8d86e.ts.net`) and a Caddy L4
route (`pg.ops.eblu.me:5434`), then repoints borgmatic's `teslamate` +
`paperless` postgres dumps and the `mealie` SQLite dump at ringtail, and the
Grafana TeslaMate datasource at the ringtail DB. Closes the backup gap that
opened at cutover (the migrated live data was still being backed up from the
now-frozen minikube copies) and unblocks the wave-1 decommission.

View file

@ -1 +0,0 @@
Removed the dead minikube manifests, container builds, and tooling shims left behind after the cv + docs migration to indri-native (#342). Deletes `argocd/{apps,manifests}/{cv,docs}/`, `containers/{cv,quartz}/`, and the `quartz``docs` mapping in `mise-tasks/container-version-check`. Bumps `docs.current-version` to `v1.16.0` (the blumeops release tag) now that the legacy nginx-base version pin is gone.

View file

@ -1 +0,0 @@
Upgraded Dagger from v0.20.1 to v0.20.6 (engine, CLI pin, and SDK regen) and migrated `runner-job-image` from a Debian-based Dockerfile to a native Dagger `container.py` on Alpine 3.23, reusing the shared `alpine_runtime` helper.

View file

@ -1,8 +0,0 @@
Decommission the wave-1 services on minikube-indri now that paperless,
teslamate, and mealie run on ringtail with their data backed up. Removes the
minikube `paperless`/`teslamate`/`mealie` manifest dirs + ArgoCD app
definitions (pruning the parked Deployments, Services, and the redundant
minikube mealie/paperless PVCs), and drops the `paperless`/`teslamate` roles
from the minikube `blumeops-pg` cluster. The `paperless` and `teslamate`
databases are dropped from indri's blumeops-pg as the finalization step.
miniflux + authentik remain on the minikube cluster (later waves).

View file

@ -1 +0,0 @@
Reviewed `replicating-blumeops` tutorial: fixed "BluemeOps" typos (also in `contributing.md`) and added `last-reviewed` frontmatter.

View file

@ -0,0 +1 @@
Completed the external-secrets localization for the ringtail (amd64) cluster. The indri Dagger build (`container.py`) only produces an arm64 image; added `containers/external-secrets/default.nix` to build the amd64 variant on ringtail's nix-container-builder, and gave `external-secrets-ringtail` a thin kustomize overlay that reuses the shared manifest and points at the `-nix` image. Both clusters now run the locally-built external-secrets binary on their native architecture.

View file

@ -1,14 +0,0 @@
Fix nightly borgmatic backups failing for 2 days. The shower SQLite
dump hook referenced `kubectl --context=k3s-ringtail`, but indri's
kubeconfig deliberately doesn't carry the ringtail credentials. The
`before_backup` hook's failure aborted the entire run, taking out
*both* the local sifaka repo and the BorgBase offsite. Replaced
the inline-shell dump with a `~/bin/borgmatic-k8s-sqlite-dump`
helper deployed by the ansible role. Each dump entry now declares a
`target` of either `local:<context>` (mealie — kubectl uses indri's
kubeconfig) or `ssh:<user@host>` (shower — ssh into ringtail and
run `k3s kubectl` there, no indri-side kubeconfig needed; k3s.yaml
on ringtail is mode 644 so no sudo required). Bytes stream back via
`kubectl exec ... -- cat` rather than `kubectl cp`, since `kubectl
cp` requires `tar` inside the pod and nix-built images like shower
don't bundle it.

View file

@ -1 +0,0 @@
Upgraded the k8s Forgejo runner to the v12.8 line, switched it from first-boot registration to declarative `server.connections` credentials from 1Password, and consolidated the supporting runner how-to documentation.

View file

@ -0,0 +1 @@
Added the [[hephaestus]] (`heph`) sync hub to indri as a self-updating LaunchAgent managed by Ansible (`ansible/roles/heph`, tag `heph`). The hub runs `hephd --mode server` behind `heph.ops.eblu.me` (Caddy TLS), with self-update on a 10-minute interval and the heph-pwa mobile shell served from `--web-root`. Access is gated by a new Authentik device-code (RFC 8628) OIDC application. Indri is now the canonical hub; other devices (e.g. gilbert) attach as offline-capable spokes. The hub's store was seeded from gilbert via the data-safe Path A bring-up (copy store, reset `meta.origin`).

View file

@ -0,0 +1 @@
Granted the `offline_access` scope on the Authentik `heph` OAuth2 provider so hephaestus spokes receive a durable 30-day refresh token. Previously the refresh token was session-bound, so spoke sync would silently fail with a `400 Bad Request` on the `refresh_token` grant once the Authentik session lapsed.

View file

@ -0,0 +1 @@
Registered the heph-pwa redirect URIs (`https://heph.ops.eblu.me/`, plus `http://localhost:8787/` for dev) on the Authentik `heph` OAuth2 provider, enabling the PWA's new Authorization Code + PKCE "Login with Authentik" flow (and the token-endpoint CORS it needs). Pairs with hephaestus PR #9.

View file

@ -1,8 +0,0 @@
Migrated homepage dashboard from minikube (indri/arm64) to k3s (ringtail/amd64).
The container is now built via nix (`containers/homepage/default.nix`), adapted
from nixpkgs `homepage-dashboard` with the upstream Next.js cache patches and
wrapped with `dockerTools.buildLayeredImage`. Autodiscovery shifts: services on
minikube (ArgoCD, Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus,
Navidrome, Paperless, TeslaMate, Transmission) become explicit static entries
in `services.yaml`; ringtail services (Authentik, Frigate/NVR, Ntfy, Ollama)
auto-populate via Ingress annotations.

View file

@ -0,0 +1 @@
Localized the external-secrets controller image. It now builds from the forge mirror via a native Dagger `container.py` (single `all_providers` static Go binary, faithful to upstream's `make build`) and is served from `registry.ops.eblu.me/blumeops/external-secrets` instead of `ghcr.io`, bringing another platform component under local supply-chain control.

View file

@ -1 +0,0 @@
Migrated CV (`cv.eblu.me`) and Docs (`docs.eblu.me`) from minikube Deployments to indri-native ansible roles. Caddy now serves the extracted release tarballs directly via a new `kind: static` service-block in the Caddy template — no daemon, no container — replacing the prior nginx-in-a-pod layer. Removes a network hop on every request and shrinks minikube's footprint. See [[cv-on-indri]] and [[docs-on-indri]]. Part of the broader minikube wind-down.

View file

@ -1 +0,0 @@
Migrated devpi (PyPI mirror at `pypi.ops.eblu.me`) from a minikube StatefulSet to a launchd-managed service on indri. devpi-server now runs in a uv-managed venv with pinned `devpi-server` and `devpi-web` versions, listens on `127.0.0.1:3141`, and is fronted by Caddy. The minikube StatefulSet was crash-looping under memory pressure (and breaking the Python toolchain everywhere); the new layout removes a layer of dependency on cluster health for critical-path tooling. See [[devpi-on-indri]].

View file

@ -1,13 +0,0 @@
Move the entire Immich stack — server, machine-learning, valkey,
and the PostgreSQL+VectorChord cluster — off `minikube-indri` and
onto `k3s-ringtail`. Postgres data migrated zero-loss via CNPG
`pg_basebackup` (replica catch-up then promote); row counts on
`asset`, `user`, `album`, `smart_search`, `activity`, `asset_face`
verified equal between source and replica before cutover. The ML
pod now uses ringtail's RTX 4080 via the nvidia-device-plugin
(time-slicing bumped 2 → 4 to share with frigate + ollama). Caddy
routing at `photos.ops.eblu.me` is unchanged (still
`photos.tail8d86e.ts.net`, the device just lives on ringtail now).
Borgmatic backups continue against the same `immich-pg` tailnet
hostname. First concrete chain in the broader indri-k8s
decommission effort.

View file

@ -1,13 +0,0 @@
Move paperless, teslamate, and mealie off `minikube-indri` onto
`k3s-ringtail`, shedding ~1.1 GiB of resident load from the
OOM-thrashing 8 GiB minikube node (the kernel OOM killer had been
killing `kube-apiserver`/`dockerd`/argocd, flapping every
minikube-hosted service at once). paperless + teslamate databases
move into a fresh CNPG `blumeops-pg` cluster on ringtail via a cold
`pg_dump`/`pg_restore` from the quiesced source — row counts verified
equal before any routing flip; source DBs dropped only after the
ringtail side serves traffic. mealie's SQLite PVC is copied as-is.
paperless media stays on sifaka NFS. Downtime-tolerant cold cutover
(no streaming replication); rollback is repoint-and-scale-up with the
source untouched. Second chain in the indri-k8s decommission after
[[migrate-immich-to-ringtail]].

View file

@ -1 +0,0 @@
Add local nix container build for `tailscale` (`containers/tailscale/default.nix`) so ringtail's tailscale-operator ProxyClass proxy pods pull from the forge mirror instead of `docker.io/tailscale/tailscale`. Pinned at v1.94.2 to match `service-versions.yaml`. Indri's tailscale-operator continues to use upstream during the k8s-to-ringtail migration.

View file

@ -1 +0,0 @@
Address the 6 critical Prowler IaC findings against `argocd/manifests/`. Prowler's IaC provider hardcodes `self._mutelist = None` and delegates filtering to Trivy, but doesn't plumb `--ignorefile` through — so the documented "use Trivy filtering" path is actually broken. Added a shim around `trivy` in the Prowler image that injects `--ignorefile $TRIVY_IGNOREFILE` for `trivy fs` invocations when the env var points at a real file. The IaC cronjob now mounts `mutelist/trivyignore.yaml` (Trivy's per-path schema) and sets the env var, muting the `external-secrets` and `kube-state-metrics` Secret-access findings (KSV-0041, KSV-0114). Separately, `grafana-clusterrole` is tightened to remove `secrets` access entirely: the dashboard sidecar already only consumes ConfigMap-labeled dashboards, so its `RESOURCE` env var is now `configmap` instead of `both`.

View file

@ -1 +0,0 @@
Reviewed [[indri]] reference card: added `devpi`, `cv`, and `docs` to the native-services list; widened the k8s note to reflect the growing set of apps now on ringtail and the planned indri-minikube decommission; added CPU/RAM specs.

View file

@ -1,4 +0,0 @@
Recurring maintenance batch:
- Ringtail flake inputs refreshed (`disko`, `home-manager`, `nixpkgs`).
- Tooling deps bumped: prek hooks (trufflehog v3.95.3, kingfisher v1.101.0, ruff v0.15.14, `ansible-core` 2.21.0); fly proxy base images (nginx 1.30.1-alpine, alloy v1.16.1); `typer==0.26.2` in mise tasks.

View file

@ -1 +0,0 @@
Updated `nixos/ringtail/flake.lock` (weekly cadence): `disko`, `home-manager`, and `nixpkgs` inputs refreshed. `nixpkgs-services` skipped per overlay convention.

View file

@ -0,0 +1 @@
Reviewed four never-reviewed reference cards (`cluster`, `ntfy`, `tempo`, `alloy`) and corrected drift: minikube is now Kubernetes v1.35.0; ntfy, tempo, and alloy-k8s images are now locally-built `registry.ops.eblu.me/blumeops/*` nix containers (v2.19.2, v2.10.3, v1.16.0) rather than upstream Docker Hub; the Fly.io alloy binary is v1.16.1; and the ringtail workload list reflects the in-progress minikube→k3s migration.

View file

@ -0,0 +1 @@
Upgraded the nvidia-device-plugin on ringtail from v0.19.0 to v0.19.2 (upstream patch release: CDI/Tegra fixes and dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup).

View file

@ -1 +0,0 @@
Pin ringtail's wired IP to `192.168.1.21` via NixOS scripted networking; NetworkManager no longer manages `enp5s0`. Removes DHCP lease renewal as a failure mode after a silent lease teardown took ringtail offline. Also explicitly enables `net.ipv4.ip_forward` (previously set implicitly by scripted-DHCP) so k3s pod networking and Tailscale routing continue to work with static networking.

View file

@ -1 +0,0 @@
Ripped out the compensating-controls (CC) framework: deleted `compensating-controls.yaml`, the `review-compensating-controls` mise task, and the associated how-to / explanation docs. Prowler and Kingfisher continue to run weekly and produce reports; the Prowler mutelist YAML files remain in place but no longer carry `CC: <id>` prefixes — each entry just keeps a free-form `Description` of why the finding is muted. The CC review cadence proved to be more overhead than this single-operator homelab needed.

Some files were not shown because too many files have changed in this diff Show more