Compare commits

...
Sign in to create a new pull request.

125 commits

Author SHA1 Message Date
bc34b601be Merge pull request 'heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)' (#371) from heph-offline-access into main 2026-06-06 18:29:47 -07:00
50a36ff93a heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)
The heph CLI requests scope "openid offline_access", but the Authentik
heph OAuth2 provider only mapped openid/email/profile. Without the
offline_access mapping the issued refresh token is bound to the login
session rather than the 30-day refresh-token window; once the session
lapses, hephd's refresh_token grant returns 400 Bad Request and spoke
sync silently degrades (heph sync --status -> auth_failure: true).

Add the built-in offline_access scope mapping to the provider's
property_mappings and document the requirement in the service reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:07:13 -07:00
cf63fcb5b5 C0: track heph in service-versions (self-updating; note drift task)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 08:22:46 -07:00
3abe80523a C0: bump indri heph hub to v1.2.1 (PWA Authentik login + /config)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:40:51 -07:00
6576880b0e heph Authentik: register heph-pwa redirect URIs (PKCE login) (#370)
Adds the heph-pwa redirect URIs to the Authentik `heph` OAuth2 provider so the new browser **Login with Authentik** flow (Authorization Code + PKCE, hephaestus PR #9) can redirect back and exchange the code:

- `https://heph.ops.eblu.me/` (the PWA origin)
- `http://localhost:8787/` (local dev: `hephd --web-root`)

Authentik also keys token-endpoint CORS off these origins, so they're required for the browser token exchange. Additive (the provider was `redirect_uris: []`); harmless until the PWA feature deploys.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #370
2026-06-05 07:30:31 -07:00
a2f1e06224 Add hephaestus sync hub to indri (launchagent, PWA, device-code OIDC) (#369)
Makes indri the canonical **heph** hub for the hub-and-spoke task/context system, deployed as a self-updating LaunchAgent managed by Ansible. Other devices (gilbert) attach as offline-capable spokes.

## What's here
- **`ansible/roles/heph`** (tag `heph`) — bootstrap `cargo install hephd` (only if absent; `--self-update` keeps it current after), version-pinned `heph-pwa` checkout served via `--web-root`, launchagent `mcquack.eblume.heph`:
  ```
  hephd --mode server --http-addr 0.0.0.0:8787 --db … --web-root …
        --oidc-issuer …/o/heph/ --oidc-audience heph
        --self-update --self-update-interval-secs 600
  ```
  `~/.cargo/bin` is on the agent `PATH` so self-update's `cargo install` works.
- **Caddy** — `heph.ops.eblu.me → localhost:8787` (TLS for the PWA secure context).
- **Authentik** — new `heph` **public device-code** OIDC app + `default-device-code-flow` bound to the default brand's `flow_device_code` (verified live: brand `authentik-default`, field currently unset → additive).
- **Docs** — `services/hephaestus.md` (Path-A seeding runbook + spoke caveat), `indri.md`, changelog fragment.

## Three features requested
- **Autoupdate** — 10-min interval (`--self-update-interval-secs 600`).
- **PWA** — `--web-root` (confirmed shipped in v1.2.0).
- **Spoke** — gilbert reconfig documented (post-merge step).

## Deploy plan (not done yet — awaiting review)
1. Seed from gilbert (Path A): `heph daemon stop` → copy `heph.db` → `DELETE FROM meta WHERE key='origin'`.
2. Sync Authentik `apps`/blueprint; verify blueprint status via API (not just logs).
3. `provision-indri --tags heph,caddy` from this branch.
4. Point gilbert at the hub + `heph auth login`.

## Known follow-ups (heph-side, tracked in the Hephaestus project)
- `heph daemon` can't bake hub/spoke config or pass `--self-update-interval-secs` → worked around by the ansible plist.
- Path-A seeding lacks a clean `hephd --owner-id`/seed command → manual `meta.origin` reset for now.
- Self-update moves hephd ahead of the ansible-pinned PWA shell over time (drift; tolerated by the SW cache, revisit on next release).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #369
2026-06-05 06:46:58 -07:00
f6c926f1f5 C0: rebuild external-secrets off main, repoint both clusters to stable tags
indri -> v2.2.0-13895bb (arm64), ringtail -> v2.2.0-13895bb-nix (amd64).
Both deployed images now trace to main commit 13895bb instead of earlier
branch builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:19:20 -07:00
13895bb04a Localize external-secrets on ringtail (amd64 nix build) (#368)
Follow-up to #367. That PR localized external-secrets but the Dagger build (on indri's Apple Silicon runner) only produces an **arm64** image — and external-secrets also runs on **ringtail (amd64)** via the same shared manifest. This completes the localization so both clusters run the local binary on their native arch.

## Approach (matches the kube-state-metrics dual-build pattern)
- **`containers/external-secrets/default.nix`** (new) — builds the **amd64** image on ringtail's nix-container-builder. `buildGoModule` with Go 1.26 (v2.2.0 requires ≥1.26.1; nixpkgs default is 1.25.x) and `-tags all_providers`, faithful to upstream. Same v2.2.0 source from the forge mirror.
- **`argocd/manifests/external-secrets-ringtail/`** (new) — thin kustomize overlay that reuses the shared indri manifest as a base and overrides **only** the image to the `-nix` (amd64) tag. No manifest duplication.
- **`argocd/apps/external-secrets-ringtail.yaml`** — repointed at the new overlay.

Result: indri → `v2.2.0-…` (arm64, Dagger), ringtail → `v2.2.0-…-nix` (amd64, nix).

## Build
Run #581 built both arches at the branch commit. Verified the nix image is `linux/amd64`, entrypoint = the binary, user 65534.

## Deployed from branch & verified on ringtail (k3s, amd64)
- All 3 pods rolled to the nix amd64 image, `1/1 Running` (no exec-format error → arch correct)
- Controller logs clean
- **Live secret fetch proven:** force-synced `homepage/homepage-grafana` → `refreshTime` advanced, `Ready=True`
- **All 20** ringtail ExternalSecrets remain `SecretSynced=True`

## Post-merge
The `external-secrets-ringtail` app is temporarily pointed at this branch + overlay path (apps app left on `main`, manual-sync, untouched). After merge:
```
argocd app sync apps                       # picks up the new Application path on main
argocd app set external-secrets-ringtail --revision main && argocd app sync external-secrets-ringtail
```
I'll also rebuild off `main` so both clusters land on stable main-sha tags (as done for indri in #367).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #368
2026-06-04 15:37:42 -07:00
30c82079b9 C0: rebuild external-secrets image off main (v2.2.0-0e70a1b)
Repoint to the main-branch-built image so the deployed tag traces to a main
commit rather than the merged feature branch. Same v2.2.0 source, stable
provenance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:59:17 -07:00
0e70a1b524 Localize external-secrets container (native container.py build) (#367)
Knocks out the weekly "pick one non-local container and make it local" task by moving **external-secrets** off `ghcr.io` onto a locally-built image, under our own supply-chain control. Doubles as its overdue service review.

## What changed
- **`containers/external-secrets/container.py`** (new) — native Dagger build (the Dockerfile→container.py migration pattern). Clones the forge mirror at `v2.2.0` and builds the single `all_providers` static Go binary, faithful to upstream's `make build` (CGO off, no version ldflags upstream). ENTRYPOINT is `/bin/external-secrets` so the controller/webhook/cert-controller Deployments select their role via `args:` exactly as before.
- **`argocd/manifests/external-secrets/kustomization.yaml`** — image swapped to `registry.ops.eblu.me/blumeops/external-secrets:v2.2.0-2985007`. **Like-for-like (v2.2.0)**, not an upgrade.
- **`service-versions.yaml`** — marked reviewed (2026-06-04), noted the local build.

## Build
Built on the indri forge runner (run #579, ~4 min) → pushed to Zot. Image config verified: `Entrypoint=/bin/external-secrets`, `User=65534`, version label `v2.2.0`.

## Deployed from branch & verified
- All 3 pods (controller / webhook / cert-controller) rolled to the local image, `1/1 Running`
- Controller + webhook logs clean (no errors; webhook serving TLS)
- **End-to-end secret fetch proven:** force-synced `monitoring/grafana-admin` → `refreshTime` advanced to now, `Ready=True`
- All 10 ExternalSecrets cluster-wide remain `SecretSynced=True` — no collateral damage
- App `Healthy`

## Post-merge
`external-secrets` currently points at this branch (so `apps` reads OutOfSync — expected). After merge:
```
argocd app set external-secrets --revision main && argocd app sync external-secrets
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #367
2026-06-04 14:55:55 -07:00
bb55fa9566 Recurring review sweep: 4 doc cards + nvidia-device-plugin v0.19.2 (#366)
Knocks out the two daily recurring review tasks (doc review + service review) in one PR.

## Doc review (4 never-reviewed reference cards, `last-reviewed: 2026-06-04`)
- **cluster.md** — Kubernetes version v1.34.0 → **v1.35.0**; refreshed the stale ringtail workload list and noted the in-progress minikube→k3s migration (points to `[[ringtail]]` as the canonical list).
- **ntfy.md / tempo.md / alloy.md** — corrected image references: these are now **locally-built `registry.ops.eblu.me/blumeops/*` nix containers** (ntfy v2.19.2, tempo v2.10.3, alloy-k8s v1.16.0), not upstream Docker Hub. Fly.io alloy binary bumped to v1.16.1.

## Service review
- **nvidia-device-plugin** (ringtail GPU): v0.19.0 → **v0.19.2**. Upstream patch releases — CDI/Tegra fixes + dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup (the service-account change in the notes is helm-only).

## Not in this PR (need container rebuilds, deferred)
The other stale services are locally-built nix images, so upgrading them is a forge-runner rebuild rather than a clean tag bump — left untouched (not date-bumped, so they resurface): **prometheus** (v3.10.0→v3.12.0), **loki** (3.6.7→3.7.2), **kube-state-metrics**, **homepage**. Happy to do these as a follow-up rebuild PR.

## Deploy / verify
Not yet deployed — `nvidia-device-plugin` still points at `main`. After review:
```
argocd app set nvidia-device-plugin --revision reviews-jun4 && argocd app sync nvidia-device-plugin
# after merge:
argocd app set nvidia-device-plugin --revision main && argocd app sync nvidia-device-plugin
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #366
2026-06-04 13:37:02 -07:00
02ea1cc72a C0: point tailscale-operator base mirror fetch at tailnet forge
The public forge.eblu.me now black-holes /mirrors/ at the Fly edge
(AI-scraper mitigation), so the in-cluster ArgoCD repo-server got a 403
fetching the upstream operator manifest — leaving tailscale-operator and
tailscale-operator-ringtail in Unknown sync. Use forge.ops.eblu.me.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 12:40:21 -07:00
Forgejo Actions
8f72f04d5c Update docs release to v1.17.0
- Built changelog from towncrier fragments

[skip ci]
2026-06-03 21:52:22 -07:00
29e0f012cd C0: pin Quartz docs build to v4.5.2 (v5.0.0 broke build)
The Dagger build_docs pipeline cloned Quartz from the default branch
unpinned. Quartz v5.0.0 restructured its config layout (.quartz/plugins,
../quartz imports), breaking the docs build against our existing
quartz.config.ts / quartz.layout.ts. Pin the clone to the last v4
release (v4.5.2) to restore known-good behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 21:39:41 -07:00
2148714584 C0: retire Todoist blumeops-tasks; point task discovery at heph
Replace the Todoist-backed blumeops-tasks mise task with
`heph list --project Blumeops --json` (hephaestus, now at v1 prototype
on gilbert). Update task-discovery, rotation-reminder, and zk
references across docs; note the zk zettelkasten is migrating into
heph docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 21:32:10 -07:00
308c8e3dad C0: drop duplicate Homepage static entries for ringtail-migrated services
Mealie, Paperless, Immich, TeslaMate are now autodiscovered from their ringtail
Ingress gethomepage.dev annotations; the static services.yaml entries (from when
they were on minikube, which homepage-on-ringtail can't autodiscover) were
duplicating them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 15:31:59 -07:00
eaa899cfc6 C0: wave-1 decommission follow-ups (argocd admin RBAC, teslamate probe)
- argocd: grant local break-glass admin the admin role (g, admin, role:admin);
  previously only the Authentik admins group had access, locking out admin
  once its token expired (policy.default is unset).
- alloy-k8s: repoint the teslamate blackbox probe from the deleted minikube
  service to https://tesla.ops.eblu.me/ (Caddy over Tailscale), like immich.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:02:05 -07:00
46f0002178 Decommission wave-1 minikube services (paperless, teslamate, mealie) (#365)
Final step of the wave-1 indri-k8s migration. paperless, teslamate, mealie run on ringtail with data migrated, verified, and backed up (local + BorgBase offsite via PR #364).

- Remove minikube paperless/teslamate/mealie manifest dirs + ArgoCD app defs (prunes the parked Deployments/Services + redundant minikube mealie/paperless PVCs)
- Drop paperless/teslamate roles + ExternalSecrets from the minikube blumeops-pg cluster
- miniflux + authentik stay on minikube (later waves)

Finalization after merge: sync apps + databases to prune, then DROP DATABASE paperless/teslamate on indri's blumeops-pg (fresh safety dump taken first).

Reviewed-on: #365
2026-06-03 12:36:06 -07:00
44798a6429 C0: mealie-ringtail image rebuilt from main (e0057b4-nix)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 12:26:55 -07:00
e0057b46e4 Wire ringtail blumeops-pg into backups + Grafana (#364)
Prereq for the wave-1 decommission. The cutover moved paperless+teslamate (postgres) and mealie (SQLite) to ringtail, but borgmatic and the Grafana TeslaMate datasource still pointed at the minikube copies — the migrated live data was unbacked since cutover, and dropping the minikube DBs would break the TeslaMate dashboards.

- Tailscale Service `blumeops-pg-ringtail` + Caddy L4 route `pg.ops.eblu.me:5434`
- borgmatic: teslamate + paperless postgres → :5434; mealie SQLite → ssh:eblume@ringtail
- Grafana TeslaMate datasource → pg.ops.eblu.me:5434

Deploy: sync databases-ringtail (tailscale svc) + grafana from branch; provision-indri --tags caddy,borgmatic; verify a backup run + dashboards. Unblocks the decommission PR.
Reviewed-on: #364
2026-06-03 12:25:30 -07:00
92b54e7ba9 C0: ringtail wave-1 images rebuilt from main (fcac8e5-nix tags)
Post-merge rebuild of paperless/mealie/teslamate Nix images at the main
merge commit, replacing the feature-branch -nix tags. Image content is
identical; only the commit-sha suffix changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:36:15 -07:00
fcac8e5a72 Wave 1 indri→ringtail migration: paperless, teslamate, mealie (#363)
Migrate paperless, teslamate, and mealie off the OOM-saturated minikube-indri node onto ringtail k3s, shedding ~1.1 GiB of resident load. Second chain in the indri-k8s decommission after immich.

**Containers ported to Nix (default.nix), build-verified on ringtail:**
- paperless → wraps nixpkgs paperless-ngx 2.20.15 (pinned unstable); runs as web/worker/beat/consumer
- mealie → wraps nixpkgs mealie 3.16.0 (forward 4-minor bump, breaking-change reviewed); single gunicorn, SQLite
- teslamate → from-scratch beamPackages mixRelease (not in nixpkgs); erlang_27+elixir_1_18, npm assets, ex_cldr locales pre-fetched

**Data:** cold downtime-tolerant cutover. paperless+teslamate postgres dump/restore from quiesced source into a new ringtail blumeops-pg CNPG cluster; mealie SQLite PVC copied. Source DBs untouched until verified (rollback = repoint).

**Also:** ringtail blumeops-pg cluster + ExternalSecrets scaffold; fixes pre-existing shower version-check drift.

Runbook: docs/how-to/ringtail/migrate-wave1-ringtail.md. Deploy-from-branch + cutover happens before merge; container images rebuilt from main after merge.
Reviewed-on: #363
2026-06-03 10:34:00 -07:00
40bd929820 C0: remove visible GNU Terry Pratchett from naughty.html body
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 37s
GNU lives in the overhead — the X-Clacks-Overhead header — never on the
visible page. Keep the header, drop the footer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 20:55:05 -07:00
a36a18aaa6 C0: black-hole /mirrors/* at Fly edge + name-and-shame scrapers
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 35s
A $29.60 Fly bill traced to ~1.25 TB/30d egress on forge.eblu.me (99.95% of
all proxy egress), ~71% of it AI scrapers (Meta meta-externalagent, OpenAI
GPTBot, Amazonbot, Bytespider) crawling the public mirror repos' infinite
git-history URL space and timing out Forgejo. robots.txt already disallowed
/mirrors/ but those agents ignore it, so enforce at the edge: return 403 (^~
to beat the regex asset locations), served as a roll-of-dishonour page with an
X-Naughty-Scrapers header. Mirrors stay reachable on the tailnet via
forge.ops.eblu.me. Tier 2 (UA denylist + Anubis) and the Cloudflare rejection
are documented in docs/explanation/ai-scraper-mitigation.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 20:52:20 -07:00
e0064de83d C0: update ringtail flake inputs (nixpkgs, disko)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 15:52:09 -07:00
f588638331 C0: rebuild valkey from squashed main commit
Image tags from PR #362 (v8.1.7-02859c5{,-nix}) referenced a branch
SHA that no longer exists on main after squash-merge. Rebuilt both
the dagger arm64 and nix amd64 variants from the squashed commit
(ecded30) and updated paperless + immich-ringtail to the new tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 14:53:21 -07:00
ecded30073 Make valkey local on ringtail (nix amd64) + bump to 8.1.7 (#362)
## Summary

Weekly "make one non-local container local" pickup: immich-ringtail still pulled `docker.io/valkey/valkey:8.1.6` because the existing `containers/valkey/container.py` build was arm64-only.

- Adds `containers/valkey/default.nix` — nix-built amd64 valkey image, packaged by the ringtail nix-container-builder runner using `pkgs.dockerTools.buildLayeredImage`. Mirrors the existing `containers/authentik-redis/default.nix` pattern.
- `containers/valkey/container.py` keeps building the Alpine arm64 image for paperless on indri. Bumped both builds to upstream valkey 8.1.7 (Alpine 3.22 now ships `8.1.7-r0`; nixpkgs has 8.1.7).
- Splits `VERSION` (upstream app) from `ALPINE_PIN` (apk pin) in `container.py` so both build files can declare the same upstream version and pass `container-version-check`.
- Updates `service-versions.yaml`: current-version 8.1.7, refreshed last-reviewed, upstream-source now points at the canonical valkey-io releases page.
- Switches kustomizations:
  - `immich-ringtail/kustomization.yaml`: `docker.io/valkey/valkey:8.1.6` → `registry.ops.eblu.me/blumeops/valkey:v8.1.7-02859c5-nix`, comment updated.
  - `paperless/kustomization.yaml`: `v8.1.6-r0-fabca04` → `v8.1.7-02859c5`.

## Build

build-container run #563 — both jobs succeeded after a transient runner crash on the first dispatch (#562 build-nix), which surfaced two separate bugs that landed in a separate C0 on main:

- `runner-logs` silently returned 0 with no output when the log file didn't exist on indri
- `ssh indri` swallowing remote exit codes (fish login shell), which the wrapper now works around via a stdout marker

## Test plan

- [ ] `argocd app set immich-ringtail --revision valkey-nix && argocd app sync immich-ringtail`
- [ ] `argocd app set paperless --revision valkey-nix && argocd app sync paperless`
- [ ] Both valkey pods come Ready and start serving on :6379
- [ ] Immich app + paperless can read/write their respective cache
- [ ] After merge: rebuild from squashed main commit + update kustomization tags (squash-tag follow-up)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #362
2026-05-28 14:51:09 -07:00
1ce381cb6e C0: surface missing-log failures in runner-logs
`mise run runner-logs <run> -j <n>` previously silently succeeded with
no output when forgejo had no log for the task. Two layered causes:

1. zstdcat exits 0 even when the file is missing (writes "can't stat
   … -- ignored" to stderr).
2. ssh to indri runs fish, which silently drops the remote exit code so
   the subprocess returncode is always 0.

Probe `test -f` over SSH and parse a stdout marker (EXISTS / MISSING) to
detect the missing-log case, then report it explicitly with the indri
path and a hint about action_task.log_in_storage = 0 so the operator
knows where to look next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 14:36:33 -07:00
e703d25efe C0: rebuild unpoller container from squashed main commit
Image was previously tagged with the unpoller-v3 branch SHA (1b27242),
which doesn't exist in main's history after squash-merge. Rebuilt from
the squashed commit so the tag references a reachable commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 10:10:21 -07:00
4d1f4af25b Upgrade unpoller v2.34.0 → v3.2.0, migrate to container.py (#361)
## Summary

- Service Review pickup: unpoller (last reviewed 73 days ago).
- Upgrades unpoller from v2.34.0 to v3.2.0 (major version bump).
- Migrates the container build from a Dockerfile to a native Dagger pipeline (`containers/unpoller/container.py`) following the navidrome / miniflux pattern.
- Refreshes `service-versions.yaml` (last-reviewed, current-version).

## Breaking changes (upstream)

- **v3.0.0** — UniFi network API shifts (later 10.x). Some metric / event / log names and labels may have changed. Worth a follow-up sweep of the unpoller Grafana dashboard for missing series.
- **v3.2.0** — defaults to a 60s background poll feeding cached Prometheus scrapes (was on-demand poll per scrape). To restore previous behavior, set `interval = 0` in `up.conf`. Leaving the new default in this PR — every-15s scrapes will simply serve from cache, which is fine for our use.

## Build

- Image: `registry.ops.eblu.me/blumeops/unpoller:v3.2.0-1b27242`
- Built by build-container workflow run #559 from this branch.

## Test plan

- [ ] `argocd app set unpoller --revision unpoller-v3 && argocd app sync unpoller`
- [ ] Pod comes Ready
- [ ] Verify metrics exported (`Site/Client/UAP/USG/USW` counts in logs, `unpoller_*` series in Prometheus)
- [ ] Spot-check unpoller Grafana dashboard for missing series after the v3 API shift
- [ ] After merge: `argocd app set unpoller --revision main && argocd app sync unpoller`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #361
2026-05-28 09:59:46 -07:00
f6febb1f77 C0: switch fly proxy deploy strategy to immediate
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 34s
Bluegreen kept timing out — the new green machine couldn't reach
"started" within Fly's 5-minute deploy budget. The cold-start sequence
(tailscaled → tailscale up → wait-for-MagicDNS → nginx startup) eats
most of that, leaving no headroom for healthcheck propagation.

For a single-machine proxy, bluegreen offers little benefit anyway:
no warm second instance, so trading 5-10s of downtime for predictable
completion is the right call.
2026-05-28 07:59:22 -07:00
4e25180b0a C0: clone blumeops via tailnet on ringtail provision
Switch ringtail.yml from forge.eblu.me (Fly proxy, WAN) to
forge.ops.eblu.me (Caddy on indri, tailnet). Ringtail is always
on the tailnet — the WAN round-trip was overhead and made
provision-ringtail fail any time Fly was slow or down.
2026-05-28 07:13:40 -07:00
c00d7db507 Recurring maintenance batch (2026-05-27) (#360)
Some checks failed
Deploy Fly.io Proxy / deploy (push) Failing after 14m10s
Bundle of recurring overdue tasks:

- Ringtail flake update
- Security & compliance report review
- Tooling deps bump (prek, fly, mise, forgejo workflows)
- Top stale doc review
- Top stale service review (if trivial)

Larger items (service version bumps requiring upgrades, non-local container migration) split out as separate PRs.

Reviewed-on: #360
2026-05-28 06:01:57 -07:00
Erich Blume
753fa9cb63 C0: disable VRR on ringtail DP-1 to stop OMEN panel flicker
The OMEN 27i IPS pumps brightness when its refresh swings into the low
VRR range during low-framerate content (game cutscenes), producing a
~20Hz flicker that compounds over a session until a reboot. GPU health
is clean (no Xid/ECC/thermal); pinning fixed 165Hz eliminates it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 12:59:29 -07:00
Erich Blume
c09bd5b612 C0: cap systemd-coredump on ringtail to stop game-crash lockups
Wine/Proton game segfaults (e.g. Diablo IV) produced multi-GB cores that
systemd-coredump spent minutes compressing to disk, pinning the CPU and
freezing the desktop. Cap ProcessSizeMax/ExternalSizeMax at 1G (oversized
cores logged but skipped) and MaxUse at 2G to bound the store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:54:32 -07:00
35ae171783 C0: fix sync button location in manage-forgejo-mirrors
The verify step pointed to the main repo page, but the "Synchronize now"
button is in the Mirror settings section of the settings page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 07:15:07 -07:00
57fd88b269 C0: fix op item edit syntax in zot key rotation
The pbpaste | op item edit ... "field[password]=-" stdin syntax is
rejected by op 2.34 as "invalid JSON" — recent op versions treat
piped input as a full JSON template, not a single field value.
Procedure now uses an inline assignment via a local fish variable.
2026-05-22 21:50:43 -07:00
08a1cb164a C0: fix 1password export filename in backup how-to
1Password's desktop app names exports as
1PasswordExport-<uuid>-<timestamp>.1pux automatically — you can't
choose the name. Procedure now points the task at that glob.
2026-05-22 21:36:13 -07:00
d02bf062af C0: review 1password reference card
Added vault split (blumeops vs Personal), noted onepassword-connect
runs on both indri and ringtail, and lifted op CLI guidance from
agent memory into the card. Bumped last-reviewed.
2026-05-22 21:29:11 -07:00
ee51bcafb4 Rip out compensating-controls framework (#359)
## Summary

Removes the compensating-controls (CC) framework. Prowler and Kingfisher continue to run weekly and produce reports; the Prowler mutelist YAML files stay in place but no longer carry \`CC: <id>\` prefixes — each entry now just keeps a free-form \`Description\` of why it's muted.

The CC review cadence proved to be more process overhead than this single-operator homelab needed.

## What changed

**Deleted**
- \`compensating-controls.yaml\` — the CC registry
- \`mise-tasks/review-compensating-controls\` — the staleness-review task
- \`docs/how-to/operations/review-compensating-controls.md\`
- \`docs/how-to/operations/record-review-evidence.md\` (was aspirational)
- \`docs/explanation/compliance-mute-categories.md\` (proposed-future CC/NA/RA work)
- 5 orphan \`+review-cc-*\` / \`+compliance-mute-categories\` changelog fragments

**Modified**
- 6 mutelist YAML files: stripped \`CC: <id>.\` prefix from every \`Description\` / \`statement\` field, kept the free-form text
- \`mise-tasks/review-compliance-reports\`: removed CC mentions from docstrings, panel text, and the node-verification table title. Node-verification logic itself is unchanged.
- \`docs/reference/operations/security.md\`: removed the "Compensating controls" section
- \`docs/how-to/operations/read-compliance-reports.md\`: rewrote step 3 of "Acting on findings" to point at the mutelist YAML directly
- \`docs/changelog.d/prowler-iac-mutelist.infra.md\`: rewrote to drop the "two new compensating controls" framing

## What did not change

- All Prowler manifests (cronjobs, RBAC, PVs, kustomization) — scans still run on the same schedule
- The Kingfisher deployment
- The trivy-shim in the Prowler container — that's about Trivy ignorefile plumbing, independent of the CC concept
- The mutelist entries themselves — each \`Resources\` list is unchanged; only the prose of \`Description\` was edited
- \`CHANGELOG.md\` — historical releases are left as-is

## Test plan

- [ ] Wait for human review before deploying — once merged, re-point ArgoCD: \`argocd app set prowler --revision main && argocd app sync prowler\` (no manifest changes besides the ConfigMap, so impact is limited to muted-finding descriptions in next week's report)
- [ ] Confirm next weekly Prowler K8s CIS run (Sunday 3am) still completes and produces a report on sifaka
- [ ] Confirm next weekly Prowler IaC run still honors \`trivyignore.yaml\` (the trivy shim is untouched but the ignorefile content was rewritten)
- [ ] \`mise run review-compliance-reports\` — verify node-verification block still runs and prints the renamed table title

Reviewed-on: #359
2026-05-22 21:08:53 -07:00
2fae0f7161 C0: switch grafana deployment to Recreate strategy
Grafana uses an RWO PVC for SQLite + Bleve search index. RollingUpdate
spawns the new pod before terminating the old one, so the new pod
crashloops on the index lock until rollout timeout. Recreate terminates
the old pod first, letting the new pod acquire the lock cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 06:33:26 -07:00
1897eb1c5b C0: move immich blackbox probe to ringtail alloy
Immich migrated to ringtail's k3s cluster but the probe still targeted
the in-cluster service DNS on indri's minikube, firing ServiceProbeFailure
indefinitely. Moved the target into alloy-ringtail's config so the probe
runs in the cluster where immich actually lives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:46:22 -07:00
e222d47d45 C0: deploy shower v1.1.3 (kustomize newTag bump)
Image v1.1.3-3645098-nix was built directly on ringtail and pushed via
skopeo, bypassing the Forgejo runner: indri was severely overloaded
(load avg 24.92, minikube VM at 344% CPU) and the workflow-dispatch
endpoint timed out. The image content is identical to what the runner
would have produced — same default.nix at commit 3645098 (on main),
same NIX_PATH (current nixpkgs flake), same skopeo invocation. Tag
short-sha matches the commit that defines the recipe so we aren't
pinning to a ghost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 20:09:54 -07:00
3645098bf1 C0: bump shower to v1.1.3
Wheel/sdist + FOD hashes probed on ringtail. Full nix-build verified
end-to-end before commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 19:57:37 -07:00
Erich Blume
96dbbb3cbe C0: add sn2-prelaunch wrapper to clear SN2 stale lockfiles
UE5 writes Saved/running.dat as a "session in progress" marker. If
the previous session exited uncleanly (SIGKILL, crash), it lingers,
and SN2 pops up an invisible 0×0 Error dialog at next launch that
the GameThread blocks on forever — visible only as a black screen
with a spinning loader. Wrap the Steam command to clear the marker
files before each launch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:26:10 -07:00
815a0cc6e6 C0: shower — rebuild from main SHA (post-merge retag)
PR #358 was squash-merged so the branch commit b8c7783 baked into the
prior image tag isn't reachable from main's history. Rebuild from main
HEAD (a33fa47) and retag. Image content is byte-identical (FOD is
content-addressed, inputs unchanged); only the SHA in the tag changes
so future provenance tracing stays on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 06:57:24 -07:00
a33fa47b80 C1: deploy shower v1.1.2 (#358)
## Summary

Deploys `adelaide-baby-shower-app` **v1.1.2** to ringtail k3s.

- Bumps `containers/shower/default.nix` `version` to 1.1.2.
- Refreshes sdist + wheel `fetchurl` hashes against the forge PyPI artifacts.
- Re-probed FOD `outputHash` on the nix-container-builder runner (ringtail) and pinned the new closure hash.
- Bumps kustomize `newTag` to `v1.1.2-b8c7783-nix` (built from this branch's tip).
- Bumps `service-versions.yaml` entry for shower to `1.1.2` / `last-reviewed: 2026-05-15`.

## Build provenance

Built by Forgejo Actions run #553 on `nix-container-builder` (ringtail) at commit `b8c7783`. After merge a C0 follow-on will rebuild from main and retag so future provenance points at main history.

## Test plan

- [ ] `argocd app set shower --revision shower-v1.1.2 && argocd app sync shower` deploys cleanly
- [ ] Pod migrates the SQLite PV and serves at `shower.ops.eblu.me` / `shower.eblu.me`
- [ ] No new errors in pod logs after `collectstatic` + gunicorn boot

Reviewed-on: #358
2026-05-15 06:50:46 -07:00
Erich Blume
12314857d8 C0: add GE-Proton to ringtail Steam extraCompatPackages
Lets Subnautica 2 (and any other game) opt into the GE-Proton
build via Steam's per-game compatibility tool override, as a
workaround for the Proton Experimental + DXVK D3D12 Mercuna hang.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 06:27:43 -07:00
4d2bc9975f C0: deploy shower v1.1.1 (kustomize newTag bump)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 20:51:10 -07:00
4e117dc921 C0: pin shower v1.1.1 FOD outputHash (probed on ringtail)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 20:40:22 -07:00
6e90c4c363 C0: bump shower to v1.1.1 (probe FOD hash)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 20:12:00 -07:00
dc69b8c68b C1: fix borgmatic shower SQLite dump (ssh to ringtail) (#357)
## Summary

Nightly borgmatic backups have been failing for 2 days. Root cause: the
shower SQLite dump `before_backup` hook (added in PR #349) referenced
`kubectl --context=k3s-ringtail`, but indri's kubeconfig deliberately
doesn't carry the ringtail credentials. The hook's failure aborted the
entire run, taking out *both* the local sifaka repo and the BorgBase
offsite. Verified the last good archive was `indri-2026-05-11T02:00`.

## Approach

ssh into ringtail and run `k3s kubectl` there — no indri-side
kubeconfig needed. `/etc/rancher/k3s/k3s.yaml` is mode 644 so no sudo
required, and the existing ssh access from indri to ringtail works.

Inline-shell quoting got hairy fast (fish on ringtail rejected `POD=...`
bash syntax; the nix shower image lacks `tar` so `kubectl cp` fails).
Pulled the dump logic into `~/bin/borgmatic-k8s-sqlite-dump`, deployed
by the ansible role. Each dump entry now declares a `target`:

- `local:<context>` — local kubectl with explicit context (mealie)
- `ssh:<user@host>` — ssh + `k3s kubectl` on the cluster host (shower)

Bytes come back via `kubectl exec ... -- cat` instead of `kubectl cp`
since `cp` needs `tar` in the pod (nix-built containers don't bundle it).

## Test plan

- [x] `mise run provision-indri -- --tags borgmatic --check --diff` shows expected diff
- [x] Apply, helper script deployed at `~/bin/borgmatic-k8s-sqlite-dump`
- [x] Helper invoked directly with `ssh:eblume@ringtail` produces a valid 288 KB SQLite file
- [x] Full `borgmatic create` completes without errors — both mealie.db (1.7 MB) and shower.db (288 KB) appear in `~/.local/share/borgmatic/k8s-dumps/`, archive `indri-2026-05-13T17:31:02` written to sifaka borg repo

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #357
2026-05-13 18:55:50 -07:00
947e4310c3 C2: migrate immich from minikube to ringtail (mikado chain) (#356)
## Summary

C2 Mikado chain to move the entire Immich stack (server, ML, valkey,
postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the
largest single tenant on minikube (~1.5 GiB resident) and minikube is
currently memory-saturated (97% RAM, swapping). This is the first
concrete chain in the broader indri-k8s decommission effort.

This PR contains the planning layer only — 7 cards (1 goal + 6
prerequisites). Implementation cycles follow per the Mikado Branch
Invariant.

## Goal end-state

- Immich `server`, `machine-learning`, `valkey` on ringtail.
- ML pod uses ringtail's RTX 4080 (performance win — currently
  CPU-only).
- CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail.
- Library still on sifaka NFS — ringtail mounts the same path.
- `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress.
- Minikube `immich` and `immich-pg` are removed.

## Cards

| Card | Depends on |
|---|---|
| `migrate-immich-to-ringtail` (goal) | all six below |
| `cnpg-on-ringtail` | — |
| `immich-pg-on-ringtail` | cnpg-on-ringtail |
| `immich-pg-data-migration` | immich-pg-on-ringtail |
| `sifaka-nfs-from-ringtail` | — |
| `immich-app-on-ringtail` | immich-pg-on-ringtail, sifaka-nfs-from-ringtail |
| `immich-cutover-and-decommission` | immich-pg-data-migration, immich-app-on-ringtail |

## Key constraints

- **No data loss.** Downtime is acceptable; data loss is not. Two
  surfaces matter: postgres (ML embeddings, face data — slow to
  re-derive) and the library files (don't move, but NFS access from
  ringtail must be verified).
- **Migration method:** Option A is a CNPG `externalCluster`
  basebackup → promote. Option B is `pg_dump`/`pg_restore` as a
  documented fallback. Either way, dry-run against a scratch
  cluster first.
- **Why pg moves too** (not cross-cluster): keeping pg on minikube
  would block the whole decommission, and Immich is chatty with pg
  so tailnet round-trips would hurt.

## Test plan

- [ ] Plan review — does the dependency graph make sense?
- [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the
      chain correctly.
- [ ] Per-card implementation cycles land separately (commit
      convention enforced by hook).

Reviewed-on: #356
2026-05-13 16:46:17 -07:00
bc8ceb502b Merge pull request 'C1: pin ringtail wired IP to 192.168.1.21 (static)' (#355) from ringtail-static-ip into main 2026-05-12 09:59:59 -07:00
a4a30aad44 fix(ringtail): explicitly enable net.ipv4.ip_forward
After the static IP change, k3s/flannel pod networking broke because
ip_forward was 0. NixOS doesn't enable IP forwarding by default — it
was previously being set implicitly somewhere in the NM-managed /
scripted-DHCP path. With static networking we have to set it ourselves.

Verified at runtime via sysctl -w before adding here; pod outbound
came back immediately and Tailscale VIP services recovered without
any pod restarts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 09:51:16 -07:00
d0b5423135 C1: pin ringtail wired IP to 192.168.1.21 (static)
Removes DHCP lease renewal as a failure mode on ringtail after an outage
on 2026-05-12 where the IP and routes silently disappeared from enp5s0
without any kernel link event. NetworkManager stays enabled for wireless
fallback but no longer manages the wired interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 09:33:57 -07:00
dc0916a548 C0: shower — rebuild from main SHA (post-merge retag)
PR #354 was squash-merged so the branch commit 444ff91 baked into the
prior image tag isn't reachable from main's history. Rebuild from main
HEAD (3c7967e) and retag. Image content is byte-identical (FOD is
content-addressed, inputs unchanged); only the SHA in the tag changes
so future provenance tracing stays on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 20:20:39 -07:00
3c7967e445 C1: deploy shower v1.1.0 (phases + guest memories) (#354)
## Summary

Deploys `adelaide-baby-shower-app` **v1.1.0** to ringtail k3s.

### App changes (since v1.0.2)

- **Four-phase `ShowerState`** replaces the boolean `locked` flag — `pre_event` → `party` → `prizes_locked` → `event_locked` — with a backfill migration that maps `locked=True → pre_event`, `locked=False → party`.
- **Guest memories**: append-only photos + comments panel where guests can leave notes for the baby. Adds `GuestPhoto` + `GuestComment` models with file-extension validators and a max-size validator; new `shower.imaging` module for thumbnail generation.
- **Admin + QR polish**: configurable host link, fixed "View Site" URL, guest-facing QR copy improvements, contest tweaks.

Three Django migrations run automatically in the entrypoint against the SQLite PV:
- `0009_shower_phase`
- `0010_guest_memories`
- `0011_book_description`

No ConfigMap / env-var changes. The deploy uses `strategy: Recreate` with a single replica, so the old pod releases the data PVC before the new one mounts it and runs migrations.

### Container build changes

The v1.1.0 tag exposed a latent issue with the Forgejo PyPI install path:

- The recent commit [2d38418e](2d38418e) closed the forge package leak at the Fly edge by blocking `/api/packages/*` publicly.
- Forgejo's PyPI simple index returns absolute file URLs hardcoded to its public `ROOT_URL` (`forge.eblu.me`), so pip-installing from the tailnet index URL still tries to download from `forge.eblu.me` → 403.
- Previous shower builds escaped this because their FOD outputs were already in the nix store; bumping to a new version forced a fresh pip run that hit the block.

Fix mirrors what we already do for the sdist: both wheel and sdist are pulled via direct `fetchurl` against `forge.ops.eblu.me`, then the wheel is copied to TMPDIR under its clean filename (nix store path's hash prefix breaks pip's wheel-filename parser) and handed to pip as a local path. The forge `--extra-index-url` is no longer needed.

FOD outputHash pinned to `sha256-kTNOswobtkgyQmmqbQM8XO4vvaGg57nCuuZGbNXb0NM=` from run 547. Image: `registry.ops.eblu.me/blumeops/shower:v1.1.0-444ff91-nix`.

### Adjacent finding (already handled)

The ringtail `gitea-runner-nix_container_builder` systemd unit was left `inactive` after the recent `provision-ringtail` (matches the known `sshd-restart-hangs-mux` lesson — the rebuild changed the unit's PATH closure + config.yaml, systemd stopped it, then the playbook hung before the activation could restart it). Manually started; the existing memory `lesson_provision_ringtail_ssh_hang.md` was extended to mention the runner as the canary service to check after provisions.

## Test plan

- [ ] `argocd app diff shower --revision shower-v1.1.0` — review the manifest change
- [ ] `argocd app set shower --revision shower-v1.1.0 && argocd app sync shower`
- [ ] `kubectl --context=k3s-ringtail logs -n shower deploy/shower` — confirm migrations 0009/0010/0011 applied, no errors
- [ ] Hit `https://shower.ops.eblu.me/` (tailnet) — splash page renders, phase indicator visible
- [ ] Hit `https://shower.ops.eblu.me/host/` — host console loads, phase dropdown shows the four states
- [ ] Hit `https://shower.eblu.me/` (public via Fly) — splash page still served
- [ ] After merge: `argocd app set shower --revision main && argocd app sync shower`

Reviewed-on: #354
2026-05-11 20:08:03 -07:00
fbc1f7720e C0: gitignore .claude/scheduled_tasks.lock
Transient lock file written by the ScheduleWakeup harness tool when
Claude paces its own work between long-running operations. Not config,
not state worth checking in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 18:37:29 -07:00
4133785119 C1: ringtail — weekly flake.lock update (#352)
## Summary
- Recurring weekly lockfile refresh for `nixos/ringtail/flake.lock`.
- Inputs updated: `disko`, `home-manager`, `nixpkgs`.
- `nixpkgs-services` was deliberately skipped (per overlay convention — pinned services bump only on intentional update).
- Generated via `dagger call flake-update --src=. --flake-path=nixos/ringtail`.

## Test plan
- [x] `prek` hooks pass
- [ ] After merge: `mise run provision-ringtail` to deploy
- [ ] Then check for kernel update per [[manage-lockfile]]

## Notes
- Not deployed from this PR — provisioning is a follow-up.

Reviewed-on: #352
2026-05-11 16:13:07 -07:00
145df76d06 C1: service review — mealie (v3.12.0 deployed; upstream v3.17.0) (#351)
## Summary
- Recurring service review for `mealie`.
- Upstream is at **v3.17.0** (released 2026-05-06); deployed image is **v3.12.0** — 5 minor versions behind.
- Container is built locally from the forge mirror (`containers/mealie/Dockerfile`), so upgrade requires a fresh build + changelog review for breaking changes between v3.12 and v3.17.
- Deferring the actual upgrade to a separate task; this PR just refreshes `last-reviewed` and captures the gap in `notes`.

## Test plan
- [x] `prek` hooks pass
- [ ] Follow-up: open task to bump `containers/mealie/Dockerfile` `CONTAINER_APP_VERSION`, build, and update kustomization tag

## Notes
- No deployment changes in this PR.

Reviewed-on: #351
2026-05-11 16:12:36 -07:00
bb7efa850a C1: doc review — replicating-blumeops tutorial (#350)
## Summary
- Periodic doc review of `tutorials/replicating-blumeops.md` (was never reviewed).
- Fixed 4 instances of "BluemeOps" → "BlumeOps" (also caught 1 in `contributing.md`).
- Added `last-reviewed: 2026-05-11` and bumped `modified`.
- Verified all wiki-link targets resolve.

## Test plan
- [x] `prek` hooks pass (link checker, frontmatter checker)
- [ ] Optional: `mise run docs-preview docs/tutorials/replicating-blumeops.md`

Reviewed-on: #350
2026-05-11 16:11:35 -07:00
f83be3bf37 C1: review CC observability-stack-audit (extend to k3s) (#353)
## Summary
- Recurring compensating-control review (oldest stale control: 42 days).
- Verified the control is in effect on both clusters:
  - `alloy-k8s` on minikube-indri — Synced/Healthy, DaemonSet 1/1 ready
  - `alloy-ringtail` on k3s-ringtail — Synced/Healthy
  - `loki` (`monitoring/loki-0`) — Running, receiving logs (52 restarts in 18h is worth watching but not blocking review)
- Generalized the description: previously named only minikube, but the indri→ringtail migration means we now operate two clusters and both rely on this control.
- Added a follow-up note: enabling native apiserver audit logging is far more tractable on k3s (`--audit-log-path` / `--audit-policy-file`) than it was on minikube — worth revisiting once the migration concludes.

## Test plan
- [x] `prek` hooks pass
- [x] Verified alloy + loki status via `kubectl --context=minikube-indri` and `argocd app get`

## Notes
- No deployment changes.

Reviewed-on: #353
2026-05-11 16:10:39 -07:00
40d9a1ef9e C0: shower — rebuild from main SHA (post-PR-349 retag)
Standard squash-merge dance per
docs/how-to/deployment/build-container-image.md#Squash-merge-and-container-tags
— retags from v1.0.2-039d9b9-nix (branch SHA) to v1.0.2-292d354-nix
([main] tag from run 544 built off the merge commit). Functionally
identical; preserves source traceability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 13:55:25 -07:00
292d354902 C1: deploy adelaide-baby-shower-app to ringtail k3s (#349)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m12s
## Summary

Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).

### What's included

- **ArgoCD app + manifests** under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer.
- **Public surface** (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett.
- **fail2ban** filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default).
- **Caddy** route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`).
- **Pulumi** Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`.
- **Grafana** APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter.
- **Container** `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages.
- **Docs** runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments.

### Defense layers on the public surface

1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list)
2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP)
3. django-axes (5 fails / 1h, keyed on username+ip_address)
4. edge `/admin/` block (returns 403 for anything that isn't login/logout)

## Prerequisites for the user to do (NOT in this PR)

Halted on these per request — they touch shared/manual systems:

- [x] **NFS share** on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000`
- [ ] **1Password item** `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git
- [ ] **Container build**: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix`
- [x] **DNS**: `mise run dns-up` after merge
- [x] **Fly cert**: `fly certs add shower.eblu.me -a blumeops-proxy`
- [ ] **Caddy push**: `mise run provision-indri -- --tags caddy`
- [ ] **Fly redeploy** to pick up the new nginx block + fail2ban jail: `mise run fly-deploy`
- [ ] **ArgoCD sync**: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging

## Test plan

- [ ] Container builds successfully on nix-container-builder runner
- [ ] Pod starts, migrations run, gunicorn answers on :8000
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean
- [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block)
- [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response
- [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead`
- [ ] Grafana dashboard "Shower APM" appears and starts showing traffic
- [ ] `mise run services-check` passes

Reviewed-on: #349
2026-05-11 13:47:18 -07:00
eceb2b99ce C0: bump homepage image to fixed-perms build (v1.11.0-678f26b-nix)
Pulls in 678f26b0 (chowned /app/config). Resolves the EACCES crash loop
on ringtail.
2026-05-10 21:16:34 -07:00
678f26b0e7 C0: fix homepage container /app/config write permissions
The previous Dockerfile chowned /app/config to 1000:1000 so the runtime
user could seed missing skeleton configs (e.g. proxmox.yaml) and write
/app/config/logs. The nix derivation didn't replicate that, so the new
amd64 image crashed with EACCES on cold start (fixed-forward — caught
during ringtail cutover, ArgoCD #348).

Add fakeRootCommands to dockerTools to create /app and /app/config and
chown them at build time. The deployment's ConfigMap subPath mounts
leave the parent directory as image filesystem, so its ownership has to
be set at build time, not at runtime.
2026-05-10 20:49:22 -07:00
ad7a0ed105 Merge pull request 'C1: migrate homepage dashboard from minikube to ringtail (nix-built amd64)' (#348) from homepage-to-ringtail into main 2026-05-10 20:40:33 -07:00
be54cc3411 C1: migrate homepage dashboard to ringtail k3s
Repoint the ArgoCD Application destination from minikube to ringtail and
bump the image tag to the new amd64 nix-built v1.11.0-b87f62e-nix.

Rework services.yaml for the autodiscovery shift: 11 services that
previously auto-populated via minikube Ingress annotations (ArgoCD,
Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus, Navidrome,
Paperless, TeslaMate, Transmission) become explicit static entries with
their widget configs preserved. Conversely, the ringtail services that
will now auto-populate (Frigate/NVR, Authentik, Ntfy) are removed from
the static list to avoid duplicates; Ollama becomes newly visible.

Add a Content group for Immich/Kiwix/Miniflux which previously lived
under the autodiscovered "Content" group from annotations.
2026-05-10 20:37:03 -07:00
b87f62e0f5 C1: nix-build homepage container for amd64 ringtail migration
Replace Dockerfile (arm64-only, indri-built) with a nix derivation
adapted from nixpkgs pkgs/by-name/ho/homepage-dashboard. Built via the
nix-container-builder runner on ringtail, producing an amd64 image
suitable for k3s.

Includes the upstream Next.js file-system-cache patch to avoid
prerender cache write failures on a read-only nix store path
(nixpkgs issues #328621 and #458494).

Pinned to v1.11.0 (current production version).
2026-05-10 20:32:38 -07:00
8bc19fa460 C0: tailscale main-SHA rebuild for ringtail proxyclass
Routine post-squash-merge cleanup. Bumps the ProxyClass image tag from
the now-orphaned PR branch SHA (67af7a8) to the merge commit SHA
(0108b68) so the deployed image stays traceable after branch cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:52:39 -07:00
0108b68769 C1: mirror tailscale container locally for ringtail proxyclass (#347)
## Summary

Adds the first cut of a local nix build for `docker.io/tailscale/tailscale` and rewires only the ringtail tailscale-operator overlay to use it. Indri's overlay continues pulling upstream — minikube on indri is being decommissioned in favor of ringtail's k3s, so investing in dual-cluster routing here would be wasted churn.

## Changes

- `containers/tailscale/default.nix` — `buildGoModule` over `cmd/tailscale`, `cmd/tailscaled`, `cmd/containerboot`; packaged via `dockerTools.buildLayeredImage` with `cacert`, `iptables` (legacy symlink to match upstream Synology compat), `iproute2`, `tzdata`, `busybox`.
- `argocd/manifests/tailscale-operator-ringtail/kustomization.yaml` — kustomize `images:` rewrite swapping `docker.io/tailscale/tailscale` → `registry.ops.eblu.me/blumeops/tailscale:v1.94.2-67af7a8-nix`.
- `docs/changelog.d/mirror-tailscale-container.infra.md` — fragment.

## Pin rationale

v1.94.2 matches `service-versions.yaml:96` and the current ProxyClass exactly — this PR is "make it local," not "upgrade tailscale." Version bumps come as follow-up C0/C1 changes once we decide to test newer (v1.96.x had a Fly-side MagicDNS regression; v1.98.0 is current upstream stable).

## Test plan

- [x] Image built successfully on ringtail nix-container-builder (run #528).
- [x] Image visible in registry: `registry.ops.eblu.me/blumeops/tailscale:v1.94.2-67af7a8-nix`.
- [ ] Deploy from branch: `argocd app set tailscale-operator-ringtail --revision mirror-tailscale-container && argocd app sync tailscale-operator-ringtail`.
- [ ] Verify proxy pods restart with new image and existing tailnet ingresses (e.g., authentik, immich, tempo) keep resolving.
- [ ] After merge: rebuild on main SHA, update kustomization, run `services-check`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #347
2026-05-06 06:50:31 -07:00
6f0d80ca1e C0: doc review — index.md, add ringtail to infra overview
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:14:40 -07:00
39b042e638 C0: service review — caddy v2.11.2 (current latest, healthy)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:11:15 -07:00
24e5490259 C0: review CC init-container-isolation — defer retirement to post-ringtail
Runtime grafana pod matches the manifest and the CC's claim; bumped
last-reviewed. Noted that retiring init-chown-data in favor of fsGroup
alone should wait until grafana migrates to ringtail's k3s, since the
storage backend will change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:31:13 -07:00
074887cd57 C0: docs — explanation article on compliance mute categories
Captures the CC vs NA vs RA distinction surfaced during the 2026-05-03
weekly compliance review (CVE-2026-31789), and the image-scan mutelist
gap that blocks acting on it. Links the new article from the
review-compensating-controls how-to so it isn't orphaned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:19:53 -07:00
9fb5442ccd C0: kiwix — doc review, fix Adding Archives source path
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:46:16 -07:00
f16e1c81f1 C0: zot — upgrade indri registry to v2.1.16
Security fixes only (TLS verification on metrics client, CORS
Allow-Credentials suppression on wildcard origin, manifest/API-key
body-size limits, dependabot bumps). No config changes required;
re-built from source on indri and bounced launchagent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:41:07 -07:00
a2c61b625d C0: rotate-fly-deploy-token — fish+bash one-shot, op validator gotcha
Combine mint+store into a single command with both fish and bash
forms (the doc previously only showed manual paste). Document the
1Password CLI "Password item requires ps value" validator error and
the placeholder-password workaround for Password-category items with
empty primary password fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:42:57 -07:00
2c0917b266 C0: valkey — bump kustomization tags to main-branch SHA
Routine post-merge follow-up after #346. Branch SHA tag (946fa75) replaced
with the main-SHA-built tag (fabca04) so paperless and immich reference an
image traceable to a commit on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:47:16 -07:00
fabca04771 Mirror valkey 8.1 locally for paperless and immich (#346)
## Summary

- Add native Dagger build of valkey 8.1.6-r0 on Alpine 3.22 at `containers/valkey/`
- Swap paperless redis sidecar and immich-valkey from `docker.io/valkey/valkey:8.1-alpine` to `registry.ops.eblu.me/blumeops/valkey:v8.1.6-r0-946fa75`
- Resolves the DR-2026-04 TODO in paperless kustomization about multi-arch redis

## Why

Move toward fully locally-built containers for supply chain control. Paperless and immich both pulled the same upstream tag — one mirror serves both. Authentik's nix-built Redis stays separate (different image entirely).

## Risk

Low. Both sidecars are stateless caches:
- paperless redis: no volumeMount (in-pod localhost, pure memory)
- immich-valkey: `emptyDir` (cache only)

Pod restart rebuilds the cache. Smoke-tested locally (PING/SET/GET roundtrip on `valkey 8.1.6` with `--bind 0.0.0.0 --protected-mode no`).

## Test plan

- [ ] After merge: `mise run container-build-and-release valkey` to rebuild with main SHA
- [ ] Update kustomizations to the `[main]` SHA tag (C0 follow-up)
- [ ] `argocd app sync paperless` and `argocd app sync immich`
- [ ] Verify pods come up healthy (paperless OCR queue functional, immich job queue functional)
- [ ] `mise run services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #346
2026-05-01 17:40:03 -07:00
f84f5f02b3 C0: review compensating control trusted-ci-only
Verified Forgejo runner is registered only to forge.ops.eblu.me and the
forge has registration disabled, so no untrusted users can trigger
privileged CI. Tightened notes to reflect the closed-forge mechanism
(not a per-repo allow-list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:49:22 -07:00
4aa0872949 C0: review ollama doc — refresh image, models, last-reviewed
Bumped documented image tag to 0.20.4 (matches kustomization newTag),
added the two qwen3.5 models from models.txt, and stamped the card.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:42:33 -07:00
2d55303213 C0: alloy native macOS on indri — upgrade to v1.16.0
Completes the v1.16.0 fleet upgrade for the fourth alloy service
(type: ansible, built from source on indri). Binary built on gilbert
with Go 1.26.2 + CGO, scp'd to indri, codesigned, LaunchAgent reloaded.
Service reports clean WAL replay and resumed metric/log shipping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:36:38 -07:00
55563afc7e C0: alloy — bump kustomization tags to main-branch SHA
Per the build-container-image squash-merge convention, rebuild alloy v1.16.0
container images from the main SHA (9564435) and update the three alloy
kustomizations to reference :v1.16.0-9564435[-nix] instead of the branch
SHA :v1.16.0-26a3ab5[-nix] left over from #345.

Both images were rebuilt locally on gilbert (dagger) and ringtail (nix)
because indri is still under heavy macOS memory-compressor pressure (see
separate ticket); CI on indri can't reliably run the dagger publish step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:31:27 -07:00
9564435b11 Alloy V1.16.0 (#345)
Bump Grafana Alloy v1.14.0 → v1.16.0 across all four services (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail; alloy native ansible). Also migrate the indri build path from `Dockerfile` to a native Dagger `container.py` per the build-container-image migration playbook.

## Highlights from upstream
- v1.15: database observability promoted to stable, OTel Collector → v0.147.0
- v1.16: clustering for `loki.source.kubernetes_events`, MySQL exporter 0.19.0
- One pre-existing breaking change in v1.15 (`loki.source.awsfirehose` undocumented metric prefix rename) — not used here.

## Build infra
Alloy v1.16.0's go.mod requires Go 1.26.2. The nix derivation now uses `pkgs.go_1_26` with `GOTOOLCHAIN=local` to avoid auto-downloading a toolchain blob that violated the fixed-output rule.

## Test plan
- [ ] CI: `mise run container-build-and-release alloy --ref alloy-v1.16.0` (dispatched as run 522; nix job to be re-triggered with the v1.16.0 goModules outputHash once the local ringtail build surfaces it)
- [ ] After CI green, bump `images[].newTag` in three kustomizations to the new `-<sha>` and `-<sha>-nix` tags, deploy from this branch via `argocd app set <app> --revision alloy-v1.16.0 && argocd app sync <app>`
- [ ] Manual rebuild of macOS native binary on gilbert (per ansible/roles/alloy README) and `mise run provision-indri -- --tags alloy --check --diff`
- [ ] `mise run services-check` after merge & redeploy

Reviewed-on: #345
2026-05-01 08:05:37 -07:00
7fed166c18 Update ringtail flake inputs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:55:08 -07:00
f6e392b80c C1: SHA-pin tooling dependencies (2026-04 cycle) (#344)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m45s
## Summary

Monthly tooling dependency refresh, with a one-time conversion from version-tag pins (`rev = "vX.Y.Z"`, `image:tag`, `>=`) to SHA / digest pins everywhere.

## Changes

- **prek hooks**: all `rev = "vX.Y.Z"` → commit SHA + `# vX.Y.Z` comment. Bumped trufflehog (3.94.0→3.95.2), kingfisher (1.91.0→1.97.0), ruff (0.15.7→0.15.12), shfmt (3.13.0→3.13.1), prettier (3.8.1→3.8.3), actionlint (1.7.11→1.7.12).
- **fly/Dockerfile**: tag pins → `image@sha256:...` digest pins. Bumped nginx (1.29.6→1.30.0-alpine), tailscale (v1.94.1→v1.94.2 — still inside the safe pre-1.96.5 range), alloy (v1.14.1→v1.16.0).
- **mise-tasks**: PEP 723 inline deps converted from `>=` to `==` (PEP 508 doesn't support hashes inline). All scripts pinned to current latest: rich 15.0.0, typer 0.25.0, pyyaml 6.0.3, httpx 0.28.1.
- **prek `additional_dependencies`**: ansible-lint==26.4.0, ansible-core==2.20.5.
- **taplo-lint**: pass `--no-schema`. Upstream's `--default-schema-catalogs` returns a format taplo v0.9.3 can't parse — we don't validate against TOML schemas anyway, so this turns off the broken catalog fetch.
- **docs/update-tooling-dependencies**: documents the SHA-pin convention, `docker buildx imagetools inspect` for digest lookup, and `prek clean` before re-verifying (cache grows to several GiB).

Forgejo workflow `actions/checkout@v6.0.2` was already at the latest SHA — no change.

## Test plan

- [x] `prek run --all-files` passes after `prek clean`
- [x] `deploy-fly` workflow builds and deploys the new fly image on merge
- [x] `fly status -a blumeops-proxy` healthy after deploy
- [x] Spot-check a few mise tasks (`mise run blumeops-tasks`, `mise run docs-check-links`) to confirm pinned deps resolve cleanly

Reviewed-on: #344
2026-04-30 16:51:43 -07:00
5096223b48 C1: clean up cv + docs minikube artifacts (#343)
## Summary

Follow-up to #342. The cv and docs services are now live on indri (Caddy file_server backed by ansible-managed tarball extraction) and verified working. This PR removes the dead minikube artifacts and the tooling shims that referenced them.

## Changes

**Deletions:**
- ``argocd/apps/{cv,docs}.yaml``
- ``argocd/manifests/{cv,docs}/`` (deployment, service, ingress, pdb, kustomization)
- ``containers/{cv,quartz}/`` (Dockerfiles + start scripts)

**Tooling:**
- ``mise-tasks/container-version-check``: remove the ``quartz``→``docs`` CONTAINER_TO_SERVICE mapping (containers/quartz no longer exists)
- ``service-versions.yaml``: bump ``docs.current-version`` to ``v1.16.0`` (the blumeops docs release tag) and trim the migration-window comment

## Live state context

The argocd Applications ``cv`` and ``docs`` were already deleted from the cluster manually as part of the cutover; this PR just removes the YAML files that the ``apps`` app-of-apps was still ingesting. After merge, ``argocd app sync apps`` will reconcile and the ``apps`` Application returns to Synced.

The Caddyfile ``handle_errors`` bug that briefly crashed all ``*.ops.eblu.me`` services during cutover is fixed in a separate C0 (``2ee53fe``) on main, not here.

## Test plan
- [x] ``mise run container-version-check --all-files`` clean
- [x] ``mise run service-review --type ansible`` shows cv at 1.0.3, docs at v1.16.0
- [ ] After merge: ``argocd app sync apps`` returns clean (cv/docs entries gone, no children to reconcile)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #343
2026-04-29 15:18:39 -07:00
2ee53fe375 C0: fix Caddyfile try_html — handle_errors can't nest inside handle{}
The kind=static branch added in #342 put handle_errors inside the
@host handle{} block. handle_errors is a top-level site-block directive,
not an ordered HTTP handler, so Caddy refuses to load the config:

  parsing caddyfile tokens for 'handle': directive 'handle_errors'
  is not an ordered HTTP handler

This crash-loops the whole reverse proxy and takes down every
*.ops.eblu.me service. Tripped today during the live cv/docs cutover.

Fix: drop handle_errors and append /404.html as the final try_files
candidate. The 404 page is served with status 200 instead of 404, but
that's acceptable for a human-facing curated 404 — the page renders
correctly. Documented inline.

The running Caddy on indri already has the fixed config (deployed
manually during the cutover); this lands the fix in main so future
provision-indri --tags caddy runs don't re-break it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:16:44 -07:00
8d634861f6 C1: migrate cv + docs from minikube to indri-native (#342)
## Summary

Replace the cv (`cv.eblu.me`) and docs (`docs.eblu.me`) minikube Deployments with indri-native ansible roles. Caddy serves the extracted release tarballs directly via a new `kind: static` service-block — no daemon, no nginx pod, no ProxyGroup ingress on the request path. Mirrors the rationale of the recent devpi migration; part of the broader minikube wind-down.

## What's in this commit

- `ansible/roles/{cv,docs}` — sentinel-gated tarball download + extract into `~/{cv,docs}/content/`
- `ansible/roles/caddy/` — new `kind: static` branch in the Caddyfile template (encoded gzip, immutable cache headers for fingerprinted assets, optional `try_html` for Quartz-style clean URLs, optional per-path `download_paths` for the resume PDF's `Content-Disposition`)
- `ansible/playbooks/indri.yml` — wires `cv` and `docs` roles before `caddy`
- `service-versions.yaml` — both services flip to `type: ansible`. `docs.current-version` stays at `1.28.2` for this commit so `container-version-check` keeps passing while `containers/quartz/Dockerfile` still exists; it moves to the docs release tag in the cleanup commit
- `.forgejo/workflows/{cv-deploy,build-blumeops}.yaml` — deploy step now bumps `cv_version`/`docs_version` in the role defaults and pushes; running ansible + purging the Fly cache is manual from gilbert (matches devpi)
- Docs: `docs/how-to/operations/{cv,docs}-on-indri.md`, updated `docs/reference/services/{cv,docs}.md`, changelog fragment

## What is not in this commit

The dead artifacts. After PR review and successful cutover, a follow-up commit deletes:

- `argocd/apps/{cv,docs}.yaml` and `argocd/manifests/{cv,docs}/`
- `containers/cv/`, `containers/quartz/`
- `CONTAINER_TO_SERVICE['quartz']` mapping in `mise-tasks/container-version-check`
- bumps `docs.current-version` in `service-versions.yaml` to the release tag

## Cutover plan (manual, from gilbert, after review)

1. **Take down old:**
   - Remove the cv and docs Applications: `argocd app delete cv --cascade && argocd app delete docs --cascade`
   - Verify k8s namespaces gone: `kubectl --context=minikube-indri get ns | grep -E '^(cv|docs)\\b'` (should be empty)
   - Verify tailnet MagicDNS no longer advertises the VIPs: `nslookup cv.tail8d86e.ts.net` and `nslookup docs.tail8d86e.ts.net` should both fail
2. **Bring up new:**
   - `mise run provision-indri -- --tags cv,docs,caddy --check --diff` (already validated on branch)
   - `mise run provision-indri -- --tags cv,docs,caddy`
   - `fly ssh console -a blumeops-proxy -C "sh -c 'rm -rf /tmp/cache && nginx -s reload'"`
3. **Verify:** `mise run services-check` and the curl checks listed in `docs/how-to/operations/{cv,docs}-on-indri.md`
4. **Cleanup commit + merge.**

Total expected downtime: minutes (not the few-hour budget you authorized).

## Test plan
- [ ] `mise run provision-indri -- --tags cv,docs --check --diff` clean
- [ ] `mise run provision-indri -- --tags caddy --check --diff` shows only the cv + docs blocks changing as previewed in the PR thread
- [ ] After cutover: `cv.eblu.me`, `cv.ops.eblu.me`, `docs.eblu.me`, `docs.ops.eblu.me` all return 200
- [ ] `cv.eblu.me/resume.pdf` includes `Content-Disposition: attachment`
- [ ] A clean Quartz URL (e.g. `docs.eblu.me/explanation/agent-change-process`) resolves to the right page
- [ ] `mise run services-check` clean
- [ ] `mise run service-review --type ansible` shows cv and docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #342
2026-04-29 14:55:11 -07:00
a529d60f60 C0: remove containers/devpi/ build artifact
Devpi now runs natively on indri (uv venv via ansible role), so the
Dagger container build at containers/devpi/ is unused. Removing it.

Also updated dagger.md examples to use 'miniflux' as the example
container-name argument.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:40:45 -07:00
14ca0160ba Migrate devpi from minikube to indri (launchd) (#341)
## Summary

Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). It moves to indri as a native LaunchAgent.

## What changed

- **New ansible role** `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run via 1Password root password, runs as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. Bootstraps from upstream PyPI (so devpi can install itself on a fresh box).
- **Caddy**: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`.
- **Playbook**: `indri.yml` gains pre_tasks for the root password and the new role.
- **service-versions.yaml**: devpi flipped from `type: argocd` to `type: ansible`.
- **ArgoCD**: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted.
- **Docs**: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list.

## Already deployed (live on indri)

- Service running: `launchctl list mcquack.eblume.devpi` → PID 53888
- `curl https://pypi.ops.eblu.me/+api` returns 200 
- `mise run docs-mikado` works again 
- 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/`
- Minikube namespace and PVC fully reclaimed

## Test plan

- [ ] `mise run services-check` (after merge)
- [ ] CI workflows that use devpi succeed
- [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines)

## Context

This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent — it was breaking the toolchain — and (b) it shakes out the migration recipe before we commit to a multi-leaf chain.

Reviewed-on: #341
2026-04-29 13:38:36 -07:00
f4a24595b1 C0: review CC ephemeral-privileged-jobs
Verified TTL=604800s and hostPID limited to ephemeral Prowler CronJob
on indri. Noted that alloy-tracing on ringtail also uses hostPID but
is out of scope until Prowler scans ringtail (tracked in Todoist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 11:09:34 -07:00
817acc5e5e C0: transmission doc — review and correct storage/monitoring details
Marked last-reviewed: 2026-04-29. Fixed the storage layout table —
`/config/` is an emptyDir (ephemeral), not NFS, and the watch directory
is disabled. Documented the transmission-exporter sidecar that exposes
Prometheus metrics on port 19091.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 11:00:01 -07:00
4d76fd5de5 C0: prowler — rebuild image against main HEAD
Squash-merge of #340 changed the SHA. Bump prowler tag from
v5.23.0-2daf629 (PR branch) to v5.23.0-495e45d (main HEAD) so the
Dockerfile changes are present in the image deployed off main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:49:27 -07:00
495e45d01d Address 6 critical Prowler IaC findings (mute + grafana RBAC tighten) (#340)
## Summary

The weekly Prowler IaC scan reported 6 critical findings against `argocd/manifests/`. They split cleanly into two patterns:

- **Legitimate-by-design RBAC → mute with new compensating controls**
  - `external-secrets-controller`, `external-secrets-cert-controller` manage `secrets` (KSV-0041) and the cert-controller mutates its own webhook configurations (KSV-0114). This is what the operator is *for*. New CC: `operator-purpose-bound-rbac`.
  - `kube-state-metrics` (both `minikube-indri` and `k3s-ringtail`) holds `list/watch` on secrets to expose `kube_secret_info` and `kube_secret_labels` metrics. KSM's metric schema only reads metadata, never the `data:` field. New CC: `kube-state-metrics-metadata-only`.

- **Over-broad RBAC → fix**
  - `grafana-clusterrole` had `get/watch/list` on `secrets` because the dashboard-sidecar config used `RESOURCE=both` (ConfigMaps + Secrets). Nothing in the cluster labels Secrets with `grafana_dashboard=1`, so this was unused power. Switched both sidecar instances to `RESOURCE=configmap` and removed `secrets` from the ClusterRole.

The IaC cronjob also did not previously pass `--mutelist-file`, which is why every IaC finding reported as unmuted regardless of mutelist configuration. The new `mutelist/iac.yaml` is bundled into the existing `prowler-mutelist` ConfigMap and mounted via `items:` selector.

## Test plan

- [ ] `kubectl --context=minikube-indri kustomize argocd/manifests/prowler/` — already passes locally
- [ ] `kubectl --context=minikube-indri kustomize argocd/manifests/grafana/` — already passes locally
- [ ] Deploy from this branch via `argocd app set prowler --revision prowler-iac-mutelist && argocd app sync prowler` and same for `grafana`
- [ ] Manually trigger the IaC cronjob and verify `MUTED=True` on the 6 critical findings (`kubectl --context=minikube-indri -n prowler create job --from=cronjob/prowler-iac-scan prowler-iac-test`)
- [ ] Restart grafana pod and confirm dashboards still render (sidecar still finds them via ConfigMap watch)
- [ ] After verify, `argocd app set <app> --revision main && argocd app sync <app>` post-merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #340
2026-04-29 10:43:32 -07:00
718e0a0043 C0: review-compliance-reports — summarize image and IaC scans
Previously only the K8s CIS in-cluster scan was processed; the weekly
container-image and IaC Prowler scans were running on schedule but never
reviewed. Now each scan gets its own status / severity / week-over-week
delta, with top-N grouped tables (by check ID and resource) for the
high-volume image and IaC outputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:18:06 -07:00
cfb6d7a7aa C0: service-review — mark cv reviewed 2026-04-27
No version bump; build deps (jinja2, pyyaml) still loose-pinned and fine.
Known issue: deployed v1.0.3 package predates phone-hide commit; tracked
separately in Todoist by user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:57:33 -07:00
c9eb188e05 C0: blumeops-tasks — replace ambiguous due:+N with "Nd overdue"
The signed offset format read as "due in 5 days" rather than
"5 days overdue", causing misreads. Switch to self-explanatory
text: "5d overdue" / "due in 2d" / "due today".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:49:46 -07:00
4a37ffcdc2 C0: CLAUDE.md — import AGENTS.md instead of redirecting to it
Claude Code only auto-loads CLAUDE.md. The prose shim told agents to go
read AGENTS.md, which is easy to skip. Replacing the shim with
`@AGENTS.md` inlines AGENTS.md content into the session prompt, so the
startup rules (ai-docs, blumeops-tasks, change classification) land in
context unconditionally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:41:13 -07:00
f9d9e00057 C0: blumeops-tasks — show due offset + recurrence, sort by overdue-ness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:18:16 -07:00
005e2a03ed C0: split gandi-operations docs; add dns-acme-cleanup mise task
Splits the nebulous gandi-operations how-to into two single-topic cards
(manage-eblu-me-dns, rotate-gandi-pat) and adds a mise task for the
recurring _acme-challenge TXT cleanup needed due to a value-comparison
bug in libdns/gandi v1.1.0 that prevents certmagic's cleanup phase from
removing presented TXT values.

The gandi reference card is updated to drop the false "different
credential from Pulumi PAT" claim — verified during the 2026-04-27
incident that Caddy and Pulumi share a single PAT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 09:48:46 -07:00
72b27b7fd2 C0: docs — add mealie borg restore how-to
Captures the procedure used to restore mealie's SQLite DB from a borgmatic
archive after the post-DR wipe: extract from borg, snapshot the wiped DB,
swap via a helper pod on the ReadWriteOnce PVC, fix UID 911 ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:04:28 -07:00
Erich Blume
34fa2ef28a C0: ringtail — restore sway default keybindings, fix fuzzel border config
Extend (not replace) home-manager's default sway keybindings via
lib.mkOptionDefault, with lib.mkForce on the custom overrides that
conflict with defaults. Add Mod+F1 cheatsheet binding (fuzzel-filterable).

Move fuzzel's border-radius/border-width out of [main] into a proper
[border] section with the expected short names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 12:16:02 -07:00
Erich Blume
88eabc3de6 Disable Xalia 2026-04-21 14:47:13 -07:00
7d94b9073a C0: docs — default argocd login to --sso; drop extraneous --grpc-web
Now that argocd's Authentik OAuth2 client is public, `argocd login --sso`
works for day-to-day use. Promote it to the default in AGENTS.md,
argocd-cli reference, and troubleshooting; keep the admin/password flow
documented as a break-glass fallback for when Authentik is unavailable.

Also drops --grpc-web from every interactive login command — confirmed
extraneous (login succeeds without it). Left in CI workflows and
`argocd cluster add` untouched; those are different contexts that I
didn't re-test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:43:21 -07:00
86317315ed C0: remove argocd OIDC client_secret wiring
Now that argocd's Authentik OAuth2 client is public (PKCE-only), the
client_secret plumbing is dead code:

- delete argocd-oidc-authentik ExternalSecret and drop it from kustomization
- remove AUTHENTIK_ARGOCD_CLIENT_SECRET env from authentik-worker
- remove argocd-client-secret mapping from authentik-config ExternalSecret

The argocd-client-secret field in the 1Password "Authentik (blumeops)"
item is now unreferenced and can be deleted there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:38:26 -07:00
0e62ad5596 C0: argocd OIDC — switch to public client for CLI SSO
Changes argocd's Authentik OAuth2 client from confidential to public and
drops the clientSecret from argocd-cm. Public + PKCE works for both the
web UI (argocd-server backend) and the argocd CLI (`argocd login --sso`)
without a shared secret, matching OAuth 2.1 guidance.

Confidential → public was needed because the CLI can't hold a client
secret; Authentik's per-app issuer model made the alternative
("cliClientID" pattern with separate public client) awkward since it
requires a shared issuer across apps which Authentik doesn't serve.

Follow-up: deadcode AUTHENTIK_ARGOCD_CLIENT_SECRET env wiring and the
argocd-oidc-authentik ExternalSecret once verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:34:39 -07:00
225b0e7008 C0: allow argocd CLI --sso localhost callback
Adds http://localhost:8085/auth/callback to the ArgoCD OAuth2 provider's
redirect_uris so `argocd login --sso` works. Loopback redirect is the
RFC 8252 pattern for native CLI apps; PKCE (already enabled) covers the
code-interception risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:18:08 -07:00
e6a6a6042e C0: suggest mise run runner-logs in container-build-and-release
After dispatching, poll the Forgejo API for the run matching our
head_sha and print `mise run runner-logs <N>` so the suggested monitor
command is one copy-paste away. Falls back to the bare command if the
poll times out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:12:00 -07:00
0ceafc374d C0: review operator-managed-pods CC (2026-04-21)
Tailscale operator still defaults to privileged proxy pods with no
seccomp profile (issue #7359 open upstream). Control remains valid.
Added note about ProxyClass + device plugin remediation path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:59:48 -07:00
a9ef02a602 C0: bump frigate-notify to v0.5.4-e928054-nix (workdir fix)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:44:24 -07:00
e92805409e fix(frigate-notify): set WorkingDir=/app and create writable /app
The upstream binary expects CWD=/app (relative config.yml lookup,
lumberjack logfile at ./log/app.log). Without this, the pod crashed on
startup — the ConfigMap-mounted /app/config.yml wasn't found and zerolog
spammed "mkdir log: permission denied" as it tried to create ./log at
/ as nonroot.

Creates /app as 1777 (tmp-style) so nonroot can write logs; WorkingDir
set to /app so the default config path resolves correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:43:00 -07:00
c88b6d773c C0: point frigate-notify at local registry tag v0.5.4-fb4bf5a-nix
Built from main in run #516 after #339 merged. Follows the navidrome
kustomization convention (deployment image = local ref + :kustomized,
kustomization override = newTag only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:31:29 -07:00
fb4bf5a7a3 Add frigate-notify nix container build (#339)
## Summary
- Mirrors `github.com/0x2142/frigate-notify` at `v0.5.4` to `forge.ops.eblu.me/mirrors/frigate-notify`.
- Adds `containers/frigate-notify/default.nix` — `buildGoModule` + `dockerTools.buildLayeredImage`, following the `ntfy` pattern.
- Uses `-tags goolm` to avoid the libolm CGO dependency (matrix notifier is imported unconditionally in the upstream but we only use ntfy alerts).
- Runs as nonroot (UID 65534), exposes port 8000, bundles `cacert`/`tzdata`.

## Why
Move `ghcr.io/0x2142/frigate-notify:v0.5.4` (ringtail-deployed) under local control. Aligns with the [[indri → ringtail migration plan]] and the `default.nix` convention for ringtail-targeted containers documented in [[build-container-image]].

## Verification
- `dagger call build-nix --src=. --container-name=frigate-notify export --path=./out.tar.gz` produces a valid 20MB docker archive (10 layers) with `blumeops/frigate-notify` tag locally.
- Hashes pinned for `fetchgit` (src) and `vendorHash` (go modules).

## Follow-up (post-merge)
1. `mise run container-build-and-release frigate-notify` — release from main SHA.
2. C0 follow-up: update `argocd/manifests/frigate/kustomization.yaml` image ref to `registry.ops.eblu.me/blumeops/frigate-notify:v0.5.4-<sha>-nix`.
3. ArgoCD auto-syncs the deployment.

## Test plan
- [ ] `dagger call build-nix` succeeds from a clean checkout.
- [ ] `mise run container-build-and-release frigate-notify --dry-run` looks correct.
- [ ] After release + kustomization swap: frigate-notify pod comes up healthy on ringtail; ntfy alerts still fire on Frigate events.

Reviewed-on: #339
2026-04-21 09:28:02 -07:00
30f39ae050 Review contributing tutorial: add last-reviewed, .ai.md fragment type, prek provenance
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:53:41 -07:00
fb32cc07c4 chore: repoint runner-job-image tag at CI-built v0.20.6-50f8c2a
Swaps the k8s runner label from the local bootstrap tag (v0.20.6-9b6be09)
to the equivalent image rebuilt by CI from main. Functionally identical;
closes the bootstrap loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:38:33 -07:00
50f8c2a33f Roll k8s runner to runner-job-image v0.20.6-9b6be09
Points the k8s Forgejo runner label at the locally-bootstrapped
runner-job-image built from the Alpine container.py on this branch.
Once merged, CI will rebuild the same image from the same SHA.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:28:18 -07:00
db8fd946ae Bump Dagger to 0.20.6 and migrate runner-job-image to Alpine container.py
Bumps the Dagger engine/CLI from v0.20.1 to v0.20.6 (mise pin, dagger.json
engineVersion, SDK regen) and rewrites the runner-job-image container as a
native Dagger pipeline on Alpine 3.23 using the shared alpine_runtime helper,
replacing the Debian-based Dockerfile. All Forgejo Actions in this repo use
actions/checkout (a JS action), so musl is not a compatibility concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:28:18 -07:00
Erich Blume
58fe4f0073 ty 2026-04-20 15:48:15 -07:00
54841dbf70 Update ringtail flake inputs 2026-04-20 10:09:16 -07:00
d6ad8e8e59 chore: refresh forgejo-runner review date 2026-04-20 09:15:35 -07:00
21177ff47f chore: update forgejo-runner image tag 2026-04-20 09:11:37 -07:00
1425bf1f5c Upgrade forgejo-runner to v12.8, adopt server.connections, and clean up docs (#338)
## Summary
- consolidate forgejo-runner how-to docs into current cards
- upgrade the k8s forgejo-runner deployment to the latest v12.8.x runner image
- switch the k8s runner from first-boot register flow to declarative server.connections config
- keep the runner image on the native Dagger build path and update the surrounding manifests/secrets

## Notes
- PR opened early for C1 review
- implementation and deployment verification will follow in subsequent commits

Reviewed-on: #338
2026-04-20 09:03:54 -07:00
312 changed files with 7252 additions and 3677 deletions

View file

@ -178,10 +178,11 @@ jobs:
echo "## Documentation"
echo ""
echo "Download \`$TARBALL\` and configure the quartz container with:"
echo "Download \`$TARBALL\` directly, or bump \`docs_version\`"
echo "in \`ansible/roles/docs/defaults/main.yml\` and run:"
echo ""
echo "\`\`\`"
echo "DOCS_RELEASE_URL=https://forge.eblu.me/eblume/blumeops/releases/download/$VERSION/$TARBALL"
echo "mise run provision-indri -- --tags docs"
echo "\`\`\`"
} > /tmp/release_body.txt
@ -223,18 +224,16 @@ jobs:
echo ""
echo "Release created successfully!"
- name: Update docs deployment
- name: Bump docs_version in ansible role
run: |
VERSION="${{ steps.version.outputs.version }}"
TARBALL="docs-${VERSION}.tar.gz"
DEPLOYMENT_FILE="argocd/manifests/docs/deployment.yaml"
RELEASE_URL="https://forge.eblu.me/eblume/blumeops/releases/download/${VERSION}/${TARBALL}"
DEFAULTS_FILE="ansible/roles/docs/defaults/main.yml"
echo "Updating $DEPLOYMENT_FILE with new release URL..."
yq -i "(.spec.template.spec.containers[0].env[] | select(.name == \"DOCS_RELEASE_URL\")).value = \"${RELEASE_URL}\"" "$DEPLOYMENT_FILE"
echo "Bumping docs_version in $DEFAULTS_FILE to ${VERSION}..."
yq -i ".docs_version = \"${VERSION}\"" "$DEFAULTS_FILE"
echo "Updated deployment:"
grep -A1 "DOCS_RELEASE_URL" "$DEPLOYMENT_FILE"
echo "Updated defaults:"
grep -E "^docs_version:" "$DEFAULTS_FILE"
- name: Commit release changes
env:
@ -248,7 +247,7 @@ jobs:
git config user.email "actions@forge.ops.eblu.me"
# Stage deployment changes
git add argocd/manifests/docs/deployment.yaml
git add ansible/roles/docs/defaults/main.yml
# Stage changelog changes if updated
if [ "$CHANGELOG_UPDATED" = "true" ]; then
@ -270,34 +269,6 @@ jobs:
echo "Changes committed and pushed"
fi
- name: Deploy docs
env:
ARGOCD_AUTH_TOKEN: ${{ secrets.ARGOCD_AUTH_TOKEN }}
run: |
echo "Syncing docs app via ArgoCD..."
# Sync docs app (uses ARGOCD_AUTH_TOKEN env var for auth)
argocd app sync docs \
--server argocd.ops.eblu.me \
--grpc-web \
--prune
# Wait for sync to complete
argocd app wait docs \
--server argocd.ops.eblu.me \
--grpc-web \
--timeout 120
echo "Docs app synced successfully!"
- name: Purge Fly.io proxy cache
env:
FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }}
run: |
echo "Purging nginx cache on Fly.io proxy..."
fly ssh console -a blumeops-proxy -C "sh -c 'rm -rf /tmp/cache && nginx -s reload'"
echo "Cache purged"
- name: Summary
run: |
VERSION="${{ steps.version.outputs.version }}"
@ -309,5 +280,12 @@ jobs:
echo "Release URL:"
echo " https://forge.eblu.me/eblume/blumeops/releases/tag/$VERSION"
echo ""
echo "Asset URL (for DOCS_RELEASE_URL ConfigMap):"
echo "Asset URL:"
echo " https://forge.eblu.me/eblume/blumeops/releases/download/$VERSION/$TARBALL"
echo ""
echo "To deploy on indri, run from gilbert:"
echo " mise run provision-indri -- --tags docs"
echo ""
echo "Then purge the Fly.io proxy cache:"
echo " fly ssh console -a blumeops-proxy -C \\"
echo " \"sh -c 'rm -rf /tmp/cache && nginx -s reload'\""

View file

@ -1,12 +1,14 @@
# CV Deploy Workflow
#
# Updates the CV deployment to a specific package version, commits
# the change, and syncs via ArgoCD.
# Bumps cv_version in ansible/roles/cv/defaults/main.yml and pushes the change.
# Deployment to indri is manual (runner has no SSH access to indri):
# mise run provision-indri -- --tags cv
#
# Usage:
# 1. Release a new CV package from the cv repo first
# 2. Go to Actions > Deploy CV > Run workflow
# 3. Enter the version to deploy, or leave as "latest"
# 4. Run the command above on gilbert to apply
name: Deploy CV
@ -60,18 +62,16 @@ jobs:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Update CV deployment
- name: Bump cv_version in ansible role
run: |
VERSION="${{ steps.version.outputs.version }}"
TARBALL="cv-${VERSION}.tar.gz"
DEPLOYMENT_FILE="argocd/manifests/cv/deployment.yaml"
RELEASE_URL="https://forge.eblu.me/api/packages/eblume/generic/cv/${VERSION}/${TARBALL}"
DEFAULTS_FILE="ansible/roles/cv/defaults/main.yml"
echo "Updating $DEPLOYMENT_FILE with CV_RELEASE_URL..."
yq -i "(.spec.template.spec.containers[0].env[] | select(.name == \"CV_RELEASE_URL\")).value = \"${RELEASE_URL}\"" "$DEPLOYMENT_FILE"
echo "Bumping cv_version in $DEFAULTS_FILE to ${VERSION}..."
yq -i ".cv_version = \"${VERSION}\"" "$DEFAULTS_FILE"
echo "Updated deployment:"
grep -A1 "CV_RELEASE_URL" "$DEPLOYMENT_FILE"
echo "Updated defaults:"
grep -E "^cv_version:" "$DEFAULTS_FILE"
- name: Commit release changes
env:
@ -82,7 +82,7 @@ jobs:
git config user.name "Forgejo Actions"
git config user.email "actions@forge.ops.eblu.me"
git add argocd/manifests/cv/deployment.yaml
git add ansible/roles/cv/defaults/main.yml
if git diff --cached --quiet; then
echo "No changes to commit (already at $VERSION)"
@ -94,38 +94,16 @@ jobs:
echo "Changes committed and pushed"
fi
- name: Deploy CV
env:
ARGOCD_AUTH_TOKEN: ${{ secrets.ARGOCD_AUTH_TOKEN }}
run: |
echo "Syncing CV app via ArgoCD..."
argocd app sync cv \
--server argocd.ops.eblu.me \
--grpc-web \
--prune
argocd app wait cv \
--server argocd.ops.eblu.me \
--grpc-web \
--timeout 120
echo "CV app synced successfully!"
- name: Purge Fly.io proxy cache
env:
FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }}
run: |
echo "Purging nginx cache on Fly.io proxy..."
fly ssh console -a blumeops-proxy -C "sh -c 'rm -rf /tmp/cache && nginx -s reload'"
echo "Cache purged"
- name: Summary
run: |
VERSION="${{ steps.version.outputs.version }}"
echo "================================================"
echo "CV Deployed: $VERSION"
echo "CV version bumped: $VERSION"
echo "================================================"
echo ""
echo "CV should now be live at:"
echo " https://cv.ops.eblu.me/"
echo "To deploy on indri, run from gilbert:"
echo " mise run provision-indri -- --tags cv"
echo ""
echo "Then purge the Fly.io proxy cache:"
echo " fly ssh console -a blumeops-proxy -C \\"
echo " \"sh -c 'rm -rf /tmp/cache && nginx -s reload'\""

3
.gitignore vendored
View file

@ -1,5 +1,6 @@
.claude/settings.local.json
.claude/agent-memory/
.claude/scheduled_tasks.lock
# Python
__pycache__/
@ -12,3 +13,5 @@ __pycache__/
# OS
.DS_Store
/**/__pycache__
/.env

View file

@ -65,7 +65,7 @@ See [[agent-change-process]] for the full methodology.
./pulumi/ # Pulumi IaC (tailnet ACLs, dns, cloud)
~/.config/{nvim,fish} # user's shell config, managed by chezmoi
~/code/personal/ # user's projects
~/code/personal/zk # user's Obsidian-sync managed zettelkasten. Potential source for reference data.
~/code/personal/zk # user's zettelkasten (Obsidian-sync). Reference-data source; migrating into heph docs (hephaestus).
~/code/3rd/ # mirrored external projects
~/code/work # FORBIDDEN
```
@ -86,7 +86,7 @@ Most services run in minikube on indri via ArgoCD (app-of-apps, manual sync). GP
**Commands:** `argocd app list|get|diff|sync <app>`
**Login:** `argocd login argocd.ops.eblu.me --username admin --password "$(op read 'op://vg6xf6vvfmoh5hqjjhlhbeoaie/srogeebssulhtb6tnqd7ls6qey/password')"`
**Login:** `argocd login argocd.ops.eblu.me --sso` (opens browser for Authentik SSO). Admin fallback for break-glass: `argocd login argocd.ops.eblu.me --username admin --password "$(op read 'op://vg6xf6vvfmoh5hqjjhlhbeoaie/srogeebssulhtb6tnqd7ls6qey/password')"`
### Indri (Ansible)
@ -147,10 +147,16 @@ Create a new spork: `mise run spork-create <mirror-name>`
## Task Discovery
BlumeOps tasks live in [hephaestus](https://github.com/eblume/hephaestus) (`heph`),
the user's self-hosted context/task system. Fetch them with the CLI:
```fish
mise run blumeops-tasks # fetch from Todoist, sorted by priority
heph list --project Blumeops --json # outstanding Blumeops tasks as JSON
```
Most tasks are stored in `./mise-tasks/`. For scripts with any logic or
(This replaced the retired `blumeops-tasks` mise task, which read from Todoist.)
Most operational scripts are stored in `./mise-tasks/`. For scripts with any logic or
complexity, use uv run --script 's with explicit dependencies. Complex
workflows with artifacts should become dagger pipelines. Mise tasks are for
development processes and operations - tools for the user or the agent.

View file

@ -12,6 +12,259 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
<!-- towncrier release notes start -->
## [v1.17.0] - 2026-06-03
### Features
- Deploy the Adelaide / Heidi / Addie baby shower app — guest splash, raffle
picker, and prize assignment console — on ringtail k3s with `shower.eblu.me`
as the public entry and `shower.ops.eblu.me` as the tailnet admin host. App
source: [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app).
- Deploy adelaide-baby-shower-app v1.1.0 to ringtail k3s. Replaces the
boolean lock with a four-phase `ShowerState` (`pre_event``party`
`prizes_locked``event_locked`), adds an append-only "guest memories"
panel where guests can leave photos and comments for the baby, and
polishes the admin and QR views. Three Django migrations
(`0009_shower_phase`, `0010_guest_memories`, `0011_book_description`)
run automatically in the entrypoint against the SQLite PV. No config
or env-var changes.
Container build also gains a Forgejo-PyPI workaround: Forgejo's simple
index returns absolute file URLs hardcoded to the public ROOT_URL
(`forge.eblu.me`), which the Fly edge 403s on `/api/packages/*`. The
wheel and sdist are now both pulled via direct `fetchurl` against
`forge.ops.eblu.me` (tailnet-only) and the wheel is handed to pip as
a local path.
- `review-compliance-reports` now also fetches and summarizes the weekly Prowler container-image and IaC scans (previously only the K8s CIS in-cluster scan was processed). For each scan it shows status counts, severity breakdown, week-over-week delta, and — for the high-volume image/IaC scans — top-N tables grouped by check ID and resource instead of per-finding listings.
- runner-logs now authenticates with Forgejo API token and auto-detects the repo from git remote. Job logs are fetched via SSH to indri (reading Forgejo's on-disk zstd log files) instead of the web endpoint, which doesn't support token auth for private repos.
### Bug Fixes
- Fix nightly borgmatic backups failing for 2 days. The shower SQLite
dump hook referenced `kubectl --context=k3s-ringtail`, but indri's
kubeconfig deliberately doesn't carry the ringtail credentials. The
`before_backup` hook's failure aborted the entire run, taking out
*both* the local sifaka repo and the BorgBase offsite. Replaced
the inline-shell dump with a `~/bin/borgmatic-k8s-sqlite-dump`
helper deployed by the ansible role. Each dump entry now declares a
`target` of either `local:<context>` (mealie — kubectl uses indri's
kubeconfig) or `ssh:<user@host>` (shower — ssh into ringtail and
run `k3s kubectl` there, no indri-side kubeconfig needed; k3s.yaml
on ringtail is mode 644 so no sudo required). Bytes stream back via
`kubectl exec ... -- cat` rather than `kubectl cp`, since `kubectl
cp` requires `tar` inside the pod and nix-built images like shower
don't bundle it.
- Shower app container now bakes the wheel + Python deps into the image
at build time via `buildPythonPackage` instead of pip-installing on
first boot. Boots are deterministic and don't depend on forge PyPI
being reachable from the pod. The `wheelHash` in
`containers/shower/default.nix` is the sha256 sourced from the
[forge PyPI simple index](https://forge.eblu.me/api/packages/eblume/pypi/simple/adelaide-baby-shower-app/);
bumping the version means bumping that hash too.
Borgmatic now covers the shower app: SQLite is dumped from the live
pod via `kubectl exec` (mirroring the existing mealie entry, with
`context: k3s-ringtail`), and the prize-photo media share is picked up
through `/Volumes/shower` (sifaka SMB mount on indri, same pattern as
`/Volumes/photos`).
- Disabled adaptive sync (VRR) on ringtail's DP-1 output. The OMEN 27i IPS panel pumps brightness when its refresh rate swings into the low VRR range during low-framerate content (e.g. game cutscenes), producing a flicker that worsened over a session until a reboot. Pinning the panel to a fixed 165Hz eliminates it.
- Fixed forge.eblu.me static assets (CSS, JS, images, fonts) not loading — the proxy's static asset cache block was missing the `Host` header, so Caddy couldn't route the requests.
- Fixed homepage container EACCES on cold start: the nix-built image now chowns
`/app/config` to uid 1000 at build time via `fakeRootCommands`, matching the
behavior of the old Dockerfile. Without this, homepage couldn't seed missing
skeleton configs (proxmox.yaml etc.) or create `/app/config/logs`, crashing on
its first uncached request. Caught during the ringtail cutover.
- Fixed sway keybindings on ringtail — the home-manager `keybindings` block was replacing the module's defaults entirely, leaving only explicit overrides (no workspace switching, focus, move, splits, resize mode, etc). Switched to `lib.mkOptionDefault` with `lib.mkForce` on the conflicting custom binds (`Mod+Return`, `Mod+d`, `Mod+space`, `Mod+l`) so defaults merge back in. Also added `Mod+F1` to show a filterable fuzzel list of current keybindings.
Fixed fuzzel config errors on launch — `border-radius` and `border-width` were under `[main]`, but fuzzel expects them as `radius`/`width` under a `[border]` section.
- Pin the Quartz docs build to v4.5.2. The Dagger `build_docs` pipeline cloned Quartz from the default branch unpinned; Quartz v5.0.0 restructured its config layout (`.quartz/plugins`, `../quartz` imports) and broke the docs build against our existing `quartz.config.ts`/`quartz.layout.ts`.
### Infrastructure
- Wire the ringtail `blumeops-pg` cluster (which holds the wave-1-migrated
paperless + teslamate databases) into backups and Grafana. Adds a Tailscale
LoadBalancer Service (`blumeops-pg-ringtail.tail8d86e.ts.net`) and a Caddy L4
route (`pg.ops.eblu.me:5434`), then repoints borgmatic's `teslamate` +
`paperless` postgres dumps and the `mealie` SQLite dump at ringtail, and the
Grafana TeslaMate datasource at the ringtail DB. Closes the backup gap that
opened at cutover (the migrated live data was still being backed up from the
now-frozen minikube copies) and unblocks the wave-1 decommission.
- Migrated homepage dashboard from minikube (indri/arm64) to k3s (ringtail/amd64).
The container is now built via nix (`containers/homepage/default.nix`), adapted
from nixpkgs `homepage-dashboard` with the upstream Next.js cache patches and
wrapped with `dockerTools.buildLayeredImage`. Autodiscovery shifts: services on
minikube (ArgoCD, Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus,
Navidrome, Paperless, TeslaMate, Transmission) become explicit static entries
in `services.yaml`; ringtail services (Authentik, Frigate/NVR, Ntfy, Ollama)
auto-populate via Ingress annotations.
- Migrated CV (`cv.eblu.me`) and Docs (`docs.eblu.me`) from minikube Deployments to indri-native ansible roles. Caddy now serves the extracted release tarballs directly via a new `kind: static` service-block in the Caddy template — no daemon, no container — replacing the prior nginx-in-a-pod layer. Removes a network hop on every request and shrinks minikube's footprint. See [[cv-on-indri]] and [[docs-on-indri]]. Part of the broader minikube wind-down.
- Migrated devpi (PyPI mirror at `pypi.ops.eblu.me`) from a minikube StatefulSet to a launchd-managed service on indri. devpi-server now runs in a uv-managed venv with pinned `devpi-server` and `devpi-web` versions, listens on `127.0.0.1:3141`, and is fronted by Caddy. The minikube StatefulSet was crash-looping under memory pressure (and breaking the Python toolchain everywhere); the new layout removes a layer of dependency on cluster health for critical-path tooling. See [[devpi-on-indri]].
- Move the entire Immich stack — server, machine-learning, valkey,
and the PostgreSQL+VectorChord cluster — off `minikube-indri` and
onto `k3s-ringtail`. Postgres data migrated zero-loss via CNPG
`pg_basebackup` (replica catch-up then promote); row counts on
`asset`, `user`, `album`, `smart_search`, `activity`, `asset_face`
verified equal between source and replica before cutover. The ML
pod now uses ringtail's RTX 4080 via the nvidia-device-plugin
(time-slicing bumped 2 → 4 to share with frigate + ollama). Caddy
routing at `photos.ops.eblu.me` is unchanged (still
`photos.tail8d86e.ts.net`, the device just lives on ringtail now).
Borgmatic backups continue against the same `immich-pg` tailnet
hostname. First concrete chain in the broader indri-k8s
decommission effort.
- Add local nix container build for `tailscale` (`containers/tailscale/default.nix`) so ringtail's tailscale-operator ProxyClass proxy pods pull from the forge mirror instead of `docker.io/tailscale/tailscale`. Pinned at v1.94.2 to match `service-versions.yaml`. Indri's tailscale-operator continues to use upstream during the k8s-to-ringtail migration.
- Address the 6 critical Prowler IaC findings against `argocd/manifests/`. Prowler's IaC provider hardcodes `self._mutelist = None` and delegates filtering to Trivy, but doesn't plumb `--ignorefile` through — so the documented "use Trivy filtering" path is actually broken. Added a shim around `trivy` in the Prowler image that injects `--ignorefile $TRIVY_IGNOREFILE` for `trivy fs` invocations when the env var points at a real file. The IaC cronjob now mounts `mutelist/trivyignore.yaml` (Trivy's per-path schema) and sets the env var, muting the `external-secrets` and `kube-state-metrics` Secret-access findings (KSV-0041, KSV-0114). Separately, `grafana-clusterrole` is tightened to remove `secrets` access entirely: the dashboard sidecar already only consumes ConfigMap-labeled dashboards, so its `RESOURCE` env var is now `configmap` instead of `both`.
- Pin ringtail's wired IP to `192.168.1.21` via NixOS scripted networking; NetworkManager no longer manages `enp5s0`. Removes DHCP lease renewal as a failure mode after a silent lease teardown took ringtail offline. Also explicitly enables `net.ipv4.ip_forward` (previously set implicitly by scripted-DHCP) so k3s pod networking and Tailscale routing continue to work with static networking.
- Ripped out the compensating-controls (CC) framework: deleted `compensating-controls.yaml`, the `review-compensating-controls` mise task, and the associated how-to / explanation docs. Prowler and Kingfisher continue to run weekly and produce reports; the Prowler mutelist YAML files remain in place but no longer carry `CC: <id>` prefixes — each entry just keeps a free-form `Description` of why the finding is muted. The CC review cadence proved to be more overhead than this single-operator homelab needed.
- Wire shower app for public exposure: fly nginx `shower.eblu.me` server
block as a guest-only surface — splash page, `/prizes/<token>/`, static
assets, media. Everything authenticated (`/admin/`, `/host/`,
`/accounts/`) returns 403 with a "tailnet only" pointer. Staff hit
`shower.ops.eblu.me` for the operator console + admin; the app's
v1.0.1 `DJANGO_PUBLIC_URL_BASE` setting makes QR codes generated on
the tailnet point back at the WAN host for guests. Plus a Caddy route
on indri, Pulumi Gandi CNAME, and a Grafana APM dashboard tracking
request rate, error rate, latency, bandwidth, and access logs.
- Mirror Valkey 8.1 locally as `registry.ops.eblu.me/blumeops/valkey`. Replaces direct pulls of `docker.io/valkey/valkey:8.1-alpine` for paperless and immich sidecars. Built via native Dagger pipeline on Alpine 3.22. Stateless swap — no data migration. Authentik's nix-built Redis remains separate.
- Add nix-built amd64 valkey for ringtail (`containers/valkey/default.nix`) so immich-ringtail can stop pulling the upstream multi-arch `docker.io/valkey/valkey` image. Existing `container.py` continues to build Alpine arm64 for paperless on indri. Both bump to valkey 8.1.7 (Alpine 3.22 8.1.7-r0 / nixpkgs 8.1.7).
- Upgrade Grafana Alloy v1.14.0 → v1.16.0 across all four service deployments
(alloy-k8s, alloy-ringtail, alloy-tracing-ringtail on k8s; alloy native on
indri). Pulls in stable database observability (v1.15) and the OTel Collector
v0.147.0 bump. Container build also migrated from Dockerfile to native Dagger
`container.py` per the build-container-image migration playbook.
- Upgraded Dagger from v0.20.1 to v0.20.6 (engine, CLI pin, and SDK regen) and migrated `runner-job-image` from a Debian-based Dockerfile to a native Dagger `container.py` on Alpine 3.23, reusing the shared `alpine_runtime` helper.
- Decommission the wave-1 services on minikube-indri now that paperless,
teslamate, and mealie run on ringtail with their data backed up. Removes the
minikube `paperless`/`teslamate`/`mealie` manifest dirs + ArgoCD app
definitions (pruning the parked Deployments, Services, and the redundant
minikube mealie/paperless PVCs), and drops the `paperless`/`teslamate` roles
from the minikube `blumeops-pg` cluster. The `paperless` and `teslamate`
databases are dropped from indri's blumeops-pg as the finalization step.
miniflux + authentik remain on the minikube cluster (later waves).
- Upgraded the k8s Forgejo runner to the v12.8 line, switched it from first-boot registration to declarative `server.connections` credentials from 1Password, and consolidated the supporting runner how-to documentation.
- Move paperless, teslamate, and mealie off `minikube-indri` onto
`k3s-ringtail`, shedding ~1.1 GiB of resident load from the
OOM-thrashing 8 GiB minikube node (the kernel OOM killer had been
killing `kube-apiserver`/`dockerd`/argocd, flapping every
minikube-hosted service at once). paperless + teslamate databases
move into a fresh CNPG `blumeops-pg` cluster on ringtail via a cold
`pg_dump`/`pg_restore` from the quiesced source — row counts verified
equal before any routing flip; source DBs dropped only after the
ringtail side serves traffic. mealie's SQLite PVC is copied as-is.
paperless media stays on sifaka NFS. Downtime-tolerant cold cutover
(no streaming replication); rollback is repoint-and-scale-up with the
source untouched. Second chain in the indri-k8s decommission after
[[migrate-immich-to-ringtail]].
- Recurring maintenance batch:
- Ringtail flake inputs refreshed (`disko`, `home-manager`, `nixpkgs`).
- Tooling deps bumped: prek hooks (trufflehog v3.95.3, kingfisher v1.101.0, ruff v0.15.14, `ansible-core` 2.21.0); fly proxy base images (nginx 1.30.1-alpine, alloy v1.16.1); `typer==0.26.2` in mise tasks.
- Updated `nixos/ringtail/flake.lock` (weekly cadence): `disko`, `home-manager`, and `nixpkgs` inputs refreshed. `nixpkgs-services` skipped per overlay convention.
- Reviewed `mealie` service version freshness; upstream is 5 minor versions ahead (v3.17.0 vs deployed v3.12.0). Marked reviewed; upgrade deferred.
- Deploy shower v1.1.2 — bump container build to new app release.
- Upgrade unpoller v2.34.0 → v3.2.0 and migrate container build from Dockerfile to native Dagger (container.py). v3.0.0 carries breaking UniFi API changes; v3.2.0 introduces a 60s background poll (cached scrapes) by default — set `interval = 0` in `up.conf` to restore on-demand polling.
- Monthly tooling dependency refresh: prek hooks (trufflehog, kingfisher, ruff, shfmt, prettier, actionlint, ansible-lint), fly proxy base images (nginx 1.30.0, tailscale v1.94.2, alloy v1.16.0), normalize pyyaml lower bound in mise-tasks.
- Add GE-Proton (`pkgs.proton-ge-bin`) to `programs.steam.extraCompatPackages`
on ringtail. Subnautica 2 hangs at Mercuna plugin init under Proton
Experimental + DXVK D3D12; GE-Proton is available as a Steam per-game
compatibility option to work around it.
- Add `sn2-prelaunch` Steam launch wrapper on ringtail that removes
Subnautica 2's stale `Saved/running.dat` and `Saved/beforelobby.dat`
lockfiles before each launch. SN2 pops up an invisible (0×0-sized)
Error dialog when it detects an unclean exit, blocking GameThread
forever; this is observable only as a black screen with a spinning
loader. Use via Steam launch option: `sn2-prelaunch %command%`.
- Add local nix container build for `frigate-notify` (`containers/frigate-notify/default.nix`) so the Frigate→ntfy bridge is rebuilt on ringtail from the forge mirror instead of pulled from `ghcr.io/0x2142/frigate-notify`.
- Add resource limits to all ArgoCD pods to prevent unbounded resource consumption during node-wide pressure events.
- Black-hole the `/mirrors/*` repositories at the Fly proxy edge (`return 403``forge.ops.eblu.me`). A surprise $29.60 Fly bill traced to ~1.24 TB/30d of egress on `forge.eblu.me`, 99.95% of all proxy egress — of which ~71% was AI scrapers (Meta `meta-externalagent`, OpenAI `GPTBot`, Amazonbot) crawling the near-infinite git-history URL space of the public mirror repos and timing out Forgejo in the process. Mirrors exist for supply-chain control and are consumed over the tailnet, so their public web UI had no legitimate audience. `robots.txt` already disallowed `/mirrors/`, but the offending agents ignore it. Tier-2 mitigations (user-agent denylist, Anubis proof-of-work gateway) are documented in `docs/explanation/ai-scraper-mitigation.md`.
- Bump paperless and immich kustomizations to the main-SHA-built valkey tag (`v8.1.6-r0-fabca04`). Routine post-merge follow-up to keep production manifests pointing at images built from a commit on main.
- Bump shower container to v1.1.1 (probe FOD hash).
- Bumped shower app to v1.1.3 (wheel/sdist + FOD hashes probed on ringtail).
- Cap systemd-coredump on ringtail (ProcessSizeMax/ExternalSizeMax 1G, MaxUse 2G) so multi-GB Wine/Proton game crash dumps no longer thrash the disk and lock up the desktop.
- Deploy shower v1.1.1 to ringtail (kustomize newTag bump).
- Deployed shower v1.1.3 to ringtail (image built and pushed from ringtail; runner bypassed due to indri overload).
- Fix three follow-ups from the wave-1 decommission: grant the local
break-glass `admin` account ArgoCD admin rights (`g, admin, role:admin`
previously only the Authentik `admins` group had access, so admin was
locked out whenever its token expired), and repoint the alloy blackbox
probe for teslamate from the deleted minikube service to
`https://tesla.ops.eblu.me/` (through Caddy over Tailscale). The orphaned
paperless/teslamate roles + ExternalSecrets left on the minikube
blumeops-pg are also cleaned up.
- Moved the Immich blackbox health probe from indri's alloy to ringtail's alloy. After the immich migration to ringtail, the probe still targeted `immich-server.immich.svc.cluster.local` on indri's cluster where the service no longer exists, causing a persistent `ServiceProbeFailure` alert.
- Pin shower v1.1.1 FOD outputHash (probed locally on ringtail).
- Rebuild Prowler container against main HEAD (v5.23.0-495e45d) after merging the IaC mutelist Dockerfile changes.
- Rebuild and retag alloy v1.16.0 container images from the main-branch SHA
following the squash-merge of #345, per the build-container-image
squash-merge convention. Both images (`registry.ops.eblu.me/blumeops/alloy`)
now reference `9564435` rather than the branch SHA `26a3ab5`, restoring
source traceability after branch cleanup.
- Rebuild shower from the post-merge commit on main so the container's
SHA tag points at a commit that will still exist after the 30-day
branch-cleanup window. Functionally identical to the branch-tag image
already deployed, just preserves source traceability per
[[build-container-image#Squash-merge and container tags]].
- Rebuild unpoller container from squashed main commit so the image SHA tag matches a commit in main's history (was tagged with the pre-squash branch SHA).
- Rebuild valkey container from squashed main commit (both arm64 dagger and amd64 nix variants), and update paperless + immich-ringtail kustomizations to the main-SHA tags `v8.1.7-ecded30` and `v8.1.7-ecded30-nix`.
- Retired the `blumeops-tasks` mise task (Todoist API) in favor of `heph list --project Blumeops --json` from the self-hosted [hephaestus](https://github.com/eblume/hephaestus) system. Updated docs to point task discovery and rotation reminders at heph, and noted that the `~/code/personal/zk` zettelkasten is migrating into heph docs.
- Switch the Fly proxy deploy strategy from `bluegreen` to `immediate` in `fly/fly.toml`. With a single proxy machine, bluegreen offers little benefit — the green machine routinely failed to reach "started" inside Fly's default 5-minute deploy timeout (the cold-start sequence of `tailscaled``tailscale up` → wait-for-MagicDNS → nginx startup eats most of the budget), and the failed deploys would roll back. `immediate` replaces the machine in place with a brief downtime (~510s) but actually completes.
- Switch the ringtail provisioning playbook's blumeops clone URL from `forge.eblu.me` (public, via Fly proxy) to `forge.ops.eblu.me` (tailnet, direct via Caddy on indri). Ringtail is always on the tailnet, so the WAN round-trip is pure overhead — it also made `provision-ringtail` brittle whenever the Fly proxy was slow or down.
- Switched Grafana's deployment strategy from `RollingUpdate` to `Recreate`. With an RWO PVC holding the SQLite database and Bleve search index, `RollingUpdate` reliably crashloops the new pod on the index lock until rollout timeout. `Recreate` terminates the old pod first so the new one acquires the lock cleanly.
- Update `tailscale-operator-ringtail` ProxyClass to reference the `0108b68` main-SHA build of the tailscale container. Routine post-merge cleanup so the deployed image traces to a commit that survives PR branch cleanup.
- Update the ringtail NixOS flake lockfile (`nixos/ringtail/flake.lock`): bump
`nixpkgs` (b77b3de → 25f5383) and `disko` (5ba0c95 → 115e521) to latest.
`nixpkgs-services` was intentionally left pinned (skipped by the
`flake-update` pipeline). Routine recurring maintenance per [[manage-lockfile]].
- Upgrade native macOS Alloy on indri to v1.16.0. Built on gilbert with Go
1.26.2 + CGO (required for the macOS native DNS resolver, which Tailscale
MagicDNS depends on), scp'd to `~/.local/bin/alloy` on indri, codesigned,
and the LaunchAgent reloaded. Completes the v1.16.0 fleet upgrade started
in #345 — all four Alloy services (alloy-k8s, alloy-ringtail,
alloy-tracing-ringtail, alloy ansible) now run v1.16.0.
- Upgraded zot on indri from v2.1.15 to v2.1.16 (security fixes: TLS verification on metrics client, CORS Allow-Credentials suppression on wildcard origins, manifest/API-key body size limits).
### Documentation
- Reviewed `replicating-blumeops` tutorial: fixed "BluemeOps" typos (also in `contributing.md`) and added `last-reviewed` frontmatter.
- Reviewed [[indri]] reference card: added `devpi`, `cv`, and `docs` to the native-services list; widened the k8s note to reflect the growing set of apps now on ringtail and the planned indri-minikube decommission; added CPU/RAM specs.
- New how-to: rotate-fly-deploy-token. Documents the 75-day rotation cadence, why we use `org`-scoped tokens (silences the cosmetic metrics-token warning on `fly status` with marginal blast-radius cost given the single-app personal org), and the procedure for rotation + Forgejo Actions secret sync.
- Add `docs/explanation/ai-scraper-mitigation.md` — the egress-cost / AI-crawler threat model for the public Fly proxy, the tiered mitigation plan (Tier 1: mirror black-hole, shipped; Tier 2: user-agent denylist + Anubis; Tier 3: Cloudflare, rejected on principle), and the data behind it.
- Fix manage-forgejo-mirrors verify step — sync button is on the repo settings page ("Synchronize now"), not the main repo page.
- Fixed the `op item edit` invocation in the [[zot]] API-key rotation procedure: the previous `pbpaste | op item edit ... "field[password]=-"` stdin syntax is rejected by op 2.34 as "invalid JSON" (recent op versions treat piped input as a full JSON template, not a single field value). Procedure now reads the clipboard into a local fish variable and passes it as an inline assignment.
- Fixed the export-filename step in [[run-1password-backup]]: 1Password's desktop app names the export `1PasswordExport-<account-uuid>-<timestamp>.1pux` automatically rather than letting you save to a fixed name, so the procedure now points the task at that glob instead of pretending the default name is `1Password-export.1pux`.
- Refresh the contributing tutorial: add `last-reviewed`, include the `.ai.md` changelog fragment type, and clarify that `prek` is pinned via `mise`.
- Review and refresh the Navidrome reference card: add `last-reviewed`, correct the scanner env var name, document the current image/version, and record routing and runtime details from the manifests.
- Review and refresh the Ollama reference card: add `last-reviewed`, bump the documented image tag to 0.20.4, and add the two `qwen3.5` models now declared in `models.txt`.
- Reviewed [[1password]] reference card: added the `blumeops` vs `Personal` vault split, noted that `onepassword-connect` runs on both indri and ringtail (not just one cluster), and pulled the `op read` vs `op item get --fields` guidance up from agent memory into the card.
- Reviewed `index.md`; added ringtail to the infrastructure overview and stamped `last-reviewed`.
- Reviewed transmission card: corrected storage layout (`/config/` is emptyDir, watch dir disabled) and noted the Prometheus exporter sidecar.
- rotate-fly-deploy-token: combine mint+store into one command with both fish and bash forms; document the `op item edit` "Password item requires ps value" validator gotcha and the placeholder-password workaround.
### AI Assistance
- Adopt `AGENTS.md` as the canonical agent instruction file, keep `CLAUDE.md` as a compatibility shim, and update docs to reference the neutral file and the correct agent-change-process path.
- CLAUDE.md now imports AGENTS.md via `@AGENTS.md` instead of telling agents to go read it. Claude Code only auto-loads CLAUDE.md, so the prose shim was easy to skip; the import inlines AGENTS.md into the session prompt unconditionally.
### Miscellaneous
- Removed the dead minikube manifests, container builds, and tooling shims left behind after the cv + docs migration to indri-native (#342). Deletes `argocd/{apps,manifests}/{cv,docs}/`, `containers/{cv,quartz}/`, and the `quartz``docs` mapping in `mise-tasks/container-version-check`. Bumps `docs.current-version` to `v1.16.0` (the blumeops release tag) now that the legacy nginx-base version pin is gone.
- Rebuild shower v1.1.0 container from main HEAD (`3c7967e`) and bump the
kustomization tag to `v1.1.0-3c7967e-nix`. The PR was squash-merged, so
the branch commit `444ff91` baked into the prior tag isn't reachable
from main's history. The new tag points at a commit that exists on
main; image content is byte-identical because the FOD output is content
addressed and the inputs didn't change.
- Rebuild shower v1.1.2 from main HEAD (a33fa47) and retag — PR #358 was squash-merged so the branch SHA baked into the prior image tag isn't reachable from main. FOD is content-addressed, so image bytes are identical; only provenance changes.
- Remove the duplicate Homepage tiles for Mealie, Paperless, Immich, and
TeslaMate. Homepage runs on ringtail and autodiscovers ringtail Ingresses via
`gethomepage.dev/*` annotations; once these services migrated to ringtail they
were discovered automatically, making their leftover static `services.yaml`
entries (needed only while they lived on minikube) redundant.
- Removed the now-unused `containers/devpi/` Dagger build artifact. Devpi runs natively on indri via uv venv; the container image is no longer referenced anywhere. Doc examples in `docs/reference/tools/dagger.md` updated to use `miniflux` as the example container name.
- `container-build-and-release` now prints the specific `mise run runner-logs <N>` command after dispatching, polling the Forgejo API to resolve the run number for the commit it just triggered.
- `mise run runner-logs <run> -j <n>` now reports a clear error when the log file doesn't exist on indri (e.g. a runner crash that left `action_task.log_in_storage = 0`). Previously it printed only the header and exited 0, because `zstdcat` exits 0 with a "can't stat … -- ignored" stderr message and ssh+fish on indri swallows the remote exit code.
## [v1.16.0] - 2026-04-18
### Infrastructure

View file

@ -1,7 +1 @@
# CLAUDE.md
Claude Code compatibility shim.
The canonical agent instructions for this repository now live in [`AGENTS.md`](AGENTS.md).
If a tool specifically looks for `CLAUDE.md`, read `AGENTS.md` and follow that file as the source of truth.
@AGENTS.md

View file

@ -212,6 +212,23 @@
no_log: true
tags: [forgejo_metrics]
# Devpi root password (PyPI mirror admin)
- name: Fetch devpi root password
ansible.builtin.command:
cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/kyhzfifryqnuk7jeyibmmjvxxm/add more/root password"
delegate_to: localhost
register: _devpi_root_password
changed_when: false
no_log: true
check_mode: false
tags: [devpi]
- name: Set devpi root password fact
ansible.builtin.set_fact:
devpi_root_password: "{{ _devpi_root_password.stdout }}"
no_log: true
tags: [devpi]
roles:
- role: alloy
tags: alloy
@ -227,6 +244,8 @@
tags: zot
- role: zot_metrics
tags: zot_metrics
- role: devpi
tags: devpi
- role: minikube
tags: minikube
- role: minikube_metrics
@ -237,5 +256,11 @@
tags: jellyfin_metrics
- role: forgejo_metrics
tags: forgejo_metrics
- role: cv
tags: cv
- role: docs
tags: docs
- role: heph
tags: heph
- role: caddy
tags: caddy

View file

@ -57,7 +57,7 @@
tasks:
- name: Ensure blumeops repo is present
ansible.builtin.git:
repo: "https://forge.eblu.me/eblume/blumeops.git"
repo: "https://forge.ops.eblu.me/eblume/blumeops.git"
dest: /etc/blumeops
version: "{{ ringtail_commit | default('main') }}"
force: true

View file

@ -27,6 +27,9 @@ borgmatic_source_directories:
- /Users/erichblume/.config/borgmatic
- /Users/erichblume/Documents
- /Users/erichblume/.local/share/borgmatic/k8s-dumps
# Shower app prize-photo uploads (sifaka SMB mount). Mounted manually
# on indri via Finder — see docs/how-to/operations/shower-app.md.
- /Volumes/shower
# Backup repositories
borgmatic_repositories:
@ -53,7 +56,17 @@ borgmatic_k8s_sqlite_dumps:
namespace: mealie
label_selector: app=mealie
db_path: /app/data/mealie.db
context: minikube
# migrated to ringtail (wave-1); ssh to ringtail and run k3s kubectl
# there, same as shower below.
target: ssh:eblume@ringtail
- name: shower
namespace: shower
label_selector: app=shower
db_path: /app/data/db.sqlite3
# ssh to ringtail and run k3s kubectl there — avoids needing a
# ringtail kubeconfig on indri. k3s.yaml on ringtail is
# world-readable (mode 644), so no sudo required.
target: ssh:eblume@ringtail
# Exclude patterns
borgmatic_exclude_patterns: []
@ -90,17 +103,18 @@ borgmatic_postgresql_databases:
hostname: pg.ops.eblu.me
port: 5432
username: borgmatic
- name: teslamate
hostname: pg.ops.eblu.me
port: 5432
username: borgmatic
- name: authentik
hostname: pg.ops.eblu.me
port: 5432
username: borgmatic
# migrated to ringtail blumeops-pg (wave-1); port 5434 = Caddy L4 route
- name: teslamate
hostname: pg.ops.eblu.me
port: 5434
username: borgmatic
- name: paperless
hostname: pg.ops.eblu.me
port: 5432
port: 5434
username: borgmatic
# immich-pg cluster (VectorChord) via Caddy L4 on port 5433
- name: immich

View file

@ -19,8 +19,10 @@
ansible.builtin.copy:
content: |
# Managed by ansible (borgmatic role) - k8s PostgreSQL backup credentials
# 5432 = minikube blumeops-pg, 5433 = immich-pg, 5434 = ringtail blumeops-pg
pg.ops.eblu.me:5432:*:borgmatic:{{ borgmatic_db_password }}
pg.ops.eblu.me:5433:*:borgmatic:{{ borgmatic_db_password }}
pg.ops.eblu.me:5434:*:borgmatic:{{ borgmatic_db_password }}
dest: ~/.pgpass
mode: '0600'
no_log: true
@ -49,6 +51,20 @@
mode: '0700'
when: borgmatic_k8s_sqlite_dumps | length > 0
- name: Ensure ~/bin exists
ansible.builtin.file:
path: "{{ ansible_env.HOME }}/bin"
state: directory
mode: '0755'
when: borgmatic_k8s_sqlite_dumps | length > 0
- name: Deploy k8s SQLite dump helper script
ansible.builtin.template:
src: k8s-sqlite-dump.sh.j2
dest: "{{ ansible_env.HOME }}/bin/borgmatic-k8s-sqlite-dump"
mode: '0755'
when: borgmatic_k8s_sqlite_dumps | length > 0
- name: Deploy borgmatic configuration
ansible.builtin.template:
src: config.yaml.j2

View file

@ -32,12 +32,20 @@ exclude_patterns:
encryption_passcommand: {{ borgmatic_encryption_passcommand }}
{% if borgmatic_k8s_sqlite_dumps %}
# Pre-backup: dump SQLite databases from k8s pods
# Uses sqlite3 .backup for a safe, consistent copy (no corruption from concurrent writes)
# Pre-backup: dump SQLite databases from k8s pods.
# Uses sqlite3.backup() for a safe, consistent copy.
#
# Quoting/escaping is delegated to ~/bin/borgmatic-k8s-sqlite-dump
# (deployed by the borgmatic ansible role). Each entry's `target`
# is either:
# - local:<context> -> local kubectl with --context (mealie etc.)
# - ssh:<user@host> -> ssh + k3s kubectl on the cluster host,
# used for ringtail since indri's kubeconfig
# deliberately doesn't carry that context.
before_backup:
- mkdir -p {{ borgmatic_k8s_dump_dir }}
{% for db in borgmatic_k8s_sqlite_dumps %}
- /opt/homebrew/bin/kubectl --context={{ db.context }} exec -n {{ db.namespace }} deploy/{{ db.name }} -- python3 -c "import sqlite3; sqlite3.connect('{{ db.db_path }}').backup(sqlite3.connect('/tmp/{{ db.name }}-backup.db'))" && /opt/homebrew/bin/kubectl --context={{ db.context }} cp {{ db.namespace }}/$(/opt/homebrew/bin/kubectl --context={{ db.context }} get pod -n {{ db.namespace }} -l {{ db.label_selector }} -o jsonpath='{.items[0].metadata.name}'):/tmp/{{ db.name }}-backup.db {{ borgmatic_k8s_dump_dir }}/{{ db.name }}.db
- {{ ansible_env.HOME }}/bin/borgmatic-k8s-sqlite-dump {{ db.target }} {{ db.namespace }} {{ db.label_selector }} {{ db.db_path }} {{ db.name }} {{ borgmatic_k8s_dump_dir }}/{{ db.name }}.db
{% endfor %}
{% endif %}

View file

@ -0,0 +1,73 @@
#!/usr/bin/env bash
# {{ ansible_managed }}
#
# Helper script invoked by borgmatic's before_backup hook to capture a
# k8s pod's SQLite database. Keeps the borgmatic config readable by
# pulling all the quoting out of YAML.
#
# Usage:
# borgmatic-k8s-sqlite-dump <target> <namespace> <selector> \
# <db_path> <name> <dump_target>
#
# <target> is one of:
# local:<context> - run local kubectl with --context=<context>
# ssh:<user@host> - ssh to host and run k3s kubectl there
# (no indri-side kubeconfig needed)
#
# <namespace> - k8s namespace of the pod
# <selector> - label selector to find the pod (e.g. app=shower)
# <db_path> - absolute path inside the pod to the SQLite DB
# <name> - short name used for temp filenames
# <dump_target> - file on this host to receive the dump
set -euo pipefail
target=${1:?missing target}
namespace=${2:?missing namespace}
selector=${3:?missing selector}
db_path=${4:?missing db path}
name=${5:?missing name}
dump_target=${6:?missing dump target}
# Stage the backup next to the source DB (a guaranteed-writable volume);
# minimal nix images (e.g. mealie) have no /tmp.
pod_tmp="$(dirname "$db_path")/.borgmatic-backup-${name}.db"
python_backup='import sqlite3; sqlite3.connect("'"$db_path"'").backup(sqlite3.connect("'"$pod_tmp"'"))'
mode=${target%%:*}
ref=${target#*:}
case "$mode" in
local)
# Pulls dump bytes out via "kubectl exec -- cat" rather than
# "kubectl cp", which would otherwise need tar inside the pod
# (nix-built images like shower don't bundle tar).
context=$ref
kubectl="/opt/homebrew/bin/kubectl --context=$context -n $namespace"
pod=$($kubectl get pod -l "$selector" \
-o jsonpath='{.items[0].metadata.name}')
$kubectl exec "$pod" -- python3 -c "$python_backup"
$kubectl exec "$pod" -- cat "$pod_tmp" > "$dump_target"
$kubectl exec "$pod" -- rm -f "$pod_tmp"
;;
ssh)
host=$ref
# Force bash on the remote (user's login shell on ringtail is
# fish). Pipe the script via stdin to dodge nested quoting.
# The dump bytes come back over the ssh stdout stream — no
# intermediate scp, no tar requirement in the pod.
ssh "$host" bash <<EOF > "$dump_target"
set -euo pipefail
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
pod=\$(k3s kubectl -n "$namespace" get pod -l "$selector" -o jsonpath='{.items[0].metadata.name}')
k3s kubectl -n "$namespace" exec "\$pod" -- python3 -c '$python_backup' 1>&2
k3s kubectl -n "$namespace" exec "\$pod" -- cat "$pod_tmp"
k3s kubectl -n "$namespace" exec "\$pod" -- rm -f "$pod_tmp" 1>&2
EOF
;;
*)
echo "borgmatic-k8s-sqlite-dump: unknown target mode: $mode" >&2
echo " expected local:<context> or ssh:<user@host>" >&2
exit 1
;;
esac

View file

@ -51,7 +51,10 @@ caddy_services:
backend: "https://feed.tail8d86e.ts.net"
- name: devpi
host: "pypi.{{ caddy_domain }}"
backend: "https://pypi.tail8d86e.ts.net"
backend: "http://localhost:3141"
- name: heph
host: "heph.{{ caddy_domain }}"
backend: "http://localhost:8787" # hephaestus hub (server mode) + PWA shell
- name: kiwix
host: "kiwix.{{ caddy_domain }}"
backend: "https://kiwix.tail8d86e.ts.net"
@ -72,10 +75,16 @@ caddy_services:
backend: "https://go.tail8d86e.ts.net"
- name: docs
host: "docs.{{ caddy_domain }}"
backend: "https://docs.tail8d86e.ts.net"
kind: static
root: "{{ docs_content_dir }}"
try_html: true # Quartz: path → path/ → path.html → 404.html
- name: cv
host: "cv.{{ caddy_domain }}"
backend: "https://cv.tail8d86e.ts.net"
kind: static
root: "{{ cv_content_dir }}"
download_paths:
- path: /resume.pdf
filename: erich-blume-resume.pdf
- name: nvr
host: "nvr.{{ caddy_domain }}"
backend: "https://nvr.tail8d86e.ts.net"
@ -95,6 +104,9 @@ caddy_services:
- name: paperless
host: "paperless.{{ caddy_domain }}"
backend: "https://paperless.tail8d86e.ts.net"
- name: shower
host: "shower.{{ caddy_domain }}"
backend: "https://shower.tail8d86e.ts.net"
- name: sifaka
host: "nas.{{ caddy_domain }}"
backend: "http://sifaka:5000"
@ -108,6 +120,8 @@ caddy_tcp_services:
backend: "pg.tail8d86e.ts.net:5432" # PostgreSQL (blumeops-pg)
- port: 5433
backend: "immich-pg.tail8d86e.ts.net:5432" # PostgreSQL (immich-pg)
- port: 5434
backend: "blumeops-pg-ringtail.tail8d86e.ts.net:5432" # PostgreSQL (blumeops-pg on ringtail)
- port: "{{ sifaka_node_exporter_port }}"
backend: "sifaka:{{ sifaka_node_exporter_port }}" # Sifaka node_exporter
- port: "{{ sifaka_smartctl_exporter_port }}"

View file

@ -31,6 +31,25 @@
{% for service in caddy_services %}
@{{ service.name }} host {{ service.host }}
handle @{{ service.name }} {
{% if service.kind | default('proxy') == 'static' %}
root * {{ service.root }}
encode gzip
# Long-cache fingerprinted assets; everything else stays default.
@{{ service.name }}_assets path_regexp \.(css|js|png|jpg|jpeg|gif|ico|svg|woff|woff2)$
header @{{ service.name }}_assets Cache-Control "public, max-age=31536000, immutable"
{% for dl in service.download_paths | default([]) %}
@{{ service.name }}_dl{{ loop.index }} path {{ dl.path }}
header @{{ service.name }}_dl{{ loop.index }} Content-Disposition `attachment; filename="{{ dl.filename }}"`
{% endfor %}
{% if service.try_html | default(false) %}
# Quartz clean URLs: path → path/ → path.html → /404.html (200).
# Caddy's handle_errors is a top-level directive and can't live in
# this nested handle, so the 404 page rides as the final try_files
# candidate (served with 200 — acceptable for a human-facing 404).
try_files {path} {path}/ {path}.html /404.html
{% endif %}
file_server
{% else %}
{% if service.cache_policy | default('') == 'spa' %}
# SPA cache policy: hashed static assets are immutable, HTML must revalidate.
# Prevents stale HTML from referencing chunk hashes that no longer exist.
@ -47,6 +66,7 @@
}
{% else %}
reverse_proxy {{ service.backend }}
{% endif %}
{% endif %}
}

View file

@ -0,0 +1,10 @@
---
# CV / resume static site (native, replaces minikube Deployment)
# Caddy serves cv_content_dir directly via the static-kind service block.
cv_version: "v1.0.3"
cv_release_url: "https://forge.ops.eblu.me/api/packages/eblume/generic/cv/{{ cv_version }}/cv-{{ cv_version }}.tar.gz"
cv_home: /Users/erichblume/blumeops/cv
cv_content_dir: "{{ cv_home }}/content"
cv_version_sentinel: "{{ cv_home }}/.installed-version"

View file

@ -0,0 +1,57 @@
---
# cv role — download and extract the CV release tarball into cv_content_dir.
# Caddy serves the directory directly; there is no daemon to manage.
#
# Idempotency: a sentinel file records the installed cv_version. The
# download/extract steps only run when the sentinel doesn't match cv_version.
#
# We use curl rather than ansible.builtin.get_url because the forge generic-
# packages endpoint returns 405 on HEAD requests, which get_url issues before
# downloading.
- name: Ensure cv home exists
ansible.builtin.file:
path: "{{ cv_home }}"
state: directory
mode: '0755'
- name: Read installed cv version sentinel
ansible.builtin.slurp:
src: "{{ cv_version_sentinel }}"
register: cv_installed_raw
failed_when: false
changed_when: false
- name: Set installed cv version fact
ansible.builtin.set_fact:
cv_installed_version: >-
{{ (cv_installed_raw.content | b64decode).strip()
if (cv_installed_raw.content is defined) else '' }}
- name: Recreate cv content dir
ansible.builtin.file:
path: "{{ cv_content_dir }}"
state: "{{ item }}"
mode: '0755'
loop:
- absent
- directory
when: cv_installed_version != cv_version
- name: Download and extract cv release tarball
ansible.builtin.shell:
cmd: >-
set -euo pipefail;
curl -fsSL {{ cv_release_url | quote }} -o {{ cv_home }}/cv.tar.gz &&
tar -xzf {{ cv_home }}/cv.tar.gz -C {{ cv_content_dir }} &&
rm -f {{ cv_home }}/cv.tar.gz
executable: /bin/bash
when: cv_installed_version != cv_version
changed_when: true
- name: Write cv version sentinel
ansible.builtin.copy:
content: "{{ cv_version }}\n"
dest: "{{ cv_version_sentinel }}"
mode: '0644'
when: cv_installed_version != cv_version

View file

@ -0,0 +1,21 @@
---
# devpi PyPI caching mirror (native launchd, replaces minikube StatefulSet)
devpi_home: /Users/erichblume/devpi
devpi_venv: "{{ devpi_home }}/venv"
devpi_server_dir: "{{ devpi_home }}/server-dir"
devpi_binary: "{{ devpi_venv }}/bin/devpi-server"
devpi_init_binary: "{{ devpi_venv }}/bin/devpi-init"
devpi_python_version: "3.12"
devpi_server_version: "6.19.3"
devpi_web_version: "5.0.2"
devpi_host: 127.0.0.1
devpi_port: 3141
devpi_outside_url: "https://pypi.ops.eblu.me"
devpi_log_dir: /Users/erichblume/Library/Logs
# uv binary on indri — mise shim so version bumps via `mise upgrade uv` flow through transparently
devpi_uv_binary: /Users/erichblume/.local/share/mise/shims/uv

View file

@ -0,0 +1,6 @@
---
- name: Restart devpi
ansible.builtin.shell: |
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.devpi.plist 2>/dev/null || true
launchctl load ~/Library/LaunchAgents/mcquack.eblume.devpi.plist
changed_when: true

View file

@ -0,0 +1,71 @@
---
# devpi role — devpi-server in a uv-managed venv, run via LaunchAgent.
# Replaces the prior minikube StatefulSet; see [[devpi-on-indri]].
#
# The root password is fetched in the indri.yml playbook pre_tasks and
# exposed as `devpi_root_password`.
- name: Ensure devpi home exists
ansible.builtin.file:
path: "{{ devpi_home }}"
state: directory
mode: '0755'
- name: Ensure devpi server-dir exists
ansible.builtin.file:
path: "{{ devpi_server_dir }}"
state: directory
mode: '0700'
- name: Create devpi venv if missing
ansible.builtin.command:
cmd: "{{ devpi_uv_binary }} venv --python {{ devpi_python_version }} {{ devpi_venv }}"
creates: "{{ devpi_venv }}/bin/python"
- name: Install devpi-server and devpi-web into venv
# Always bootstrap from upstream PyPI — devpi is the index it would otherwise resolve through,
# and that's a circular dependency (devpi cannot install itself from itself).
ansible.builtin.command:
cmd: >-
{{ devpi_uv_binary }} pip install
--python {{ devpi_venv }}/bin/python
--index-url https://pypi.org/simple/
devpi-server=={{ devpi_server_version }}
devpi-web=={{ devpi_web_version }}
register: devpi_pip_install
changed_when: "'Installed' in devpi_pip_install.stdout or 'Uninstalled' in devpi_pip_install.stdout"
notify: Restart devpi
- name: Check if devpi server-dir is initialized
ansible.builtin.stat:
path: "{{ devpi_server_dir }}/.serverversion"
register: devpi_serverversion
- name: Initialize devpi server-dir
ansible.builtin.command:
cmd: >-
{{ devpi_init_binary }}
--serverdir {{ devpi_server_dir }}
--root-passwd {{ devpi_root_password }}
when: not devpi_serverversion.stat.exists
changed_when: true
no_log: true
- name: Deploy devpi LaunchAgent plist
ansible.builtin.template:
src: devpi.plist.j2
dest: ~/Library/LaunchAgents/mcquack.eblume.devpi.plist
mode: '0644'
notify: Restart devpi
- name: Check if devpi LaunchAgent is loaded
ansible.builtin.command: launchctl list mcquack.eblume.devpi
register: devpi_launchctl_check
changed_when: false
failed_when: false
- name: Load devpi LaunchAgent if not loaded
ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.devpi.plist
when: devpi_launchctl_check.rc != 0
changed_when: true
failed_when: false

View file

@ -0,0 +1,34 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- {{ ansible_managed }} -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mcquack.eblume.devpi</string>
<key>ProgramArguments</key>
<array>
<string>{{ devpi_binary }}</string>
<string>--serverdir</string>
<string>{{ devpi_server_dir }}</string>
<string>--host</string>
<string>{{ devpi_host }}</string>
<string>--port</string>
<string>{{ devpi_port }}</string>
<string>--outside-url</string>
<string>{{ devpi_outside_url }}</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>EnvironmentVariables</key>
<dict>
<key>PATH</key>
<string>{{ devpi_venv }}/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
</dict>
<key>StandardOutPath</key>
<string>{{ devpi_log_dir }}/mcquack.devpi.out.log</string>
<key>StandardErrorPath</key>
<string>{{ devpi_log_dir }}/mcquack.devpi.err.log</string>
</dict>
</plist>

View file

@ -0,0 +1,10 @@
---
# Docs (Quartz-built static site) — replaces minikube Deployment.
# Caddy serves docs_content_dir directly via the static-kind service block,
# with Quartz-style try_files (path → path/ → path.html → 404).
docs_version: "v1.17.0"
docs_release_url: "https://forge.eblu.me/eblume/blumeops/releases/download/{{ docs_version }}/docs-{{ docs_version }}.tar.gz"
docs_home: /Users/erichblume/blumeops/docs
docs_content_dir: "{{ docs_home }}/content"
docs_version_sentinel: "{{ docs_home }}/.installed-version"

View file

@ -0,0 +1,57 @@
---
# docs role — download and extract the Quartz-built docs tarball into
# docs_content_dir. Caddy serves the directory directly with Quartz-style
# try_files; there is no daemon to manage.
#
# Idempotency: a sentinel file records the installed docs_version. The
# download/extract steps only run when the sentinel doesn't match docs_version.
#
# Mirrors the cv role's curl-based download for consistency, even though the
# forge releases endpoint here does support HEAD.
- name: Ensure docs home exists
ansible.builtin.file:
path: "{{ docs_home }}"
state: directory
mode: '0755'
- name: Read installed docs version sentinel
ansible.builtin.slurp:
src: "{{ docs_version_sentinel }}"
register: docs_installed_raw
failed_when: false
changed_when: false
- name: Set installed docs version fact
ansible.builtin.set_fact:
docs_installed_version: >-
{{ (docs_installed_raw.content | b64decode).strip()
if (docs_installed_raw.content is defined) else '' }}
- name: Recreate docs content dir
ansible.builtin.file:
path: "{{ docs_content_dir }}"
state: "{{ item }}"
mode: '0755'
loop:
- absent
- directory
when: docs_installed_version != docs_version
- name: Download and extract docs release tarball
ansible.builtin.shell:
cmd: >-
set -euo pipefail;
curl -fsSL {{ docs_release_url | quote }} -o {{ docs_home }}/docs.tar.gz &&
tar -xzf {{ docs_home }}/docs.tar.gz -C {{ docs_content_dir }} &&
rm -f {{ docs_home }}/docs.tar.gz
executable: /bin/bash
when: docs_installed_version != docs_version
changed_when: true
- name: Write docs version sentinel
ansible.builtin.copy:
content: "{{ docs_version }}\n"
dest: "{{ docs_version_sentinel }}"
mode: '0644'
when: docs_installed_version != docs_version

View file

@ -0,0 +1,49 @@
---
# hephaestus hub — the canonical heph replica (server mode) on indri.
# Other devices (e.g. gilbert) are spokes that sync against this hub.
# See [[set-up-sync-hub]] and [[host-heph-pwa]] in the hephaestus repo.
# Pinned release used for the initial `cargo install` and the PWA shell.
# After bootstrap, hephd's own --self-update keeps the binary current; this
# pin only governs the first install and the bundled PWA shell version.
heph_version: v1.2.1
# Anonymous public HTTPS clone — matches hephd's INSTALL_GIT_URL so the initial
# install and unattended self-update build from the same source (no ssh-agent).
heph_repo_url: https://forge.eblu.me/eblume/hephaestus.git
heph_bin_dir: /Users/erichblume/.cargo/bin
heph_binary: "{{ heph_bin_dir }}/hephd"
# rustc/cargo here are rustup shims. The bare (non-mise) environment that the
# launchagent and ansible run in falls back to rustup's *default* toolchain,
# which can lag behind heph's rust-version floor (Cargo.toml: 1.89). Pin the
# channel explicitly so both the bootstrap build and unattended self-update
# always use a current toolchain regardless of the host's rustup default.
heph_rust_toolchain: stable
heph_data_dir: /Users/erichblume/.local/share/heph
heph_db: "{{ heph_data_dir }}/heph.db"
heph_socket: "{{ heph_data_dir }}/hephd.sock"
heph_log_dir: /Users/erichblume/Library/Logs
# Version-pinned source checkout; the PWA static shell is served directly from
# its heph-pwa/ subdir (no copy), keeping shell and hub in lockstep at heph_version.
heph_pwa_src_dir: /Users/erichblume/.cache/heph-pwa-src
heph_web_root: "{{ heph_pwa_src_dir }}/heph-pwa"
# Hub listens on all interfaces so tailnet spokes can reach it directly
# (http://indri.tail8d86e.ts.net:8787) and Caddy can proxy heph.ops.eblu.me.
# Access is gated by Authentik OIDC regardless — tailnet reachability is not
# enough (this is the owner's most sensitive data).
heph_http_addr: 0.0.0.0:8787
heph_port: 8787
heph_external_url: https://heph.ops.eblu.me
# Authentik OIDC — issuer + audience together turn hub auth on. The audience is
# the device-code client id (see argocd/manifests/authentik heph blueprint).
heph_oidc_issuer: https://authentik.ops.eblu.me/application/o/heph/
heph_oidc_audience: heph
# Self-update poll interval (seconds). 10 minutes.
heph_self_update_interval_secs: 600

View file

@ -0,0 +1,6 @@
---
- name: Restart heph
ansible.builtin.shell: |
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.heph.plist 2>/dev/null || true
launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
changed_when: true

View file

@ -0,0 +1,82 @@
---
# hephaestus hub (server mode) on indri.
#
# DATA SEEDING (one-time, Path A — do this BEFORE the first provision so the hub
# adopts gilbert's existing data instead of being born empty):
#
# 1. On the seed device (gilbert): heph daemon stop
# 2. Copy its store to indri: scp ~/.local/share/heph/heph.db \
# indri:~/.local/share/heph/heph.db
# 3. On indri, give the hub its OWN device origin (keeps gilbert's owner_id +
# data; hephd regenerates a fresh origin on next start when it is missing):
# sqlite3 ~/.local/share/heph/heph.db "DELETE FROM meta WHERE key='origin';"
# 4. Run this role (installs hephd, stages the PWA, loads the launchagent).
#
# hephd auto-creates an empty store on first start if none exists, so seeding is
# optional — skip it only if you intend a fresh, empty hub.
- name: Ensure heph data directory exists
ansible.builtin.file:
path: "{{ heph_data_dir }}"
state: directory
mode: '0700'
- name: Check for installed hephd binary
ansible.builtin.stat:
path: "{{ heph_binary }}"
register: heph_binary_stat
# Bootstrap install only when hephd is absent. Thereafter hephd's own
# --self-update keeps it current; ansible must not fight (or downgrade) it.
# This builds from source and can take several minutes on a cold cargo cache.
- name: Bootstrap-install heph + hephd from the forge ({{ heph_version }})
ansible.builtin.command:
cmd: >-
{{ heph_bin_dir }}/cargo install --locked
--git {{ heph_repo_url }}
--tag {{ heph_version }}
heph hephd
environment:
PATH: "{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin"
RUSTUP_TOOLCHAIN: "{{ heph_rust_toolchain }}"
when: not heph_binary_stat.stat.exists
changed_when: true
notify: Restart heph
# Checkout provides the PWA shell at {{ heph_web_root }} (heph-pwa/ subdir),
# served directly by hephd. Static files are read from disk per request, so a
# version bump needs no restart; the service worker (CACHE = "heph-pwa-vN")
# evicts stale assets on next load.
- name: Ensure heph cache parent directory exists
ansible.builtin.file:
path: "{{ heph_pwa_src_dir | dirname }}"
state: directory
mode: '0755'
- name: Stage heph-pwa source at {{ heph_version }}
ansible.builtin.git:
repo: "{{ heph_repo_url }}"
dest: "{{ heph_pwa_src_dir }}"
version: "{{ heph_version }}"
depth: 1
single_branch: true
force: true
- name: Deploy heph LaunchAgent plist
ansible.builtin.template:
src: heph.plist.j2
dest: ~/Library/LaunchAgents/mcquack.eblume.heph.plist
mode: '0644'
notify: Restart heph
- name: Check if heph LaunchAgent is loaded
ansible.builtin.command: launchctl list mcquack.eblume.heph
register: heph_launchctl_check
changed_when: false
failed_when: false
- name: Load heph LaunchAgent if not loaded
ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
when: heph_launchctl_check.rc != 0
changed_when: true
failed_when: false

View file

@ -0,0 +1,50 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- {{ ansible_managed }} -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mcquack.eblume.heph</string>
<key>ProgramArguments</key>
<array>
<string>{{ heph_binary }}</string>
<string>--mode</string>
<string>server</string>
<string>--http-addr</string>
<string>{{ heph_http_addr }}</string>
<string>--db</string>
<string>{{ heph_db }}</string>
<string>--socket</string>
<string>{{ heph_socket }}</string>
<string>--web-root</string>
<string>{{ heph_web_root }}</string>
<string>--oidc-issuer</string>
<string>{{ heph_oidc_issuer }}</string>
<string>--oidc-audience</string>
<string>{{ heph_oidc_audience }}</string>
<string>--self-update</string>
<string>--self-update-interval-secs</string>
<string>{{ heph_self_update_interval_secs }}</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>EnvironmentVariables</key>
<dict>
<!-- cargo + toolchain on PATH so --self-update can run `cargo install`. -->
<key>PATH</key>
<string>{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
<key>HOME</key>
<string>/Users/erichblume</string>
<!-- Pin the rustup channel: the launchagent runs without mise, so a bare
cargo shim would otherwise use rustup's (stale) default toolchain. -->
<key>RUSTUP_TOOLCHAIN</key>
<string>{{ heph_rust_toolchain }}</string>
</dict>
<key>StandardOutPath</key>
<string>{{ heph_log_dir }}/mcquack.heph.out.log</string>
<key>StandardErrorPath</key>
<string>{{ heph_log_dir }}/mcquack.heph.err.log</string>
</dict>
</plist>

View file

@ -0,0 +1,27 @@
# CloudNativePG Operator for ringtail k3s cluster
# Deploys the operator only; PostgreSQL clusters are created separately
#
# Sibling of cloudnative-pg.yaml (minikube). Same mirror, same release,
# different destination. Both apps will coexist during the immich
# migration; the minikube one is removed at the end of the broader
# indri-k8s decommission.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: cloudnative-pg-ringtail
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git
targetRevision: v1.27.1
path: releases
directory:
include: 'cnpg-1.27.1.yaml'
destination:
server: https://ringtail.tail8d86e.ts.net:6443
namespace: cnpg-system
syncPolicy:
syncOptions:
- CreateNamespace=true
- ServerSideApply=true # Required for large CRDs that exceed annotation size limit

View file

@ -1,18 +0,0 @@
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: cv
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/cv
destination:
server: https://kubernetes.default.svc
namespace: cv
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -0,0 +1,26 @@
# Databases on ringtail k3s.
#
# Today: only immich-pg (CNPG Cluster) + its borgmatic ExternalSecret.
# More databases may move here as the indri-k8s decommission proceeds.
#
# Prerequisites:
# - cloudnative-pg-ringtail (operator must exist before the Cluster CR)
# - external-secrets-ringtail + 1password-connect-ringtail (for the
# immich-pg-borgmatic ExternalSecret to sync)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: databases-ringtail
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/databases-ringtail
destination:
server: https://ringtail.tail8d86e.ts.net:6443
namespace: databases
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -1,29 +0,0 @@
# devpi PyPI Caching Proxy
# Provides PyPI cache and private package hosting
#
# After first deployment, initialize devpi:
# kubectl -n devpi exec -it devpi-0 -- devpi-init --serverdir /devpi --root-passwd <password>
# kubectl -n devpi rollout restart statefulset devpi
#
# Then create user/index:
# uvx devpi use https://pypi.tail8d86e.ts.net
# uvx devpi login root
# uvx devpi user -c eblume email=blume.erich@gmail.com
# uvx devpi index -c eblume/dev bases=root/pypi
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: devpi
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/devpi
destination:
server: https://kubernetes.default.svc
namespace: devpi
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -1,18 +0,0 @@
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: docs
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/docs
destination:
server: https://kubernetes.default.svc
namespace: docs
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -15,7 +15,7 @@ spec:
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/external-secrets
path: argocd/manifests/external-secrets-ringtail
destination:
server: https://ringtail.tail8d86e.ts.net:6443
namespace: external-secrets

View file

@ -14,7 +14,7 @@ spec:
targetRevision: main
path: argocd/manifests/homepage
destination:
server: https://kubernetes.default.svc
server: https://ringtail.tail8d86e.ts.net:6443
namespace: homepage
syncPolicy:
syncOptions:

View file

@ -0,0 +1,31 @@
# Immich on ringtail k3s.
#
# Staging deployment; the minikube `immich` app remains in parallel
# until cutover. See [[immich-cutover-and-decommission]] for the
# routing flip + minikube cleanup.
#
# Prerequisites:
# - cnpg-on-ringtail + databases-ringtail (postgres)
# - 1password-connect-ringtail + external-secrets-ringtail (not used
# by this app today — immich-db Secret is created manually,
# matching the minikube pattern)
# - The immich-db Secret in the immich namespace, holding the
# password for the `immich` postgres role (copied from the source
# immich-pg-app Secret at migration time).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: immich-ringtail
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/immich-ringtail
destination:
server: https://ringtail.tail8d86e.ts.net:6443
namespace: immich
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -1,30 +0,0 @@
# Immich - Self-hosted photo and video management
# High-performance Google Photos/iCloud alternative with AI features
#
# Kustomize manifests in argocd/manifests/immich/
# Components: server, machine-learning, valkey (Redis)
#
# Prerequisites:
# 1. Create immich namespace and secrets:
# kubectl create namespace immich
# kubectl --context=minikube-indri create secret generic immich-db -n immich \
# --from-literal=password="$(kubectl --context=minikube-indri -n databases get secret immich-pg-app -o jsonpath='{.data.password}' | base64 -d)"
# 2. Create immich-pg database and user (see immich-pg app)
# 3. NFS share on sifaka at /volume1/photos with read/write for indri
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: immich
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/immich
destination:
server: https://kubernetes.default.svc
namespace: immich
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -0,0 +1,26 @@
# Mealie on ringtail k3s.
#
# Wave-1 indri-k8s decommission. Staging deployment; the minikube `mealie`
# app stays in parallel until cutover (copy SQLite PVC, drop the minikube
# tailscale ingress, flip Caddy). See [[migrate-wave1-ringtail]].
#
# Prerequisites:
# - external-secrets-ringtail (onepassword-blumeops ClusterSecretStore)
# - mealie-data PVC contents copied from minikube at cutover
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: mealie-ringtail
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/mealie-ringtail
destination:
server: https://ringtail.tail8d86e.ts.net:6443
namespace: mealie
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -1,17 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: mealie
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/mealie
destination:
server: https://kubernetes.default.svc
namespace: mealie
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -0,0 +1,28 @@
# Paperless-ngx on ringtail k3s.
#
# Wave-1 indri-k8s decommission. Staging deployment; the minikube
# `paperless` app stays in parallel until cutover (drop the minikube
# tailscale ingress to free the name, then flip Caddy). See
# [[migrate-wave1-ringtail]].
#
# Prerequisites:
# - databases-ringtail blumeops-pg (paperless database + role)
# - external-secrets-ringtail (onepassword-blumeops ClusterSecretStore)
# - sifaka NFS rule granting ringtail access to /volume1/paperless
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: paperless-ringtail
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/paperless-ringtail
destination:
server: https://ringtail.tail8d86e.ts.net:6443
namespace: paperless
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -1,17 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: paperless
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/paperless
destination:
server: https://kubernetes.default.svc
namespace: paperless
syncPolicy:
syncOptions:
- CreateNamespace=true

20
argocd/apps/shower.yaml Normal file
View file

@ -0,0 +1,20 @@
# Adelaide / Heidi / Addie baby shower app — Django guest/raffle/prize system.
# Public landing page at shower.eblu.me (via fly proxy), staff console + admin
# at shower.ops.eblu.me (tailnet only). Built from forge PyPI wheel.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: shower
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/shower
destination:
server: https://ringtail.tail8d86e.ts.net:6443
namespace: shower
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -0,0 +1,28 @@
# TeslaMate on ringtail k3s.
#
# Wave-1 indri-k8s decommission. Staging deployment; the minikube
# `teslamate` app stays in parallel until cutover (migrate the teslamate
# database, drop the minikube tailscale ingress, flip Caddy). See
# [[migrate-wave1-ringtail]].
#
# Prerequisites:
# - databases-ringtail blumeops-pg (teslamate database + role; cube +
# earthdistance extensions created by superuser at cutover)
# - external-secrets-ringtail (onepassword-blumeops ClusterSecretStore)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: teslamate-ringtail
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/teslamate-ringtail
destination:
server: https://ringtail.tail8d86e.ts.net:6443
namespace: teslamate
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -1,32 +0,0 @@
# TeslaMate Tesla Data Logger
# Requires: CloudNativePG PostgreSQL cluster and manual secret setup
#
# Before syncing, create the namespace and secrets:
# kubectl create namespace teslamate
# op inject -i argocd/manifests/databases/secret-teslamate.yaml.tpl | kubectl apply -f -
# op inject -i argocd/manifests/teslamate/secret-encryption-key.yaml.tpl | kubectl apply -f -
# op inject -i argocd/manifests/teslamate/secret-db.yaml.tpl | kubectl apply -f -
#
# Then create the database:
# PGPASSWORD=$(op read "op://blumeops/postgres/password") \
# psql -h pg.ops.eblu.me -U eblume -c "CREATE DATABASE teslamate OWNER teslamate;"
#
# After syncing, access the TeslaMate UI at https://tesla.tail8d86e.ts.net to complete
# Tesla API authentication via OAuth flow.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: teslamate
namespace: argocd
spec:
project: default
source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main
path: argocd/manifests/teslamate
destination:
server: https://kubernetes.default.svc
namespace: teslamate
syncPolicy:
syncOptions:
- CreateNamespace=true

View file

@ -159,8 +159,10 @@ prometheus.exporter.blackbox "services" {
}
target {
// devpi runs natively on indri (LaunchAgent), not in-cluster.
// We probe through Caddy (https://pypi.ops.eblu.me) which the cluster can reach via Tailscale.
name = "devpi"
address = "http://devpi.devpi.svc.cluster.local:3141/+api"
address = "https://pypi.ops.eblu.me/+api"
module = "http_2xx"
}
@ -189,14 +191,9 @@ prometheus.exporter.blackbox "services" {
}
target {
// Migrated to ringtail (wave-1); probe through Caddy over Tailscale.
name = "teslamate"
address = "http://teslamate.teslamate.svc.cluster.local:4000/"
module = "http_2xx"
}
target {
name = "immich"
address = "http://immich-server.immich.svc.cluster.local:2283/api/server/ping"
address = "https://tesla.ops.eblu.me/"
module = "http_2xx"
}

View file

@ -10,7 +10,7 @@ resources:
images:
- name: registry.ops.eblu.me/blumeops/alloy
newTag: v1.14.0-fd0bebb
newTag: v1.16.0-9564435
configMapGenerator:
- name: alloy-config

View file

@ -45,6 +45,26 @@ prometheus.scrape "kube_state_metrics" {
forward_to = [prometheus.remote_write.prometheus.receiver]
}
// ============== SERVICE HEALTH PROBES ==============
// Blackbox-style HTTP probes for in-cluster services on ringtail
prometheus.exporter.blackbox "services" {
config = "{ modules: { http_2xx: { prober: http, timeout: 5s } } }"
target {
name = "immich"
address = "http://immich-server.immich.svc.cluster.local:2283/api/server/ping"
module = "http_2xx"
}
}
// Scrape blackbox probe results
prometheus.scrape "blackbox" {
targets = prometheus.exporter.blackbox.services.targets
scrape_interval = "30s"
forward_to = [prometheus.remote_write.prometheus.receiver]
}
// Push metrics to indri Prometheus
prometheus.remote_write "prometheus" {
external_labels = { cluster = "ringtail" }

View file

@ -10,7 +10,7 @@ resources:
images:
- name: registry.ops.eblu.me/blumeops/alloy
newTag: v1.14.0-fd0bebb-nix
newTag: v1.16.0-9564435-nix
configMapGenerator:
- name: alloy-config

View file

@ -9,7 +9,7 @@ resources:
images:
- name: registry.ops.eblu.me/blumeops/alloy
newTag: v1.14.0-fd0bebb-nix
newTag: v1.16.0-9564435-nix
configMapGenerator:
- name: alloy-tracing-config

View file

@ -25,7 +25,7 @@ kubectl wait --for=condition=available deployment/argocd-server -n argocd --time
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d && echo
# 5. Login and change password
argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
argocd login argocd.tail8d86e.ts.net --username admin
argocd account update-password
# 6. Apply repo-creds-forge credential template for SSH access to all forge repos
@ -114,4 +114,4 @@ spec:
Future improvement: integrate with a secrets operator (e.g., External Secrets).
- The credential template (`repo-creds`) uses a URL prefix to match all repos on forge.
- ArgoCD uses Tailscale Ingress with Let's Encrypt for TLS termination.
- The `--grpc-web` flag is required for CLI access through the Tailscale ingress.
- After Authentik is up, prefer `argocd login argocd.ops.eblu.me --sso` over the admin password login above; admin is only needed during bootstrap or as break-glass.

View file

@ -16,7 +16,6 @@ data:
name: Authentik
issuer: https://authentik.ops.eblu.me/application/o/argocd/
clientID: argocd
clientSecret: $argocd-oidc-authentik:client-secret
requestedScopes:
- openid
- profile

View file

@ -2,6 +2,9 @@
#
# - workflow-bot: minimal CI/CD permissions (sync, get)
# - admins: Authentik admins group mapped to ArgoCD admin role
# - admin: local break-glass account — keeps ArgoCD admin rights for when
# Authentik SSO is unavailable (without this it has no permissions, since
# policy.default is unset)
#
apiVersion: v1
kind: ConfigMap
@ -14,3 +17,4 @@ data:
p, role:workflow-bot, applications, get, *, allow
g, workflow-bot, role:workflow-bot
g, admins, role:admin
g, admin, role:admin

View file

@ -1,31 +0,0 @@
# ExternalSecret for ArgoCD OIDC client secret (Authentik)
#
# Referenced from argocd-cm as $argocd-oidc-authentik:client-secret
# Must have app.kubernetes.io/part-of: argocd label for ArgoCD to read it
#
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: argocd-oidc-authentik
namespace: argocd
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: onepassword-blumeops
target:
name: argocd-oidc-authentik
creationPolicy: Owner
template:
metadata:
labels:
app.kubernetes.io/part-of: argocd
data:
- secretKey: client-secret
remoteRef:
conversionStrategy: Default
decodingStrategy: None
key: "Authentik (blumeops)"
metadataPolicy: None
property: argocd-client-secret

View file

@ -9,7 +9,6 @@ resources:
- https://raw.githubusercontent.com/argoproj/argo-cd/998fb59dc355653c0657908a6ea2f87136e022d1/manifests/install.yaml
- ingress-tailscale.yaml
- external-secret-repo-forge.yaml
- external-secret-oidc-authentik.yaml
patches:
- path: argocd-cmd-params-cm.yaml

View file

@ -262,14 +262,15 @@ data:
name: ArgoCD
authorization_flow: !Find [authentik_flows.flow, [slug, default-provider-authorization-implicit-consent]]
invalidation_flow: !Find [authentik_flows.flow, [slug, default-provider-invalidation-flow]]
client_type: confidential
client_type: public
client_id: argocd
client_secret: !Env AUTHENTIK_ARGOCD_CLIENT_SECRET
redirect_uris:
- matching_mode: strict
url: https://argocd.ops.eblu.me/auth/callback
- matching_mode: strict
url: https://argocd.tail8d86e.ts.net/auth/callback
- matching_mode: strict
url: http://localhost:8085/auth/callback
signing_key: !Find [authentik_crypto.certificatekeypair, [name, authentik Self-signed Certificate]]
property_mappings:
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, openid]]
@ -433,3 +434,93 @@ data:
provider: !KeyOf mealie-provider
meta_launch_url: https://meals.ops.eblu.me
policy_engine_mode: all
heph.yaml: |
version: 1
metadata:
name: BlumeOps Heph SSO
labels:
blueprints.goauthentik.io/description: "Hephaestus hub OIDC (device-code) provider, application, and device-code flow"
entries:
# Device-code flow (RFC 8628). authentik ships no default for this, so we
# create one and bind it to the brand below. An empty stage_configuration
# flow is sufficient: the already-authenticated user just confirms the code.
- model: authentik_flows.flow
id: device-code-flow
identifiers:
slug: default-device-code-flow
attrs:
name: Device code flow
title: Device code flow
slug: default-device-code-flow
designation: stage_configuration
authentication: require_authenticated
# Enable the device-code grant globally by binding the flow to the default
# brand (domain authentik-default). Partial update — only sets this field.
- model: authentik_brands.brand
identifiers:
domain: authentik-default
attrs:
flow_device_code: !KeyOf device-code-flow
# OAuth2 provider for heph — PUBLIC client (device-code + PKCE, no secret).
# client_id doubles as the token audience the hub verifies (--oidc-audience heph),
# and the app slug 'heph' is the issuer path (/application/o/heph/).
- model: authentik_providers_oauth2.oauth2provider
id: heph-provider
identifiers:
name: Heph
attrs:
name: Heph
authorization_flow: !Find [authentik_flows.flow, [slug, default-provider-authorization-implicit-consent]]
invalidation_flow: !Find [authentik_flows.flow, [slug, default-provider-invalidation-flow]]
client_type: public
client_id: heph
# CLI/TUI use the device-code grant (no redirect). The heph-pwa browser
# login uses Authorization Code + PKCE, which DOES redirect back to the
# app's origin — register those here (Authentik also keys token-endpoint
# CORS off these origins). Trailing slash matters: the PWA's redirect_uri
# is its base dir, e.g. https://heph.ops.eblu.me/.
redirect_uris:
- matching_mode: strict
url: https://heph.ops.eblu.me/
- matching_mode: strict
url: http://localhost:8787/ # local dev (hephd --web-root)
signing_key: !Find [authentik_crypto.certificatekeypair, [name, authentik Self-signed Certificate]]
property_mappings:
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, openid]]
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, email]]
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, profile]]
# offline_access: heph CLI requests "openid offline_access"; without
# this mapping the refresh token is session-bound and hephd's
# refresh_token grant 400s once the session lapses (spoke sync dies).
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, offline_access]]
sub_mode: hashed_user_id
include_claims_in_id_token: true
# Heph application — linked to the OAuth2 provider
- model: authentik_core.application
id: heph-app
identifiers:
slug: heph
attrs:
name: Hephaestus
slug: heph
provider: !KeyOf heph-provider
meta_launch_url: https://heph.ops.eblu.me
policy_engine_mode: any
# Policy binding — restrict heph to admins group (single-owner, sensitive data)
- model: authentik_policies.policybinding
identifiers:
order: 0
target: !KeyOf heph-app
group: !Find [authentik_core.group, [name, admins]]
attrs:
target: !KeyOf heph-app
group: !Find [authentik_core.group, [name, admins]]
order: 0
enabled: true
negate: false
timeout: 30

View file

@ -75,11 +75,6 @@ spec:
secretKeyRef:
name: authentik-config
key: jellyfin-client-secret
- name: AUTHENTIK_ARGOCD_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: authentik-config
key: argocd-client-secret
- name: AUTHENTIK_MEALIE_CLIENT_SECRET
valueFrom:
secretKeyRef:

View file

@ -53,10 +53,6 @@ spec:
remoteRef:
key: "Authentik (blumeops)"
property: jellyfin-client-secret
- secretKey: argocd-client-secret
remoteRef:
key: "Authentik (blumeops)"
property: argocd-client-secret
- secretKey: mealie-client-secret
remoteRef:
key: "Authentik (blumeops)"

View file

@ -1,51 +0,0 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: cv
namespace: cv
spec:
replicas: 2
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
selector:
matchLabels:
app: cv
template:
metadata:
labels:
app: cv
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: cv
image: registry.ops.eblu.me/blumeops/cv:kustomized
ports:
- containerPort: 80
name: http
env:
- name: CV_RELEASE_URL
value: "https://forge.eblu.me/api/packages/eblume/generic/cv/v1.0.3/cv-v1.0.3.tar.gz"
resources:
requests:
memory: "64Mi"
cpu: "10m"
limits:
memory: "128Mi"
livenessProbe:
httpGet:
path: /healthz
port: 80
initialDelaySeconds: 10
periodSeconds: 30
readinessProbe:
httpGet:
path: /healthz
port: 80
initialDelaySeconds: 5
periodSeconds: 10

View file

@ -1,27 +0,0 @@
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: cv-tailscale
namespace: cv
annotations:
tailscale.com/proxy-class: "default"
tailscale.com/proxy-group: "ingress"
tailscale.com/tags: "tag:k8s,tag:flyio-target"
gethomepage.dev/enabled: "true"
gethomepage.dev/name: "CV"
gethomepage.dev/group: "Services"
gethomepage.dev/icon: "mdi-file-document"
gethomepage.dev/description: "Resume / CV"
gethomepage.dev/href: "https://cv.eblu.me"
gethomepage.dev/pod-selector: "app=cv"
spec:
ingressClassName: tailscale
defaultBackend:
service:
name: cv
port:
number: 80
tls:
- hosts:
- cv

View file

@ -1,12 +0,0 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: cv
resources:
- deployment.yaml
- service.yaml
- ingress-tailscale.yaml
- pdb.yaml
images:
- name: registry.ops.eblu.me/blumeops/cv
newTag: v1.0.3-613f05d

View file

@ -1,10 +0,0 @@
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: cv
spec:
minAvailable: 1
selector:
matchLabels:
app: cv

View file

@ -1,13 +0,0 @@
---
apiVersion: v1
kind: Service
metadata:
name: cv
namespace: cv
spec:
selector:
app: cv
ports:
- name: http
port: 80
targetPort: 80

View file

@ -0,0 +1,97 @@
# PostgreSQL Cluster for blumeops services on ringtail k3s.
#
# Wave-1 indri-k8s decommission target (see [[migrate-wave1-ringtail]]).
# Holds the paperless and teslamate databases migrated off the minikube
# blumeops-pg via cold pg_dump/pg_restore at cutover. miniflux + authentik
# stay where they are for now (later waves), so this cluster only carries
# the wave-1 roles.
#
# Apps reach this in-cluster at blumeops-pg-rw.databases.svc.cluster.local
# — the same name they used on minikube, so teslamate's DATABASE_HOST is
# unchanged.
#
# Database creation is deferred to cutover, mirroring the minikube cluster
# (where only the bootstrap database is declared and the rest were created
# out-of-band):
# - paperless: the bootstrap database below (restored into at cutover).
# - teslamate: created at its cutover by the eblume superuser, because the
# dump's `earthdistance` extension is untrusted and CREATE EXTENSION
# needs superuser. (cube + earthdistance ownership then transferred to
# the teslamate role so it can ALTER EXTENSION UPDATE.)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: blumeops-pg
namespace: databases
spec:
instances: 1
imageName: ghcr.io/cloudnative-pg/postgresql:18.3
storage:
size: 10Gi
storageClass: local-path
bootstrap:
initdb:
database: paperless
owner: paperless
managed:
roles:
# eblume superuser for admin + privileged restore steps (extensions)
- name: eblume
login: true
superuser: true
createdb: true
createrole: true
connectionLimit: -1
ensure: present
inherit: true
passwordSecret:
name: blumeops-pg-eblume
# borgmatic read-only user for backups
- name: borgmatic
login: true
connectionLimit: -1
ensure: present
inherit: true
inRoles:
- pg_read_all_data
passwordSecret:
name: blumeops-pg-borgmatic
# paperless user (also the bootstrap database owner above; the
# managed role sets its password from the 1Password-backed secret)
- name: paperless
login: true
connectionLimit: -1
ensure: present
inherit: true
passwordSecret:
name: blumeops-pg-paperless
# teslamate user. Extension ownership (cube, earthdistance) is
# transferred to this role at cutover so it can ALTER EXTENSION UPDATE.
- name: teslamate
login: true
connectionLimit: -1
ensure: present
inherit: true
passwordSecret:
name: blumeops-pg-teslamate
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "1Gi"
cpu: "500m"
postgresql:
parameters:
max_connections: "50"
shared_buffers: "128MB"
password_encryption: "scram-sha-256"
pg_hba:
# Password auth from anywhere; network security is via Tailscale.
- host all all 0.0.0.0/0 scram-sha-256
- host all all ::/0 scram-sha-256

View file

@ -1,13 +1,14 @@
# ExternalSecret for borgmatic backup user password on immich-pg cluster
# ExternalSecret for borgmatic backup user password
#
# Replaces the manual op inject workflow from secret-borgmatic.yaml.tpl
#
# Reuses the same 1Password item as blumeops-pg-borgmatic.
# 1Password item: "borgmatic" in blumeops vault
# Field: "db-password"
#
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: immich-pg-borgmatic
name: blumeops-pg-borgmatic
namespace: databases
spec:
refreshInterval: 1h
@ -15,7 +16,7 @@ spec:
kind: ClusterSecretStore
name: onepassword-blumeops
target:
name: immich-pg-borgmatic
name: blumeops-pg-borgmatic
creationPolicy: Owner
template:
type: kubernetes.io/basic-auth

View file

@ -0,0 +1,30 @@
# ExternalSecret for eblume superuser password
#
# Replaces the manual op inject workflow from secret-eblume.yaml.tpl
#
# 1Password item: "postgres" in blumeops vault
# Field: "password"
#
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: blumeops-pg-eblume
namespace: databases
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: onepassword-blumeops
target:
name: blumeops-pg-eblume
creationPolicy: Owner
template:
type: kubernetes.io/basic-auth
data:
username: eblume
password: "{{ .password }}"
data:
- secretKey: password
remoteRef:
key: postgres
property: password

View file

@ -0,0 +1,32 @@
# ExternalSecret for borgmatic backup user password on immich-pg cluster
# (ringtail k3s).
#
# Mirror of argocd/manifests/databases/external-secret-immich-borgmatic.yaml.
# The onepassword-blumeops ClusterSecretStore exists on ringtail via the
# external-secrets-ringtail app.
#
# 1Password item: "borgmatic" in blumeops vault
# Field: "db-password"
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: immich-pg-borgmatic
namespace: databases
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: onepassword-blumeops
target:
name: immich-pg-borgmatic
creationPolicy: Owner
template:
type: kubernetes.io/basic-auth
data:
username: borgmatic
password: "{{ .password }}"
data:
- secretKey: password
remoteRef:
key: borgmatic
property: db-password

View file

@ -0,0 +1,53 @@
# PostgreSQL Cluster for Immich on ringtail k3s.
#
# Initially bootstrapped via CNPG pg_basebackup from the minikube
# immich-pg cluster on 2026-05-13, then promoted to primary. The
# externalClusters + bootstrap.pg_basebackup blocks have been pruned
# from this manifest now that the migration is complete — leaving
# them around is a footgun (re-enabling replica.enabled=true would
# try to demote this cluster against a stale source). See
# [[immich-pg-data-migration]] for the procedure used.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: immich-pg
namespace: databases
spec:
instances: 1
imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
storage:
size: 10Gi
storageClass: local-path
# Managed roles
managed:
roles:
- name: borgmatic
login: true
connectionLimit: -1
ensure: present
inherit: true
inRoles:
- pg_read_all_data
passwordSecret:
name: immich-pg-borgmatic
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "1Gi"
cpu: "500m"
postgresql:
shared_preload_libraries:
- "vchord.so"
parameters:
max_connections: "50"
shared_buffers: "128MB"
password_encryption: "scram-sha-256"
pg_hba:
- host all all 0.0.0.0/0 scram-sha-256
- host all all ::/0 scram-sha-256

View file

@ -0,0 +1,16 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: databases
resources:
- immich-pg.yaml
- external-secret-immich-borgmatic.yaml
- service-immich-pg-tailscale.yaml
# wave-1 indri-k8s decommission: blumeops-pg (paperless + teslamate)
- blumeops-pg.yaml
- service-blumeops-pg-tailscale.yaml
- external-secret-eblume.yaml
- external-secret-borgmatic.yaml
- external-secret-paperless.yaml
- external-secret-teslamate.yaml

View file

@ -0,0 +1,24 @@
# Tailscale LoadBalancer for the ringtail blumeops-pg cluster.
# Canonical hostname: blumeops-pg-ringtail.tail8d86e.ts.net (distinct from
# the minikube blumeops-pg, which still owns pg.tail8d86e.ts.net until the
# wave-1 decommission). Borgmatic on indri and the Grafana TeslaMate
# datasource reach it via the Caddy L4 route pg.ops.eblu.me:5434.
apiVersion: v1
kind: Service
metadata:
name: blumeops-pg-tailscale
namespace: databases
annotations:
tailscale.com/hostname: "blumeops-pg-ringtail"
tailscale.com/proxy-class: "default"
spec:
type: LoadBalancer
loadBalancerClass: tailscale
selector:
cnpg.io/cluster: blumeops-pg
role: primary
ports:
- name: postgresql
port: 5432
targetPort: 5432
protocol: TCP

View file

@ -1,6 +1,8 @@
# Tailscale LoadBalancer for immich-pg PostgreSQL access
# Canonical hostname: immich-pg.tail8d86e.ts.net
# Caddy L4 proxies pg.ops.eblu.me:5433 → this service for borgmatic backups
# Tailscale LoadBalancer for immich-pg PostgreSQL access on ringtail.
# Canonical hostname: immich-pg.tail8d86e.ts.net (claimed from the
# minikube side after the minikube service was removed during the
# immich-to-ringtail migration). Borgmatic on indri uses this
# hostname for nightly backups.
apiVersion: v1
kind: Service
metadata:

View file

@ -44,18 +44,9 @@ spec:
- pg_read_all_data
passwordSecret:
name: blumeops-pg-borgmatic
# teslamate user for TeslaMate Tesla data logger
# Superuser removed. Extension ownership (cube, earthdistance)
# transferred manually so teslamate can ALTER EXTENSION UPDATE.
# earthdistance is untrusted — DROP+CREATE needs temporary
# superuser escalation during upgrades.
- name: teslamate
login: true
connectionLimit: -1
ensure: present
inherit: true
passwordSecret:
name: blumeops-pg-teslamate
# teslamate + paperless roles removed: migrated to ringtail blumeops-pg
# (wave-1 decommission). Their databases were dropped from this cluster
# after the cutover was verified and backed up.
# authentik user for Authentik identity provider (runs on ringtail)
- name: authentik
login: true
@ -65,14 +56,6 @@ spec:
createdb: true
passwordSecret:
name: blumeops-pg-authentik
# paperless user for Paperless-ngx document management
- name: paperless
login: true
connectionLimit: -1
ensure: present
inherit: true
passwordSecret:
name: blumeops-pg-paperless
# Resource limits for minikube environment
resources:

View file

@ -1,69 +0,0 @@
# PostgreSQL Cluster for Immich
# Uses VectorChord (successor to pgvecto.rs) for AI-powered vector search
# See: https://github.com/immich-app/immich/discussions/9060
# Managed by CloudNativePG operator
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: immich-pg
namespace: databases
spec:
instances: 1
# VectorChord image for PostgreSQL 17 with VectorChord 0.5.0
# Immich v2.4.1 requires VectorChord >=0.3 <0.6
# See: https://github.com/tensorchord/VectorChord
imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
storage:
size: 10Gi
storageClass: standard
# Bootstrap creates initial database and owner
bootstrap:
initdb:
database: immich
owner: immich
postInitSQL:
# Extensions required by Immich
- CREATE EXTENSION IF NOT EXISTS vector;
- CREATE EXTENSION IF NOT EXISTS vchord CASCADE;
- CREATE EXTENSION IF NOT EXISTS cube CASCADE;
- CREATE EXTENSION IF NOT EXISTS earthdistance CASCADE;
# Managed roles
# Note: connectionLimit, ensure, inherit are CNPG defaults added to prevent ArgoCD drift
managed:
roles:
# borgmatic read-only user for backups
- name: borgmatic
login: true
connectionLimit: -1
ensure: present
inherit: true
inRoles:
- pg_read_all_data
passwordSecret:
name: immich-pg-borgmatic
# Resource limits for minikube environment
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "1Gi"
cpu: "500m"
# PostgreSQL configuration
postgresql:
# VectorChord requires vchord.so in shared_preload_libraries
shared_preload_libraries:
- "vchord.so"
parameters:
max_connections: "50"
shared_buffers: "128MB"
password_encryption: "scram-sha-256"
pg_hba:
# Allow connections from k8s pods
- host all all 0.0.0.0/0 scram-sha-256
- host all all ::/0 scram-sha-256

View file

@ -5,13 +5,8 @@ namespace: databases
resources:
- blumeops-pg.yaml
- immich-pg.yaml
- service-tailscale.yaml
- service-immich-pg-tailscale.yaml
- service-metrics-tailscale.yaml
- external-secret-eblume.yaml
- external-secret-borgmatic.yaml
- external-secret-immich-borgmatic.yaml
- external-secret-teslamate.yaml
- external-secret-authentik.yaml
- external-secret-paperless.yaml

View file

@ -1,72 +0,0 @@
# devpi PyPI Caching Proxy
devpi-server running in Kubernetes, providing:
- PyPI caching proxy at `root/pypi`
- Private package hosting at `eblume/dev`
## Setup
### 1. Create the root password secret
```fish
kubectl create namespace devpi
op inject -i argocd/manifests/devpi/secret-root.yaml.tpl | kubectl apply -f -
```
### 2. Deploy via ArgoCD
```fish
argocd app sync apps
argocd app sync devpi
```
The container will auto-initialize on first startup using the root password from the secret.
### 3. Create user and index (first time only)
After the pod is running:
```fish
# Login to devpi as root
uvx --from devpi-client devpi use https://pypi.tail8d86e.ts.net
uvx --from devpi-client devpi login root
# Enter root password when prompted
# Create eblume user (prompts for password - use the one from 1Password)
uvx --from devpi-client devpi user -c eblume email=blume.erich@gmail.com
# Create private index inheriting from PyPI
uvx --from devpi-client devpi index -c eblume/dev bases=root/pypi
```
## Usage
### As pip index (caching proxy)
Configure `~/.config/pip/pip.conf`:
```ini
[global]
index-url = https://pypi.tail8d86e.ts.net/root/pypi/+simple/
trusted-host = pypi.tail8d86e.ts.net
```
### Upload private packages
```fish
cd ~/code/personal/your-package
uv build
uv publish --publish-url https://pypi.tail8d86e.ts.net/eblume/dev/
```
## URLs
- Web UI: https://pypi.tail8d86e.ts.net
- PyPI cache: https://pypi.tail8d86e.ts.net/root/pypi/+simple/
- Private index: https://pypi.tail8d86e.ts.net/eblume/dev/+simple/
## Credentials
Stored in 1Password vault `blumeops`, item `kyhzfifryqnuk7jeyibmmjvxxm`:
- `root password` - devpi root user
- `password` - eblume user password

View file

@ -1,25 +0,0 @@
# ExternalSecret for devpi root password
#
# Replaces the manual op inject workflow from secret-root.yaml.tpl
#
# 1Password item: "devpi" in blumeops vault
# Field: "root password"
#
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: devpi-root
namespace: devpi
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: onepassword-blumeops
target:
name: devpi-root
creationPolicy: Owner
data:
- secretKey: password
remoteRef:
key: devpi
property: root password

View file

@ -1,25 +0,0 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: devpi-tailscale
namespace: devpi
annotations:
tailscale.com/proxy-class: "default"
tailscale.com/proxy-group: "ingress"
gethomepage.dev/enabled: "true"
gethomepage.dev/name: "PyPI"
gethomepage.dev/group: "Infrastructure"
gethomepage.dev/icon: "pypi.png"
gethomepage.dev/description: "PyPI cache"
gethomepage.dev/href: "https://pypi.ops.eblu.me"
gethomepage.dev/pod-selector: "app=devpi"
spec:
ingressClassName: tailscale
defaultBackend:
service:
name: devpi
port:
number: 3141
tls:
- hosts:
- pypi

View file

@ -1,14 +0,0 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: devpi
resources:
- statefulset.yaml
- service.yaml
- ingress-tailscale.yaml
- external-secret.yaml
images:
- name: registry.ops.eblu.me/blumeops/devpi
newTag: v6.19.3-37b8a21

View file

@ -1,64 +0,0 @@
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: devpi
namespace: devpi
spec:
serviceName: devpi
replicas: 1
selector:
matchLabels:
app: devpi
template:
metadata:
labels:
app: devpi
spec:
securityContext:
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: devpi
image: registry.ops.eblu.me/blumeops/devpi:kustomized
env:
- name: DEVPI_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: devpi-root
key: password
- name: DEVPI_OUTSIDE_URL
value: "https://pypi.ops.eblu.me"
ports:
- containerPort: 3141
name: http
volumeMounts:
- name: data
mountPath: /devpi
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "2Gi" # High limit for initial PyPI index build, reclaimed after
cpu: "500m"
livenessProbe:
httpGet:
path: /+api
port: 3141
initialDelaySeconds: 30
periodSeconds: 30
readinessProbe:
httpGet:
path: /+api
port: 3141
initialDelaySeconds: 10
periodSeconds: 10
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi

View file

@ -1,51 +0,0 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: docs
namespace: docs
spec:
replicas: 2
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
selector:
matchLabels:
app: docs
template:
metadata:
labels:
app: docs
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: docs
image: registry.ops.eblu.me/blumeops/quartz:kustomized
ports:
- containerPort: 80
name: http
env:
- name: DOCS_RELEASE_URL
value: "https://forge.eblu.me/eblume/blumeops/releases/download/v1.16.0/docs-v1.16.0.tar.gz"
resources:
requests:
memory: "64Mi"
cpu: "10m"
limits:
memory: "128Mi"
livenessProbe:
httpGet:
path: /healthz
port: 80
initialDelaySeconds: 10
periodSeconds: 30
readinessProbe:
httpGet:
path: /healthz
port: 80
initialDelaySeconds: 5
periodSeconds: 10

View file

@ -1,27 +0,0 @@
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: docs-tailscale
namespace: docs
annotations:
tailscale.com/proxy-class: "default"
tailscale.com/proxy-group: "ingress"
tailscale.com/tags: "tag:k8s,tag:flyio-target"
gethomepage.dev/enabled: "true"
gethomepage.dev/name: "Docs"
gethomepage.dev/group: "Services"
gethomepage.dev/icon: "mdi-book-open-page-variant"
gethomepage.dev/description: "BlumeOps Documentation"
gethomepage.dev/href: "https://docs.eblu.me"
gethomepage.dev/pod-selector: "app=docs"
spec:
ingressClassName: tailscale
defaultBackend:
service:
name: docs
port:
number: 80
tls:
- hosts:
- docs

View file

@ -1,12 +0,0 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: docs
resources:
- deployment.yaml
- service.yaml
- ingress-tailscale.yaml
- pdb.yaml
images:
- name: registry.ops.eblu.me/blumeops/quartz
newTag: v1.28.2-613f05d

View file

@ -1,10 +0,0 @@
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: docs
spec:
minAvailable: 1
selector:
matchLabels:
app: docs

View file

@ -1,13 +0,0 @@
---
apiVersion: v1
kind: Service
metadata:
name: docs
namespace: docs
spec:
selector:
app: docs
ports:
- name: http
port: 80
targetPort: 80

View file

@ -0,0 +1,16 @@
# Ringtail (amd64) overlay for external-secrets.
#
# Reuses the shared indri manifest as a base and only overrides the controller
# image to the nix-built amd64 variant (`-nix` tag). The base sets the arm64
# image (built via containers/external-secrets/container.py on indri's Dagger
# runner); ringtail's k3s is amd64 and needs the image built by
# containers/external-secrets/default.nix on the nix-container-builder.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../external-secrets
images:
- name: registry.ops.eblu.me/blumeops/external-secrets
newTag: v2.2.0-13895bb-nix

View file

@ -12,4 +12,5 @@ resources:
images:
- name: ghcr.io/external-secrets/external-secrets
newTag: v2.2.0
newName: registry.ops.eblu.me/blumeops/external-secrets
newTag: v2.2.0-13895bb

View file

@ -1,9 +1,8 @@
# Reviewed against v12.7.3 defaults (2026-03-30)
# Reviewed against v12.8.2 defaults (2026-04-20)
log:
level: info
runner:
file: /data/.runner
capacity: 2
timeout: 3h
shutdown_timeout: 3h
@ -13,7 +12,15 @@ runner:
TZ: America/Los_Angeles
container:
# Job execution image is set via RUNNER_LABELS in deployment.yaml
network: "host"
# Connect to DinD sidecar via TCP (not socket)
docker_host: tcp://127.0.0.1:2375
server:
connections:
forgejo:
url: https://forge.ops.eblu.me/
uuid: ${FORGEJO_RUNNER_UUID}
token: ${FORGEJO_RUNNER_TOKEN}
labels:
- k8s:docker://registry.ops.eblu.me/blumeops/runner-job-image:v0.20.6-50f8c2a

View file

@ -25,14 +25,6 @@ spec:
env:
- name: TZ
value: America/Los_Angeles
- name: DOCKER_HOST
value: tcp://localhost:2375
- name: FORGEJO_URL
value: "https://forge.ops.eblu.me"
- name: RUNNER_NAME
value: "k8s-runner"
- name: RUNNER_LABELS
value: "k8s:docker://registry.ops.eblu.me/blumeops/runner-job-image:v0.20.1-24f7512"
command:
- /bin/sh
- -c
@ -44,19 +36,11 @@ spec:
done
echo "Docker daemon ready"
# Register if not already registered
if [ ! -f /data/.runner ]; then
echo "Registering runner..."
forgejo-runner register \
--instance "$FORGEJO_URL" \
--token "$RUNNER_TOKEN" \
--name "$RUNNER_NAME" \
--labels "$RUNNER_LABELS" \
--no-interactive
fi
# Render config with credentials from ExternalSecret.
envsubst < /config/config.yaml > /tmp/config.yaml
# Start daemon
exec forgejo-runner daemon --config /config/config.yaml
exec forgejo-runner daemon --config /tmp/config.yaml
envFrom:
- secretRef:
name: forgejo-runner-env

View file

@ -1,11 +1,7 @@
# ExternalSecret for Forgejo Runner token
# ExternalSecret for Forgejo Runner credentials
#
# 1Password item: "Forgejo Secrets" in blumeops vault
# Field: runner_reg (runner registration token)
#
# Non-secret env vars (FORGEJO_URL, RUNNER_NAME, RUNNER_LABELS) live in the
# deployment spec so that changes (e.g. image version bumps) trigger a rollout
# automatically.
# Fields: runner_k8s_uuid, runner_k8s_token
#
apiVersion: external-secrets.io/v1
kind: ExternalSecret
@ -21,7 +17,11 @@ spec:
name: forgejo-runner-env
creationPolicy: Owner
data:
- secretKey: RUNNER_TOKEN
- secretKey: FORGEJO_RUNNER_UUID
remoteRef:
key: Forgejo Secrets
property: runner_reg
property: runner_k8s_uuid
- secretKey: FORGEJO_RUNNER_TOKEN
remoteRef:
key: Forgejo Secrets
property: runner_k8s_token

View file

@ -11,7 +11,7 @@ resources:
images:
- name: code.forgejo.org/forgejo/runner
newName: registry.ops.eblu.me/blumeops/forgejo-runner
newTag: v12.7.3-352b95c
newTag: v12.8.2-1425bf1
- name: docker
newTag: 27-dind

View file

@ -16,7 +16,7 @@ spec:
spec:
containers:
- name: frigate-notify
image: ghcr.io/0x2142/frigate-notify:kustomized
image: registry.ops.eblu.me/blumeops/frigate-notify:kustomized
env:
- name: TZ
value: America/Los_Angeles

View file

@ -17,8 +17,8 @@ images:
newTag: "1.37"
- name: ghcr.io/blakeblackshear/frigate
newTag: 0.17.1-tensorrt
- name: ghcr.io/0x2142/frigate-notify
newTag: v0.5.4
- name: registry.ops.eblu.me/blumeops/frigate-notify
newTag: v0.5.4-e928054-nix
configMapGenerator:
- name: frigate-config

View file

@ -0,0 +1,229 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboard-shower-apm
namespace: monitoring
labels:
grafana_dashboard: "1"
data:
shower-apm.json: |
{
"annotations": { "list": [] },
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [],
"panels": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisLabel": "req/s",
"drawStyle": "line",
"fillOpacity": 20,
"lineInterpolation": "linear",
"lineWidth": 1,
"showPoints": "never",
"spanNulls": false,
"stacking": { "group": "A", "mode": "normal" }
},
"mappings": [],
"thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
"unit": "reqps"
},
"overrides": []
},
"gridPos": { "h": 8, "w": 16, "x": 0, "y": 0 },
"id": 1,
"options": {
"legend": { "calcs": ["mean", "max"], "displayMode": "table", "placement": "right", "showLegend": true },
"tooltip": { "mode": "multi", "sort": "desc" }
},
"targets": [
{ "datasource": { "type": "prometheus", "uid": "prometheus" }, "expr": "sum by (status) (rate(flyio_nginx_http_requests_total{host=\"shower.eblu.me\"}[5m]))", "legendFormat": "{{status}}", "refId": "A" }
],
"title": "Request Rate by Status",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }, { "color": "yellow", "value": 0.01 }, { "color": "red", "value": 0.05 }] },
"unit": "percentunit"
},
"overrides": []
},
"gridPos": { "h": 4, "w": 8, "x": 16, "y": 0 },
"id": 2,
"options": {
"colorMode": "background",
"graphMode": "area",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "auto"
},
"targets": [
{ "datasource": { "type": "prometheus", "uid": "prometheus" }, "expr": "sum(rate(flyio_nginx_http_requests_total{host=\"shower.eblu.me\",status=~\"5..\"}[5m])) / sum(rate(flyio_nginx_http_requests_total{host=\"shower.eblu.me\"}[5m]))", "refId": "A" }
],
"title": "Error Rate (5xx)",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }, { "color": "yellow", "value": 1 }, { "color": "red", "value": 5 }] },
"unit": "short"
},
"overrides": []
},
"gridPos": { "h": 4, "w": 4, "x": 16, "y": 4 },
"id": 3,
"options": {
"colorMode": "background",
"graphMode": "area",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "auto"
},
"targets": [
{ "datasource": { "type": "prometheus", "uid": "prometheus" }, "expr": "sum(increase(flyio_nginx_http_requests_total{host=\"shower.eblu.me\",request_uri=~\"/admin/login.*\",status=~\"4..\"}[$__range]))", "refId": "A" }
],
"title": "Failed admin logins (range)",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
"unit": "reqps"
},
"overrides": []
},
"gridPos": { "h": 4, "w": 4, "x": 20, "y": 4 },
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "auto"
},
"targets": [
{ "datasource": { "type": "prometheus", "uid": "prometheus" }, "expr": "sum(rate(flyio_nginx_http_requests_total{host=\"shower.eblu.me\"}[5m]))", "refId": "A" }
],
"title": "Current RPS",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisLabel": "seconds",
"drawStyle": "line",
"fillOpacity": 10,
"lineInterpolation": "linear",
"lineWidth": 1,
"showPoints": "never",
"spanNulls": false,
"stacking": { "group": "A", "mode": "none" }
},
"mappings": [],
"thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
"unit": "s"
},
"overrides": []
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 8 },
"id": 5,
"options": {
"legend": { "calcs": ["mean", "max"], "displayMode": "table", "placement": "right", "showLegend": true },
"tooltip": { "mode": "multi", "sort": "desc" }
},
"targets": [
{ "datasource": { "type": "prometheus", "uid": "prometheus" }, "expr": "histogram_quantile(0.50, sum by (le) (rate(flyio_nginx_http_request_duration_seconds_bucket{host=\"shower.eblu.me\"}[5m])))", "legendFormat": "p50", "refId": "A" },
{ "datasource": { "type": "prometheus", "uid": "prometheus" }, "expr": "histogram_quantile(0.90, sum by (le) (rate(flyio_nginx_http_request_duration_seconds_bucket{host=\"shower.eblu.me\"}[5m])))", "legendFormat": "p90", "refId": "B" },
{ "datasource": { "type": "prometheus", "uid": "prometheus" }, "expr": "histogram_quantile(0.99, sum by (le) (rate(flyio_nginx_http_request_duration_seconds_bucket{host=\"shower.eblu.me\"}[5m])))", "legendFormat": "p99", "refId": "C" }
],
"title": "Latency Percentiles",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisLabel": "",
"drawStyle": "line",
"fillOpacity": 20,
"lineInterpolation": "linear",
"lineWidth": 1,
"showPoints": "never",
"spanNulls": false,
"stacking": { "group": "A", "mode": "none" }
},
"mappings": [],
"thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
"unit": "Bps"
},
"overrides": []
},
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 8 },
"id": 6,
"options": {
"legend": { "calcs": ["mean", "max"], "displayMode": "table", "placement": "right", "showLegend": true },
"tooltip": { "mode": "single", "sort": "none" }
},
"targets": [
{ "datasource": { "type": "prometheus", "uid": "prometheus" }, "expr": "sum(rate(flyio_nginx_http_response_bytes_total{host=\"shower.eblu.me\"}[5m]))", "legendFormat": "Bandwidth", "refId": "A" }
],
"title": "Bandwidth",
"type": "timeseries"
},
{
"datasource": { "type": "loki", "uid": "loki" },
"gridPos": { "h": 8, "w": 24, "x": 0, "y": 16 },
"id": 7,
"options": {
"dedupStrategy": "none",
"enableLogDetails": true,
"prettifyLogMessage": false,
"showCommonLabels": false,
"showLabels": false,
"showTime": true,
"sortOrder": "Descending",
"wrapLogMessage": false
},
"targets": [
{ "datasource": { "type": "loki", "uid": "loki" }, "expr": "{instance=\"flyio-proxy\", job=\"flyio-nginx\"} |= \"shower.eblu.me\" | json | line_format \"{{.client_ip}} {{.request_method}} {{.request_uri}} {{.status}} {{.request_time}}s\"", "refId": "A" }
],
"title": "Recent Access Logs",
"type": "logs"
}
],
"refresh": "30s",
"schemaVersion": 38,
"tags": ["shower", "flyio", "apm"],
"templating": { "list": [] },
"time": { "from": "now-6h", "to": "now" },
"timepicker": {},
"timezone": "",
"title": "Shower APM",
"uid": "shower-apm",
"version": 1,
"weekStart": ""
}

View file

@ -22,6 +22,7 @@ resources:
- dashboards/configmap-transmission.yaml
- dashboards/configmap-cv-apm.yaml
- dashboards/configmap-docs-apm.yaml
- dashboards/configmap-shower-apm.yaml
- dashboards/configmap-flyio.yaml
- dashboards/configmap-sifaka-disks.yaml
- dashboards/configmap-forgejo.yaml

View file

@ -63,5 +63,7 @@ datasources:
password: $TESLAMATE_DB_PASSWORD
type: postgres
uid: TeslaMate
url: blumeops-pg-rw.databases.svc.cluster.local:5432
# teslamate DB migrated to ringtail blumeops-pg (wave-1); reached via the
# Caddy L4 route on indri (pg.ops.eblu.me:5434 -> blumeops-pg-ringtail).
url: pg.ops.eblu.me:5434
user: teslamate

View file

@ -14,7 +14,9 @@ spec:
app.kubernetes.io/name: grafana
app.kubernetes.io/instance: grafana
strategy:
type: RollingUpdate
# RWO PVC for SQLite + Bleve index — RollingUpdate spawns the new pod
# before the old one terminates, and it crashloops on the index lock.
type: Recreate
template:
metadata:
labels:
@ -156,7 +158,9 @@ spec:
- name: FOLDER
value: /tmp/dashboards
- name: RESOURCE
value: both
# ConfigMap-only — no dashboards are sourced from Secrets,
# so the ServiceAccount has no read access to secrets.
value: configmap
- name: FOLDER_ANNOTATION
value: grafana_folder
securityContext:
@ -183,7 +187,7 @@ spec:
- name: FOLDER
value: /tmp/dashboards
- name: RESOURCE
value: both
value: configmap
- name: FOLDER_ANNOTATION
value: grafana_folder
- name: REQ_USERNAME

View file

@ -7,7 +7,7 @@ metadata:
app.kubernetes.io/instance: grafana
rules:
- apiGroups: [""]
resources: ["configmaps", "secrets"]
resources: ["configmaps"]
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1

View file

@ -17,7 +17,7 @@ resources:
images:
- name: registry.ops.eblu.me/blumeops/homepage
newTag: v1.11.0-e375859
newTag: v1.11.0-678f26b-nix
configMapGenerator:
- name: homepage-config

View file

@ -1,3 +1,6 @@
# Homepage runs on ringtail (k3s) — its k8s autodiscovery only sees ringtail
# Ingresses (frigate→NVR, authentik, ntfy, ollama). Services that live on
# minikube (and indri-native) need explicit static entries here.
- Host Services:
- Forgejo:
href: https://forge.eblu.me
@ -12,6 +15,10 @@
href: https://registry.ops.eblu.me
icon: zot-registry
description: Container registry
- Devpi:
href: https://pypi.ops.eblu.me
icon: mdi-language-python
description: PyPI caching mirror
- Sifaka NAS:
href: https://nas.ops.eblu.me
icon: synology
@ -53,10 +60,6 @@
# type: caddy
# url: http://indri.tail8d86e.ts.net:2019
- Home:
- NVR:
href: https://nvr.ops.eblu.me
icon: frigate.png
description: Network video recorder
- Jellyfin:
href: https://jellyfin.ops.eblu.me
icon: jellyfin
@ -68,12 +71,62 @@
enableBlocks: true
enableNowPlaying: false
fields: ["movies", "series", "episodes"]
- DJ:
href: https://dj.ops.eblu.me
icon: navidrome.png
description: Music streaming server
widget:
type: navidrome
url: https://dj.ops.eblu.me
user: "{{HOMEPAGE_VAR_NAVIDROME_USER}}"
token: "{{HOMEPAGE_VAR_NAVIDROME_TOKEN}}"
salt: "{{HOMEPAGE_VAR_NAVIDROME_SALT}}"
- Content:
- Kiwix:
href: https://kiwix.ops.eblu.me
icon: kiwix.png
description: Offline Wikipedia
- Miniflux:
href: https://feed.ops.eblu.me
icon: miniflux.png
description: RSS reader
widget:
type: miniflux
url: https://feed.ops.eblu.me
key: "{{HOMEPAGE_VAR_MINIFLUX_API_KEY}}"
fields: ["unread"]
- Infrastructure:
- Authentik:
href: https://authentik.ops.eblu.me
icon: authentik
description: Identity provider
- Ntfy:
href: https://ntfy.ops.eblu.me
icon: ntfy.png
description: Push notifications
- ArgoCD:
href: https://argocd.ops.eblu.me
icon: argo-cd.png
description: GitOps CD
- Grafana:
href: https://grafana.ops.eblu.me
icon: grafana.png
description: Metrics dashboards
widget:
type: grafana
url: https://grafana.ops.eblu.me
username: "{{HOMEPAGE_VAR_GRAFANA_USERNAME}}"
password: "{{HOMEPAGE_VAR_GRAFANA_PASSWORD}}"
fields: ["dashboards", "totalalerts", "alertstriggered"]
- Prometheus:
href: https://prometheus.ops.eblu.me
icon: prometheus.png
description: Metrics storage
- Services:
# CV and Docs were previously auto-discovered from k8s Ingresses; after
# the indri-native migration ([[cv-on-indri]], [[docs-on-indri]]) there
# is no Ingress to discover, so they live here as static entries.
- CV:
href: https://cv.eblu.me
icon: mdi-file-document
description: Resume / CV
- Docs:
href: https://docs.eblu.me
icon: mdi-book-open-page-variant
description: BlumeOps Documentation
- Transmission:
href: https://torrent.ops.eblu.me
icon: transmission.png
description: Torrent client

View file

@ -16,11 +16,16 @@ spec:
app: immich
component: machine-learning
spec:
runtimeClassName: nvidia
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: machine-learning
# ringtail uses the -cuda tag (set in kustomization.yaml)
# to take advantage of the RTX 4080 via the nvidia
# device plugin. Time-slicing is configured for 4 replicas
# so frigate + ollama + this pod can share.
image: ghcr.io/immich-app/immich-machine-learning:kustomized
ports:
- name: http
@ -57,6 +62,7 @@ spec:
cpu: "100m"
limits:
memory: "4Gi"
nvidia.com/gpu: "1"
volumes:
- name: cache
persistentVolumeClaim:

Some files were not shown because too many files have changed in this diff Show more