blumeops

Author	SHA1	Message	Date
Erich Blume	eaa899cfc6	C0: wave-1 decommission follow-ups (argocd admin RBAC, teslamate probe) - argocd: grant local break-glass admin the admin role (g, admin, role:admin); previously only the Authentik admins group had access, locking out admin once its token expired (policy.default is unset). - alloy-k8s: repoint the teslamate blackbox probe from the deleted minikube service to https://tesla.ops.eblu.me/ (Caddy over Tailscale), like immich. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 13:02:05 -07:00
Erich Blume	46f0002178	Decommission wave-1 minikube services (paperless, teslamate, mealie) (#365) Final step of the wave-1 indri-k8s migration. paperless, teslamate, and mealie now run on ringtail with data migrated, verified, and backed up (local + BorgBase offsite via PR #364). - Remove minikube paperless/teslamate/mealie manifest dirs + ArgoCD app defs (prunes the parked Deployments/Services + redundant minikube mealie/paperless PVCs) - Drop paperless/teslamate roles + ExternalSecrets from the minikube blumeops-pg cluster - miniflux + authentik stay on minikube (later waves) Finalization after merge: sync apps + databases to prune, then DROP DATABASE paperless/teslamate on indri's blumeops-pg (fresh safety dump taken first). Reviewed-on: #365	2026-06-03 12:36:06 -07:00
Erich Blume	44798a6429	C0: mealie-ringtail image rebuilt from main (e0057b4-nix) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 12:26:55 -07:00
Erich Blume	e0057b46e4	Wire ringtail blumeops-pg into backups + Grafana (#364) Prereq for the wave-1 decommission. The cutover moved paperless+teslamate (postgres) and mealie (SQLite) to ringtail, but borgmatic and the Grafana TeslaMate datasource still pointed at the minikube copies — the migrated live data had not been backed up since cutover, and dropping the minikube DBs would break the TeslaMate dashboards. - Tailscale Service `blumeops-pg-ringtail` + Caddy L4 route `pg.ops.eblu.me:5434` - borgmatic: teslamate + paperless postgres → :5434; mealie SQLite → ssh:eblume@ringtail - Grafana TeslaMate datasource → pg.ops.eblu.me:5434 Deploy: sync databases-ringtail (tailscale svc) + grafana from branch; provision-indri --tags caddy,borgmatic; verify a backup run + dashboards. Unblocks the decommission PR. Reviewed-on: #364	2026-06-03 12:25:30 -07:00
Erich Blume	92b54e7ba9	C0: ringtail wave-1 images rebuilt from main (fcac8e5-nix tags) Post-merge rebuild of paperless/mealie/teslamate Nix images at the main merge commit, replacing the feature-branch -nix tags. Image content is identical; only the commit-sha suffix changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 10:36:15 -07:00
Erich Blume	fcac8e5a72	Wave 1 indri→ringtail migration: paperless, teslamate, mealie (#363) Migrate paperless, teslamate, and mealie off the OOM-saturated minikube-indri node onto ringtail k3s, shedding ~1.1 GiB of resident load. Second chain in the indri-k8s decommission after immich. Containers ported to Nix (default.nix), build-verified on ringtail: - paperless → wraps nixpkgs paperless-ngx 2.20.15 (pinned unstable); runs as web/worker/beat/consumer - mealie → wraps nixpkgs mealie 3.16.0 (forward 4-minor bump, breaking-change reviewed); single gunicorn, SQLite - teslamate → from-scratch beamPackages mixRelease (not in nixpkgs); erlang_27+elixir_1_18, npm assets, ex_cldr locales pre-fetched Data: cold downtime-tolerant cutover. paperless+teslamate postgres dump/restore from quiesced source into a new ringtail blumeops-pg CNPG cluster; mealie SQLite PVC copied. Source DBs untouched until verified (rollback = repoint). Also: ringtail blumeops-pg cluster + ExternalSecrets scaffold; fixes pre-existing shower version-check drift. Runbook: docs/how-to/ringtail/migrate-wave1-ringtail.md. Deploy-from-branch + cutover happens before merge; container images rebuilt from main after merge. Reviewed-on: #363	2026-06-03 10:34:00 -07:00
Erich Blume	f588638331	C0: rebuild valkey from squashed main commit Image tags from PR #362 (v8.1.7-02859c5{,-nix}) referenced a branch SHA that no longer exists on main after squash-merge. Rebuilt both the dagger arm64 and nix amd64 variants from the squashed commit (`ecded30`) and updated paperless + immich-ringtail to the new tags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 14:53:21 -07:00
Erich Blume	ecded30073	Make valkey local on ringtail (nix amd64) + bump to 8.1.7 (#362 ) ## Summary Weekly "make one non-local container local" pickup: immich-ringtail still pulled `docker.io/valkey/valkey:8.1.6` because the existing `containers/valkey/container.py` build was arm64-only. - Adds `containers/valkey/default.nix` — nix-built amd64 valkey image, packaged by the ringtail nix-container-builder runner using `pkgs.dockerTools.buildLayeredImage`. Mirrors the existing `containers/authentik-redis/default.nix` pattern. - `containers/valkey/container.py` keeps building the Alpine arm64 image for paperless on indri. Bumped both builds to upstream valkey 8.1.7 (Alpine 3.22 now ships `8.1.7-r0`; nixpkgs has 8.1.7). - Splits `VERSION` (upstream app) from `ALPINE_PIN` (apk pin) in `container.py` so both build files can declare the same upstream version and pass `container-version-check`. - Updates `service-versions.yaml`: current-version 8.1.7, refreshed last-reviewed, upstream-source now points at the canonical valkey-io releases page. - Switches kustomizations: - `immich-ringtail/kustomization.yaml`: `docker.io/valkey/valkey:8.1.6` → `registry.ops.eblu.me/blumeops/valkey:v8.1.7-02859c5-nix`, comment updated. - `paperless/kustomization.yaml`: `v8.1.6-r0-fabca04` → `v8.1.7-02859c5`. 
## Build build-container run #563 — both jobs succeeded after a transient runner crash on the first dispatch (#562 build-nix), which surfaced two separate bugs, fixed in a separate C0 on main: - `runner-logs` silently returned 0 with no output when the log file didn't exist on indri - `ssh indri` swallowed remote exit codes (fish login shell), which the wrapper now works around via a stdout marker ## Test plan - [ ] `argocd app set immich-ringtail --revision valkey-nix && argocd app sync immich-ringtail` - [ ] `argocd app set paperless --revision valkey-nix && argocd app sync paperless` - [ ] Both valkey pods come Ready and start serving on :6379 - [ ] Immich app + paperless can read/write their respective cache - [ ] After merge: rebuild from squashed main commit + update kustomization tags (squash-tag follow-up) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #362	2026-05-28 14:51:09 -07:00
Erich Blume	e703d25efe	C0: rebuild unpoller container from squashed main commit Image was previously tagged with the unpoller-v3 branch SHA (`1b27242`), which doesn't exist in main's history after squash-merge. Rebuilt from the squashed commit so the tag references a reachable commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 10:10:21 -07:00
Erich Blume	4d1f4af25b	Upgrade unpoller v2.34.0 → v3.2.0, migrate to container.py (#361 ) ## Summary - Service Review pickup: unpoller (last reviewed 73 days ago). - Upgrades unpoller from v2.34.0 to v3.2.0 (major version bump). - Migrates the container build from a Dockerfile to a native Dagger pipeline (`containers/unpoller/container.py`) following the navidrome / miniflux pattern. - Refreshes `service-versions.yaml` (last-reviewed, current-version). ## Breaking changes (upstream) - v3.0.0 — UniFi network API shifts (later 10.x). Some metric / event / log names and labels may have changed. Worth a follow-up sweep of the unpoller Grafana dashboard for missing series. - v3.2.0 — defaults to a 60s background poll feeding cached Prometheus scrapes (was on-demand poll per scrape). To restore previous behavior, set `interval = 0` in `up.conf`. Leaving the new default in this PR — every-15s scrapes will simply serve from cache, which is fine for our use. ## Build - Image: `registry.ops.eblu.me/blumeops/unpoller:v3.2.0-1b27242` - Built by build-container workflow run #559 from this branch. ## Test plan - [ ] `argocd app set unpoller --revision unpoller-v3 && argocd app sync unpoller` - [ ] Pod comes Ready - [ ] Verify metrics exported (`Site/Client/UAP/USG/USW` counts in logs, `unpoller_*` series in Prometheus) - [ ] Spot-check unpoller Grafana dashboard for missing series after the v3 API shift - [ ] After merge: `argocd app set unpoller --revision main && argocd app sync unpoller` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #361	2026-05-28 09:59:46 -07:00
Erich Blume	ee51bcafb4	Rip out compensating-controls framework (#359) ## Summary Removes the compensating-controls (CC) framework. Prowler and Kingfisher continue to run weekly and produce reports; the Prowler mutelist YAML files stay in place but no longer carry `CC: <id>` prefixes — each entry now just keeps a free-form `Description` of why it's muted. The CC review cadence proved to be more process overhead than this single-operator homelab needed. ## What changed Deleted - `compensating-controls.yaml` — the CC registry - `mise-tasks/review-compensating-controls` — the staleness-review task - `docs/how-to/operations/review-compensating-controls.md` - `docs/how-to/operations/record-review-evidence.md` (was aspirational) - `docs/explanation/compliance-mute-categories.md` (proposed-future CC/NA/RA work) - 5 orphan `+review-cc-` / `+compliance-mute-categories` changelog fragments Modified - 6 mutelist YAML files: stripped `CC: <id>.` prefix from every `Description` / `statement` field, kept the free-form text - `mise-tasks/review-compliance-reports`: removed CC mentions from docstrings, panel text, and the node-verification table title. Node-verification logic itself is unchanged. 
- `docs/reference/operations/security.md`: removed the "Compensating controls" section - `docs/how-to/operations/read-compliance-reports.md`: rewrote step 3 of "Acting on findings" to point at the mutelist YAML directly - `docs/changelog.d/prowler-iac-mutelist.infra.md`: rewrote to drop the "two new compensating controls" framing ## What did not change - All Prowler manifests (cronjobs, RBAC, PVs, kustomization) — scans still run on the same schedule - The Kingfisher deployment - The trivy-shim in the Prowler container — that's about Trivy ignorefile plumbing, independent of the CC concept - The mutelist entries themselves — each `Resources` list is unchanged; only the prose of `Description` was edited - `CHANGELOG.md` — historical releases are left as-is ## Test plan - [ ] Wait for human review before deploying — once merged, re-point ArgoCD: `argocd app set prowler --revision main && argocd app sync prowler` (no manifest changes besides the ConfigMap, so impact is limited to muted-finding descriptions in next week's report) - [ ] Confirm next weekly Prowler K8s CIS run (Sunday 3am) still completes and produces a report on sifaka - [ ] Confirm next weekly Prowler IaC run still honors `trivyignore.yaml` (the trivy shim is untouched but the ignorefile content was rewritten) - [ ] `mise run review-compliance-reports` — verify node-verification block still runs and prints the renamed table title Reviewed-on: #359	2026-05-22 21:08:53 -07:00
Erich Blume	2fae0f7161	C0: switch grafana deployment to Recreate strategy Grafana uses an RWO PVC for SQLite + Bleve search index. RollingUpdate spawns the new pod before terminating the old one, so the new pod crashloops on the index lock until rollout timeout. Recreate terminates the old pod first, letting the new pod acquire the lock cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 06:33:26 -07:00
Erich Blume	1897eb1c5b	C0: move immich blackbox probe to ringtail alloy Immich migrated to ringtail's k3s cluster but the probe still targeted the in-cluster service DNS on indri's minikube, firing ServiceProbeFailure indefinitely. Moved the target into alloy-ringtail's config so the probe runs in the cluster where immich actually lives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:46:22 -07:00
Erich Blume	e222d47d45	C0: deploy shower v1.1.3 (kustomize newTag bump) Image v1.1.3-3645098-nix was built directly on ringtail and pushed via skopeo, bypassing the Forgejo runner: indri was severely overloaded (load avg 24.92, minikube VM at 344% CPU) and the workflow-dispatch endpoint timed out. The image content is identical to what the runner would have produced — same default.nix at commit `3645098` (on main), same NIX_PATH (current nixpkgs flake), same skopeo invocation. Tag short-sha matches the commit that defines the recipe so we aren't pinning to a ghost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 20:09:54 -07:00
Erich Blume	815a0cc6e6	C0: shower — rebuild from main SHA (post-merge retag) PR #358 was squash-merged so the branch commit `b8c7783` baked into the prior image tag isn't reachable from main's history. Rebuild from main HEAD (`a33fa47`) and retag. Image content is byte-identical (FOD is content-addressed, inputs unchanged); only the SHA in the tag changes so future provenance tracing stays on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 06:57:24 -07:00
Erich Blume	a33fa47b80	C1: deploy shower v1.1.2 (#358) ## Summary Deploys `adelaide-baby-shower-app` v1.1.2 to ringtail k3s. - Bumps `containers/shower/default.nix` `version` to 1.1.2. - Refreshes sdist + wheel `fetchurl` hashes against the forge PyPI artifacts. - Re-probed FOD `outputHash` on the nix-container-builder runner (ringtail) and pinned the new closure hash. - Bumps kustomize `newTag` to `v1.1.2-b8c7783-nix` (built from this branch's tip). - Bumps `service-versions.yaml` entry for shower to `1.1.2` / `last-reviewed: 2026-05-15`. ## Build provenance Built by Forgejo Actions run #553 on `nix-container-builder` (ringtail) at commit `b8c7783`. After merge a C0 follow-on will rebuild from main and retag so future provenance points at main history. ## Test plan - [ ] `argocd app set shower --revision shower-v1.1.2 && argocd app sync shower` deploys cleanly - [ ] Pod migrates the SQLite PV and serves at `shower.ops.eblu.me` / `shower.eblu.me` - [ ] No new errors in pod logs after `collectstatic` + gunicorn boot Reviewed-on: #358	2026-05-15 06:50:46 -07:00
Erich Blume	4d2bc9975f	C0: deploy shower v1.1.1 (kustomize newTag bump) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 20:51:10 -07:00
Erich Blume	947e4310c3	C2: migrate immich from minikube to ringtail (mikado chain) (#356) ## Summary C2 Mikado chain to move the entire Immich stack (server, ML, valkey, postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the largest single tenant on minikube (~1.5 GiB resident) and minikube is currently memory-saturated (97% RAM, swapping). This is the first concrete chain in the broader indri-k8s decommission effort. This PR contains the planning layer only — 7 cards (1 goal + 6 prerequisites). Implementation cycles follow per the Mikado Branch Invariant. ## Goal end-state - Immich `server`, `machine-learning`, `valkey` on ringtail. - ML pod uses ringtail's RTX 4080 (performance win — currently CPU-only). - CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail. - Library still on sifaka NFS — ringtail mounts the same path. - `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress. - Minikube `immich` and `immich-pg` are removed. ## Cards | Card | Depends on | |---|---| | `migrate-immich-to-ringtail` (goal) | all six below | | `cnpg-on-ringtail` | — | | `immich-pg-on-ringtail` | cnpg-on-ringtail | | `immich-pg-data-migration` | immich-pg-on-ringtail | | `sifaka-nfs-from-ringtail` | — | | `immich-app-on-ringtail` | immich-pg-on-ringtail, sifaka-nfs-from-ringtail | | `immich-cutover-and-decommission` | immich-pg-data-migration, immich-app-on-ringtail | ## Key constraints - No data loss. Downtime is acceptable; data loss is not. Two surfaces matter: postgres (ML embeddings, face data — slow to re-derive) and the library files (don't move, but NFS access from ringtail must be verified). - Migration method: Option A is a CNPG `externalCluster` basebackup → promote. Option B is `pg_dump`/`pg_restore` as a documented fallback. Either way, dry-run against a scratch cluster first. 
- Why pg moves too (not cross-cluster): keeping pg on minikube would block the whole decommission, and Immich is chatty with pg so tailnet round-trips would hurt. ## Test plan - [ ] Plan review — does the dependency graph make sense? - [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the chain correctly. - [ ] Per-card implementation cycles land separately (commit convention enforced by hook). Reviewed-on: #356	2026-05-13 16:46:17 -07:00
Erich Blume	dc0916a548	C0: shower — rebuild from main SHA (post-merge retag) PR #354 was squash-merged so the branch commit `444ff91` baked into the prior image tag isn't reachable from main's history. Rebuild from main HEAD (`3c7967e`) and retag. Image content is byte-identical (FOD is content-addressed, inputs unchanged); only the SHA in the tag changes so future provenance tracing stays on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 20:20:39 -07:00
Erich Blume	3c7967e445	C1: deploy shower v1.1.0 (phases + guest memories) (#354) ## Summary Deploys `adelaide-baby-shower-app` v1.1.0 to ringtail k3s. ### App changes (since v1.0.2) - Four-phase `ShowerState` replaces the boolean `locked` flag — `pre_event` → `party` → `prizes_locked` → `event_locked` — with a backfill migration that maps `locked=True → pre_event`, `locked=False → party`. - Guest memories: append-only photos + comments panel where guests can leave notes for the baby. Adds `GuestPhoto` + `GuestComment` models with file-extension validators and a max-size validator; new `shower.imaging` module for thumbnail generation. - Admin + QR polish: configurable host link, fixed "View Site" URL, guest-facing QR copy improvements, contest tweaks. Three Django migrations run automatically in the entrypoint against the SQLite PV: - `0009_shower_phase` - `0010_guest_memories` - `0011_book_description` No ConfigMap / env-var changes. The deploy uses `strategy: Recreate` with a single replica, so the old pod releases the data PVC before the new one mounts it and runs migrations. ### Container build changes The v1.1.0 tag exposed a latent issue with the Forgejo PyPI install path: - The recent commit `2d38418e` closed the forge package leak at the Fly edge by blocking `/api/packages/*` publicly. - Forgejo's PyPI simple index returns absolute file URLs hardcoded to its public `ROOT_URL` (`forge.eblu.me`), so pip-installing from the tailnet index URL still tries to download from `forge.eblu.me` → 403. - Previous shower builds escaped this because their FOD outputs were already in the nix store; bumping to a new version forced a fresh pip run that hit the block. Fix mirrors what we already do for the sdist: both wheel and sdist are pulled via direct `fetchurl` against `forge.ops.eblu.me`, then the wheel is copied to TMPDIR under its clean filename (nix store path's hash prefix breaks pip's wheel-filename parser) and handed to pip as a local path. 
The forge `--extra-index-url` is no longer needed. FOD outputHash pinned to `sha256-kTNOswobtkgyQmmqbQM8XO4vvaGg57nCuuZGbNXb0NM=` from run 547. Image: `registry.ops.eblu.me/blumeops/shower:v1.1.0-444ff91-nix`. ### Adjacent finding (already handled) The ringtail `gitea-runner-nix_container_builder` systemd unit was left `inactive` after the recent `provision-ringtail` (matches the known `sshd-restart-hangs-mux` lesson — the rebuild changed the unit's PATH closure + config.yaml, systemd stopped it, then the playbook hung before the activation could restart it). Manually started; the existing memory `lesson_provision_ringtail_ssh_hang.md` was extended to mention the runner as the canary service to check after provisions. ## Test plan - [ ] `argocd app diff shower --revision shower-v1.1.0` — review the manifest change - [ ] `argocd app set shower --revision shower-v1.1.0 && argocd app sync shower` - [ ] `kubectl --context=k3s-ringtail logs -n shower deploy/shower` — confirm migrations 0009/0010/0011 applied, no errors - [ ] Hit `https://shower.ops.eblu.me/` (tailnet) — splash page renders, phase indicator visible - [ ] Hit `https://shower.ops.eblu.me/host/` — host console loads, phase dropdown shows the four states - [ ] Hit `https://shower.eblu.me/` (public via Fly) — splash page still served - [ ] After merge: `argocd app set shower --revision main && argocd app sync shower` Reviewed-on: #354	2026-05-11 20:08:03 -07:00
Erich Blume	40d9a1ef9e	C0: shower — rebuild from main SHA (post-PR-349 retag) Standard squash-merge dance per docs/how-to/deployment/build-container-image.md#Squash-merge-and-container-tags — retags from v1.0.2-039d9b9-nix (branch SHA) to v1.0.2-292d354-nix ([main] tag from run 544 built off the merge commit). Functionally identical; preserves source traceability. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 13:55:25 -07:00
Erich Blume	292d354902	C1: deploy adelaide-baby-shower-app to ringtail k3s (#349) ## Summary Brings up the Adelaide / Heidi / Addie baby shower app on ringtail k3s with the public/private split that the app's hosting contract calls for: `shower.eblu.me` (public, via Fly proxy) and `shower.ops.eblu.me` (tailnet). App is consumed as a wheel from the Forgejo PyPI index — source lives at [`adelaide-baby-shower-app`](https://forge.eblu.me/eblume/adelaide-baby-shower-app). ### What's included - ArgoCD app + manifests under `argocd/manifests/shower/` (deployment, service, ProxyGroup ingress, ConfigMap for `DJANGO_DEBUG`/`DJANGO_ADMIN_URL`, ExternalSecret for `DJANGO_SECRET_KEY` from 1Password item `Shower (blumeops)`, NFS PV on sifaka, RWX media PVC, RWO local-path data PVC for SQLite). Recreate rollout because SQLite is single-writer. - Public surface (`fly/`): new `shower.eblu.me` server block proxying to `shower.ops.eblu.me`. `/admin/` returns 403 at the edge except `/admin/login/` and `/admin/logout/`, which are rate-limited via a new `shower_auth` zone. `X-Clacks-Overhead` on. GNU Terry Pratchett. - fail2ban filter (`shower-admin-login.conf`) matching 401/403/429 on `/admin/login/` and jail (`shower.conf`) with `maxretry=5/findtime=600/bantime=3600`. The `nginx-deny` action was generalized to take a per-jail `nginx_deny_file` so the shower has its own deny list (forge keeps using the legacy default). - Caddy route on indri (`shower.ops.eblu.me` → `https://shower.tail8d86e.ts.net`). - Pulumi Gandi CNAME `shower.eblu.me → blumeops-proxy.fly.dev.`. - Grafana APM dashboard `configmap-shower-apm.yaml` (request rate, error rate, failed admin login count, latency percentiles, bandwidth, access logs) mirroring `docs-apm.json` with a `host="shower.eblu.me"` filter. 
- Container `containers/shower/default.nix` — `dockerTools.buildLayeredImage` with a nixpkgs Python and a startup wrapper that creates `/app/data/.venv`, pip-installs `adelaide-baby-shower-app==1.0.0` from the forge PyPI index on first boot, runs migrations + collectstatic, and execs gunicorn. A `local_settings.py` shim pins `DATABASES.NAME`/`MEDIA_ROOT`/`STATIC_ROOT` to absolute paths so they don't end up in site-packages. - Docs runbook at `docs/how-to/operations/shower-app.md` linked from the apps registry, plus changelog fragments. ### Defense layers on the public surface 1. fly nginx geo+fail2ban `$shower_banned` (per-service deny list) 2. fly nginx `limit_req zone=shower_auth` (3 r/s per Fly-Client-IP) 3. django-axes (5 fails / 1h, keyed on username+ip_address) 4. edge `/admin/` block (returns 403 for anything that isn't login/logout) ## Prerequisites for the user to do (NOT in this PR) Halted on these per request — they touch shared/manual systems: - [x] NFS share on sifaka: `/volume1/shower`, NFS rule for ringtail RW, `chown 1000:1000` - [ ] 1Password item `Shower (blumeops)` in the blumeops vault with a freshly minted `secret-key` field (`openssl rand -base64 48`) — do NOT reuse anything that has lived in git - [ ] Container build: `mise run container-build-and-release shower`, then update `images[].newTag` in `argocd/manifests/shower/kustomization.yaml` to the resulting `v1.0.0-<sha>-nix` - [x] DNS: `mise run dns-up` after merge - [x] Fly cert: `fly certs add shower.eblu.me -a blumeops-proxy` - [ ] Caddy push: `mise run provision-indri -- --tags caddy` - [ ] Fly redeploy to pick up the new nginx block + fail2ban jail: `mise run fly-deploy` - [ ] ArgoCD sync: `argocd app set shower --revision shower-app-deploy && argocd app sync shower` to test from this branch before merging ## Test plan - [ ] Container builds successfully on nix-container-builder runner - [ ] Pod starts, migrations run, gunicorn answers on :8000 
- [ ] `kubectl --context=k3s-ringtail -n shower logs deploy/shower` clean - [ ] `curl -sf https://shower.ops.eblu.me/` returns the splash page (tailnet) - [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 (pre-DNS verification) - [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/users/` returns 403 (edge block) - [ ] `curl -I -H "Host: shower.eblu.me" https://blumeops-proxy.fly.dev/admin/login/` returns a Django login response - [ ] After DNS is up: `curl -I https://shower.eblu.me/` returns 200 with `X-Clacks-Overhead` - [ ] Grafana dashboard "Shower APM" appears and starts showing traffic - [ ] `mise run services-check` passes Reviewed-on: #349	2026-05-11 13:47:18 -07:00
Erich Blume	eceb2b99ce	C0: bump homepage image to fixed-perms build (v1.11.0-678f26b-nix) Pulls in `678f26b0` (chowned /app/config). Resolves the EACCES crash loop on ringtail.	2026-05-10 21:16:34 -07:00
Erich Blume	be54cc3411	C1: migrate homepage dashboard to ringtail k3s Repoint the ArgoCD Application destination from minikube to ringtail and bump the image tag to the new amd64 nix-built v1.11.0-b87f62e-nix. Rework services.yaml for the autodiscovery shift: 11 services that previously auto-populated via minikube Ingress annotations (ArgoCD, Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus, Navidrome, Paperless, TeslaMate, Transmission) become explicit static entries with their widget configs preserved. Conversely, the ringtail services that will now auto-populate (Frigate/NVR, Authentik, Ntfy) are removed from the static list to avoid duplicates; Ollama becomes newly visible. Add a Content group for Immich/Kiwix/Miniflux which previously lived under the autodiscovered "Content" group from annotations.	2026-05-10 20:37:03 -07:00
Erich Blume	8bc19fa460	C0: tailscale main-SHA rebuild for ringtail proxyclass Routine post-squash-merge cleanup. Bumps the ProxyClass image tag from the now-orphaned PR branch SHA (`67af7a8`) to the merge commit SHA (`0108b68`) so the deployed image stays traceable after branch cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 06:52:39 -07:00
Erich Blume	0108b68769	C1: mirror tailscale container locally for ringtail proxyclass (#347 ) ## Summary Adds the first cut of a local nix build for `docker.io/tailscale/tailscale` and rewires only the ringtail tailscale-operator overlay to use it. Indri's overlay continues pulling upstream — minikube on indri is being decommissioned in favor of ringtail's k3s, so investing in dual-cluster routing here would be wasted churn. ## Changes - `containers/tailscale/default.nix` — `buildGoModule` over `cmd/tailscale`, `cmd/tailscaled`, `cmd/containerboot`; packaged via `dockerTools.buildLayeredImage` with `cacert`, `iptables` (legacy symlink to match upstream Synology compat), `iproute2`, `tzdata`, `busybox`. - `argocd/manifests/tailscale-operator-ringtail/kustomization.yaml` — kustomize `images:` rewrite swapping `docker.io/tailscale/tailscale` → `registry.ops.eblu.me/blumeops/tailscale:v1.94.2-67af7a8-nix`. - `docs/changelog.d/mirror-tailscale-container.infra.md` — fragment. ## Pin rationale v1.94.2 matches `service-versions.yaml:96` and the current ProxyClass exactly — this PR is "make it local," not "upgrade tailscale." Version bumps come as follow-up C0/C1 changes once we decide to test newer (v1.96.x had a Fly-side MagicDNS regression; v1.98.0 is current upstream stable). ## Test plan - [x] Image built successfully on ringtail nix-container-builder (run #528). - [x] Image visible in registry: `registry.ops.eblu.me/blumeops/tailscale:v1.94.2-67af7a8-nix`. - [ ] Deploy from branch: `argocd app set tailscale-operator-ringtail --revision mirror-tailscale-container && argocd app sync tailscale-operator-ringtail`. - [ ] Verify proxy pods restart with new image and existing tailnet ingresses (e.g., authentik, immich, tempo) keep resolving. - [ ] After merge: rebuild on main SHA, update kustomization, run `services-check`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #347	2026-05-06 06:50:31 -07:00
Erich Blume	2c0917b266	C0: valkey — bump kustomization tags to main-branch SHA Routine post-merge follow-up after #346. Branch SHA tag (`946fa75`) replaced with the main-SHA-built tag (`fabca04`) so paperless and immich reference an image traceable to a commit on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:47:16 -07:00
Erich Blume	fabca04771	Mirror valkey 8.1 locally for paperless and immich (#346 ) ## Summary - Add native Dagger build of valkey 8.1.6-r0 on Alpine 3.22 at `containers/valkey/` - Swap paperless redis sidecar and immich-valkey from `docker.io/valkey/valkey:8.1-alpine` to `registry.ops.eblu.me/blumeops/valkey:v8.1.6-r0-946fa75` - Resolves the DR-2026-04 TODO in paperless kustomization about multi-arch redis ## Why Move toward fully locally-built containers for supply chain control. Paperless and immich both pulled the same upstream tag — one mirror serves both. Authentik's nix-built Redis stays separate (different image entirely). ## Risk Low. Both sidecars are stateless caches: - paperless redis: no volumeMount (in-pod localhost, pure memory) - immich-valkey: `emptyDir` (cache only) Pod restart rebuilds the cache. Smoke-tested locally (PING/SET/GET roundtrip on `valkey 8.1.6` with `--bind 0.0.0.0 --protected-mode no`). ## Test plan - [ ] After merge: `mise run container-build-and-release valkey` to rebuild with main SHA - [ ] Update kustomizations to the `[main]` SHA tag (C0 follow-up) - [ ] `argocd app sync paperless` and `argocd app sync immich` - [ ] Verify pods come up healthy (paperless OCR queue functional, immich job queue functional) - [ ] `mise run services-check` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #346	2026-05-01 17:40:03 -07:00
Erich Blume	55563afc7e	C0: alloy — bump kustomization tags to main-branch SHA Per the build-container-image squash-merge convention, rebuild alloy v1.16.0 container images from the main SHA (`9564435`) and update the three alloy kustomizations to reference :v1.16.0-9564435[-nix] instead of the branch SHA :v1.16.0-26a3ab5[-nix] left over from #345. Both images were rebuilt locally on gilbert (dagger) and ringtail (nix) because indri is still under heavy macOS memory-compressor pressure (see separate ticket); CI on indri can't reliably run the dagger publish step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 08:31:27 -07:00
Erich Blume	9564435b11	Alloy V1.16.0 (#345 ) Bump Grafana Alloy v1.14.0 → v1.16.0 across all four services (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail; alloy native ansible). Also migrate the indri build path from `Dockerfile` to a native Dagger `container.py` per the build-container-image migration playbook. ## Highlights from upstream - v1.15: database observability promoted to stable, OTel Collector → v0.147.0 - v1.16: clustering for `loki.source.kubernetes_events`, MySQL exporter 0.19.0 - One pre-existing breaking change in v1.15 (`loki.source.awsfirehose` undocumented metric prefix rename) — not used here. ## Build infra Alloy v1.16.0's go.mod requires Go 1.26.2. The nix derivation now uses `pkgs.go_1_26` with `GOTOOLCHAIN=local` to avoid auto-downloading a toolchain blob that violated the fixed-output rule. ## Test plan - [ ] CI: `mise run container-build-and-release alloy --ref alloy-v1.16.0` (dispatched as run 522; nix job to be re-triggered with the v1.16.0 goModules outputHash once the local ringtail build surfaces it) - [ ] After CI green, bump `images[].newTag` in three kustomizations to the new `-<sha>` and `-<sha>-nix` tags, deploy from this branch via `argocd app set <app> --revision alloy-v1.16.0 && argocd app sync <app>` - [ ] Manual rebuild of macOS native binary on gilbert (per ansible/roles/alloy README) and `mise run provision-indri -- --tags alloy --check --diff` - [ ] `mise run services-check` after merge & redeploy Reviewed-on: #345	2026-05-01 08:05:37 -07:00
Erich Blume	5096223b48	C1: clean up cv + docs minikube artifacts (#343) ## Summary Follow-up to #342. The cv and docs services are now live on indri (Caddy file_server backed by ansible-managed tarball extraction) and verified working. This PR removes the dead minikube artifacts and the tooling shims that referenced them. ## Changes Deletions: - ``argocd/apps/{cv,docs}.yaml`` - ``argocd/manifests/{cv,docs}/`` (deployment, service, ingress, pdb, kustomization) - ``containers/{cv,quartz}/`` (Dockerfiles + start scripts) Tooling: - ``mise-tasks/container-version-check``: remove the ``quartz``→``docs`` CONTAINER_TO_SERVICE mapping (containers/quartz no longer exists) - ``service-versions.yaml``: bump ``docs.current-version`` to ``v1.16.0`` (the blumeops docs release tag) and trim the migration-window comment ## Live state context The argocd Applications ``cv`` and ``docs`` were already deleted from the cluster manually as part of the cutover; this PR just removes the YAML files that the ``apps`` app-of-apps was still ingesting. After merge, ``argocd app sync apps`` will reconcile and the ``apps`` Application returns to Synced. The Caddyfile ``handle_errors`` bug that briefly crashed all ``*.ops.eblu.me`` services during cutover is fixed in a separate C0 (``2ee53fe``) on main, not here. ## Test plan - [x] ``mise run container-version-check --all-files`` clean - [x] ``mise run service-review --type ansible`` shows cv at 1.0.3, docs at v1.16.0 - [ ] After merge: ``argocd app sync apps`` returns clean (cv/docs entries gone, no children to reconcile) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #343	2026-04-29 15:18:39 -07:00
Erich Blume	8d634861f6	C1: migrate cv + docs from minikube to indri-native (#342) ## Summary Replace the cv (`cv.eblu.me`) and docs (`docs.eblu.me`) minikube Deployments with indri-native ansible roles. Caddy serves the extracted release tarballs directly via a new `kind: static` service-block — no daemon, no nginx pod, no ProxyGroup ingress on the request path. Mirrors the rationale of the recent devpi migration; part of the broader minikube wind-down. ## What's in this commit - `ansible/roles/{cv,docs}` — sentinel-gated tarball download + extract into `~/{cv,docs}/content/` - `ansible/roles/caddy/` — new `kind: static` branch in the Caddyfile template (encoded gzip, immutable cache headers for fingerprinted assets, optional `try_html` for Quartz-style clean URLs, optional per-path `download_paths` for the resume PDF's `Content-Disposition`) - `ansible/playbooks/indri.yml` — wires `cv` and `docs` roles before `caddy` - `service-versions.yaml` — both services flip to `type: ansible`. `docs.current-version` stays at `1.28.2` for this commit so `container-version-check` keeps passing while `containers/quartz/Dockerfile` still exists; it moves to the docs release tag in the cleanup commit - `.forgejo/workflows/{cv-deploy,build-blumeops}.yaml` — deploy step now bumps `cv_version`/`docs_version` in the role defaults and pushes; running ansible + purging the Fly cache is manual from gilbert (matches devpi) - Docs: `docs/how-to/operations/{cv,docs}-on-indri.md`, updated `docs/reference/services/{cv,docs}.md`, changelog fragment ## What is not in this commit The dead artifacts. After PR review and successful cutover, a follow-up commit deletes: - `argocd/apps/{cv,docs}.yaml` and `argocd/manifests/{cv,docs}/` - `containers/cv/`, `containers/quartz/` - `CONTAINER_TO_SERVICE['quartz']` mapping in `mise-tasks/container-version-check` - bumps `docs.current-version` in `service-versions.yaml` to the release tag ## Cutover plan (manual, from gilbert, after review) 1. Take down old: - Remove the cv and docs Applications: `argocd app delete cv --cascade && argocd app delete docs --cascade` - Verify k8s namespaces gone: `kubectl --context=minikube-indri get ns | grep -E '^(cv|docs)\b'` (should be empty) - Verify tailnet MagicDNS no longer advertises the VIPs: `nslookup cv.tail8d86e.ts.net` and `nslookup docs.tail8d86e.ts.net` should both fail 2. Bring up new: - `mise run provision-indri -- --tags cv,docs,caddy --check --diff` (already validated on branch) - `mise run provision-indri -- --tags cv,docs,caddy` - `fly ssh console -a blumeops-proxy -C "sh -c 'rm -rf /tmp/cache && nginx -s reload'"` 3. Verify: `mise run services-check` and the curl checks listed in `docs/how-to/operations/{cv,docs}-on-indri.md` 4. Cleanup commit + merge. Total expected downtime: minutes (not the few-hour budget you authorized). ## Test plan - [ ] `mise run provision-indri -- --tags cv,docs --check --diff` clean - [ ] `mise run provision-indri -- --tags caddy --check --diff` shows only the cv + docs blocks changing as previewed in the PR thread - [ ] After cutover: `cv.eblu.me`, `cv.ops.eblu.me`, `docs.eblu.me`, `docs.ops.eblu.me` all return 200 - [ ] `cv.eblu.me/resume.pdf` includes `Content-Disposition: attachment` - [ ] A clean Quartz URL (e.g. `docs.eblu.me/explanation/agent-change-process`) resolves to the right page - [ ] `mise run services-check` clean - [ ] `mise run service-review --type ansible` shows cv and docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #342	2026-04-29 14:55:11 -07:00
Erich Blume	14ca0160ba	Migrate devpi from minikube to indri (launchd) (#341) ## Summary Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). It moves to indri as a native LaunchAgent. ## What changed - New ansible role `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run via 1Password root password, runs as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. Bootstraps from upstream PyPI (so devpi can install itself on a fresh box). - Caddy: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`. - Playbook: `indri.yml` gains pre_tasks for the root password and the new role. - service-versions.yaml: devpi flipped from `type: argocd` to `type: ansible`. - ArgoCD: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted. - Docs: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list. ## Already deployed (live on indri) - Service running: `launchctl list mcquack.eblume.devpi` → PID 53888 - `curl https://pypi.ops.eblu.me/+api` returns 200 ✅ - `mise run docs-mikado` works again ✅ - 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/` - Minikube namespace and PVC fully reclaimed ## Test plan - [ ] `mise run services-check` (after merge) - [ ] CI workflows that use devpi succeed - [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines) ## Context This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent — it was breaking the toolchain — and (b) it shakes out the migration recipe before we commit to a multi-leaf chain. Reviewed-on: #341	2026-04-29 13:38:36 -07:00
Erich Blume	4d76fd5de5	C0: prowler — rebuild image against main HEAD Squash-merge of #340 changed the SHA. Bump prowler tag from v5.23.0-2daf629 (PR branch) to v5.23.0-495e45d (main HEAD) so the Dockerfile changes are present in the image deployed off main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 10:49:27 -07:00
Erich Blume	495e45d01d	Address 6 critical Prowler IaC findings (mute + grafana RBAC tighten) (#340) ## Summary The weekly Prowler IaC scan reported 6 critical findings against `argocd/manifests/`. They split cleanly into two patterns: - Legitimate-by-design RBAC → mute with new compensating controls - `external-secrets-controller`, `external-secrets-cert-controller` manage `secrets` (KSV-0041) and the cert-controller mutates its own webhook configurations (KSV-0114). This is what the operator is for. New CC: `operator-purpose-bound-rbac`. - `kube-state-metrics` (both `minikube-indri` and `k3s-ringtail`) holds `list/watch` on secrets to expose `kube_secret_info` and `kube_secret_labels` metrics. KSM's metric schema only reads metadata, never the `data:` field. New CC: `kube-state-metrics-metadata-only`. - Over-broad RBAC → fix - `grafana-clusterrole` had `get/watch/list` on `secrets` because the dashboard-sidecar config used `RESOURCE=both` (ConfigMaps + Secrets). Nothing in the cluster labels Secrets with `grafana_dashboard=1`, so this was unused power. Switched both sidecar instances to `RESOURCE=configmap` and removed `secrets` from the ClusterRole. The IaC cronjob also did not previously pass `--mutelist-file`, which is why every IaC finding reported as unmuted regardless of mutelist configuration. The new `mutelist/iac.yaml` is bundled into the existing `prowler-mutelist` ConfigMap and mounted via `items:` selector. ## Test plan - [ ] `kubectl --context=minikube-indri kustomize argocd/manifests/prowler/` — already passes locally - [ ] `kubectl --context=minikube-indri kustomize argocd/manifests/grafana/` — already passes locally - [ ] Deploy from this branch via `argocd app set prowler --revision prowler-iac-mutelist && argocd app sync prowler` and same for `grafana` - [ ] Manually trigger the IaC cronjob and verify `MUTED=True` on the 6 critical findings (`kubectl --context=minikube-indri -n prowler create job --from=cronjob/prowler-iac-scan prowler-iac-test`) - [ ] Restart grafana pod and confirm dashboards still render (sidecar still finds them via ConfigMap watch) - [ ] After verify, `argocd app set <app> --revision main && argocd app sync <app>` post-merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #340	2026-04-29 10:43:32 -07:00
Erich Blume	7d94b9073a	C0: docs — default argocd login to --sso; drop extraneous --grpc-web Now that argocd's Authentik OAuth2 client is public, `argocd login --sso` works for day-to-day use. Promote it to the default in AGENTS.md, argocd-cli reference, and troubleshooting; keep the admin/password flow documented as a break-glass fallback for when Authentik is unavailable. Also drops --grpc-web from every interactive login command — confirmed extraneous (login succeeds without it). Left in CI workflows and `argocd cluster add` untouched; those are different contexts that I didn't re-test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:43:21 -07:00
Erich Blume	86317315ed	C0: remove argocd OIDC client_secret wiring Now that argocd's Authentik OAuth2 client is public (PKCE-only), the client_secret plumbing is dead code: - delete argocd-oidc-authentik ExternalSecret and drop it from kustomization - remove AUTHENTIK_ARGOCD_CLIENT_SECRET env from authentik-worker - remove argocd-client-secret mapping from authentik-config ExternalSecret The argocd-client-secret field in the 1Password "Authentik (blumeops)" item is now unreferenced and can be deleted there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:38:26 -07:00
Erich Blume	0e62ad5596	C0: argocd OIDC — switch to public client for CLI SSO Changes argocd's Authentik OAuth2 client from confidential to public and drops the clientSecret from argocd-cm. Public + PKCE works for both the web UI (argocd-server backend) and the argocd CLI (`argocd login --sso`) without a shared secret, matching OAuth 2.1 guidance. Confidential → public was needed because the CLI can't hold a client secret; Authentik's per-app issuer model made the alternative ("cliClientID" pattern with separate public client) awkward since it requires a shared issuer across apps which Authentik doesn't serve. Follow-up: deadcode AUTHENTIK_ARGOCD_CLIENT_SECRET env wiring and the argocd-oidc-authentik ExternalSecret once verified. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:34:39 -07:00
Erich Blume	225b0e7008	C0: allow argocd CLI --sso localhost callback Adds http://localhost:8085/auth/callback to the ArgoCD OAuth2 provider's redirect_uris so `argocd login --sso` works. Loopback redirect is the RFC 8252 pattern for native CLI apps; PKCE (already enabled) covers the code-interception risk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:18:08 -07:00
Erich Blume	a9ef02a602	C0: bump frigate-notify to v0.5.4-e928054-nix (workdir fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 09:44:24 -07:00
Erich Blume	c88b6d773c	C0: point frigate-notify at local registry tag v0.5.4-fb4bf5a-nix Built from main in run #516 after #339 merged. Follows the navidrome kustomization convention (deployment image = local ref + :kustomized, kustomization override = newTag only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 09:31:29 -07:00
Erich Blume	fb32cc07c4	chore: repoint runner-job-image tag at CI-built v0.20.6-50f8c2a Swaps the k8s runner label from the local bootstrap tag (v0.20.6-9b6be09) to the equivalent image rebuilt by CI from main. Functionally identical; closes the bootstrap loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 08:38:33 -07:00
Erich Blume	50f8c2a33f	Roll k8s runner to runner-job-image v0.20.6-9b6be09 Points the k8s Forgejo runner label at the locally-bootstrapped runner-job-image built from the Alpine container.py on this branch. Once merged, CI will rebuild the same image from the same SHA. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 08:28:18 -07:00
Erich Blume	21177ff47f	chore: update forgejo-runner image tag	2026-04-20 09:11:37 -07:00
Erich Blume	1425bf1f5c	Upgrade forgejo-runner to v12.8, adopt server.connections, and clean up docs (#338) ## Summary - consolidate forgejo-runner how-to docs into current cards - upgrade the k8s forgejo-runner deployment to the latest v12.8.x runner image - switch the k8s runner from first-boot register flow to declarative server.connections config - keep the runner image on the native Dagger build path and update the surrounding manifests/secrets ## Notes - PR opened early for C1 review - implementation and deployment verification will follow in subsequent commits Reviewed-on: #338	2026-04-20 09:03:54 -07:00
Erich Blume	55abb17f50	Add resource limits to ArgoCD pods to prevent unbounded consumption All 7 ArgoCD containers had no resource limits, allowing them to consume unlimited CPU/memory during node pressure events. This contributed to cluster-wide probe timeout cascades on minikube-indri. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 13:04:27 -07:00
Forgejo Actions	bdfcb4b677	Update docs release to v1.16.0 - Built changelog from towncrier fragments [skip ci]	2026-04-18 10:00:54 -07:00
Erich Blume	c8da243663	Run alloy-tracing as root for eBPF capabilities The nix-built Alloy image sets User=65534 (nobody). Even with privileged: true, a non-root user gets no effective capabilities (CapEff=0). Override with runAsUser: 0 so Beyla gets CAP_BPF and CAP_SYS_ADMIN needed for eBPF instrumentation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 08:42:26 -07:00
Forgejo Actions	a72a2c2bd4	Update docs release to v1.15.7 - Built changelog from towncrier fragments [skip ci]	2026-04-18 08:14:58 -07:00
Erich Blume	b4472c7849	Deploy devpi 6.19.3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 08:04:23 -07:00

456 commits