Commit graph

603 commits

Author SHA1 Message Date
678f26b0e7 C0: fix homepage container /app/config write permissions
The previous Dockerfile chowned /app/config to 1000:1000 so the runtime
user could seed missing skeleton configs (e.g. proxmox.yaml) and write
/app/config/logs. The nix derivation didn't replicate that, so the new
amd64 image crashed with EACCES on cold start (fixed-forward — caught
during ringtail cutover, ArgoCD #348).

Add fakeRootCommands to dockerTools to create /app and /app/config and
chown them at build time. The deployment's ConfigMap subPath mounts
leave the parent directory as image filesystem, so its ownership has to
be set at build time, not at runtime.
2026-05-10 20:49:22 -07:00
be54cc3411 C1: migrate homepage dashboard to ringtail k3s
Repoint the ArgoCD Application destination from minikube to ringtail and
bump the image tag to the new amd64 nix-built v1.11.0-b87f62e-nix.

Rework services.yaml for the autodiscovery shift: 11 services that
previously auto-populated via minikube Ingress annotations (ArgoCD,
Immich, Kiwix, Mealie, Miniflux, Grafana, Prometheus, Navidrome,
Paperless, TeslaMate, Transmission) become explicit static entries with
their widget configs preserved. Conversely, the ringtail services that
will now auto-populate (Frigate/NVR, Authentik, Ntfy) are removed from
the static list to avoid duplicates; Ollama becomes newly visible.

Add a Content group for Immich/Kiwix/Miniflux which previously lived
under the autodiscovered "Content" group from annotations.
2026-05-10 20:37:03 -07:00
8bc19fa460 C0: tailscale main-SHA rebuild for ringtail proxyclass
Routine post-squash-merge cleanup. Bumps the ProxyClass image tag from
the now-orphaned PR branch SHA (67af7a8) to the merge commit SHA
(0108b68) so the deployed image stays traceable after branch cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:52:39 -07:00
0108b68769 C1: mirror tailscale container locally for ringtail proxyclass (#347)
## Summary

Adds the first cut of a local nix build for `docker.io/tailscale/tailscale` and rewires only the ringtail tailscale-operator overlay to use it. Indri's overlay continues pulling upstream — minikube on indri is being decommissioned in favor of ringtail's k3s, so investing in dual-cluster routing here would be wasted churn.

## Changes

- `containers/tailscale/default.nix` — `buildGoModule` over `cmd/tailscale`, `cmd/tailscaled`, `cmd/containerboot`; packaged via `dockerTools.buildLayeredImage` with `cacert`, `iptables` (legacy symlink to match upstream Synology compat), `iproute2`, `tzdata`, `busybox`.
- `argocd/manifests/tailscale-operator-ringtail/kustomization.yaml` — kustomize `images:` rewrite swapping `docker.io/tailscale/tailscale` → `registry.ops.eblu.me/blumeops/tailscale:v1.94.2-67af7a8-nix`.
- `docs/changelog.d/mirror-tailscale-container.infra.md` — fragment.

## Pin rationale

v1.94.2 matches `service-versions.yaml:96` and the current ProxyClass exactly — this PR is "make it local," not "upgrade tailscale." Version bumps come as follow-up C0/C1 changes once we decide to test newer (v1.96.x had a Fly-side MagicDNS regression; v1.98.0 is current upstream stable).

## Test plan

- [x] Image built successfully on ringtail nix-container-builder (run #528).
- [x] Image visible in registry: `registry.ops.eblu.me/blumeops/tailscale:v1.94.2-67af7a8-nix`.
- [ ] Deploy from branch: `argocd app set tailscale-operator-ringtail --revision mirror-tailscale-container && argocd app sync tailscale-operator-ringtail`.
- [ ] Verify proxy pods restart with new image and existing tailnet ingresses (e.g., authentik, immich, tempo) keep resolving.
- [ ] After merge: rebuild on main SHA, update kustomization, run `services-check`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #347
2026-05-06 06:50:31 -07:00
6f0d80ca1e C0: doc review — index.md, add ringtail to infra overview
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 06:14:40 -07:00
24e5490259 C0: review CC init-container-isolation — defer retirement to post-ringtail
Runtime grafana pod matches the manifest and the CC's claim; bumped
last-reviewed. Noted that retiring init-chown-data in favor of fsGroup
alone should wait until grafana migrates to ringtail's k3s, since the
storage backend will change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:31:13 -07:00
074887cd57 C0: docs — explanation article on compliance mute categories
Captures the CC vs NA vs RA distinction surfaced during the 2026-05-03
weekly compliance review (CVE-2026-31789), and the image-scan mutelist
gap that blocks acting on it. Links the new article from the
review-compensating-controls how-to so it isn't orphaned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:19:53 -07:00
9fb5442ccd C0: kiwix — doc review, fix Adding Archives source path
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:46:16 -07:00
f16e1c81f1 C0: zot — upgrade indri registry to v2.1.16
Security fixes only (TLS verification on metrics client, CORS
Allow-Credentials suppression on wildcard origin, manifest/API-key
body-size limits, dependabot bumps). No config changes required;
re-built from source on indri and bounced launchagent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:41:07 -07:00
a2c61b625d C0: rotate-fly-deploy-token — fish+bash one-shot, op validator gotcha
Combine mint+store into a single command with both fish and bash
forms (the doc previously only showed manual paste). Document the
1Password CLI "Password item requires ps value" validator error and
the placeholder-password workaround for Password-category items with
empty primary password fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:42:57 -07:00
2c0917b266 C0: valkey — bump kustomization tags to main-branch SHA
Routine post-merge follow-up after #346. Branch SHA tag (946fa75) replaced
with the main-SHA-built tag (fabca04) so paperless and immich reference an
image traceable to a commit on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:47:16 -07:00
fabca04771 Mirror valkey 8.1 locally for paperless and immich (#346)
## Summary

- Add native Dagger build of valkey 8.1.6-r0 on Alpine 3.22 at `containers/valkey/`
- Swap paperless redis sidecar and immich-valkey from `docker.io/valkey/valkey:8.1-alpine` to `registry.ops.eblu.me/blumeops/valkey:v8.1.6-r0-946fa75`
- Resolves the DR-2026-04 TODO in paperless kustomization about multi-arch redis

## Why

Move toward fully locally-built containers for supply chain control. Paperless and immich both pulled the same upstream tag — one mirror serves both. Authentik's nix-built Redis stays separate (different image entirely).

## Risk

Low. Both sidecars are stateless caches:
- paperless redis: no volumeMount (in-pod localhost, pure memory)
- immich-valkey: `emptyDir` (cache only)

Pod restart rebuilds the cache. Smoke-tested locally (PING/SET/GET roundtrip on `valkey 8.1.6` with `--bind 0.0.0.0 --protected-mode no`).

## Test plan

- [ ] After merge: `mise run container-build-and-release valkey` to rebuild with main SHA
- [ ] Update kustomizations to the `[main]` SHA tag (C0 follow-up)
- [ ] `argocd app sync paperless` and `argocd app sync immich`
- [ ] Verify pods come up healthy (paperless OCR queue functional, immich job queue functional)
- [ ] `mise run services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #346
2026-05-01 17:40:03 -07:00
f84f5f02b3 C0: review compensating control trusted-ci-only
Verified Forgejo runner is registered only to forge.ops.eblu.me and the
forge has registration disabled, so no untrusted users can trigger
privileged CI. Tightened notes to reflect the closed-forge mechanism
(not a per-repo allow-list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:49:22 -07:00
4aa0872949 C0: review ollama doc — refresh image, models, last-reviewed
Bumped documented image tag to 0.20.4 (matches kustomization newTag),
added the two qwen3.5 models from models.txt, and stamped the card.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:42:33 -07:00
2d55303213 C0: alloy native macOS on indri — upgrade to v1.16.0
Completes the v1.16.0 fleet upgrade for the fourth alloy service
(type: ansible, built from source on indri). Binary built on gilbert
with Go 1.26.2 + CGO, scp'd to indri, codesigned, LaunchAgent reloaded.
Service reports clean WAL replay and resumed metric/log shipping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:36:38 -07:00
55563afc7e C0: alloy — bump kustomization tags to main-branch SHA
Per the build-container-image squash-merge convention, rebuild alloy v1.16.0
container images from the main SHA (9564435) and update the three alloy
kustomizations to reference :v1.16.0-9564435[-nix] instead of the branch
SHA :v1.16.0-26a3ab5[-nix] left over from #345.

Both images were rebuilt locally on gilbert (dagger) and ringtail (nix)
because indri is still under heavy macOS memory-compressor pressure (see
separate ticket); CI on indri can't reliably run the dagger publish step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:31:27 -07:00
9564435b11 Alloy V1.16.0 (#345)
Bump Grafana Alloy v1.14.0 → v1.16.0 across all four services (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail; alloy native ansible). Also migrate the indri build path from `Dockerfile` to a native Dagger `container.py` per the build-container-image migration playbook.

## Highlights from upstream
- v1.15: database observability promoted to stable, OTel Collector → v0.147.0
- v1.16: clustering for `loki.source.kubernetes_events`, MySQL exporter 0.19.0
- One pre-existing breaking change in v1.15 (`loki.source.awsfirehose` undocumented metric prefix rename) — not used here.

## Build infra
Alloy v1.16.0's go.mod requires Go 1.26.2. The nix derivation now uses `pkgs.go_1_26` with `GOTOOLCHAIN=local` to avoid auto-downloading a toolchain blob that violated the fixed-output rule.

## Test plan
- [ ] CI: `mise run container-build-and-release alloy --ref alloy-v1.16.0` (dispatched as run 522; nix job to be re-triggered with the v1.16.0 goModules outputHash once the local ringtail build surfaces it)
- [ ] After CI green, bump `images[].newTag` in three kustomizations to the new `-<sha>` and `-<sha>-nix` tags, deploy from this branch via `argocd app set <app> --revision alloy-v1.16.0 && argocd app sync <app>`
- [ ] Manual rebuild of macOS native binary on gilbert (per ansible/roles/alloy README) and `mise run provision-indri -- --tags alloy --check --diff`
- [ ] `mise run services-check` after merge & redeploy

Reviewed-on: #345
2026-05-01 08:05:37 -07:00
f6e392b80c C1: SHA-pin tooling dependencies (2026-04 cycle) (#344)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m45s
## Summary

Monthly tooling dependency refresh, with a one-time conversion from version-tag pins (`rev = "vX.Y.Z"`, `image:tag`, `>=`) to SHA / digest pins everywhere.

## Changes

- **prek hooks**: all `rev = "vX.Y.Z"` → commit SHA + `# vX.Y.Z` comment. Bumped trufflehog (3.94.0→3.95.2), kingfisher (1.91.0→1.97.0), ruff (0.15.7→0.15.12), shfmt (3.13.0→3.13.1), prettier (3.8.1→3.8.3), actionlint (1.7.11→1.7.12).
- **fly/Dockerfile**: tag pins → `image@sha256:...` digest pins. Bumped nginx (1.29.6→1.30.0-alpine), tailscale (v1.94.1→v1.94.2 — still inside the safe pre-1.96.5 range), alloy (v1.14.1→v1.16.0).
- **mise-tasks**: PEP 723 inline deps converted from `>=` to `==` (PEP 508 doesn't support hashes inline). All scripts pinned to current latest: rich 15.0.0, typer 0.25.0, pyyaml 6.0.3, httpx 0.28.1.
- **prek `additional_dependencies`**: ansible-lint==26.4.0, ansible-core==2.20.5.
- **taplo-lint**: pass `--no-schema`. Upstream's `--default-schema-catalogs` returns a format taplo v0.9.3 can't parse — we don't validate against TOML schemas anyway, so this turns off the broken catalog fetch.
- **docs/update-tooling-dependencies**: documents the SHA-pin convention, `docker buildx imagetools inspect` for digest lookup, and `prek clean` before re-verifying (cache grows to several GiB).

Forgejo workflow `actions/checkout@v6.0.2` was already at the latest SHA — no change.

## Test plan

- [x] `prek run --all-files` passes after `prek clean`
- [x] `deploy-fly` workflow builds and deploys the new fly image on merge
- [x] `fly status -a blumeops-proxy` healthy after deploy
- [x] Spot-check a few mise tasks (`mise run blumeops-tasks`, `mise run docs-check-links`) to confirm pinned deps resolve cleanly

Reviewed-on: #344
2026-04-30 16:51:43 -07:00
5096223b48 C1: clean up cv + docs minikube artifacts (#343)
## Summary

Follow-up to #342. The cv and docs services are now live on indri (Caddy file_server backed by ansible-managed tarball extraction) and verified working. This PR removes the dead minikube artifacts and the tooling shims that referenced them.

## Changes

**Deletions:**
- ``argocd/apps/{cv,docs}.yaml``
- ``argocd/manifests/{cv,docs}/`` (deployment, service, ingress, pdb, kustomization)
- ``containers/{cv,quartz}/`` (Dockerfiles + start scripts)

**Tooling:**
- ``mise-tasks/container-version-check``: remove the ``quartz``→``docs`` CONTAINER_TO_SERVICE mapping (containers/quartz no longer exists)
- ``service-versions.yaml``: bump ``docs.current-version`` to ``v1.16.0`` (the blumeops docs release tag) and trim the migration-window comment

## Live state context

The argocd Applications ``cv`` and ``docs`` were already deleted from the cluster manually as part of the cutover; this PR just removes the YAML files that the ``apps`` app-of-apps was still ingesting. After merge, ``argocd app sync apps`` will reconcile and the ``apps`` Application returns to Synced.

The Caddyfile ``handle_errors`` bug that briefly crashed all ``*.ops.eblu.me`` services during cutover is fixed in a separate C0 (``2ee53fe``) on main, not here.

## Test plan
- [x] ``mise run container-version-check --all-files`` clean
- [x] ``mise run service-review --type ansible`` shows cv at 1.0.3, docs at v1.16.0
- [ ] After merge: ``argocd app sync apps`` returns clean (cv/docs entries gone, no children to reconcile)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #343
2026-04-29 15:18:39 -07:00
8d634861f6 C1: migrate cv + docs from minikube to indri-native (#342)
## Summary

Replace the cv (`cv.eblu.me`) and docs (`docs.eblu.me`) minikube Deployments with indri-native ansible roles. Caddy serves the extracted release tarballs directly via a new `kind: static` service-block — no daemon, no nginx pod, no ProxyGroup ingress on the request path. Mirrors the rationale of the recent devpi migration; part of the broader minikube wind-down.

## What's in this commit

- `ansible/roles/{cv,docs}` — sentinel-gated tarball download + extract into `~/{cv,docs}/content/`
- `ansible/roles/caddy/` — new `kind: static` branch in the Caddyfile template (encoded gzip, immutable cache headers for fingerprinted assets, optional `try_html` for Quartz-style clean URLs, optional per-path `download_paths` for the resume PDF's `Content-Disposition`)
- `ansible/playbooks/indri.yml` — wires `cv` and `docs` roles before `caddy`
- `service-versions.yaml` — both services flip to `type: ansible`. `docs.current-version` stays at `1.28.2` for this commit so `container-version-check` keeps passing while `containers/quartz/Dockerfile` still exists; it moves to the docs release tag in the cleanup commit
- `.forgejo/workflows/{cv-deploy,build-blumeops}.yaml` — deploy step now bumps `cv_version`/`docs_version` in the role defaults and pushes; running ansible + purging the Fly cache is manual from gilbert (matches devpi)
- Docs: `docs/how-to/operations/{cv,docs}-on-indri.md`, updated `docs/reference/services/{cv,docs}.md`, changelog fragment

## What is not in this commit

The dead artifacts. After PR review and successful cutover, a follow-up commit deletes:

- `argocd/apps/{cv,docs}.yaml` and `argocd/manifests/{cv,docs}/`
- `containers/cv/`, `containers/quartz/`
- `CONTAINER_TO_SERVICE['quartz']` mapping in `mise-tasks/container-version-check`
- bumps `docs.current-version` in `service-versions.yaml` to the release tag

## Cutover plan (manual, from gilbert, after review)

1. **Take down old:**
   - Remove the cv and docs Applications: `argocd app delete cv --cascade && argocd app delete docs --cascade`
   - Verify k8s namespaces gone: `kubectl --context=minikube-indri get ns | grep -E '^(cv|docs)\\b'` (should be empty)
   - Verify tailnet MagicDNS no longer advertises the VIPs: `nslookup cv.tail8d86e.ts.net` and `nslookup docs.tail8d86e.ts.net` should both fail
2. **Bring up new:**
   - `mise run provision-indri -- --tags cv,docs,caddy --check --diff` (already validated on branch)
   - `mise run provision-indri -- --tags cv,docs,caddy`
   - `fly ssh console -a blumeops-proxy -C "sh -c 'rm -rf /tmp/cache && nginx -s reload'"`
3. **Verify:** `mise run services-check` and the curl checks listed in `docs/how-to/operations/{cv,docs}-on-indri.md`
4. **Cleanup commit + merge.**

Total expected downtime: minutes (not the few-hour budget you authorized).

## Test plan
- [ ] `mise run provision-indri -- --tags cv,docs --check --diff` clean
- [ ] `mise run provision-indri -- --tags caddy --check --diff` shows only the cv + docs blocks changing as previewed in the PR thread
- [ ] After cutover: `cv.eblu.me`, `cv.ops.eblu.me`, `docs.eblu.me`, `docs.ops.eblu.me` all return 200
- [ ] `cv.eblu.me/resume.pdf` includes `Content-Disposition: attachment`
- [ ] A clean Quartz URL (e.g. `docs.eblu.me/explanation/agent-change-process`) resolves to the right page
- [ ] `mise run services-check` clean
- [ ] `mise run service-review --type ansible` shows cv and docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #342
2026-04-29 14:55:11 -07:00
a529d60f60 C0: remove containers/devpi/ build artifact
Devpi now runs natively on indri (uv venv via ansible role), so the
Dagger container build at containers/devpi/ is unused. Removing it.

Also updated dagger.md examples to use 'miniflux' as the example
container-name argument.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:40:45 -07:00
14ca0160ba Migrate devpi from minikube to indri (launchd) (#341)
## Summary

Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). It moves to indri as a native LaunchAgent.

## What changed

- **New ansible role** `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run via 1Password root password, runs as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. Bootstraps from upstream PyPI (so devpi can install itself on a fresh box).
- **Caddy**: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`.
- **Playbook**: `indri.yml` gains pre_tasks for the root password and the new role.
- **service-versions.yaml**: devpi flipped from `type: argocd` to `type: ansible`.
- **ArgoCD**: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted.
- **Docs**: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list.

## Already deployed (live on indri)

- Service running: `launchctl list mcquack.eblume.devpi` → PID 53888
- `curl https://pypi.ops.eblu.me/+api` returns 200 
- `mise run docs-mikado` works again 
- 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/`
- Minikube namespace and PVC fully reclaimed

## Test plan

- [ ] `mise run services-check` (after merge)
- [ ] CI workflows that use devpi succeed
- [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines)

## Context

This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent — it was breaking the toolchain — and (b) it shakes out the migration recipe before we commit to a multi-leaf chain.

Reviewed-on: #341
2026-04-29 13:38:36 -07:00
f4a24595b1 C0: review CC ephemeral-privileged-jobs
Verified TTL=604800s and hostPID limited to ephemeral Prowler CronJob
on indri. Noted that alloy-tracing on ringtail also uses hostPID but
is out of scope until Prowler scans ringtail (tracked in Todoist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 11:09:34 -07:00
817acc5e5e C0: transmission doc — review and correct storage/monitoring details
Marked last-reviewed: 2026-04-29. Fixed the storage layout table —
`/config/` is an emptyDir (ephemeral), not NFS, and the watch directory
is disabled. Documented the transmission-exporter sidecar that exposes
Prometheus metrics on port 19091.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 11:00:01 -07:00
4d76fd5de5 C0: prowler — rebuild image against main HEAD
Squash-merge of #340 changed the SHA. Bump prowler tag from
v5.23.0-2daf629 (PR branch) to v5.23.0-495e45d (main HEAD) so the
Dockerfile changes are present in the image deployed off main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:49:27 -07:00
495e45d01d Address 6 critical Prowler IaC findings (mute + grafana RBAC tighten) (#340)
## Summary

The weekly Prowler IaC scan reported 6 critical findings against `argocd/manifests/`. They split cleanly into two patterns:

- **Legitimate-by-design RBAC → mute with new compensating controls**
  - `external-secrets-controller`, `external-secrets-cert-controller` manage `secrets` (KSV-0041) and the cert-controller mutates its own webhook configurations (KSV-0114). This is what the operator is *for*. New CC: `operator-purpose-bound-rbac`.
  - `kube-state-metrics` (both `minikube-indri` and `k3s-ringtail`) holds `list/watch` on secrets to expose `kube_secret_info` and `kube_secret_labels` metrics. KSM's metric schema only reads metadata, never the `data:` field. New CC: `kube-state-metrics-metadata-only`.

- **Over-broad RBAC → fix**
  - `grafana-clusterrole` had `get/watch/list` on `secrets` because the dashboard-sidecar config used `RESOURCE=both` (ConfigMaps + Secrets). Nothing in the cluster labels Secrets with `grafana_dashboard=1`, so this was unused power. Switched both sidecar instances to `RESOURCE=configmap` and removed `secrets` from the ClusterRole.

The IaC cronjob also did not previously pass `--mutelist-file`, which is why every IaC finding reported as unmuted regardless of mutelist configuration. The new `mutelist/iac.yaml` is bundled into the existing `prowler-mutelist` ConfigMap and mounted via `items:` selector.

## Test plan

- [ ] `kubectl --context=minikube-indri kustomize argocd/manifests/prowler/` — already passes locally
- [ ] `kubectl --context=minikube-indri kustomize argocd/manifests/grafana/` — already passes locally
- [ ] Deploy from this branch via `argocd app set prowler --revision prowler-iac-mutelist && argocd app sync prowler` and same for `grafana`
- [ ] Manually trigger the IaC cronjob and verify `MUTED=True` on the 6 critical findings (`kubectl --context=minikube-indri -n prowler create job --from=cronjob/prowler-iac-scan prowler-iac-test`)
- [ ] Restart grafana pod and confirm dashboards still render (sidecar still finds them via ConfigMap watch)
- [ ] After verify, `argocd app set <app> --revision main && argocd app sync <app>` post-merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #340
2026-04-29 10:43:32 -07:00
718e0a0043 C0: review-compliance-reports — summarize image and IaC scans
Previously only the K8s CIS in-cluster scan was processed; the weekly
container-image and IaC Prowler scans were running on schedule but never
reviewed. Now each scan gets its own status / severity / week-over-week
delta, with top-N grouped tables (by check ID and resource) for the
high-volume image and IaC outputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:18:06 -07:00
c9eb188e05 C0: blumeops-tasks — replace ambiguous due:+N with "Nd overdue"
The signed offset format read as "due in 5 days" rather than
"5 days overdue", causing misreads. Switch to self-explanatory
text: "5d overdue" / "due in 2d" / "due today".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:49:46 -07:00
4a37ffcdc2 C0: CLAUDE.md — import AGENTS.md instead of redirecting to it
Claude Code only auto-loads CLAUDE.md. The prose shim told agents to go
read AGENTS.md, which is easy to skip. Replacing the shim with
`@AGENTS.md` inlines AGENTS.md content into the session prompt, so the
startup rules (ai-docs, blumeops-tasks, change classification) land in
context unconditionally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:41:13 -07:00
f9d9e00057 C0: blumeops-tasks — show due offset + recurrence, sort by overdue-ness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:18:16 -07:00
005e2a03ed C0: split gandi-operations docs; add dns-acme-cleanup mise task
Splits the nebulous gandi-operations how-to into two single-topic cards
(manage-eblu-me-dns, rotate-gandi-pat) and adds a mise task for the
recurring _acme-challenge TXT cleanup needed due to a value-comparison
bug in libdns/gandi v1.1.0 that prevents certmagic's cleanup phase from
removing presented TXT values.

The gandi reference card is updated to drop the false "different
credential from Pulumi PAT" claim — verified during the 2026-04-27
incident that Caddy and Pulumi share a single PAT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 09:48:46 -07:00
72b27b7fd2 C0: docs — add mealie borg restore how-to
Captures the procedure used to restore mealie's SQLite DB from a borgmatic
archive after the post-DR wipe: extract from borg, snapshot the wiped DB,
swap via a helper pod on the ReadWriteOnce PVC, fix UID 911 ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:04:28 -07:00
Erich Blume
34fa2ef28a C0: ringtail — restore sway default keybindings, fix fuzzel border config
Extend (not replace) home-manager's default sway keybindings via
lib.mkOptionDefault, with lib.mkForce on the custom overrides that
conflict with defaults. Add Mod+F1 cheatsheet binding (fuzzel-filterable).

Move fuzzel's border-radius/border-width out of [main] into a proper
[border] section with the expected short names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 12:16:02 -07:00
7d94b9073a C0: docs — default argocd login to --sso; drop extraneous --grpc-web
Now that argocd's Authentik OAuth2 client is public, `argocd login --sso`
works for day-to-day use. Promote it to the default in AGENTS.md,
argocd-cli reference, and troubleshooting; keep the admin/password flow
documented as a break-glass fallback for when Authentik is unavailable.

Also drops --grpc-web from every interactive login command — confirmed
extraneous (login succeeds without it). Left in CI workflows and
`argocd cluster add` untouched; those are different contexts that I
didn't re-test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:43:21 -07:00
e6a6a6042e C0: suggest mise run runner-logs in container-build-and-release
After dispatching, poll the Forgejo API for the run matching our
head_sha and print `mise run runner-logs <N>` so the suggested monitor
command is one copy-paste away. Falls back to the bare command if the
poll times out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:12:00 -07:00
fb4bf5a7a3 Add frigate-notify nix container build (#339)
## Summary
- Mirrors `github.com/0x2142/frigate-notify` at `v0.5.4` to `forge.ops.eblu.me/mirrors/frigate-notify`.
- Adds `containers/frigate-notify/default.nix` — `buildGoModule` + `dockerTools.buildLayeredImage`, following the `ntfy` pattern.
- Uses `-tags goolm` to avoid the libolm CGO dependency (matrix notifier is imported unconditionally in the upstream but we only use ntfy alerts).
- Runs as nonroot (UID 65534), exposes port 8000, bundles `cacert`/`tzdata`.

## Why
Move `ghcr.io/0x2142/frigate-notify:v0.5.4` (ringtail-deployed) under local control. Aligns with the [[indri → ringtail migration plan]] and the `default.nix` convention for ringtail-targeted containers documented in [[build-container-image]].

## Verification
- `dagger call build-nix --src=. --container-name=frigate-notify export --path=./out.tar.gz` produces a valid 20MB docker archive (10 layers) with `blumeops/frigate-notify` tag locally.
- Hashes pinned for `fetchgit` (src) and `vendorHash` (go modules).

## Follow-up (post-merge)
1. `mise run container-build-and-release frigate-notify` — release from main SHA.
2. C0 follow-up: update `argocd/manifests/frigate/kustomization.yaml` image ref to `registry.ops.eblu.me/blumeops/frigate-notify:v0.5.4-<sha>-nix`.
3. ArgoCD auto-syncs the deployment.

## Test plan
- [ ] `dagger call build-nix` succeeds from a clean checkout.
- [ ] `mise run container-build-and-release frigate-notify --dry-run` looks correct.
- [ ] After release + kustomization swap: frigate-notify pod comes up healthy on ringtail; ntfy alerts still fire on Frigate events.

Reviewed-on: #339
2026-04-21 09:28:02 -07:00
30f39ae050 Review contributing tutorial: add last-reviewed, .ai.md fragment type, prek provenance
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:53:41 -07:00
db8fd946ae Bump Dagger to 0.20.6 and migrate runner-job-image to Alpine container.py
Bumps the Dagger engine/CLI from v0.20.1 to v0.20.6 (mise pin, dagger.json
engineVersion, SDK regen) and rewrites the runner-job-image container as a
native Dagger pipeline on Alpine 3.23 using the shared alpine_runtime helper,
replacing the Debian-based Dockerfile. All Forgejo Actions in this repo use
actions/checkout (a JS action), so musl is not a compatibility concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:28:18 -07:00
1425bf1f5c Upgrade forgejo-runner to v12.8, adopt server.connections, and clean up docs (#338)
## Summary
- consolidate forgejo-runner how-to docs into current cards
- upgrade the k8s forgejo-runner deployment to the latest v12.8.x runner image
- switch the k8s runner from first-boot register flow to declarative server.connections config
- keep the runner image on the native Dagger build path and update the surrounding manifests/secrets

## Notes
- PR opened early for C1 review
- implementation and deployment verification will follow in subsequent commits

Reviewed-on: #338
2026-04-20 09:03:54 -07:00
353e2785c3 docs: review zot oidc client card 2026-04-20 07:55:25 -07:00
53a7374ac1 C0: drop fix-ntfy-nix-version mikado card
Historical one-shot fix from the zot hardening chain — knowledge is
self-evident in containers/ntfy/default.nix and container-version-check
regex. Should have been removed at mikado finalization. Scrubbed the two
wiki-link references in add-container-version-sync-check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 07:26:53 -07:00
51a878cddb C0: review navidrome reference doc 2026-04-18 20:25:19 -07:00
deedeecef9 C0: adopt AGENTS.md as canonical agent config 2026-04-18 20:15:30 -07:00
71c1c453d6 Fetch job logs via SSH to indri instead of Forgejo web endpoint
Forgejo's web action routes don't support API token auth for private
repos (only session cookies or public access). Switch log fetching to
read the zstd-compressed log files directly from indri via SSH —
Forgejo stores all runner logs on disk regardless of which runner
executed the job.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 17:08:46 -07:00
4f5a963ef6 Add API token auth and git remote detection to runner-logs
runner-logs now always authenticates with the Forgejo API token
(via --token flag, FORGEJO_TOKEN env, or 1Password) so it works on
private repos. The --repo default is auto-detected from the git
remote origin URL instead of being hardcoded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 16:39:21 -07:00
1d62653871 Fix forge.eblu.me static assets by adding missing Host header
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m26s
The static asset cache block (css/js/png/etc) was missing
proxy_set_header Host, so Caddy received "forge.eblu.me" instead of
"forge.ops.eblu.me" and couldn't route the request. HTML loaded fine
because the main location / block had the header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 16:00:56 -07:00
55abb17f50 Add resource limits to ArgoCD pods to prevent unbounded consumption
All 7 ArgoCD containers had no resource limits, allowing them to consume
unlimited CPU/memory during node pressure events. This contributed to
cluster-wide probe timeout cascades on minikube-indri.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 13:04:27 -07:00
Forgejo Actions
bdfcb4b677 Update docs release to v1.16.0
- Built changelog from towncrier fragments

[skip ci]
2026-04-18 10:00:54 -07:00
d26a6ae3b2 Update docs for Caddy routing and direct WireGuard peering
Comprehensive docs pass reflecting the new Fly proxy architecture:
- Fly proxy routes through Caddy on indri (not per-service TS Ingress)
- Direct WireGuard peering via --port=41641 pinning
- DERP relay performance lesson in Tailscale docs
- Caddy now in public traffic path
- indri tagged as flyio-target
- Removed fly-reload references
- Updated architecture diagrams and per-service setup guide
- Added changelog fragment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 09:57:30 -07:00
Forgejo Actions
a72a2c2bd4 Update docs release to v1.15.7
- Built changelog from towncrier fragments

[skip ci]
2026-04-18 08:14:58 -07:00