Prowler's IaC provider hardcodes self._mutelist = None and delegates
filtering to Trivy, but doesn't plumb --ignorefile through. The original
attempt with --mutelist-file silently no-op'd. Add a wrapper around
trivy in our image that injects --ignorefile $TRIVY_IGNOREFILE on `fs`
subcommands; switch the IaC cronjob to mount a Trivy-format
trivyignore.yaml and set the env var.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six critical IaC findings against argocd/manifests/ broke into two
patterns: legitimate-by-design RBAC (mute) and over-broad RBAC (fix).
Plumbing:
- cronjob-iac-scan.yaml now passes --mutelist-file (previously
unused, which is why all IaC findings reported as unmuted)
- new mutelist/iac.yaml is bundled into the prowler-mutelist
ConfigMap and mounted into the IaC cronjob via items: selector
Compensating controls (in compensating-controls.yaml):
- operator-purpose-bound-rbac — external-secrets-operator's whole
function is to manage Secret objects; ClusterRole over secrets
matches its purpose. cert-controller mutates its own validating
webhooks to inject a rotating CA bundle.
- kube-state-metrics-metadata-only — KSM exposes only Secret
metadata via kube_secret_info / kube_secret_labels; the data
field is never read into exposed metrics.
Mutes (mutelist/iac.yaml):
- KSV-0041 for external-secrets/rbac.yaml,
kube-state-metrics/rbac.yaml,
kube-state-metrics-ringtail/rbac.yaml
- KSV-0114 for external-secrets/rbac.yaml
Real fix:
- grafana-clusterrole no longer reads secrets. The dashboard sidecar
(RESOURCE=both → configmap, both init and watch instances) only
needs ConfigMap-labeled dashboards; no Secrets are labeled
grafana_dashboard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously only the K8s CIS in-cluster scan was processed; the weekly
container-image and IaC Prowler scans were running on schedule but never
reviewed. Now each scan gets its own status / severity / week-over-week
delta, with top-N grouped tables (by check ID and resource) for the
high-volume image and IaC outputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The signed offset format read as "due in 5 days" rather than
"5 days overdue", causing misreads. Switch to self-explanatory
text: "5d overdue" / "due in 2d" / "due today".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude Code only auto-loads CLAUDE.md. The prose shim told agents to go
read AGENTS.md, which is easy to skip. Replacing the shim with
`@AGENTS.md` inlines AGENTS.md content into the session prompt, so the
startup rules (ai-docs, blumeops-tasks, change classification) land in
context unconditionally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extend (not replace) home-manager's default sway keybindings via
lib.mkOptionDefault, with lib.mkForce on the custom overrides that
conflict with defaults. Add Mod+F1 cheatsheet binding (fuzzel-filterable).
Move fuzzel's border-radius/border-width out of [main] into a proper
[border] section with the expected short names.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After dispatching, poll the Forgejo API for the run matching our
head_sha and print `mise run runner-logs <N>` so the suggested monitor
command is one copy-paste away. Falls back to the bare command if the
poll times out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
- Mirrors `github.com/0x2142/frigate-notify` at `v0.5.4` to `forge.ops.eblu.me/mirrors/frigate-notify`.
- Adds `containers/frigate-notify/default.nix` — `buildGoModule` + `dockerTools.buildLayeredImage`, following the `ntfy` pattern.
- Uses `-tags goolm` to avoid the libolm CGO dependency (matrix notifier is imported unconditionally in the upstream but we only use ntfy alerts).
- Runs as nonroot (UID 65534), exposes port 8000, bundles `cacert`/`tzdata`.
## Why
Move `ghcr.io/0x2142/frigate-notify:v0.5.4` (ringtail-deployed) under local control. Aligns with the [[indri → ringtail migration plan]] and the `default.nix` convention for ringtail-targeted containers documented in [[build-container-image]].
## Verification
- `dagger call build-nix --src=. --container-name=frigate-notify export --path=./out.tar.gz` produces a valid 20MB docker archive (10 layers) with `blumeops/frigate-notify` tag locally.
- Hashes pinned for `fetchgit` (src) and `vendorHash` (go modules).
## Follow-up (post-merge)
1. `mise run container-build-and-release frigate-notify` — release from main SHA.
2. C0 follow-up: update `argocd/manifests/frigate/kustomization.yaml` image ref to `registry.ops.eblu.me/blumeops/frigate-notify:v0.5.4-<sha>-nix`.
3. ArgoCD auto-syncs the deployment.
## Test plan
- [ ] `dagger call build-nix` succeeds from a clean checkout.
- [ ] `mise run container-build-and-release frigate-notify --dry-run` looks correct.
- [ ] After release + kustomization swap: frigate-notify pod comes up healthy on ringtail; ntfy alerts still fire on Frigate events.
Reviewed-on: #339
Bumps the Dagger engine/CLI from v0.20.1 to v0.20.6 (mise pin, dagger.json
engineVersion, SDK regen) and rewrites the runner-job-image container as a
native Dagger pipeline on Alpine 3.23 using the shared alpine_runtime helper,
replacing the Debian-based Dockerfile. All Forgejo Actions in this repo use
actions/checkout (a JS action), so musl is not a compatibility concern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
- consolidate forgejo-runner how-to docs into current cards
- upgrade the k8s forgejo-runner deployment to the latest v12.8.x runner image
- switch the k8s runner from first-boot register flow to declarative server.connections config
- keep the runner image on the native Dagger build path and update the surrounding manifests/secrets
## Notes
- PR opened early for C1 review
- implementation and deployment verification will follow in subsequent commits
Reviewed-on: #338
Forgejo's web action routes don't support API token auth for private
repos (only session cookies or public access). Switch log fetching to
read the zstd-compressed log files directly from indri via SSH —
Forgejo stores all runner logs on disk regardless of which runner
executed the job.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
runner-logs now always authenticates with the Forgejo API token
(via --token flag, FORGEJO_TOKEN env, or 1Password) so it works on
private repos. The --repo default is auto-detected from the git
remote origin URL instead of being hardcoded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The static asset cache block (css/js/png/etc) was missing
proxy_set_header Host, so Caddy received "forge.eblu.me" instead of
"forge.ops.eblu.me" and couldn't route the request. HTML loaded fine
because the main location / block had the header.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 7 ArgoCD containers had no resource limits, allowing them to consume
unlimited CPU/memory during node pressure events. This contributed to
cluster-wide probe timeout cascades on minikube-indri.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive docs pass reflecting the new Fly proxy architecture:
- Fly proxy routes through Caddy on indri (not per-service TS Ingress)
- Direct WireGuard peering via --port=41641 pinning
- DERP relay performance lesson in Tailscale docs
- Caddy now in public traffic path
- indri tagged as flyio-target
- Removed fly-reload references
- Updated architecture diagrams and per-service setup guide
- Added changelog fragment
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Queries the Forgejo API to verify the target commit exists on the remote
before dispatching a build, preventing wasted CI runs on unpushed commits.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Dockerfile with container.py for native Dagger builds.
Bump devpi-server 6.19.1→6.19.3, devpi-web 5.0.1→5.0.2.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools
- Reuses TLS connections through the Tailscale tunnel instead of handshaking per request
- Add `mise run fly-reload` for nginx config reload without full redeploy (re-resolves upstream DNS)
## Trade-off
DNS is resolved at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule), `mise run fly-reload` is needed. A Grafana alert will be added to detect this.
## Still TODO on this branch
- [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder)
- [ ] Docs pass
- [ ] Deploy from branch and verify latency improvement
- [ ] Changelog fragment
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #337
Replaces the hand-written Dockerfile with container.py using the shared
alpine_runtime helper, which bumps the base image from Alpine 3.22 to 3.23.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The xdg desktop entry and Librewolf user.js prefs didn't fix the
OAuth callback hang. Try stock Firefox instead as a simpler path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shared Dagger helpers (src/blumeops/) affect all Dagger-built containers,
making path-based auto-triggers unreliable. All builds now go through
`mise run container-build-and-release <name>`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend go_build() with buildmode and extra_env params, migrate miniflux
and forgejo-runner to use it, and bump all Alpine bases from 3.22 to 3.23.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Dockerfiles with native container.py for both transmission and
transmission-exporter. Updates base images (Alpine 3.23, Python 3.14),
pins uv to 0.11.6 instead of :latest.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LaunchAgents now call borgmatic directly at its mise-installed path
instead of routing through `mise x`, which triggered macOS TCC
permission dialogs (e.g. "mise wants to access Documents") that hung
headless sessions and caused backup failures.
Also adds `mise install` to the ansible role so borgmatic installation
is fully managed, and pins the version in both mise.toml and the role
defaults.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Upgrade Prowler from 5.22.0 to 5.23.0
- Remove the `enumerate-images` init container workaround from `cronjob-image-scan.yaml`
- Use native `--registry` and `--image-filter` flags now that upstream fix (PR prowler-cloud/prowler#10470) is released
The init container was a workaround for prowler-cloud/prowler#10457 where `--registry` args weren't forwarded to the provider constructor. We wrote the fix, it was merged, and v5.23.0 includes it.
## Test plan
- [ ] Build new container (`mise run container-release prowler 5.23.0`)
- [ ] Update kustomization.yaml with new image tag
- [ ] Sync prowler ArgoCD app from branch
- [ ] Manually trigger image scan job and verify `--registry` works natively
- [ ] Verify CIS and IaC scan cronjobs still work
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #336
Removed Grafana from the control description — no Prowler finding
references it. Tightened scope to match actual usage (ArgoCD wildcard
RBAC mute). Added workflow-bot scoping note.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Adds automated node-level verification to `review-compliance-reports`: kubelet file perms/ownership, kubelet config args, etcd CA separation, RBAC cluster-admin bindings
- Mutes the 14 MANUAL Prowler findings via new `manual-node-checks.yaml` mutelist file
- New `node-config-automated-verification` compensating control documents the approach
- Script fails loudly (red FAIL + verdict panel) if any check deviates from expected values
## Test plan
- [x] `mise run review-compliance-reports` — all 12 node checks PASS
- [x] Injected bad expected value (perms 400 vs actual 600) — FAIL rendered correctly
- [x] Fixed colon-in-binding-name bug (kubeadm:cluster-admins) with tab-separated jsonpath
- [ ] After merge: sync prowler mutelist ConfigMap and verify next scan shows 0 MANUAL findings
## Note
Prowler coverage is minikube-indri only — ringtail/k3s is a known gap tracked separately.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #335
## Summary
- Add native Dagger `container.py` for forgejo-runner (Go + Alpine runtime, static binary with CGO for SQLite)
- Update kustomization to point to local registry image (tag is placeholder until CI builds)
- Uses existing `clone_from_forge("forgejo-runner", ...)` mirror
## Test plan
- [x] `dagger call build --src=. --container-name=forgejo-runner` passes locally
- [ ] CI container build from branch succeeds
- [ ] Update kustomization tag to built image, deploy from branch via ArgoCD `--revision`
- [ ] Verify runner registers and picks up jobs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #334
The Dagger engine's internal OTLP proxy returns 500 on /v1/metrics when
there's no real backend, causing ~9s retry warnings per pipeline step.
Point OTEL_EXPORTER_OTLP_ENDPOINT at Tempo to give it a real endpoint.
Also removes the stale os.environ workaround from main.py (the SDK
initializes telemetry before our module loads, so it had no effect).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dagger engine shim sets OTEL_METRICS_EXPORTER before our module
loads, so os.environ.setdefault was a no-op. Switch to a hard override.
Remove the redundant workflow-level env var since the fix belongs in
the module.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Upgrade grafana-sidecar from 1.28.0 to 2.6.0 (the 2.x memory regression #462 is resolved; ~35MB static overhead is acceptable)
- Port build from Dockerfile to native Dagger container.py
- Add liveness/readiness probes using the new /healthz endpoint on port 8080
- Update docs to reflect container.py migration and remove stale pin note
## Test plan
- [ ] Build container: `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization tag with new image tag
- [ ] Deploy from branch: `argocd app set grafana --revision grafana-sidecar-2.6.0 && argocd app sync grafana`
- [ ] Verify sidecar health endpoint: `kubectl exec -n monitoring <pod> -c grafana-sc-dashboard -- wget -qO- http://localhost:8080/healthz`
- [ ] Verify dashboards load in Grafana UI
- [ ] `mise run services-check`
Reviewed-on: #332
Outdated leaf card removed; zot.md now links to new service-versions
reference card instead. Added reverse link from review-services.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace broken SSH+filesystem log retrieval with Forgejo web API
endpoint. Fix CLI to use run numbers (not task IDs), add --repo
for querying any forge repo (e.g. sporks), --limit/-n for listing
size. Document runner-logs as the way to verify build success in
CLAUDE.md and container build docs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Move Dagger module from `.dagger/` to repo root (`src/blumeops/`), rename `blumeops-ci` → `blumeops`
- Replace opaque `docker_build()` with native Dagger pipelines that surface full build errors per step
- Migrate navidrome as the first container (`containers/navidrome/container.py`)
- Upgrade navidrome from v0.60.3 to v0.61.1 (major artwork overhaul, SQLite FTS5 search, server-managed transcoding)
- Add `dagger call container-version` for CI version extraction without Dockerfile parsing
- All mise tasks (`container-list`, `container-version-check`, `container-build-and-release`) updated for hybrid mode
- Legacy `docker_build()` fallback preserved for all other containers
## Motivation
When navidrome v0.61.0 added a new Go build tag (`sqlite_fts5`), `docker_build()` showed only "exit code: 1". We had to run `docker build --progress=plain` manually to find `undefined: buildtags.SQLITE_FTS5`. Native Dagger pipelines show the full error inline.
## Container build dispatch needed
After merge, dispatch container build for navidrome:
```
mise run container-build-and-release navidrome --ref 470b4bd
```
## Deploy steps
1. Wait for container build to complete
2. Back up navidrome-data PVC (non-reversible DB migrations)
3. `argocd app set navidrome --revision main && argocd app sync navidrome`
4. Verify at https://dj.ops.eblu.me
## Future
Remaining containers migrate incrementally in follow-up PRs using the same pattern.
Reviewed-on: #330
Add flyio-tailscale (v1.94.1), flyio-nginx (1.29.6-alpine), and
flyio-alloy (v1.14.1) entries with new `fly` service type so future
upgrades go through the service-review workflow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>