Commit graph

404 commits

Author SHA1 Message Date
d7af004842 Add Forgejo metrics + upstream latency histogram to Fly proxy dashboard
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m53s
- Enable Forgejo /metrics endpoint (app.ini [metrics] section)
- Add Alloy scrape target for Forgejo metrics on indri
- Add upstream_response_time histogram to Fly proxy Alloy config
- Replace single p95 panel with p50/p90/p99 + upstream breakdown
  filtered to forge.eblu.me host

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:05:59 -07:00
0a98f76068 Update kiwix-serve to Dagger-built container (Alpine 3.23)
Points kustomization at v3.8.2-7a42aeb, the first image built from the
new container.py (replacing the Dockerfile).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 14:27:42 -07:00
5ec2411e20 Update navidrome, miniflux, forgejo-runner image tags to Alpine 3.23 builds [main]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:37:30 -07:00
fb1e8ff672 Deploy transmission containers from Dagger builds
Update kustomization image tags to the new container.py-built images
(v4.1.1-r1-2c483ce, v1.0.1-2c483ce).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:34:28 -07:00
30ed018fd8 Update prowler image tag to v5.23.0-7c1cd11 [main]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:51:26 -07:00
7c1cd11e45 Upgrade Prowler to 5.23.0, remove registry workaround (#336)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (prowler) (push) Successful in 36s
## Summary

- Upgrade Prowler from 5.22.0 to 5.23.0
- Remove the `enumerate-images` init container workaround from `cronjob-image-scan.yaml`
- Use native `--registry` and `--image-filter` flags now that upstream fix (PR prowler-cloud/prowler#10470) is released

The init container was a workaround for prowler-cloud/prowler#10457 where `--registry` args weren't forwarded to the provider constructor. We wrote the fix, it was merged, and v5.23.0 includes it.

## Test plan

- [ ] Build new container (`mise run container-release prowler 5.23.0`)
- [ ] Update kustomization.yaml with new image tag
- [ ] Sync prowler ArgoCD app from branch
- [ ] Manually trigger image scan job and verify `--registry` works natively
- [ ] Verify CIS and IaC scan cronjobs still work

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #336
2026-04-14 13:45:28 -07:00
be30668eef Automate Prowler MANUAL finding verification (#335)
## Summary
- Adds automated node-level verification to `review-compliance-reports`: kubelet file perms/ownership, kubelet config args, etcd CA separation, RBAC cluster-admin bindings
- Mutes the 14 MANUAL Prowler findings via new `manual-node-checks.yaml` mutelist file
- New `node-config-automated-verification` compensating control documents the approach
- Script fails loudly (red FAIL + verdict panel) if any check deviates from expected values

## Test plan
- [x] `mise run review-compliance-reports` — all 12 node checks PASS
- [x] Injected bad expected value (perms 400 vs actual 600) — FAIL rendered correctly
- [x] Fixed colon-in-binding-name bug (kubeadm:cluster-admins) with tab-separated jsonpath
- [ ] After merge: sync prowler mutelist ConfigMap and verify next scan shows 0 MANUAL findings

## Note
Prowler coverage is minikube-indri only — ringtail/k3s is a known gap tracked separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #335
2026-04-14 13:00:44 -07:00
Forgejo Actions
8c2f035e6d Update docs release to v1.15.6
- Built changelog from towncrier fragments

[skip ci]
2026-04-14 11:46:42 -07:00
Forgejo Actions
f2514a6f02 Update docs release to v1.15.5
- Built changelog from towncrier fragments

[skip ci]
2026-04-14 11:29:27 -07:00
9d85c97b9b Update forgejo-runner kustomization tag to main-branch image
C0 follow-up: switch from branch-built tag to main-built v12.7.3-0e93cc0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:10:36 -07:00
0e93cc08b4 Build forgejo-runner container locally (#334)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dagger (forgejo-runner) (push) Successful in 1m21s
## Summary
- Add native Dagger `container.py` for forgejo-runner (Go + Alpine runtime, static binary with CGO for SQLite)
- Update kustomization to point to local registry image (tag is placeholder until CI builds)
- Uses existing `clone_from_forge("forgejo-runner", ...)` mirror

## Test plan
- [x] `dagger call build --src=. --container-name=forgejo-runner` passes locally
- [ ] CI container build from branch succeeds
- [ ] Update kustomization tag to built image, deploy from branch via ArgoCD `--revision`
- [ ] Verify runner registers and picks up jobs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #334
2026-04-14 11:06:36 -07:00
ccaef4c1a7 Document devpi cold cache failure mode and deploy teslamate v3.0.0-08c698e
After a DR rebuild, devpi's empty cache causes race conditions under
concurrent load — metadata is served but wheel files 404. Also deploys
the first container.py-built teslamate image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:38:06 -07:00
2d2d495f95 Fix paperless redis: use upstream valkey instead of amd64-only nix image
The authentik-redis image is nix-built on ringtail (amd64 only) and was
previously running under QEMU emulation on arm64 minikube. Discovered
during DR recovery when fresh minikube lacked binfmt registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:48:20 -07:00
db6d8af8b1 Update grafana-sidecar image tag to v2.6.0-61fcd5d (merge build)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 08:02:39 -07:00
61fcd5d70a Upgrade grafana-sidecar 1.28.0 → 2.6.0 + container.py port (#332)
All checks were successful
Build Container / detect (push) Successful in 4s
Build Container / build-dagger (grafana-sidecar) (push) Successful in 1m50s
## Summary

- Upgrade grafana-sidecar from 1.28.0 to 2.6.0 (the 2.x memory regression #462 is resolved; ~35MB static overhead is acceptable)
- Port build from Dockerfile to native Dagger container.py
- Add liveness/readiness probes using the new /healthz endpoint on port 8080
- Update docs to reflect container.py migration and remove stale pin note

## Test plan

- [ ] Build container: `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization tag with new image tag
- [ ] Deploy from branch: `argocd app set grafana --revision grafana-sidecar-2.6.0 && argocd app sync grafana`
- [ ] Verify sidecar health endpoint: `kubectl exec -n monitoring <pod> -c grafana-sc-dashboard -- wget -qO- http://localhost:8080/healthz`
- [ ] Verify dashboards load in Grafana UI
- [ ] `mise run services-check`

Reviewed-on: #332
2026-04-13 07:57:13 -07:00
a18ec9d958 Update miniflux to main image tag, disable OTEL metrics in Dagger module
Point miniflux kustomization at the main-built v2.2.19-138e23d image
(replacing the branch tag). Disable the OTLP metrics exporter at module
import time to prevent ~11s retry delays in CI — the env var must be set
inside the module, not the runner shell, because the SDK runs inside the
Dagger engine container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:59:32 -07:00
138e23d525 Miniflux 2.2.19 + container.py migration + ty typechecker (#331)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (miniflux) (push) Successful in 1m3s
## Summary

- Upgrade miniflux from 2.2.17 to 2.2.19 (security hardening, performance improvements)
- Migrate miniflux from Dockerfile to native Dagger container.py build
- Refactor `alpine_runtime()` helper to support existing users (nobody/65534)
- Add `ty` (Astral) Python typechecker to prek hooks

## Test plan

- [ ] `dagger call build --src=. --container-name=miniflux` succeeds
- [ ] `dagger call container-version --container-name=miniflux` returns 2.2.19
- [ ] `mise run container-version-check` passes
- [ ] `ty check` passes cleanly
- [ ] `prek run --all-files` passes
- [ ] CI builds container successfully
- [ ] Miniflux healthcheck passes after deploy from branch

Reviewed-on: #331
2026-04-12 08:54:32 -07:00
94c937d588 Disable OTLP metrics exporter in CI, update navidrome to main tag
The Dagger Python SDK's OTLP metrics exporter hits a non-functional
local endpoint (500s), burning ~9s per retry cycle. Set
OTEL_METRICS_EXPORTER=none in the build-dagger CI job.

Also update navidrome kustomization to the main-SHA tag (c86b5d7).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:26:25 -07:00
c86b5d7772 Native Dagger container builds + Navidrome v0.61.1 (#330)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dagger (navidrome) (push) Successful in 22m26s
## Summary
- Move Dagger module from `.dagger/` to repo root (`src/blumeops/`), rename `blumeops-ci` → `blumeops`
- Replace opaque `docker_build()` with native Dagger pipelines that surface full build errors per step
- Migrate navidrome as the first container (`containers/navidrome/container.py`)
- Upgrade navidrome from v0.60.3 to v0.61.1 (major artwork overhaul, SQLite FTS5 search, server-managed transcoding)
- Add `dagger call container-version` for CI version extraction without Dockerfile parsing
- All mise tasks (`container-list`, `container-version-check`, `container-build-and-release`) updated for hybrid mode
- Legacy `docker_build()` fallback preserved for all other containers

## Motivation
When navidrome v0.61.0 added a new Go build tag (`sqlite_fts5`), `docker_build()` showed only "exit code: 1". We had to run `docker build --progress=plain` manually to find `undefined: buildtags.SQLITE_FTS5`. Native Dagger pipelines show the full error inline.

## Container build dispatch needed
After merge, dispatch container build for navidrome:
```
mise run container-build-and-release navidrome --ref 470b4bd
```

## Deploy steps
1. Wait for container build to complete
2. Back up navidrome-data PVC (non-reversible DB migrations)
3. `argocd app set navidrome --revision main && argocd app sync navidrome`
4. Verify at https://dj.ops.eblu.me

## Future
Remaining containers migrate incrementally in follow-up PRs using the same pattern.

Reviewed-on: #330
2026-04-11 17:11:56 -07:00
5757df115d Upgrade ollama from 0.17.5 to 0.20.4
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 06:42:05 -07:00
22fc615a28 Update paperless image tag to main build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:01:02 -07:00
07f52e9488 Deploy Paperless-ngx document management (#328)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dockerfile (paperless) (push) Successful in 9s
## Summary

- Add paperless-ngx (v2.20.13) as a new ArgoCD-managed service on indri
- Dockerfile built from forge mirror (`mirrors/paperless-ngx`), multi-stage with s6-overlay
- PostgreSQL database via `blumeops-pg` CNPG cluster, Redis sidecar for Celery
- NFS document storage on sifaka (`/volume1/paperless`)
- Authentik OIDC SSO via baked JSON blob from 1Password
- Caddy route at `paperless.ops.eblu.me`
- 1Password item "Paperless (blumeops)" created with all secrets

## Files

- `containers/paperless/Dockerfile` — multi-stage build
- `argocd/manifests/paperless/` — full k8s manifest set
- `argocd/apps/paperless.yaml` — ArgoCD application
- `argocd/manifests/databases/` — CNPG role + ExternalSecret
- `ansible/roles/caddy/defaults/main.yml` — Caddy route
- `service-versions.yaml` — version tracking entry
- `docs/reference/services/paperless.md` — reference card

## Remaining deploy steps

1. Build container: `mise run container-build-and-release paperless`
2. Update kustomization.yaml `newTag` with actual image tag
3. Create Authentik application/provider for paperless
4. Create `paperless` database on blumeops-pg
5. Sync ArgoCD apps, then sync paperless from branch
6. Provision Caddy: `mise run provision-indri -- --tags caddy`
7. Verify at https://paperless.ops.eblu.me

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #328
2026-04-08 17:54:12 -07:00
22b77ac141 Fix Frigate preview config and services-check NoData detection
preview.quality was at the top level (invalid); moved under record
with a valid preset (very_low). Also fix services-check to catch
Grafana "Alerting (NoData)" state which was silently passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:12:42 -07:00
ec63d560f3 Deploy authentik 2026.2.2 container to ringtail
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 10:56:50 -07:00
0366a0346b Set Frigate preview quality to CRF 8 for faster timeline loading
Previews are ~4MB/hour at default quality (CRF 1), served over NFS from
sifaka. Reducing to CRF 8 shrinks preview files to improve review page
load times when scrubbing older footage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:43:43 -07:00
936d29bbe1 Fix UnPoller dashboard UIDs exceeding Grafana 12's 40-char limit
Strip redundant "unifi-poller-" prefix from generated slugs, bringing
UIDs from 45-48 chars down to 32-35 chars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 07:03:39 -07:00
3c894e659d Pin kube-state-metrics to main-SHA container tags
C0 follow-up to #327: update from branch-SHA tags to main-SHA tags
after squash-merge rebuild.

indri: v2.18.0-f59f885
ringtail: v2.18.0-f59f885-nix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:10:14 -07:00
f59f8859dc Localize kube-state-metrics container (Dockerfile + nix) (#327)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dockerfile (kube-state-metrics) (push) Successful in 5s
Build Container / build-nix (kube-state-metrics) (push) Successful in 7s
## Summary

- Build kube-state-metrics v2.18.0 locally from forge mirror, replacing upstream `registry.k8s.io` image
- Dockerfile (two-stage Go build) for indri/minikube
- default.nix (buildGoModule + buildLayeredImage) for ringtail/k3s
- Both kustomization files updated with `newName` pointing to local registry

## Verification

- [x] Nix build succeeded on ringtail (`nix-build` → 10-layer image)
- [x] Dockerfile build succeeded locally (`dagger call build` → ~2min)
- [x] `container-version-check --all-files` passes (2.18.0 consistent across Dockerfile, nix, service-versions.yaml)
- [ ] CI builds container images from this branch
- [ ] Update kustomization `newTag` with SHA-tagged version from CI
- [ ] ArgoCD sync on both clusters

## Test plan

- Trigger CI build: `mise run container-build-and-release kube-state-metrics`
- Verify tags: `mise run container-list kube-state-metrics`
- Update newTag in kustomization files with CI-produced tag
- Sync ArgoCD on indri: `argocd app sync kube-state-metrics`
- Sync ArgoCD on ringtail: `argocd app sync kube-state-metrics --context=k3s-ringtail` (note: argocd uses its own auth, not kubectl context)
- Verify metrics still flowing to Prometheus

Reviewed-on: #327
2026-04-07 16:09:25 -07:00
84eda0301f Bump authentik worker memory limit 1Gi → 2Gi (OOMKilled after ringtail restart)
Worker forks 4 Dramatiq processes each loading the full Django app
(~250MB each), hitting the 1Gi limit on startup. Ringtail has ample
RAM headroom.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 15:39:29 -07:00
efae404d1e Remove superuser from teslamate PG role, transfer extension ownership
teslamate had superuser on the shared blumeops-pg cluster (which also
hosts miniflux and authentik). Downgraded to plain database owner with
extension ownership (cube, earthdistance) transferred manually so it
can still ALTER EXTENSION UPDATE. earthdistance is untrusted in PG so
DROP+CREATE would need temporary superuser escalation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 15:36:39 -07:00
1fd8aae8f6 Upgrade ArgoCD v3.3.2 → v3.3.6, SHA-pin install manifest
Patch upgrade with bug fixes (diff normalization, installation ID cache).
Pin the upstream manifest URL to commit SHA for supply chain integrity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 08:21:11 -07:00
18fe172a54 Add seccomp RuntimeDefault profiles to alloy-k8s and immich pods
Resolves 4 unmuted Prowler core_seccomp_profile_docker_default
findings on alloy, immich-server, immich-machine-learning, and
immich-valkey.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:21:23 -07:00
Forgejo Actions
370a3574b2 Update docs release to v1.15.4
- Built changelog from towncrier fragments

[skip ci]
2026-04-06 07:53:54 -07:00
c7e5af6d51 Migrate 1Password Connect from Helm to kustomize (1.8.1 → 1.8.2) (#326)
## Summary

- Renders manifests from `connect-helm-charts v2.4.1` as plain kustomize (deployment + service)
- Bumps 1Password Connect from 1.8.1 → 1.8.2
- Completes the no-helm-policy migration — all services now use kustomize
- Retains all production hardening from the Helm chart (securityContext, runAsNonRoot, drop ALL, seccomp, resource limits)

## Changes

- **New:** `deployment.yaml`, `service.yaml`, `kustomization.yaml` in `argocd/manifests/1password-connect/`
- **Rewritten:** Both ArgoCD app definitions (indri + ringtail) — single source kustomize instead of multi-source Helm
- **Deleted:** `values.yaml` (Helm values no longer needed)
- **Updated:** `no-helm-policy.md`, `service-versions.yaml`, `README.md`

## Deployment plan

1. Sync `apps` app to pick up the new app definitions
2. `argocd app set 1password-connect --revision 1password-connect-kustomize`
3. `argocd app sync 1password-connect` — verify on indri
4. Repeat for ringtail
5. After merge: reset revision to main, re-sync both

## Test plan

- [ ] `kubectl kustomize` renders cleanly (verified locally)
- [ ] ArgoCD diff shows expected changes (Helm labels removed, images bumped)
- [ ] Pods come up healthy on indri
- [ ] External Secrets still resolves 1Password items
- [ ] Repeat on ringtail

Reviewed-on: #326
2026-04-06 07:31:40 -07:00
Forgejo Actions
facb803010 Update docs release to v1.15.3
- Built changelog from towncrier fragments

[skip ci]
2026-04-05 21:24:25 -07:00
5597e02467 Fix Homepage pod-selector for Immich (Helm labels → kustomize labels)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 12:12:48 -07:00
64200a55c5 Migrate Immich from Helm chart to kustomize manifests (v2.5.6 → v2.6.3)
Replace the Helm chart deployment with plain kustomize manifests following
the Authentik pattern (separate deployments per component). Consolidate
the immich-storage ArgoCD app into the main immich app. Add no-helm-policy
doc establishing kustomize as the standard deployment mechanism.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:42:25 -07:00
464e3222d2 Document upstream fix for Prowler --registry bug (pending release)
PR #10470 merged 2026-03-30; initContainer workaround stays until a
Prowler release includes the fix (latest is 5.22.0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 20:21:19 -07:00
306f580bdb Point Tempo at main-built container v2.10.3-75f9ba4
C0 follow-up: update tag from branch-built image to main-SHA image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 13:45:57 -07:00
75f9ba4943 Build Tempo container from source (2.10.3) (#323)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dockerfile (tempo) (push) Successful in 6s
## Summary
- Add `containers/tempo/Dockerfile` — two-stage Go build from forge mirror, modeled on loki
- Switch kustomization from upstream `grafana/tempo` to `registry.ops.eblu.me/blumeops/tempo`
- Bump Tempo 2.10.1 → 2.10.3

## Test plan
- [ ] Kick off container build via `mise run container-build-and-release tempo`
- [ ] Update kustomization `newTag` with built image tag
- [ ] Deploy from branch: `argocd app set tempo --revision local-tempo-container && argocd app sync tempo`
- [ ] Verify Tempo health: `curl tempo.ops.eblu.me/ready`
- [ ] Verify traces flowing in Grafana Tempo datasource

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #323
2026-04-02 13:45:02 -07:00
b1e2811077 Upgrade Grafana 12.3.3 → 12.4.2 (#322)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dockerfile (grafana) (push) Successful in 7s
## Summary

- Bumps Grafana from 12.3.3 to 12.4.2
- Patches 7 CVEs, notably CVE-2026-27880 (unauthenticated OOM DoS, CVSS 7.5) and CVE-2026-27879 (authenticated OOM via resample queries)
- No config changes required — reviewed alerting, datasources, OIDC, and feature toggles against 12.4.x breaking changes

## Breaking changes reviewed

| Change | Impact |
|--------|--------|
| Alerting: pending period applies to NoData/Error | Net positive — reduces noise from transient blips |
| Default notification uses empty receiver | No impact — we explicitly set `ntfy-infra` |
| Removed feature toggles (4) | No impact — none configured |
| OAuth ID token signature validation | Low risk — verify OIDC login post-deploy |
| OpsGenie deprecated | No impact — using webhook |

## Test plan

- [ ] Container build completes at forge
- [ ] Update kustomization.yaml with new image tag
- [ ] `argocd app set grafana --revision upgrade/grafana-12.4.2 && argocd app sync grafana`
- [ ] Verify Grafana UI loads at grafana.ops.eblu.me
- [ ] Verify OIDC login via Authentik
- [ ] Verify dashboards and datasources load
- [ ] Check alerting rules are intact

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #322
2026-04-02 11:33:19 -07:00
Forgejo Actions
2b7b21dc9b Update docs release to v1.15.2
- Built changelog from towncrier fragments

[skip ci]
2026-03-30 17:48:40 -07:00
4059b3d27b Add compensating controls framework and date-based report dirs (#320)
## Summary

- Add `compensating-controls.yaml` tracking 9 named controls that justify suppressed security findings
- Update all Prowler mutelist descriptions with `CC: <id>` references to named controls
- Add `mise run review-compensating-controls` task — surfaces stalest control with all codebase references
- Add [[review-compensating-controls]] how-to doc
- Organize Prowler and Kingfisher reports into `YYYY-MM-DD` subdirectories

### Compensating controls

| ID | Mitigates |
|----|-----------|
| `single-user-cluster` | Image cache abuse, RBAC breadth, system pod privileges |
| `tailscale-network-isolation` | Profiling endpoints, weak TLS, debug ports |
| `local-registry` | AlwaysPullImages gap |
| `sso-gated-admin-tools` | ArgoCD wildcard RBAC |
| `operator-managed-pods` | Tailscale proxy pod security settings |
| `ephemeral-privileged-jobs` | Prowler hostPID exposure |
| `trusted-ci-only` | Forgejo runner DinD |
| `init-container-isolation` | Grafana root init container |
| `observability-stack-audit` | Missing apiserver audit logging |

## Test plan

- [ ] `mise run review-compensating-controls` shows table and references
- [ ] `kubectl kustomize argocd/manifests/prowler/` renders correctly
- [ ] Sync prowler and kingfisher, verify next scan writes to dated subdirectory
- [ ] Grep for `CC:` in mutelist files — every muted finding should have at least one

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #320
2026-03-30 17:44:11 -07:00
a76e471d54 Add Prowler mutelist and fix kube-state-metrics seccomp (#319)
## Summary

- Add mutelist files to suppress expected/accepted Prowler CIS findings from components we don't control
- Mutelist files stored in `mutelist/` directory, grouped by category, merged at runtime via initContainer
- Fix missing seccomp `RuntimeDefault` profile on kube-state-metrics deployment

### Mutelist categories

| File | Checks | Covers |
|------|--------|--------|
| `apiserver.yaml` | 12 | Minikube apiserver flags |
| `control-plane.yaml` | 3 | Scheduler, controller-manager, kubelet |
| `core-pod-security.yaml` | 7 | System pods, Tailscale operator, Grafana init, Prowler hostPID, forgejo-runner |
| `rbac.yaml` | 3 | Built-in K8s roles, ArgoCD, CNPG |

Muted findings appear as `status=MUTED` in reports (not hidden), preserving audit trail.

### Not muted (follow-up)

- Alloy, Immich pods missing seccomp — need separate investigation (Helm/operator-managed)

## Test plan

- [ ] `kubectl kustomize argocd/manifests/prowler/` renders cleanly
- [ ] Trigger manual scan: `kubectl --context=minikube-indri -n prowler create job prowler-mutelist-test --from=cronjob/prowler`
- [ ] Verify initContainer merges successfully (check pod logs)
- [ ] Verify muted findings show as `MUTED` in report
- [ ] Sync kube-state-metrics and verify pod starts with seccomp profile

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #319
2026-03-30 17:22:31 -07:00
1e391f96bb Upgrade forgejo-runner 12.7.0 → 12.7.3, add service card
Patch upgrade picks up idempotent FetchTask API, offline registration
fix, cloudflare/circl security dep update, and custom gRPC user-agent.
No config defaults changed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 16:31:06 -07:00
b000efd6c3 Fix Kingfisher CronJob exit code handling
Kingfisher exits 200 (findings) or 205 (validated findings) on success.
Normalize these to 0 so the CronJob completes instead of restarting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 07:16:02 -07:00
457ab19416 Scope Kingfisher scan to eblume user only on ringtail
Mirror repos cause scan failures (likely ephemeral storage or timeout).
Scan only eblume/ repos until we investigate the root cause.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 07:11:52 -07:00
2c1f0abefc Deploy Kingfisher v165768b-0fe0eed-nix (tmp permissions fix)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 06:54:57 -07:00
14f366f993 Deploy Kingfisher v165768b-c494b62-nix (/tmp fix)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 06:51:17 -07:00
b01afb1c1d Deploy Kingfisher v165768b-aa9cc70-nix (bash fix)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 06:47:26 -07:00