From 8072cd21d75b1a88f7c843e68cefdf61da2c986e Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Mon, 8 Jun 2026 06:35:23 -0700 Subject: [PATCH 1/4] C0: review jellyfin, upgrade indri to 10.11.11 (security fixes) Jellyfin was 5 patch releases behind (10.11.6 -> 10.11.11). 10.11.7 and 10.11.10 contain disclosed CVE/GHSA security fixes. Upgraded via brew upgrade --cask jellyfin on indri; service verified healthy and externally reachable (HTTPS 200). Documented the recurring Gatekeeper gotcha: cask upgrades re-quarantine the .app and the launchd service hangs silently until the first-launch dialog is approved on indri's GUI console (xattr removal over SSH is blocked by macOS TCC). Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/changelog.d/+jellyfin-10-11-11.bugfix.md | 1 + docs/reference/services/jellyfin.md | 22 +++++++++++++++++-- service-versions.yaml | 10 +++++++-- 3 files changed, 29 insertions(+), 4 deletions(-) create mode 100644 docs/changelog.d/+jellyfin-10-11-11.bugfix.md diff --git a/docs/changelog.d/+jellyfin-10-11-11.bugfix.md b/docs/changelog.d/+jellyfin-10-11-11.bugfix.md new file mode 100644 index 0000000..779a042 --- /dev/null +++ b/docs/changelog.d/+jellyfin-10-11-11.bugfix.md @@ -0,0 +1 @@ +Upgraded Jellyfin on indri from 10.11.6 to 10.11.11, picking up the security fixes in 10.11.7 (disclosed CVEs/GHSAs, flagged "upgrade immediately") and 10.11.10 (three further GHSAs). Noted the recurring gotcha in the service-versions tracking: after a `brew upgrade --cask jellyfin`, the re-quarantined `.app` makes the launchd-spawned process hang silently until the Gatekeeper first-launch dialog is approved on indri's GUI console — removing the quarantine xattr over SSH is blocked by macOS TCC. diff --git a/docs/reference/services/jellyfin.md b/docs/reference/services/jellyfin.md index bbdfafd..c7b3074 100644 --- a/docs/reference/services/jellyfin.md +++ b/docs/reference/services/jellyfin.md @@ -1,7 +1,7 @@ --- title: Jellyfin -modified: 2026-02-07 -last-reviewed: 2026-03-23 +modified: 2026-06-08 +last-reviewed: 2026-06-08 tags: - service - media @@ -41,6 +41,24 @@ Dashboard > Playback: 2. Allow hardware encoding: Enabled 3. VPP Tone mapping: Enabled +## Upgrades + +Installed via Homebrew cask (`state: present`, unpinned), so the Ansible role +won't bump an already-installed cask. To upgrade, run on indri: + +```bash +brew upgrade --cask jellyfin +``` + +**Gatekeeper gotcha:** a cask upgrade replaces `/Applications/Jellyfin.app` and +re-applies the `com.apple.quarantine` xattr. When launchd respawns the service, +the new binary hangs silently — process alive but ~0 CPU, no logs, no listening +socket — because Gatekeeper is holding the first launch pending approval. +Removing the xattr over SSH fails (`xattr -dr com.apple.quarantine ...` → +"Operation not permitted", blocked by macOS TCC). Approve the first-launch +dialog on indri's GUI console (or run the `xattr` removal from a local Terminal +with Full Disk Access), then reload the LaunchAgent. + ## Observability - Metrics: `jellyfin_metrics` ansible role diff --git a/service-versions.yaml b/service-versions.yaml index 866c687..419d129 100644 --- a/service-versions.yaml +++ b/service-versions.yaml @@ -440,9 +440,15 @@ services: - name: jellyfin type: ansible - last-reviewed: 2026-03-17 - current-version: "10.11.6" + last-reviewed: 2026-06-08 + current-version: "10.11.11" upstream-source: https://github.com/jellyfin/jellyfin/releases + notes: >- + Homebrew cask (state: present, unpinned). Upgrade with + `brew upgrade --cask jellyfin` on indri. After upgrade the .app is + re-quarantined; launchd-spawned launch hangs silently until the + Gatekeeper first-launch dialog is approved on indri's GUI console + (xattr removal over SSH is blocked by TCC). - name: automounter type: ansible From 6370d2bddbb32a78de11516e92e114a7a713dce3 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Mon, 8 Jun 2026 07:00:48 -0700 Subject: [PATCH 2/4] C0: doc-review tailscale-operator (dual indri/ringtail, host caveat) Add last-reviewed; document the operator now running on both indri's minikube and ringtail's k3s; correct the ArgoCD apps row; pin upstream v1.94.2; add the ProxyGroup Ingress 'host: *' requirement. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../+tailscale-operator-doc-review.doc.md | 1 + .../kubernetes/tailscale-operator.md | 23 +++++++++++++++---- 2 files changed, 20 insertions(+), 4 deletions(-) create mode 100644 docs/changelog.d/+tailscale-operator-doc-review.doc.md diff --git a/docs/changelog.d/+tailscale-operator-doc-review.doc.md b/docs/changelog.d/+tailscale-operator-doc-review.doc.md new file mode 100644 index 0000000..8f7d5a3 --- /dev/null +++ b/docs/changelog.d/+tailscale-operator-doc-review.doc.md @@ -0,0 +1 @@ +Reviewed the tailscale-operator reference card: documented the dual indri/ringtail deployment, corrected the ArgoCD apps list, pinned the upstream version, and added the ProxyGroup Ingress `host:` caveat. diff --git a/docs/reference/kubernetes/tailscale-operator.md b/docs/reference/kubernetes/tailscale-operator.md index c102e02..174b347 100644 --- a/docs/reference/kubernetes/tailscale-operator.md +++ b/docs/reference/kubernetes/tailscale-operator.md @@ -1,6 +1,7 @@ --- title: Tailscale Operator -modified: 2026-02-08 +modified: 2026-06-08 +last-reviewed: 2026-06-08 tags: - kubernetes - tailscale @@ -15,8 +16,16 @@ The Tailscale operator enables Kubernetes services to be exposed directly on the | Property | Value | |----------|-------| | **Namespace** | `tailscale` | -| **Upstream** | `mirrors/tailscale` on forge (static manifest) | -| **ArgoCD Apps** | `tailscale-operator-base` (upstream), `tailscale-operator` (config) | +| **Upstream** | `mirrors/tailscale` on forge (static manifest, pinned `v1.94.2`) | +| **ArgoCD Apps** | `tailscale-operator` (indri/minikube), `tailscale-operator-ringtail` (ringtail/k3s) | + +The operator runs on **both** clusters — indri's minikube and ringtail's k3s. +Both apps layer on the shared `tailscale-operator-base` kustomize directory +(operator manifest, `ProxyClass`, `dnsconfig`); each cluster supplies its own +`ProxyGroup` (indri: 2 replicas, ringtail: 1) and OAuth `ExternalSecret`. The +ringtail overlay additionally rewrites the proxy image to a locally nix-built +mirror. See [[ringtail]] and [[migrate-wave1-ringtail]] for the ongoing +migration of k8s workloads onto ringtail. ## How It Works @@ -27,7 +36,13 @@ Ingresses use a shared ProxyGroup (`ingress`) rather than per-service Tailscale 3. Service becomes accessible at `.tail8d86e.ts.net` 4. TLS is handled automatically via Tailscale -Tailnet clients must have `--accept-routes` enabled to route to VIP addresses. +Two requirements for VIP routing to work: + +1. Tailnet clients must have `--accept-routes` enabled to route to VIP addresses. +2. Ingress rules must **not** set an explicit `host:` field. The ProxyGroup + proxy receives the FQDN as the `Host` header (e.g. + `prometheus.tail8d86e.ts.net`), which won't match a short name. Use + `host: "*"` or omit `host:` entirely. Services can be individually tagged (e.g., `tag:flyio-target`) via Ingress annotations to control which ACL grants apply. See [[expose-service-publicly]] for the tagging workflow. From e592ecfca49ab4e0a0a97d975b60926192955a56 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Mon, 8 Jun 2026 07:17:21 -0700 Subject: [PATCH 3/4] C0: update ringtail flake inputs (nixpkgs, disko) Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/changelog.d/+ringtail-flake-update.infra.md | 1 + nixos/ringtail/flake.lock | 12 ++++++------ 2 files changed, 7 insertions(+), 6 deletions(-) create mode 100644 docs/changelog.d/+ringtail-flake-update.infra.md diff --git a/docs/changelog.d/+ringtail-flake-update.infra.md b/docs/changelog.d/+ringtail-flake-update.infra.md new file mode 100644 index 0000000..1d806df --- /dev/null +++ b/docs/changelog.d/+ringtail-flake-update.infra.md @@ -0,0 +1 @@ +Updated ringtail NixOS flake inputs (nixpkgs `nixos-25.11`, disko) to latest via `dagger call flake-update`. diff --git a/nixos/ringtail/flake.lock b/nixos/ringtail/flake.lock index bb60501..340bd9d 100644 --- a/nixos/ringtail/flake.lock +++ b/nixos/ringtail/flake.lock @@ -7,11 +7,11 @@ ] }, "locked": { - "lastModified": 1780290312, - "narHash": "sha256-eTAlX0CwgB84Ts3GaBd944A3DRXVMzgA0EqroZBISUo=", + "lastModified": 1780894562, + "narHash": "sha256-c3430xwxwhHipl3jigUGMMBfpaMylDqytW/kdmB3ZGs=", "owner": "nix-community", "repo": "disko", - "rev": "115e5211780054d8a890b41f0b7734cafad54dfe", + "rev": "24fed06cac83bcc44ac8efbb57cab1a82fa0bedc", "type": "github" }, "original": { @@ -43,11 +43,11 @@ }, "nixpkgs": { "locked": { - "lastModified": 1779796641, - "narHash": "sha256-ZsIrKmhp4vbBXoXXmR/tBXA/UCsAQiJL9vsgZEduhVY=", + "lastModified": 1780511130, + "narHash": "sha256-2v9lT4ya59Lh1FqPeLnz1MoX9y/wz2huqfe9RtQZITk=", "owner": "NixOS", "repo": "nixpkgs", - "rev": "25f538306313eae3927264466c70d7001dcea1df", + "rev": "535f3e6942cb1cead3929c604320d3db54b542b9", "type": "github" }, "original": { From 1c41cca90392f25951f9040a86a767e7a4866509 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Mon, 8 Jun 2026 09:30:09 -0700 Subject: [PATCH 4/4] Retire Prowler image + IaC scans (keep K8s CIS only) (#372) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## Why Weekly compliance review (2026-06-07) surfaced the toil problem head-on: | Report | Unmuted findings | Muted | Acted on | |--------|------------------|-------|----------| | **K8s CIS (In-Cluster)** | 0 | 65 | clean ✅ | | **Container Images** | 20,005 (+713 WoW) | 0 | never | | **IaC (manifests)** | 654 (+31/−30 WoW) | 0 | never | The image and IaC scans generate tens of thousands of un-actioned, un-muted findings every week: - **Image scan** — overwhelmingly unpatchable *upstream* base-image CVEs, and it re-scans every historical tag still in the registry (2× paperless, 3× mealie, 4× prowler tags in the latest report), multiplying the count. - **IaC scan** — systemic Trivy KSV pod-security warnings against our own manifests; real but homelab-acceptable, never muted, so re-surfaced indefinitely. The K8s CIS scan is the only one with realized value (fully mutelisted, 0 unmuted WoW) and is retained. Matches the broader scaling-back of the reporting system as minikube heads toward retirement. ## Changes - Delete `cronjob-image-scan.yaml` and `cronjob-iac-scan.yaml` + remove from kustomization - Drop the now-unused `mutelist/trivyignore.yaml` (only the IaC scan consumed it) - `review-compliance-reports`: drop the two retired scans (and the grouped-findings rendering that existed solely for them) - Docs: deploy-prowler (new 'Why only the K8s CIS scan' section), read-compliance-reports, security reference, prowler reference ## Deploy (after review) ```fish argocd app set prowler --revision retire-prowler-image-iac-scans argocd app sync prowler # prune removes the two CronJobs # after merge: argocd app set prowler --revision main && argocd app sync prowler ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/372 --- .../manifests/prowler/cronjob-iac-scan.yaml | 54 ------- .../manifests/prowler/cronjob-image-scan.yaml | 39 ----- argocd/manifests/prowler/kustomization.yaml | 3 - .../prowler/mutelist/trivyignore.yaml | 37 ----- .../retire-prowler-image-iac-scans.infra.md | 1 + docs/how-to/operations/deploy-prowler.md | 49 ++---- .../operations/read-compliance-reports.md | 11 +- docs/reference/operations/security.md | 8 +- docs/reference/services/prowler.md | 12 +- mise-tasks/review-compliance-reports | 147 ++++-------------- 10 files changed, 65 insertions(+), 296 deletions(-) delete mode 100644 argocd/manifests/prowler/cronjob-iac-scan.yaml delete mode 100644 argocd/manifests/prowler/cronjob-image-scan.yaml delete mode 100644 argocd/manifests/prowler/mutelist/trivyignore.yaml create mode 100644 docs/changelog.d/retire-prowler-image-iac-scans.infra.md diff --git a/argocd/manifests/prowler/cronjob-iac-scan.yaml b/argocd/manifests/prowler/cronjob-iac-scan.yaml deleted file mode 100644 index c1303a5..0000000 --- a/argocd/manifests/prowler/cronjob-iac-scan.yaml +++ /dev/null @@ -1,54 +0,0 @@ ---- -apiVersion: batch/v1 -kind: CronJob -metadata: - name: prowler-iac-scan - namespace: prowler -spec: - schedule: "0 2 * * 6" # Saturday 2am - concurrencyPolicy: Forbid - jobTemplate: - spec: - ttlSecondsAfterFinished: 604800 # Auto-delete after 7 days - template: - spec: - securityContext: - seccompProfile: - type: RuntimeDefault - containers: - - name: prowler - image: registry.ops.eblu.me/blumeops/prowler:kustomized - command: ["/bin/sh", "-c"] - # Prowler's --mutelist-file is a no-op for the IaC provider - # (it delegates to Trivy). The Prowler image's trivy shim - # injects --ignorefile $TRIVY_IGNOREFILE when set; see - # containers/prowler/Dockerfile. - env: - - name: TRIVY_IGNOREFILE - value: /mutelist/trivyignore.yaml - args: - - | - DATEDIR=/reports/prowler-iac/$(date +%Y-%m-%d) - mkdir -p "$DATEDIR" - prowler iac \ - --scan-repository-url https://forge.ops.eblu.me/eblume/blumeops.git \ - -z \ - --output-formats html csv json-ocsf \ - --output-directory "$DATEDIR" - volumeMounts: - - name: reports - mountPath: /reports - - name: mutelist - mountPath: /mutelist - readOnly: true - restartPolicy: OnFailure - volumes: - - name: reports - persistentVolumeClaim: - claimName: prowler-reports - - name: mutelist - configMap: - name: prowler-mutelist - items: - - key: trivyignore.yaml - path: trivyignore.yaml diff --git a/argocd/manifests/prowler/cronjob-image-scan.yaml b/argocd/manifests/prowler/cronjob-image-scan.yaml deleted file mode 100644 index b779d08..0000000 --- a/argocd/manifests/prowler/cronjob-image-scan.yaml +++ /dev/null @@ -1,39 +0,0 @@ ---- -apiVersion: batch/v1 -kind: CronJob -metadata: - name: prowler-image-scan - namespace: prowler -spec: - schedule: "0 3 * * 6" # Saturday 3am - concurrencyPolicy: Forbid - jobTemplate: - spec: - ttlSecondsAfterFinished: 604800 # Auto-delete after 7 days - template: - spec: - securityContext: - seccompProfile: - type: RuntimeDefault - containers: - - name: prowler - image: registry.ops.eblu.me/blumeops/prowler:kustomized - command: ["/bin/sh", "-c"] - args: - - | - DATEDIR=/reports/prowler-images/$(date +%Y-%m-%d) - mkdir -p "$DATEDIR" - prowler image \ - --registry https://registry.ops.eblu.me \ - --image-filter "^blumeops/" \ - -z \ - --output-formats html csv json-ocsf \ - --output-directory "$DATEDIR" - volumeMounts: - - name: reports - mountPath: /reports - restartPolicy: OnFailure - volumes: - - name: reports - persistentVolumeClaim: - claimName: prowler-reports diff --git a/argocd/manifests/prowler/kustomization.yaml b/argocd/manifests/prowler/kustomization.yaml index 1d92a6b..38295a3 100644 --- a/argocd/manifests/prowler/kustomization.yaml +++ b/argocd/manifests/prowler/kustomization.yaml @@ -10,8 +10,6 @@ resources: - pv-nfs.yaml - pvc.yaml - cronjob.yaml - - cronjob-image-scan.yaml - - cronjob-iac-scan.yaml configMapGenerator: - name: prowler-mutelist @@ -23,7 +21,6 @@ configMapGenerator: - mutelist/core-pod-security.yaml - mutelist/manual-node-checks.yaml - mutelist/rbac.yaml - - mutelist/trivyignore.yaml images: - name: registry.ops.eblu.me/blumeops/prowler diff --git a/argocd/manifests/prowler/mutelist/trivyignore.yaml b/argocd/manifests/prowler/mutelist/trivyignore.yaml deleted file mode 100644 index 87af966..0000000 --- a/argocd/manifests/prowler/mutelist/trivyignore.yaml +++ /dev/null @@ -1,37 +0,0 @@ -# Trivy ignorefile for Prowler IaC scan. -# -# Prowler's `--mutelist-file` flag is a no-op for the IaC provider -# (iac_provider.py sets self._mutelist = None and delegates to Trivy). -# Trivy in turn does not auto-discover this YAML form from cwd, so the -# Prowler image ships a shim wrapper around `trivy` that injects -# --ignorefile $TRIVY_IGNOREFILE when the env var is set. The cronjob -# mounts this file and sets TRIVY_IGNOREFILE accordingly. -# -# Schema: https://trivy.dev/latest/docs/configuration/filtering/ -# IDs use the hyphenated form Trivy displays (KSV-0041, not KSV0041). -misconfigurations: - - id: KSV-0041 - paths: - - "argocd/manifests/external-secrets/rbac.yaml" - statement: >- - external-secrets-operator's entire function is to read and - synthesize Secret objects; ClusterRole over secrets is its - purpose. Both the controller and cert-controller are - upstream-defined. - - id: KSV-0041 - paths: - - "argocd/manifests/kube-state-metrics/rbac.yaml" - - "argocd/manifests/kube-state-metrics-ringtail/rbac.yaml" - statement: >- - KSM exposes only Secret metadata (name, namespace, type, labels), - never the data field. list/watch on secrets is required for - kube_secret_info / kube_secret_labels metrics. - - id: KSV-0114 - paths: - - "argocd/manifests/external-secrets/rbac.yaml" - statement: >- - cert-controller manages the external-secrets validating webhook - configurations to inject its own rotating CA bundle. RBAC is - scoped to two named webhooks (secretstore-validate, - externalsecret-validate) via resourceNames; KSV-0114 doesn't see - the resourceNames restriction so reports the full ClusterRole. diff --git a/docs/changelog.d/retire-prowler-image-iac-scans.infra.md b/docs/changelog.d/retire-prowler-image-iac-scans.infra.md new file mode 100644 index 0000000..9afd261 --- /dev/null +++ b/docs/changelog.d/retire-prowler-image-iac-scans.infra.md @@ -0,0 +1 @@ +Retired the Prowler container-image CVE scan and IaC scan, keeping only the K8s CIS benchmark scan. The two retired scans generated tens of thousands of un-actioned, un-muted findings every week (~20,000 image findings and growing, mostly unpatchable upstream-image CVEs; ~650 systemic Trivy KSV pod-security warnings) — the weekly `mise run review-compliance-reports` re-surfaced them all as "action needed" though none were ever triaged. The K8s CIS scan is fully mutelisted and runs clean, so it stays. Removed the two CronJobs, the now-unused `trivyignore.yaml` mutelist, and the grouped-findings rendering in the review tool that existed solely for the high-volume scans. diff --git a/docs/how-to/operations/deploy-prowler.md b/docs/how-to/operations/deploy-prowler.md index 75dced2..1475680 100644 --- a/docs/how-to/operations/deploy-prowler.md +++ b/docs/how-to/operations/deploy-prowler.md @@ -1,6 +1,6 @@ --- title: Deploy Prowler CIS Scanner -modified: 2026-03-24 +modified: 2026-06-08 last-reviewed: 2026-03-24 tags: - how-to @@ -11,7 +11,20 @@ tags: # Deploy Prowler CIS Scanner -Prowler runs weekly CIS Kubernetes Benchmark scans against minikube-indri and writes HTML/CSV/JSON reports to the NFS share on sifaka. +Prowler runs a weekly CIS Kubernetes Benchmark scan against minikube-indri and writes HTML/CSV/JSON reports to the NFS share on sifaka. + +## Why only the K8s CIS scan + +Prowler originally ran three CronJobs: K8s CIS, container-image CVE scanning, and IaC scanning. The image and IaC scans were **retired in 2026-06**. + +Both were pure toil with no realized value: + +- **Image scan** produced ~20,000 unmuted findings per run and growing, none ever triaged or muted. They were overwhelmingly CVEs in *upstream* base images we don't control and can't patch, and the job re-scanned every historical tag still in the registry, multiplying the count. +- **IaC scan** produced ~650 Trivy KSV findings (`runAsNonRoot`, `readOnlyRootFilesystem`, drop-capabilities, …) against our own manifests — real but systemic, homelab-acceptable, and likewise never muted, so the weekly review re-surfaced all of them indefinitely. + +The K8s CIS scan, by contrast, is fully mutelisted and runs clean (0 unmuted findings week over week), so it stays. The guiding principle matches [[ai-scraper-mitigation]]: don't keep generating a firehose of output that has no audience. If image-CVE signal is wanted later, the right shape is critical-severity-only, currently-deployed-tags-only, alert-on-new — a rebuild, not a revival (tracked as the "Trivy for image/IaC scanning" task). + +Note that the K8s CIS scan itself is tied to minikube-indri, which is slated for retirement; on k3s only ~22 of 70 checks produce results (no static pods). Re-pointing a lean posture check at ringtail is tracked separately ("prowler scan against ringtail"). ## What it checks @@ -33,38 +46,6 @@ Prowler's Kubernetes provider runs ~70 checks from the CIS Kubernetes Benchmark **k3s note:** k3s embeds the control plane in a single binary — no static pods exist. Only core + RBAC checks (~22 of 70) produce results. Consider `kube-bench` for k3s control plane checks. -### Image vulnerability scanning (Saturday 3am) - -Prowler's image provider scans all `blumeops/*` container images in `registry.ops.eblu.me` for: - -- **CVEs** — known vulnerabilities from NVD, Alpine SecDB, Debian Security Tracker, and other sources -- **Embedded secrets** — credentials or API keys baked into image layers -- **Misconfigurations** — Dockerfile best practices (running as root, missing HEALTHCHECK, etc.) - -Uses Trivy under the hood. Reports are written to `sifaka:/volume1/reports/prowler-images/`. - -To run an ad-hoc image scan: - -```fish -kubectl create job --from=cronjob/prowler-image-scan prowler-image-manual -n prowler --context=minikube-indri -``` - -### IaC scanning (Saturday 2am) - -Prowler's IaC provider scans the blumeops repository (cloned at scan time) for misconfigurations in: - -- **Dockerfiles** — running as root, using `latest` tags, missing `HEALTHCHECK` -- **Kubernetes manifests** — missing resource limits, privileged containers, insecure settings -- **Other IaC files** — Terraform, CloudFormation, etc. if present - -Uses Trivy under the hood. Reports are written to `sifaka:/volume1/reports/prowler-iac/`. - -To run an ad-hoc IaC scan: - -```fish -kubectl create job --from=cronjob/prowler-iac-scan prowler-iac-manual -n prowler --context=minikube-indri -``` - ## Reports Reports are written to `sifaka:/volume1/reports/prowler/` with timestamped filenames. See [[read-compliance-reports]] for how to access and interpret them. diff --git a/docs/how-to/operations/read-compliance-reports.md b/docs/how-to/operations/read-compliance-reports.md index e676ad5..2990026 100644 --- a/docs/how-to/operations/read-compliance-reports.md +++ b/docs/how-to/operations/read-compliance-reports.md @@ -1,6 +1,6 @@ --- title: Read Compliance Reports -modified: 2026-04-06 +modified: 2026-06-08 last-reviewed: 2026-04-06 tags: - how-to @@ -27,8 +27,13 @@ Reports are stored on sifaka at `/volume1/reports/`. Each scanner writes to its | Scanner | Path | Schedule | |---------|------|----------| | [[prowler]] K8s CIS | `sifaka:/volume1/reports/prowler/` | Weekly (Sunday 3am) | -| [[prowler]] Image | `sifaka:/volume1/reports/prowler-images/` | Weekly (Saturday 3am) | -| [[prowler]] IaC | `sifaka:/volume1/reports/prowler-iac/` | Weekly (Saturday 2am) | + +> **Retired (2026-06):** the Prowler **image** (`prowler-images/`) and **IaC** +> (`prowler-iac/`) scans were retired. They produced tens of thousands of +> un-actioned, un-muted findings every week — mostly unpatchable upstream-image +> CVEs and systemic pod-security KSV warnings — and nobody triaged them. See +> [[deploy-prowler#Why only the K8s CIS scan]] for the rationale. Their stale +> report directories may linger on sifaka until manually removed. Copy reports to your local machine (remember `scp -O` for sifaka): diff --git a/docs/reference/operations/security.md b/docs/reference/operations/security.md index 11c4df9..86b3d3b 100644 --- a/docs/reference/operations/security.md +++ b/docs/reference/operations/security.md @@ -1,6 +1,6 @@ --- title: Security & Compliance -modified: 2026-03-24 +modified: 2026-06-08 last-reviewed: 2026-03-24 tags: - operations @@ -21,7 +21,7 @@ Security posture and compliance scanning for BlumeOps infrastructure. ## Scanning tools -- [[prowler]] — CIS Kubernetes Benchmark scanner (weekly CronJob) +- [[prowler]] — CIS Kubernetes Benchmark scanner (weekly CronJob). The container-image CVE scan and IaC scan were retired in 2026-06 (un-actioned noise — see [[deploy-prowler#Why only the K8s CIS scan]]); only the K8s CIS scan remains. - [[deploy-prowler]] — deployment and ad-hoc scan how-to - [[read-compliance-reports]] — accessing and interpreting reports - [[kingfisher]] — Secret detection and live validation for Forgejo repos (weekly CronJob + prek hook) @@ -52,5 +52,5 @@ Suppressed findings are kept in Prowler mutelist YAML under `argocd/manifests/pr - No SOC 2 compliance mapping for Kubernetes (Prowler only maps SOC 2 for AWS/Azure/GCP) - k3s control plane checks produce no results (embedded binary, no static pods) — consider kube-bench -- Container image scanning covers `blumeops/*` images only — upstream images (ollama, immich, etc.) are not scanned -- IaC scanning covers the blumeops repo only — no scanning of third-party Helm charts or vendored manifests +- No container-image CVE scanning (the Prowler image scan was retired 2026-06 as un-actioned noise). If reintroduced, scope it to critical-severity, currently-deployed tags, alert-on-new +- No automated IaC misconfiguration scanning (the Prowler IaC scan was retired 2026-06). Manifest pod-security hardening is now an accept-and-document decision rather than a weekly report diff --git a/docs/reference/services/prowler.md b/docs/reference/services/prowler.md index f45955f..9f7e4b3 100644 --- a/docs/reference/services/prowler.md +++ b/docs/reference/services/prowler.md @@ -1,6 +1,6 @@ --- title: Prowler -modified: 2026-03-24 +modified: 2026-06-08 last-reviewed: 2026-03-24 tags: - service @@ -17,20 +17,20 @@ CIS Kubernetes Benchmark scanner for compliance posture reporting. |----------|-------| | **Namespace** | `prowler` | | **Image** | `registry.ops.eblu.me/blumeops/prowler` (see `argocd/manifests/prowler/kustomization.yaml` for current tag) | -| **Schedule** | K8s CIS: Sunday 3am / Image: Saturday 3am / IaC: Saturday 2am | -| **Reports** | `sifaka:/volume1/reports/prowler/`, `prowler-images/`, `prowler-iac/` (NFS) | +| **Schedule** | K8s CIS: Sunday 3am | +| **Reports** | `sifaka:/volume1/reports/prowler/` (NFS) | | **Manifests** | `argocd/manifests/prowler/` | ## What it does -Runs Prowler 5 as two CronJobs: +Runs Prowler 5 as a single CronJob: - **K8s CIS scan** (Sunday) — CIS Kubernetes Benchmark v1.11 checks across pod security, RBAC, apiserver, etcd, kubelet, controller-manager, and scheduler -- **Image scan** (Saturday) — CVE, secret, and misconfiguration scanning of all `blumeops/*` container images in the registry via Trivy -- **IaC scan** (Saturday) — static analysis of Dockerfiles, K8s manifests, and other IaC files in the repo via Trivy Reports are written in HTML, CSV, and JSON-OCSF to the NFS share on sifaka. +The **image** and **IaC** scans (formerly Saturday CronJobs) were retired in 2026-06 — they generated tens of thousands of un-actioned findings weekly. See [[deploy-prowler#Why only the K8s CIS scan]]. + ## See also - [[security]] — security & compliance posture overview diff --git a/mise-tasks/review-compliance-reports b/mise-tasks/review-compliance-reports index 24d2afc..f2a0a54 100755 --- a/mise-tasks/review-compliance-reports +++ b/mise-tasks/review-compliance-reports @@ -10,19 +10,19 @@ Covers: - Prowler K8s CIS (in-cluster): per-finding detail - - Prowler container image scans: grouped by check + resource - - Prowler IaC manifest scans: grouped by check + resource - Kingfisher secret scanning: TODO — pending upstream JSON/CSV output support (currently HTML-only; contribute from spork) -For each Prowler scan, copies the two most recent CSV reports, parses +The Prowler container-image CVE scan and IaC scan were retired in 2026-06 +(see docs/how-to/operations/deploy-prowler.md) — they produced tens of +thousands of un-actioned findings weekly. Only the K8s CIS scan remains. + +For the Prowler scan, copies the two most recent CSV reports, parses them, and displays: 1. Overall status (pass/fail/manual/muted counts) 2. Unmuted failures by severity 3. Delta from the previous report (new vs resolved) - 4. Actionable unmuted failures (per-finding for in-cluster; grouped - by check ID and resource for image/IaC because they have far too - many findings to list individually) + 4. Actionable unmuted failures (per-finding detail) This is the primary tool for the weekly compliance report review. """ @@ -39,11 +39,9 @@ from rich.console import Console from rich.panel import Panel from rich.table import Table -PROWLER_SCANS: list[tuple[str, str, bool]] = [ - # (label, sifaka base path, group_findings) - ("K8s CIS (In-Cluster)", "/volume1/reports/prowler", False), - ("Container Images", "/volume1/reports/prowler-images", True), - ("IaC (manifests)", "/volume1/reports/prowler-iac", True), +PROWLER_SCANS: list[tuple[str, str]] = [ + # (label, sifaka base path) + ("K8s CIS (In-Cluster)", "/volume1/reports/prowler"), ] console = Console() @@ -334,14 +332,8 @@ def summarize_report( tmpdir: str, *, show_muted: bool = False, - group_findings: bool = False, ) -> None: - """Fetch and summarize the latest Prowler report under `base`. - - When `group_findings` is True, top-N CHECK_ID and RESOURCE_NAME tables - are shown instead of a per-finding detail table — appropriate for - image and IaC scans that produce thousands of findings. - """ + """Fetch and summarize the latest Prowler report under `base`.""" console.rule(f"[bold]{label}[/bold]") csvs = list_reports(base) if not csvs: @@ -458,36 +450,29 @@ def summarize_report( ) console.print() - # For grouped scans the new/resolved listings are too noisy - # (potentially thousands of lines). Skip the listings; the count - # is in the panel above and detail is in the grouped tables. - if not group_findings: - if new_keys: - console.print("[bold red]New Unmuted Failures:[/bold red]") - for k in sorted(new_keys): - r = curr_keys[k] - console.print( - f" [{r['SEVERITY']}] {r['CHECK_ID']}: " - f"{r['STATUS_EXTENDED'][:120]}" - ) - console.print() + if new_keys: + console.print("[bold red]New Unmuted Failures:[/bold red]") + for k in sorted(new_keys): + r = curr_keys[k] + console.print( + f" [{r['SEVERITY']}] {r['CHECK_ID']}: " + f"{r['STATUS_EXTENDED'][:120]}" + ) + console.print() - if resolved_keys: - console.print("[bold green]Resolved:[/bold green]") - for k in sorted(resolved_keys): - r = prev_keys[k] - console.print( - f" [dim][{r['SEVERITY']}] {r['CHECK_ID']}: " - f"{r['STATUS_EXTENDED'][:120]}[/dim]" - ) - console.print() + if resolved_keys: + console.print("[bold green]Resolved:[/bold green]") + for k in sorted(resolved_keys): + r = prev_keys[k] + console.print( + f" [dim][{r['SEVERITY']}] {r['CHECK_ID']}: " + f"{r['STATUS_EXTENDED'][:120]}[/dim]" + ) + console.print() - # --- Unmuted failure details (grouped or per-finding) --- + # --- Unmuted failure details --- if latest["unmuted"]: - if group_findings: - _print_grouped_findings(latest["unmuted"]) - else: - _print_findings_detail(latest["unmuted"]) + _print_findings_detail(latest["unmuted"]) # --- Muted findings summary --- if show_muted and latest["muted"]: @@ -566,75 +551,6 @@ def _print_findings_detail(unmuted: list[dict]) -> None: console.print() -def _worst_severity(rows: list[dict]) -> str: - """Return the most severe severity label across `rows`.""" - if not rows: - return "" - return min( - (r["SEVERITY"] for r in rows), - key=lambda s: severity_sort({"SEVERITY": s}), - ) - - -def _print_grouped_findings(unmuted: list[dict], top_n: int = 15) -> None: - """Top-N tables grouped by CHECK_ID and RESOURCE_NAME. - - Used for image and IaC scans where per-finding tables would be too - large to be useful. Shows count and worst severity for each group. - """ - by_check: dict[str, list[dict]] = {} - by_resource: dict[str, list[dict]] = {} - for r in unmuted: - by_check.setdefault(r["CHECK_ID"], []).append(r) - by_resource.setdefault(r.get("RESOURCE_NAME", "") or "(no resource)", []).append(r) - - check_table = Table( - show_header=True, - header_style="bold", - title=f"Top {top_n} Checks by Unmuted Finding Count", - ) - check_table.add_column("Worst Sev") - check_table.add_column("Check ID") - check_table.add_column("Count", justify="right") - - for check, rows in sorted( - by_check.items(), key=lambda kv: -len(kv[1]) - )[:top_n]: - worst = _worst_severity(rows) - style = _sev_style(worst) - check_table.add_row( - f"[{style}]{worst}[/{style}]" if style else worst, - check, - str(len(rows)), - ) - - console.print(check_table) - console.print() - - res_table = Table( - show_header=True, - header_style="bold", - title=f"Top {top_n} Resources by Unmuted Finding Count", - ) - res_table.add_column("Worst Sev") - res_table.add_column("Resource") - res_table.add_column("Count", justify="right") - - for resource, rows in sorted( - by_resource.items(), key=lambda kv: -len(kv[1]) - )[:top_n]: - worst = _worst_severity(rows) - style = _sev_style(worst) - res_table.add_row( - f"[{style}]{worst}[/{style}]" if style else worst, - resource[:80], - str(len(rows)), - ) - - console.print(res_table) - console.print() - - def main( full: Annotated[ bool, typer.Option(help="(reserved) currently a no-op; all unmuted failures already shown") @@ -646,13 +562,12 @@ def main( del full # historical flag, kept for backwards compatibility with tempfile.TemporaryDirectory() as tmpdir: - for label, base, group in PROWLER_SCANS: + for label, base in PROWLER_SCANS: summarize_report( label, base, tmpdir, show_muted=show_muted, - group_findings=group, ) # --- Node-level MANUAL check verification ---