Compare commits

...
Sign in to create a new pull request.

15 commits

Author SHA1 Message Date
1c41cca903 Retire Prowler image + IaC scans (keep K8s CIS only) (#372)
## Why

Weekly compliance review (2026-06-07) surfaced the toil problem head-on:

| Report | Unmuted findings | Muted | Acted on |
|--------|------------------|-------|----------|
| **K8s CIS (In-Cluster)** | 0 | 65 | clean  |
| **Container Images** | 20,005 (+713 WoW) | 0 | never |
| **IaC (manifests)** | 654 (+31/−30 WoW) | 0 | never |

The image and IaC scans generate tens of thousands of un-actioned, un-muted findings every week:

- **Image scan** — overwhelmingly unpatchable *upstream* base-image CVEs, and it re-scans every historical tag still in the registry (2× paperless, 3× mealie, 4× prowler tags in the latest report), multiplying the count.
- **IaC scan** — systemic Trivy KSV pod-security warnings against our own manifests; real but homelab-acceptable, never muted, so re-surfaced indefinitely.

The K8s CIS scan is the only one with realized value (fully mutelisted, 0 unmuted WoW) and is retained. Matches the broader scaling-back of the reporting system as minikube heads toward retirement.

## Changes

- Delete `cronjob-image-scan.yaml` and `cronjob-iac-scan.yaml` + remove from kustomization
- Drop the now-unused `mutelist/trivyignore.yaml` (only the IaC scan consumed it)
- `review-compliance-reports`: drop the two retired scans (and the grouped-findings rendering that existed solely for them)
- Docs: deploy-prowler (new 'Why only the K8s CIS scan' section), read-compliance-reports, security reference, prowler reference

## Deploy (after review)

```fish
argocd app set prowler --revision retire-prowler-image-iac-scans
argocd app sync prowler   # prune removes the two CronJobs
# after merge: argocd app set prowler --revision main && argocd app sync prowler
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #372
2026-06-08 09:30:09 -07:00
e592ecfca4 C0: update ringtail flake inputs (nixpkgs, disko)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 07:17:21 -07:00
6370d2bddb C0: doc-review tailscale-operator (dual indri/ringtail, host caveat)
Add last-reviewed; document the operator now running on both indri's
minikube and ringtail's k3s; correct the ArgoCD apps row; pin upstream
v1.94.2; add the ProxyGroup Ingress 'host: *' requirement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 07:00:48 -07:00
8072cd21d7 C0: review jellyfin, upgrade indri to 10.11.11 (security fixes)
Jellyfin was 5 patch releases behind (10.11.6 -> 10.11.11). 10.11.7 and
10.11.10 contain disclosed CVE/GHSA security fixes. Upgraded via
brew upgrade --cask jellyfin on indri; service verified healthy and
externally reachable (HTTPS 200).

Documented the recurring Gatekeeper gotcha: cask upgrades re-quarantine
the .app and the launchd service hangs silently until the first-launch
dialog is approved on indri's GUI console (xattr removal over SSH is
blocked by macOS TCC).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 06:35:23 -07:00
bc34b601be Merge pull request 'heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)' (#371) from heph-offline-access into main 2026-06-06 18:29:47 -07:00
50a36ff93a heph Authentik: grant offline_access scope (fixes spoke sync refresh-token 400)
The heph CLI requests scope "openid offline_access", but the Authentik
heph OAuth2 provider only mapped openid/email/profile. Without the
offline_access mapping the issued refresh token is bound to the login
session rather than the 30-day refresh-token window; once the session
lapses, hephd's refresh_token grant returns 400 Bad Request and spoke
sync silently degrades (heph sync --status -> auth_failure: true).

Add the built-in offline_access scope mapping to the provider's
property_mappings and document the requirement in the service reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:07:13 -07:00
cf63fcb5b5 C0: track heph in service-versions (self-updating; note drift task)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 08:22:46 -07:00
3abe80523a C0: bump indri heph hub to v1.2.1 (PWA Authentik login + /config)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:40:51 -07:00
6576880b0e heph Authentik: register heph-pwa redirect URIs (PKCE login) (#370)
Adds the heph-pwa redirect URIs to the Authentik `heph` OAuth2 provider so the new browser **Login with Authentik** flow (Authorization Code + PKCE, hephaestus PR #9) can redirect back and exchange the code:

- `https://heph.ops.eblu.me/` (the PWA origin)
- `http://localhost:8787/` (local dev: `hephd --web-root`)

Authentik also keys token-endpoint CORS off these origins, so they're required for the browser token exchange. Additive (the provider was `redirect_uris: []`); harmless until the PWA feature deploys.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #370
2026-06-05 07:30:31 -07:00
a2f1e06224 Add hephaestus sync hub to indri (launchagent, PWA, device-code OIDC) (#369)
Makes indri the canonical **heph** hub for the hub-and-spoke task/context system, deployed as a self-updating LaunchAgent managed by Ansible. Other devices (gilbert) attach as offline-capable spokes.

## What's here
- **`ansible/roles/heph`** (tag `heph`) — bootstrap `cargo install hephd` (only if absent; `--self-update` keeps it current after), version-pinned `heph-pwa` checkout served via `--web-root`, launchagent `mcquack.eblume.heph`:
  ```
  hephd --mode server --http-addr 0.0.0.0:8787 --db … --web-root …
        --oidc-issuer …/o/heph/ --oidc-audience heph
        --self-update --self-update-interval-secs 600
  ```
  `~/.cargo/bin` is on the agent `PATH` so self-update's `cargo install` works.
- **Caddy** — `heph.ops.eblu.me → localhost:8787` (TLS for the PWA secure context).
- **Authentik** — new `heph` **public device-code** OIDC app + `default-device-code-flow` bound to the default brand's `flow_device_code` (verified live: brand `authentik-default`, field currently unset → additive).
- **Docs** — `services/hephaestus.md` (Path-A seeding runbook + spoke caveat), `indri.md`, changelog fragment.

## Three features requested
- **Autoupdate** — 10-min interval (`--self-update-interval-secs 600`).
- **PWA** — `--web-root` (confirmed shipped in v1.2.0).
- **Spoke** — gilbert reconfig documented (post-merge step).

## Deploy plan (not done yet — awaiting review)
1. Seed from gilbert (Path A): `heph daemon stop` → copy `heph.db` → `DELETE FROM meta WHERE key='origin'`.
2. Sync Authentik `apps`/blueprint; verify blueprint status via API (not just logs).
3. `provision-indri --tags heph,caddy` from this branch.
4. Point gilbert at the hub + `heph auth login`.

## Known follow-ups (heph-side, tracked in the Hephaestus project)
- `heph daemon` can't bake hub/spoke config or pass `--self-update-interval-secs` → worked around by the ansible plist.
- Path-A seeding lacks a clean `hephd --owner-id`/seed command → manual `meta.origin` reset for now.
- Self-update moves hephd ahead of the ansible-pinned PWA shell over time (drift; tolerated by the SW cache, revisit on next release).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #369
2026-06-05 06:46:58 -07:00
f6c926f1f5 C0: rebuild external-secrets off main, repoint both clusters to stable tags
indri -> v2.2.0-13895bb (arm64), ringtail -> v2.2.0-13895bb-nix (amd64).
Both deployed images now trace to main commit 13895bb instead of earlier
branch builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 16:19:20 -07:00
13895bb04a Localize external-secrets on ringtail (amd64 nix build) (#368)
Follow-up to #367. That PR localized external-secrets but the Dagger build (on indri's Apple Silicon runner) only produces an **arm64** image — and external-secrets also runs on **ringtail (amd64)** via the same shared manifest. This completes the localization so both clusters run the local binary on their native arch.

## Approach (matches the kube-state-metrics dual-build pattern)
- **`containers/external-secrets/default.nix`** (new) — builds the **amd64** image on ringtail's nix-container-builder. `buildGoModule` with Go 1.26 (v2.2.0 requires ≥1.26.1; nixpkgs default is 1.25.x) and `-tags all_providers`, faithful to upstream. Same v2.2.0 source from the forge mirror.
- **`argocd/manifests/external-secrets-ringtail/`** (new) — thin kustomize overlay that reuses the shared indri manifest as a base and overrides **only** the image to the `-nix` (amd64) tag. No manifest duplication.
- **`argocd/apps/external-secrets-ringtail.yaml`** — repointed at the new overlay.

Result: indri → `v2.2.0-…` (arm64, Dagger), ringtail → `v2.2.0-…-nix` (amd64, nix).

## Build
Run #581 built both arches at the branch commit. Verified the nix image is `linux/amd64`, entrypoint = the binary, user 65534.

## Deployed from branch & verified on ringtail (k3s, amd64)
- All 3 pods rolled to the nix amd64 image, `1/1 Running` (no exec-format error → arch correct)
- Controller logs clean
- **Live secret fetch proven:** force-synced `homepage/homepage-grafana` → `refreshTime` advanced, `Ready=True`
- **All 20** ringtail ExternalSecrets remain `SecretSynced=True`

## Post-merge
The `external-secrets-ringtail` app is temporarily pointed at this branch + overlay path (apps app left on `main`, manual-sync, untouched). After merge:
```
argocd app sync apps                       # picks up the new Application path on main
argocd app set external-secrets-ringtail --revision main && argocd app sync external-secrets-ringtail
```
I'll also rebuild off `main` so both clusters land on stable main-sha tags (as done for indri in #367).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #368
2026-06-04 15:37:42 -07:00
30c82079b9 C0: rebuild external-secrets image off main (v2.2.0-0e70a1b)
Repoint to the main-branch-built image so the deployed tag traces to a main
commit rather than the merged feature branch. Same v2.2.0 source, stable
provenance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:59:17 -07:00
0e70a1b524 Localize external-secrets container (native container.py build) (#367)
Knocks out the weekly "pick one non-local container and make it local" task by moving **external-secrets** off `ghcr.io` onto a locally-built image, under our own supply-chain control. Doubles as its overdue service review.

## What changed
- **`containers/external-secrets/container.py`** (new) — native Dagger build (the Dockerfile→container.py migration pattern). Clones the forge mirror at `v2.2.0` and builds the single `all_providers` static Go binary, faithful to upstream's `make build` (CGO off, no version ldflags upstream). ENTRYPOINT is `/bin/external-secrets` so the controller/webhook/cert-controller Deployments select their role via `args:` exactly as before.
- **`argocd/manifests/external-secrets/kustomization.yaml`** — image swapped to `registry.ops.eblu.me/blumeops/external-secrets:v2.2.0-2985007`. **Like-for-like (v2.2.0)**, not an upgrade.
- **`service-versions.yaml`** — marked reviewed (2026-06-04), noted the local build.

## Build
Built on the indri forge runner (run #579, ~4 min) → pushed to Zot. Image config verified: `Entrypoint=/bin/external-secrets`, `User=65534`, version label `v2.2.0`.

## Deployed from branch & verified
- All 3 pods (controller / webhook / cert-controller) rolled to the local image, `1/1 Running`
- Controller + webhook logs clean (no errors; webhook serving TLS)
- **End-to-end secret fetch proven:** force-synced `monitoring/grafana-admin` → `refreshTime` advanced to now, `Ready=True`
- All 10 ExternalSecrets cluster-wide remain `SecretSynced=True` — no collateral damage
- App `Healthy`

## Post-merge
`external-secrets` currently points at this branch (so `apps` reads OutOfSync — expected). After merge:
```
argocd app set external-secrets --revision main && argocd app sync external-secrets
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #367
2026-06-04 14:55:55 -07:00
bb55fa9566 Recurring review sweep: 4 doc cards + nvidia-device-plugin v0.19.2 (#366)
Knocks out the two daily recurring review tasks (doc review + service review) in one PR.

## Doc review (4 never-reviewed reference cards, `last-reviewed: 2026-06-04`)
- **cluster.md** — Kubernetes version v1.34.0 → **v1.35.0**; refreshed the stale ringtail workload list and noted the in-progress minikube→k3s migration (points to `[[ringtail]]` as the canonical list).
- **ntfy.md / tempo.md / alloy.md** — corrected image references: these are now **locally-built `registry.ops.eblu.me/blumeops/*` nix containers** (ntfy v2.19.2, tempo v2.10.3, alloy-k8s v1.16.0), not upstream Docker Hub. Fly.io alloy binary bumped to v1.16.1.

## Service review
- **nvidia-device-plugin** (ringtail GPU): v0.19.0 → **v0.19.2**. Upstream patch releases — CDI/Tegra fixes + dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup (the service-account change in the notes is helm-only).

## Not in this PR (need container rebuilds, deferred)
The other stale services are locally-built nix images, so upgrading them is a forge-runner rebuild rather than a clean tag bump — left untouched (not date-bumped, so they resurface): **prometheus** (v3.10.0→v3.12.0), **loki** (3.6.7→3.7.2), **kube-state-metrics**, **homepage**. Happy to do these as a follow-up rebuild PR.

## Deploy / verify
Not yet deployed — `nvidia-device-plugin` still points at `main`. After review:
```
argocd app set nvidia-device-plugin --revision reviews-jun4 && argocd app sync nvidia-device-plugin
# after merge:
argocd app set nvidia-device-plugin --revision main && argocd app sync nvidia-device-plugin
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #366
2026-06-04 13:37:02 -07:00
46 changed files with 724 additions and 329 deletions

View file

@ -260,5 +260,7 @@
tags: cv tags: cv
- role: docs - role: docs
tags: docs tags: docs
- role: heph
tags: heph
- role: caddy - role: caddy
tags: caddy tags: caddy

View file

@ -52,6 +52,9 @@ caddy_services:
- name: devpi - name: devpi
host: "pypi.{{ caddy_domain }}" host: "pypi.{{ caddy_domain }}"
backend: "http://localhost:3141" backend: "http://localhost:3141"
- name: heph
host: "heph.{{ caddy_domain }}"
backend: "http://localhost:8787" # hephaestus hub (server mode) + PWA shell
- name: kiwix - name: kiwix
host: "kiwix.{{ caddy_domain }}" host: "kiwix.{{ caddy_domain }}"
backend: "https://kiwix.tail8d86e.ts.net" backend: "https://kiwix.tail8d86e.ts.net"

View file

@ -0,0 +1,49 @@
---
# hephaestus hub — the canonical heph replica (server mode) on indri.
# Other devices (e.g. gilbert) are spokes that sync against this hub.
# See [[set-up-sync-hub]] and [[host-heph-pwa]] in the hephaestus repo.
# Pinned release used for the initial `cargo install` and the PWA shell.
# After bootstrap, hephd's own --self-update keeps the binary current; this
# pin only governs the first install and the bundled PWA shell version.
heph_version: v1.2.1
# Anonymous public HTTPS clone — matches hephd's INSTALL_GIT_URL so the initial
# install and unattended self-update build from the same source (no ssh-agent).
heph_repo_url: https://forge.eblu.me/eblume/hephaestus.git
heph_bin_dir: /Users/erichblume/.cargo/bin
heph_binary: "{{ heph_bin_dir }}/hephd"
# rustc/cargo here are rustup shims. The bare (non-mise) environment that the
# launchagent and ansible run in falls back to rustup's *default* toolchain,
# which can lag behind heph's rust-version floor (Cargo.toml: 1.89). Pin the
# channel explicitly so both the bootstrap build and unattended self-update
# always use a current toolchain regardless of the host's rustup default.
heph_rust_toolchain: stable
heph_data_dir: /Users/erichblume/.local/share/heph
heph_db: "{{ heph_data_dir }}/heph.db"
heph_socket: "{{ heph_data_dir }}/hephd.sock"
heph_log_dir: /Users/erichblume/Library/Logs
# Version-pinned source checkout; the PWA static shell is served directly from
# its heph-pwa/ subdir (no copy), keeping shell and hub in lockstep at heph_version.
heph_pwa_src_dir: /Users/erichblume/.cache/heph-pwa-src
heph_web_root: "{{ heph_pwa_src_dir }}/heph-pwa"
# Hub listens on all interfaces so tailnet spokes can reach it directly
# (http://indri.tail8d86e.ts.net:8787) and Caddy can proxy heph.ops.eblu.me.
# Access is gated by Authentik OIDC regardless — tailnet reachability is not
# enough (this is the owner's most sensitive data).
heph_http_addr: 0.0.0.0:8787
heph_port: 8787
heph_external_url: https://heph.ops.eblu.me
# Authentik OIDC — issuer + audience together turn hub auth on. The audience is
# the device-code client id (see argocd/manifests/authentik heph blueprint).
heph_oidc_issuer: https://authentik.ops.eblu.me/application/o/heph/
heph_oidc_audience: heph
# Self-update poll interval (seconds). 10 minutes.
heph_self_update_interval_secs: 600

View file

@ -0,0 +1,6 @@
---
- name: Restart heph
ansible.builtin.shell: |
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.heph.plist 2>/dev/null || true
launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
changed_when: true

View file

@ -0,0 +1,82 @@
---
# hephaestus hub (server mode) on indri.
#
# DATA SEEDING (one-time, Path A — do this BEFORE the first provision so the hub
# adopts gilbert's existing data instead of being born empty):
#
# 1. On the seed device (gilbert): heph daemon stop
# 2. Copy its store to indri: scp ~/.local/share/heph/heph.db \
# indri:~/.local/share/heph/heph.db
# 3. On indri, give the hub its OWN device origin (keeps gilbert's owner_id +
# data; hephd regenerates a fresh origin on next start when it is missing):
# sqlite3 ~/.local/share/heph/heph.db "DELETE FROM meta WHERE key='origin';"
# 4. Run this role (installs hephd, stages the PWA, loads the launchagent).
#
# hephd auto-creates an empty store on first start if none exists, so seeding is
# optional — skip it only if you intend a fresh, empty hub.
- name: Ensure heph data directory exists
ansible.builtin.file:
path: "{{ heph_data_dir }}"
state: directory
mode: '0700'
- name: Check for installed hephd binary
ansible.builtin.stat:
path: "{{ heph_binary }}"
register: heph_binary_stat
# Bootstrap install only when hephd is absent. Thereafter hephd's own
# --self-update keeps it current; ansible must not fight (or downgrade) it.
# This builds from source and can take several minutes on a cold cargo cache.
- name: Bootstrap-install heph + hephd from the forge ({{ heph_version }})
ansible.builtin.command:
cmd: >-
{{ heph_bin_dir }}/cargo install --locked
--git {{ heph_repo_url }}
--tag {{ heph_version }}
heph hephd
environment:
PATH: "{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin"
RUSTUP_TOOLCHAIN: "{{ heph_rust_toolchain }}"
when: not heph_binary_stat.stat.exists
changed_when: true
notify: Restart heph
# Checkout provides the PWA shell at {{ heph_web_root }} (heph-pwa/ subdir),
# served directly by hephd. Static files are read from disk per request, so a
# version bump needs no restart; the service worker (CACHE = "heph-pwa-vN")
# evicts stale assets on next load.
- name: Ensure heph cache parent directory exists
ansible.builtin.file:
path: "{{ heph_pwa_src_dir | dirname }}"
state: directory
mode: '0755'
- name: Stage heph-pwa source at {{ heph_version }}
ansible.builtin.git:
repo: "{{ heph_repo_url }}"
dest: "{{ heph_pwa_src_dir }}"
version: "{{ heph_version }}"
depth: 1
single_branch: true
force: true
- name: Deploy heph LaunchAgent plist
ansible.builtin.template:
src: heph.plist.j2
dest: ~/Library/LaunchAgents/mcquack.eblume.heph.plist
mode: '0644'
notify: Restart heph
- name: Check if heph LaunchAgent is loaded
ansible.builtin.command: launchctl list mcquack.eblume.heph
register: heph_launchctl_check
changed_when: false
failed_when: false
- name: Load heph LaunchAgent if not loaded
ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.heph.plist
when: heph_launchctl_check.rc != 0
changed_when: true
failed_when: false

View file

@ -0,0 +1,50 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- {{ ansible_managed }} -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mcquack.eblume.heph</string>
<key>ProgramArguments</key>
<array>
<string>{{ heph_binary }}</string>
<string>--mode</string>
<string>server</string>
<string>--http-addr</string>
<string>{{ heph_http_addr }}</string>
<string>--db</string>
<string>{{ heph_db }}</string>
<string>--socket</string>
<string>{{ heph_socket }}</string>
<string>--web-root</string>
<string>{{ heph_web_root }}</string>
<string>--oidc-issuer</string>
<string>{{ heph_oidc_issuer }}</string>
<string>--oidc-audience</string>
<string>{{ heph_oidc_audience }}</string>
<string>--self-update</string>
<string>--self-update-interval-secs</string>
<string>{{ heph_self_update_interval_secs }}</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>EnvironmentVariables</key>
<dict>
<!-- cargo + toolchain on PATH so --self-update can run `cargo install`. -->
<key>PATH</key>
<string>{{ heph_bin_dir }}:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
<key>HOME</key>
<string>/Users/erichblume</string>
<!-- Pin the rustup channel: the launchagent runs without mise, so a bare
cargo shim would otherwise use rustup's (stale) default toolchain. -->
<key>RUSTUP_TOOLCHAIN</key>
<string>{{ heph_rust_toolchain }}</string>
</dict>
<key>StandardOutPath</key>
<string>{{ heph_log_dir }}/mcquack.heph.out.log</string>
<key>StandardErrorPath</key>
<string>{{ heph_log_dir }}/mcquack.heph.err.log</string>
</dict>
</plist>

View file

@ -15,7 +15,7 @@ spec:
source: source:
repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git repoURL: ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git
targetRevision: main targetRevision: main
path: argocd/manifests/external-secrets path: argocd/manifests/external-secrets-ringtail
destination: destination:
server: https://ringtail.tail8d86e.ts.net:6443 server: https://ringtail.tail8d86e.ts.net:6443
namespace: external-secrets namespace: external-secrets

View file

@ -434,3 +434,93 @@ data:
provider: !KeyOf mealie-provider provider: !KeyOf mealie-provider
meta_launch_url: https://meals.ops.eblu.me meta_launch_url: https://meals.ops.eblu.me
policy_engine_mode: all policy_engine_mode: all
heph.yaml: |
version: 1
metadata:
name: BlumeOps Heph SSO
labels:
blueprints.goauthentik.io/description: "Hephaestus hub OIDC (device-code) provider, application, and device-code flow"
entries:
# Device-code flow (RFC 8628). authentik ships no default for this, so we
# create one and bind it to the brand below. An empty stage_configuration
# flow is sufficient: the already-authenticated user just confirms the code.
- model: authentik_flows.flow
id: device-code-flow
identifiers:
slug: default-device-code-flow
attrs:
name: Device code flow
title: Device code flow
slug: default-device-code-flow
designation: stage_configuration
authentication: require_authenticated
# Enable the device-code grant globally by binding the flow to the default
# brand (domain authentik-default). Partial update — only sets this field.
- model: authentik_brands.brand
identifiers:
domain: authentik-default
attrs:
flow_device_code: !KeyOf device-code-flow
# OAuth2 provider for heph — PUBLIC client (device-code + PKCE, no secret).
# client_id doubles as the token audience the hub verifies (--oidc-audience heph),
# and the app slug 'heph' is the issuer path (/application/o/heph/).
- model: authentik_providers_oauth2.oauth2provider
id: heph-provider
identifiers:
name: Heph
attrs:
name: Heph
authorization_flow: !Find [authentik_flows.flow, [slug, default-provider-authorization-implicit-consent]]
invalidation_flow: !Find [authentik_flows.flow, [slug, default-provider-invalidation-flow]]
client_type: public
client_id: heph
# CLI/TUI use the device-code grant (no redirect). The heph-pwa browser
# login uses Authorization Code + PKCE, which DOES redirect back to the
# app's origin — register those here (Authentik also keys token-endpoint
# CORS off these origins). Trailing slash matters: the PWA's redirect_uri
# is its base dir, e.g. https://heph.ops.eblu.me/.
redirect_uris:
- matching_mode: strict
url: https://heph.ops.eblu.me/
- matching_mode: strict
url: http://localhost:8787/ # local dev (hephd --web-root)
signing_key: !Find [authentik_crypto.certificatekeypair, [name, authentik Self-signed Certificate]]
property_mappings:
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, openid]]
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, email]]
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, profile]]
# offline_access: heph CLI requests "openid offline_access"; without
# this mapping the refresh token is session-bound and hephd's
# refresh_token grant 400s once the session lapses (spoke sync dies).
- !Find [authentik_providers_oauth2.scopemapping, [scope_name, offline_access]]
sub_mode: hashed_user_id
include_claims_in_id_token: true
# Heph application — linked to the OAuth2 provider
- model: authentik_core.application
id: heph-app
identifiers:
slug: heph
attrs:
name: Hephaestus
slug: heph
provider: !KeyOf heph-provider
meta_launch_url: https://heph.ops.eblu.me
policy_engine_mode: any
# Policy binding — restrict heph to admins group (single-owner, sensitive data)
- model: authentik_policies.policybinding
identifiers:
order: 0
target: !KeyOf heph-app
group: !Find [authentik_core.group, [name, admins]]
attrs:
target: !KeyOf heph-app
group: !Find [authentik_core.group, [name, admins]]
order: 0
enabled: true
negate: false
timeout: 30

View file

@ -0,0 +1,16 @@
# Ringtail (amd64) overlay for external-secrets.
#
# Reuses the shared indri manifest as a base and only overrides the controller
# image to the nix-built amd64 variant (`-nix` tag). The base sets the arm64
# image (built via containers/external-secrets/container.py on indri's Dagger
# runner); ringtail's k3s is amd64 and needs the image built by
# containers/external-secrets/default.nix on the nix-container-builder.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../external-secrets
images:
- name: registry.ops.eblu.me/blumeops/external-secrets
newTag: v2.2.0-13895bb-nix

View file

@ -12,4 +12,5 @@ resources:
images: images:
- name: ghcr.io/external-secrets/external-secrets - name: ghcr.io/external-secrets/external-secrets
newTag: v2.2.0 newName: registry.ops.eblu.me/blumeops/external-secrets
newTag: v2.2.0-13895bb

View file

@ -10,4 +10,4 @@ resources:
images: images:
- name: nvcr.io/nvidia/k8s-device-plugin - name: nvcr.io/nvidia/k8s-device-plugin
newTag: v0.19.0 newTag: v0.19.2

View file

@ -1,54 +0,0 @@
---
apiVersion: batch/v1
kind: CronJob
metadata:
name: prowler-iac-scan
namespace: prowler
spec:
schedule: "0 2 * * 6" # Saturday 2am
concurrencyPolicy: Forbid
jobTemplate:
spec:
ttlSecondsAfterFinished: 604800 # Auto-delete after 7 days
template:
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: prowler
image: registry.ops.eblu.me/blumeops/prowler:kustomized
command: ["/bin/sh", "-c"]
# Prowler's --mutelist-file is a no-op for the IaC provider
# (it delegates to Trivy). The Prowler image's trivy shim
# injects --ignorefile $TRIVY_IGNOREFILE when set; see
# containers/prowler/Dockerfile.
env:
- name: TRIVY_IGNOREFILE
value: /mutelist/trivyignore.yaml
args:
- |
DATEDIR=/reports/prowler-iac/$(date +%Y-%m-%d)
mkdir -p "$DATEDIR"
prowler iac \
--scan-repository-url https://forge.ops.eblu.me/eblume/blumeops.git \
-z \
--output-formats html csv json-ocsf \
--output-directory "$DATEDIR"
volumeMounts:
- name: reports
mountPath: /reports
- name: mutelist
mountPath: /mutelist
readOnly: true
restartPolicy: OnFailure
volumes:
- name: reports
persistentVolumeClaim:
claimName: prowler-reports
- name: mutelist
configMap:
name: prowler-mutelist
items:
- key: trivyignore.yaml
path: trivyignore.yaml

View file

@ -1,39 +0,0 @@
---
apiVersion: batch/v1
kind: CronJob
metadata:
name: prowler-image-scan
namespace: prowler
spec:
schedule: "0 3 * * 6" # Saturday 3am
concurrencyPolicy: Forbid
jobTemplate:
spec:
ttlSecondsAfterFinished: 604800 # Auto-delete after 7 days
template:
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: prowler
image: registry.ops.eblu.me/blumeops/prowler:kustomized
command: ["/bin/sh", "-c"]
args:
- |
DATEDIR=/reports/prowler-images/$(date +%Y-%m-%d)
mkdir -p "$DATEDIR"
prowler image \
--registry https://registry.ops.eblu.me \
--image-filter "^blumeops/" \
-z \
--output-formats html csv json-ocsf \
--output-directory "$DATEDIR"
volumeMounts:
- name: reports
mountPath: /reports
restartPolicy: OnFailure
volumes:
- name: reports
persistentVolumeClaim:
claimName: prowler-reports

View file

@ -10,8 +10,6 @@ resources:
- pv-nfs.yaml - pv-nfs.yaml
- pvc.yaml - pvc.yaml
- cronjob.yaml - cronjob.yaml
- cronjob-image-scan.yaml
- cronjob-iac-scan.yaml
configMapGenerator: configMapGenerator:
- name: prowler-mutelist - name: prowler-mutelist
@ -23,7 +21,6 @@ configMapGenerator:
- mutelist/core-pod-security.yaml - mutelist/core-pod-security.yaml
- mutelist/manual-node-checks.yaml - mutelist/manual-node-checks.yaml
- mutelist/rbac.yaml - mutelist/rbac.yaml
- mutelist/trivyignore.yaml
images: images:
- name: registry.ops.eblu.me/blumeops/prowler - name: registry.ops.eblu.me/blumeops/prowler

View file

@ -1,37 +0,0 @@
# Trivy ignorefile for Prowler IaC scan.
#
# Prowler's `--mutelist-file` flag is a no-op for the IaC provider
# (iac_provider.py sets self._mutelist = None and delegates to Trivy).
# Trivy in turn does not auto-discover this YAML form from cwd, so the
# Prowler image ships a shim wrapper around `trivy` that injects
# --ignorefile $TRIVY_IGNOREFILE when the env var is set. The cronjob
# mounts this file and sets TRIVY_IGNOREFILE accordingly.
#
# Schema: https://trivy.dev/latest/docs/configuration/filtering/
# IDs use the hyphenated form Trivy displays (KSV-0041, not KSV0041).
misconfigurations:
- id: KSV-0041
paths:
- "argocd/manifests/external-secrets/rbac.yaml"
statement: >-
external-secrets-operator's entire function is to read and
synthesize Secret objects; ClusterRole over secrets is its
purpose. Both the controller and cert-controller are
upstream-defined.
- id: KSV-0041
paths:
- "argocd/manifests/kube-state-metrics/rbac.yaml"
- "argocd/manifests/kube-state-metrics-ringtail/rbac.yaml"
statement: >-
KSM exposes only Secret metadata (name, namespace, type, labels),
never the data field. list/watch on secrets is required for
kube_secret_info / kube_secret_labels metrics.
- id: KSV-0114
paths:
- "argocd/manifests/external-secrets/rbac.yaml"
statement: >-
cert-controller manages the external-secrets validating webhook
configurations to inject its own rotating CA bundle. RBAC is
scoped to two named webhooks (secretstore-validate,
externalsecret-validate) via resourceNames; KSV-0114 doesn't see
the resourceNames restriction so reports the full ClusterRole.

View file

@ -0,0 +1,51 @@
"""External Secrets Operator — native Dagger build.
Two-stage build: Go binary (all providers), Alpine runtime.
Source cloned from forge mirror.
A single binary serves as the controller, webhook, and cert-controller; the
Deployments select the role via a subcommand passed in `args:`, so the image
ENTRYPOINT must be the binary itself (matching upstream's distroless image).
"""
import dagger
from blumeops.containers import (
alpine_runtime,
clone_from_forge,
go_build,
oci_labels,
)
VERSION = "v2.2.0"
async def build(src: dagger.Directory) -> dagger.Container:
source = clone_from_forge("external-secrets", VERSION)
# Upstream `make build` compiles every secret provider into a single
# static binary (`-tags all_providers`, CGO disabled). Mirror that so the
# local image is functionally identical to ghcr.io/.../external-secrets.
backend = go_build(
source,
"/external-secrets",
tags="all_providers",
)
runtime = alpine_runtime(
extra_apk=["ca-certificates"],
create_user=False,
)
runtime = oci_labels(
runtime,
title="External Secrets Operator",
description=(
"Kubernetes operator that integrates external secret management systems"
),
version=VERSION,
)
return (
runtime.with_file("/bin/external-secrets", backend.file("/external-secrets"))
.with_user("65534")
.with_entrypoint(["/bin/external-secrets"])
)

View file

@ -0,0 +1,56 @@
# Nix-built External Secrets Operator (amd64, for ringtail k3s).
# Builds v2.2.0 from the forge mirror with all secret providers compiled in,
# faithful to upstream's `make build` (-tags all_providers). The container.py
# sibling builds the arm64 image for indri's minikube; this default.nix builds
# the amd64 image on ringtail's nix-container-builder.
{ pkgs ? import <nixpkgs> { } }:
let
version = "2.2.0";
src = pkgs.fetchgit {
url = "https://forge.ops.eblu.me/mirrors/external-secrets.git";
rev = "v${version}";
hash = "sha256-eAocOAp5s4CFRrpKfQr2lf3Ji+6nQQ1A5/eTw5B7v9U=";
};
# external-secrets v2.2.0 requires Go >= 1.26.1; nixpkgs default go is 1.25.x.
external-secrets = (pkgs.buildGoModule.override { go = pkgs.go_1_26; }) {
inherit src version;
pname = "external-secrets";
vendorHash = "sha256-0xuBK3fjAplPLAElHvKB6d+2lDz+De/s91fV4dPZwjE=";
doCheck = false;
subPackages = [ "." ];
tags = [ "all_providers" ];
ldflags = [ "-s" "-w" ];
meta = with pkgs.lib; {
description = "Kubernetes operator that integrates external secret management systems";
homepage = "https://github.com/external-secrets/external-secrets";
license = licenses.asl20;
mainProgram = "external-secrets";
};
};
in
pkgs.dockerTools.buildLayeredImage {
name = "blumeops/external-secrets";
contents = [
external-secrets
pkgs.cacert
pkgs.tzdata
];
config = {
Entrypoint = [ "${external-secrets}/bin/external-secrets" ];
Env = [
"SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt"
"TZDIR=${pkgs.tzdata}/share/zoneinfo"
];
User = "65534";
};
}

View file

@ -0,0 +1 @@
Rebuilt the locally-built external-secrets image from the `main` branch so the deployed tag (`v2.2.0-0e70a1b`) traces to a `main` commit rather than the now-merged feature branch, giving a stable provenance reference.

View file

@ -0,0 +1 @@
Rebuilt the external-secrets images off `main` and repointed both clusters to the stable main-sha tags (`v2.2.0-13895bb` arm64 / `v2.2.0-13895bb-nix` amd64), so the deployed images on indri and ringtail trace to the same `main` commit rather than earlier feature-branch builds.

View file

@ -0,0 +1 @@
Bumped the indri heph hub to v1.2.1, which adds the hub `GET /config` endpoint and ships the heph-pwa **Login with Authentik** flow (Authorization Code + PKCE). Pairs with the Authentik `heph` provider redirect URIs registered earlier.

View file

@ -0,0 +1 @@
Upgraded Jellyfin on indri from 10.11.6 to 10.11.11, picking up the security fixes in 10.11.7 (disclosed CVEs/GHSAs, flagged "upgrade immediately") and 10.11.10 (three further GHSAs). Noted the recurring gotcha in the service-versions tracking: after a `brew upgrade --cask jellyfin`, the re-quarantined `.app` makes the launchd-spawned process hang silently until the Gatekeeper first-launch dialog is approved on indri's GUI console — removing the quarantine xattr over SSH is blocked by macOS TCC.

View file

@ -0,0 +1 @@
Updated ringtail NixOS flake inputs (nixpkgs `nixos-25.11`, disko) to latest via `dagger call flake-update`.

View file

@ -0,0 +1 @@
Reviewed the tailscale-operator reference card: documented the dual indri/ringtail deployment, corrected the ArgoCD apps list, pinned the upstream version, and added the ProxyGroup Ingress `host:` caveat.

View file

@ -0,0 +1 @@
Completed the external-secrets localization for the ringtail (amd64) cluster. The indri Dagger build (`container.py`) only produces an arm64 image; added `containers/external-secrets/default.nix` to build the amd64 variant on ringtail's nix-container-builder, and gave `external-secrets-ringtail` a thin kustomize overlay that reuses the shared manifest and points at the `-nix` image. Both clusters now run the locally-built external-secrets binary on their native architecture.

View file

@ -0,0 +1 @@
Added the [[hephaestus]] (`heph`) sync hub to indri as a self-updating LaunchAgent managed by Ansible (`ansible/roles/heph`, tag `heph`). The hub runs `hephd --mode server` behind `heph.ops.eblu.me` (Caddy TLS), with self-update on a 10-minute interval and the heph-pwa mobile shell served from `--web-root`. Access is gated by a new Authentik device-code (RFC 8628) OIDC application. Indri is now the canonical hub; other devices (e.g. gilbert) attach as offline-capable spokes. The hub's store was seeded from gilbert via the data-safe Path A bring-up (copy store, reset `meta.origin`).

View file

@ -0,0 +1 @@
Granted the `offline_access` scope on the Authentik `heph` OAuth2 provider so hephaestus spokes receive a durable 30-day refresh token. Previously the refresh token was session-bound, so spoke sync would silently fail with a `400 Bad Request` on the `refresh_token` grant once the Authentik session lapsed.

View file

@ -0,0 +1 @@
Registered the heph-pwa redirect URIs (`https://heph.ops.eblu.me/`, plus `http://localhost:8787/` for dev) on the Authentik `heph` OAuth2 provider, enabling the PWA's new Authorization Code + PKCE "Login with Authentik" flow (and the token-endpoint CORS it needs). Pairs with hephaestus PR #9.

View file

@ -0,0 +1 @@
Localized the external-secrets controller image. It now builds from the forge mirror via a native Dagger `container.py` (single `all_providers` static Go binary, faithful to upstream's `make build`) and is served from `registry.ops.eblu.me/blumeops/external-secrets` instead of `ghcr.io`, bringing another platform component under local supply-chain control.

View file

@ -0,0 +1 @@
Retired the Prowler container-image CVE scan and IaC scan, keeping only the K8s CIS benchmark scan. The two retired scans generated tens of thousands of un-actioned, un-muted findings every week (~20,000 image findings and growing, mostly unpatchable upstream-image CVEs; ~650 systemic Trivy KSV pod-security warnings) — the weekly `mise run review-compliance-reports` re-surfaced them all as "action needed" though none were ever triaged. The K8s CIS scan is fully mutelisted and runs clean, so it stays. Removed the two CronJobs, the now-unused `trivyignore.yaml` mutelist, and the grouped-findings rendering in the review tool that existed solely for the high-volume scans.

View file

@ -0,0 +1 @@
Reviewed four never-reviewed reference cards (`cluster`, `ntfy`, `tempo`, `alloy`) and corrected drift: minikube is now Kubernetes v1.35.0; ntfy, tempo, and alloy-k8s images are now locally-built `registry.ops.eblu.me/blumeops/*` nix containers (v2.19.2, v2.10.3, v1.16.0) rather than upstream Docker Hub; the Fly.io alloy binary is v1.16.1; and the ringtail workload list reflects the in-progress minikube→k3s migration.

View file

@ -0,0 +1 @@
Upgraded the nvidia-device-plugin on ringtail from v0.19.0 to v0.19.2 (upstream patch release: CDI/Tegra fixes and dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup).

View file

@ -1,6 +1,6 @@
--- ---
title: Deploy Prowler CIS Scanner title: Deploy Prowler CIS Scanner
modified: 2026-03-24 modified: 2026-06-08
last-reviewed: 2026-03-24 last-reviewed: 2026-03-24
tags: tags:
- how-to - how-to
@ -11,7 +11,20 @@ tags:
# Deploy Prowler CIS Scanner # Deploy Prowler CIS Scanner
Prowler runs weekly CIS Kubernetes Benchmark scans against minikube-indri and writes HTML/CSV/JSON reports to the NFS share on sifaka. Prowler runs a weekly CIS Kubernetes Benchmark scan against minikube-indri and writes HTML/CSV/JSON reports to the NFS share on sifaka.
## Why only the K8s CIS scan
Prowler originally ran three CronJobs: K8s CIS, container-image CVE scanning, and IaC scanning. The image and IaC scans were **retired in 2026-06**.
Both were pure toil with no realized value:
- **Image scan** produced ~20,000 unmuted findings per run and growing, none ever triaged or muted. They were overwhelmingly CVEs in *upstream* base images we don't control and can't patch, and the job re-scanned every historical tag still in the registry, multiplying the count.
- **IaC scan** produced ~650 Trivy KSV findings (`runAsNonRoot`, `readOnlyRootFilesystem`, drop-capabilities, …) against our own manifests — real but systemic, homelab-acceptable, and likewise never muted, so the weekly review re-surfaced all of them indefinitely.
The K8s CIS scan, by contrast, is fully mutelisted and runs clean (0 unmuted findings week over week), so it stays. The guiding principle matches [[ai-scraper-mitigation]]: don't keep generating a firehose of output that has no audience. If image-CVE signal is wanted later, the right shape is critical-severity-only, currently-deployed-tags-only, alert-on-new — a rebuild, not a revival (tracked as the "Trivy for image/IaC scanning" task).
Note that the K8s CIS scan itself is tied to minikube-indri, which is slated for retirement; on k3s only ~22 of 70 checks produce results (no static pods). Re-pointing a lean posture check at ringtail is tracked separately ("prowler scan against ringtail").
## What it checks ## What it checks
@ -33,38 +46,6 @@ Prowler's Kubernetes provider runs ~70 checks from the CIS Kubernetes Benchmark
**k3s note:** k3s embeds the control plane in a single binary — no static pods exist. Only core + RBAC checks (~22 of 70) produce results. Consider `kube-bench` for k3s control plane checks. **k3s note:** k3s embeds the control plane in a single binary — no static pods exist. Only core + RBAC checks (~22 of 70) produce results. Consider `kube-bench` for k3s control plane checks.
### Image vulnerability scanning (Saturday 3am)
Prowler's image provider scans all `blumeops/*` container images in `registry.ops.eblu.me` for:
- **CVEs** — known vulnerabilities from NVD, Alpine SecDB, Debian Security Tracker, and other sources
- **Embedded secrets** — credentials or API keys baked into image layers
- **Misconfigurations** — Dockerfile best practices (running as root, missing HEALTHCHECK, etc.)
Uses Trivy under the hood. Reports are written to `sifaka:/volume1/reports/prowler-images/`.
To run an ad-hoc image scan:
```fish
kubectl create job --from=cronjob/prowler-image-scan prowler-image-manual -n prowler --context=minikube-indri
```
### IaC scanning (Saturday 2am)
Prowler's IaC provider scans the blumeops repository (cloned at scan time) for misconfigurations in:
- **Dockerfiles** — running as root, using `latest` tags, missing `HEALTHCHECK`
- **Kubernetes manifests** — missing resource limits, privileged containers, insecure settings
- **Other IaC files** — Terraform, CloudFormation, etc. if present
Uses Trivy under the hood. Reports are written to `sifaka:/volume1/reports/prowler-iac/`.
To run an ad-hoc IaC scan:
```fish
kubectl create job --from=cronjob/prowler-iac-scan prowler-iac-manual -n prowler --context=minikube-indri
```
## Reports ## Reports
Reports are written to `sifaka:/volume1/reports/prowler/` with timestamped filenames. See [[read-compliance-reports]] for how to access and interpret them. Reports are written to `sifaka:/volume1/reports/prowler/` with timestamped filenames. See [[read-compliance-reports]] for how to access and interpret them.

View file

@ -1,6 +1,6 @@
--- ---
title: Read Compliance Reports title: Read Compliance Reports
modified: 2026-04-06 modified: 2026-06-08
last-reviewed: 2026-04-06 last-reviewed: 2026-04-06
tags: tags:
- how-to - how-to
@ -27,8 +27,13 @@ Reports are stored on sifaka at `/volume1/reports/`. Each scanner writes to its
| Scanner | Path | Schedule | | Scanner | Path | Schedule |
|---------|------|----------| |---------|------|----------|
| [[prowler]] K8s CIS | `sifaka:/volume1/reports/prowler/` | Weekly (Sunday 3am) | | [[prowler]] K8s CIS | `sifaka:/volume1/reports/prowler/` | Weekly (Sunday 3am) |
| [[prowler]] Image | `sifaka:/volume1/reports/prowler-images/` | Weekly (Saturday 3am) |
| [[prowler]] IaC | `sifaka:/volume1/reports/prowler-iac/` | Weekly (Saturday 2am) | > **Retired (2026-06):** the Prowler **image** (`prowler-images/`) and **IaC**
> (`prowler-iac/`) scans were retired. They produced tens of thousands of
> un-actioned, un-muted findings every week — mostly unpatchable upstream-image
> CVEs and systemic pod-security KSV warnings — and nobody triaged them. See
> [[deploy-prowler#Why only the K8s CIS scan]] for the rationale. Their stale
> report directories may linger on sifaka until manually removed.
Copy reports to your local machine (remember `scp -O` for sifaka): Copy reports to your local machine (remember `scp -O` for sifaka):

View file

@ -33,6 +33,7 @@ Primary BlumeOps server. Mac Mini M1 (2020).
- [[alloy|Alloy]] - Metrics/logs collector - [[alloy|Alloy]] - Metrics/logs collector
- [[caddy]] - Reverse proxy for `*.ops.eblu.me` - [[caddy]] - Reverse proxy for `*.ops.eblu.me`
- [[devpi]] - PyPI mirror (LaunchAgent) - [[devpi]] - PyPI mirror (LaunchAgent)
- [[hephaestus]] - heph task/context sync hub (LaunchAgent, self-updating)
- [[cv]] - Static CV site, served by Caddy - [[cv]] - Static CV site, served by Caddy
- [[docs]] - Quartz-built docs site, served by Caddy - [[docs]] - Quartz-built docs site, served by Caddy

View file

@ -1,6 +1,7 @@
--- ---
title: Cluster title: Cluster
modified: 2026-02-19 modified: 2026-06-04
last-reviewed: 2026-06-04
tags: tags:
- kubernetes - kubernetes
--- ---
@ -15,7 +16,7 @@ BlumeOps runs two Kubernetes clusters: a Minikube cluster on [[indri]] (most ser
|----------|-------| |----------|-------|
| **Driver** | docker | | **Driver** | docker |
| **Container Runtime** | docker | | **Container Runtime** | docker |
| **Kubernetes Version** | v1.34.0 | | **Kubernetes Version** | v1.35.0 |
| **CPUs** | 6 | | **CPUs** | 6 |
| **Memory** | 11GB | | **Memory** | 11GB |
| **Disk** | 200GB | | **Disk** | 200GB |
@ -41,7 +42,9 @@ Single-node k3s cluster for workloads requiring amd64 or GPU access. See [[ringt
|----------|-------| |----------|-------|
| **Context** | `k3s-ringtail` | | **Context** | `k3s-ringtail` |
| **API Server** | `https://ringtail.tail8d86e.ts.net:6443` | | **API Server** | `https://ringtail.tail8d86e.ts.net:6443` |
| **Workloads** | Frigate (GPU), ntfy, frigate-notify, nvidia-device-plugin | | **Workloads** | GPU workloads (Frigate, Ollama), notifications (ntfy, frigate-notify), [[authentik]], and services migrated off indri minikube (Immich, Mealie, Paperless, TeslaMate). See [[ringtail]] for the authoritative list. |
Services are being progressively migrated from indri's minikube to ringtail's k3s; the split above reflects an in-progress state, not a fixed boundary.
## Related ## Related

View file

@ -1,6 +1,7 @@
--- ---
title: Tailscale Operator title: Tailscale Operator
modified: 2026-02-08 modified: 2026-06-08
last-reviewed: 2026-06-08
tags: tags:
- kubernetes - kubernetes
- tailscale - tailscale
@ -15,8 +16,16 @@ The Tailscale operator enables Kubernetes services to be exposed directly on the
| Property | Value | | Property | Value |
|----------|-------| |----------|-------|
| **Namespace** | `tailscale` | | **Namespace** | `tailscale` |
| **Upstream** | `mirrors/tailscale` on forge (static manifest) | | **Upstream** | `mirrors/tailscale` on forge (static manifest, pinned `v1.94.2`) |
| **ArgoCD Apps** | `tailscale-operator-base` (upstream), `tailscale-operator` (config) | | **ArgoCD Apps** | `tailscale-operator` (indri/minikube), `tailscale-operator-ringtail` (ringtail/k3s) |
The operator runs on **both** clusters — indri's minikube and ringtail's k3s.
Both apps layer on the shared `tailscale-operator-base` kustomize directory
(operator manifest, `ProxyClass`, `dnsconfig`); each cluster supplies its own
`ProxyGroup` (indri: 2 replicas, ringtail: 1) and OAuth `ExternalSecret`. The
ringtail overlay additionally rewrites the proxy image to a locally nix-built
mirror. See [[ringtail]] and [[migrate-wave1-ringtail]] for the ongoing
migration of k8s workloads onto ringtail.
## How It Works ## How It Works
@ -27,7 +36,13 @@ Ingresses use a shared ProxyGroup (`ingress`) rather than per-service Tailscale
3. Service becomes accessible at `<hostname>.tail8d86e.ts.net` 3. Service becomes accessible at `<hostname>.tail8d86e.ts.net`
4. TLS is handled automatically via Tailscale 4. TLS is handled automatically via Tailscale
Tailnet clients must have `--accept-routes` enabled to route to VIP addresses. Two requirements for VIP routing to work:
1. Tailnet clients must have `--accept-routes` enabled to route to VIP addresses.
2. Ingress rules must **not** set an explicit `host:` field. The ProxyGroup
proxy receives the FQDN as the `Host` header (e.g.
`prometheus.tail8d86e.ts.net`), which won't match a short name. Use
`host: "*"` or omit `host:` entirely.
Services can be individually tagged (e.g., `tag:flyio-target`) via Ingress annotations to control which ACL grants apply. See [[expose-service-publicly]] for the tagging workflow. Services can be individually tagged (e.g., `tag:flyio-target`) via Ingress annotations to control which ACL grants apply. See [[expose-service-publicly]] for the tagging workflow.

View file

@ -1,6 +1,6 @@
--- ---
title: Security & Compliance title: Security & Compliance
modified: 2026-03-24 modified: 2026-06-08
last-reviewed: 2026-03-24 last-reviewed: 2026-03-24
tags: tags:
- operations - operations
@ -21,7 +21,7 @@ Security posture and compliance scanning for BlumeOps infrastructure.
## Scanning tools ## Scanning tools
- [[prowler]] — CIS Kubernetes Benchmark scanner (weekly CronJob) - [[prowler]] — CIS Kubernetes Benchmark scanner (weekly CronJob). The container-image CVE scan and IaC scan were retired in 2026-06 (un-actioned noise — see [[deploy-prowler#Why only the K8s CIS scan]]); only the K8s CIS scan remains.
- [[deploy-prowler]] — deployment and ad-hoc scan how-to - [[deploy-prowler]] — deployment and ad-hoc scan how-to
- [[read-compliance-reports]] — accessing and interpreting reports - [[read-compliance-reports]] — accessing and interpreting reports
- [[kingfisher]] — Secret detection and live validation for Forgejo repos (weekly CronJob + prek hook) - [[kingfisher]] — Secret detection and live validation for Forgejo repos (weekly CronJob + prek hook)
@ -52,5 +52,5 @@ Suppressed findings are kept in Prowler mutelist YAML under `argocd/manifests/pr
- No SOC 2 compliance mapping for Kubernetes (Prowler only maps SOC 2 for AWS/Azure/GCP) - No SOC 2 compliance mapping for Kubernetes (Prowler only maps SOC 2 for AWS/Azure/GCP)
- k3s control plane checks produce no results (embedded binary, no static pods) — consider kube-bench - k3s control plane checks produce no results (embedded binary, no static pods) — consider kube-bench
- Container image scanning covers `blumeops/*` images only — upstream images (ollama, immich, etc.) are not scanned - No container-image CVE scanning (the Prowler image scan was retired 2026-06 as un-actioned noise). If reintroduced, scope it to critical-severity, currently-deployed tags, alert-on-new
- IaC scanning covers the blumeops repo only — no scanning of third-party Helm charts or vendored manifests - No automated IaC misconfiguration scanning (the Prowler IaC scan was retired 2026-06). Manifest pod-security hardening is now an accept-and-document decision rather than a weekly report

View file

@ -1,6 +1,7 @@
--- ---
title: Alloy title: Alloy
modified: 2026-03-13 modified: 2026-06-04
last-reviewed: 2026-06-04
tags: tags:
- service - service
- observability - observability
@ -20,10 +21,10 @@ Unified observability collector for metrics and logs with three deployments:
| **Indri Binary** | `~/.local/bin/alloy` | | **Indri Binary** | `~/.local/bin/alloy` |
| **Indri Config** | `~/.config/grafana-alloy/config.alloy` | | **Indri Config** | `~/.config/grafana-alloy/config.alloy` |
| **K8s Namespace** | `alloy` | | **K8s Namespace** | `alloy` |
| **K8s Image** | `grafana/alloy:v1.14.0` | | **K8s Image** | `registry.ops.eblu.me/blumeops/alloy:v1.16.0-9564435` (locally built) |
| **ArgoCD App** | `alloy-k8s` | | **ArgoCD App** | `alloy-k8s` |
| **Fly.io Config** | `fly/alloy.river` | | **Fly.io Config** | `fly/alloy.river` |
| **Fly.io Image** | `grafana/alloy:v1.5.1` (binary copied into nginx container) | | **Fly.io Image** | `grafana/alloy:v1.16.1` (binary copied into nginx container, sha-pinned) |
## Metrics Collected ## Metrics Collected

View file

@ -0,0 +1,141 @@
---
title: Hephaestus
modified: 2026-06-04
last-reviewed: 2026-06-04
tags:
- service
- hephaestus
---
# Hephaestus
[hephaestus](https://github.com/eblume/hephaestus) (`heph`) is the user's
self-hosted task + context/knowledge system. It is **hub-and-spoke**: each device
runs a full local SQLite replica (`hephd --mode local`) and background-syncs
against one canonical **hub**. Indri runs that hub.
## Quick Reference
| Property | Value |
|----------|-------|
| **PWA URL** | https://heph.ops.eblu.me (browser PWA, Caddy TLS) |
| **Spoke sync URL** | http://indri.tail8d86e.ts.net:8787 (direct, tailnet) |
| **Local Port** | 8787 (`hephd --mode server`, bound `0.0.0.0`) |
| **Binary** | `~/.cargo/bin/hephd` (self-updating) |
| **Data** | `~/.local/share/heph/heph.db` |
| **PWA shell** | `~/.local/share/heph/web` |
| **Logs** | `~/Library/Logs/mcquack.heph.{out,err}.log` |
| **LaunchAgent** | `mcquack.eblume.heph` |
| **Ansible role** | `ansible/roles/heph` (tag `heph`) |
## What runs on indri
The launchagent runs the hub in server mode with three features enabled:
```
hephd --mode server --http-addr 0.0.0.0:8787 --db ~/.local/share/heph/heph.db
--web-root ~/.local/share/heph/web
--oidc-issuer https://authentik.ops.eblu.me/application/o/heph/
--oidc-audience heph
--self-update --self-update-interval-secs 600
```
- **Server mode** exposes the HTTP sync endpoint (`/rpc`, `/sync/*`) that spokes
reconcile their op-log against.
- **Self-update** (10-minute poll) rebuilds `hephd` from the forge when a newer
release tag appears (`cargo install --git https://forge.eblu.me/eblume/hephaestus.git`).
Indri's Rust toolchain (`~/.cargo/bin`) is on the agent's `PATH` for this, and
the plist pins `RUSTUP_TOOLCHAIN=stable` — the
launchagent runs without mise, so a bare `cargo` shim would otherwise fall back
to rustup's *default* toolchain, which can lag behind heph's `rust-version` floor
(1.89) and silently fail the build.
- **PWA** (`--web-root`) serves the [heph-pwa] mobile shell; Caddy terminates TLS
at `heph.ops.eblu.me` so the PWA runs in a secure context (service worker,
install-to-home-screen, voice capture).
[heph-pwa]: https://github.com/eblume/hephaestus
The hub binds `0.0.0.0` so tailnet spokes can also sync directly
(`http://indri.tail8d86e.ts.net:8787`); access is gated by Authentik OIDC either
way — tailnet reachability alone is not enough.
## Authentication (Authentik OIDC, device-code)
The hub verifies an OIDC bearer token on every sync. The `heph` application is a
**public** OAuth2 client using the **device-code flow** (RFC 8628), provisioned
in the [[authentik]] blueprint (`argocd/manifests/authentik/configmap-blueprint.yaml`):
- Issuer: `https://authentik.ops.eblu.me/application/o/heph/`
- Audience / client id: `heph`
- Restricted to the `admins` group (single-owner, sensitive data).
- Scope mappings: `openid`, `email`, `profile`, **`offline_access`**.
> **`offline_access` is required for durable sync.** The `heph` CLI requests
> `scope = "openid offline_access"`, and a refresh token is only issued for the
> 30-day refresh-token window when the provider actually grants `offline_access`.
> Without that scope mapping the refresh token is bound to the login **session**;
> once the session lapses, hephd's `refresh_token` grant returns `400 Bad
> Request`, the bearer can't be refreshed, and spoke sync silently degrades
> (`heph sync --status``auth_failure: true`). `heph auth login` papers over it
> until the next session expiry. Keep `offline_access` in the provider's
> `property_mappings`.
Because no Authentik instance ships a device-code flow by default, the blueprint
also creates `default-device-code-flow` and binds it to the default brand's
`flow_device_code`. Devices obtain a token with `heph auth login`; the PWA
currently takes a pasted token (in-app device-code login is upstream follow-up).
## Data seeding (Path A, one-time)
The hub was seeded from the existing `gilbert` device so no task history was
lost. heph's data-safe bring-up ("Path A") has the hub **adopt the device's
identity** rather than rewriting the device:
1. Quiesce the seed device: `heph daemon stop` (on gilbert).
2. Copy its store to indri: `scp ~/.local/share/heph/heph.db indri:~/.local/share/heph/heph.db`.
3. Give the hub its **own device origin** (keeps gilbert's `owner_id` + data;
`hephd` regenerates a fresh `origin` on next start when it is missing):
```fish
ssh indri "sqlite3 ~/.local/share/heph/heph.db \"DELETE FROM meta WHERE key='origin';\""
```
4. `mise run provision-indri -- --tags heph` (installs hephd, stages the PWA,
loads the launchagent → hub starts on the seeded store).
Only `meta.origin` changes; `owner_id`, nodes, op-log, and links are copied
untouched. A clean `hephd --owner-id` / seed command is tracked upstream as
hephaestus follow-up — until then this manual reset is the documented path.
## Connecting a spoke (e.g. gilbert)
A device joins by running its local daemon with the hub URL + OIDC client and
logging in once:
```bash
hephd --mode local --hub-url http://indri.tail8d86e.ts.net:8787 \
--oidc-issuer https://authentik.ops.eblu.me/application/o/heph/ \
--oidc-client-id heph
heph auth login --hub-url http://indri.tail8d86e.ts.net:8787 \
--issuer https://authentik.ops.eblu.me/application/o/heph/ --client-id heph
```
> **Use the direct `http://…:8787` tailnet URL for sync, not the Caddy HTTPS
> URL.** hephd's sync client is plain-HTTP-only; pointing `--hub-url` at
> `https://heph.ops.eblu.me` fails with a confusing `error sending request`
> (the HTTP connector rejects the `https` scheme before connecting). Tailscale
> encrypts the transport, and the OIDC bearer token still gates every request.
> `heph.ops.eblu.me` (Caddy TLS) exists only for the browser PWA, which needs a
> secure context. The cached token is keyed by the exact `--hub-url`, so use the
> same value for `hephd` and `heph auth login`.
> **Caveat:** `heph daemon` cannot yet bake hub/spoke flags into the generated
> launchd plist (upstream gap). On a spoke whose plist is managed by `heph
> daemon`, the hub/OIDC flags must be hand-added — and a later `heph daemon
> start/restart` will regenerate the plist and drop them. Avoid `heph daemon`
> subcommands on a configured spoke until that gap is closed; reload via
> `launchctl` instead.
## Related
- [[indri]] — host
- [[authentik]] — OIDC provider
- [[caddy]] — TLS termination for `heph.ops.eblu.me`

View file

@ -1,7 +1,7 @@
--- ---
title: Jellyfin title: Jellyfin
modified: 2026-02-07 modified: 2026-06-08
last-reviewed: 2026-03-23 last-reviewed: 2026-06-08
tags: tags:
- service - service
- media - media
@ -41,6 +41,24 @@ Dashboard > Playback:
2. Allow hardware encoding: Enabled 2. Allow hardware encoding: Enabled
3. VPP Tone mapping: Enabled 3. VPP Tone mapping: Enabled
## Upgrades
Installed via Homebrew cask (`state: present`, unpinned), so the Ansible role
won't bump an already-installed cask. To upgrade, run on indri:
```bash
brew upgrade --cask jellyfin
```
**Gatekeeper gotcha:** a cask upgrade replaces `/Applications/Jellyfin.app` and
re-applies the `com.apple.quarantine` xattr. When launchd respawns the service,
the new binary hangs silently — process alive but ~0 CPU, no logs, no listening
socket — because Gatekeeper is holding the first launch pending approval.
Removing the xattr over SSH fails (`xattr -dr com.apple.quarantine ...`
"Operation not permitted", blocked by macOS TCC). Approve the first-launch
dialog on indri's GUI console (or run the `xattr` removal from a local Terminal
with Full Disk Access), then reload the LaunchAgent.
## Observability ## Observability
- Metrics: `jellyfin_metrics` ansible role - Metrics: `jellyfin_metrics` ansible role

View file

@ -1,6 +1,7 @@
--- ---
title: Ntfy title: Ntfy
modified: 2026-02-17 modified: 2026-06-04
last-reviewed: 2026-06-04
tags: tags:
- service - service
- notifications - notifications
@ -17,7 +18,7 @@ Self-hosted push notification service. Ntfy receives HTTP POST messages and deli
| **URL** | https://ntfy.ops.eblu.me | | **URL** | https://ntfy.ops.eblu.me |
| **Tailscale URL** | https://ntfy.tail8d86e.ts.net | | **Tailscale URL** | https://ntfy.tail8d86e.ts.net |
| **Namespace** | `ntfy` | | **Namespace** | `ntfy` |
| **Image** | `binwiederhier/ntfy:v2.17.0` | | **Image** | `registry.ops.eblu.me/blumeops/ntfy:v2.19.2-fd0bebb-nix` (locally built) |
| **Upstream** | https://github.com/binwiederhier/ntfy | | **Upstream** | https://github.com/binwiederhier/ntfy |
| **Manifests** | `argocd/manifests/ntfy/` | | **Manifests** | `argocd/manifests/ntfy/` |

View file

@ -1,6 +1,6 @@
--- ---
title: Prowler title: Prowler
modified: 2026-03-24 modified: 2026-06-08
last-reviewed: 2026-03-24 last-reviewed: 2026-03-24
tags: tags:
- service - service
@ -17,20 +17,20 @@ CIS Kubernetes Benchmark scanner for compliance posture reporting.
|----------|-------| |----------|-------|
| **Namespace** | `prowler` | | **Namespace** | `prowler` |
| **Image** | `registry.ops.eblu.me/blumeops/prowler` (see `argocd/manifests/prowler/kustomization.yaml` for current tag) | | **Image** | `registry.ops.eblu.me/blumeops/prowler` (see `argocd/manifests/prowler/kustomization.yaml` for current tag) |
| **Schedule** | K8s CIS: Sunday 3am / Image: Saturday 3am / IaC: Saturday 2am | | **Schedule** | K8s CIS: Sunday 3am |
| **Reports** | `sifaka:/volume1/reports/prowler/`, `prowler-images/`, `prowler-iac/` (NFS) | | **Reports** | `sifaka:/volume1/reports/prowler/` (NFS) |
| **Manifests** | `argocd/manifests/prowler/` | | **Manifests** | `argocd/manifests/prowler/` |
## What it does ## What it does
Runs Prowler 5 as two CronJobs: Runs Prowler 5 as a single CronJob:
- **K8s CIS scan** (Sunday) — CIS Kubernetes Benchmark v1.11 checks across pod security, RBAC, apiserver, etcd, kubelet, controller-manager, and scheduler - **K8s CIS scan** (Sunday) — CIS Kubernetes Benchmark v1.11 checks across pod security, RBAC, apiserver, etcd, kubelet, controller-manager, and scheduler
- **Image scan** (Saturday) — CVE, secret, and misconfiguration scanning of all `blumeops/*` container images in the registry via Trivy
- **IaC scan** (Saturday) — static analysis of Dockerfiles, K8s manifests, and other IaC files in the repo via Trivy
Reports are written in HTML, CSV, and JSON-OCSF to the NFS share on sifaka. Reports are written in HTML, CSV, and JSON-OCSF to the NFS share on sifaka.
The **image** and **IaC** scans (formerly Saturday CronJobs) were retired in 2026-06 — they generated tens of thousands of un-actioned findings weekly. See [[deploy-prowler#Why only the K8s CIS scan]].
## See also ## See also
- [[security]] — security & compliance posture overview - [[security]] — security & compliance posture overview

View file

@ -1,6 +1,7 @@
--- ---
title: Tempo title: Tempo
modified: 2026-03-05 modified: 2026-06-04
last-reviewed: 2026-06-04
tags: tags:
- service - service
- observability - observability
@ -18,7 +19,7 @@ Distributed tracing backend for BlumeOps infrastructure. Receives traces via OTL
| **Tailscale URL** | https://tempo.tail8d86e.ts.net | | **Tailscale URL** | https://tempo.tail8d86e.ts.net |
| **OTLP Endpoint** | https://tempo-otlp.tail8d86e.ts.net | | **OTLP Endpoint** | https://tempo-otlp.tail8d86e.ts.net |
| **Namespace** | `monitoring` | | **Namespace** | `monitoring` |
| **Image** | `grafana/tempo:2.10.1` | | **Image** | `registry.ops.eblu.me/blumeops/tempo:v2.10.3-75f9ba4` (locally built) |
| **Storage** | 10Gi PVC (local filesystem) | | **Storage** | 10Gi PVC (local filesystem) |
| **Retention** | 7 days | | **Retention** | 7 days |

View file

@ -10,19 +10,19 @@
Covers: Covers:
- Prowler K8s CIS (in-cluster): per-finding detail - Prowler K8s CIS (in-cluster): per-finding detail
- Prowler container image scans: grouped by check + resource
- Prowler IaC manifest scans: grouped by check + resource
- Kingfisher secret scanning: TODO — pending upstream JSON/CSV output - Kingfisher secret scanning: TODO — pending upstream JSON/CSV output
support (currently HTML-only; contribute from spork) support (currently HTML-only; contribute from spork)
For each Prowler scan, copies the two most recent CSV reports, parses The Prowler container-image CVE scan and IaC scan were retired in 2026-06
(see docs/how-to/operations/deploy-prowler.md) — they produced tens of
thousands of un-actioned findings weekly. Only the K8s CIS scan remains.
For the Prowler scan, copies the two most recent CSV reports, parses
them, and displays: them, and displays:
1. Overall status (pass/fail/manual/muted counts) 1. Overall status (pass/fail/manual/muted counts)
2. Unmuted failures by severity 2. Unmuted failures by severity
3. Delta from the previous report (new vs resolved) 3. Delta from the previous report (new vs resolved)
4. Actionable unmuted failures (per-finding for in-cluster; grouped 4. Actionable unmuted failures (per-finding detail)
by check ID and resource for image/IaC because they have far too
many findings to list individually)
This is the primary tool for the weekly compliance report review. This is the primary tool for the weekly compliance report review.
""" """
@ -39,11 +39,9 @@ from rich.console import Console
from rich.panel import Panel from rich.panel import Panel
from rich.table import Table from rich.table import Table
PROWLER_SCANS: list[tuple[str, str, bool]] = [ PROWLER_SCANS: list[tuple[str, str]] = [
# (label, sifaka base path, group_findings) # (label, sifaka base path)
("K8s CIS (In-Cluster)", "/volume1/reports/prowler", False), ("K8s CIS (In-Cluster)", "/volume1/reports/prowler"),
("Container Images", "/volume1/reports/prowler-images", True),
("IaC (manifests)", "/volume1/reports/prowler-iac", True),
] ]
console = Console() console = Console()
@ -334,14 +332,8 @@ def summarize_report(
tmpdir: str, tmpdir: str,
*, *,
show_muted: bool = False, show_muted: bool = False,
group_findings: bool = False,
) -> None: ) -> None:
"""Fetch and summarize the latest Prowler report under `base`. """Fetch and summarize the latest Prowler report under `base`."""
When `group_findings` is True, top-N CHECK_ID and RESOURCE_NAME tables
are shown instead of a per-finding detail table — appropriate for
image and IaC scans that produce thousands of findings.
"""
console.rule(f"[bold]{label}[/bold]") console.rule(f"[bold]{label}[/bold]")
csvs = list_reports(base) csvs = list_reports(base)
if not csvs: if not csvs:
@ -458,36 +450,29 @@ def summarize_report(
) )
console.print() console.print()
# For grouped scans the new/resolved listings are too noisy if new_keys:
# (potentially thousands of lines). Skip the listings; the count console.print("[bold red]New Unmuted Failures:[/bold red]")
# is in the panel above and detail is in the grouped tables. for k in sorted(new_keys):
if not group_findings: r = curr_keys[k]
if new_keys: console.print(
console.print("[bold red]New Unmuted Failures:[/bold red]") f" [{r['SEVERITY']}] {r['CHECK_ID']}: "
for k in sorted(new_keys): f"{r['STATUS_EXTENDED'][:120]}"
r = curr_keys[k] )
console.print( console.print()
f" [{r['SEVERITY']}] {r['CHECK_ID']}: "
f"{r['STATUS_EXTENDED'][:120]}"
)
console.print()
if resolved_keys: if resolved_keys:
console.print("[bold green]Resolved:[/bold green]") console.print("[bold green]Resolved:[/bold green]")
for k in sorted(resolved_keys): for k in sorted(resolved_keys):
r = prev_keys[k] r = prev_keys[k]
console.print( console.print(
f" [dim][{r['SEVERITY']}] {r['CHECK_ID']}: " f" [dim][{r['SEVERITY']}] {r['CHECK_ID']}: "
f"{r['STATUS_EXTENDED'][:120]}[/dim]" f"{r['STATUS_EXTENDED'][:120]}[/dim]"
) )
console.print() console.print()
# --- Unmuted failure details (grouped or per-finding) --- # --- Unmuted failure details ---
if latest["unmuted"]: if latest["unmuted"]:
if group_findings: _print_findings_detail(latest["unmuted"])
_print_grouped_findings(latest["unmuted"])
else:
_print_findings_detail(latest["unmuted"])
# --- Muted findings summary --- # --- Muted findings summary ---
if show_muted and latest["muted"]: if show_muted and latest["muted"]:
@ -566,75 +551,6 @@ def _print_findings_detail(unmuted: list[dict]) -> None:
console.print() console.print()
def _worst_severity(rows: list[dict]) -> str:
"""Return the most severe severity label across `rows`."""
if not rows:
return ""
return min(
(r["SEVERITY"] for r in rows),
key=lambda s: severity_sort({"SEVERITY": s}),
)
def _print_grouped_findings(unmuted: list[dict], top_n: int = 15) -> None:
"""Top-N tables grouped by CHECK_ID and RESOURCE_NAME.
Used for image and IaC scans where per-finding tables would be too
large to be useful. Shows count and worst severity for each group.
"""
by_check: dict[str, list[dict]] = {}
by_resource: dict[str, list[dict]] = {}
for r in unmuted:
by_check.setdefault(r["CHECK_ID"], []).append(r)
by_resource.setdefault(r.get("RESOURCE_NAME", "") or "(no resource)", []).append(r)
check_table = Table(
show_header=True,
header_style="bold",
title=f"Top {top_n} Checks by Unmuted Finding Count",
)
check_table.add_column("Worst Sev")
check_table.add_column("Check ID")
check_table.add_column("Count", justify="right")
for check, rows in sorted(
by_check.items(), key=lambda kv: -len(kv[1])
)[:top_n]:
worst = _worst_severity(rows)
style = _sev_style(worst)
check_table.add_row(
f"[{style}]{worst}[/{style}]" if style else worst,
check,
str(len(rows)),
)
console.print(check_table)
console.print()
res_table = Table(
show_header=True,
header_style="bold",
title=f"Top {top_n} Resources by Unmuted Finding Count",
)
res_table.add_column("Worst Sev")
res_table.add_column("Resource")
res_table.add_column("Count", justify="right")
for resource, rows in sorted(
by_resource.items(), key=lambda kv: -len(kv[1])
)[:top_n]:
worst = _worst_severity(rows)
style = _sev_style(worst)
res_table.add_row(
f"[{style}]{worst}[/{style}]" if style else worst,
resource[:80],
str(len(rows)),
)
console.print(res_table)
console.print()
def main( def main(
full: Annotated[ full: Annotated[
bool, typer.Option(help="(reserved) currently a no-op; all unmuted failures already shown") bool, typer.Option(help="(reserved) currently a no-op; all unmuted failures already shown")
@ -646,13 +562,12 @@ def main(
del full # historical flag, kept for backwards compatibility del full # historical flag, kept for backwards compatibility
with tempfile.TemporaryDirectory() as tmpdir: with tempfile.TemporaryDirectory() as tmpdir:
for label, base, group in PROWLER_SCANS: for label, base in PROWLER_SCANS:
summarize_report( summarize_report(
label, label,
base, base,
tmpdir, tmpdir,
show_muted=show_muted, show_muted=show_muted,
group_findings=group,
) )
# --- Node-level MANUAL check verification --- # --- Node-level MANUAL check verification ---

View file

@ -7,11 +7,11 @@
] ]
}, },
"locked": { "locked": {
"lastModified": 1780290312, "lastModified": 1780894562,
"narHash": "sha256-eTAlX0CwgB84Ts3GaBd944A3DRXVMzgA0EqroZBISUo=", "narHash": "sha256-c3430xwxwhHipl3jigUGMMBfpaMylDqytW/kdmB3ZGs=",
"owner": "nix-community", "owner": "nix-community",
"repo": "disko", "repo": "disko",
"rev": "115e5211780054d8a890b41f0b7734cafad54dfe", "rev": "24fed06cac83bcc44ac8efbb57cab1a82fa0bedc",
"type": "github" "type": "github"
}, },
"original": { "original": {
@ -43,11 +43,11 @@
}, },
"nixpkgs": { "nixpkgs": {
"locked": { "locked": {
"lastModified": 1779796641, "lastModified": 1780511130,
"narHash": "sha256-ZsIrKmhp4vbBXoXXmR/tBXA/UCsAQiJL9vsgZEduhVY=", "narHash": "sha256-2v9lT4ya59Lh1FqPeLnz1MoX9y/wz2huqfe9RtQZITk=",
"owner": "NixOS", "owner": "NixOS",
"repo": "nixpkgs", "repo": "nixpkgs",
"rev": "25f538306313eae3927264466c70d7001dcea1df", "rev": "535f3e6942cb1cead3929c604320d3db54b542b9",
"type": "github" "type": "github"
}, },
"original": { "original": {

View file

@ -56,8 +56,8 @@ services:
- name: nvidia-device-plugin - name: nvidia-device-plugin
type: argocd type: argocd
last-reviewed: 2026-03-27 last-reviewed: 2026-06-04
current-version: "v0.19.0" current-version: "v0.19.2"
upstream-source: https://github.com/NVIDIA/k8s-device-plugin/releases upstream-source: https://github.com/NVIDIA/k8s-device-plugin/releases
notes: DaemonSet + RuntimeClass on ringtail for GPU workloads notes: DaemonSet + RuntimeClass on ringtail for GPU workloads
@ -159,10 +159,13 @@ services:
- name: external-secrets - name: external-secrets
type: argocd type: argocd
last-reviewed: 2026-03-25 last-reviewed: 2026-06-04
current-version: "v2.2.0" current-version: "v2.2.0"
upstream-source: https://github.com/external-secrets/external-secrets/releases upstream-source: https://github.com/external-secrets/external-secrets/releases
notes: Static kustomize manifests rendered from upstream Helm chart notes: >-
Static kustomize manifests rendered from upstream Helm chart. Controller
image is locally built from the forge mirror via containers/external-secrets/container.py
(single all_providers static Go binary).
- name: 1password-connect - name: 1password-connect
type: argocd type: argocd
@ -411,6 +414,23 @@ services:
upstream-source: https://github.com/caddyserver/caddy/releases upstream-source: https://github.com/caddyserver/caddy/releases
notes: Built from source with Gandi DNS and Layer 4 plugins notes: Built from source with Gandi DNS and Layer 4 plugins
- name: heph
type: ansible
last-reviewed: 2026-06-05
current-version: "v1.2.1"
upstream-source: https://forge.eblu.me/eblume/hephaestus/releases
notes: >-
hephaestus task/context sync hub on indri (server-mode launchagent,
ansible/roles/heph; cargo-built from the forge). SELF-UPDATING: hephd
polls the forge for newer releases every 10 min and rebuilds + restarts
itself, so the running version drifts AHEAD of the ansible heph_version
pin. current-version here is the last observed/deployed tag, not a hard
pin — verify the live version via `curl https://heph.ops.eblu.me/config`
is served (hub up) and the hub log's `current=` line. Reconciling this
self-update vs IaC-pin drift is tracked in the heph "Hephaestus" project:
"Reconcile hephd self-update with ansible-pinned version (drift on indri
hub)" (node 01KTBXWT6XTHNDH92CVJY88E5K).
- name: borgmatic - name: borgmatic
type: ansible type: ansible
last-reviewed: 2026-04-15 last-reviewed: 2026-04-15
@ -420,9 +440,15 @@ services:
- name: jellyfin - name: jellyfin
type: ansible type: ansible
last-reviewed: 2026-03-17 last-reviewed: 2026-06-08
current-version: "10.11.6" current-version: "10.11.11"
upstream-source: https://github.com/jellyfin/jellyfin/releases upstream-source: https://github.com/jellyfin/jellyfin/releases
notes: >-
Homebrew cask (state: present, unpinned). Upgrade with
`brew upgrade --cask jellyfin` on indri. After upgrade the .app is
re-quarantined; launchd-spawned launch hangs silently until the
Gatekeeper first-launch dialog is approved on indri's GUI console
(xattr removal over SSH is blocked by TCC).
- name: automounter - name: automounter
type: ansible type: ansible