Commit graph

944 commits

Author SHA1 Message Date
2daf6291b7 Replace dead Prowler IaC mutelist with Trivy ignorefile shim
Prowler's IaC provider hardcodes self._mutelist = None and delegates
filtering to Trivy, but doesn't plumb --ignorefile through. The original
attempt with --mutelist-file silently no-op'd. Add a wrapper around
trivy in our image that injects --ignorefile $TRIVY_IGNOREFILE on `fs`
subcommands; switch the IaC cronjob to mount a Trivy-format
trivyignore.yaml and set the env var.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 09:50:31 -07:00
0510a8151c Address critical Prowler IaC findings via mute + RBAC tightening
Six critical IaC findings against argocd/manifests/ broke into two
patterns: legitimate-by-design RBAC (mute) and over-broad RBAC (fix).

Plumbing:
  - cronjob-iac-scan.yaml now passes --mutelist-file (previously
    unused, which is why all IaC findings reported as unmuted)
  - new mutelist/iac.yaml is bundled into the prowler-mutelist
    ConfigMap and mounted into the IaC cronjob via items: selector

Compensating controls (in compensating-controls.yaml):
  - operator-purpose-bound-rbac — external-secrets-operator's whole
    function is to manage Secret objects; ClusterRole over secrets
    matches its purpose. cert-controller mutates its own validating
    webhooks to inject a rotating CA bundle.
  - kube-state-metrics-metadata-only — KSM exposes only Secret
    metadata via kube_secret_info / kube_secret_labels; the data
    field is never read into exposed metrics.

Mutes (mutelist/iac.yaml):
  - KSV-0041 for external-secrets/rbac.yaml,
    kube-state-metrics/rbac.yaml,
    kube-state-metrics-ringtail/rbac.yaml
  - KSV-0114 for external-secrets/rbac.yaml

Real fix:
  - grafana-clusterrole no longer reads secrets. The dashboard sidecar
    (RESOURCE=both → configmap, both init and watch instances) only
    needs ConfigMap-labeled dashboards; no Secrets are labeled
    grafana_dashboard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:48:54 -07:00
718e0a0043 C0: review-compliance-reports — summarize image and IaC scans
Previously only the K8s CIS in-cluster scan was processed; the weekly
container-image and IaC Prowler scans were running on schedule but never
reviewed. Now each scan gets its own status / severity / week-over-week
delta, with top-N grouped tables (by check ID and resource) for the
high-volume image and IaC outputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:18:06 -07:00
cfb6d7a7aa C0: service-review — mark cv reviewed 2026-04-27
No version bump; build deps (jinja2, pyyaml) still loose-pinned and fine.
Known issue: deployed v1.0.3 package predates phone-hide commit; tracked
separately in Todoist by user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:57:33 -07:00
c9eb188e05 C0: blumeops-tasks — replace ambiguous due:+N with "Nd overdue"
The signed offset format read as "due in 5 days" rather than
"5 days overdue", causing misreads. Switch to self-explanatory
text: "5d overdue" / "due in 2d" / "due today".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:49:46 -07:00
4a37ffcdc2 C0: CLAUDE.md — import AGENTS.md instead of redirecting to it
Claude Code only auto-loads CLAUDE.md. The prose shim told agents to go
read AGENTS.md, which is easy to skip. Replacing the shim with
`@AGENTS.md` inlines AGENTS.md content into the session prompt, so the
startup rules (ai-docs, blumeops-tasks, change classification) land in
context unconditionally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:41:13 -07:00
f9d9e00057 C0: blumeops-tasks — show due offset + recurrence, sort by overdue-ness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:18:16 -07:00
005e2a03ed C0: split gandi-operations docs; add dns-acme-cleanup mise task
Splits the nebulous gandi-operations how-to into two single-topic cards
(manage-eblu-me-dns, rotate-gandi-pat) and adds a mise task for the
recurring _acme-challenge TXT cleanup needed due to a value-comparison
bug in libdns/gandi v1.1.0 that prevents certmagic's cleanup phase from
removing presented TXT values.

The gandi reference card is updated to drop the false "different
credential from Pulumi PAT" claim — verified during the 2026-04-27
incident that Caddy and Pulumi share a single PAT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 09:48:46 -07:00
72b27b7fd2 C0: docs — add mealie borg restore how-to
Captures the procedure used to restore mealie's SQLite DB from a borgmatic
archive after the post-DR wipe: extract from borg, snapshot the wiped DB,
swap via a helper pod on the ReadWriteOnce PVC, fix UID 911 ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:04:28 -07:00
Erich Blume
34fa2ef28a C0: ringtail — restore sway default keybindings, fix fuzzel border config
Extend (not replace) home-manager's default sway keybindings via
lib.mkOptionDefault, with lib.mkForce on the custom overrides that
conflict with defaults. Add Mod+F1 cheatsheet binding (fuzzel-filterable).

Move fuzzel's border-radius/border-width out of [main] into a proper
[border] section with the expected short names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 12:16:02 -07:00
Erich Blume
88eabc3de6 Disable Xalia 2026-04-21 14:47:13 -07:00
7d94b9073a C0: docs — default argocd login to --sso; drop extraneous --grpc-web
Now that argocd's Authentik OAuth2 client is public, `argocd login --sso`
works for day-to-day use. Promote it to the default in AGENTS.md,
argocd-cli reference, and troubleshooting; keep the admin/password flow
documented as a break-glass fallback for when Authentik is unavailable.

Also drops --grpc-web from every interactive login command — confirmed
extraneous (login succeeds without it). Left in CI workflows and
`argocd cluster add` untouched; those are different contexts that I
didn't re-test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:43:21 -07:00
86317315ed C0: remove argocd OIDC client_secret wiring
Now that argocd's Authentik OAuth2 client is public (PKCE-only), the
client_secret plumbing is dead code:

- delete argocd-oidc-authentik ExternalSecret and drop it from kustomization
- remove AUTHENTIK_ARGOCD_CLIENT_SECRET env from authentik-worker
- remove argocd-client-secret mapping from authentik-config ExternalSecret

The argocd-client-secret field in the 1Password "Authentik (blumeops)"
item is now unreferenced and can be deleted there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:38:26 -07:00
0e62ad5596 C0: argocd OIDC — switch to public client for CLI SSO
Changes argocd's Authentik OAuth2 client from confidential to public and
drops the clientSecret from argocd-cm. Public + PKCE works for both the
web UI (argocd-server backend) and the argocd CLI (`argocd login --sso`)
without a shared secret, matching OAuth 2.1 guidance.

Confidential → public was needed because the CLI can't hold a client
secret; Authentik's per-app issuer model made the alternative
("cliClientID" pattern with separate public client) awkward since it
requires a shared issuer across apps which Authentik doesn't serve.

Follow-up: deadcode AUTHENTIK_ARGOCD_CLIENT_SECRET env wiring and the
argocd-oidc-authentik ExternalSecret once verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:34:39 -07:00
225b0e7008 C0: allow argocd CLI --sso localhost callback
Adds http://localhost:8085/auth/callback to the ArgoCD OAuth2 provider's
redirect_uris so `argocd login --sso` works. Loopback redirect is the
RFC 8252 pattern for native CLI apps; PKCE (already enabled) covers the
code-interception risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:18:08 -07:00
e6a6a6042e C0: suggest mise run runner-logs in container-build-and-release
After dispatching, poll the Forgejo API for the run matching our
head_sha and print `mise run runner-logs <N>` so the suggested monitor
command is one copy-paste away. Falls back to the bare command if the
poll times out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:12:00 -07:00
0ceafc374d C0: review operator-managed-pods CC (2026-04-21)
Tailscale operator still defaults to privileged proxy pods with no
seccomp profile (issue #7359 open upstream). Control remains valid.
Added note about ProxyClass + device plugin remediation path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:59:48 -07:00
a9ef02a602 C0: bump frigate-notify to v0.5.4-e928054-nix (workdir fix)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:44:24 -07:00
e92805409e fix(frigate-notify): set WorkingDir=/app and create writable /app
The upstream binary expects CWD=/app (relative config.yml lookup,
lumberjack logfile at ./log/app.log). Without this, the pod crashed on
startup — the ConfigMap-mounted /app/config.yml wasn't found and zerolog
spammed "mkdir log: permission denied" as it tried to create ./log at
/ as nonroot.

Creates /app as 1777 (tmp-style) so nonroot can write logs; WorkingDir
set to /app so the default config path resolves correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:43:00 -07:00
c88b6d773c C0: point frigate-notify at local registry tag v0.5.4-fb4bf5a-nix
Built from main in run #516 after #339 merged. Follows the navidrome
kustomization convention (deployment image = local ref + :kustomized,
kustomization override = newTag only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:31:29 -07:00
fb4bf5a7a3 Add frigate-notify nix container build (#339)
## Summary
- Mirrors `github.com/0x2142/frigate-notify` at `v0.5.4` to `forge.ops.eblu.me/mirrors/frigate-notify`.
- Adds `containers/frigate-notify/default.nix` — `buildGoModule` + `dockerTools.buildLayeredImage`, following the `ntfy` pattern.
- Uses `-tags goolm` to avoid the libolm CGO dependency (matrix notifier is imported unconditionally in the upstream but we only use ntfy alerts).
- Runs as nonroot (UID 65534), exposes port 8000, bundles `cacert`/`tzdata`.

## Why
Move `ghcr.io/0x2142/frigate-notify:v0.5.4` (ringtail-deployed) under local control. Aligns with the [[indri → ringtail migration plan]] and the `default.nix` convention for ringtail-targeted containers documented in [[build-container-image]].

## Verification
- `dagger call build-nix --src=. --container-name=frigate-notify export --path=./out.tar.gz` produces a valid 20MB docker archive (10 layers) with `blumeops/frigate-notify` tag locally.
- Hashes pinned for `fetchgit` (src) and `vendorHash` (go modules).

## Follow-up (post-merge)
1. `mise run container-build-and-release frigate-notify` — release from main SHA.
2. C0 follow-up: update `argocd/manifests/frigate/kustomization.yaml` image ref to `registry.ops.eblu.me/blumeops/frigate-notify:v0.5.4-<sha>-nix`.
3. ArgoCD auto-syncs the deployment.

## Test plan
- [ ] `dagger call build-nix` succeeds from a clean checkout.
- [ ] `mise run container-build-and-release frigate-notify --dry-run` looks correct.
- [ ] After release + kustomization swap: frigate-notify pod comes up healthy on ringtail; ntfy alerts still fire on Frigate events.

Reviewed-on: #339
2026-04-21 09:28:02 -07:00
30f39ae050 Review contributing tutorial: add last-reviewed, .ai.md fragment type, prek provenance
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:53:41 -07:00
fb32cc07c4 chore: repoint runner-job-image tag at CI-built v0.20.6-50f8c2a
Swaps the k8s runner label from the local bootstrap tag (v0.20.6-9b6be09)
to the equivalent image rebuilt by CI from main. Functionally identical;
closes the bootstrap loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:38:33 -07:00
50f8c2a33f Roll k8s runner to runner-job-image v0.20.6-9b6be09
Points the k8s Forgejo runner label at the locally-bootstrapped
runner-job-image built from the Alpine container.py on this branch.
Once merged, CI will rebuild the same image from the same SHA.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:28:18 -07:00
db8fd946ae Bump Dagger to 0.20.6 and migrate runner-job-image to Alpine container.py
Bumps the Dagger engine/CLI from v0.20.1 to v0.20.6 (mise pin, dagger.json
engineVersion, SDK regen) and rewrites the runner-job-image container as a
native Dagger pipeline on Alpine 3.23 using the shared alpine_runtime helper,
replacing the Debian-based Dockerfile. All Forgejo Actions in this repo use
actions/checkout (a JS action), so musl is not a compatibility concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:28:18 -07:00
Erich Blume
58fe4f0073 ty 2026-04-20 15:48:15 -07:00
54841dbf70 Update ringtail flake inputs 2026-04-20 10:09:16 -07:00
d6ad8e8e59 chore: refresh forgejo-runner review date 2026-04-20 09:15:35 -07:00
21177ff47f chore: update forgejo-runner image tag 2026-04-20 09:11:37 -07:00
1425bf1f5c Upgrade forgejo-runner to v12.8, adopt server.connections, and clean up docs (#338)
## Summary
- consolidate forgejo-runner how-to docs into current cards
- upgrade the k8s forgejo-runner deployment to the latest v12.8.x runner image
- switch the k8s runner from first-boot register flow to declarative server.connections config
- keep the runner image on the native Dagger build path and update the surrounding manifests/secrets

## Notes
- PR opened early for C1 review
- implementation and deployment verification will follow in subsequent commits

Reviewed-on: #338
2026-04-20 09:03:54 -07:00
353e2785c3 docs: review zot oidc client card 2026-04-20 07:55:25 -07:00
53a7374ac1 C0: drop fix-ntfy-nix-version mikado card
Historical one-shot fix from the zot hardening chain — knowledge is
self-evident in containers/ntfy/default.nix and container-version-check
regex. Should have been removed at mikado finalization. Scrubbed the two
wiki-link references in add-container-version-sync-check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 07:26:53 -07:00
51a878cddb C0: review navidrome reference doc 2026-04-18 20:25:19 -07:00
deedeecef9 C0: adopt AGENTS.md as canonical agent config 2026-04-18 20:15:30 -07:00
71c1c453d6 Fetch job logs via SSH to indri instead of Forgejo web endpoint
Forgejo's web action routes don't support API token auth for private
repos (only session cookies or public access). Switch log fetching to
read the zstd-compressed log files directly from indri via SSH —
Forgejo stores all runner logs on disk regardless of which runner
executed the job.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 17:08:46 -07:00
4f5a963ef6 Add API token auth and git remote detection to runner-logs
runner-logs now always authenticates with the Forgejo API token
(via --token flag, FORGEJO_TOKEN env, or 1Password) so it works on
private repos. The --repo default is auto-detected from the git
remote origin URL instead of being hardcoded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 16:39:21 -07:00
1d62653871 Fix forge.eblu.me static assets by adding missing Host header
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m26s
The static asset cache block (css/js/png/etc) was missing
proxy_set_header Host, so Caddy received "forge.eblu.me" instead of
"forge.ops.eblu.me" and couldn't route the request. HTML loaded fine
because the main location / block had the header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 16:00:56 -07:00
55abb17f50 Add resource limits to ArgoCD pods to prevent unbounded consumption
All 7 ArgoCD containers had no resource limits, allowing them to consume
unlimited CPU/memory during node pressure events. This contributed to
cluster-wide probe timeout cascades on minikube-indri.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 13:04:27 -07:00
Forgejo Actions
bdfcb4b677 Update docs release to v1.16.0
- Built changelog from towncrier fragments

[skip ci]
2026-04-18 10:00:54 -07:00
d26a6ae3b2 Update docs for Caddy routing and direct WireGuard peering v1.16.0
Comprehensive docs pass reflecting the new Fly proxy architecture:
- Fly proxy routes through Caddy on indri (not per-service TS Ingress)
- Direct WireGuard peering via --port=41641 pinning
- DERP relay performance lesson in Tailscale docs
- Caddy now in public traffic path
- indri tagged as flyio-target
- Removed fly-reload references
- Updated architecture diagrams and per-service setup guide
- Added changelog fragment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 09:57:30 -07:00
12b2786ca2 Route Fly proxy through Caddy on indri for direct WireGuard peering
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m59s
Tailscale Ingress pods in k8s can't establish direct WireGuard
connections (stuck behind pod-network NAT → DERP relay → 20s latency).
Indri's host-level Tailscale CAN peer directly with Fly.

Change all nginx upstreams to route through Caddy on indri instead of
per-service Tailscale Ingress endpoints. Tag indri as flyio-target in
the Tailscale ACL so the Fly proxy can reach it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 09:40:20 -07:00
bca4c2bede Expose Tailscale WireGuard UDP port on Fly proxy
Some checks failed
Deploy Fly.io Proxy / deploy (push) Failing after 1m33s
Enable direct peer-to-peer WireGuard connections by pinning tailscaled
to port 41641 and exposing it as a UDP service. Without this, all
traffic routes through Tailscale DERP relays causing 20+ second
latency. Requires dedicated IPv4 (allocated: 168.220.82.221).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 09:17:03 -07:00
c8da243663 Run alloy-tracing as root for eBPF capabilities
The nix-built Alloy image sets User=65534 (nobody). Even with
privileged: true, a non-root user gets no effective capabilities
(CapEff=0). Override with runAsUser: 0 so Beyla gets CAP_BPF and
CAP_SYS_ADMIN needed for eBPF instrumentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:42:26 -07:00
3a2913ba1f Allow BPF in privileged containers on ringtail
NixOS defaults kernel.unprivileged_bpf_disabled=2, which blocks BPF
syscalls outside the init namespace even with CAP_BPF. Set to 1 so
privileged containers (Beyla/Alloy tracing) can create BPF maps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:32:30 -07:00
24f3f9b24a Raise k3s memlock rlimit for eBPF tracing on ringtail
Beyla (alloy-tracing) has been failing since April 13 with
"failed to set memlock rlimit: operation not permitted" because k3s
inherits the default 8MB memlock limit. Set LimitMEMLOCK=infinity on
the k3s systemd service so privileged containers can use eBPF.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:27:39 -07:00
Forgejo Actions
a72a2c2bd4 Update docs release to v1.15.7
- Built changelog from towncrier fragments

[skip ci]
2026-04-18 08:14:58 -07:00
9bafe85b2b Add teslamate extensions to DR restore procedure v1.15.7
The earthdistance extension (depends on cube) must be created before
restoring the teslamate database — discovered missing after 2026-04-13 DR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:12:26 -07:00
b4472c7849 Deploy devpi 6.19.3
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:04:23 -07:00
4dab6d11bb Add remote commit check to container-build-and-release
Queries the Forgejo API to verify the target commit exists on the remote
before dispatching a build, preventing wasted CI runs on unpushed commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:03:21 -07:00
37b8a21524 Migrate devpi to Dagger build and bump to 6.19.3
Replace Dockerfile with container.py for native Dagger builds.
Bump devpi-server 6.19.1→6.19.3, devpi-web 5.0.1→5.0.2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 07:57:05 -07:00