blumeops

Author	SHA1	Message	Date
Erich Blume	4a37ffcdc2	C0: CLAUDE.md — import AGENTS.md instead of redirecting to it Claude Code only auto-loads CLAUDE.md. The prose shim told agents to go read AGENTS.md, which is easy to skip. Replacing the shim with `@AGENTS.md` inlines AGENTS.md content into the session prompt, so the startup rules (ai-docs, blumeops-tasks, change classification) land in context unconditionally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 11:41:13 -07:00
Erich Blume	f9d9e00057	C0: blumeops-tasks — show due offset + recurrence, sort by overdue-ness Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 11:18:16 -07:00
Erich Blume	005e2a03ed	C0: split gandi-operations docs; add dns-acme-cleanup mise task Splits the nebulous gandi-operations how-to into two single-topic cards (manage-eblu-me-dns, rotate-gandi-pat) and adds a mise task for the recurring _acme-challenge TXT cleanup needed due to a value-comparison bug in libdns/gandi v1.1.0 that prevents certmagic's cleanup phase from removing presented TXT values. The gandi reference card is updated to drop the false "different credential from Pulumi PAT" claim — verified during the 2026-04-27 incident that Caddy and Pulumi share a single PAT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 09:48:46 -07:00
Erich Blume	72b27b7fd2	C0: docs — add mealie borg restore how-to Captures the procedure used to restore mealie's SQLite DB from a borgmatic archive after the post-DR wipe: extract from borg, snapshot the wiped DB, swap via a helper pod on the ReadWriteOnce PVC, fix UID 911 ownership. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:04:28 -07:00
Erich Blume	34fa2ef28a	C0: ringtail — restore sway default keybindings, fix fuzzel border config Extend (not replace) home-manager's default sway keybindings via lib.mkOptionDefault, with lib.mkForce on the custom overrides that conflict with defaults. Add Mod+F1 cheatsheet binding (fuzzel-filterable). Move fuzzel's border-radius/border-width out of [main] into a proper [border] section with the expected short names. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 12:16:02 -07:00
Erich Blume	88eabc3de6	Disable Xalia	2026-04-21 14:47:13 -07:00
Erich Blume	7d94b9073a	C0: docs — default argocd login to --sso; drop extraneous --grpc-web Now that argocd's Authentik OAuth2 client is public, `argocd login --sso` works for day-to-day use. Promote it to the default in AGENTS.md, argocd-cli reference, and troubleshooting; keep the admin/password flow documented as a break-glass fallback for when Authentik is unavailable. Also drops --grpc-web from every interactive login command — confirmed extraneous (login succeeds without it). Left in CI workflows and `argocd cluster add` untouched; those are different contexts that I didn't re-test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:43:21 -07:00
Erich Blume	86317315ed	C0: remove argocd OIDC client_secret wiring Now that argocd's Authentik OAuth2 client is public (PKCE-only), the client_secret plumbing is dead code: - delete argocd-oidc-authentik ExternalSecret and drop it from kustomization - remove AUTHENTIK_ARGOCD_CLIENT_SECRET env from authentik-worker - remove argocd-client-secret mapping from authentik-config ExternalSecret The argocd-client-secret field in the 1Password "Authentik (blumeops)" item is now unreferenced and can be deleted there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:38:26 -07:00
Erich Blume	0e62ad5596	C0: argocd OIDC — switch to public client for CLI SSO Changes argocd's Authentik OAuth2 client from confidential to public and drops the clientSecret from argocd-cm. Public + PKCE works for both the web UI (argocd-server backend) and the argocd CLI (`argocd login --sso`) without a shared secret, matching OAuth 2.1 guidance. Confidential → public was needed because the CLI can't hold a client secret; Authentik's per-app issuer model made the alternative ("cliClientID" pattern with separate public client) awkward since it requires a shared issuer across apps which Authentik doesn't serve. Follow-up: deadcode AUTHENTIK_ARGOCD_CLIENT_SECRET env wiring and the argocd-oidc-authentik ExternalSecret once verified. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:34:39 -07:00
Erich Blume	225b0e7008	C0: allow argocd CLI --sso localhost callback Adds http://localhost:8085/auth/callback to the ArgoCD OAuth2 provider's redirect_uris so `argocd login --sso` works. Loopback redirect is the RFC 8252 pattern for native CLI apps; PKCE (already enabled) covers the code-interception risk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:18:08 -07:00
Erich Blume	e6a6a6042e	C0: suggest mise run runner-logs in container-build-and-release After dispatching, poll the Forgejo API for the run matching our head_sha and print `mise run runner-logs <N>` so the suggested monitor command is one copy-paste away. Falls back to the bare command if the poll times out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:12:00 -07:00
Erich Blume	0ceafc374d	C0: review operator-managed-pods CC (2026-04-21) Tailscale operator still defaults to privileged proxy pods with no seccomp profile (issue #7359 open upstream). Control remains valid. Added note about ProxyClass + device plugin remediation path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 09:59:48 -07:00
Erich Blume	a9ef02a602	C0: bump frigate-notify to v0.5.4-e928054-nix (workdir fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 09:44:24 -07:00
Erich Blume	e92805409e	fix(frigate-notify): set WorkingDir=/app and create writable /app The upstream binary expects CWD=/app (relative config.yml lookup, lumberjack logfile at ./log/app.log). Without this, the pod crashed on startup — the ConfigMap-mounted /app/config.yml wasn't found and zerolog spammed "mkdir log: permission denied" as it tried to create ./log at / as nonroot. Creates /app as 1777 (tmp-style) so nonroot can write logs; WorkingDir set to /app so the default config path resolves correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 09:43:00 -07:00
Erich Blume	c88b6d773c	C0: point frigate-notify at local registry tag v0.5.4-fb4bf5a-nix Built from main in run #516 after #339 merged. Follows the navidrome kustomization convention (deployment image = local ref + :kustomized, kustomization override = newTag only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 09:31:29 -07:00
Erich Blume	fb4bf5a7a3	Add frigate-notify nix container build (#339) ## Summary - Mirrors `github.com/0x2142/frigate-notify` at `v0.5.4` to `forge.ops.eblu.me/mirrors/frigate-notify`. - Adds `containers/frigate-notify/default.nix` — `buildGoModule` + `dockerTools.buildLayeredImage`, following the `ntfy` pattern. - Uses `-tags goolm` to avoid the libolm CGO dependency (matrix notifier is imported unconditionally in the upstream but we only use ntfy alerts). - Runs as nonroot (UID 65534), exposes port 8000, bundles `cacert`/`tzdata`. ## Why Move `ghcr.io/0x2142/frigate-notify:v0.5.4` (ringtail-deployed) under local control. Aligns with the [[indri → ringtail migration plan]] and the `default.nix` convention for ringtail-targeted containers documented in [[build-container-image]]. ## Verification - `dagger call build-nix --src=. --container-name=frigate-notify export --path=./out.tar.gz` produces a valid 20MB docker archive (10 layers) with `blumeops/frigate-notify` tag locally. - Hashes pinned for `fetchgit` (src) and `vendorHash` (go modules). ## Follow-up (post-merge) 1. `mise run container-build-and-release frigate-notify` — release from main SHA. 2. C0 follow-up: update `argocd/manifests/frigate/kustomization.yaml` image ref to `registry.ops.eblu.me/blumeops/frigate-notify:v0.5.4-<sha>-nix`. 3. ArgoCD auto-syncs the deployment. ## Test plan - [ ] `dagger call build-nix` succeeds from a clean checkout. - [ ] `mise run container-build-and-release frigate-notify --dry-run` looks correct. - [ ] After release + kustomization swap: frigate-notify pod comes up healthy on ringtail; ntfy alerts still fire on Frigate events. Reviewed-on: #339	2026-04-21 09:28:02 -07:00
Erich Blume	30f39ae050	Review contributing tutorial: add last-reviewed, .ai.md fragment type, prek provenance Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 08:53:41 -07:00
Erich Blume	fb32cc07c4	chore: repoint runner-job-image tag at CI-built v0.20.6-50f8c2a Swaps the k8s runner label from the local bootstrap tag (v0.20.6-9b6be09) to the equivalent image rebuilt by CI from main. Functionally identical; closes the bootstrap loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 08:38:33 -07:00
Erich Blume	50f8c2a33f	Roll k8s runner to runner-job-image v0.20.6-9b6be09 Points the k8s Forgejo runner label at the locally-bootstrapped runner-job-image built from the Alpine container.py on this branch. Once merged, CI will rebuild the same image from the same SHA. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 08:28:18 -07:00
Erich Blume	db8fd946ae	Bump Dagger to 0.20.6 and migrate runner-job-image to Alpine container.py Bumps the Dagger engine/CLI from v0.20.1 to v0.20.6 (mise pin, dagger.json engineVersion, SDK regen) and rewrites the runner-job-image container as a native Dagger pipeline on Alpine 3.23 using the shared alpine_runtime helper, replacing the Debian-based Dockerfile. All Forgejo Actions in this repo use actions/checkout (a JS action), so musl is not a compatibility concern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 08:28:18 -07:00
Erich Blume	58fe4f0073	ty	2026-04-20 15:48:15 -07:00
Erich Blume	54841dbf70	Update ringtail flake inputs	2026-04-20 10:09:16 -07:00
Erich Blume	d6ad8e8e59	chore: refresh forgejo-runner review date	2026-04-20 09:15:35 -07:00
Erich Blume	21177ff47f	chore: update forgejo-runner image tag	2026-04-20 09:11:37 -07:00
Erich Blume	1425bf1f5c	Upgrade forgejo-runner to v12.8, adopt server.connections, and clean up docs (#338) ## Summary - consolidate forgejo-runner how-to docs into current cards - upgrade the k8s forgejo-runner deployment to the latest v12.8.x runner image - switch the k8s runner from first-boot register flow to declarative server.connections config - keep the runner image on the native Dagger build path and update the surrounding manifests/secrets ## Notes - PR opened early for C1 review - implementation and deployment verification will follow in subsequent commits Reviewed-on: #338	2026-04-20 09:03:54 -07:00
Erich Blume	353e2785c3	docs: review zot oidc client card	2026-04-20 07:55:25 -07:00
Erich Blume	53a7374ac1	C0: drop fix-ntfy-nix-version mikado card Historical one-shot fix from the zot hardening chain — knowledge is self-evident in containers/ntfy/default.nix and container-version-check regex. Should have been removed at mikado finalization. Scrubbed the two wiki-link references in add-container-version-sync-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 07:26:53 -07:00
Erich Blume	51a878cddb	C0: review navidrome reference doc	2026-04-18 20:25:19 -07:00
Erich Blume	deedeecef9	C0: adopt AGENTS.md as canonical agent config	2026-04-18 20:15:30 -07:00
Erich Blume	71c1c453d6	Fetch job logs via SSH to indri instead of Forgejo web endpoint Forgejo's web action routes don't support API token auth for private repos (only session cookies or public access). Switch log fetching to read the zstd-compressed log files directly from indri via SSH — Forgejo stores all runner logs on disk regardless of which runner executed the job. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 17:08:46 -07:00
Erich Blume	4f5a963ef6	Add API token auth and git remote detection to runner-logs runner-logs now always authenticates with the Forgejo API token (via --token flag, FORGEJO_TOKEN env, or 1Password) so it works on private repos. The --repo default is auto-detected from the git remote origin URL instead of being hardcoded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 16:39:21 -07:00
Erich Blume	1d62653871	Fix forge.eblu.me static assets by adding missing Host header The static asset cache block (css/js/png/etc) was missing proxy_set_header Host, so Caddy received "forge.eblu.me" instead of "forge.ops.eblu.me" and couldn't route the request. HTML loaded fine because the main location / block had the header. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 16:00:56 -07:00
Erich Blume	55abb17f50	Add resource limits to ArgoCD pods to prevent unbounded consumption All 7 ArgoCD containers had no resource limits, allowing them to consume unlimited CPU/memory during node pressure events. This contributed to cluster-wide probe timeout cascades on minikube-indri. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 13:04:27 -07:00
Forgejo Actions	bdfcb4b677	Update docs release to v1.16.0 - Built changelog from towncrier fragments [skip ci]	2026-04-18 10:00:54 -07:00
Erich Blume	d26a6ae3b2	Update docs for Caddy routing and direct WireGuard peering v1.16.0 Comprehensive docs pass reflecting the new Fly proxy architecture: - Fly proxy routes through Caddy on indri (not per-service TS Ingress) - Direct WireGuard peering via --port=41641 pinning - DERP relay performance lesson in Tailscale docs - Caddy now in public traffic path - indri tagged as flyio-target - Removed fly-reload references - Updated architecture diagrams and per-service setup guide - Added changelog fragment Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 09:57:30 -07:00
Erich Blume	12b2786ca2	Route Fly proxy through Caddy on indri for direct WireGuard peering Tailscale Ingress pods in k8s can't establish direct WireGuard connections (stuck behind pod-network NAT → DERP relay → 20s latency). Indri's host-level Tailscale CAN peer directly with Fly. Change all nginx upstreams to route through Caddy on indri instead of per-service Tailscale Ingress endpoints. Tag indri as flyio-target in the Tailscale ACL so the Fly proxy can reach it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 09:40:20 -07:00
Erich Blume	bca4c2bede	Expose Tailscale WireGuard UDP port on Fly proxy Enable direct peer-to-peer WireGuard connections by pinning tailscaled to port 41641 and exposing it as a UDP service. Without this, all traffic routes through Tailscale DERP relays causing 20+ second latency. Requires dedicated IPv4 (allocated: 168.220.82.221). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 09:17:03 -07:00
Erich Blume	c8da243663	Run alloy-tracing as root for eBPF capabilities The nix-built Alloy image sets User=65534 (nobody). Even with privileged: true, a non-root user gets no effective capabilities (CapEff=0). Override with runAsUser: 0 so Beyla gets CAP_BPF and CAP_SYS_ADMIN needed for eBPF instrumentation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 08:42:26 -07:00
Erich Blume	3a2913ba1f	Allow BPF in privileged containers on ringtail NixOS defaults kernel.unprivileged_bpf_disabled=2, which blocks BPF syscalls outside the init namespace even with CAP_BPF. Set to 1 so privileged containers (Beyla/Alloy tracing) can create BPF maps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 08:32:30 -07:00
Erich Blume	24f3f9b24a	Raise k3s memlock rlimit for eBPF tracing on ringtail Beyla (alloy-tracing) has been failing since April 13 with "failed to set memlock rlimit: operation not permitted" because k3s inherits the default 8MB memlock limit. Set LimitMEMLOCK=infinity on the k3s systemd service so privileged containers can use eBPF. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 08:27:39 -07:00
Forgejo Actions	a72a2c2bd4	Update docs release to v1.15.7 - Built changelog from towncrier fragments [skip ci]	2026-04-18 08:14:58 -07:00
Erich Blume	9bafe85b2b	Add teslamate extensions to DR restore procedure v1.15.7 The earthdistance extension (depends on cube) must be created before restoring the teslamate database — discovered missing after 2026-04-13 DR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 08:12:26 -07:00
Erich Blume	b4472c7849	Deploy devpi 6.19.3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 08:04:23 -07:00
Erich Blume	4dab6d11bb	Add remote commit check to container-build-and-release Queries the Forgejo API to verify the target commit exists on the remote before dispatching a build, preventing wasted CI runs on unpushed commits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 08:03:21 -07:00
Erich Blume	37b8a21524	Migrate devpi to Dagger build and bump to 6.19.3 Replace Dockerfile with container.py for native Dagger builds. Bump devpi-server 6.19.1→6.19.3, devpi-web 5.0.1→5.0.2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 07:57:05 -07:00
Erich Blume	fe0e913963	Switch Fly proxy to upstream keepalive pools (#337) ## Summary - Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools - Reuses TLS connections through the Tailscale tunnel instead of handshaking per request - Add `mise run fly-reload` for nginx config reload without full redeploy (re-resolves upstream DNS) ## Trade-off DNS is resolved at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule), `mise run fly-reload` is needed. A Grafana alert will be added to detect this. ## Still TODO on this branch - [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder) - [ ] Docs pass - [ ] Deploy from branch and verify latency improvement - [ ] Changelog fragment 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #337	2026-04-17 16:39:52 -07:00
Erich Blume	54b1cee950	Fix Connection header: only send 'upgrade' for WebSocket requests Was sending Connection: upgrade on every proxied request, which is semantically wrong for normal HTTP traffic. Use a map to conditionally send 'upgrade' only when the client requests a WebSocket switch, 'close' otherwise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 15:27:40 -07:00
Erich Blume	1c0ee099fb	Move forge-specific latency panels to Forgejo dashboard Fly.io dashboard keeps aggregate all-hosts p50/p90/p99. Forge-filtered upstream response time panel moves to Forgejo's "Public Proxy" section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 15:13:40 -07:00
Erich Blume	d7af004842	Add Forgejo metrics + upstream latency histogram to Fly proxy dashboard - Enable Forgejo /metrics endpoint (app.ini [metrics] section) - Add Alloy scrape target for Forgejo metrics on indri - Add upstream_response_time histogram to Fly proxy Alloy config - Replace single p95 panel with p50/p90/p99 + upstream breakdown filtered to forge.eblu.me host Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 15:05:59 -07:00
Erich Blume	8fccbda573	Extend Fly proxy latency histogram buckets to 60s Previous max bucket was 10s — all slower requests collapsed into +Inf, making p50/p90/p99 unreadable during the Forgejo archive DoS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 14:50:28 -07:00


989 commits