Claude Code only auto-loads CLAUDE.md. The prose shim told agents to go
read AGENTS.md, which is easy to skip. Replacing the shim with
`@AGENTS.md` inlines AGENTS.md content into the session prompt, so the
startup rules (ai-docs, blumeops-tasks, change classification) land in
context unconditionally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the nebulous gandi-operations how-to into two single-topic cards
(manage-eblu-me-dns, rotate-gandi-pat) and adds a mise task for the
recurring _acme-challenge TXT cleanup needed due to a value-comparison
bug in libdns/gandi v1.1.0 that prevents certmagic's cleanup phase from
removing presented TXT values.
The gandi reference card is updated to drop the false "different
credential from Pulumi PAT" claim — verified during the 2026-04-27
incident that Caddy and Pulumi share a single PAT.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the procedure used to restore mealie's SQLite DB from a borgmatic
archive after the post-DR wipe: extract from borg, snapshot the wiped DB,
swap via a helper pod on the ReadWriteOnce PVC, fix UID 911 ownership.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extend (not replace) home-manager's default sway keybindings via
lib.mkOptionDefault, with lib.mkForce on the custom overrides that
conflict with defaults. Add Mod+F1 cheatsheet binding (fuzzel-filterable).
Move fuzzel's border-radius/border-width out of [main] into a proper
[border] section with the expected short names.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that argocd's Authentik OAuth2 client is public, `argocd login --sso`
works for day-to-day use. Promote it to the default in AGENTS.md,
argocd-cli reference, and troubleshooting; keep the admin/password flow
documented as a break-glass fallback for when Authentik is unavailable.
Also drops --grpc-web from every interactive login command; it is
confirmed extraneous (login succeeds without it). The flag is left
untouched in CI workflows and `argocd cluster add`; those are different
contexts that I didn't re-test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that argocd's Authentik OAuth2 client is public (PKCE-only), the
client_secret plumbing is dead code:
- delete argocd-oidc-authentik ExternalSecret and drop it from kustomization
- remove AUTHENTIK_ARGOCD_CLIENT_SECRET env from authentik-worker
- remove argocd-client-secret mapping from authentik-config ExternalSecret
The argocd-client-secret field in the 1Password "Authentik (blumeops)"
item is now unreferenced and can be deleted there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Changes argocd's Authentik OAuth2 client from confidential to public and
drops the clientSecret from argocd-cm. Public + PKCE works for both the
web UI (argocd-server backend) and the argocd CLI (`argocd login --sso`)
without a shared secret, matching OAuth 2.1 guidance.
The switch from confidential to public was needed because the CLI can't
hold a client secret. The alternative (the "cliClientID" pattern with a
separate public client) was awkward under Authentik's per-app issuer
model: it requires an issuer shared across apps, which Authentik doesn't
provide.
Follow-up: remove the now-dead AUTHENTIK_ARGOCD_CLIENT_SECRET env wiring
and the argocd-oidc-authentik ExternalSecret once verified.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds http://localhost:8085/auth/callback to the ArgoCD OAuth2 provider's
redirect_uris so `argocd login --sso` works. Loopback redirect is the
RFC 8252 pattern for native CLI apps; PKCE (already enabled) covers the
code-interception risk.
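
For reference, a minimal sketch of the PKCE S256 derivation from RFC 7636
that the loopback flow relies on. This is illustrative only, not the code
argocd or Authentik actually runs; the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Return a random URL-safe verifier (43 characters for 32 random bytes)."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(verifier)), padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge with the authorization request and the
verifier with the token request, so a code intercepted on the loopback
redirect is useless without the verifier.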
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After dispatching, poll the Forgejo API for the run matching our
head_sha and print `mise run runner-logs <N>` so the suggested monitor
command is one copy-paste away. Falls back to the bare command if the
poll times out.
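
A rough sketch of the poll-then-fall-back behaviour described above. The
`list_runs` callable and the `run_number` field name are hypothetical
stand-ins for whatever the Forgejo API call returns; only `head_sha` and
the `mise run runner-logs <N>` command come from this change.

```python
import time
from typing import Callable, Iterable, Mapping, Optional


def find_run_number(
    list_runs: Callable[[], Iterable[Mapping[str, object]]],
    head_sha: str,
    timeout_s: float = 30.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll list_runs() until a run with our head_sha appears or we time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        for run in list_runs():
            if run.get("head_sha") == head_sha:
                return int(str(run["run_number"]))  # hypothetical field name
        if time.monotonic() >= deadline:
            return None
        sleep(interval_s)


def monitor_hint(run_number: Optional[int]) -> str:
    """Build the suggested monitor command; bare command if the poll timed out."""
    if run_number is None:
        return "mise run runner-logs"
    return f"mise run runner-logs {run_number}"
```

Injecting `list_runs` and `sleep` keeps the loop testable without a live
Forgejo instance.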
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tailscale operator still defaults to privileged proxy pods with no
seccomp profile (issue #7359 open upstream). Control remains valid.
Added note about ProxyClass + device plugin remediation path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The upstream binary expects CWD=/app (relative config.yml lookup,
lumberjack logfile at ./log/app.log). Without this, the pod crashed on
startup — the ConfigMap-mounted /app/config.yml wasn't found and zerolog
spammed "mkdir log: permission denied" as it tried to create ./log at
/ as nonroot.
Creates /app as 1777 (tmp-style) so nonroot can write logs; WorkingDir
set to /app so the default config path resolves correctly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Built from main in run #516 after #339 merged. Follows the navidrome
kustomization convention (deployment image = local ref + :kustomized,
kustomization override = newTag only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
- Mirrors `github.com/0x2142/frigate-notify` at `v0.5.4` to `forge.ops.eblu.me/mirrors/frigate-notify`.
- Adds `containers/frigate-notify/default.nix` — `buildGoModule` + `dockerTools.buildLayeredImage`, following the `ntfy` pattern.
- Uses `-tags goolm` to avoid the libolm CGO dependency (the matrix notifier is imported unconditionally upstream, but we only use ntfy alerts).
- Runs as nonroot (UID 65534), exposes port 8000, bundles `cacert`/`tzdata`.
## Why
Move `ghcr.io/0x2142/frigate-notify:v0.5.4` (ringtail-deployed) under local control. Aligns with the [[indri → ringtail migration plan]] and the `default.nix` convention for ringtail-targeted containers documented in [[build-container-image]].
## Verification
- `dagger call build-nix --src=. --container-name=frigate-notify export --path=./out.tar.gz` produces a valid 20MB docker archive (10 layers) with `blumeops/frigate-notify` tag locally.
- Hashes pinned for `fetchgit` (src) and `vendorHash` (go modules).
## Follow-up (post-merge)
1. `mise run container-build-and-release frigate-notify` — release from main SHA.
2. C0 follow-up: update `argocd/manifests/frigate/kustomization.yaml` image ref to `registry.ops.eblu.me/blumeops/frigate-notify:v0.5.4-<sha>-nix`.
3. ArgoCD auto-syncs the deployment.
## Test plan
- [ ] `dagger call build-nix` succeeds from a clean checkout.
- [ ] `mise run container-build-and-release frigate-notify --dry-run` looks correct.
- [ ] After release + kustomization swap: frigate-notify pod comes up healthy on ringtail; ntfy alerts still fire on Frigate events.
Reviewed-on: #339
Swaps the k8s runner label from the local bootstrap tag (v0.20.6-9b6be09)
to the equivalent image rebuilt by CI from main. Functionally identical;
closes the bootstrap loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Points the k8s Forgejo runner label at the locally-bootstrapped
runner-job-image built from the Alpine container.py on this branch.
Once merged, CI will rebuild the same image from the same SHA.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps the Dagger engine/CLI from v0.20.1 to v0.20.6 (mise pin, dagger.json
engineVersion, SDK regen) and rewrites the runner-job-image container as a
native Dagger pipeline on Alpine 3.23 using the shared alpine_runtime helper,
replacing the Debian-based Dockerfile. All Forgejo Actions in this repo use
actions/checkout (a JS action), so musl is not a compatibility concern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
- consolidate forgejo-runner how-to docs into current cards
- upgrade the k8s forgejo-runner deployment to the latest v12.8.x runner image
- switch the k8s runner from first-boot register flow to declarative server.connections config
- keep the runner image on the native Dagger build path and update the surrounding manifests/secrets
## Notes
- PR opened early for C1 review
- implementation and deployment verification will follow in subsequent commits
Reviewed-on: #338
Historical one-shot fix from the zot hardening chain — knowledge is
self-evident in containers/ntfy/default.nix and container-version-check
regex. Should have been removed at mikado finalization. Scrubbed the two
wiki-link references in add-container-version-sync-check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Forgejo's web action routes don't support API token auth for private
repos (only session cookies or public access). Switch log fetching to
read the zstd-compressed log files directly from indri via SSH —
Forgejo stores all runner logs on disk regardless of which runner
executed the job.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
runner-logs now always authenticates with the Forgejo API token
(via --token flag, FORGEJO_TOKEN env, or 1Password) so it works on
private repos. The --repo default is auto-detected from the git
remote origin URL instead of being hardcoded.
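
A sketch of the two lookups described above, under assumptions: the regex
covers common HTTPS and SSH remote forms, `read_secret` is a hypothetical
callable standing in for the 1Password lookup, and the owner/repo names in
the usage are placeholders. The real task may differ in detail.

```python
import os
import re
from typing import Callable, Mapping, Optional

# Matches https://host/owner/repo(.git), ssh://user@host/owner/repo(.git),
# and scp-style user@host:owner/repo(.git).
_REMOTE_RE = re.compile(
    r"^(?:https?://[^/]+/|ssh://[^/]+/|[^@/]+@[^:/]+:)"
    r"(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def repo_from_remote(url: str) -> Optional[str]:
    """Return "owner/repo" parsed from a git remote URL, or None if unrecognised."""
    match = _REMOTE_RE.match(url.strip())
    return f"{match['owner']}/{match['repo']}" if match else None


def resolve_token(
    flag_value: Optional[str],
    env: Mapping[str, str] = os.environ,
    read_secret: Optional[Callable[[], str]] = None,
) -> Optional[str]:
    """Pick the token in precedence order: --token flag, FORGEJO_TOKEN, secret store."""
    if flag_value:
        return flag_value
    env_value = env.get("FORGEJO_TOKEN")
    if env_value:
        return env_value
    return read_secret() if read_secret is not None else None
```

For example, `repo_from_remote("git@forge.ops.eblu.me:example-owner/example-repo.git")`
returns `"example-owner/example-repo"`.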
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The static asset cache block (css/js/png/etc) was missing
proxy_set_header Host, so Caddy received "forge.eblu.me" instead of
"forge.ops.eblu.me" and couldn't route the request. HTML loaded fine
because the main location / block had the header.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 7 ArgoCD containers had no resource limits, allowing them to consume
unlimited CPU/memory during node pressure events. This contributed to
cluster-wide probe timeout cascades on minikube-indri.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive docs pass reflecting the new Fly proxy architecture:
- Fly proxy routes through Caddy on indri (not per-service TS Ingress)
- Direct WireGuard peering via --port=41641 pinning
- DERP relay performance lesson in Tailscale docs
- Caddy now in public traffic path
- indri tagged as flyio-target
- Removed fly-reload references
- Updated architecture diagrams and per-service setup guide
- Added changelog fragment
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tailscale Ingress pods in k8s can't establish direct WireGuard
connections (stuck behind pod-network NAT → DERP relay → 20s latency).
Indri's host-level Tailscale CAN peer directly with Fly.
Change all nginx upstreams to route through Caddy on indri instead of
per-service Tailscale Ingress endpoints. Tag indri as flyio-target in
the Tailscale ACL so the Fly proxy can reach it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enable direct peer-to-peer WireGuard connections by pinning tailscaled
to port 41641 and exposing it as a UDP service. Without this, all
traffic routes through Tailscale DERP relays, causing 20+ second
latency. Requires a dedicated IPv4 (allocated: 168.220.82.221).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The nix-built Alloy image sets User=65534 (nobody). Even with
privileged: true, a non-root user gets no effective capabilities
(CapEff=0). Override with runAsUser: 0 so Beyla gets CAP_BPF and
CAP_SYS_ADMIN needed for eBPF instrumentation.
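
To check this from inside a container, a small sketch that decodes the
CapEff mask from /proc/<pid>/status text. The capability numbers are the
values from linux/capability.h; the helper names are hypothetical.

```python
CAP_SYS_ADMIN = 21  # bit index from linux/capability.h
CAP_BPF = 39        # bit index from linux/capability.h (Linux 5.8+)


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("CapEff line not found")


def has_cap(mask: int, cap: int) -> bool:
    """Return True if capability bit `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)
```

Usage on Linux: `has_cap(parse_cap_eff(open("/proc/self/status").read()), CAP_BPF)`.
A mask of 0 means neither capability is effective.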
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NixOS defaults to kernel.unprivileged_bpf_disabled=2, which blocks BPF
syscalls outside the init namespace even with CAP_BPF. Set it to 1 so
privileged containers (Beyla/Alloy tracing) can create BPF maps.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Beyla (alloy-tracing) has been failing since April 13 with
"failed to set memlock rlimit: operation not permitted" because k3s
inherits the default 8MB memlock limit. Set LimitMEMLOCK=infinity on
the k3s systemd service so privileged containers can use eBPF.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The earthdistance extension (depends on cube) must be created before
restoring the teslamate database — discovered missing after 2026-04-13 DR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Queries the Forgejo API to verify the target commit exists on the remote
before dispatching a build, preventing wasted CI runs on unpushed commits.
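
A minimal sketch of such a pre-dispatch check, assuming the Gitea-style
`/api/v1/repos/{owner}/{repo}/git/commits/{sha}` endpoint and `token`
authorization header. The endpoint choice, function names, and injectable
`opener` are assumptions, not necessarily what the task uses.

```python
import urllib.error
import urllib.request
from typing import Callable


def commit_url(base_url: str, repo: str, sha: str) -> str:
    """Build the commit lookup URL for an "owner/repo" string and a SHA."""
    return f"{base_url.rstrip('/')}/api/v1/repos/{repo}/git/commits/{sha}"


def commit_exists(
    base_url: str,
    repo: str,
    sha: str,
    token: str,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Return True if the remote knows the commit, False on HTTP 404."""
    request = urllib.request.Request(
        commit_url(base_url, repo, sha),
        headers={"Authorization": f"token {token}"},
    )
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # commit not pushed yet
        raise  # auth or server errors should not be mistaken for "missing"
```

Only a 404 is treated as "not pushed"; other failures propagate so an
expired token does not silently block dispatch.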
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Dockerfile with container.py for native Dagger builds.
Bump devpi-server 6.19.1→6.19.3, devpi-web 5.0.1→5.0.2.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools
- Reuses TLS connections through the Tailscale tunnel instead of handshaking per request
- Add `mise run fly-reload` for nginx config reload without full redeploy (re-resolves upstream DNS)
## Trade-off
DNS is resolved at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule), `mise run fly-reload` is needed. A Grafana alert will be added to detect this.
## Still TODO on this branch
- [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder)
- [ ] Docs pass
- [ ] Deploy from branch and verify latency improvement
- [ ] Changelog fragment
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #337
Was sending Connection: upgrade on every proxied request, which is
semantically wrong for normal HTTP traffic. Use a map to conditionally
send 'upgrade' only when the client requests a WebSocket switch,
'close' otherwise.
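
The mapping logic, modelled in Python for illustration (the real change
lives in the proxy config, not in Python). Forwarding the client's Upgrade
value alongside is an assumption of this sketch; the function name is
hypothetical.

```python
from typing import Dict, Mapping


def proxied_headers(client_headers: Mapping[str, str]) -> Dict[str, str]:
    """Choose the upstream Connection header based on the client's Upgrade header."""
    upgrade = ""
    for name, value in client_headers.items():
        if name.lower() == "upgrade":  # header names are case-insensitive
            upgrade = value
    headers = {"Connection": "upgrade" if upgrade else "close"}
    if upgrade:
        headers["Upgrade"] = upgrade
    return headers
```

A WebSocket handshake (`Upgrade: websocket`) yields `Connection: upgrade`;
an ordinary request yields `Connection: close`.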
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous max bucket was 10s — all slower requests collapsed into +Inf,
making p50/p90/p99 unreadable during the Forgejo archive DoS.
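
To illustrate why the quantiles flattened, a simplified sketch of
Prometheus-style quantile estimation over cumulative buckets. The sample
latencies and the wider bucket bounds in the usage note are made up for
the example and are not the values chosen in this change.

```python
import math
from typing import List, Sequence, Tuple


def cumulative(bounds: Sequence[float], samples: Sequence[float]) -> List[Tuple[float, int]]:
    """Build (upper_bound, cumulative_count) pairs; bounds must end with +Inf."""
    return [(b, sum(1 for s in samples if s <= b)) for b in bounds]


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, int]]) -> float:
    """Estimate a quantile by linear interpolation within the matching bucket."""
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Nothing is known above the last finite bound, so report it.
                return buckets[-2][0]
            if count == prev_count:
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return float("nan")
```

With samples of 12s, 25s, 40s and 55s and bounds `[1, 5, 10, inf]`, every
quantile comes back as 10. With hypothetical bounds `[1, 5, 10, 30, 60, inf]`
the same samples give a p50 of 30 and a p99 of about 59.4.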
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>