Add managed role for authentik user on blumeops-pg CNPG cluster,
with ExternalSecret pulling password from 1Password item
"Authentik (blumeops)".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Image registry.ops.eblu.me/blumeops/authentik:v1.0.0-nix built
via Nix on ringtail and verified in zot registry.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Nix-built container using pkgs.authentik with ak entrypoint.
Includes bashInteractive (ak is a bash wrapper), cacert, tzdata.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Build artifacts (container images, git tags) are independent of branch
lifecycle and don't need to be deferred or reset during Mikado iterations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mikado cards are discovered through failed attempts, not designed
upfront — they don't belong in plans/. Cards now live where they
topically belong (how-to/authentik/ for this chain). Updated
agent-change-process to document this convention.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lessons learned from first C2 attempt (deploy-authentik):
- When an attempt fails, reset code changes before committing cards
- Cherry-pick doc commits onto clean base if code/docs got mixed
- Open a PR early so the user can review the Mikado graph evolving
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Attempted deployment fails on three independent blockers:
1. Container image doesn't exist (build-authentik-container)
2. PostgreSQL database doesn't exist (provision-authentik-database)
3. 1Password secrets don't exist (create-authentik-secrets)
Created cards for each and added requires to goal card.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Strip detailed phase instructions from deploy-authentik plan (400→50 lines)
- Retain architecture decisions (ringtail, CNPG on indri, Nix containers, kustomize, Tailscale+Caddy) and open questions
- Add `status: active` frontmatter — now visible as a root goal in `mise run docs-mikado`
- Update plans index to reflect Active (C2) status
This is the first real use of the C2 Mikado chain system from #225. Future sessions will discover prerequisites, create sub-cards with `requires`, and work leaf nodes first.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/226
## Summary
- New plan document at `docs/how-to/plans/deploy-authentik.md` covering full Authentik deployment
- 6 phases: Helm analysis, prerequisites (CNPG/Redis/1Password), Nix containers, kustomize manifests, networking, monitoring
- Authentik replaces Dex as the identity provider for central user management and multi-protocol SSO
- Updated plans index and how-to index
## Deployment and Testing
- [x] Pre-commit hooks pass (docs-check-links, docs-check-index, docs-check-frontmatter)
- [ ] Review plan content for accuracy and completeness
- No deployment needed — documentation only
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/224
## Summary
- Create Dex reference card (`docs/reference/services/dex.md`) with quick reference, architecture, identity source, storage, OIDC clients, secrets, and endpoints
- Write federated login explanation article (`docs/explanation/federated-login.md`) covering the Dex + Forgejo two-layer auth model, login flow, and break-glass access
- Add Dex to `services-check` (HTTP health endpoint + k3s pod check)
- Update Grafana docs with new Authentication section documenting SSO via Dex
- Update Forgejo docs with OAuth2 Provider section documenting its role as upstream identity source
- Add Dex to ringtail workloads table and reference service index
- Move `adopt-oidc-provider` plan to `completed/` with final design reflecting actual implementation
## Test plan
- [ ] `mise run services-check` passes (includes new Dex checks)
- [ ] `docs-check-links` passes (all wiki-links resolve)
- [ ] `docs-check-index` passes (new docs are indexed)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/223
## Summary
- Synced driveway_entrance zone coordinates from live Frigate config (adjusted mask boundaries)
- Added `inertia: 3` and `loitering_time: 0` to driveway_entrance zone
- Expanded review alerts to require either `driveway_entrance` or `driveway` zone (was entrance only)
- Updated frigate-notify config to allow alerts from both `driveway_entrance` and `driveway` zones
## Deployment and Testing
- [ ] Merge and sync frigate ArgoCD app on ringtail
- [ ] Sync frigate-notify (restart pod to pick up ConfigMap change)
- [ ] Verify alerts fire for person/car in driveway zone
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/219
## Summary
- Delete `ansible/roles/frigate_detector/` and remove from indri playbook — the Apple Silicon Detector is retired
- Move Mosquitto (MQTT) ArgoCD app from indri minikube to ringtail k3s
- Move ntfy ArgoCD app from indri minikube to ringtail k3s
- Update Frigate docs to reflect detector removal and planned RTX 4080 migration
- Manifests are reused as-is (same `argocd/manifests/mosquitto/` and `argocd/manifests/ntfy/`), just pointed at ringtail
## Deployment
After merge:
1. Sync indri ArgoCD `apps` app with prune to remove old mosquitto/ntfy apps:
```
argocd app sync apps --prune
```
2. Sync new ringtail apps:
```
argocd app sync mosquitto-ringtail
argocd app sync ntfy-ringtail
```
3. Manually clean up the detector LaunchAgent on indri:
```
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist'
ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist'
```
## Notes
- Frigate on indri will lose MQTT/ntfy connectivity — this is expected (user confirmed no downtime concerns)
- ntfy Tailscale Ingress hostname `ntfy` will transfer from indri ProxyGroup to ringtail ProxyGroup
- Caddy on indri proxies `ntfy.ops.eblu.me` → `ntfy.tail8d86e.ts.net`, so no Caddy changes needed
- Frigate + frigate-notify will be ported to ringtail in a follow-up PR
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/216
## Summary
- Add `containers/nettest/default.nix` using `dockerTools.buildLayeredImage` with curl, jq, dnsutils, cacert, and bash — equivalent to the existing Dockerfile
- Update `container-tag-and-release` to require `--nix` or `--dockerfile` flag when both build types exist for a container
- Update `container-list` to show `[dockerfile+nix]` label when both exist
## Deployment and Testing
- [ ] SSH to ringtail, run `nix build -f containers/nettest/default.nix -o result` to verify the nix expression builds
- [ ] Tag `nettest-nix-v1.0.0`, confirm `build-container-nix` workflow runs on `nix-container-builder` runner and pushes to registry
- [ ] Smoke test on ringtail k3s: `kubectl run nettest --image=registry.ops.eblu.me/blumeops/nettest:v1.0.0 --restart=Never && kubectl logs nettest`
- [ ] Verify `mise run container-list` shows `[dockerfile+nix]` for nettest
- [ ] Verify `mise run container-tag-and-release nettest v1.1.0` prompts for build type
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/214
## Summary
- Replace `changed_when: true` with `register` + output inspection on the two 1Password secret tasks in `ringtail.yml`
- Tasks now correctly report `ok` when the secret content hasn't changed, and `changed` only when `kubectl apply` outputs `configured` or `created`
## Test plan
- [ ] Run `mise run provision-ringtail` twice — second run should show both tasks as `ok` not `changed`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/213
## Summary
- Adds `inhibit_idle fullscreen` window commands to sway config on ringtail
- Covers both Wayland-native (`app_id`) and XWayland (`class`) windows
- Prevents swayidle from locking the screen during gamepad-only gaming sessions where controller input isn't detected by the Wayland idle tracker
## Notes
This is a blanket fullscreen inhibit. A more targeted approach (daemon monitoring `/dev/input` gamepad events) may be desired later to allow idle lock during long-running fullscreen apps like Factorio.
## Deployment and Testing
- [ ] `mise run provision-ringtail` to deploy
- [ ] Run a fullscreen app and verify swayidle doesn't lock after 15 minutes
- [ ] Verify lock still activates when no fullscreen window is present
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/212
## Summary
- Configure **swayidle** to lock screen (swaylock) after 15 minutes of inactivity
- Turn off display (DPMS) after 60 minutes, auto-restore on activity
- **swaylock** themed with Catppuccin Macchiato to match existing Sway config
- Add `Mod4+l` keybinding for manual screen lock
- Add PAM service for swaylock authentication
- Disable system suspend/hibernate entirely (workstation should never sleep)
## What changes
All changes in `nixos/ringtail/configuration.nix`:
- `security.pam.services.swaylock` — required for swaylock to authenticate on NixOS
- `systemd.sleep.extraConfig` — blocks all sleep/hibernate modes
- `programs.swaylock` (home-manager) — lock screen appearance config
- `services.swayidle` (home-manager) — idle timeout daemon with lock + DPMS events
- New keybinding `Mod4+l` for manual lock
## Deployment and Testing
- [ ] `mise run provision-ringtail`
- [ ] Verify swayidle is running: `systemctl --user status swayidle`
- [ ] Test manual lock with `Super+l`
- [ ] Verify display DPMS off after idle (can lower timeout temporarily to test)
- [ ] Confirm machine does not suspend: `systemctl status sleep.target`
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/211
k3s kubectl on ringtail needs KUBECONFIG set since the eblume
user doesn't have it in their default environment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
git ls-remote returns multiple lines when a mirror ref exists
(e.g. refs/remotes/remote_mirror_*/main). Take only the first
line to avoid a false mismatch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
Extends ringtail from a desktop/gaming NixOS box into an infrastructure node with a k3s cluster, secrets management, and a Forgejo Actions
runner for building containers with Nix.
### K3s cluster
- Single-node k3s with Traefik/ServiceLB/metrics-server disabled (minimal footprint)
- TLS SAN set to `ringtail.tail8d86e.ts.net` so ArgoCD on indri can manage it via Tailscale
- Containerd registry mirrors pull through Zot on indri (`k3s-registries.yaml`)
- Tailscale interface added to `trustedInterfaces` for cross-node ArgoCD access
- `kubectl` added to system packages
### 1Password Connect + External Secrets Operator
- Four new ArgoCD apps targeting `k3s-ringtail`: `1password-connect-ringtail`, `external-secrets-crds-ringtail`, `external-secrets-ringtail`,
`external-secrets-config-ringtail`
- Reuses the same Helm charts/values as indri, just pointed at ringtail's k3s API server
- Bootstrap secrets (`op-credentials`, `onepassword-token`) provisioned by Ansible pre_tasks via `op read`, then applied to the `1password`
namespace in post_tasks
### Systemd Forgejo Actions runner
- Native `services.gitea-actions-runner` with `forgejo-runner` package — no DinD, no k8s pod, runs directly on the NixOS host
- Label `nix-container-builder:host` — jobs execute on the host with `nix`, `skopeo`, `nodejs`, etc. in PATH
- Registration token fetched from 1Password (`Forgejo Secrets/runner_reg`) by Ansible and written to `/etc/forgejo-runner/token.env`
- Runner's dynamic user (`gitea-runner`) added to `nix.settings.trusted-users` for nix daemon access
### Nix container build workflow
- New `.forgejo/workflows/build-container-nix.yaml` triggers on `*-nix-v[0-9]*` tags (e.g. `nettest-nix-v1.0.0`)
- Builds with `nix build -f containers/<name>/default.nix`, pushes to Zot via `skopeo copy`
- Existing Dockerfile workflow guarded with `if: !contains(github.ref_name, '-nix-v')` to avoid double-triggering
### Mise task updates
- `container-tag-and-release` auto-detects `default.nix` vs `Dockerfile` and uses the appropriate tag format (`-nix-v` vs `-v`)
- `container-list` shows build type indicator (`[nix]` / `[dockerfile]`)
## Post-merge
1. `mise run provision-ringtail` — deploys k3s token, runner token, NixOS rebuild
2. Register k3s cluster in ArgoCD (first time only):
```fish
ssh ringtail 'sudo cat /etc/rancher/k3s/k3s.yaml' | \
sed 's|127.0.0.1|ringtail.tail8d86e.ts.net|' > /tmp/k3s-ringtail.yaml
set -x KUBECONFIG /tmp/k3s-ringtail.yaml
argocd cluster add default --name k3s-ringtail
3. Sync ArgoCD apps in order: 1password-connect-ringtail -> external-secrets-crds-ringtail -> external-secrets-ringtail ->
external-secrets-config-ringtail
4. Verify runner: ssh ringtail 'systemctl status gitea-runner-nix-container-builder'
5. Check Forgejo admin panel for ringtail-nix-builder runner online
6. Test: create containers/<name>/default.nix, tag with <name>-nix-v0.1.0
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/209
nixos-rebuild can dirty the tree (e.g. flake.lock updates), which
blocks the Ansible git module. Force ensures we always reset to the
upstream state.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NixOS doesn't include Python by default. Ansible needs it on the
managed host for module execution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sway/wlroots refuses to start on proprietary NVIDIA by default.
Add --unsupported-gpu flag and disable hardware cursors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Bump Frigate image from `0.16.4-standard-arm64` to `0.17.0-rc2-standard-arm64`
- Adapt `record` config to 0.17 schema: `retain.days`/`mode: all` → `continuous.days`
- Update service docs and version tracker
This is the first step toward the Apple Silicon ZMQ detector. The existing ONNX detector is kept so we can validate the upgrade independently.
## What is NOT changing
- Detector config (still `type: onnx` with YOLO-NAS-s)
- go2rtc streams, MQTT, cameras, zones, review rules
- frigate-notify, storage PVs, Grafana dashboard
## Deployment and Testing
- [ ] `argocd app set frigate --revision upgrade-frigate-0.17 && argocd app sync frigate`
- [ ] Pod starts, `/api/version` returns `0.17.0-rc2`
- [ ] No config errors in pod logs
- [ ] Frigate web UI loads at `https://nvr.ops.eblu.me`
- [ ] Live view works, detection running (`/api/stats` shows `detection_fps > 0`)
- [ ] Recordings being created (`/api/recordings/summary`)
- [ ] MQTT events flowing (check frigate-notify logs)
- [ ] After merge: `argocd app set frigate --revision main && argocd app sync frigate`
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/205
## Summary
- Cap detect FPS to 2 to prevent recording segment backlog from ONNX inference bottleneck (~750ms/frame on ARM64 CPU)
- Sync motion masks from live config (added second mask area)
- Update driveway_entrance zone coordinates from live config
- Add explicit alert labels `[person, car]` while keeping `required_zones: [driveway_entrance]`
## Context
The "No frames have been received" error on the gablecam live view was caused by the detect stream falling behind — ONNX YOLO-NAS-s takes ~750ms per inference on ARM64 CPU, but the sub-stream sends 5 FPS. This caused recording segments to pile up and the ffmpeg watchdog to repeatedly kill/restart the process, creating gaps in the live view.
## Test plan
- [ ] Sync ArgoCD `frigate` app to branch and verify pod restarts cleanly
- [ ] Check `/api/stats` — `skipped_fps` should drop significantly, `process_fps` should be close to 2
- [ ] Verify live view at https://nvr.ops.eblu.me/#gablecam no longer shows "No frames" error
- [ ] Verify detections and alerts still work in the driveway_entrance zone
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/204
## Summary
- Bump External Secrets Operator Helm chart from `helm-chart-1.3.1` to `helm-chart-2.0.0` (operator v1.3.2)
- Updates both the operator app and CRDs app `targetRevision`
- No Helm values changes needed — `installCRDs`, `resources`, `webhook`, `certController` keys are unchanged
## Breaking changes in chart 2.0.0
- **Removed providers:** Alibaba and Device42 (unmaintained) — does not affect our 1Password setup
- **Templating engine v1 deprecated** — our ExternalSecrets don't set `engineVersion`, so they use the default (v2)
- **Webhook `failurePolicy`** for SecretStore is now dynamic
## Deployment
1. Sync CRDs first: `argocd app set external-secrets-crds --revision update/external-secrets-helm-2.0.0 && argocd app sync external-secrets-crds`
2. Sync operator: `argocd app set external-secrets --revision update/external-secrets-helm-2.0.0 && argocd app sync external-secrets`
3. Verify: `kubectl --context=minikube-indri -n external-secrets get pods`
4. After merge, set both apps back to `--revision main`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/203
## Summary
- Add `containers/ntfy/Dockerfile` — three-stage build (Node web UI, Go+CGO server, Alpine runtime) pinned to commit SHA `a03a37fe` (v2.17.0), sourced from forge mirror
- Update ntfy deployment image from `binwiederhier/ntfy:v2.17.0` to `registry.ops.eblu.me/blumeops/ntfy:v1.0.0`
- Note fish shell in CLAUDE.md
## Deployment
After merge, release the container image:
```fish
mise run container-tag-and-release ntfy v1.0.0
```
Then sync:
```fish
argocd app sync ntfy
```
## Test plan
- [x] `docker build` succeeds
- [x] `dagger call build --src=. --container-name=ntfy` succeeds (exit 0, container ID printed)
- [x] `ntfy --help` works in built container
- [ ] Tag and release `ntfy-v1.0.0` after merge
- [ ] Verify ntfy pod starts with new image
- [ ] Verify health endpoint responds at `ntfy.ops.eblu.me/v1/health`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/202
## Summary
- Upgrade ntfy from v2.11.0 to v2.17.0 (6 minor releases, no breaking changes)
- Add reference doc for ntfy service
- Add reference doc for frigate service (ntfy's sole producer via frigate-notify)
- Update reference index and service-versions.yaml tracking
## Notable upstream changes (v2.12.0–v2.17.0)
- **v2.14.0:** Declarative users/ACL config in files
- **v2.15.0:** `require-login` flag for topic-level auth
- **v2.16.0:** Dead man's switch (heartbeat) notifications, notification update/delete
- **v2.17.0:** Priority templating, crash fixes (nil pointer panics)
## Deployment and Testing
- [ ] ArgoCD sync ntfy after merge
- [ ] Verify ntfy pod healthy with new image
- [ ] Send a test notification via `curl -d "test" https://ntfy.ops.eblu.me/test`
- [ ] Verify frigate-notify still delivers alerts to ntfy
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/201