Commit graph

732 commits

Author SHA1 Message Date
Forgejo Actions
2f78d180e8 Update docs release to v1.11.3
- Built changelog from towncrier fragments

[skip ci]
2026-02-23 21:04:33 -08:00
9b4951bf94 Improve Mikado process: cycle discipline, reset rigor, --resume enhancements (#261) v1.11.3
## Summary
- **End-of-cycle prompting:** After closing a leaf node and pushing, the agent should prompt the user to review and suggest ending the session rather than rushing into the next leaf
- **Reset rigor:** Reinforced that errors during impl should trigger a branch reset + plan update (not fix-forward). Documented the `git log --oneline --not main` → `git reset --hard` → `git cherry-pick` pattern with clear threshold guidance
- **`--resume` shows PR number:** Queries the Forgejo API for open PRs matching the branch, displays number/title/URL and a hint to run `pr-comments`
- **`--resume` checks git stash:** Shows stash entries as a non-presumptive hint — informs without assuming they apply

## Test plan
- [ ] `mise run docs-mikado --resume` runs without errors (no active chains case)
- [ ] On a mikado branch with an open PR, verify PR info is shown
- [ ] With stashed work, verify stash entries are displayed
- [ ] Review agent-change-process.md for clarity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/261
2026-02-23 21:03:27 -08:00
d05d2fbaff C2: Upgrade Grafana to 12.x with Nix container and Kustomize (#260)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 1s
Build Container (Nix) / build (grafana) (push) Successful in 2s
Build Container / build (grafana) (push) Successful in 7s
## Summary

Mikado chain to upgrade Grafana from 11.4.0 (Helm chart) to 12.x with:
- Home-built Nix container image (`forge.ops.eblu.me/eblume/grafana`)
- Kustomize manifests replacing the Helm chart
- Single-source ArgoCD app

## Chain

Goal: `upgrade-grafana`
Leaves: `build-grafana-container`, `kustomize-grafana-deployment`

Track with: `mise run docs-mikado upgrade-grafana`

## Test plan
- [ ] Container builds successfully via Nix
- [ ] Container pushed to registry
- [ ] Kustomize manifests produce equivalent resources to current Helm
- [ ] Pod runs, UI loads, OIDC works, datasources healthy
- [ ] `mise run services-check` passes

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/260
2026-02-23 18:07:18 -08:00
9b419abf24 Update RUNNER_LABELS to use runner-job-image:v0.19.11-4c5e0f0
Now that the image is built under the new name, point the forgejo
runner at it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:47:14 -08:00
4c5e0f0d16 Rename containers/forgejo-runner to runner-job-image
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (runner-job-image) (push) Successful in 2s
Build Container / build (runner-job-image) (push) Successful in 1m42s
The forgejo-runner container is the CI job execution environment (Dagger,
ArgoCD CLI, etc.), not the runner daemon itself. Rename to runner-job-image
to fix the version-check false positive (Dagger 0.19.11 vs daemon 12.7.0)
and clarify the distinction.

RUNNER_LABELS still references the old image name — will update after
building the image under the new name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:44:51 -08:00
179eca2070 Fix container-build-and-release to resolve refs to full SHA
actions/checkout treats short SHAs as branch name patterns, causing
fetch failures. Always resolve --ref to a full 40-char SHA before
dispatching the workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:27:18 -08:00
7641018c6a Fix container build workflows to checkout dispatch ref
When manually dispatching a container build with --ref, the build job
now checks out the specified commit instead of the branch HEAD. This
allows building containers from feature branches before merging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:24:32 -08:00
c70aff256a Fix mikado-branch-invariant-check not validating incoming commits
The commit-msg hook had `pass_filenames: false`, which prevented
pre-commit from passing the commit message file path. Without it,
the hook only validated existing branch history and never checked
the incoming commit against ordering rules — plan-after-impl was
silently accepted.

Remove `pass_filenames: false` so the pending commit is included
in invariant validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:32:34 -08:00
66b5b32f1d Formalize C0/C1/C2 change classification (#259)
## Summary
- **C0 (Quick Fix):** Now explicitly allows direct-to-main commits with no PR required — for low-risk, fix-forward-safe changes
- **C1 (Human Review):** New docs-first workflow with branch deployment (ArgoCD `--revision`, Ansible from checkout). Includes upgrade criteria for escalation to C2
- **C2 (Mikado Chain):** Introduces the **Mikado Branch Invariant** — strict commit ordering where card-introducing commits come first, followed by code progress, followed by card closures. Branch resets required when new prerequisites are discovered

Updates CLAUDE.md rules (3, 4, 8, 9) to reflect that C0 bypasses branching/PR requirements. Also updates ai-assistance-guide, how-to index, and docs-mikado task description.

## Files changed
- `CLAUDE.md` — rules and classification table
- `docs/how-to/agent-change-process.md` — full process rewrite
- `docs/tutorials/ai-assistance-guide.md` — branching and pitfalls sections
- `docs/how-to/how-to.md` — index description
- `mise-tasks/docs-mikado` — task description
- `docs/changelog.d/formalize-change-classification.doc.md` — changelog fragment

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/259
2026-02-23 16:19:54 -08:00
f05e5cccdf Review Grafana: replace Helm upgrade plan with C2 Mikado chain (#258)
## Summary
- Delete the old 3-phase Helm chart upgrade plan (predates Mikado system)
- Create C2 Mikado chain with goal card `upgrade-grafana` and two leaf prereqs:
  - `kustomize-grafana-deployment` — convert Helm to kustomize manifests
  - `build-grafana-container` — home-built Grafana 12.x image (no upstream containers)
- Record first-ever Grafana review: currently at v11.4.0 on Helm chart 8.8.2
- Update service-versions.yaml, how-to index, and plans index

## Service Review Findings
- Grafana is healthy and synced in ArgoCD
- Running v11.4.0, latest upstream is 12.3.3
- Breaking changes for 12.x are low-risk (React panels only, UIDs compliant)
- PVC is disposable — dashboards and datasources are all config-provisioned

## Deployment and Testing
- [ ] No deployment needed — documentation-only change
- [ ] `docs-check-links` passes
- [ ] `docs-check-index` passes

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/258
2026-02-23 15:06:00 -08:00
2865bf5c27 Review deploy-authentik: rewrite as process guide (#257)
## Summary
- Rewrites deploy-authentik from a historical changelog into a reproducible process guide
- Removes stale version info (`v1.1.2-nix`) and future work section (Forgejo federation is done, rest belongs elsewhere)
- Marks deploy-authentik as completed in plans index and completed archive
- Removes hardcoded image tag from authentik reference card (use `service-versions.yaml`)
- Adds `last-reviewed: 2026-02-23` frontmatter

## Test plan
- [x] All pre-commit hooks pass (docs-check-links, docs-check-index, etc.)
- [x] ArgoCD app verified synced and healthy
- [x] All wiki-links validated

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/257
2026-02-23 14:35:39 -08:00
260b7ca520 Fix dagger call hanging in mise tasks on interactive terminals (#256)
## Summary

- Add `--progress=plain` to all `dagger call` invocations in mise tasks to prevent SIGTTOU hangs

## Root cause

Mise runs task scripts in a child process group that is not the terminal's foreground group. When `dagger call` detects a TTY (inherited from the interactive shell), it tries to render its TUI progress display, which requires terminal ioctls. Since the process is not in the foreground group, the kernel sends SIGTTOU, stopping the process indefinitely.

This only manifests when running from an interactive terminal (e.g. `pre-commit run --all-files` in fish/wezterm). CI and piped contexts are unaffected since there's no TTY.

## Changes

- `mise-tasks/validate-workflows` — add `--progress=plain`
- `mise-tasks/frigate-export-model` — add `--progress=plain`
- `mise-tasks/provision-ringtail` — add `--progress=plain`

## Test plan

- [x] `pre-commit run --all-files` completes without hanging
- [ ] Verify in interactive fish/wezterm terminal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/256
2026-02-23 14:15:58 -08:00
84d2cdcf14 Update tooling dependencies (Feb 2026 cycle)
Pre-commit: trufflehog v3.93.4, ruff v0.15.2, shellcheck v0.11.0.1,
prettier v3.8.1, actionlint v1.7.11

Fly.io: pin nginx 1.28.2-alpine, bump alloy v1.5.1 -> v1.13.1

Forgejo workflows: pin actions/checkout to SHA (v4.3.1)

Mise tasks: normalize httpx>=0.28.0, typer>=0.15.0 across all scripts

Add how-to doc for the monthly tooling dependency update cycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 13:22:09 -08:00
cb9a06bb75 Update tooling dependencies (Feb 2026 cycle) (#254)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m30s
## Summary

Monthly tooling dependency update cycle:

- **Pre-commit hooks**: trufflehog v3.92.5→v3.93.4, ruff v0.14.13→v0.15.2, shellcheck v0.10.0.1→v0.11.0.1, prettier v3.8.0→v3.8.1, actionlint v1.7.10→v1.7.11
- **Fly.io Dockerfile**: pin nginx to 1.28.2-alpine (was unpinned), bump alloy v1.5.1→v1.13.1
- **Mise tasks**: normalize httpx lower bound to >=0.28.0 and typer to >=0.15.0 across all scripts
- **Forgejo workflows**: actions/checkout@v4 is current, no changes needed
- **New how-to doc**: [[update-tooling-dependencies]] documenting this monthly cycle

## No changes needed

- pre-commit-hooks v6.0.0, yamllint v1.38.0, shfmt v3.12.0-2, taplo v0.9.3, ansible-lint 26.1.1 — all already at latest

## Test plan

- [x] `uvx pre-commit run --all-files` — all 24 hooks pass
- [ ] Fly.io deploy (triggered automatically on merge to main via deploy-fly workflow)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/254
2026-02-23 13:08:41 -08:00
75fde54355 Fix Grafana TeslaMate dashboard folder provisioning (#253)
## Summary
- `foldersFromFilesStructure` was `false` in Grafana's sidecar provider config, causing Grafana to ignore the subdirectory structure the sidecar creates from `grafana_folder` annotations
- All 18 TeslaMate dashboards were appearing in the root "Dashboards" folder despite having `grafana_folder: "TeslaMate"` annotations on their ConfigMaps
- Flipping to `true` makes Grafana replicate the sidecar's directory structure as UI folders

## Deployment and Testing
- [ ] Sync `grafana` app: `argocd app sync grafana`
- [ ] Verify TeslaMate dashboards appear under a "TeslaMate" folder in Grafana's dashboard list
- [ ] Verify other dashboards remain in the root "Dashboards" folder

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/253
2026-02-22 18:38:51 -08:00
6871bf32a8 Remove unused gpu panel in frigate dashboard 2026-02-22 18:26:47 -08:00
2c6c6a244a Fix Frigate Prometheus metrics & rebuild Grafana dashboard (#252)
## Summary

- **Prometheus scrape target:** Changed from `frigate.frigate.svc.cluster.local:5000` (broken after ringtail migration) to `nvr.ops.eblu.me` via HTTPS through Caddy on indri
- **Grafana dashboard:** Rebuilt for Frigate 0.17 metrics — 12 panels total:
  - Row 1 (stats): Uptime, Inference Speed, Camera FPS, Detection FPS, GPU Usage, GPU Temp
  - Row 2 (timeseries): CPU Usage, Memory Usage
  - Row 3 (timeseries): Camera FPS + Skipped FPS, GPU Usage + Memory over time
  - Row 4 (timeseries): Storage Usage, Detection Events (rate by camera/label)

## Deployment and Testing

1. Sync prometheus app on branch:
   ```
   argocd app set prometheus --revision fix/frigate-metrics-dashboard && argocd app sync prometheus
   ```
2. Check `prometheus.ops.eblu.me/targets` — frigate job should show UP
3. Sync grafana-config:
   ```
   argocd app sync grafana-config
   ```
4. Check `grafana.ops.eblu.me` — Frigate NVR dashboard should show live data
5. After merge: reset both apps to `--revision main` and sync

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/252
2026-02-22 18:14:17 -08:00
Forgejo Actions
dda7d719b3 Update docs release to v1.11.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-22 17:52:05 -08:00
e655f4556e Upgrade k8s forgejo-runner from v6.3.1 to v12.7.0 (#251) v1.11.2
## Summary

Completes the `upgrade-k8s-runner` mikado chain. Both prerequisites (workflow validation in Dagger, config review against v12 defaults) were resolved in #250.

- Bump runner image `code.forgejo.org/forgejo/runner:6.3.1` → `12.7.0`
- Update `service-versions.yaml` to track new version
- Mark goal card complete (remove `status: active`)

## Deployment and Testing

After merge:
1. `argocd app sync forgejo-runner`
2. Verify runner registers in Forgejo admin → runners
3. Trigger a test workflow (e.g. `branch-cleanup.yaml` manual dispatch)

Rollback: revert image tag to `6.3.1`, push, sync.
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/251
2026-02-22 17:43:39 -08:00
0f6a1898f0 Prepare forgejo-runner v12 upgrade (leaf nodes) (#250)
## Summary
- Review runner config against v12.7.0 defaults — added `shutdown_timeout: 3h`, no breaking changes found
- Add `validate_workflows` Dagger function using `forgejo-runner validate --directory .` inside upstream container
- All 6 workflows pass v12.7.0 schema validation
- Wire `mise run validate-workflows` task and pre-commit hook on `.forgejo/workflows/` changes
- Mark both leaf Mikado cards (`review-runner-config-v12`, `validate-workflows-against-v12`) complete

## Mikado State
After merge, `upgrade-k8s-runner` goal card has no unmet dependencies — ready to execute the actual image bump in a follow-up PR.

## Test Plan
- [x] `dagger call validate-workflows --src=.` passes (all 6 workflows OK)
- [x] Pre-commit hooks pass
- [ ] Reviewer: confirm `shutdown_timeout: 3h` addition to ConfigMap looks reasonable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/250
2026-02-22 17:38:32 -08:00
00b0287bcc Upgrade k8s forgejo-runner from v6.3.1 to v12.x (#249)
## Summary
- C2 Mikado chain for upgrading the k8s forgejo-runner daemon (6 major versions behind)
- Root goal card with two leaf prerequisites: workflow validation and config review
- Ringtail runner is already at ~v12.6.4 via nixpkgs, no work needed there

## Mikado Chain

```
upgrade-k8s-runner (goal)
├── validate-workflows-against-v12 (leaf)
└── review-runner-config-v12 (leaf)
```

Both leaves are actionable now. The biggest risk is workflow schema validation
(introduced in v8/v9) rejecting our existing workflows.

## Next Steps
Work the leaf nodes in a follow-up session, then attempt the goal.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/249
2026-02-22 17:12:45 -08:00
198c810e86 Fix branch-cleanup: fall back to head.label for deleted branches (#248)
## Summary
- Forgejo rewrites `head.ref` to `refs/pull/N/head` once a PR's source branch is deleted from the remote
- The original branch name is preserved in `head.label`
- This was causing 188 out of 246 merged PRs to go undetected by the cleanup script
- Fix: fall back to `head.label` when `head.ref` starts with `refs/pull/`

## Test plan
- [x] Dry run correctly identifies 18 previously-missed local branches
- [x] Live run successfully deleted all 18

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/248
2026-02-22 16:15:51 -08:00
abb7e6fe88 Add branch-cleanup mise task (#247)
## Summary
- New `mise run branch-cleanup` task that finds branches merged into main and deletes them locally and on the Forgejo remote
- Configurable `--cutoff` (default 30 days) skips branches with recent HEAD commits
- Supports `--dry-run`, `--local-only`, `--remote-only` flags
- Interactive confirmation before any deletion

## Test plan
- [x] `mise run branch-cleanup -- --dry-run` shows correct table of candidates
- [ ] Run without `--dry-run` to confirm actual deletion works

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/247
2026-02-22 16:00:09 -08:00
d51c180fe6 Switch Frigate detection model from YOLO-NAS-S to YOLOv9-c (#246)
## Summary
- Replace abandoned YOLO-NAS-S (320x320, `yolonas`) with YOLOv9-c (640x640, `yolo-generic`)
- YOLOv9-c benefits from CUDA Graphs in Frigate 0.17 on the RTX 4080
- Add `export_yolov9` Dagger pipeline and `frigate-export-model` mise task for reproducible model exports
- Model already deployed to `sifaka:/volume1/frigate/models/yolov9-c-640.onnx`

## Config changes
- `model_type: yolonas` → `yolo-generic`
- `input_dtype: int` → `float`
- `width/height: 320` → `640`
- `path:` → `yolov9-c-640.onnx`

## Deployment and Testing
- [ ] Merge and sync Frigate ArgoCD app: `argocd app sync frigate`
- [ ] Verify Frigate starts and detects objects at https://nvr.ops.eblu.me
- [ ] Confirm GPU inference via Frigate system metrics

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/246
2026-02-22 15:14:45 -08:00
2c081eed28 Add Forgejo repository health metrics and Grafana dashboard (#245)
## Summary
- New `forgejo_metrics` Ansible role that queries the Forgejo REST API every 60s and writes Prometheus textfile metrics (open PRs, issues, languages, releases, commits, Actions runs/duration/success)
- Grafana dashboard "Forgejo Repository Health" with 12 panels across 4 rows: overview stats, CI/CD health, repository info, and staleness tracking
- Deletes superseded `forgejo-actions-dashboard` plan doc (this implementation covers a broader scope)

## Deployment and Testing
- [ ] `mise run provision-indri -- --tags forgejo_metrics` to deploy the collector
- [ ] `ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/forgejo.prom'` to verify metrics
- [ ] `argocd app sync grafana-config` to deploy the dashboard
- [ ] Check Grafana dashboard "Forgejo Repository Health" loads with data
- [ ] `mise run services-check` passes

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/245
2026-02-22 11:16:03 -08:00
Forgejo Actions
c21cf54847 Update docs release to v1.11.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-22 10:21:19 -08:00
e41c28ed90 Replace indri-runner-logs with general-purpose runner-logs Typer CLI (#244) v1.11.1
## Summary
- Replace bash `indri-runner-logs` with a Python Typer CLI `runner-logs` that supports filtering by runner host (`indri`, `ringtail`, or `all`) with rich table output
- Add missing `#USAGE` declarations to `docs-review`, `docs-review-stale`, and `service-review` so flags work without the `--` separator
- Update docs references in `review-documentation.md` and `review-services.md` to use the new flag syntax

## Test plan
- [x] `mise run runner-logs all` lists runs from both runners
- [x] `mise run runner-logs ringtail` filters to ringtail-only runs
- [x] `mise run docs-review-stale --threshold 90` works without `--`
- [x] `mise run docs-review --limit 5` works without `--`
- [x] `mise run service-review --limit 3` works without `--`
- [x] Pre-commit hooks pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/244
2026-02-22 10:20:11 -08:00
c897fc8e1f Use Zot registry icon on homepage dashboard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 09:19:46 -08:00
Forgejo Actions
627caeb61f Update docs release to v1.11.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-22 09:16:00 -08:00
c427f04ec4 Review 3 docs: agent-change-process, build-authentik-container, create-authentik-secrets (#243) v1.11.0
## Summary
- Stamped `last-reviewed: 2026-02-22` on three never-reviewed docs
- `agent-change-process.md`: accurate, no content changes
- `build-authentik-container.md`: accurate, container image verified in registry
- `create-authentik-secrets.md`: added note about additional OIDC client secret fields added since original card was written

## Changelog
- `docs/changelog.d/doc-review/agent-change-process.doc.md` (not added — stamp-only, no user-visible change)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/243
2026-02-22 09:12:31 -08:00
529ba10939 Fix frigate-notify: webapi polling, dedup, hi-res snapshots (#242)
## Summary
- Switch from MQTT to webapi polling (v0.5.4 requires only one method)
- Poll every 15s for responsive alerts
- **`notify_once: true`** — one notification per event instead of repeats as object changes zones
- **`nosnap: drop`** — skip events without snapshots (was causing all events to be dropped on v0.3.5)
- **`snap_hires: true`** — use recording stream for higher quality snapshot images

## Deployment and Testing
- [ ] Sync: `argocd app set frigate --revision fix/frigate-notify-config && argocd app sync frigate`
- [ ] Verify pod starts: `kubectl --context=k3s-ringtail -n frigate get pods -l app=frigate-notify`
- [ ] Check logs for successful startup and event processing (no "No snapshot" drops)
- [ ] Wait for a motion event and confirm single ntfy notification with hi-res snapshot
- [ ] After merge: `argocd app set frigate --revision main && argocd app sync frigate`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/242
2026-02-22 09:05:45 -08:00
7dcab826fa Upgrade frigate-notify from v0.3.5 to v0.5.4 (#241)
## Summary
- Service review: upgrade frigate-notify from v0.3.5 to v0.5.4
- No breaking changes for current MQTT + ntfy config
- Notable additions: high-res snapshots, MQTT topic parsing fixes, env var parsing fixes

## Deployment and Testing
- [ ] Sync frigate app on ringtail: `argocd app set frigate --revision review/frigate-notify-v0.5.4 && argocd app sync frigate`
- [ ] Verify pod starts cleanly: `kubectl --context=k3s-ringtail -n frigate get pods`
- [ ] Trigger a test alert (motion event) and confirm ntfy notification arrives
- [ ] After merge: `argocd app set frigate --revision main && argocd app sync frigate`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/241
2026-02-22 08:42:47 -08:00
a5429d5a34 Update ringtail flake inputs, add flake-update pipeline (#240)
## Summary
- Update all ringtail NixOS flake inputs (nixpkgs, disko, home-manager) to latest
- Add `flake_update` Dagger function (`nix flake update`) alongside existing `flake_lock` (`nix flake lock`)
- Add how-to guide for managing the ringtail lockfile
- Update dagger and ringtail reference cards

## Deployment and Testing
- [x] `mise run provision-ringtail` — deployed successfully, `changed=2` (repo + rebuild)
- [x] `mise run services-check` — all services healthy
- [x] Doc link and index checks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/240
2026-02-22 08:17:52 -08:00
b4015153c6 No navidrome authentikation 2026-02-21 20:33:48 -08:00
a82c705bf6 Add SSO login button to Jellyfin login page
Deploy branding.xml with a "Sign in with Authentik" button in the
login disclaimer. Local password login remains available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 20:08:57 -08:00
07fb48626d Add Authentik SSO integration for Jellyfin (#239)
## Summary
- Add Authentik OIDC provider + application for Jellyfin via blueprint (all authenticated users allowed, no policy binding)
- Wire `jellyfin-client-secret` through ExternalSecret and Authentik worker deployment
- Install [jellyfin-plugin-sso](https://github.com/9p4/jellyfin-plugin-sso) v4.0.0.3 via Ansible, with OIDC config template
- Authentik `admins` group maps to Jellyfin administrator role
- Local login left enabled; SSO is additive

## Deployment and Testing
- [ ] Sync ArgoCD `authentik` app on branch — verify provider + application appear in Authentik admin
- [ ] `mise run provision-indri -- --tags jellyfin --check --diff` (dry run)
- [ ] `mise run provision-indri -- --tags jellyfin` (deploy plugin + config)
- [ ] Test SSO flow: `https://jellyfin.ops.eblu.me/sso/OID/start/authentik`
- [ ] Verify `eblume` account auto-links via `preferred_username` match
- [ ] Verify admins group → Jellyfin admin
- [ ] Reset ArgoCD app revision to main after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/239
2026-02-21 20:05:44 -08:00
e1c2892878 Fix container tags deleted during old-tag cleanup
Five container manifests were removed when deleting old-style tags
(shared digests). Rebuild on a72a0d8 and update references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 16:26:29 -08:00
a72a0d8e8e Update all container images to new upstream-version tagging scheme (#238)
## Summary
- Updates all 15 container image references across 14 ArgoCD manifest files
- Migrates from old internal `vX.Y.Z` tags to new `v<upstream-version>-<sha>` format
- Covers: authentik, cv, devpi, forgejo-runner, homepage, kiwix-serve, kubectl, miniflux, navidrome, ntfy, quartz, teslamate, transmission

## Deployment and Testing
- [ ] Sync all ArgoCD apps on branch revision
- [ ] Verify all services come up healthy
- [ ] Merge and re-sync on main
- [ ] Clean up old-style tags from zot registry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/238
2026-02-21 15:58:11 -08:00
55d31c9c0b Docs pass: update zot Mikado chain for completion
- harden-zot-registry: fix Authentik hostname, check off all
  verified items, add metrics config to "what was done"
- enforce-tag-immutability: fix admins permissions (was missing
  update)
- agent-change-process: clarify that requires: is permanent and
  status: active is the only completion marker
- zot reference: update modified date
- wire-ci-registry-auth fragment: add metrics fix
- Remove stale harden-zot-mikado-cards.ai.md planning fragment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 15:32:34 -08:00
8775c3841a Allow anonymous access to zot /metrics endpoint
Add accessControl.metrics.users with empty string to allow
unauthenticated Prometheus/Alloy scraping. Zot represents
anonymous users with an empty username internally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 12:37:59 -08:00
ff63679efb Enable zot registry auth + wire CI credentials (#237)
## Summary

- Enable OIDC + API key authentication on zot registry with three-tier accessControl
  - `anonymousPolicy: ["read"]` — anyone can pull
  - `artifact-workloads` group: `["read", "create"]` — CI push, no overwrite/delete
  - `admins` group: `["read", "create", "update", "delete"]` — break-glass
- Wire both CI push paths (Dagger and Nix/skopeo) with `ZOT_CI_API_KEY` credentials
- Add `artifact-workloads` PolicyBinding in Authentik blueprint for zot app access
- Add `ZOT_CI_API_KEY` to Forgejo Actions secrets via existing ansible role

Completes the `wire-ci-registry-auth` and `harden-zot-registry` Mikado cards.

## Manual Deployment Steps (after merge)

1. Deploy Authentik blueprint: `argocd app sync authentik`
2. In Authentik admin UI: set a password for the `zot-ci` service account
3. Deploy zot config: `mise run provision-indri -- --tags zot`
4. Log in to `https://registry.ops.eblu.me` as `zot-ci` via OIDC → generate API key
5. Store API key in 1Password as `zot-ci-apikey` in blumeops vault
6. Sync Forgejo secrets: `mise run provision-indri -- --tags forgejo_actions_secrets`
7. Trigger a test container build to verify CI push
8. Verify anonymous pull: `curl -sf https://registry.ops.eblu.me/v2/_catalog`

## Uncertainties

- **Zot `accessControl` group matching with OIDC:** Groups from Authentik's `profile` scope claim should map to zot policy groups, but the exact claim-to-group matching needs runtime verification
- **`http.auth.apikey: true`:** This config key is documented but needs verification against the specific zot version built from source on indri
- **API key permissions:** Need to confirm zot API keys inherit the generating user's group for accessControl evaluation

## Test Plan

- [ ] `mise run provision-indri -- --check --diff --tags zot` shows expected config changes
- [ ] Anonymous pull works after deploy
- [ ] Unauthenticated push fails (401)
- [ ] OIDC browser login redirects to Authentik and back
- [ ] API key push works after key generation
- [ ] CI push succeeds with both Dagger and skopeo paths
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/237
2026-02-21 12:20:29 -08:00
30a7c4de9b Close register-zot-oidc-client Mikado card
Completed in PR #236. Updated card to reflect what was actually
implemented, including deviations (worker env var wiring, manual
service account setup).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 08:49:32 -08:00
21b6533aea Register Zot as OIDC client in Authentik (#236)
## Summary
- Add Authentik blueprint (`zot.yaml`) with OAuth2 provider, application, `artifact-workloads` group, and `zot-ci` service account
- Wire `zot-client-secret` through ExternalSecret → worker Deployment env var → blueprint `!Env`
- Add Ansible pre_task to fetch OIDC secret from 1Password (item ID `oor7os5kapczgpbwv7obkca4y4`)
- Add `oidc-credentials.json.j2` template and deploy task in zot role (with `when` guard)

## Manual Steps Required Before Deploy
1. Generate client secret: `openssl rand -hex 32`
2. Store in 1Password: add field `zot-client-secret` to "Authentik (blumeops)" item in vault `blumeops`

## What This Does NOT Do
- Does NOT modify `config.json.j2` (that's the root goal `harden-zot-registry`)
- Does NOT wire CI auth (that's `wire-ci-registry-auth`)
- Does NOT set service account password or API keys (manual post-deploy)

## Verification
After ArgoCD sync:
- [ ] Authentik admin UI shows "Zot Registry" application
- [ ] OIDC discovery at `https://authentik.ops.eblu.me/application/o/zot/.well-known/openid-configuration` returns valid JSON
- [ ] Blueprint status is `successful`
- [ ] `artifact-workloads` group exists with `zot-ci` service account

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/236
2026-02-21 08:45:06 -08:00
04e036c603 Fold enforce-tag-immutability into harden-zot-registry (#235)
## Summary

- Removed `status: active` from `enforce-tag-immutability` card — its requirements are folded into the parent `harden-zot-registry` goal's `accessControl` configuration
- Updated `harden-zot-registry` with three-tier access control spec (anonymous read, artifact-workloads read+create, admins full)
- Added `artifact-workloads` group creation step to `register-zot-oidc-client`
- Added service account context to `wire-ci-registry-auth`

## Rationale

Tag immutability requires authentication to be meaningful. Without auth, everyone is anonymous and gets the same policy. Rather than client-side push checks, the registry enforces immutability server-side: CI gets `["read", "create"]` (no update/delete), so pushing an existing tag is rejected by zot itself.

## Test plan

- [ ] `mise run docs-check-links` passes
- [ ] `mise run docs-mikado` shows enforce-tag-immutability as resolved

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/235
2026-02-21 08:05:16 -08:00
64691da4fb Complete adopt-commit-based-container-tags Mikado card
All 14 containers now have commit-SHA-based tags in the registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:28:45 -08:00
96a2d420fb Update install-dagger-on-nix-runner card with actual resolution
Dagger can't run on the bare nix runner (needs container runtime).
Used nix eval directly instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:23:06 -08:00
b8bc0bffd8 Update ringtail flake.lock (remove dagger input)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:21:36 -08:00
02f2be5523 Use nix eval instead of dagger for nix runner version extraction
Dagger CLI needs a container runtime (Docker/containerd) to start its
engine, which the bare nix runner doesn't have. Use nix eval directly
instead — it's already available and more appropriate for a nix host.

Reverts the dagger flake input since it's not usable here.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:21:16 -08:00
de037ae51f Update ringtail flake.lock with dagger/nix input
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:15:19 -08:00
a2d2808270 Add dagger CLI to nix runner via github:dagger/nix flake
Dagger was removed from nixpkgs due to trademark concerns. Use the
official dagger/nix flake as a flake input instead, passing the package
through via specialArgs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 23:14:46 -08:00