Home-built Grafana 12.3.3 container is ready. Dockerfile builds from
Alpine 3.22 + official OSS tarball, verified via dagger and
container-version-check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add home-built Grafana 12.3.3 container image based on Alpine 3.22
with pre-built OSS tarball from dl.grafana.com. Uses UID 472 for PVC
compatibility with the official image, standard Grafana paths, and
multi-arch support via TARGETPLATFORM detection.
Update service-versions.yaml to track 12.3.3.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Discovered during deployment: the grafana binary lives at bin/grafana
inside the extracted directory, not on $PATH. The Dockerfile must set
PATH or use the full path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Discovered during build: the Grafana OSS tarball extracts to
grafana-<version>, not grafana-v<version>. The Dockerfile mv
command must match this naming.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The forgejo-runner container is the CI job execution environment (Dagger,
ArgoCD CLI, etc.), not the runner daemon itself. Rename to runner-job-image
to fix the version-check false positive (Dagger 0.19.11 vs daemon 12.7.0)
and clarify the distinction.
RUNNER_LABELS still references the old image name — will update after
building the image under the new name.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
actions/checkout treats short SHAs as branch name patterns, causing
fetch failures. Always resolve --ref to a full 40-char SHA before
dispatching the workflow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When manually dispatching a container build with --ref, the build job
now checks out the specified commit instead of the branch HEAD. This
allows building containers from feature branches before merging.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The commit-msg hook had `pass_filenames: false`, which prevented
pre-commit from passing the commit message file path. Without it,
the hook only validated existing branch history and never checked
the incoming commit against ordering rules — plan-after-impl was
silently accepted.
Remove `pass_filenames: false` so the pending commit is included
in invariant validation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- **C0 (Quick Fix):** Now explicitly allows direct-to-main commits with no PR required — for low-risk, fix-forward-safe changes
- **C1 (Human Review):** New docs-first workflow with branch deployment (ArgoCD `--revision`, Ansible from checkout). Includes upgrade criteria for escalation to C2
- **C2 (Mikado Chain):** Introduces the **Mikado Branch Invariant** — strict commit ordering where card-introducing commits come first, followed by code progress, followed by card closures. Branch resets required when new prerequisites are discovered
Updates CLAUDE.md rules (3, 4, 8, 9) to reflect that C0 bypasses branching/PR requirements. Also updates ai-assistance-guide, how-to index, and docs-mikado task description.
## Files changed
- `CLAUDE.md` — rules and classification table
- `docs/how-to/agent-change-process.md` — full process rewrite
- `docs/tutorials/ai-assistance-guide.md` — branching and pitfalls sections
- `docs/how-to/how-to.md` — index description
- `mise-tasks/docs-mikado` — task description
- `docs/changelog.d/formalize-change-classification.doc.md` — changelog fragment
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/259
## Summary
- Delete the old 3-phase Helm chart upgrade plan (predates Mikado system)
- Create C2 Mikado chain with goal card `upgrade-grafana` and two leaf prereqs:
- `kustomize-grafana-deployment` — convert Helm to kustomize manifests
- `build-grafana-container` — home-built Grafana 12.x image (no upstream containers)
- Record first-ever Grafana review: currently at v11.4.0 on Helm chart 8.8.2
- Update service-versions.yaml, how-to index, and plans index
## Service Review Findings
- Grafana is healthy and synced in ArgoCD
- Running v11.4.0, latest upstream is 12.3.3
- Breaking changes for 12.x are low-risk (React panels only, UIDs compliant)
- PVC is disposable — dashboards and datasources are all config-provisioned
## Deployment and Testing
- [ ] No deployment needed — documentation-only change
- [ ] `docs-check-links` passes
- [ ] `docs-check-index` passes
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/258
## Summary
- Rewrites deploy-authentik from a historical changelog into a reproducible process guide
- Removes stale version info (`v1.1.2-nix`) and future work section (Forgejo federation is done, rest belongs elsewhere)
- Marks deploy-authentik as completed in plans index and completed archive
- Removes hardcoded image tag from authentik reference card (use `service-versions.yaml`)
- Adds `last-reviewed: 2026-02-23` frontmatter
## Test plan
- [x] All pre-commit hooks pass (docs-check-links, docs-check-index, etc.)
- [x] ArgoCD app verified synced and healthy
- [x] All wiki-links validated
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/257
## Summary
- Add `--progress=plain` to all `dagger call` invocations in mise tasks to prevent SIGTTOU hangs
## Root cause
Mise runs task scripts in a child process group that is not the terminal's foreground group. When `dagger call` detects a TTY (inherited from the interactive shell), it tries to render its TUI progress display, which requires terminal ioctls. Since the process is not in the foreground group, the kernel sends SIGTTOU, stopping the process indefinitely.
This only manifests when running from an interactive terminal (e.g. `pre-commit run --all-files` in fish/wezterm). CI and piped contexts are unaffected since there's no TTY.
## Changes
- `mise-tasks/validate-workflows` — add `--progress=plain`
- `mise-tasks/frigate-export-model` — add `--progress=plain`
- `mise-tasks/provision-ringtail` — add `--progress=plain`
## Test plan
- [x] `pre-commit run --all-files` completes without hanging
- [ ] Verify in interactive fish/wezterm terminal
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/256
## Summary
- `foldersFromFilesStructure` was `false` in Grafana's sidecar provider config, causing Grafana to ignore the subdirectory structure the sidecar creates from `grafana_folder` annotations
- All 18 TeslaMate dashboards were appearing in the root "Dashboards" folder despite having `grafana_folder: "TeslaMate"` annotations on their ConfigMaps
- Flipping to `true` makes Grafana replicate the sidecar's directory structure as UI folders
## Deployment and Testing
- [ ] Sync `grafana` app: `argocd app sync grafana`
- [ ] Verify TeslaMate dashboards appear under a "TeslaMate" folder in Grafana's dashboard list
- [ ] Verify other dashboards remain in the root "Dashboards" folder
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/253
## Summary
Completes the `upgrade-k8s-runner` mikado chain. Both prerequisites (workflow validation in Dagger, config review against v12 defaults) were resolved in #250.
- Bump runner image `code.forgejo.org/forgejo/runner:6.3.1` → `12.7.0`
- Update `service-versions.yaml` to track new version
- Mark goal card complete (remove `status: active`)
## Deployment and Testing
After merge:
1. `argocd app sync forgejo-runner`
2. Verify runner registers in Forgejo admin → runners
3. Trigger a test workflow (e.g. `branch-cleanup.yaml` manual dispatch)
Rollback: revert image tag to `6.3.1`, push, sync.
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/251
## Summary
- Review runner config against v12.7.0 defaults — added `shutdown_timeout: 3h`, no breaking changes found
- Add `validate_workflows` Dagger function using `forgejo-runner validate --directory .` inside upstream container
- All 6 workflows pass v12.7.0 schema validation
- Wire `mise run validate-workflows` task and pre-commit hook on `.forgejo/workflows/` changes
- Mark both leaf Mikado cards (`review-runner-config-v12`, `validate-workflows-against-v12`) complete
## Mikado State
After merge, `upgrade-k8s-runner` goal card has no unmet dependencies — ready to execute the actual image bump in a follow-up PR.
## Test Plan
- [x] `dagger call validate-workflows --src=.` passes (all 6 workflows OK)
- [x] Pre-commit hooks pass
- [ ] Reviewer: confirm `shutdown_timeout: 3h` addition to ConfigMap looks reasonable
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/250
## Summary
- C2 Mikado chain for upgrading the k8s forgejo-runner daemon (6 major versions behind)
- Root goal card with two leaf prerequisites: workflow validation and config review
- Ringtail runner is already at ~v12.6.4 via nixpkgs, no work needed there
## Mikado Chain
```
upgrade-k8s-runner (goal)
├── validate-workflows-against-v12 (leaf)
└── review-runner-config-v12 (leaf)
```
Both leaves are actionable now. The biggest risk is workflow schema validation
(introduced in v8/v9) rejecting our existing workflows.
## Next Steps
Work the leaf nodes in a follow-up session, then attempt the goal.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/249
## Summary
- Forgejo rewrites `head.ref` to `refs/pull/N/head` once a PR's source branch is deleted from the remote
- The original branch name is preserved in `head.label`
- This was causing 188 out of 246 merged PRs to go undetected by the cleanup script
- Fix: fall back to `head.label` when `head.ref` starts with `refs/pull/`
## Test plan
- [x] Dry run correctly identifies 18 previously-missed local branches
- [x] Live run successfully deleted all 18
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/248
## Summary
- New `mise run branch-cleanup` task that finds branches merged into main and deletes them locally and on the Forgejo remote
- Configurable `--cutoff` (default 30 days) skips branches with recent HEAD commits
- Supports `--dry-run`, `--local-only`, `--remote-only` flags
- Interactive confirmation before any deletion
## Test plan
- [x] `mise run branch-cleanup -- --dry-run` shows correct table of candidates
- [ ] Run without `--dry-run` to confirm actual deletion works
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/247
## Summary
- Replace bash `indri-runner-logs` with a Python Typer CLI `runner-logs` that supports filtering by runner host (`indri`, `ringtail`, or `all`) with rich table output
- Add missing `#USAGE` declarations to `docs-review`, `docs-review-stale`, and `service-review` so flags work without the `--` separator
- Update docs references in `review-documentation.md` and `review-services.md` to use the new flag syntax
## Test plan
- [x] `mise run runner-logs all` lists runs from both runners
- [x] `mise run runner-logs ringtail` filters to ringtail-only runs
- [x] `mise run docs-review-stale --threshold 90` works without `--`
- [x] `mise run docs-review --limit 5` works without `--`
- [x] `mise run service-review --limit 3` works without `--`
- [x] Pre-commit hooks pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/244
## Summary
- Switch from MQTT to webapi polling (v0.5.4 requires only one method)
- Poll every 15s for responsive alerts
- **`notify_once: true`** — one notification per event instead of repeats as object changes zones
- **`nosnap: drop`** — skip events without snapshots (was causing all events to be dropped on v0.3.5)
- **`snap_hires: true`** — use recording stream for higher quality snapshot images
## Deployment and Testing
- [ ] Sync: `argocd app set frigate --revision fix/frigate-notify-config && argocd app sync frigate`
- [ ] Verify pod starts: `kubectl --context=k3s-ringtail -n frigate get pods -l app=frigate-notify`
- [ ] Check logs for successful startup and event processing (no "No snapshot" drops)
- [ ] Wait for a motion event and confirm single ntfy notification with hi-res snapshot
- [ ] After merge: `argocd app set frigate --revision main && argocd app sync frigate`
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/242
Deploy branding.xml with a "Sign in with Authentik" button in the
login disclaimer. Local password login remains available.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Five container manifests were removed when deleting old-style tags
(shared digests). Rebuild on a72a0d8 and update references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Updates all 15 container image references across 14 ArgoCD manifest files
- Migrates from old internal `vX.Y.Z` tags to new `v<upstream-version>-<sha>` format
- Covers: authentik, cv, devpi, forgejo-runner, homepage, kiwix-serve, kubectl, miniflux, navidrome, ntfy, quartz, teslamate, transmission
## Deployment and Testing
- [ ] Sync all ArgoCD apps on branch revision
- [ ] Verify all services come up healthy
- [ ] Merge and re-sync on main
- [ ] Clean up old-style tags from zot registry
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/238
- harden-zot-registry: fix Authentik hostname, check off all
verified items, add metrics config to "what was done"
- enforce-tag-immutability: fix admins permissions (was missing
update)
- agent-change-process: clarify that requires: is permanent and
status: active is the only completion marker
- zot reference: update modified date
- wire-ci-registry-auth fragment: add metrics fix
- Remove stale harden-zot-mikado-cards.ai.md planning fragment
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add accessControl.metrics.users with empty string to allow
unauthenticated Prometheus/Alloy scraping. Zot represents
anonymous users with an empty username internally.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Enable OIDC + API key authentication on zot registry with three-tier accessControl
- `anonymousPolicy: ["read"]` — anyone can pull
- `artifact-workloads` group: `["read", "create"]` — CI push, no overwrite/delete
- `admins` group: `["read", "create", "update", "delete"]` — break-glass
- Wire both CI push paths (Dagger and Nix/skopeo) with `ZOT_CI_API_KEY` credentials
- Add `artifact-workloads` PolicyBinding in Authentik blueprint for zot app access
- Add `ZOT_CI_API_KEY` to Forgejo Actions secrets via existing ansible role
Completes the `wire-ci-registry-auth` and `harden-zot-registry` Mikado cards.
## Manual Deployment Steps (after merge)
1. Deploy Authentik blueprint: `argocd app sync authentik`
2. In Authentik admin UI: set a password for the `zot-ci` service account
3. Deploy zot config: `mise run provision-indri -- --tags zot`
4. Log in to `https://registry.ops.eblu.me` as `zot-ci` via OIDC → generate API key
5. Store API key in 1Password as `zot-ci-apikey` in blumeops vault
6. Sync Forgejo secrets: `mise run provision-indri -- --tags forgejo_actions_secrets`
7. Trigger a test container build to verify CI push
8. Verify anonymous pull: `curl -sf https://registry.ops.eblu.me/v2/_catalog`
## Uncertainties
- **Zot `accessControl` group matching with OIDC:** Groups from Authentik's `profile` scope claim should map to zot policy groups, but the exact claim-to-group matching needs runtime verification
- **`http.auth.apikey: true`:** This config key is documented but needs verification against the specific zot version built from source on indri
- **API key permissions:** Need to confirm zot API keys inherit the generating user's group for accessControl evaluation
## Test Plan
- [ ] `mise run provision-indri -- --check --diff --tags zot` shows expected config changes
- [ ] Anonymous pull works after deploy
- [ ] Unauthenticated push fails (401)
- [ ] OIDC browser login redirects to Authentik and back
- [ ] API key push works after key generation
- [ ] CI push succeeds with both Dagger and skopeo paths
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/237
Completed in PR #236. Updated card to reflect what was actually
implemented, including deviations (worker env var wiring, manual
service account setup).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Add Authentik blueprint (`zot.yaml`) with OAuth2 provider, application, `artifact-workloads` group, and `zot-ci` service account
- Wire `zot-client-secret` through ExternalSecret → worker Deployment env var → blueprint `!Env`
- Add Ansible pre_task to fetch OIDC secret from 1Password (item ID `oor7os5kapczgpbwv7obkca4y4`)
- Add `oidc-credentials.json.j2` template and deploy task in zot role (with `when` guard)
## Manual Steps Required Before Deploy
1. Generate client secret: `openssl rand -hex 32`
2. Store in 1Password: add field `zot-client-secret` to "Authentik (blumeops)" item in vault `blumeops`
## What This Does NOT Do
- Does NOT modify `config.json.j2` (that's the root goal `harden-zot-registry`)
- Does NOT wire CI auth (that's `wire-ci-registry-auth`)
- Does NOT set service account password or API keys (manual post-deploy)
## Verification
After ArgoCD sync:
- [ ] Authentik admin UI shows "Zot Registry" application
- [ ] OIDC discovery at `https://authentik.ops.eblu.me/application/o/zot/.well-known/openid-configuration` returns valid JSON
- [ ] Blueprint status is `successful`
- [ ] `artifact-workloads` group exists with `zot-ci` service account
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/236
## Summary
- Removed `status: active` from `enforce-tag-immutability` card — its requirements are folded into the parent `harden-zot-registry` goal's `accessControl` configuration
- Updated `harden-zot-registry` with three-tier access control spec (anonymous read, artifact-workloads read+create, admins full)
- Added `artifact-workloads` group creation step to `register-zot-oidc-client`
- Added service account context to `wire-ci-registry-auth`
## Rationale
Tag immutability requires authentication to be meaningful. Without auth, everyone is anonymous and gets the same policy. Rather than client-side push checks, the registry enforces immutability server-side: CI gets `["read", "create"]` (no update/delete), so pushing an existing tag is rejected by zot itself.
## Test plan
- [ ] `mise run docs-check-links` passes
- [ ] `mise run docs-mikado` shows enforce-tag-immutability as resolved
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/235
Dagger can't run on the bare nix runner (needs container runtime).
Used nix eval directly instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dagger CLI needs a container runtime (Docker/containerd) to start its
engine, which the bare nix runner doesn't have. Use nix eval directly
instead — it's already available and more appropriate for a nix host.
Reverts the dagger flake input since it's not usable here.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>