branch-cleanup: Add PROTECTED_PREFIXES with preserve/* exclusion so
preserved work-in-progress branches are never deleted.
observability.md: Document Pyroscope profiling work on branch
preserve/pyroscope-profiling/pr-313, blocked on ringtail kernel
sysctl settings (kptr_restrict=0, perf_event_paranoid≤1). Also
document Faro/RUM as future potential with privacy considerations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Merges `build-container.yaml` and `build-container-nix.yaml` into a single workflow
- Detect job classifies each changed container by presence of `Dockerfile` and/or `default.nix`
- Dockerfile containers build on `k8s` (indri) via Dagger; Nix containers build on `nix-container-builder` (ringtail) via nix-build + skopeo
- Containers with both build files (alloy, nettest, ntfy) get built on both runners
## Test plan
- [ ] Push a change to a Dockerfile-only container (e.g. grafana) — verify it builds on k8s only
- [ ] Push a change to a nix-only container (e.g. jobsync) — verify it builds on nix-container-builder only
- [ ] Push a change to a dual container (e.g. ntfy) — verify it builds on both runners
- [ ] Test workflow_dispatch with a specific container name
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #306
## Summary
Mikado chain to replace `mise run services-check` with Grafana Unified Alerting backed by ntfy push notifications.
**Design:**
- Grafana Unified Alerting evaluates rules against Prometheus/Loki
- ntfy webhook contact point delivers iOS notifications
- Anti-noise policy: page once per 24h per alert group
- Every alert links to a runbook in `docs/how-to/alerts/`
- services-check eventually queries the alerting API instead of doing its own probes
**Chain (bottom-up):**
1. `configure-grafana-alerting-pipeline` — enable alerting, ntfy contact point, notification policy
2. `first-alert-and-runbook` — end-to-end proof of concept with blackbox probe failure
3. `port-services-check-alerts` — migrate all services-check probes to alert rules + runbooks
4. `refactor-services-check-to-query-alerts` — rewrite services-check to query Grafana API
5. `deploy-infra-alerting` — goal card
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #303
Replace single aggregate camera_fps check with per-camera FPS validation
and NFS storage accessibility check. Motivated by an outage where Frigate
API responded OK but NFS mount was inaccessible, causing "no frames" in UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Increase retention: continuous 3→180d, detections 14→30d, alerts 30→730d.
Plenty of NFS headroom (~9.4 TiB free, ~6.6 GB/day for one camera).
Add frigate-recording check to services-check that verifies camera_fps > 0,
which would have caught the 6-day outage from the mqtt config removal.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ai-sources now skips docs/ to avoid duplicating ai-docs output.
CLAUDE.md notes ai-sources as available for deep context on problems
with a large surface area.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ai-sources: concatenates all git-tracked files (excluding lock files
and other non-useful artifacts) for full codebase AI context (~353K
tokens).
ai-docs: now concatenates all docs instead of a curated subset (~85K
tokens).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Caddy is now a mcquack LaunchAgent, not brew services
- Add missing Jellyfin and Caddy to shutdown commands and autostart list
- docs-preview: accept paths with or without docs/ prefix
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mosquitto has been dormant since frigate-notify switched from MQTT to
webapi polling (529ba10). Tear down live infra (ArgoCD app, namespace)
and remove all manifests, service-versions entry, services-check, and
doc references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1Password adds account ID and timestamp to export filenames. The script
now globs ~/Documents for .1pux files instead of expecting a fixed name.
Also fixes a Rich markup error with bracket characters in the prompt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Review build-jobsync-container.md: fix nonexistent `mirror-sync` task
reference (Forgejo mirrors sync automatically), mark reviewed
- Remove bat hint from docs-review checklist (output not visible in
agent sessions), keep docs-preview hint as user-facing step
- Simplify review-documentation.md visual preview section
- Fix Python 3.14 tarfile deprecation warning in docs-preview
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New `mise run docs-preview <card>` task builds docs via Dagger and serves
them locally in the production quartz container (image parsed from ArgoCD
kustomization), opening the browser directly to the specified card.
Container auto-cleans after 1 hour.
Also updates docs-review checklist and review-documentation how-to to
reference the visual preview workflow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Add jobsync pod check (ringtail k3s) and HTTP endpoint to `services-check`
- Add JobSync entry to homepage dashboard under new "Apps" group
- Mark jobsync as reviewed at v1.1.4 (current with upstream)
- Changelog fragment added
## Deployment and Testing
- [ ] Sync homepage app from branch: `argocd app set homepage --revision review/jobsync && argocd app sync homepage`
- [ ] Verify JobSync appears on go.ops.eblu.me dashboard
- [ ] Run `mise run services-check` to verify new checks pass
- [ ] After merge: `argocd app set homepage --revision main && argocd app sync homepage`
Reviewed-on: #291
## Summary
Fixes the Facebook crawler spider trap that's been generating infinite recursive URLs like `/how-to/tutorials/tutorials/how-to/explanation/...` for several days.
**Root cause:** Quartz SPA mode + nginx `try_files` fallback to `index.html` meant any fabricated URL returned the root HTML shell with HTTP 200. Crawlers followed relative links from those fake URLs, creating infinite recursion.
**Fix:**
- Disable Quartz SPA mode (`enableSPA: false`) — all pages are now fully static HTML
- Replace nginx SPA fallback with `=404` + Quartz's static `404.html`
- Remove `robots.txt` exclusions (no longer needed)
**Docs cleanup (Obsidian.nvim compat no longer needed):**
- Delete hand-curated category index files (`tutorials.md`, `reference.md`, `how-to.md`, `explanation.md`) — Quartz auto-generates folder pages
- Delete `postgresql-storage.md` (redirect stub) and `migrate-forgejo-from-brew.md` (stale history)
- Drop `docs-check-index` and `docs-check-filenames` prek hooks
- Rewrite `docs-check-links` to allow path-based wiki-links (`[[path/to/file]]`) and only error on true ambiguity
- Add `ai-docs` doc tree listing to replace index files for AI context
- Add natural cross-links from reference cards to fix orphan docs
## Deployment and Testing
- [ ] Merge and let the build pipeline run
- [ ] Verify docs.eblu.me serves pages correctly with full page loads
- [ ] Verify non-existent URLs return 404
- [ ] Monitor crawler traffic — should drop to near zero for fabricated URLs
Reviewed-on: #290
A close commit with zero preceding impl commits is valid — some leaf
nodes involve operational steps (e.g., creating a mirror) with no code
changes. Removed the false-positive check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
During finalization, all mikado frontmatter (requires, status, branch) should
be removed — cards become plain documentation linked via wiki-links. Updated
agent-change-process docs and cleaned up 10 cards from closed chains. Also
fixed ai-docs referencing deleted plans/ files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The bash parameter expansion `${var/pat/rep}` treats `\/` in the
replacement as a literal backslash-slash, not an escaped delimiter.
This produced URLs like `https:\/\/eblume:...` instead of
`https://eblume:...`, breaking Forgejo's URL parser (500 on mirror
settings pages) and preventing mirror syncs.
Use prefix stripping (`${var#prefix}`) instead.
All 22 corrupted mirrors have been repaired on indri.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Misfiled fragment from feature/ branch created a subdirectory under
changelog.d/ which towncrier doesn't support. Move the fragment to the
correct flat location and add a changelog-check mise task + prek hook
to prevent this from happening again.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The mikado-branch-invariant-check hook now inspects staged files (commit-msg
hook) and historical commit files (standalone mode) to reject impl commits
that touch markdown files with Mikado frontmatter (requires:, status:, or
branch: mikado/). Cards should only be modified by plan, close, or finalize.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The LLM should read the file itself using its tools rather than
receiving it inline in the task output.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- **mirror-create**: Auto-includes GitHub PAT from 1Password for authenticated upstream fetches at mirror creation time
- **mirror-update-pats**: New mise task that SSHes into indri and rewrites the git remote URL in every GitHub mirror's bare repo config to embed the PAT. Idempotent, supports `--dry-run`
- **app.ini.j2**: Explicit `[mirror]` section with `DEFAULT_INTERVAL = 8h` and `MIN_INTERVAL = 10m` (bakes in the defaults for visibility)
- **manage-forgejo-mirrors**: New how-to doc covering mirror creation, PAT storage, the `mirror-update-pats` task, and the full 20-day PAT rotation procedure
## Context
GitHub tightened unauthenticated rate limits for git clone/fetch in May 2025. With 23 GitHub mirrors syncing every 8 hours, authenticated fetches avoid throttling. The PAT is stored in 1Password (`Forgejo Secrets` → `github-mirror-pat`) and has been applied to all existing mirrors.
## Deployment and Testing
- [x] `mirror-update-pats` dry-run verified (23 mirrors detected)
- [x] `mirror-update-pats` applied to all 23 GitHub mirrors on indri
- [x] Idempotency confirmed (re-run shows 0 updated, 23 skipped)
- [ ] Provision indri with `--tags forgejo` to apply `[mirror]` config
- [ ] Trigger a manual mirror sync and verify success in Forgejo UI
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/269
The --style=header --color=never --decorations=always flags are now built
into the script so callers can just run `mise run ai-docs`. Also adds a
note to CLAUDE.md to never truncate the output.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Categorized reference of all mise tasks with descriptions. Added to
the tools section of the reference index and to the ai-docs context
priming script.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Creates mirrors in the mirrors/ Forgejo org via API. Supports
GitHub, Codeberg, and generic git URLs with auto-detection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
After investigating deployed container images, confirmed that squash-merging PRs orphans the commit SHAs embedded in container image tags. Two of our currently deployed images (prometheus, grafana) reference branch commits not on main.
This PR:
- Documents the squash-merge SHA orphan problem and the post-merge workflow in [[build-container-image]]
- Adds step 9 to the C1 process: after merging a PR that changes `containers/`, do a follow-up C0 to point manifests at the rebuilt `[main]` tag
- Rewrites `container-list` as a `uv run --script` (typer + rich + httpx)
- Adds optional container name filter (`mise run container-list prometheus` shows 10 tags instead of 4)
- Annotates every tag with `[main]` or `[branch]` based on git commit ancestry
## Test plan
- [x] `mise run container-list` — all containers shown with `[main]`/`[branch]` hints
- [x] `mise run container-list prometheus` — filtered view, more tags, correctly shows `[main]` and `[branch]`
- [x] `mise run container-list nonexistent` — error message with exit code 1
- [x] Pre-commit hooks pass
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/263
## Summary
- **End-of-cycle prompting:** After closing a leaf node and pushing, the agent should prompt the user to review and suggest ending the session rather than rushing into the next leaf
- **Reset rigor:** Reinforced that errors during impl should trigger a branch reset + plan update (not fix-forward). Documented the `git log --oneline --not main` → `git reset --hard` → `git cherry-pick` pattern with clear threshold guidance
- **`--resume` shows PR number:** Queries the Forgejo API for open PRs matching the branch, displays number/title/URL and a hint to run `pr-comments`
- **`--resume` checks git stash:** Shows stash entries as a non-presumptive hint — informs without assuming they apply
## Test plan
- [ ] `mise run docs-mikado --resume` runs without errors (no active chains case)
- [ ] On a mikado branch with an open PR, verify PR info is shown
- [ ] With stashed work, verify stash entries are displayed
- [ ] Review agent-change-process.md for clarity
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/261
actions/checkout treats short SHAs as branch name patterns, causing
fetch failures. Always resolve --ref to a full 40-char SHA before
dispatching the workflow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When manually dispatching a container build with --ref, the build job
now checks out the specified commit instead of the branch HEAD. This
allows building containers from feature branches before merging.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- **C0 (Quick Fix):** Now explicitly allows direct-to-main commits with no PR required — for low-risk, fix-forward-safe changes
- **C1 (Human Review):** New docs-first workflow with branch deployment (ArgoCD `--revision`, Ansible from checkout). Includes upgrade criteria for escalation to C2
- **C2 (Mikado Chain):** Introduces the **Mikado Branch Invariant** — strict commit ordering where card-introducing commits come first, followed by code progress, followed by card closures. Branch resets required when new prerequisites are discovered
Updates CLAUDE.md rules (3, 4, 8, 9) to reflect that C0 bypasses branching/PR requirements. Also updates ai-assistance-guide, how-to index, and docs-mikado task description.
## Files changed
- `CLAUDE.md` — rules and classification table
- `docs/how-to/agent-change-process.md` — full process rewrite
- `docs/tutorials/ai-assistance-guide.md` — branching and pitfalls sections
- `docs/how-to/how-to.md` — index description
- `mise-tasks/docs-mikado` — task description
- `docs/changelog.d/formalize-change-classification.doc.md` — changelog fragment
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/259
## Summary
- Add `--progress=plain` to all `dagger call` invocations in mise tasks to prevent SIGTTOU hangs
## Root cause
Mise runs task scripts in a child process group that is not the terminal's foreground group. When `dagger call` detects a TTY (inherited from the interactive shell), it tries to render its TUI progress display, which requires terminal ioctls. Since the process is not in the foreground group, the kernel sends SIGTTOU, stopping the process indefinitely.
This only manifests when running from an interactive terminal (e.g. `pre-commit run --all-files` in fish/wezterm). CI and piped contexts are unaffected since there's no TTY.
## Changes
- `mise-tasks/validate-workflows` — add `--progress=plain`
- `mise-tasks/frigate-export-model` — add `--progress=plain`
- `mise-tasks/provision-ringtail` — add `--progress=plain`
## Test plan
- [x] `pre-commit run --all-files` completes without hanging
- [ ] Verify in interactive fish/wezterm terminal
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/256
## Summary
- Review runner config against v12.7.0 defaults — added `shutdown_timeout: 3h`, no breaking changes found
- Add `validate_workflows` Dagger function using `forgejo-runner validate --directory .` inside upstream container
- All 6 workflows pass v12.7.0 schema validation
- Wire `mise run validate-workflows` task and pre-commit hook on `.forgejo/workflows/` changes
- Mark both leaf Mikado cards (`review-runner-config-v12`, `validate-workflows-against-v12`) complete
## Mikado State
After merge, `upgrade-k8s-runner` goal card has no unmet dependencies — ready to execute the actual image bump in a follow-up PR.
## Test Plan
- [x] `dagger call validate-workflows --src=.` passes (all 6 workflows OK)
- [x] Pre-commit hooks pass
- [ ] Reviewer: confirm `shutdown_timeout: 3h` addition to ConfigMap looks reasonable
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/250
## Summary
- C2 Mikado chain for upgrading the k8s forgejo-runner daemon (6 major versions behind)
- Root goal card with two leaf prerequisites: workflow validation and config review
- Ringtail runner is already at ~v12.6.4 via nixpkgs, no work needed there
## Mikado Chain
```
upgrade-k8s-runner (goal)
├── validate-workflows-against-v12 (leaf)
└── review-runner-config-v12 (leaf)
```
Both leaves are actionable now. The biggest risk is workflow schema validation
(introduced in v8/v9) rejecting our existing workflows.
## Next Steps
Work the leaf nodes in a follow-up session, then attempt the goal.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/249
## Summary
- Forgejo rewrites `head.ref` to `refs/pull/N/head` once a PR's source branch is deleted from the remote
- The original branch name is preserved in `head.label`
- This was causing 188 out of 246 merged PRs to go undetected by the cleanup script
- Fix: fall back to `head.label` when `head.ref` starts with `refs/pull/`
## Test plan
- [x] Dry run correctly identifies 18 previously-missed local branches
- [x] Live run successfully deleted all 18
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/248
## Summary
- New `mise run branch-cleanup` task that finds branches merged into main and deletes them locally and on the Forgejo remote
- Configurable `--cutoff` (default 30 days) skips branches with recent HEAD commits
- Supports `--dry-run`, `--local-only`, `--remote-only` flags
- Interactive confirmation before any deletion
## Test plan
- [x] `mise run branch-cleanup -- --dry-run` shows correct table of candidates
- [ ] Run without `--dry-run` to confirm actual deletion works
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/247
## Summary
- Replace bash `indri-runner-logs` with a Python Typer CLI `runner-logs` that supports filtering by runner host (`indri`, `ringtail`, or `all`) with rich table output
- Add missing `#USAGE` declarations to `docs-review`, `docs-review-stale`, and `service-review` so flags work without the `--` separator
- Update docs references in `review-documentation.md` and `review-services.md` to use the new flag syntax
## Test plan
- [x] `mise run runner-logs all` lists runs from both runners
- [x] `mise run runner-logs ringtail` filters to ringtail-only runs
- [x] `mise run docs-review-stale --threshold 90` works without `--`
- [x] `mise run docs-review --limit 5` works without `--`
- [x] `mise run service-review --limit 3` works without `--`
- [x] Pre-commit hooks pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/244
## Summary
- Replace git-tag-triggered container builds with path-based triggers on main and workflow_dispatch
- Image tags now encode upstream app version + commit SHA (`vX.Y.Z-<sha>`) for full traceability
- Replace `container-tag-and-release` task with `container-build-and-release` (dispatches workflows via Forgejo API)
- Update dagger `publish()` to accept `commit_sha` parameter
- Update all docs and references to the new workflow
## Deployment and Testing
- [ ] Merge to main
- [ ] `mise run container-build-and-release <name>` for each container to populate new-format tags
- [ ] Verify tags in registry via `mise run container-list`
- [ ] Existing images untouched — old tags remain available
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/232
## Summary
- Enable OIDC + API key authentication on zot with anonymous pull preserved
- Enforce tag immutability for version tags
- Adopt commit-SHA-based container image tagging
Details in the [[harden-zot-registry]] Mikado chain (`mise run docs-mikado harden-zot-registry`).
## Test plan
- [ ] Anonymous pull still works
- [ ] Unauthenticated push fails (401)
- [ ] CI container builds pass with new auth and tagging
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/231
## Summary
C2 Mikado chain for deploying Authentik as the SSO identity provider, replacing Dex.
This PR will evolve over multiple sessions. Each iteration adds documentation (prerequisite cards) and eventually code as leaf nodes are resolved.
## Current Mikado State
- **Goal:** `deploy-authentik` (active)
- **Leaf prerequisites:**
- `build-authentik-container` — Build Nix container image
- `provision-authentik-database` — Create PostgreSQL database on CNPG cluster
- `create-authentik-secrets` — Create 1Password item with credentials
## Process refinements
- Updated agent-change-process with lessons from first attempt: reset code before committing cards, open PRs early
## Test plan
- [ ] `mise run docs-mikado` shows correct dependency chain
- [ ] Leaf nodes can be worked independently
- [ ] Container builds on ringtail
- [ ] Authentik starts and reaches healthy state
- [ ] Forgejo OAuth2 connector works
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/227