Commit graph

124 commits

Author SHA1 Message Date
66a47738dd Add ringtail post-deploy maintenance: kernel check, generation pruning, GC
Update manage-lockfile doc with post-deploy steps (kernel update detection,
reboot guidance, generation pruning). Add prune-ringtail-generations mise
task that keeps the 5 most recent generations plus the most recent one
matching the booted kernel for safe rollback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:55:45 -07:00
fc8d2cdb12 Add preserve/* branch protection and document Pyroscope blocker
branch-cleanup: Add PROTECTED_PREFIXES with preserve/* exclusion so
preserved work-in-progress branches are never deleted.

observability.md: Document Pyroscope profiling work on branch
preserve/pyroscope-profiling/pr-313, blocked on ringtail kernel
sysctl settings (kptr_restrict=0, perf_event_paranoid≤1). Also
document Faro/RUM as future potential with privacy considerations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:32:25 -07:00
d021b3534f Deploy Prowler CIS scanner (#310)
All checks were successful
Build Container / detect (push) Successful in 4s
Build Container / build-dockerfile (prowler) (push) Successful in 10s
## Summary
- Deploy Prowler 5 as a weekly CronJob on minikube-indri for CIS Kubernetes Benchmark v1.11 scanning
- Custom slim container build (strips PowerShell, Trivy, and non-K8s providers from upstream)
- Reports (HTML, CSV, JSON-OCSF) written to NFS share on sifaka at `/volume1/reports/prowler/`
- Read-only ClusterRole for pod, RBAC, and control plane inspection
- Host path mounts + hostPID for kubelet file permission checks

## Follow-ups
- Mirror prowler-cloud/prowler on forge for supply chain control
- Build and push container image, update kustomization.yaml newTag
- Consider adding k3s-ringtail scanning (core + RBAC checks only)

## Test plan
- [ ] Build container: `mise run container-release prowler v5.22.0`
- [ ] Update `argocd/manifests/prowler/kustomization.yaml` newTag to built image tag
- [ ] Sync ArgoCD: `argocd app sync apps && argocd app set prowler --revision deploy-prowler && argocd app sync prowler`
- [ ] Trigger manual job: `kubectl create job --from=cronjob/prowler prowler-manual -n prowler --context=minikube-indri`
- [ ] Verify reports appear on sifaka NFS share
- [ ] `mise run services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #310
2026-03-24 16:08:09 -07:00
fc45989a6c Decommission JobSync service (#308)
All checks were successful
Build Container / detect (push) Successful in 3s
## Summary

- Remove all JobSync infrastructure: ArgoCD app, k8s manifests, container build (nix), Caddy reverse proxy entry, Homepage dashboard entry, service-versions tracking, and all documentation
- Runtime teardown already completed: ArgoCD app cascade-deleted (removes deployment, PVC, service, ingress, external-secret), forge mirror deleted, 1Password item archived, local clone removed

## Motivation

Replacing JobSync with a datasette-based job tracking pipeline driven by mise tasks and a Claude agent frontend. JobSync's Next.js server actions don't expose a useful API for automation.

## Remaining manual steps after merge

- Provision Caddy to remove the stale proxy route: `mise run provision-indri -- --tags caddy`
- Sync Homepage: `argocd app sync homepage`
- Verify namespace cleanup on ringtail: `kubectl get ns jobsync --context=k3s-ringtail` (should be gone)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #308
2026-03-24 08:44:23 -07:00
0d422f5234 Update tooling dependencies (March 2026) (#307)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 2m51s
## Summary

Monthly tooling dependency update per [[update-tooling-dependencies]].

- **Prek hooks:** trufflehog v3.93.4→v3.94.0, ruff v0.15.2→v0.15.7, shfmt v3.12.0-2→v3.13.0-1, ansible-lint floor→26.3.0, ansible-core floor→2.18
- **Fly.io proxy:** nginx 1.28.2→1.29.6, Grafana Alloy v1.13.1→v1.14.1
- **Forgejo workflows:** actions/checkout v4.3.1→v6.0.2 (SHA-pinned across all 5 workflows)
- **Mise tasks:** tightened Python lower bounds — rich≥14.0.0, typer≥0.24.0, httpx≥0.28.1, pyyaml≥6.0.2

## Test plan

- [x] `prek run --all-files` passes
- [ ] Verify Fly.io deploy succeeds after merge (nginx minor bump + Alloy bump)
- [ ] Spot-check a workflow run with the new actions/checkout v6

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #307
2026-03-24 08:11:46 -07:00
bd0ff30d3f Unify container build workflows (#306)
All checks were successful
Build Container / detect (push) Successful in 3s
## Summary
- Merges `build-container.yaml` and `build-container-nix.yaml` into a single workflow
- Detect job classifies each changed container by presence of `Dockerfile` and/or `default.nix`
- Dockerfile containers build on `k8s` (indri) via Dagger; Nix containers build on `nix-container-builder` (ringtail) via nix-build + skopeo
- Containers with both build files (alloy, nettest, ntfy) get built on both runners

## Test plan
- [ ] Push a change to a Dockerfile-only container (e.g. grafana) — verify it builds on k8s only
- [ ] Push a change to a nix-only container (e.g. jobsync) — verify it builds on nix-container-builder only
- [ ] Push a change to a dual container (e.g. ntfy) — verify it builds on both runners
- [ ] Test workflow_dispatch with a specific container name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #306
2026-03-23 20:55:50 -07:00
6d65e6928c C2: Deploy infrastructure alerting pipeline (#303)
## Summary

Mikado chain to replace `mise run services-check` with Grafana Unified Alerting backed by ntfy push notifications.

**Design:**
- Grafana Unified Alerting evaluates rules against Prometheus/Loki
- ntfy webhook contact point delivers iOS notifications
- Anti-noise policy: page once per 24h per alert group
- Every alert links to a runbook in `docs/how-to/alerts/`
- services-check eventually queries the alerting API instead of doing its own probes

**Chain (bottom-up):**
1. `configure-grafana-alerting-pipeline` — enable alerting, ntfy contact point, notification policy
2. `first-alert-and-runbook` — end-to-end proof of concept with blackbox probe failure
3. `port-services-check-alerts` — migrate all services-check probes to alert rules + runbooks
4. `refactor-services-check-to-query-alerts` — rewrite services-check to query Grafana API
5. `deploy-infra-alerting` — goal card

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #303
2026-03-22 14:52:56 -07:00
f1620abb17 Improve Frigate health checks to catch NFS and camera failures
Replace single aggregate camera_fps check with per-camera FPS validation
and NFS storage accessibility check. Motivated by an outage where Frigate
API responded OK but NFS mount was inaccessible, causing "no frames" in UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 09:55:53 -07:00
ef8c2118a1 Standardize USAGE pragmas and typer parsing across mise tasks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 11:42:01 -07:00
cfe3391f1a Bump Frigate retention and add recording health check
Increase retention: continuous 3→180d, detections 14→30d, alerts 30→730d.
Plenty of NFS headroom (~9.4 TiB free, ~6.6 GB/day for one camera).

Add frigate-recording check to services-check that verifies camera_fps > 0,
which would have caught the 6-day outage from the mqtt config removal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 18:24:11 -07:00
1f000c8e39 Add last-updated subsort to docs-review, review gilbert card
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 13:22:01 -07:00
d121006086 Exclude docs from ai-sources, mention ai-sources in CLAUDE.md
ai-sources now skips docs/ to avoid duplicating ai-docs output.
CLAUDE.md notes ai-sources as available for deep context on problems
with a large surface area.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 18:40:35 -07:00
9566d8fec9 Add ai-sources task, update ai-docs to include all docs
ai-sources: concatenates all git-tracked files (excluding lock files
and other non-useful artifacts) for full codebase AI context (~353K
tokens).

ai-docs: now concatenates all docs instead of a curated subset (~85K
tokens).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 18:37:50 -07:00
4d195f7fb4 Review restore-1password-backup doc: fix offsite TBD, clarify archive name, add BorgBase to backups
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:13:07 -07:00
8b3b17d555 Review restart-indri doc: fix Caddy/Jellyfin service management, fix docs-preview path handling
- Caddy is now a mcquack LaunchAgent, not brew services
- Add missing Jellyfin and Caddy to shutdown commands and autostart list
- docs-preview: accept paths with or without docs/ prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 10:09:38 -07:00
40f1568088 Remove unused Mosquitto MQTT broker from ringtail
Mosquitto has been dormant since frigate-notify switched from MQTT to
webapi polling (529ba10). Tear down live infra (ArgoCD app, namespace)
and remove all manifests, service-versions entry, services-check, and
doc references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:37:31 -07:00
009196f6c1 Fix op-backup: auto-detect .1pux exports with suffixed filenames
1Password adds account ID and timestamp to export filenames. The script
now globs ~/Documents for .1pux files instead of expecting a fixed name.
Also fixes a Rich markup error with bracket characters in the prompt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:23:24 -07:00
d5a92fead8 Review build-jobsync-container, refine docs-preview tooling
- Review build-jobsync-container.md: fix nonexistent `mirror-sync` task
  reference (Forgejo mirrors sync automatically), mark reviewed
- Remove bat hint from docs-review checklist (output not visible in
  agent sessions), keep docs-preview hint as user-facing step
- Simplify review-documentation.md visual preview section
- Fix Python 3.14 tarfile deprecation warning in docs-preview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:11:34 -07:00
d01a165b91 Add docs-preview task and visual preview step to doc review
New `mise run docs-preview <card>` task builds docs via Dagger and serves
them locally in the production quartz container (image parsed from ArgoCD
kustomization), opening the browser directly to the specified card.
Container auto-cleans after 1 hour.

Also updates docs-review checklist and review-documentation how-to to
reference the visual preview workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:04:01 -07:00
87d4de244b Review jobsync: add to services-check and homepage (#291)
## Summary
- Add jobsync pod check (ringtail k3s) and HTTP endpoint to `services-check`
- Add JobSync entry to homepage dashboard under new "Apps" group
- Mark jobsync as reviewed at v1.1.4 (current with upstream)
- Changelog fragment added

## Deployment and Testing
- [ ] Sync homepage app from branch: `argocd app set homepage --revision review/jobsync && argocd app sync homepage`
- [ ] Verify JobSync appears on go.ops.eblu.me dashboard
- [ ] Run `mise run services-check` to verify new checks pass
- [ ] After merge: `argocd app set homepage --revision main && argocd app sync homepage`

Reviewed-on: #291
2026-03-11 17:36:51 -07:00
4f0476a851 Fix spider trap: disable SPA mode, remove index files, relax wiki-links (#290)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container (Nix) / detect (push) Successful in 1s
Build Container (Nix) / build (quartz) (push) Successful in 1s
Build Container / build (quartz) (push) Successful in 10s
## Summary

Fixes the Facebook crawler spider trap that's been generating infinite recursive URLs like `/how-to/tutorials/tutorials/how-to/explanation/...` for several days.

**Root cause:** Quartz SPA mode + nginx `try_files` fallback to `index.html` meant any fabricated URL returned the root HTML shell with HTTP 200. Crawlers followed relative links from those fake URLs, creating infinite recursion.

**Fix:**
- Disable Quartz SPA mode (`enableSPA: false`) — all pages are now fully static HTML
- Replace nginx SPA fallback with `=404` + Quartz's static `404.html`
- Remove `robots.txt` exclusions (no longer needed)

**Docs cleanup (Obsidian.nvim compat no longer needed):**
- Delete hand-curated category index files (`tutorials.md`, `reference.md`, `how-to.md`, `explanation.md`) — Quartz auto-generates folder pages
- Delete `postgresql-storage.md` (redirect stub) and `migrate-forgejo-from-brew.md` (stale history)
- Drop `docs-check-index` and `docs-check-filenames` prek hooks
- Rewrite `docs-check-links` to allow path-based wiki-links (`[[path/to/file]]`) and only error on true ambiguity
- Add `ai-docs` doc tree listing to replace index files for AI context
- Add natural cross-links from reference cards to fix orphan docs

## Deployment and Testing

- [ ] Merge and let the build pipeline run
- [ ] Verify docs.eblu.me serves pages correctly with full page loads
- [ ] Verify non-existent URLs return 404
- [ ] Monitor crawler traffic — should drop to near zero for fabricated URLs

Reviewed-on: #290
2026-03-09 11:59:43 -07:00
1c3bf35dad Fix mikado invariant check rejecting close without impl
A close commit with zero preceding impl commits is valid — some leaf
nodes involve operational steps (e.g., creating a mirror) with no code
changes. Removed the false-positive check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 20:41:03 -08:00
77a1ea15d2 Remove mikado frontmatter from closed chains, clarify finalization rules
During finalization, all mikado frontmatter (requires, status, branch) should
be removed — cards become plain documentation linked via wiki-links. Updated
agent-change-process docs and cleaned up 10 cards from closed chains. Also
fixed ai-docs referencing deleted plans/ files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:43:19 -08:00
f91b8d0d98 Fix mirror-update-pats corrupting all GitHub mirror URLs
The bash parameter expansion `${var/pat/rep}` treats `\/` in the
replacement as a literal backslash-slash, not an escaped delimiter.
This produced URLs like `https:\/\/eblume:...` instead of
`https://eblume:...`, breaking Forgejo's URL parser (500 on mirror
settings pages) and preventing mirror syncs.

Use prefix stripping (`${var#prefix}`) instead.

All 22 corrupted mirrors have been repaired on indri.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 11:46:41 -08:00
6c5a99883f Add pre-commit check for changelog fragment placement
Misfiled fragment from feature/ branch created a subdirectory under
changelog.d/ which towncrier doesn't support. Move the fragment to the
correct flat location and add a changelog-check mise task + prek hook
to prevent this from happening again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:49:01 -08:00
2d5b78a4d7 Add Forge (public) to services-check
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 08:48:15 -08:00
a87c997ee1 Expose Forgejo publicly at forge.eblu.me (#278)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m28s
## Summary

Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.

- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)

## Deployment Order

1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync

## Verification Checklist

- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes

Reviewed-on: #278
2026-03-03 08:40:41 -08:00
940219338a Slightly less confusing output when mikado cards share a prereq 2026-03-02 17:58:18 -08:00
6131f881a8 Enforce impl commits can't modify Mikado card files
The mikado-branch-invariant-check hook now inspects staged files (commit-msg
hook) and historical commit files (standalone mode) to reject impl commits
that touch markdown files with Mikado frontmatter (requires:, status:, or
branch: mikado/). Cards should only be modified by plan, close, or finalize.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:44:34 -08:00
8e9d89ca73 docs-review: print file path instead of content for LLM usage
The LLM should read the file itself using its tools rather than
receiving it inline in the task output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 07:24:37 -08:00
84338c32c2 Add authenticated GitHub PAT for Forgejo mirror sync (#269)
## Summary

- **mirror-create**: Auto-includes GitHub PAT from 1Password for authenticated upstream fetches at mirror creation time
- **mirror-update-pats**: New mise task that SSHes into indri and rewrites the git remote URL in every GitHub mirror's bare repo config to embed the PAT. Idempotent, supports `--dry-run`
- **app.ini.j2**: Explicit `[mirror]` section with `DEFAULT_INTERVAL = 8h` and `MIN_INTERVAL = 10m` (bakes in the defaults for visibility)
- **manage-forgejo-mirrors**: New how-to doc covering mirror creation, PAT storage, the `mirror-update-pats` task, and the full 20-day PAT rotation procedure

## Context

GitHub tightened unauthenticated rate limits for git clone/fetch in May 2025. With 23 GitHub mirrors syncing every 8 hours, authenticated fetches avoid throttling. The PAT is stored in 1Password (`Forgejo Secrets` → `github-mirror-pat`) and has been applied to all existing mirrors.

## Deployment and Testing

- [x] `mirror-update-pats` dry-run verified (23 mirrors detected)
- [x] `mirror-update-pats` applied to all 23 GitHub mirrors on indri
- [x] Idempotency confirmed (re-run shows 0 updated, 23 skipped)
- [ ] Provision indri with `--tags forgejo` to apply `[mirror]` config
- [ ] Trigger a manual mirror sync and verify success in Forgejo UI

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/269
2026-02-25 20:20:23 -08:00
23dc79058e Bake default display options into ai-docs mise task
The --style=header --color=never --decorations=always flags are now built
into the script so callers can just run `mise run ai-docs`. Also adds a
note to CLAUDE.md to never truncate the output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 17:42:47 -08:00
c1f4f0169b Add mise-tasks reference card and include in ai-docs
Categorized reference of all mise tasks with descriptions. Added to
the tools section of the reference index and to the ai-docs context
priming script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 21:17:34 -08:00
33e3e4e098 Add mirror-create mise task for upstream mirrors
Creates mirrors in the mirrors/ Forgejo org via API. Supports
GitHub, Codeberg, and generic git URLs with auto-detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 21:15:17 -08:00
1b9f706a30 Document container tag provenance and enhance container-list (#263)
## Summary

After investigating deployed container images, confirmed that squash-merging PRs orphans the commit SHAs embedded in container image tags. Two of our currently deployed images (prometheus, grafana) reference branch commits not on main.

This PR:

- Documents the squash-merge SHA orphan problem and the post-merge workflow in [[build-container-image]]
- Adds step 9 to the C1 process: after merging a PR that changes `containers/`, do a follow-up C0 to point manifests at the rebuilt `[main]` tag
- Rewrites `container-list` as a `uv run --script` (typer + rich + httpx)
- Adds optional container name filter (`mise run container-list prometheus` shows 10 tags instead of 4)
- Annotates every tag with `[main]` or `[branch]` based on git commit ancestry

## Test plan

- [x] `mise run container-list` — all containers shown with `[main]`/`[branch]` hints
- [x] `mise run container-list prometheus` — filtered view, more tags, correctly shows `[main]` and `[branch]`
- [x] `mise run container-list nonexistent` — error message with exit code 1
- [x] Pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/263
2026-02-24 09:54:58 -08:00
9b4951bf94 Improve Mikado process: cycle discipline, reset rigor, --resume enhancements (#261)
## Summary
- **End-of-cycle prompting:** After closing a leaf node and pushing, the agent should prompt the user to review and suggest ending the session rather than rushing into the next leaf
- **Reset rigor:** Reinforced that errors during impl should trigger a branch reset + plan update (not fix-forward). Documented the `git log --oneline --not main` → `git reset --hard` → `git cherry-pick` pattern with clear threshold guidance
- **`--resume` shows PR number:** Queries the Forgejo API for open PRs matching the branch, displays number/title/URL and a hint to run `pr-comments`
- **`--resume` checks git stash:** Shows stash entries as a non-presumptive hint — informs without assuming they apply

## Test plan
- [ ] `mise run docs-mikado --resume` runs without errors (no active chains case)
- [ ] On a mikado branch with an open PR, verify PR info is shown
- [ ] With stashed work, verify stash entries are displayed
- [ ] Review agent-change-process.md for clarity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/261
2026-02-23 21:03:27 -08:00
179eca2070 Fix container-build-and-release to resolve refs to full SHA
actions/checkout treats short SHAs as branch name patterns, causing
fetch failures. Always resolve --ref to a full 40-char SHA before
dispatching the workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:27:18 -08:00
7641018c6a Fix container build workflows to checkout dispatch ref
When manually dispatching a container build with --ref, the build job
now checks out the specified commit instead of the branch HEAD. This
allows building containers from feature branches before merging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:24:32 -08:00
66b5b32f1d Formalize C0/C1/C2 change classification (#259)
## Summary
- **C0 (Quick Fix):** Now explicitly allows direct-to-main commits with no PR required — for low-risk, fix-forward-safe changes
- **C1 (Human Review):** New docs-first workflow with branch deployment (ArgoCD `--revision`, Ansible from checkout). Includes upgrade criteria for escalation to C2
- **C2 (Mikado Chain):** Introduces the **Mikado Branch Invariant** — strict commit ordering where card-introducing commits come first, followed by code progress, followed by card closures. Branch resets required when new prerequisites are discovered

Updates CLAUDE.md rules (3, 4, 8, 9) to reflect that C0 bypasses branching/PR requirements. Also updates ai-assistance-guide, how-to index, and docs-mikado task description.

## Files changed
- `CLAUDE.md` — rules and classification table
- `docs/how-to/agent-change-process.md` — full process rewrite
- `docs/tutorials/ai-assistance-guide.md` — branching and pitfalls sections
- `docs/how-to/how-to.md` — index description
- `mise-tasks/docs-mikado` — task description
- `docs/changelog.d/formalize-change-classification.doc.md` — changelog fragment

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/259
2026-02-23 16:19:54 -08:00
260b7ca520 Fix dagger call hanging in mise tasks on interactive terminals (#256)
## Summary

- Add `--progress=plain` to all `dagger call` invocations in mise tasks to prevent SIGTTOU hangs

## Root cause

Mise runs task scripts in a child process group that is not the terminal's foreground group. When `dagger call` detects a TTY (inherited from the interactive shell), it tries to render its TUI progress display, which requires terminal ioctls. Since the process is not in the foreground group, the kernel sends SIGTTOU, stopping the process indefinitely.

This only manifests when running from an interactive terminal (e.g. `pre-commit run --all-files` in fish/wezterm). CI and piped contexts are unaffected since there's no TTY.

## Changes

- `mise-tasks/validate-workflows` — add `--progress=plain`
- `mise-tasks/frigate-export-model` — add `--progress=plain`
- `mise-tasks/provision-ringtail` — add `--progress=plain`

## Test plan

- [x] `pre-commit run --all-files` completes without hanging
- [ ] Verify in interactive fish/wezterm terminal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/256
2026-02-23 14:15:58 -08:00
cb9a06bb75 Update tooling dependencies (Feb 2026 cycle) (#254)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m30s
## Summary

Monthly tooling dependency update cycle:

- **Pre-commit hooks**: trufflehog v3.92.5→v3.93.4, ruff v0.14.13→v0.15.2, shellcheck v0.10.0.1→v0.11.0.1, prettier v3.8.0→v3.8.1, actionlint v1.7.10→v1.7.11
- **Fly.io Dockerfile**: pin nginx to 1.28.2-alpine (was unpinned), bump alloy v1.5.1→v1.13.1
- **Mise tasks**: normalize httpx lower bound to >=0.28.0 and typer to >=0.15.0 across all scripts
- **Forgejo workflows**: actions/checkout@v4 is current, no changes needed
- **New how-to doc**: [[update-tooling-dependencies]] documenting this monthly cycle

## No changes needed

- pre-commit-hooks v6.0.0, yamllint v1.38.0, shfmt v3.12.0-2, taplo v0.9.3, ansible-lint 26.1.1 — all already at latest

## Test plan

- [x] `uvx pre-commit run --all-files` — all 24 hooks pass
- [ ] Fly.io deploy (triggered automatically on merge to main via deploy-fly workflow)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/254
2026-02-23 13:08:41 -08:00
0f6a1898f0 Prepare forgejo-runner v12 upgrade (leaf nodes) (#250)
## Summary
- Review runner config against v12.7.0 defaults — added `shutdown_timeout: 3h`, no breaking changes found
- Add `validate_workflows` Dagger function using `forgejo-runner validate --directory .` inside upstream container
- All 6 workflows pass v12.7.0 schema validation
- Wire `mise run validate-workflows` task and pre-commit hook on `.forgejo/workflows/` changes
- Mark both leaf Mikado cards (`review-runner-config-v12`, `validate-workflows-against-v12`) complete

## Mikado State
After merge, `upgrade-k8s-runner` goal card has no unmet dependencies — ready to execute the actual image bump in a follow-up PR.

## Test Plan
- [x] `dagger call validate-workflows --src=.` passes (all 6 workflows OK)
- [x] Pre-commit hooks pass
- [ ] Reviewer: confirm `shutdown_timeout: 3h` addition to ConfigMap looks reasonable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/250
2026-02-22 17:38:32 -08:00
00b0287bcc Upgrade k8s forgejo-runner from v6.3.1 to v12.x (#249)
## Summary
- C2 Mikado chain for upgrading the k8s forgejo-runner daemon (6 major versions behind)
- Root goal card with two leaf prerequisites: workflow validation and config review
- Ringtail runner is already at ~v12.6.4 via nixpkgs, no work needed there

## Mikado Chain

```
upgrade-k8s-runner (goal)
├── validate-workflows-against-v12 (leaf)
└── review-runner-config-v12 (leaf)
```

Both leaves are actionable now. The biggest risk is workflow schema validation
(introduced in v8/v9) rejecting our existing workflows.

## Next Steps
Work the leaf nodes in a follow-up session, then attempt the goal.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/249
2026-02-22 17:12:45 -08:00
198c810e86 Fix branch-cleanup: fall back to head.label for deleted branches (#248)
## Summary
- Forgejo rewrites `head.ref` to `refs/pull/N/head` once a PR's source branch is deleted from the remote
- The original branch name is preserved in `head.label`
- This was causing 188 out of 246 merged PRs to go undetected by the cleanup script
- Fix: fall back to `head.label` when `head.ref` starts with `refs/pull/`

## Test plan
- [x] Dry run correctly identifies 18 previously-missed local branches
- [x] Live run successfully deleted all 18

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/248
2026-02-22 16:15:51 -08:00
abb7e6fe88 Add branch-cleanup mise task (#247)
## Summary
- New `mise run branch-cleanup` task that finds branches merged into main and deletes them locally and on the Forgejo remote
- Configurable `--cutoff` (default 30 days) skips branches with recent HEAD commits
- Supports `--dry-run`, `--local-only`, `--remote-only` flags
- Interactive confirmation before any deletion

## Test plan
- [x] `mise run branch-cleanup -- --dry-run` shows correct table of candidates
- [ ] Run without `--dry-run` to confirm actual deletion works

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/247
2026-02-22 16:00:09 -08:00
d51c180fe6 Switch Frigate detection model from YOLO-NAS-S to YOLOv9-c (#246)
## Summary
- Replace abandoned YOLO-NAS-S (320x320, `yolonas`) with YOLOv9-c (640x640, `yolo-generic`)
- YOLOv9-c benefits from CUDA Graphs in Frigate 0.17 on the RTX 4080
- Add `export_yolov9` Dagger pipeline and `frigate-export-model` mise task for reproducible model exports
- Model already deployed to `sifaka:/volume1/frigate/models/yolov9-c-640.onnx`

## Config changes
- `model_type: yolonas` → `yolo-generic`
- `input_dtype: int` → `float`
- `width/height: 320` → `640`
- `path:` → `yolov9-c-640.onnx`

## Deployment and Testing
- [ ] Merge and sync Frigate ArgoCD app: `argocd app sync frigate`
- [ ] Verify Frigate starts and detects objects at https://nvr.ops.eblu.me
- [ ] Confirm GPU inference via Frigate system metrics

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/246
2026-02-22 15:14:45 -08:00
e41c28ed90 Replace indri-runner-logs with general-purpose runner-logs Typer CLI (#244)
## Summary
- Replace bash `indri-runner-logs` with a Python Typer CLI `runner-logs` that supports filtering by runner host (`indri`, `ringtail`, or `all`) with rich table output
- Add missing `#USAGE` declarations to `docs-review`, `docs-review-stale`, and `service-review` so flags work without the `--` separator
- Update docs references in `review-documentation.md` and `review-services.md` to use the new flag syntax

## Test plan
- [x] `mise run runner-logs all` lists runs from both runners
- [x] `mise run runner-logs ringtail` filters to ringtail-only runs
- [x] `mise run docs-review-stale --threshold 90` works without `--`
- [x] `mise run docs-review --limit 5` works without `--`
- [x] `mise run service-review --limit 3` works without `--`
- [x] Pre-commit hooks pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/244
2026-02-22 10:20:11 -08:00
ffa8727660 Adopt commit-based container tags (#232)
## Summary
- Replace git-tag-triggered container builds with path-based triggers on main and workflow_dispatch
- Image tags now encode upstream app version + commit SHA (`vX.Y.Z-<sha>`) for full traceability
- Replace `container-tag-and-release` task with `container-build-and-release` (dispatches workflows via Forgejo API)
- Update dagger `publish()` to accept `commit_sha` parameter
- Update all docs and references to the new workflow

## Deployment and Testing
- [ ] Merge to main
- [ ] `mise run container-build-and-release <name>` for each container to populate new-format tags
- [ ] Verify tags in registry via `mise run container-list`
- [ ] Existing images untouched — old tags remain available

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/232
2026-02-20 22:56:20 -08:00
0e2c10176d Harden zot registry, pt 1 (#231)
## Summary
- Enable OIDC + API key authentication on zot with anonymous pull preserved
- Enforce tag immutability for version tags
- Adopt commit-SHA-based container image tagging

Details in the [[harden-zot-registry]] Mikado chain (`mise run docs-mikado harden-zot-registry`).

## Test plan
- [ ] Anonymous pull still works
- [ ] Unauthenticated push fails (401)
- [ ] CI container builds pass with new auth and tagging
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/231
2026-02-20 22:50:01 -08:00
71cb256527 Deploy Authentik identity provider (C2 Mikado) (#227)
## Summary
C2 Mikado chain for deploying Authentik as the SSO identity provider, replacing Dex.

This PR will evolve over multiple sessions. Each iteration adds documentation (prerequisite cards) and eventually code as leaf nodes are resolved.

## Current Mikado State
- **Goal:** `deploy-authentik` (active)
- **Leaf prerequisites:**
  - `build-authentik-container` — Build Nix container image
  - `provision-authentik-database` — Create PostgreSQL database on CNPG cluster
  - `create-authentik-secrets` — Create 1Password item with credentials

## Process refinements
- Updated agent-change-process with lessons from first attempt: reset code before committing cards, open PRs early

## Test plan
- [ ] `mise run docs-mikado` shows correct dependency chain
- [ ] Leaf nodes can be worked independently
- [ ] Container builds on ringtail
- [ ] Authentik starts and reaches healthy state
- [ ] Forgejo OAuth2 connector works

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/227
2026-02-20 12:55:59 -08:00