Commit graph

126 commits

SHA1 Message Date
5b008a6ab6 Document ai-sources in AI guide, change process, and mise-tasks ref
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 18:43:39 -07:00
4d195f7fb4 Review restore-1password-backup doc: fix offsite TBD, clarify archive name, add BorgBase to backups
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:13:07 -07:00
8b3b17d555 Review restart-indri doc: fix Caddy/Jellyfin service management, fix docs-preview path handling
- Caddy is now a mcquack LaunchAgent, not brew services
- Add missing Jellyfin and Caddy to shutdown commands and autostart list
- docs-preview: accept paths with or without docs/ prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 10:09:38 -07:00
4c5e7d763d Review deploy-jobsync doc: add missing env var, update tag example
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 15:45:07 -07:00
8b9cc4effd Add how-to card for running 1Password backup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:17:45 -07:00
d5a92fead8 Review build-jobsync-container, refine docs-preview tooling
- Review build-jobsync-container.md: fix nonexistent `mirror-sync` task
  reference (Forgejo mirrors sync automatically), mark reviewed
- Remove bat hint from docs-review checklist (output not visible in
  agent sessions), keep docs-preview hint as user-facing step
- Simplify review-documentation.md visual preview section
- Fix Python 3.14 tarfile deprecation warning in docs-preview

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:11:34 -07:00
d01a165b91 Add docs-preview task and visual preview step to doc review
New `mise run docs-preview <card>` task builds docs via Dagger and serves
them locally in the production quartz container (image parsed from ArgoCD
kustomization), opening the browser directly to the specified card.
Container auto-cleans after 1 hour.

Also updates docs-review checklist and review-documentation how-to to
reference the visual preview workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:04:01 -07:00
4f0476a851 Fix spider trap: disable SPA mode, remove index files, relax wiki-links (#290)
## Summary

Fixes the Facebook crawler spider trap that's been generating infinite recursive URLs like `/how-to/tutorials/tutorials/how-to/explanation/...` for several days.

**Root cause:** Quartz SPA mode + nginx `try_files` fallback to `index.html` meant any fabricated URL returned the root HTML shell with HTTP 200. Crawlers followed relative links from those fake URLs, creating infinite recursion.

**Fix:**
- Disable Quartz SPA mode (`enableSPA: false`) — all pages are now fully static HTML
- Replace nginx SPA fallback with `=404` + Quartz's static `404.html`
- Remove `robots.txt` exclusions (no longer needed)
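
The root cause described above can be illustrated with a small simulation. This is a minimal sketch, not the site's actual configuration: `REAL_PAGES`, `SHELL_LINKS`, `status`, and `crawl` are hypothetical names, and the only library call is the standard `urllib.parse.urljoin`.

```python
from urllib.parse import urljoin

# Hypothetical data for illustration; the real site's pages and links differ.
REAL_PAGES = {"/", "/how-to/", "/tutorials/", "/explanation/"}
SHELL_LINKS = ["tutorials/", "how-to/", "explanation/"]  # relative links in the root shell


def status(path: str, spa_fallback: bool) -> int:
    """HTTP status a server would return for `path` under each fallback policy."""
    if path in REAL_PAGES or spa_fallback:
        return 200  # the SPA fallback answers every path with the root shell
    return 404


def crawl(start: str, max_hops: int, spa_fallback: bool) -> list[str]:
    """Follow one relative link per hop until a non-200 response or max_hops."""
    seen = []
    url = start
    for hop in range(max_hops):
        if status(url, spa_fallback) != 200:
            break
        seen.append(url)
        # Relative links resolve against the current (possibly fabricated) URL.
        url = urljoin(url, SHELL_LINKS[hop % len(SHELL_LINKS)])
    return seen
```

Starting from the fabricated URL `/how-to/tutorials/`, four hops with `spa_fallback=True` end at `/how-to/tutorials/tutorials/how-to/explanation/`, while `spa_fallback=False` returns an empty list because the first request already gets a 404.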

**Docs cleanup (Obsidian.nvim compat no longer needed):**
- Delete hand-curated category index files (`tutorials.md`, `reference.md`, `how-to.md`, `explanation.md`) — Quartz auto-generates folder pages
- Delete `postgresql-storage.md` (redirect stub) and `migrate-forgejo-from-brew.md` (stale history)
- Drop `docs-check-index` and `docs-check-filenames` prek hooks
- Rewrite `docs-check-links` to allow path-based wiki-links (`[[path/to/file]]`) and only error on true ambiguity
- Add `ai-docs` doc tree listing to replace index files for AI context
- Add natural cross-links from reference cards to fix orphan docs

## Deployment and Testing

- [ ] Merge and let the build pipeline run
- [ ] Verify docs.eblu.me serves pages correctly with full page loads
- [ ] Verify non-existent URLs return 404
- [ ] Monitor crawler traffic — should drop to near zero for fabricated URLs

Reviewed-on: #290
2026-03-09 11:59:43 -07:00
770a7b2d6a Add JobSync reference card, observability docs, and RAPIDAPI_KEY plumbing (#289)
## Summary
- Add JobSync service reference card (`docs/reference/services/jobsync.md`) with architecture, secrets, observability, and JSearch API docs
- Add JobSync and Ollama to ringtail's workloads table (both were missing)
- Add JobSync to the reference index
- Wire `RAPIDAPI_KEY` through ExternalSecret and deployment env var for JSearch job search automation
- Document Loki log queries for observability (no metrics endpoint exists)
- Update deploy-jobsync how-to with new env var, observability section, and reference card link

## Deployment and Testing
- [ ] Sign up for RapidAPI JSearch API (free tier: 500 req/month)
- [ ] Add `rapidapi_key` field to "JobSync" 1Password item
- [ ] Merge PR
- [ ] `argocd app sync jobsync` to pick up new env var
- [ ] Verify job search works at https://jobsync.ops.eblu.me/dashboard/automations

Reviewed-on: #289
2026-03-08 15:06:52 -07:00
3a811fb188 Deploy JobSync — job search tracker on ringtail k3s (#288)
## Summary

C2 Mikado chain to deploy [JobSync](https://github.com/Gsync/jobsync) — a self-hosted job application tracker — to ringtail's k3s cluster.

### Mikado Graph

```
deploy-jobsync (goal)
├── build-jobsync-container
│   └── mirror-jobsync
└── integrate-jobsync-ollama
```

### What is JobSync?

Next.js app with SQLite for tracking job applications. Features resume management, application pipeline tracking, and AI-powered resume review/job matching.

### Key Decisions

- **Ringtail k3s** (not minikube-indri) — colocates with Ollama for zero-latency AI
- **Nix container** via `buildLayeredImage` — no Dockerfile, mirrors upstream source on forge
- **Ollama for AI** — uses existing deployment, no API keys needed for AI features
- **No upstream fork** — vanilla JobSync, Anthropic AI deferred to future work if needed

### Current Status

Planning phase — cards committed, ready for review before implementation begins.

Reviewed-on: #288
2026-03-08 11:02:05 -07:00
0e09521ce3 Review manage-flyio-proxy.md — no issues found
Add last-reviewed date. Content is accurate and complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:03:46 -08:00
6a033d55be Review and update review-services.md
- Add last-reviewed date
- Align service type sections with actual types (argocd/ansible/nixos)
- Remove nonexistent "Helm Chart" and "Hybrid" sections
- Fold custom container guidance into ArgoCD section
- Reference kustomization.yaml for image tags instead of Helm charts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:03:08 -08:00
e47a3b2ebb Review and update review-documentation.md
- Add last-reviewed date
- Replace raw pulumi commands with mise task equivalents
- Reference C0/C1/C2 change classification for making changes
- Note that prek handles link validation automatically

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 08:59:51 -08:00
ba7236ade0 Add how-to guide for upgrading Dagger
Documents the correct two-phase upgrade procedure to avoid the
chicken-and-egg problem where CI can't build its own replacement.
Also fixes outdated version references in the Dagger reference card.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 20:31:30 -08:00
6d84fcfb05 Review how-to index: strip prose, add last-reviewed
Removed descriptions, table formatting, and Mikado chain commentary
from the how-to index — it should be links only. Added last-reviewed
date.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 07:52:06 -08:00
c029e5851a Review migrate-forgejo-from-brew doc, fix stale Phase 3 reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:29:58 -08:00
77a1ea15d2 Remove mikado frontmatter from closed chains, clarify finalization rules
During finalization, all mikado frontmatter (requires, status, branch) should
be removed — cards become plain documentation linked via wiki-links. Updated
agent-change-process docs and cleaned up 10 cards from closed chains. Also
fixed ai-docs referencing deleted plans/ files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:43:19 -08:00
55a846eb25 Retire plans directory, convert migrate-forgejo-from-brew to mikado card
The plans/ directory predated the mikado method approach. Deleted all
completed and abandoned plans, converted the still-relevant
migrate-forgejo-from-brew into a lean mikado chain root card under
how-to/forgejo/, cleaned up dangling wiki-links across docs, and
fixed a stale "pre-commit" reference to "prek".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:28:14 -08:00
5ddb47de1c Review upgrade-grafana doc: fix image tag ref, add sidecar link
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 07:53:22 -08:00
b3d5478020 Use towncrier orphan fragment naming for C0 changes
C0 changes have no branch name, so `main.<type>.md` fragments collide.
Switch to towncrier's `+<slug>.<type>.md` orphan convention and rename
existing `main.*` fragments.
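
A minimal sketch of the naming rule, assuming the slug is derived from a short description; `fragment_name` and the slug normalization are hypothetical, and only the `+<slug>.<type>.md` shape comes from the message above.

```python
import re


def fragment_name(description: str, change_type: str) -> str:
    """Build a towncrier orphan fragment filename of the form +<slug>.<type>.md."""
    # Lowercase and collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"+{slug}.{change_type}.md"
```

For example, `fragment_name("Fix Caddy restart docs", "doc")` gives `+fix-caddy-restart-docs.doc.md`, so two C0 changes no longer compete for the same `main.doc.md` name.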

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 15:30:00 -08:00
a2bb9abbdb Home-build grafana-sidecar container (#281)
## Summary
- Home-build the k8s-sidecar container (`grafana-sidecar`) from forge mirror, replacing upstream `quay.io/kiwigrid/k8s-sidecar:1.28.0`
- Pinned to v1.28.0 — v2.x deferred due to 135% memory regression and readOnlyRootFilesystem crashloop
- Adds Dockerfile, service-versions entry, docs, and changelog fragment
- Manifest switch to home-built image pending container build

## Deployment and Testing
- [ ] `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization.yaml with built image tag
- [ ] `argocd app set grafana --revision feature/grafana-sidecar && argocd app sync grafana`
- [ ] Verify sidecar logs and dashboards at https://grafana.ops.eblu.me
- [ ] Post-merge: `argocd app set grafana --revision main && argocd app sync grafana`

Reviewed-on: #281
2026-03-03 13:48:24 -08:00
81a8ca24b9 Clarify that changelog fragments apply to all change levels (C0–C2)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 13:15:06 -08:00
cf8736c73b Review kustomize-grafana-deployment: fix manifest table to match reality
The doc listed a nonexistent configmap.yaml instead of the actual raw
config files (grafana.ini, datasources.yaml, provider.yaml) consumed
by kustomization.yaml's configMapGenerator. Added last-reviewed date.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:14:41 -08:00
a87c997ee1 Expose Forgejo publicly at forge.eblu.me (#278)
## Summary

Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.

- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)

## Deployment Order

1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync

## Verification Checklist

- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes

Reviewed-on: #278
2026-03-03 08:40:41 -08:00
7a1875936c Switch git hooks from pre-commit to prek (#276)
## Summary

- Replace pre-commit with [prek](https://github.com/j178/prek), a faster Rust-native drop-in alternative
- Migrate config from `.pre-commit-config.yaml` (YAML) to `prek.toml` (TOML)
- Add new built-in checks: case conflicts, private key detection, executable shebangs
- Install prek via mise native registry (`aqua:j178/prek`) instead of pipx
- Update all doc references across README, contributing guide, and how-to docs

## Notes

- `check-yaml` still uses the remote `pre-commit-hooks` repo because prek's builtin fast path doesn't support `--unsafe` yet (needed for Ansible custom YAML tags)
- All existing custom hooks (docs validation, container version check, mikado invariant, workflow validation) work unchanged
- Tested: all hooks pass on clean tree, deliberate doc link breakage is caught

## Test plan

- [x] `prek run --all-files` passes all checks
- [x] Broken wiki-link correctly caught by `docs-check-links`
- [x] taplo-format auto-fixes TOML formatting on commit
- [x] commit-msg hook (mikado invariant) fires correctly

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/276
2026-03-02 18:15:23 -08:00
08b9570ac7 Review build-authentik-from-source Mikado chain docs
Fix go-server-derivation: wrong path target (webui not authentik-django)
and missing internal/web/static.go patch. Remove stale DRF fork content
from mirror-build-deps (no longer needed as of 2026.2.0). Add
last-reviewed to all 5 cards without it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:28:09 -08:00
2a2811d7a5 Review authentik-api-client-generation doc: fix stale content
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 17:21:46 -08:00
2d4098e480 Fix authentik 2026.2.0 migration ordering bug (#275)
## Summary

- Patch `authentik_rbac/0010` migration to depend on `authentik_core/0056`, fixing non-deterministic ordering that crashes startup with `FieldError: Cannot resolve keyword 'group_id'`
- Upstream bug: goauthentik/authentik#19616, #20634 — no fix released yet
- Document the issue in the lessons-learned table
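
Why an explicit dependency removes the non-determinism can be shown with a toy dependency graph. This is only a model using the standard `graphlib` module; Django's migration planner is more involved, and the node labels below are shortened from the migration prefixes named above.

```python
from graphlib import TopologicalSorter


def plan(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; each node maps to the nodes it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


# Without an edge between the two migrations, either order is a valid plan.
unpatched = {"authentik_rbac.0010": set(), "authentik_core.0056": set()}

# With the patched dependency, core 0056 must come before rbac 0010 in every plan.
patched = {
    "authentik_rbac.0010": {"authentik_core.0056"},
    "authentik_core.0056": set(),
}
```

`plan(patched)` always places `authentik_core.0056` first, whereas the order returned for `unpatched` is not constrained by the graph.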

## Deployment and Testing

- [ ] CI builds container image
- [ ] Deploy from branch: `argocd app set authentik --revision fix/authentik-migration-ordering && argocd app sync authentik`
- [ ] Pods reach Running/Ready without crash-looping
- [ ] `kubectl logs` show 0056 migrating before 0010
- [ ] authentik UI loads at authentik.ops.eblu.me
- [ ] `mise run services-check`
- [ ] After merge: `argocd app set authentik --revision main && argocd app sync authentik`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/275
2026-03-01 16:28:36 -08:00
efa9806bfa C2: Build authentik from source (Mikado chain) (#274)
## Mikado Chain: build-authentik-from-source

Replace `pkgs.authentik` from nixpkgs with a custom Nix derivation built from source.
This removes the dependency on the nixpkgs packaging timeline and gives full version control.

Target version: **2025.12.4** (nixpkgs reference, upgrading from deployed 2025.10.1).

### Dependency Graph

```
build-authentik-from-source (goal)
├── authentik-go-server-derivation
│   ├── authentik-api-client-generation  ← IN PROGRESS
│   └── authentik-python-backend-derivation
├── authentik-web-ui-derivation
│   └── authentik-api-client-generation  ← IN PROGRESS
└── authentik-python-backend-derivation
```

### Ready Leaves
- `authentik-api-client-generation` — Go + TypeScript client generation from OpenAPI schema
- `authentik-python-backend-derivation` — Django backend with 60+ deps, 4 in-tree packages

### Architecture
Ported from [nixpkgs `pkgs/by-name/au/authentik/package.nix`](https://github.com/NixOS/nixpkgs/tree/master/pkgs/by-name/au/authentik):
- `source.nix` — shared version/source fetch
- `client-go.nix` — Go API client generation
- `client-ts.nix` — TypeScript API client generation
- `api-go-vendor-hook.nix` — Go vendor directory injection hook
- (more components to follow as leaves are closed)

### Related Cards
- [[build-authentik-from-source]] — Goal card
- [[authentik-api-client-generation]]
- [[authentik-python-backend-derivation]]
- [[authentik-web-ui-derivation]]
- [[authentik-go-server-derivation]]

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/274
2026-03-01 13:45:00 -08:00
0aaf9bb8b2 Add Dagger local build step to authentik source build goal
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:39:25 -08:00
7094ea7d3e Start C2 Mikado chain: build authentik from source
Create goal card and 4 prerequisite cards for building authentik from a
custom Nix derivation instead of using pkgs.authentik from nixpkgs. This
removes the dependency on the nixpkgs packaging timeline and gives full
version control over authentik releases.

Chain: mikado/authentik-source-build
Leaf nodes: authentik-api-client-generation, authentik-python-backend-derivation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:20:17 -08:00
8d1e98617b Review build-grafana-container docs: stamp reviewed, fix cross-links
Also fix stale grafana.md reference card (Helm → Kustomize).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 07:28:06 -08:00
7cecaf0471 Review forgejo-runner docs: stamp reviewed, fix cross-links
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 15:10:20 -08:00
9a7acffa26 Review manage-forgejo-mirrors doc: clarify cron default, stamp reviewed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 07:17:18 -08:00
84338c32c2 Add authenticated GitHub PAT for Forgejo mirror sync (#269)
## Summary

- **mirror-create**: Auto-includes GitHub PAT from 1Password for authenticated upstream fetches at mirror creation time
- **mirror-update-pats**: New mise task that SSHes into indri and rewrites the git remote URL in every GitHub mirror's bare repo config to embed the PAT. Idempotent, supports `--dry-run`
- **app.ini.j2**: Explicit `[mirror]` section with `DEFAULT_INTERVAL = 8h` and `MIN_INTERVAL = 10m` (bakes in the defaults for visibility)
- **manage-forgejo-mirrors**: New how-to doc covering mirror creation, PAT storage, the `mirror-update-pats` task, and the full 20-day PAT rotation procedure
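
The idempotent rewrite that `mirror-update-pats` performs could look roughly like the sketch below. The task's real implementation is not shown here, so `embed_pat` is hypothetical, and the token-as-username URL form is an assumption rather than the confirmed format.

```python
from urllib.parse import urlsplit, urlunsplit


def embed_pat(url: str, pat: str) -> tuple[str, bool]:
    """Return (new_url, changed) for a mirror's remote URL.

    Non-GitHub remotes and remotes that already carry this PAT are left untouched,
    which is what makes a second run report zero updates.
    """
    parts = urlsplit(url)
    if parts.hostname != "github.com":
        return url, False
    host = parts.hostname + (f":{parts.port}" if parts.port else "")
    netloc = f"{pat}@{host}"  # assumed form: https://<pat>@github.com/...
    if parts.netloc == netloc:
        return url, False
    return urlunsplit(parts._replace(netloc=netloc)), True
```

Running it twice on the same URL changes it once and then skips it. A `--dry-run` mode would report the `changed` flag without writing the new URL back.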

## Context

GitHub tightened unauthenticated rate limits for git clone/fetch in May 2025. With 23 GitHub mirrors syncing every 8 hours, authenticated fetches avoid throttling. The PAT is stored in 1Password (`Forgejo Secrets` → `github-mirror-pat`) and has been applied to all existing mirrors.

## Deployment and Testing

- [x] `mirror-update-pats` dry-run verified (23 mirrors detected)
- [x] `mirror-update-pats` applied to all 23 GitHub mirrors on indri
- [x] Idempotency confirmed (re-run shows 0 updated, 23 skipped)
- [ ] Provision indri with `--tags forgejo` to apply `[mirror]` config
- [ ] Trigger a manual mirror sync and verify success in Forgejo UI

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/269
2026-02-25 20:20:23 -08:00
e273f399ea Review 3 how-to docs and fix update-tailscale-acls inaccuracies
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 07:02:49 -08:00
34a1314f8d Document AirPlay cross-VLAN firewall rules and fix rule ordering
AirPlay from Main to IoT VLAN (Samsung Frame TV) required adding
established/related, AirPlay port, and dynamic reverse port rules —
but the root cause was rule ordering (allows appended after blocks).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 20:49:31 -08:00
cd578144f7 Migrate upstream mirrors to mirrors/ Forgejo org (#265)
## Summary

- Created `mirrors` Forgejo organization for upstream mirror repos
- Transferred 22 mirror repos from `eblume/` to `mirrors/` (mirror sync config preserved)
- Deleted unused repos: hajimari, hister
- Updated all container build URLs (homepage, navidrome, ntfy Dockerfiles + nix)
- Updated documentation references (migrate-forgejo-from-brew, upstream-fork-strategy, fix-ntfy-nix-version)
- `dotfiles` intentionally kept under `eblume/` per user request
- `devpi` transferred to `mirrors/`

Repos remaining under `eblume/`: blumeops, cv, mcquack, dotfiles

## Cleanup TODO

- [ ] Delete temp Forgejo API token "claude-migration-temp" (Settings > Applications)

## Test Plan

- [x] Verified mirror config (mirror=true, original_url) survived transfer on test repo (tesla_auth)
- [x] All pre-commit hooks pass (including container-version-check, docs-check-links)
- [ ] Verify a mirror repo sync runs successfully after transfer (check mirrors/authentik or similar)
- [ ] Rebuild containers from branch to verify Dockerfile URLs resolve

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/265
2026-02-24 20:43:14 -08:00
1b9f706a30 Document container tag provenance and enhance container-list (#263)
## Summary

After investigating deployed container images, confirmed that squash-merging PRs orphans the commit SHAs embedded in container image tags. Two of our currently deployed images (prometheus, grafana) reference branch commits not on main.

This PR:

- Documents the squash-merge SHA orphan problem and the post-merge workflow in [[build-container-image]]
- Adds step 9 to the C1 process: after merging a PR that changes `containers/`, do a follow-up C0 to point manifests at the rebuilt `[main]` tag
- Rewrites `container-list` as a `uv run --script` (typer + rich + httpx)
- Adds optional container name filter (`mise run container-list prometheus` shows 10 tags instead of 4)
- Annotates every tag with `[main]` or `[branch]` based on git commit ancestry
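
The ancestry check behind the `[main]`/`[branch]` annotation can be sketched as follows. `git merge-base --is-ancestor` is a real Git command, but `git_is_ancestor` and `annotate` are hypothetical helpers, not the code in `container-list`.

```python
import subprocess
from typing import Callable


def git_is_ancestor(sha: str, ref: str = "main") -> bool:
    """True when `sha` is reachable from `ref` in the current repository."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", sha, ref],
        capture_output=True,
    )
    # Exit code 0 means ancestor and 1 means not an ancestor. Any other code
    # (for example an unknown commit) is also treated as "not on main" here.
    return result.returncode == 0


def annotate(sha: str, is_ancestor: Callable[[str], bool] = git_is_ancestor) -> str:
    """Label a container tag's commit as [main] or [branch]."""
    return "[main]" if is_ancestor(sha) else "[branch]"
```

A squash-merged PR leaves its branch commits unreachable from `main`, so tags built from them come out as `[branch]` under this check.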

## Test plan

- [x] `mise run container-list` — all containers shown with `[main]`/`[branch]` hints
- [x] `mise run container-list prometheus` — filtered view, more tags, correctly shows `[main]` and `[branch]`
- [x] `mise run container-list nonexistent` — error message with exit code 1
- [x] Pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/263
2026-02-24 09:54:58 -08:00
b1ba96f6d6 Review migrate-grafana-to-authentik: fix file paths, add last-reviewed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 07:29:41 -08:00
9b4951bf94 Improve Mikado process: cycle discipline, reset rigor, --resume enhancements (#261)
## Summary
- **End-of-cycle prompting:** After closing a leaf node and pushing, the agent should prompt the user to review and suggest ending the session rather than rushing into the next leaf
- **Reset rigor:** Reinforced that errors during impl should trigger a branch reset + plan update (not fix-forward). Documented the `git log --oneline --not main` → `git reset --hard` → `git cherry-pick` pattern with clear threshold guidance
- **`--resume` shows PR number:** Queries the Forgejo API for open PRs matching the branch, displays number/title/URL and a hint to run `pr-comments`
- **`--resume` checks git stash:** Shows stash entries as a non-presumptive hint — informs without assuming they apply

## Test plan
- [ ] `mise run docs-mikado --resume` runs without errors (no active chains case)
- [ ] On a mikado branch with an open PR, verify PR info is shown
- [ ] With stashed work, verify stash entries are displayed
- [ ] Review agent-change-process.md for clarity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/261
2026-02-23 21:03:27 -08:00
d05d2fbaff C2: Upgrade Grafana to 12.x with Nix container and Kustomize (#260)
## Summary

Mikado chain to upgrade Grafana from 11.4.0 (Helm chart) to 12.x with:
- Home-built Nix container image (`forge.ops.eblu.me/eblume/grafana`)
- Kustomize manifests replacing the Helm chart
- Single-source ArgoCD app

## Chain

Goal: `upgrade-grafana`
Leaves: `build-grafana-container`, `kustomize-grafana-deployment`

Track with: `mise run docs-mikado upgrade-grafana`

## Test plan
- [ ] Container builds successfully via Nix
- [ ] Container pushed to registry
- [ ] Kustomize manifests produce equivalent resources to current Helm
- [ ] Pod runs, UI loads, OIDC works, datasources healthy
- [ ] `mise run services-check` passes

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/260
2026-02-23 18:07:18 -08:00
4c5e0f0d16 Rename containers/forgejo-runner to runner-job-image
The forgejo-runner container is the CI job execution environment (Dagger,
ArgoCD CLI, etc.), not the runner daemon itself. Rename to runner-job-image
to fix the version-check false positive (Dagger 0.19.11 vs daemon 12.7.0)
and clarify the distinction.

RUNNER_LABELS still references the old image name — will update after
building the image under the new name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:44:51 -08:00
66b5b32f1d Formalize C0/C1/C2 change classification (#259)
## Summary
- **C0 (Quick Fix):** Now explicitly allows direct-to-main commits with no PR required — for low-risk, fix-forward-safe changes
- **C1 (Human Review):** New docs-first workflow with branch deployment (ArgoCD `--revision`, Ansible from checkout). Includes upgrade criteria for escalation to C2
- **C2 (Mikado Chain):** Introduces the **Mikado Branch Invariant** — strict commit ordering where card-introducing commits come first, followed by code progress, followed by card closures. Branch resets required when new prerequisites are discovered
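
The commit ordering required by the Mikado Branch Invariant can be expressed as a simple phase check. This is a sketch under stated assumptions: the phase labels and `satisfies_invariant` are hypothetical, and the actual commit-msg hook may classify commits differently.

```python
# Hypothetical phase labels, in the order the invariant requires.
PHASES = {"card-intro": 0, "code": 1, "card-close": 2}


def satisfies_invariant(commit_kinds: list[str]) -> bool:
    """True when commits, listed oldest first, never move back to an earlier phase."""
    ranks = [PHASES[kind] for kind in commit_kinds]
    return ranks == sorted(ranks)
```

A branch shaped like `["card-intro", "card-intro", "code", "card-close"]` passes. Discovering a new prerequisite after code commits would yield `["card-intro", "code", "card-intro"]`, which fails and signals that a branch reset is needed.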

Updates CLAUDE.md rules (3, 4, 8, 9) to reflect that C0 bypasses branching/PR requirements. Also updates ai-assistance-guide, how-to index, and docs-mikado task description.

## Files changed
- `CLAUDE.md` — rules and classification table
- `docs/how-to/agent-change-process.md` — full process rewrite
- `docs/tutorials/ai-assistance-guide.md` — branching and pitfalls sections
- `docs/how-to/how-to.md` — index description
- `mise-tasks/docs-mikado` — task description
- `docs/changelog.d/formalize-change-classification.doc.md` — changelog fragment

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/259
2026-02-23 16:19:54 -08:00
f05e5cccdf Review Grafana: replace Helm upgrade plan with C2 Mikado chain (#258)
## Summary
- Delete the old 3-phase Helm chart upgrade plan (predates Mikado system)
- Create C2 Mikado chain with goal card `upgrade-grafana` and two leaf prereqs:
  - `kustomize-grafana-deployment` — convert Helm to kustomize manifests
  - `build-grafana-container` — home-built Grafana 12.x image (no upstream containers)
- Record first-ever Grafana review: currently at v11.4.0 on Helm chart 8.8.2
- Update service-versions.yaml, how-to index, and plans index

## Service Review Findings
- Grafana is healthy and synced in ArgoCD
- Running v11.4.0, latest upstream is 12.3.3
- Breaking changes for 12.x are low-risk (React panels only, UIDs compliant)
- PVC is disposable — dashboards and datasources are all config-provisioned

## Deployment and Testing
- [ ] No deployment needed — documentation-only change
- [ ] `docs-check-links` passes
- [ ] `docs-check-index` passes

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/258
2026-02-23 15:06:00 -08:00
2865bf5c27 Review deploy-authentik: rewrite as process guide (#257)
## Summary
- Rewrites deploy-authentik from a historical changelog into a reproducible process guide
- Removes stale version info (`v1.1.2-nix`) and future work section (Forgejo federation is done, rest belongs elsewhere)
- Marks deploy-authentik as completed in plans index and completed archive
- Removes hardcoded image tag from authentik reference card (use `service-versions.yaml`)
- Adds `last-reviewed: 2026-02-23` frontmatter

## Test plan
- [x] All pre-commit hooks pass (docs-check-links, docs-check-index, etc.)
- [x] ArgoCD app verified synced and healthy
- [x] All wiki-links validated

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/257
2026-02-23 14:35:39 -08:00
84d2cdcf14 Update tooling dependencies (Feb 2026 cycle)
Pre-commit: trufflehog v3.93.4, ruff v0.15.2, shellcheck v0.11.0.1,
prettier v3.8.1, actionlint v1.7.11

Fly.io: pin nginx 1.28.2-alpine, bump alloy v1.5.1 -> v1.13.1

Forgejo workflows: pin actions/checkout to SHA (v4.3.1)

Mise tasks: normalize httpx>=0.28.0, typer>=0.15.0 across all scripts

Add how-to doc for the monthly tooling dependency update cycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 13:22:09 -08:00
cb9a06bb75 Update tooling dependencies (Feb 2026 cycle) (#254)
## Summary

Monthly tooling dependency update cycle:

- **Pre-commit hooks**: trufflehog v3.92.5→v3.93.4, ruff v0.14.13→v0.15.2, shellcheck v0.10.0.1→v0.11.0.1, prettier v3.8.0→v3.8.1, actionlint v1.7.10→v1.7.11
- **Fly.io Dockerfile**: pin nginx to 1.28.2-alpine (was unpinned), bump alloy v1.5.1→v1.13.1
- **Mise tasks**: normalize httpx lower bound to >=0.28.0 and typer to >=0.15.0 across all scripts
- **Forgejo workflows**: actions/checkout@v4 is current, no changes needed
- **New how-to doc**: [[update-tooling-dependencies]] documenting this monthly cycle

## No changes needed

- pre-commit-hooks v6.0.0, yamllint v1.38.0, shfmt v3.12.0-2, taplo v0.9.3, ansible-lint 26.1.1 — all already at latest

## Test plan

- [x] `uvx pre-commit run --all-files` — all 24 hooks pass
- [ ] Fly.io deploy (triggered automatically on merge to main via deploy-fly workflow)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/254
2026-02-23 13:08:41 -08:00
e655f4556e Upgrade k8s forgejo-runner from v6.3.1 to v12.7.0 (#251)
## Summary

Completes the `upgrade-k8s-runner` mikado chain. Both prerequisites (workflow validation in Dagger, config review against v12 defaults) were resolved in #250.

- Bump runner image `code.forgejo.org/forgejo/runner:6.3.1` → `12.7.0`
- Update `service-versions.yaml` to track new version
- Mark goal card complete (remove `status: active`)

## Deployment and Testing

After merge:
1. `argocd app sync forgejo-runner`
2. Verify runner registers in Forgejo admin → runners
3. Trigger a test workflow (e.g. `branch-cleanup.yaml` manual dispatch)

Rollback: revert image tag to `6.3.1`, push, sync.
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/251
2026-02-22 17:43:39 -08:00
0f6a1898f0 Prepare forgejo-runner v12 upgrade (leaf nodes) (#250)
## Summary
- Review runner config against v12.7.0 defaults — added `shutdown_timeout: 3h`, no breaking changes found
- Add `validate_workflows` Dagger function using `forgejo-runner validate --directory .` inside upstream container
- All 6 workflows pass v12.7.0 schema validation
- Wire `mise run validate-workflows` task and pre-commit hook on `.forgejo/workflows/` changes
- Mark both leaf Mikado cards (`review-runner-config-v12`, `validate-workflows-against-v12`) complete

## Mikado State
After merge, `upgrade-k8s-runner` goal card has no unmet dependencies — ready to execute the actual image bump in a follow-up PR.

## Test Plan
- [x] `dagger call validate-workflows --src=.` passes (all 6 workflows OK)
- [x] Pre-commit hooks pass
- [ ] Reviewer: confirm `shutdown_timeout: 3h` addition to ConfigMap looks reasonable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/250
2026-02-22 17:38:32 -08:00