Doc review: 5 stalest cards; scale back ai-docs rule; document heph CLI

Reviewed (verified against live state) and stamped last-reviewed on:

- reference/services/argocd.md: SSO via Authentik (public PKCE client), dual-cluster management (minikube + ringtail k3s), corrected sync policy (everything is manual sync, including the apps root)
- reference/services/authentik.md: blueprint list grown to 8 OIDC clients, postgresql-* secret fields, client-type table
- reference/services/grafana.md: TeslaMate datasource now pg.ops.eblu.me:5434 (ringtail), dashboard inventory refreshed, TeslaMate dashboards via pinned-tag init container
- reference/infrastructure/unifi.md: UnPoller now a locally-built image
- how-to/mealie/plan-a-meal.md: procedure verified; stored API token currently returns 401 (operational fix tracked separately)

AGENTS.md: replace the mandatory full ai-docs read with a find-relevant-docs-first rule (bulk ai-docs/ai-sources now opt-in), and document the heph CLI surface for reading and manipulating Blumeops tasks. agent-change-process.md updated to match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 15:57:45 -07:00 · 2026-06-09 15:57:45 -07:00 · 2a2ba0bcb7
commit 2a2ba0bcb7
parent 1c41cca903
9 changed files with 99 additions and 50 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -12,10 +12,9 @@ blumeops is Erich Blume's GitOps repository for personal infrastructure, orchest

 ## Rules

-1. **Always run `mise run ai-docs` at session start**
-    This will refresh your context with important information you will be assumed to know and follow.
-    **Read the full output** — never truncate, pipe to `head`/`tail`, or skip sections.
-    For problems with a large surface area, ask the user if `mise run ai-sources` should also be run — it concatenates all non-doc source files (~270K tokens) for deep codebase context.
+1. **Start every task by finding and reading the relevant docs**
+    Search `docs/` for cards related to the change area (grep for titles/tags, follow `[[wiki-links]]`) and read what you find before acting. Wiki-links refer to cards under `docs/` by filename stem.
+    For problems with a very large surface area, bulk context is available: `mise run ai-docs` concatenates all docs (~130K tokens) and `mise run ai-sources` all non-doc source files (~270K tokens). These are opt-in — confirm with the user before loading either wholesale; targeted reading is usually better.
 2. **Always use `--context=minikube-indri` with kubectl** (or `--context=k3s-ringtail` for ringtail services) - work contexts must never be touched
    **NEVER run `minikube delete`** — it destroys all PVs, etcd, and cluster state. Use `minikube stop`/`minikube start` for restarts. If minikube is stuck, see [[restart-indri]]. Full rebuild from scratch requires the DR procedure in [[rebuild-minikube-cluster]].
 3. **Classify the change as C0/C1/C2 before starting** (see below) — this determines branching and PR requirements
@@ -148,13 +147,42 @@ Create a new spork: `mise run spork-create <mirror-name>`
 ## Task Discovery

 BlumeOps tasks live in [hephaestus](https://github.com/eblume/hephaestus) (`heph`),
-the user's self-hosted context/task system. Fetch them with the CLI:
+the user's self-hosted context/task system. The CLI is a thin client of the
+local `hephd` daemon. (This replaced the retired `blumeops-tasks` mise task,
+which read from Todoist.)
+
+### Reading tasks

 ```fish
-heph list --project Blumeops --json  # outstanding Blumeops tasks as JSON
+heph list --project Blumeops --json   # outstanding Blumeops tasks as JSON
+heph next                             # tactical "what is next?" ranking
+heph show <node_id>                   # one task with its scalars
+heph context <node_id>                # print the task's canonical-context doc
+heph log <node_id>                    # print the task's latest log entries
 ```

-(This replaced the retired `blumeops-tasks` mise task, which read from Todoist.)
+JSON rows carry `node_id` (use this as `<ID>` in all commands below), `title`,
+`state`, `do_date`/`late_on` (epoch ms), `recurrence` (RFC-5545), and
+`attention` (red|orange|white|blue — a1–a4 urgency tiers; blue = on-deck).
+
+### Manipulating tasks
+
+```fish
+heph done <node_id>                   # mark done (recurring tasks roll forward)
+heph drop <node_id>                   # mark dropped
+heph skip <node_id>                   # skip a recurring task's current occurrence
+heph log <node_id> "text"             # append a log entry
+heph context <node_id> --append "…"   # append to the canonical-context doc (--body replaces; `-` reads stdin)
+heph edit <node_id> --do-date +3d     # reschedule; also --late-on/--recur/--attention/--project (`none` clears)
+heph task "Title" --project Blumeops --do-date fri --attention white  # create a task
+```
+
+Date forms: `today|tomorrow|+3d|fri|YYYY-MM-DD`. Recurrence: presets
+(`daily|weekly|monthly|yearly|weekdays`) or natural language (`"every 3 days"`).
+
+Conventions: don't save TODOs to agent memory — file them as heph tasks under
+the Blumeops project. When completing a recurring chore (e.g. "BlumeOps doc
+review"), `heph log` a short note of what was done, then `heph done` it.

 Most operational scripts are stored in `./mise-tasks/`. For scripts with any logic or
 complexity, use uv run --script 's with explicit dependencies. Complex
--- a/docs/changelog.d/doc-review-stalest-five.ai.md
+++ b/docs/changelog.d/doc-review-stalest-five.ai.md
@@ -0,0 +1 @@
+Scaled back the agent context rule: agents now start tasks by finding and reading relevant docs instead of mandatorily ingesting the full `mise run ai-docs` output (which had grown to ~130K tokens). Also documented the full `heph` CLI task workflow (read, log, complete, create) in AGENTS.md.
--- a/docs/changelog.d/doc-review-stalest-five.doc.md
+++ b/docs/changelog.d/doc-review-stalest-five.doc.md
@@ -0,0 +1 @@
+Reviewed the five stalest documentation cards (argocd, authentik, grafana, unifi, plan-a-meal): brought ArgoCD's SSO/dual-cluster/sync-policy story up to date, expanded Authentik's blueprint and OIDC client inventory to all eight clients, fixed Grafana's TeslaMate datasource target and dashboard list, and noted UnPoller's locally-built image.
--- a/docs/explanation/agent-change-process.md
+++ b/docs/explanation/agent-change-process.md
@@ -1,6 +1,6 @@
 ---
 title: Agent Change Process
-modified: 2026-03-15
+modified: 2026-06-09
 last-reviewed: 2026-02-23
 tags:
  - explanation
@@ -25,13 +25,13 @@ Before starting work, classify the change:

 When in doubt, start at C1. Upgrade to C2 if complexity spirals or the user requests it.

-**Context loading:** All change classes start with `mise run ai-docs` (~85K tokens of documentation). For problems with a large surface area, ask the user if `mise run ai-sources` should also be run — it concatenates all non-doc source files (~270K tokens). Together they cover the full codebase without overlap.
+**Context loading:** All change classes start by finding and reading the docs relevant to the change area — grep `docs/` and follow wiki-links. For problems with a very large surface area, bulk context is available on request: `mise run ai-docs` concatenates all docs (~130K tokens) and `mise run ai-sources` all non-doc source files (~270K tokens). Confirm with the user before loading either wholesale.

 ## C0 — Quick Fix

 A change where the risk is low enough that problems can be quickly fixed forward.

-1. Run `mise run ai-docs` to load context
+1. Find and read the docs relevant to the change area
 2. Implement the change directly on main
 3. Add a changelog fragment if the change is user-visible or noteworthy (`docs/changelog.d/+<descriptive-slug>.<type>.md`)
 4. Commit and push
@@ -46,7 +46,7 @@ A change with enough complexity or risk that a human should review it, but not s

 ### Process

-1. Run `mise run ai-docs` to load context
+1. Find and read the docs relevant to the change area
 2. **Search related docs** — read existing documentation and reference cards related to the change area
 3. **Create a feature branch** and open a PR early (draft is fine)
 4. **Documentation first** — commit doc changes reflecting the desired end state before writing code. This helps the reviewer understand intent and catches design issues early
@@ -77,7 +77,7 @@ A complex, multi-session change managed through the [Mikado method](https://mika

 Before writing any code, invest in understanding the problem:

-1. Run `mise run ai-docs` to load context
+1. Find and read the docs relevant to the change area
 2. Search related docs, reference cards, and existing how-to guides for the change area
 3. Think through the dependency graph — what prerequisites exist? What could go wrong?
 4. Create Mikado cards for everything you can anticipate (you'll discover more later — that's the point of the method)
@@ -220,7 +220,7 @@ When the final leaf node is closed and no `status: active` cards remain:

 When starting a new session to continue C2 work:

-1. Run `mise run ai-docs` to load context
+1. Find and read the docs relevant to the change area
 2. Run `mise run docs-mikado --resume` — this will:
   - Detect the current branch and match it to an active chain
   - Show the chain state, ready leaf nodes, and current position in the invariant
--- a/docs/how-to/mealie/plan-a-meal.md
+++ b/docs/how-to/mealie/plan-a-meal.md
@@ -1,6 +1,7 @@
 ---
 title: Plan a Meal
-modified: 2026-03-17
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
  - how-to
  - mealie
--- a/docs/reference/infrastructure/unifi.md
+++ b/docs/reference/infrastructure/unifi.md
@@ -1,6 +1,7 @@
 ---
 title: UniFi
-modified: 2026-03-16
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
  - infrastructure
  - networking
@@ -71,7 +72,7 @@ Attempted Feb 2026 with the `ubiquiti-community/unifi` Terraform provider via Pu

 ## Monitoring

-UniFi metrics are exported to Prometheus via [UnPoller](https://github.com/unpoller/unpoller), running as a k8s deployment in the `monitoring` namespace on indri. UnPoller polls the UX7 controller API using an API key and exposes metrics on port 9130.
+UniFi metrics are exported to Prometheus via [UnPoller](https://github.com/unpoller/unpoller), running as a k8s deployment in the `monitoring` namespace on indri's minikube (`argocd/manifests/unpoller/`, locally-built image `registry.ops.eblu.me/blumeops/unpoller`). UnPoller polls the UX7 controller API using an API key and exposes metrics on port 9130.

 - **Prometheus job:** `unpoller`
 - **Metrics prefix:** `unifi_`
--- a/docs/reference/services/argocd.md
+++ b/docs/reference/services/argocd.md
@@ -1,6 +1,7 @@
 ---
 title: ArgoCD
-modified: 2026-02-07
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
  - service
  - gitops
@@ -18,22 +19,38 @@ GitOps continuous delivery platform for the [[cluster|Kubernetes cluster]].
 | **Tailscale URL** | https://argocd.tail8d86e.ts.net |
 | **Namespace** | `argocd` |
 | **Git Source** | `ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git` |
-| **Manifests Path** | `argocd/` |
+| **Manifests Path** | `argocd/apps/` (Applications), `argocd/manifests/` (workloads) |
+
+## Clusters
+
+A single ArgoCD instance (on indri's minikube) manages both clusters:
+
+| Cluster | Destination | Apps |
+|---------|-------------|------|
+| minikube (indri) | `https://kubernetes.default.svc` | Most services |
+| k3s ([[ringtail]]) | `https://ringtail.tail8d86e.ts.net:6443` | GPU workloads and `*-ringtail` apps |

 ## Sync Policy

-| Application | Sync Policy | Rationale |
-|-------------|-------------|-----------|
-| `apps` | Automated | Picks up new Application manifests |
-| All workloads | Manual | Explicit control over deployments |
+All applications use **manual sync** — including the `apps` app-of-apps root. To pick up newly added Application manifests, sync `apps` explicitly:

-## Credentials
+```bash
+argocd app sync apps
+```

-- Admin password: 1Password (blumeops vault)
-- Git deploy key (SSH): 1Password
+This gives explicit control over every deployment; nothing rolls out on push alone.
+
+## Authentication
+
+- **SSO via [[authentik]]** — OIDC with a public PKCE client (`argocd`), shared by the web UI and CLI: `argocd login argocd.ops.eblu.me --sso`. The Authentik `admins` group maps to `role:admin` via the RBAC ConfigMap; the default policy grants no access.
+- **Local admin** — break-glass password in 1Password (blumeops vault), for when Authentik is down.
+
+The git deploy key (SSH) is injected via [[external-secrets]].

 ## Related

 - [[argocd-cli]] - CLI usage and deployment workflows
 - [[apps|Apps]] - Full application registry
 - [[forgejo]] - Git source
+- [[authentik]] - OIDC identity provider for SSO
+- [[federated-login]] - How authentication works across BlumeOps
--- a/docs/reference/services/authentik.md
+++ b/docs/reference/services/authentik.md
@@ -1,6 +1,7 @@
 ---
 title: Authentik
-modified: 2026-02-20
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
  - service
  - security
@@ -42,9 +43,7 @@ Authentik configuration is managed via Blueprints (YAML) stored as a ConfigMap m

 - **`common.yaml`** — shared identity resources (`admins` group)
 - **`mfa.yaml`** — MFA enforcement on the default authentication flow (`not_configured_action: configure`)
-- **`grafana.yaml`** — Grafana OAuth2 provider, application, and policy binding
-- **`forgejo.yaml`** — Forgejo OAuth2 provider, application, and policy binding
-- **`zot.yaml`** — Zot registry OAuth2 provider, application, and policy binding
+- One blueprint per OIDC client (provider, application, and policy binding): `grafana.yaml`, `forgejo.yaml`, `zot.yaml`, `argocd.yaml`, `jellyfin.yaml`, `mealie.yaml`, `paperless.yaml`, `heph.yaml`

 Group membership is included in the `profile` scope claim (Authentik built-in). Services use `--group-claim-name groups` to read it.

@@ -52,13 +51,18 @@ Blueprint file: `argocd/manifests/authentik/configmap-blueprint.yaml`

 ## OIDC Clients

-| Client | Status |
-|--------|--------|
-| [[grafana]] | Active |
-| [[forgejo]] | Active |
-| [[zot]] | Active |
+| Client | Type |
+|--------|------|
+| [[grafana]] | Confidential |
+| [[forgejo]] | Confidential |
+| [[zot]] | Confidential |
+| [[argocd]] | Public (PKCE, shared by web UI and CLI) |
+| [[jellyfin]] | Confidential |
+| [[mealie]] | Confidential |
+| [[paperless]] | Confidential |
+| heph | Public (PKCE, with `offline_access` for spoke sync refresh tokens) |

-Future clients: [[argocd]], [[miniflux]]
+Future clients: [[miniflux]]

 ## Secrets

@@ -67,11 +71,10 @@ Injected via [[external-secrets]] from the "Authentik (blumeops)" 1Password item
 | 1Password Field | Purpose |
 |-----------------|---------|
 | `secret-key` | Authentik secret key |
-| `db-password` | PostgreSQL password |
-| `grafana-client-secret` | OIDC client secret for Grafana |
-| `forgejo-client-secret` | OIDC client secret for Forgejo |
-| `zot-client-secret` | OIDC client secret for Zot |
-| `api-token` | Authentik API token |
+| `postgresql-host` / `-port` / `-name` / `-user` / `-password` | PostgreSQL connection |
+| `<client>-client-secret` | OIDC client secret, one per confidential client (grafana, forgejo, zot, jellyfin, mealie, paperless) |
+
+The item also holds an `api-token` field (Authentik API access for admin scripting); it is not synced into the cluster.

 ## Container Image

--- a/docs/reference/services/grafana.md
+++ b/docs/reference/services/grafana.md
@@ -1,6 +1,7 @@
 ---
 title: Grafana
-modified: 2026-02-28
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
  - service
  - observability
@@ -25,7 +26,7 @@ Dashboards and visualization for BlumeOps observability.

 Grafana supports two login methods:

-- **SSO via [[authentik]]** — OIDC login through Authentik (`auth.generic_oauth`). Users click "Sign in with Authentik", authenticate at Authentik, and are redirected back as Admin.
+- **SSO via [[authentik]]** — OIDC login through Authentik (`auth.generic_oauth`). Members of the Authentik `admins` group get the Admin role; everyone else gets Viewer (`role_attribute_path` in `grafana.ini`).
 - **Local admin** — break-glass login using the password from 1Password ("Grafana (blumeops)"). Always available if Authentik is down.

 The OIDC client secret is injected via [[external-secrets]] (`grafana-authentik-oauth` secret in monitoring namespace).
@@ -37,7 +38,7 @@ The OIDC client secret is injected via [[external-secrets]] (`grafana-authentik-
 | Prometheus | prometheus | `prometheus.monitoring.svc.cluster.local:9090` |
 | Loki | loki | `loki.monitoring.svc.cluster.local:3100` |
 | Tempo | tempo | `tempo.monitoring.svc.cluster.local:3200` |
-| TeslaMate | postgres | `blumeops-pg-rw.databases.svc.cluster.local:5432` |
+| TeslaMate | postgres | `pg.ops.eblu.me:5434` (TeslaMate's database on [[ringtail]], via Caddy L4) |

 ## Dashboard Provisioning

@@ -49,13 +50,9 @@ Optional annotation: `grafana_folder: "FolderName"`

 ## Key Dashboards

-- macOS System - Host metrics for indri
-- Minikube - Kubernetes cluster overview
-- Borgmatic Backups - Backup status and trends
-- Services Health - HTTP probe results
-- Docs APM - Request rate, latency, cache for docs.eblu.me
-- Fly.io Proxy Health - Aggregate proxy health across all upstream services
-- TeslaMate (18 dashboards) - Vehicle data
+Provisioned dashboards live in `argocd/manifests/grafana-config/dashboards/` (one ConfigMap per dashboard). Coverage as of 2026-06: alerts, borgmatic, CV APM, devpi, docs APM, fly.io proxy, forgejo, frigate, jellyfin, kubernetes, loki, macOS (indri host), postgresql, ringtail, shower APM, sifaka disks, snowflake proxy, tempo, transmission, zot.
+
+TeslaMate's dashboards are not in the repo — an init container fetches them from the forge mirror at a pinned tag (`TESLAMATE_VERSION` in `argocd/manifests/grafana/deployment.yaml`).

 ## Related
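
Reviewer note on the new AGENTS.md "Reading tasks" text: the JSON row fields it describes (`node_id`, `title`, `state`, `do_date`/`late_on` in epoch ms, `attention`) could be consumed along the lines of the minimal Python sketch below. This is only an illustration under stated assumptions: the sample rows, the `"open"` state value, and the choice to rank `red` before `orange`, `white`, and `blue` are hypothetical and not confirmed heph behavior. Real input would come from parsing the output of `heph list --project Blumeops --json`.

```python
from datetime import datetime, timezone

# Attention tiers in the order the AGENTS.md text lists them. Ranking red
# first is an assumption of this sketch, not documented heph behavior.
ATTENTION_ORDER = {"red": 0, "orange": 1, "white": 2, "blue": 3}


def epoch_ms_to_date(ms):
    """Convert an epoch-millisecond timestamp to a UTC ISO date, or None."""
    if ms is None:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()


def summarize(rows):
    """Return (node_id, title, attention, do_date) tuples, most urgent first."""

    def sort_key(row):
        # Unknown or missing attention values sort after all known tiers.
        tier = ATTENTION_ORDER.get(row.get("attention"), len(ATTENTION_ORDER))
        do_date = row.get("do_date")
        # Rows without a do_date sort last within their tier.
        return (tier, do_date if do_date is not None else float("inf"))

    return [
        (
            row["node_id"],
            row["title"],
            row.get("attention"),
            epoch_ms_to_date(row.get("do_date")),
        )
        for row in sorted(rows, key=sort_key)
    ]


# Hypothetical sample rows; real rows would be json.loads() of the CLI output.
sample = [
    {"node_id": "n2", "title": "Rotate Mealie API token", "state": "open",
     "do_date": None, "attention": "white"},
    {"node_id": "n1", "title": "BlumeOps doc review", "state": "open",
     "do_date": 1780963200000, "attention": "orange"},
]

for entry in summarize(sample):
    print(entry)
```

Dates are converted in UTC here for determinism; a real script might prefer the local timezone, since the docs do not say how heph interprets `do_date`.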