Doc review: 5 stalest cards; scale back ai-docs rule; document heph CLI
Reviewed (verified against live state) and stamped last-reviewed on:

- reference/services/argocd.md: SSO via Authentik (public PKCE client), dual-cluster management (minikube + ringtail k3s), corrected sync policy (everything is manual sync, including the apps root)
- reference/services/authentik.md: blueprint list grown to 8 OIDC clients, postgresql-* secret fields, client-type table
- reference/services/grafana.md: TeslaMate datasource now pg.ops.eblu.me:5434 (ringtail), dashboard inventory refreshed, TeslaMate dashboards via pinned-tag init container
- reference/infrastructure/unifi.md: UnPoller now a locally-built image
- how-to/mealie/plan-a-meal.md: procedure verified; stored API token currently returns 401 (operational fix tracked separately)

AGENTS.md: replace the mandatory full ai-docs read with a find-relevant-docs-first rule (bulk ai-docs/ai-sources now opt-in), and document the heph CLI surface for reading and manipulating Blumeops tasks. agent-change-process.md updated to match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent 1c41cca903
commit 2a2ba0bcb7
9 changed files with 99 additions and 50 deletions
42
AGENTS.md
@@ -12,10 +12,9 @@ blumeops is Erich Blume's GitOps repository for personal infrastructure, orchest
 
 ## Rules
 
-1. **Always run `mise run ai-docs` at session start**
-This will refresh your context with important information you will be assumed to know and follow.
-**Read the full output** — never truncate, pipe to `head`/`tail`, or skip sections.
-For problems with a large surface area, ask the user if `mise run ai-sources` should also be run — it concatenates all non-doc source files (~270K tokens) for deep codebase context.
+1. **Start every task by finding and reading the relevant docs**
+Search `docs/` for cards related to the change area (grep for titles/tags, follow `[[wiki-links]]`) and read what you find before acting. Wiki-links refer to cards under `docs/` by filename stem.
+For problems with a very large surface area, bulk context is available: `mise run ai-docs` concatenates all docs (~130K tokens) and `mise run ai-sources` all non-doc source files (~270K tokens). These are opt-in — confirm with the user before loading either wholesale; targeted reading is usually better.
 2. **Always use `--context=minikube-indri` with kubectl** (or `--context=k3s-ringtail` for ringtail services) - work contexts must never be touched
 **NEVER run `minikube delete`** — it destroys all PVs, etcd, and cluster state. Use `minikube stop`/`minikube start` for restarts. If minikube is stuck, see [[restart-indri]]. Full rebuild from scratch requires the DR procedure in [[rebuild-minikube-cluster]].
 3. **Classify the change as C0/C1/C2 before starting** (see below) — this determines branching and PR requirements
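Reviewer sketch, not part of the commit: the new rule 1 in the AGENTS.md hunk says wiki-links refer to cards under `docs/` by filename stem. A minimal Python illustration of that lookup follows. It assumes links use only the `[[stem]]` and `[[stem|label]]` forms that appear in this diff and that cards are `.md` files with unique stems. The function names are hypothetical; no such helper is known to exist in the repository.

```python
import re
from pathlib import Path

# Matches [[stem]] and [[stem|label]]; only the stem is captured.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def wiki_link_targets(text):
    """Return the filename stems referenced by wiki-links, in order of appearance."""
    return [match.group(1).strip() for match in WIKI_LINK.finditer(text)]


def index_cards(docs_root="docs"):
    """Map each card's filename stem to its path (assumes stems are unique)."""
    return {path.stem: path for path in sorted(Path(docs_root).rglob("*.md"))}


def resolve_links(text, cards):
    """Pair every wiki-link target with its card path, or None if no card matches."""
    return {stem: cards.get(stem) for stem in wiki_link_targets(text)}
```

Calling `resolve_links(card_text, index_cards("docs"))` on a card's body would list which linked cards to read next; a `None` value flags a link with no matching file.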
@@ -148,13 +147,42 @@ Create a new spork: `mise run spork-create <mirror-name>`
 ## Task Discovery
 
 BlumeOps tasks live in [hephaestus](https://github.com/eblume/hephaestus) (`heph`),
-the user's self-hosted context/task system. Fetch them with the CLI:
+the user's self-hosted context/task system. The CLI is a thin client of the
+local `hephd` daemon. (This replaced the retired `blumeops-tasks` mise task,
+which read from Todoist.)
+
+### Reading tasks
 
 ```fish
-heph list --project Blumeops --json # outstanding Blumeops tasks as JSON
+heph list --project Blumeops --json # outstanding Blumeops tasks as JSON
+heph next # tactical "what is next?" ranking
+heph show <node_id> # one task with its scalars
+heph context <node_id> # print the task's canonical-context doc
+heph log <node_id> # print the task's latest log entries
 ```
 
-(This replaced the retired `blumeops-tasks` mise task, which read from Todoist.)
+JSON rows carry `node_id` (use this as `<ID>` in all commands below), `title`,
+`state`, `do_date`/`late_on` (epoch ms), `recurrence` (RFC-5545), and
+`attention` (red|orange|white|blue — a1–a4 urgency tiers; blue = on-deck).
+
+### Manipulating tasks
+
+```fish
+heph done <node_id> # mark done (recurring tasks roll forward)
+heph drop <node_id> # mark dropped
+heph skip <node_id> # skip a recurring task's current occurrence
+heph log <node_id> "text" # append a log entry
+heph context <node_id> --append "…" # append to the canonical-context doc (--body replaces; `-` reads stdin)
+heph edit <node_id> --do-date +3d # reschedule; also --late-on/--recur/--attention/--project (`none` clears)
+heph task "Title" --project Blumeops --do-date fri --attention white # create a task
+```
+
+Date forms: `today|tomorrow|+3d|fri|YYYY-MM-DD`. Recurrence: presets
+(`daily|weekly|monthly|yearly|weekdays`) or natural language (`"every 3 days"`).
+
+Conventions: don't save TODOs to agent memory — file them as heph tasks under
+the Blumeops project. When completing a recurring chore (e.g. "BlumeOps doc
+review"), `heph log` a short note of what was done, then `heph done` it.
 
 Most operational scripts are stored in `./mise-tasks/`. For scripts with any logic or
 complexity, use uv run --script 's with explicit dependencies. Complex
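Reviewer sketch, not part of the commit: the "Reading tasks" text in AGENTS.md lists the fields carried by each JSON row. The Python below shows one way a script might consume them, assuming the output of `heph list --project Blumeops --json` is a JSON array of objects and that the attention tiers are listed most urgent first (red = a1 through blue = a4). The sample rows, their `node_id` values, and the `state` value are hypothetical placeholders, not real task data.

```python
import json
from datetime import datetime, timezone

# Assumption: the tiers are listed most urgent first (red = a1 ... blue = a4).
ATTENTION_RANK = {"red": 0, "orange": 1, "white": 2, "blue": 3}


def from_epoch_ms(value):
    """Convert an epoch-milliseconds timestamp to an aware UTC datetime."""
    if value is None:
        return None
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


def parse_rows(payload):
    """Parse JSON text (assumed to be a list of row objects) and sort by urgency."""
    rows = json.loads(payload)
    return sorted(rows, key=lambda row: ATTENTION_RANK.get(row.get("attention"), len(ATTENTION_RANK)))


def describe(row):
    """Render one row as a single human-readable line."""
    due = from_epoch_ms(row.get("do_date"))
    due_text = due.date().isoformat() if due else "unscheduled"
    return f"{row['node_id']} [{row.get('attention', '?')}] {row['title']} (do: {due_text})"


# Hypothetical sample data; real rows would come from the heph CLI.
SAMPLE = """
[
  {"node_id": "n2", "title": "Rotate Mealie API token", "state": "open", "do_date": null, "attention": "blue"},
  {"node_id": "n1", "title": "BlumeOps doc review", "state": "open", "do_date": 1780963200000, "attention": "orange"}
]
"""

for row in parse_rows(SAMPLE):
    print(describe(row))
```

Rows with an unknown or missing `attention` value sort last rather than raising, which seems the safer default for a read-only report.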
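Reviewer sketch, not part of the commit: AGENTS.md gives the accepted date forms as `today|tomorrow|+3d|fri|YYYY-MM-DD`. The Python below illustrates how such forms could resolve to a calendar date. It is not heph's parser. Two points are assumptions: that weekday forms are three-letter English abbreviations like the `fri` example, and that a weekday name means its next occurrence strictly after today.

```python
import re
from datetime import date, timedelta

# Assumption: weekday forms are three-letter abbreviations like the `fri` example.
WEEKDAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")
RELATIVE_DAYS = re.compile(r"\+(\d+)d")


def resolve_date(form, today):
    """Resolve a date form relative to `today` and return a datetime.date."""
    form = form.strip().lower()
    if form == "today":
        return today
    if form == "tomorrow":
        return today + timedelta(days=1)
    relative = RELATIVE_DAYS.fullmatch(form)
    if relative:
        return today + timedelta(days=int(relative.group(1)))
    if form in WEEKDAYS:
        # Assumption: a weekday name means its next occurrence strictly after today.
        days_ahead = (WEEKDAYS.index(form) - today.weekday()) % 7 or 7
        return today + timedelta(days=days_ahead)
    # Fall back to ISO parsing (YYYY-MM-DD); raises ValueError on unrecognised input.
    return date.fromisoformat(form)
```

Passing `today` explicitly keeps the function deterministic, so it can be checked against fixed dates instead of the wall clock.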
1
docs/changelog.d/doc-review-stalest-five.ai.md
Normal file
@@ -0,0 +1 @@
+Scaled back the agent context rule: agents now start tasks by finding and reading relevant docs instead of mandatorily ingesting the full `mise run ai-docs` output (which had grown to ~130K tokens). Also documented the full `heph` CLI task workflow (read, log, complete, create) in AGENTS.md.
1
docs/changelog.d/doc-review-stalest-five.doc.md
Normal file
@@ -0,0 +1 @@
+Reviewed the five stalest documentation cards (argocd, authentik, grafana, unifi, plan-a-meal): brought ArgoCD's SSO/dual-cluster/sync-policy story up to date, expanded Authentik's blueprint and OIDC client inventory to all eight clients, fixed Grafana's TeslaMate datasource target and dashboard list, and noted UnPoller's locally-built image.
@@ -1,6 +1,6 @@
 ---
 title: Agent Change Process
-modified: 2026-03-15
+modified: 2026-06-09
 last-reviewed: 2026-02-23
 tags:
 - explanation
@@ -25,13 +25,13 @@ Before starting work, classify the change:
 
 When in doubt, start at C1. Upgrade to C2 if complexity spirals or the user requests it.
 
-**Context loading:** All change classes start with `mise run ai-docs` (~85K tokens of documentation). For problems with a large surface area, ask the user if `mise run ai-sources` should also be run — it concatenates all non-doc source files (~270K tokens). Together they cover the full codebase without overlap.
+**Context loading:** All change classes start by finding and reading the docs relevant to the change area — grep `docs/` and follow wiki-links. For problems with a very large surface area, bulk context is available on request: `mise run ai-docs` concatenates all docs (~130K tokens) and `mise run ai-sources` all non-doc source files (~270K tokens). Confirm with the user before loading either wholesale.
 
 ## C0 — Quick Fix
 
 A change where the risk is low enough that problems can be quickly fixed forward.
 
-1. Run `mise run ai-docs` to load context
+1. Find and read the docs relevant to the change area
 2. Implement the change directly on main
 3. Add a changelog fragment if the change is user-visible or noteworthy (`docs/changelog.d/+<descriptive-slug>.<type>.md`)
 4. Commit and push
@@ -46,7 +46,7 @@ A change with enough complexity or risk that a human should review it, but not s
 
 ### Process
 
-1. Run `mise run ai-docs` to load context
+1. Find and read the docs relevant to the change area
 2. **Search related docs** — read existing documentation and reference cards related to the change area
 3. **Create a feature branch** and open a PR early (draft is fine)
 4. **Documentation first** — commit doc changes reflecting the desired end state before writing code. This helps the reviewer understand intent and catches design issues early
@@ -77,7 +77,7 @@ A complex, multi-session change managed through the [Mikado method](https://mika
 
 Before writing any code, invest in understanding the problem:
 
-1. Run `mise run ai-docs` to load context
+1. Find and read the docs relevant to the change area
 2. Search related docs, reference cards, and existing how-to guides for the change area
 3. Think through the dependency graph — what prerequisites exist? What could go wrong?
 4. Create Mikado cards for everything you can anticipate (you'll discover more later — that's the point of the method)
@@ -220,7 +220,7 @@ When the final leaf node is closed and no `status: active` cards remain:
 
 When starting a new session to continue C2 work:
 
-1. Run `mise run ai-docs` to load context
+1. Find and read the docs relevant to the change area
 2. Run `mise run docs-mikado --resume` — this will:
 - Detect the current branch and match it to an active chain
 - Show the chain state, ready leaf nodes, and current position in the invariant
@@ -1,6 +1,7 @@
 ---
 title: Plan a Meal
-modified: 2026-03-17
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
 - how-to
 - mealie
@@ -1,6 +1,7 @@
 ---
 title: UniFi
-modified: 2026-03-16
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
 - infrastructure
 - networking
@@ -71,7 +72,7 @@ Attempted Feb 2026 with the `ubiquiti-community/unifi` Terraform provider via Pu
 
 ## Monitoring
 
-UniFi metrics are exported to Prometheus via [UnPoller](https://github.com/unpoller/unpoller), running as a k8s deployment in the `monitoring` namespace on indri. UnPoller polls the UX7 controller API using an API key and exposes metrics on port 9130.
+UniFi metrics are exported to Prometheus via [UnPoller](https://github.com/unpoller/unpoller), running as a k8s deployment in the `monitoring` namespace on indri's minikube (`argocd/manifests/unpoller/`, locally-built image `registry.ops.eblu.me/blumeops/unpoller`). UnPoller polls the UX7 controller API using an API key and exposes metrics on port 9130.
 
 - **Prometheus job:** `unpoller`
 - **Metrics prefix:** `unifi_`
@@ -1,6 +1,7 @@
 ---
 title: ArgoCD
-modified: 2026-02-07
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
 - service
 - gitops
@@ -18,22 +19,38 @@ GitOps continuous delivery platform for the [[cluster|Kubernetes cluster]].
 | **Tailscale URL** | https://argocd.tail8d86e.ts.net |
 | **Namespace** | `argocd` |
 | **Git Source** | `ssh://forgejo@forge.ops.eblu.me:2222/eblume/blumeops.git` |
-| **Manifests Path** | `argocd/` |
+| **Manifests Path** | `argocd/apps/` (Applications), `argocd/manifests/` (workloads) |
+
+## Clusters
+
+A single ArgoCD instance (on indri's minikube) manages both clusters:
+
+| Cluster | Destination | Apps |
+|---------|-------------|------|
+| minikube (indri) | `https://kubernetes.default.svc` | Most services |
+| k3s ([[ringtail]]) | `https://ringtail.tail8d86e.ts.net:6443` | GPU workloads and `*-ringtail` apps |
 
 ## Sync Policy
 
-| Application | Sync Policy | Rationale |
-|-------------|-------------|-----------|
-| `apps` | Automated | Picks up new Application manifests |
-| All workloads | Manual | Explicit control over deployments |
+All applications use **manual sync** — including the `apps` app-of-apps root. To pick up newly added Application manifests, sync `apps` explicitly:
 
-## Credentials
+```bash
+argocd app sync apps
+```
 
-- Admin password: 1Password (blumeops vault)
-- Git deploy key (SSH): 1Password
+This gives explicit control over every deployment; nothing rolls out on push alone.
+
+## Authentication
+
+- **SSO via [[authentik]]** — OIDC with a public PKCE client (`argocd`), shared by the web UI and CLI: `argocd login argocd.ops.eblu.me --sso`. The Authentik `admins` group maps to `role:admin` via the RBAC ConfigMap; the default policy grants no access.
+- **Local admin** — break-glass password in 1Password (blumeops vault), for when Authentik is down.
+
+The git deploy key (SSH) is injected via [[external-secrets]].
 
 ## Related
 
 - [[argocd-cli]] - CLI usage and deployment workflows
 - [[apps|Apps]] - Full application registry
 - [[forgejo]] - Git source
+- [[authentik]] - OIDC identity provider for SSO
+- [[federated-login]] - How authentication works across BlumeOps
@@ -1,6 +1,7 @@
 ---
 title: Authentik
-modified: 2026-02-20
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
 - service
 - security
@@ -42,9 +43,7 @@ Authentik configuration is managed via Blueprints (YAML) stored as a ConfigMap m
 
 - **`common.yaml`** — shared identity resources (`admins` group)
 - **`mfa.yaml`** — MFA enforcement on the default authentication flow (`not_configured_action: configure`)
-- **`grafana.yaml`** — Grafana OAuth2 provider, application, and policy binding
-- **`forgejo.yaml`** — Forgejo OAuth2 provider, application, and policy binding
-- **`zot.yaml`** — Zot registry OAuth2 provider, application, and policy binding
+- One blueprint per OIDC client (provider, application, and policy binding): `grafana.yaml`, `forgejo.yaml`, `zot.yaml`, `argocd.yaml`, `jellyfin.yaml`, `mealie.yaml`, `paperless.yaml`, `heph.yaml`
 
 Group membership is included in the `profile` scope claim (Authentik built-in). Services use `--group-claim-name groups` to read it.
 
@@ -52,13 +51,18 @@ Blueprint file: `argocd/manifests/authentik/configmap-blueprint.yaml`
 
 ## OIDC Clients
 
-| Client | Status |
-|--------|--------|
-| [[grafana]] | Active |
-| [[forgejo]] | Active |
-| [[zot]] | Active |
+| Client | Type |
+|--------|------|
+| [[grafana]] | Confidential |
+| [[forgejo]] | Confidential |
+| [[zot]] | Confidential |
+| [[argocd]] | Public (PKCE, shared by web UI and CLI) |
+| [[jellyfin]] | Confidential |
+| [[mealie]] | Confidential |
+| [[paperless]] | Confidential |
+| heph | Public (PKCE, with `offline_access` for spoke sync refresh tokens) |
 
-Future clients: [[argocd]], [[miniflux]]
+Future clients: [[miniflux]]
 
 ## Secrets
 
@@ -67,11 +71,10 @@ Injected via [[external-secrets]] from the "Authentik (blumeops)" 1Password item
 | 1Password Field | Purpose |
 |-----------------|---------|
 | `secret-key` | Authentik secret key |
-| `db-password` | PostgreSQL password |
-| `grafana-client-secret` | OIDC client secret for Grafana |
-| `forgejo-client-secret` | OIDC client secret for Forgejo |
-| `zot-client-secret` | OIDC client secret for Zot |
-| `api-token` | Authentik API token |
+| `postgresql-host` / `-port` / `-name` / `-user` / `-password` | PostgreSQL connection |
+| `<client>-client-secret` | OIDC client secret, one per confidential client (grafana, forgejo, zot, jellyfin, mealie, paperless) |
 
+The item also holds an `api-token` field (Authentik API access for admin scripting); it is not synced into the cluster.
+
 ## Container Image
 
@@ -1,6 +1,7 @@
 ---
 title: Grafana
-modified: 2026-02-28
+modified: 2026-06-09
+last-reviewed: 2026-06-09
 tags:
 - service
 - observability
@@ -25,7 +26,7 @@ Dashboards and visualization for BlumeOps observability.
 
 Grafana supports two login methods:
 
-- **SSO via [[authentik]]** — OIDC login through Authentik (`auth.generic_oauth`). Users click "Sign in with Authentik", authenticate at Authentik, and are redirected back as Admin.
+- **SSO via [[authentik]]** — OIDC login through Authentik (`auth.generic_oauth`). Members of the Authentik `admins` group get the Admin role; everyone else gets Viewer (`role_attribute_path` in `grafana.ini`).
 - **Local admin** — break-glass login using the password from 1Password ("Grafana (blumeops)"). Always available if Authentik is down.
 
 The OIDC client secret is injected via [[external-secrets]] (`grafana-authentik-oauth` secret in monitoring namespace).
@@ -37,7 +38,7 @@ The OIDC client secret is injected via [[external-secrets]] (`grafana-authentik-
 | Prometheus | prometheus | `prometheus.monitoring.svc.cluster.local:9090` |
 | Loki | loki | `loki.monitoring.svc.cluster.local:3100` |
 | Tempo | tempo | `tempo.monitoring.svc.cluster.local:3200` |
-| TeslaMate | postgres | `blumeops-pg-rw.databases.svc.cluster.local:5432` |
+| TeslaMate | postgres | `pg.ops.eblu.me:5434` (TeslaMate's database on [[ringtail]], via Caddy L4) |
 
 ## Dashboard Provisioning
 
@@ -49,13 +50,9 @@ Optional annotation: `grafana_folder: "FolderName"`
 
 ## Key Dashboards
 
-- macOS System - Host metrics for indri
-- Minikube - Kubernetes cluster overview
-- Borgmatic Backups - Backup status and trends
-- Services Health - HTTP probe results
-- Docs APM - Request rate, latency, cache for docs.eblu.me
-- Fly.io Proxy Health - Aggregate proxy health across all upstream services
-- TeslaMate (18 dashboards) - Vehicle data
+Provisioned dashboards live in `argocd/manifests/grafana-config/dashboards/` (one ConfigMap per dashboard). Coverage as of 2026-06: alerts, borgmatic, CV APM, devpi, docs APM, fly.io proxy, forgejo, frigate, jellyfin, kubernetes, loki, macOS (indri host), postgresql, ringtail, shower APM, sifaka disks, snowflake proxy, tempo, transmission, zot.
+
+TeslaMate's dashboards are not in the repo — an init container fetches them from the forge mirror at a pinned tag (`TESLAMATE_VERSION` in `argocd/manifests/grafana/deployment.yaml`).
 
 ## Related
 