blumeops

Author	SHA1	Message	Date
Erich Blume	922265c88f	Add changelog fragment for grafana doc review Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 07:28:47 -08:00
Erich Blume	8d1e98617b	Review build-grafana-container docs: stamp reviewed, fix cross-links Also fix stale grafana.md reference card (Helm → Kustomize). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 07:28:06 -08:00
Erich Blume	02eb169403	Pin blumeops-pg to PostgreSQL 18.3 Replace floating :18 tag with pinned :18.3 (upstream out-of-cycle release fixing 18.2 regressions). Stamps service as reviewed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 16:25:32 -08:00
Erich Blume	2312e5fbf8	Update ringtail flake inputs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 15:22:29 -08:00
Erich Blume	7cecaf0471	Review forgejo-runner docs: stamp reviewed, fix cross-links Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 15:10:20 -08:00
Erich Blume	776caa87f5	Sync Frigate zone coordinates from live API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 07:52:09 -08:00
Forgejo Actions	fa223f8e3b	Update docs release to v1.11.5 - Built changelog from towncrier fragments [skip ci]	2026-02-26 07:56:02 -08:00
Erich Blume	be3cdad1cb	Add HA for CV and Docs: zero-downtime deploys (#273 ) v1.11.5 ## Summary - Set `replicas: 2` with `maxUnavailable: 0` / `maxSurge: 1` on CV and Docs deployments so rolling updates never drop below 2 ready pods - Add PodDisruptionBudgets (`minAvailable: 1`) to protect against node drains and cluster maintenance - Add Fly.io cache purge step to `cv-deploy.yaml` workflow (docs already had this) so CV deploys don't serve stale cached content ## Deployment and Testing - [ ] `argocd app diff cv` / `argocd app diff docs` from branch - [ ] Deploy from branch: `argocd app set cv --revision feature/ha-cv-docs-zero-downtime && argocd app sync cv` - [ ] Verify 2 pods running: `kubectl get pods -n cv --context=minikube-indri` - [ ] Test rolling restart: `kubectl rollout restart deployment/cv -n cv --context=minikube-indri` - [ ] During rollout, confirm continuous availability via `curl -I https://cv.eblu.me` - [ ] After merge: reset ArgoCD to main, re-sync both apps Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/273	2026-02-26 07:53:21 -08:00
Erich Blume	8e9d89ca73	docs-review: print file path instead of content for LLM usage The LLM should read the file itself using its tools rather than receiving it inline in the task output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 07:24:37 -08:00
Erich Blume	9a7acffa26	Review manage-forgejo-mirrors doc: clarify cron default, stamp reviewed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 07:17:18 -08:00
Erich Blume	fb83c5c577	Add explicit ExternalSecret defaults for SSA sync parity The external-secrets webhook injects conversionStrategy, decodingStrategy, and metadataPolicy defaults on admission. Declaring them explicitly prevents ArgoCD SSA from flagging the resource as OutOfSync. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 07:02:54 -08:00
Erich Blume	db561c6b0e	Upgrade ArgoCD v3.2.6 → v3.3.2 with Server-Side Apply (#272 ) ## Summary - Upgrade ArgoCD from v3.2.6 to v3.3.2 - Enable `ServerSideApply=true` sync option (required by v3.3 — ApplicationSet CRD exceeds client-side apply annotation limit) - Update service-versions.yaml with review for argocd and 1password-connect ## Breaking changes reviewed - Server-Side Apply required: Added to syncOptions ✅ - Source Hydrator git notes: Not used — N/A - Application path cleaning removed: Not used — N/A - Settings API field restriction: Authenticated access only — N/A ## Deployment and Testing - [ ] Sync the `apps` app first (picks up SSA syncOption change) - [ ] `argocd app set argocd --revision feature/argocd-v3.3.2` - [ ] `argocd app sync argocd` - [ ] Verify all argocd pods running with v3.3.2 images - [ ] Verify other apps still sync correctly - [ ] After merge: `argocd app set argocd --revision main && argocd app sync argocd` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/272	2026-02-26 06:51:50 -08:00
Erich Blume	95c8424e62	Add Transmission metrics exporter and Grafana dashboard (#271 ) ## Summary - Add `metalmatze/transmission-exporter` as a sidecar container in the torrent deployment, exposing Prometheus metrics on port 19091 - Add metrics port to the torrent service for Prometheus scraping - Add Prometheus scrape job targeting the transmission exporter - Create Grafana dashboard with: - Overview stats (download/upload speed, active/total torrents) - Transfer speed timeseries (download + upload over time) - Transfer volume stats (total downloaded/uploaded in selected range) - Per-torrent download and upload rate timeseries - Per-torrent details table (ratio, uploaded, percent done) ## Deployment and Testing - [ ] Sync ArgoCD `torrent` app from branch — verify exporter sidecar starts - [ ] Verify exporter metrics: `kubectl exec` into pod, `curl localhost:19091/metrics` - [ ] Verify Prometheus scrapes it: check targets at prometheus.ops.eblu.me - [ ] Open Grafana, find "Transmission" dashboard, verify panels populate - [ ] Sync ArgoCD `prometheus` app from branch - [ ] Sync ArgoCD `grafana-config` app from branch Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/271	2026-02-25 22:23:33 -08:00
Erich Blume	03d71544ec	Add multi-cluster observability with ringtail metrics and dashboards (#270 ) ## Summary - Add `cluster` label (indri/ringtail) to all Prometheus scrape jobs, Alloy k8s metrics/logs, and Alloy host metrics/logs - Deploy kube-state-metrics on ringtail's k3s cluster (ArgoCD app + manifests) - Deploy Alloy on ringtail to collect pod metrics and logs, remote-writing to indri's Prometheus and Loki - Replace single-cluster "Minikube Kubernetes" and "K8s Services Health" dashboards with: - Kubernetes Clusters dashboard — multi-cluster with `cluster` and `namespace` template variables - Ringtail (k3s) dashboard — dedicated ringtail view with GPU usage panels ## Deployment and Testing 1. Sync `apps` on indri ArgoCD to pick up new app definitions (`kube-state-metrics-ringtail`, `alloy-ringtail`) 2. Sync `prometheus` → verify `cluster` label on scraped metrics 3. Sync `alloy-k8s` → verify `cluster=indri` on remote-written metrics and logs 4. Run `mise run provision-indri -- --tags alloy` → verify `cluster=indri` on host Alloy metrics/logs 5. Sync `kube-state-metrics-ringtail` → verify pods running on ringtail 6. Sync `alloy-ringtail` → verify pods running, check Prometheus for `kube_pod_info{cluster="ringtail"}` 7. Sync `grafana-config` → verify dashboards appear, cluster variable populates both values 8. Check Loki for `{cluster="ringtail"}` logs from ringtail pods ## Notes - Alloy on ringtail uses `insecure_skip_verify=true` for TLS to Prometheus/Loki (Tailscale-managed certs not in container trust store) — tighten later - DNS resolution for `*.tail8d86e.ts.net` from ringtail pods depends on CoreDNS inheriting host's MagicDNS resolver; may need CoreDNS forwarding rules if pods can't resolve - The old services dashboard (blackbox probes) is removed — those probes are still running in alloy-k8s and the data is still in Prometheus, just not in a dedicated dashboard Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/270	2026-02-25 22:01:00 -08:00
Erich Blume	2243f2e0a1	Filter driveway zone to person/dog/cat only in Frigate Parked car was being re-detected every few minutes at night due to IR illumination noise triggering motion detection. Restrict the driveway zone to [person, dog, cat] so cars and birds no longer create events there. Cars still alert via the driveway_entrance zone. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 20:45:07 -08:00
Erich Blume	84338c32c2	Add authenticated GitHub PAT for Forgejo mirror sync (#269 ) ## Summary - mirror-create: Auto-includes GitHub PAT from 1Password for authenticated upstream fetches at mirror creation time - mirror-update-pats: New mise task that SSHes into indri and rewrites the git remote URL in every GitHub mirror's bare repo config to embed the PAT. Idempotent, supports `--dry-run` - app.ini.j2: Explicit `[mirror]` section with `DEFAULT_INTERVAL = 8h` and `MIN_INTERVAL = 10m` (bakes in the defaults for visibility) - manage-forgejo-mirrors: New how-to doc covering mirror creation, PAT storage, the `mirror-update-pats` task, and the full 20-day PAT rotation procedure ## Context GitHub tightened unauthenticated rate limits for git clone/fetch in May 2025. With 23 GitHub mirrors syncing every 8 hours, authenticated fetches avoid throttling. The PAT is stored in 1Password (`Forgejo Secrets` → `github-mirror-pat`) and has been applied to all existing mirrors. ## Deployment and Testing - [x] `mirror-update-pats` dry-run verified (23 mirrors detected) - [x] `mirror-update-pats` applied to all 23 GitHub mirrors on indri - [x] Idempotency confirmed (re-run shows 0 updated, 23 skipped) - [ ] Provision indri with `--tags forgejo` to apply `[mirror]` config - [ ] Trigger a manual mirror sync and verify success in Forgejo UI Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/269	2026-02-25 20:20:23 -08:00
Erich Blume	23dc79058e	Bake default display options into ai-docs mise task The --style=header --color=never --decorations=always flags are now built into the script so callers can just run `mise run ai-docs`. Also adds a note to CLAUDE.md to never truncate the output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 17:42:47 -08:00
Erich Blume	de54b4e33d	Port CloudNative-PG off Helm to direct release manifest (#268 ) ## Summary - Point ArgoCD app directly at forge-mirrored upstream repo (`mirrors/cloudnative-pg`) instead of the Helm charts repo - Use `directory.include` to select the specific release manifest (`cnpg-1.27.1.yaml`) from the `releases/` directory - No vendored files, no Helm — upgrades are a two-line change (`targetRevision` + `directory.include`) - Delete unused `values.yaml` (was empty, all Helm defaults) ## Deployment and Testing - [ ] Register mirror repo in ArgoCD: `argocd repo add ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git --ssh-private-key-path <key>` - [ ] `argocd app set cloudnative-pg --revision feature/cnpg-direct-source && argocd app sync cloudnative-pg` - [ ] Verify operator pod running: `kubectl get pods -n cnpg-system --context=minikube-indri` - [ ] Verify CRDs exist: `kubectl get crd --context=minikube-indri \| grep cnpg` - [ ] Verify existing clusters healthy: `kubectl get clusters -A --context=minikube-indri` - [ ] After merge: `argocd app set cloudnative-pg --revision main && argocd app sync cloudnative-pg` ## Notes - The forge mirror was created via `mise run mirror-create` from `https://github.com/cloudnative-pg/cloudnative-pg.git` - ArgoCD may need the mirror repo added to its known repositories if the credential template doesn't already match `mirrors/*` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/268	2026-02-25 17:37:53 -08:00
Erich Blume	285ad4141f	Fix Frigate detection events rate metric name in Grafana dashboard The panel queried frigate_camera_events but the actual metric exposed by Frigate is frigate_camera_events_total with a "camera" label (not "camera_name"). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 16:51:57 -08:00
Forgejo Actions	4736c7e9bd	Update docs release to v1.11.4 - Built changelog from towncrier fragments [skip ci]	2026-02-25 07:04:23 -08:00
Erich Blume	e273f399ea	Review 3 how-to docs and fix update-tailscale-acls inaccuracies v1.11.4 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 07:02:49 -08:00
Erich Blume	5f9bc20345	Fix mirror org refs in ArgoCD apps and widen credential template (#266 ) ## Summary - Widen `repo-creds-forge` URL prefix from `/eblume/` to host-wide `/` so it matches repos in all forge orgs (fixes `mirrors/` repos not getting SSH credentials) - Update 8 ArgoCD app definitions from `eblume/<mirror>` → `mirrors/<mirror>` (immich-charts, cloudnative-pg-charts, external-secrets, connect-helm-charts) - Fix stale alloy clone comment in Ansible defaults - Bump immich v2.5.2 → v2.5.6 (bug-fix patches only) - Update ArgoCD README bootstrap command and credential docs ## Context Mirrors were migrated from `forge.ops.eblu.me/eblume/` to `forge.ops.eblu.me/mirrors/` in commit ``cd57814``. Container Dockerfiles and image tags were updated, but ArgoCD app definitions and the repo credential template were missed, causing `ComparisonError` on apps that source Helm charts from mirrored repos. ## Deployment 1. Sync the ArgoCD `argocd` app first (picks up the widened credential template) 2. Sync the `apps` app (picks up new repo URLs for all 8 apps) 3. Verify immich resolves its ComparisonError: `argocd app get immich` 4. Sync immich to deploy v2.5.6: `argocd app sync immich` 5. Spot-check: `argocd app get external-secrets`, `argocd app get cloudnative-pg`, `argocd app get 1password-connect` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/266	2026-02-25 06:55:53 -08:00
Erich Blume	5c31b6b42a	Add changelog fragments for C0 commits in this session Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:21:00 -08:00
Erich Blume	4f8f2985c1	Update prometheus and teslamate image tags after mirror migration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:18:15 -08:00
Erich Blume	c1f4f0169b	Add mise-tasks reference card and include in ai-docs Categorized reference of all mise tasks with descriptions. Added to the tools section of the reference index and to the ai-docs context priming script. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:17:34 -08:00
Erich Blume	33e3e4e098	Add mirror-create mise task for upstream mirrors Creates mirrors in the mirrors/ Forgejo org via API. Supports GitHub, Codeberg, and generic git URLs with auto-detection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:15:17 -08:00
Erich Blume	e0f9ebebdf	Update homepage, navidrome, ntfy, miniflux image tags after mirror migration Prometheus and teslamate builds still in progress — will update in a follow-up commit once their `33b7f0f` tags land. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:06:08 -08:00
Erich Blume	33b7f0f353	Switch prometheus, teslamate, miniflux to forge mirrors Created miniflux mirror at mirrors/miniflux. All three containers now clone from forge.ops.eblu.me/mirrors/ instead of GitHub directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:01:08 -08:00
Erich Blume	34a1314f8d	Document AirPlay cross-VLAN firewall rules and fix rule ordering AirPlay from Main to IoT VLAN (Samsung Frame TV) required adding established/related, AirPlay port, and dynamic reverse port rules — but the root cause was rule ordering (allows appended after blocks). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 20:49:31 -08:00
Erich Blume	cd578144f7	Migrate upstream mirrors to mirrors/ Forgejo org (#265 ) ## Summary - Created `mirrors` Forgejo organization for upstream mirror repos - Transferred 22 mirror repos from `eblume/` to `mirrors/` (mirror sync config preserved) - Deleted unused repos: hajimari, hister - Updated all container build URLs (homepage, navidrome, ntfy Dockerfiles + nix) - Updated documentation references (migrate-forgejo-from-brew, upstream-fork-strategy, fix-ntfy-nix-version) - `dotfiles` intentionally kept under `eblume/` per user request - `devpi` transferred to `mirrors/` Repos remaining under `eblume/`: blumeops, cv, mcquack, dotfiles ## Cleanup TODO - [ ] Delete temp Forgejo API token "claude-migration-temp" (Settings > Applications) ## Test Plan - [x] Verified mirror config (mirror=true, original_url) survived transfer on test repo (tesla_auth) - [x] All pre-commit hooks pass (including container-version-check, docs-check-links) - [ ] Verify a mirror repo sync runs successfully after transfer (check mirrors/authentik or similar) - [ ] Rebuild containers from branch to verify Dockerfile URLs resolve Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/265	2026-02-24 20:43:14 -08:00
Erich Blume	5c75419c85	Reworked README	2026-02-24 15:20:26 -08:00
Erich Blume	61ee6a4d38	Fix Grafana ConfigMap labels lost in configMapGenerator migration The hand-written configmap.yaml had app.kubernetes.io/name and app.kubernetes.io/instance labels; configMapGenerator dropped them. Add options.labels to both generator entries to restore parity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 14:46:20 -08:00
Erich Blume	9b44a8ec51	Add kustomize images: and configMapGenerator: across services (#264 ) ## Summary - Move hardcoded image tags to kustomization.yaml `images:` transformer across 22 services — image names in manifests become version-agnostic templates, with tags centralized in one place per service - Replace hand-written ConfigMap manifests with `configMapGenerator:` in 12 services — config data extracted to standalone files, generated ConfigMaps include content hashes that trigger automatic pod rollouts on changes - Create new `kustomization.yaml` for forgejo-runner and nvidia-device-plugin (switches ArgoCD from directory mode to kustomize mode, rendered output identical) ### Services modified Images only (8): cv, devpi, docs, kube-state-metrics, miniflux, navidrome, teslamate, torrent Images + configMapGenerator (10): alloy-k8s, forgejo-runner, frigate, grafana, homepage, kiwix, loki, mosquitto, ntfy, prometheus Images only, no configMapGenerator (4): authentik (skip blueprints — special YAML tags), tailscale-operator-base (Deployment only, CRD image fields left as-is) Skipped entirely (6): argocd (remote upstream), databases (no image fields), external-secrets, grafana-config (cross-kustomization dashboards), immich (Helm-managed), 1password-connect/cloudnative-pg (no kustomization.yaml) ### What changes at deploy time - images: — no functional diff, `kustomize build` produces identical output with tags - configMapGenerator: — ConfigMap names gain hash suffixes (e.g., `prometheus-config` → `prometheus-config-6f42fhctcb`) and all Deployment/StatefulSet/DaemonSet references are updated automatically. Pods will restart once per service on first sync due to the name change ## Test plan - [x] `kubectl kustomize` builds all 30 service directories successfully - [x] Image tags verified in rendered output for all modified services - [x] ConfigMap hash suffixes verified in rendered output - [x] ConfigMap references in Deployments/StatefulSets confirmed to use hashed names - [x] All pre-commit hooks pass (yamllint, shellcheck, prettier, etc.) - [ ] `argocd app diff` each service to confirm only expected ConfigMap name changes - [ ] Deploy from branch starting with a low-risk service (e.g., mosquitto) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/264	2026-02-24 14:25:19 -08:00
Erich Blume	86aeb60ec9	Fix TeslaMate dashboards: add database to PostgreSQL jsonData Grafana 12.x's grafana-postgresql-datasource plugin requires the database name in jsonData, not just the top-level database field. Without it, the frontend blocks all queries with "no default database configured", causing all TeslaMate panels to show "No Data." Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 13:49:07 -08:00
Erich Blume	495c3e8496	Fix Grafana OAuth role mapping from Authentik groups The INI parser was stripping outer single quotes from role_attribute_path = 'Admin', causing Grafana to evaluate 'Admin' as a JMESPath field identifier instead of a string literal. This resulted in all OAuth users getting the default Viewer role. Replaced with a proper group-based expression that checks for the 'admins' Authentik group and maps to Admin/Viewer accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 13:41:08 -08:00
Erich Blume	4acd2e58d4	Update prometheus and grafana to main-SHA container tags Prometheus: v3.9.1-74029e1 [branch] -> v3.9.1-2ba5d8a [main] Grafana: v12.3.3-09ac36b [branch] -> v12.3.3-d05d2fb [main] These images were built during PR development and referenced branch commits that won't survive branch cleanup. The [main] tags are identical rebuilds from the squash-merge commit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 09:58:09 -08:00
Erich Blume	1b9f706a30	Document container tag provenance and enhance container-list (#263 ) ## Summary After investigating deployed container images, confirmed that squash-merging PRs orphans the commit SHAs embedded in container image tags. Two of our currently deployed images (prometheus, grafana) reference branch commits not on main. This PR: - Documents the squash-merge SHA orphan problem and the post-merge workflow in [[build-container-image]] - Adds step 9 to the C1 process: after merging a PR that changes `containers/`, do a follow-up C0 to point manifests at the rebuilt `[main]` tag - Rewrites `container-list` as a `uv run --script` (typer + rich + httpx) - Adds optional container name filter (`mise run container-list prometheus` shows 10 tags instead of 4) - Annotates every tag with `[main]` or `[branch]` based on git commit ancestry ## Test plan - [x] `mise run container-list` — all containers shown with `[main]`/`[branch]` hints - [x] `mise run container-list prometheus` — filtered view, more tags, correctly shows `[main]` and `[branch]` - [x] `mise run container-list nonexistent` — error message with exit code 1 - [x] Pre-commit hooks pass Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/263	2026-02-24 09:54:58 -08:00
Erich Blume	2ba5d8a8aa	Port Prometheus to local container build (#262 ) ## Summary - Add three-stage Dockerfile for Prometheus v3.9.1 (Node UI → Go binaries → Alpine runtime) - Produces `prometheus` and `promtool` binaries with embedded web UI assets - Follows navidrome/ntfy pattern for supply chain control via Zot registry ## Deployment and Testing - [ ] `dagger call build --src=. --container-name=prometheus` succeeds - [ ] Container reports correct version via `prometheus --version` - [ ] `promtool --version` works - [ ] Update statefulset image reference after successful build - [ ] Deploy from branch: `argocd app set prometheus --revision <branch> && argocd app sync prometheus` - [ ] Health probes pass (`/-/healthy`, `/-/ready`) - [ ] Web UI loads, scrape targets work, remote write functions Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/262	2026-02-24 09:15:57 -08:00
Erich Blume	b1ba96f6d6	Review migrate-grafana-to-authentik: fix file paths, add last-reviewed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 07:29:41 -08:00
Erich Blume	588da6bbcb	Review cloudnative-pg: v1.28.1 is current, no upgrade needed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 07:27:25 -08:00
Forgejo Actions	2f78d180e8	Update docs release to v1.11.3 - Built changelog from towncrier fragments [skip ci]	2026-02-23 21:04:33 -08:00
Erich Blume	9b4951bf94	Improve Mikado process: cycle discipline, reset rigor, --resume enhancements (#261 ) v1.11.3 ## Summary - End-of-cycle prompting: After closing a leaf node and pushing, the agent should prompt the user to review and suggest ending the session rather than rushing into the next leaf - Reset rigor: Reinforced that errors during impl should trigger a branch reset + plan update (not fix-forward). Documented the `git log --oneline --not main` → `git reset --hard` → `git cherry-pick` pattern with clear threshold guidance - `--resume` shows PR number: Queries the Forgejo API for open PRs matching the branch, displays number/title/URL and a hint to run `pr-comments` - `--resume` checks git stash: Shows stash entries as a non-presumptive hint — informs without assuming they apply ## Test plan - [ ] `mise run docs-mikado --resume` runs without errors (no active chains case) - [ ] On a mikado branch with an open PR, verify PR info is shown - [ ] With stashed work, verify stash entries are displayed - [ ] Review agent-change-process.md for clarity 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/261	2026-02-23 21:03:27 -08:00
Erich Blume	d05d2fbaff	C2: Upgrade Grafana to 12.x with Nix container and Kustomize (#260 ) ## Summary Mikado chain to upgrade Grafana from 11.4.0 (Helm chart) to 12.x with: - Home-built Nix container image (`forge.ops.eblu.me/eblume/grafana`) - Kustomize manifests replacing the Helm chart - Single-source ArgoCD app ## Chain Goal: `upgrade-grafana` Leaves: `build-grafana-container`, `kustomize-grafana-deployment` Track with: `mise run docs-mikado upgrade-grafana` ## Test plan - [ ] Container builds successfully via Nix - [ ] Container pushed to registry - [ ] Kustomize manifests produce equivalent resources to current Helm - [ ] Pod runs, UI loads, OIDC works, datasources healthy - [ ] `mise run services-check` passes Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/260	2026-02-23 18:07:18 -08:00
Erich Blume	9b419abf24	Update RUNNER_LABELS to use runner-job-image:v0.19.11-4c5e0f0 Now that the image is built under the new name, point the forgejo runner at it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 17:47:14 -08:00
Erich Blume	4c5e0f0d16	Rename containers/forgejo-runner to runner-job-image The forgejo-runner container is the CI job execution environment (Dagger, ArgoCD CLI, etc.), not the runner daemon itself. Rename to runner-job-image to fix the version-check false positive (Dagger 0.19.11 vs daemon 12.7.0) and clarify the distinction. RUNNER_LABELS still references the old image name — will update after building the image under the new name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 17:44:51 -08:00
Erich Blume	179eca2070	Fix container-build-and-release to resolve refs to full SHA actions/checkout treats short SHAs as branch name patterns, causing fetch failures. Always resolve --ref to a full 40-char SHA before dispatching the workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 17:27:18 -08:00
Erich Blume	7641018c6a	Fix container build workflows to checkout dispatch ref When manually dispatching a container build with --ref, the build job now checks out the specified commit instead of the branch HEAD. This allows building containers from feature branches before merging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 17:24:32 -08:00
Erich Blume	c70aff256a	Fix mikado-branch-invariant-check not validating incoming commits The commit-msg hook had `pass_filenames: false`, which prevented pre-commit from passing the commit message file path. Without it, the hook only validated existing branch history and never checked the incoming commit against ordering rules — plan-after-impl was silently accepted. Remove `pass_filenames: false` so the pending commit is included in invariant validation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 16:32:34 -08:00
Erich Blume	66b5b32f1d	Formalize C0/C1/C2 change classification (#259 ) ## Summary - C0 (Quick Fix): Now explicitly allows direct-to-main commits with no PR required — for low-risk, fix-forward-safe changes - C1 (Human Review): New docs-first workflow with branch deployment (ArgoCD `--revision`, Ansible from checkout). Includes upgrade criteria for escalation to C2 - C2 (Mikado Chain): Introduces the Mikado Branch Invariant — strict commit ordering where card-introducing commits come first, followed by code progress, followed by card closures. Branch resets required when new prerequisites are discovered Updates CLAUDE.md rules (3, 4, 8, 9) to reflect that C0 bypasses branching/PR requirements. Also updates ai-assistance-guide, how-to index, and docs-mikado task description. ## Files changed - `CLAUDE.md` — rules and classification table - `docs/how-to/agent-change-process.md` — full process rewrite - `docs/tutorials/ai-assistance-guide.md` — branching and pitfalls sections - `docs/how-to/how-to.md` — index description - `mise-tasks/docs-mikado` — task description - `docs/changelog.d/formalize-change-classification.doc.md` — changelog fragment Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/259	2026-02-23 16:19:54 -08:00
Erich Blume	f05e5cccdf	Review Grafana: replace Helm upgrade plan with C2 Mikado chain (#258 ) ## Summary - Delete the old 3-phase Helm chart upgrade plan (predates Mikado system) - Create C2 Mikado chain with goal card `upgrade-grafana` and two leaf prereqs: - `kustomize-grafana-deployment` — convert Helm to kustomize manifests - `build-grafana-container` — home-built Grafana 12.x image (no upstream containers) - Record first-ever Grafana review: currently at v11.4.0 on Helm chart 8.8.2 - Update service-versions.yaml, how-to index, and plans index ## Service Review Findings - Grafana is healthy and synced in ArgoCD - Running v11.4.0, latest upstream is 12.3.3 - Breaking changes for 12.x are low-risk (React panels only, UIDs compliant) - PVC is disposable — dashboards and datasources are all config-provisioned ## Deployment and Testing - [ ] No deployment needed — documentation-only change - [ ] `docs-check-links` passes - [ ] `docs-check-index` passes Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/258	2026-02-23 15:06:00 -08:00

522 commits