blumeops

Author	SHA1	Message	Date
Erich Blume	fd0bebb0fc	Localize authentik-redis container (#309) ## Summary - Replace upstream `docker.io/library/redis:7-alpine` (Redis 7.4.8) with a nix-built container using Redis 8.2.3 from nixpkgs - Introduce attached service pattern: `parent` field in service-versions.yaml, `<parent>-<component>` naming convention, and `assert pkgs.redis.version == version` in default.nix to prevent silent version drift on `flake.lock` updates - Document the pattern in [[review-services]] so future attached services slot in cleanly - Backfill `parent: grafana` on existing `grafana-sidecar` entry ## Version drift protection 1. `flake.lock` update bumps nixpkgs redis → `assert` in `default.nix` breaks `nix-build` 2. Developer updates `version` in `default.nix` → prek's `container-version-check` demands matching `service-versions.yaml` update 3. Both must agree before commit succeeds ## Test plan - [ ] Build container from branch on ringtail (`mise run container-build-and-release authentik-redis`) - [ ] Update kustomization `newTag` to branch-built image tag - [ ] Sync authentik ArgoCD app from branch (`argocd app set authentik --revision localize-redis && argocd app sync authentik`) - [ ] Verify Authentik login, session persistence, and task queue still work - [ ] After merge: C0 follow-up to update `newTag` to the main-built image tag 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #309	2026-03-24 13:27:36 -07:00
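The version drift protection described in the commit above amounts to a three-way agreement check. The following is a minimal Python sketch of that idea only; the real checks are the Nix `assert` and prek's `container-version-check`, whose implementations are not shown in this log. The function name and the example version `8.2.4` are hypothetical.

```python
def check_version_agreement(nixpkgs_version: str,
                            default_nix_version: str,
                            tracked_version: str) -> list[str]:
    """Return a list of human-readable problems; empty means all three agree.

    nixpkgs_version     -- version of redis provided by the pinned nixpkgs
    default_nix_version -- `version` declared in the container's default.nix
    tracked_version     -- version recorded in service-versions.yaml
    """
    problems = []
    # Step 1: mirrors `assert pkgs.redis.version == version` in default.nix.
    if nixpkgs_version != default_nix_version:
        problems.append(
            f"nix-build would fail: nixpkgs has {nixpkgs_version}, "
            f"default.nix declares {default_nix_version}"
        )
    # Step 2: mirrors the container-version-check hook (assumed behavior).
    if default_nix_version != tracked_version:
        problems.append(
            f"commit would be rejected: default.nix declares {default_nix_version}, "
            f"service-versions.yaml tracks {tracked_version}"
        )
    return problems


if __name__ == "__main__":
    # A flake.lock update bumped nixpkgs (hypothetical 8.2.4) but nothing else.
    for problem in check_version_agreement("8.2.4", "8.2.3", "8.2.3"):
        print(problem)
```

Under this model a `flake.lock` bump alone fails the first check, and fixing only `default.nix` then fails the second, which is why both files must change together.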
Erich Blume	fc45989a6c	Decommission JobSync service (#308) ## Summary - Remove all JobSync infrastructure: ArgoCD app, k8s manifests, container build (nix), Caddy reverse proxy entry, Homepage dashboard entry, service-versions tracking, and all documentation - Runtime teardown already completed: ArgoCD app cascade-deleted (removes deployment, PVC, service, ingress, external-secret), forge mirror deleted, 1Password item archived, local clone removed ## Motivation Replacing JobSync with a datasette-based job tracking pipeline driven by mise tasks and a Claude agent frontend. JobSync's Next.js server actions don't expose a useful API for automation. ## Remaining manual steps after merge - Provision Caddy to remove the stale proxy route: `mise run provision-indri -- --tags caddy` - Sync Homepage: `argocd app sync homepage` - Verify namespace cleanup on ringtail: `kubectl get ns jobsync --context=k3s-ringtail` (should be gone) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #308	2026-03-24 08:44:23 -07:00
Erich Blume	0d422f5234	Update tooling dependencies (March 2026) (#307) ## Summary Monthly tooling dependency update per [[update-tooling-dependencies]]. - Prek hooks: trufflehog v3.93.4→v3.94.0, ruff v0.15.2→v0.15.7, shfmt v3.12.0-2→v3.13.0-1, ansible-lint floor→26.3.0, ansible-core floor→2.18 - Fly.io proxy: nginx 1.28.2→1.29.6, Grafana Alloy v1.13.1→v1.14.1 - Forgejo workflows: actions/checkout v4.3.1→v6.0.2 (SHA-pinned across all 5 workflows) - Mise tasks: tightened Python lower bounds — rich≥14.0.0, typer≥0.24.0, httpx≥0.28.1, pyyaml≥6.0.2 ## Test plan - [x] `prek run --all-files` passes - [ ] Verify Fly.io deploy succeeds after merge (nginx minor bump + Alloy bump) - [ ] Spot-check a workflow run with the new actions/checkout v6 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #307	2026-03-24 08:11:46 -07:00
Erich Blume	0698013355	Review ArgoCD config tutorial: fix sync policy, typo, add cross-references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 07:55:00 -07:00
Erich Blume	bec554110a	Upgrade Frigate 0.17.0-rc2 → 0.17.1, add motion retention tier Bump from RC to latest stable (security fixes for config endpoint and cross-camera auth). Add new 0.17 motion retention tier at 365 days, reduce continuous from 180 to 30 days. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 07:30:18 -07:00
Erich Blume	9024d41230	Add Grafana alerts dashboard for mobile-friendly alert overview Two panels: currently firing alerts (firing/pending/noData/error) and recent state changes. Refreshes every 30s. Uses Grafana's built-in alertlist panel type — no datasource needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 21:16:54 -07:00
Erich Blume	9efe5c97fe	Fix authentik worker OOMKill: limit concurrency to 2 Dramatiq defaults to one worker process per CPU core. On ringtail (16 cores) this spawned 16 processes, each loading the full Django app, exceeding the 1Gi memory limit and causing a crash loop (228 restarts over 7 days). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 21:05:16 -07:00
Erich Blume	bd0ff30d3f	Unify container build workflows (#306) ## Summary - Merges `build-container.yaml` and `build-container-nix.yaml` into a single workflow - Detect job classifies each changed container by presence of `Dockerfile` and/or `default.nix` - Dockerfile containers build on `k8s` (indri) via Dagger; Nix containers build on `nix-container-builder` (ringtail) via nix-build + skopeo - Containers with both build files (alloy, nettest, ntfy) get built on both runners ## Test plan - [ ] Push a change to a Dockerfile-only container (e.g. grafana) — verify it builds on k8s only - [ ] Push a change to a nix-only container (e.g. jobsync) — verify it builds on nix-container-builder only - [ ] Push a change to a dual container (e.g. ntfy) — verify it builds on both runners - [ ] Test workflow_dispatch with a specific container name 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #306	2026-03-23 20:55:50 -07:00
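The detect job's classification rule in the commit above can be sketched in a few lines of Python. This is an illustration of the rule as described, not the workflow's actual script; `classify_container` and `make_container` are hypothetical names, and only the runner labels `k8s` and `nix-container-builder` come from the commit message.

```python
import tempfile
from pathlib import Path


def classify_container(container_dir: Path) -> list[str]:
    """Return the runner labels a container should build on.

    A Dockerfile selects the `k8s` runner, a default.nix selects the
    `nix-container-builder` runner, and a container with both gets both.
    """
    runners = []
    if (container_dir / "Dockerfile").is_file():
        runners.append("k8s")
    if (container_dir / "default.nix").is_file():
        runners.append("nix-container-builder")
    return runners


def make_container(root: Path, name: str, files: list[str]) -> Path:
    """Create a throwaway container directory with empty build files (demo helper)."""
    container_dir = root / name
    container_dir.mkdir()
    for filename in files:
        (container_dir / filename).touch()
    return container_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        layouts = {
            "grafana": ["Dockerfile"],
            "authentik": ["default.nix"],
            "ntfy": ["Dockerfile", "default.nix"],
        }
        for name, files in layouts.items():
            print(name, classify_container(make_container(root, name, files)))
```

A container directory with neither file yields an empty list, which a caller could treat as "nothing to build".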
Erich Blume	d1dac0c241	Upgrade ntfy v2.17.0 → v2.19.2 (#305) ## Summary - Upgrade ntfy from v2.17.0 to v2.19.2 - Update Dockerfile and Nix build definitions with new version, commit SHA, and hashes - Add `subPackages = [ "." ]` to Nix build to handle new `tools/loadtest` module in upstream ## Upstream changes (no breaking changes) - v2.18.0: Experimental PostgreSQL backend support - v2.19.0: PostgreSQL read replica support, notification sound throttling - v2.19.1-2: PostgreSQL bug fixes, web push race condition fix ## Test plan - [ ] Container builds complete on Forgejo Actions (both Dockerfile and Nix) - [ ] Update kustomization.yaml `newTag` to the built nix image tag - [ ] `argocd app set ntfy --revision upgrade/ntfy-v2.19.2 && argocd app sync ntfy` - [ ] Verify ntfy health: `curl https://ntfy.ops.eblu.me/v1/health` - [ ] Send a test notification Reviewed-on: #305	2026-03-23 10:32:06 -07:00
Erich Blume	06e721841c	Review 12 reference docs: fix stale image refs, expand stubs, add cross-refs Replace hardcoded image tags in Quick Reference tables with pointers to kustomization manifests (tags drift with every container release). Fix Prometheus CNPG scrape target, remove misleading .ts.net URLs, expand external-secrets stub, add backup/disaster-recovery cross-references. Limit doc-reviewer agent to one doc per cycle. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 09:51:57 -07:00
Erich Blume	3750428b58	Fix ArgoCD apps app permanent OutOfSync Remove `group: ""` from ignoreDifferences in tailscale-operator and tailscale-operator-ringtail — ArgoCD normalizes away the empty string field, so the live state never matches git. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 20:42:37 -07:00
Erich Blume	e9b8e3d80b	Revert Tailscale operator to v1.94.2 — images not yet published v1.96.3 exists as a GitHub release but Docker Hub images for both tailscale/tailscale and tailscale/k8s-operator haven't been published yet (v1.94.2 is still latest). Revert the image tags; the fly/start.sh `tailscale wait` improvement and review date stamps are retained. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 19:41:40 -07:00
Erich Blume	2e46f99820	Upgrade Tailscale operator v1.94.2 → v1.96.3 (#304) ## Summary - Bump Tailscale operator, proxy containers, and init containers from v1.94.2 to v1.96.3 across both clusters (indri + ringtail via shared base kustomization) - Replace hand-rolled `until tailscale status` polling loop in `fly/start.sh` with `tailscale wait --timeout 60s` (new in v1.96.2) - Stamp kube-state-metrics review date (already current at v2.18.0) ## Notable upstream changes (v1.94.2 → v1.96.3) - Go upgraded from 1.25 to 1.26 - `tailscale wait` command — blocks until daemon is running + interface has IP - AuthKey policy now applies only when users are not logged in (behavioral change) - Peer Relay improvements (metrics, EC2 IMDS, UDP socket scaling) - UPnP stability fixes ## Deploy plan 1. Merge PR 2. Sync tailscale-operator on indri: `argocd app sync tailscale-operator` 3. Sync tailscale-operator on ringtail: `argocd app sync tailscale-operator-ringtail --server ringtail...` 4. Verify proxy pods roll with new image: `kubectl --context=minikube-indri -n tailscale get pods` 5. Verify ingress connectivity (spot-check a few `*.tail8d86e.ts.net` services) 6. Rebuild + deploy Fly proxy container (separate step, picks up `tailscale wait` change) ## Test plan - [ ] ArgoCD diff looks clean for both apps before sync - [ ] Proxy pods on indri come up healthy with v1.96.3 images - [ ] Proxy pods on ringtail come up healthy with v1.96.3 images - [ ] Tailscale ingress services remain reachable (e.g., grafana, prometheus) - [ ] Fly proxy rebuild deploys successfully with `tailscale wait` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #304	2026-03-22 19:31:22 -07:00
Forgejo Actions	262299c82a	Update docs release to v1.14.3 - Built changelog from towncrier fragments [skip ci]	2026-03-22 18:20:41 -07:00
Erich Blume	2cb0ce4289	Review and correct Tailscale reference doc Fix wrong ACL path, add missing device tags (ringtail, per-service tags, ci-gateway, flyio-proxy), correct access matrix (PyPI→DevPI, homelab grants), add homelab→homelab SSH rule, document auto approvers section, and add last-reviewed frontmatter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 18:18:45 -07:00
Erich Blume	6d65e6928c	C2: Deploy infrastructure alerting pipeline (#303 ) ## Summary Mikado chain to replace `mise run services-check` with Grafana Unified Alerting backed by ntfy push notifications. Design: - Grafana Unified Alerting evaluates rules against Prometheus/Loki - ntfy webhook contact point delivers iOS notifications - Anti-noise policy: page once per 24h per alert group - Every alert links to a runbook in `docs/how-to/alerts/` - services-check eventually queries the alerting API instead of doing its own probes Chain (bottom-up): 1. `configure-grafana-alerting-pipeline` — enable alerting, ntfy contact point, notification policy 2. `first-alert-and-runbook` — end-to-end proof of concept with blackbox probe failure 3. `port-services-check-alerts` — migrate all services-check probes to alert rules + runbooks 4. `refactor-services-check-to-query-alerts` — rewrite services-check to query Grafana API 5. `deploy-infra-alerting` — goal card 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #303	2026-03-22 14:52:56 -07:00
Erich Blume	f1620abb17	Improve Frigate health checks to catch NFS and camera failures Replace single aggregate camera_fps check with per-camera FPS validation and NFS storage accessibility check. Motivated by an outage where Frigate API responded OK but NFS mount was inaccessible, causing "no frames" in UI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 09:55:53 -07:00
Erich Blume	613f05dfde	Add consistent OCI labels to all container Dockerfiles Every container now carries title, description, version, source, and vendor labels per the OCI image spec. Version is derived from the existing CONTAINER_APP_VERSION ARG at build time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 20:42:00 -07:00
Erich Blume	0dffdb9974	Add Claude Code subagents for infrastructure workflows Four project-scoped subagents that formalize existing mise task workflows as constrained, specialized AI agents: - infra-health: background health monitor (wraps services-check) - doc-reviewer: persistent-memory documentation reviewer - change-classifier: C0/C1/C2 triage before work begins - mikado-navigator: C2 chain state advisor (wraps docs-mikado) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 11:57:36 -07:00
Erich Blume	ef8c2118a1	Standardize USAGE pragmas and typer parsing across mise tasks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 11:42:01 -07:00
Erich Blume	0d2779762a	Upgrade Prometheus to v3.10.0 (#301) ## Summary - Bump Prometheus from v3.9.1 to v3.10.0 in custom container Dockerfile - v3.10.0 adds distroless Docker image variants, new PromQL `fill` operators, and performance improvements - Dagger build tested locally — builds cleanly ## Remaining after merge - Update `kustomization.yaml` newTag with the auto-built image tag - Update `service-versions.yaml` (last-reviewed + current-version) - ArgoCD sync 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #301	2026-03-18 07:47:46 -07:00
Erich Blume	e0dbcbd997	Update retention changelog to reflect final PVC decision Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 06:46:55 -07:00
Erich Blume	ef199b70f0	Increase Prometheus and Loki data retention Prometheus: 15d → 10y (3650d), PVC 20Gi → 200Gi Loki: 31d (744h) → 365d (8760h), PVC 20Gi → 50Gi Indri has 1.6 TB free on the minikube backing disk — the previous 15-day Prometheus retention was losing valuable long-term metrics data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 06:44:00 -07:00
Erich Blume	3e9873d669	Fix borgmatic backup: use correct kubectl context on indri The Mealie SQLite dump hook used `minikube-indri` (the context name on gilbert), but on indri itself the context is just `minikube`. This caused the before_backup hook to fail, aborting all backups since the hook was added. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 06:07:44 -07:00
Erich Blume	cfe3391f1a	Bump Frigate retention and add recording health check Increase retention: continuous 3→180d, detections 14→30d, alerts 30→730d. Plenty of NFS headroom (~9.4 TiB free, ~6.6 GB/day for one camera). Add frigate-recording check to services-check that verifies camera_fps > 0, which would have caught the 6-day outage from the mqtt config removal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 18:24:11 -07:00
Erich Blume	6617e44e5b	Fix Frigate crash: re-add required mqtt config section Frigate's config schema requires an `mqtt` field even when MQTT isn't used. Commit `40f1568` removed it along with Mosquitto, causing Frigate to fail validation on startup. Add `mqtt.enabled: false` to satisfy the schema without needing a broker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 18:10:23 -07:00
Erich Blume	61f02a0335	Localize Alloy container image (#300) ## Summary - Add `containers/alloy/` with dual Dockerfile + Nix build files for Grafana Alloy v1.14.0 - Both builds fetch source from forge mirror (`forge.ops.eblu.me/mirrors/alloy.git`), build the web UI (Node), then compile the Go binary with `netgo embedalloyui` tags - Update all three alloy deployments (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail) to use `registry.ops.eblu.me/blumeops/alloy` - `promtail_journal_enabled` tag omitted — requires systemd headers and none of our configs use `loki.source.journal` ## Build verification - Dockerfile: Tested locally via `docker build`, binary reports `v1.14.0` with correct tags - Nix: Tested on ringtail via `nix-build`, all three hashes (fetchgit, npmDeps, goModules) resolved and build succeeds ## Post-merge steps 1. Wait for CI to build the container from main (both Dockerfile and Nix workflows) 2. `mise run container-list alloy` to find the `[main]` tagged image 3. C0 follow-up to update `newTag` in all three kustomizations from `v1.14.0-placeholder` to the real tag 4. Sync ArgoCD apps and verify pods come up healthy Reviewed-on: #300	2026-03-17 16:42:53 -07:00
Forgejo Actions	cdba9dca96	Update docs release to v1.14.2 - Built changelog from towncrier fragments [skip ci]	2026-03-17 13:24:13 -07:00
Erich Blume	1f000c8e39	Add last-updated subsort to docs-review, review gilbert card Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 13:22:01 -07:00
Erich Blume	995478b91f	Review jellyfin and automounter services Both services current: jellyfin 10.11.6 (latest upstream), automounter 1.11.0 (Mac App Store). Add missing frigate share to automounter docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 13:06:23 -07:00
Erich Blume	e5ce510fdc	Fix plan-a-meal random recipe API queries Mealie's orderBy=random requires a paginationSeed parameter, otherwise the API returns 422. Added the seed to all random query examples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 11:10:48 -07:00
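As a small illustration of the fix in the commit above, the query string for a random recipe listing needs both `orderBy=random` and a `paginationSeed`. The sketch below only builds the URL and sends no request. The `/api/recipes` path and the helper name are assumptions made for illustration; the host is the one named in the Mealie deployment commit.

```python
import secrets
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def random_recipes_url(base_url: str, seed: Optional[str] = None) -> str:
    """Build a recipe-listing URL that asks for random ordering.

    Per the commit message, orderBy=random without paginationSeed is
    rejected with HTTP 422, so a seed is always included.
    """
    if seed is None:
        # Any opaque string works as a seed in this sketch.
        seed = secrets.token_hex(8)
    query = urlencode({"orderBy": "random", "paginationSeed": seed})
    # NOTE: "/api/recipes" is an assumed endpoint path, not taken from the log.
    return f"{base_url.rstrip('/')}/api/recipes?{query}"


if __name__ == "__main__":
    url = random_recipes_url("https://meals.ops.eblu.me")
    print(url)
    print(parse_qs(urlsplit(url).query))
```

Reusing the same seed across pages should keep one shuffled ordering stable while paginating, which is presumably why the API insists on it.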
Erich Blume	11330ebea0	Deploy Mealie recipe manager (#299) ## Summary - Deploy Mealie (self-hosted recipe manager) on minikube-indri via ArgoCD - Build container from source via forge mirror (`mirrors/mealie`) — multi-stage Dockerfile with Node.js frontend + Python/uv backend - Add Caddy proxy entry for `meals.ops.eblu.me` - Part of a larger meal planning pipeline: Mealie stores categorized recipes, a planner script selects balanced meals, and Ollama generates unified cooking timelines ## Status - [x] Mirror mealie repo on forge - [x] Dockerfile (from-source build) - [x] ArgoCD app + k8s manifests - [x] Caddy proxy entry - [x] Service docs, routing table, app registry - [ ] Local Dagger build test - [ ] Container build + push to registry - [ ] Update kustomization.yaml with real image tag - [ ] Deploy and verify - [ ] Provision Caddy ## Test plan - Build container locally via `dagger call build --src=. --container-name=mealie` - Trigger CI build via `mise run container-build-and-release mealie` - Deploy from branch: `argocd app set mealie --revision deploy-mealie && argocd app sync mealie` - Verify Mealie UI at `https://meals.ops.eblu.me` - Verify API docs at `https://meals.ops.eblu.me/docs` Reviewed-on: #299	2026-03-16 21:59:10 -07:00
Erich Blume	4dc3e5cae2	Add UnPoller for UniFi network metrics (#298) ## Summary - Deploy UnPoller as a k8s service on indri to export UniFi controller metrics to Prometheus - Custom-built container from forge mirror (`containers/unpoller/Dockerfile`) - Credentials pulled from 1Password via external-secrets - Prometheus scrape job added, docs and service-versions updated ## Test plan - [ ] Build container: `mise run container-release unpoller v2.34.0` - [ ] Update kustomization tag with built image tag - [ ] Deploy from branch: `argocd app set unpoller --revision feature/unpoller && argocd app sync unpoller` - [ ] Verify pod connects to UX7 controller (check logs) - [ ] Confirm `unpoller` target appears in Prometheus - [ ] Query `unifi_` metrics in Grafana 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #298	2026-03-16 15:52:45 -07:00
Erich Blume	a29ced71b5	Upgrade borgmatic 2.0.13 → 2.1.3 (#297 ) ## Summary - Upgraded borgmatic from 2.0.13 to 2.1.3 on indri (via mise/pipx) - Key changes: improved borg warning handling, memory/performance improvements, `source_directories_must_exist` now defaults to true (already set in our config) - Verified: config validates, dry-run passed against both sifaka (local) and borgbase (offsite) repos ## Borg Warnings Investigation The main concern was 2.1.0's change to treat borg warnings as errors. In 2.1.3 this was partially reverted — "file not found" warnings (exit code 107) are back to being warnings. Our config already sets `source_directories_must_exist: true`, and all four source directories were verified present on indri. ## Test plan - [x] `borgmatic --version` confirms 2.1.3 - [x] `borgmatic config validate` passes - [x] `borgmatic create --dry-run` succeeds against both repositories - [x] All source directories verified present on indri - [ ] Verify next scheduled backup (2:00 AM) completes successfully 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #297	2026-03-16 11:05:24 -07:00
Erich Blume	4ca2e39901	Externalize TeslaMate dashboards to forge mirror (#296 ) ## Summary - Replaces 18 TeslaMate dashboard ConfigMaps (713 KB / 22,080 lines) with a Grafana init container - Init container fetches dashboard JSON directly from `mirrors/teslamate` on forge, pinned to `v3.0.0` - Grafana's file provider picks them up from `/tmp/dashboards/TeslaMate/` via `foldersFromFilesStructure` - Non-TeslaMate dashboards remain as ConfigMaps (unchanged) ## How it works - New `init-teslamate-dashboards` init container uses busybox `wget` to fetch each JSON file from `https://forge.eblu.me/mirrors/teslamate/raw/tag/v3.0.0/grafana/dashboards/` - Files land in `/tmp/dashboards/TeslaMate/`, same emptyDir volume the sidecar uses - The sidecar continues to handle ConfigMap-based dashboards; the init container handles TeslaMate - Version pin is in the init container args (TESLAMATE_VERSION) ## Deployment and Testing - [ ] Sync `grafana` app from branch — verify init container runs and fetches dashboards - [ ] Sync `grafana-config` app from branch — verify TeslaMate ConfigMaps are pruned - [ ] Check Grafana UI: TeslaMate folder should still contain all 18 dashboards - [ ] Verify non-TeslaMate dashboards are unaffected - [ ] After merge: sync both apps from main Reviewed-on: #296	2026-03-15 18:31:19 -07:00
Erich Blume	1f0308bbd2	Fix Caddy v2.11 Host header rewrite breaking proxied services Caddy v2.11 (#7454) auto-rewrites the Host header to match the upstream address for HTTPS backends. This causes services behind Tailscale Ingress to see .tail8d86e.ts.net instead of .ops.eblu.me, breaking Authentik OAuth flows, Homepage host validation, and other services that check the Host header. Only apply header_up for HTTPS backends (Tailscale Ingress); HTTP backends (forge, registry, jellyfin, sifaka) are unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 18:28:18 -07:00
Erich Blume	2bea048dbf	Externalize Tailscale operator to forge mirror (#295 ) ## Summary - Mirrors `tailscale/tailscale` on forge (`mirrors/tailscale`) - Replaces vendored `operator.yaml` (495 KB / 5,386 lines) with ArgoCD apps sourcing the upstream static manifest, pinned via `targetRevision: v1.94.2` - Adds `tailscale-operator-base` app for indri and `tailscale-operator-base-ringtail` for ringtail - Local kustomization retains only ProxyClass and DNSConfig custom resources - Updates `[[tailscale-operator]]` doc to reflect new sourcing ## Deployment and Testing - [ ] Register `mirrors/tailscale` repo in ArgoCD (it needs to know about the new repo) - [ ] Sync `apps` app to pick up the new `tailscale-operator-base` app definitions - [ ] Sync `tailscale-operator-base` — verify CRDs, RBAC, operator Deployment come up - [ ] Sync `tailscale-operator` — verify ProxyClass, DNSConfig still apply cleanly - [ ] Verify existing Tailscale Ingresses still work (ProxyGroup pods healthy) - [ ] Repeat for ringtail cluster - [ ] After merge: apps already point at tags, no revision reset needed Reviewed-on: #295	2026-03-15 17:44:35 -07:00
Erich Blume	272ea1e767	Upgrade Caddy v2.10.2 → v2.11.2, fix forge mirrors (#294 ) ## Summary - Upgrade Caddy from v2.10.2 to v2.11.2 (7 CVE fixes across v2.11.1 and v2.11.2) - Create `mirrors/caddy-l4` forge mirror for Layer 4 plugin - Migrate all `~/code/3rd` clones on indri from `localhost:3001` to HTTPS `forge.ops.eblu.me/mirrors/` remotes - Remove stale clones (`apple-silicon-detector`, `whisper.cpp`) - Update caddy docs and service-versions tracking ## CVEs Fixed - CVE-2026-27585 through CVE-2026-27590 (path/host bypass, TLS fail-open, FastCGI issues) - Forward auth identity injection (privilege escalation) - `vars_regexp` placeholder secret exposure - Built on Go 1.26.1 (patches Go-level CVEs) ## What was done on indri (not in repo) - `xcaddy build` with Gandi DNS + Layer 4 plugins → `~/code/3rd/caddy/bin/caddy` now v2.11.2 - Remotes updated: caddy, forgejo-runner, zot → `https://forge.ops.eblu.me/mirrors/.git` - Deleted: `~/code/3rd/apple-silicon-detector`, `~/code/3rd/whisper.cpp` ## Deployment and Testing - [x] Ansible dry-run passed (`--tags caddy --check --diff`) - [ ] Restart caddy LaunchAgent to pick up the new binary - [ ] Verify all proxied services respond via `.ops.eblu.me` - [ ] Run `mise run services-check` Reviewed-on: #294	2026-03-15 10:33:48 -07:00
Forgejo Actions	cb95db0bc9	Update docs release to v1.14.1 - Built changelog from towncrier fragments [skip ci]	2026-03-14 10:11:06 -07:00
Erich Blume	53d620365a	Bump zot registry to v2.1.15 (#293 ) ## Summary - Upgrade zot OCI registry from v2.1.13 to v2.1.15 on indri - Addresses CVE-2025-30204 (golang-jwt memory) and open redirect via callback_ui - No config template changes needed (externalUrl is auto-allowlisted) - Requires Go 1.25.7 (bump from 1.25.6 via mise) ## Data Safety - Data directory ~/erichblume/zot is NOT touched during build or deploy - No schema migrations in v2.1.14 or v2.1.15 - Storage format remains OCI spec 1.1.0 ## Deployment Steps - [ ] SSH to indri: bump Go to 1.25.7 via `mise use go@1.25.7` - [ ] Fetch and checkout v2.1.15 in ~/code/3rd/zot - [ ] Build: `mise x -- make binary` - [ ] Restart LaunchAgent - [ ] Verify: `curl -s http://localhost:5050/v2/` returns 200 - [ ] Verify: `curl -s https://registry.ops.eblu.me/v2/_catalog` lists repos - [ ] Verify: `mise run services-check` Reviewed-on: #293	2026-03-14 10:00:40 -07:00
Erich Blume	ab8ea6f301	Bump Grafana Alloy to v1.14.0 (#292 ) ## Summary - Bump alloy-k8s, alloy-ringtail, and alloy-tracing-ringtail image tags from v1.13.1 to v1.14.0 - Mark indri alloy (ansible) as reviewed at v1.14.0 — source rebuild from forge mirror needed - Add missing alloy-ringtail entry to service-versions.yaml - Update alloy reference doc ## Breaking changes reviewed - `loki.secretfilter` options removed — not used in our configs - OTel Collector upgraded to v0.142.0 — Kafka receiver changes don't affect us - Exporter queue default changes — our tracing pipeline (Beyla → batch → otlphttp) uses simple config, low risk ## Deployment and Testing - [ ] Sync alloy-k8s: `argocd app set alloy-k8s --revision bump/alloy-v1.14.0 && argocd app sync alloy-k8s` - [ ] Sync alloy-ringtail: `argocd app set alloy-ringtail --revision bump/alloy-v1.14.0 --server ringtail-argocd && argocd app sync alloy-ringtail` - [ ] Sync alloy-tracing-ringtail similarly - [ ] Verify metrics flowing in Grafana - [ ] Verify traces flowing to Tempo (ringtail) - [ ] Rebuild indri alloy from source (`v1.14.0` tag on forge mirror), SCP to indri, restart - [ ] After merge: reset ArgoCD revisions to main, re-sync Reviewed-on: #292	2026-03-13 16:25:27 -07:00
Erich Blume	40f1568088	Remove unused Mosquitto MQTT broker from ringtail Mosquitto has been dormant since frigate-notify switched from MQTT to webapi polling (`529ba10`). Tear down live infra (ArgoCD app, namespace) and remove all manifests, service-versions entry, services-check, and doc references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 18:37:31 -07:00
Erich Blume	8b9cc4effd	Add how-to card for running 1Password backup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 18:17:45 -07:00
Erich Blume	d01a165b91	Add docs-preview task and visual preview step to doc review New `mise run docs-preview <card>` task builds docs via Dagger and serves them locally in the production quartz container (image parsed from ArgoCD kustomization), opening the browser directly to the specified card. Container auto-cleans after 1 hour. Also updates docs-review checklist and review-documentation how-to to reference the visual preview workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 18:04:01 -07:00
Erich Blume	87d4de244b	Review jobsync: add to services-check and homepage (#291 ) ## Summary - Add jobsync pod check (ringtail k3s) and HTTP endpoint to `services-check` - Add JobSync entry to homepage dashboard under new "Apps" group - Mark jobsync as reviewed at v1.1.4 (current with upstream) - Changelog fragment added ## Deployment and Testing - [ ] Sync homepage app from branch: `argocd app set homepage --revision review/jobsync && argocd app sync homepage` - [ ] Verify JobSync appears on go.ops.eblu.me dashboard - [ ] Run `mise run services-check` to verify new checks pass - [ ] After merge: `argocd app set homepage --revision main && argocd app sync homepage` Reviewed-on: #291	2026-03-11 17:36:51 -07:00
Forgejo Actions	ebba3d6e5b	Update docs release to v1.14.0 - Built changelog from towncrier fragments [skip ci]	2026-03-09 12:03:30 -07:00
Erich Blume	4f0476a851	Fix spider trap: disable SPA mode, remove index files, relax wiki-links (#290 ) ## Summary Fixes the Facebook crawler spider trap that's been generating infinite recursive URLs like `/how-to/tutorials/tutorials/how-to/explanation/...` for several days. Root cause: Quartz SPA mode + nginx `try_files` fallback to `index.html` meant any fabricated URL returned the root HTML shell with HTTP 200. Crawlers followed relative links from those fake URLs, creating infinite recursion. Fix: - Disable Quartz SPA mode (`enableSPA: false`) — all pages are now fully static HTML - Replace nginx SPA fallback with `=404` + Quartz's static `404.html` - Remove `robots.txt` exclusions (no longer needed) Docs cleanup (Obsidian.nvim compat no longer needed): - Delete hand-curated category index files (`tutorials.md`, `reference.md`, `how-to.md`, `explanation.md`) — Quartz auto-generates folder pages - Delete `postgresql-storage.md` (redirect stub) and `migrate-forgejo-from-brew.md` (stale history) - Drop `docs-check-index` and `docs-check-filenames` prek hooks - Rewrite `docs-check-links` to allow path-based wiki-links (`[[path/to/file]]`) and only error on true ambiguity - Add `ai-docs` doc tree listing to replace index files for AI context - Add natural cross-links from reference cards to fix orphan docs ## Deployment and Testing - [ ] Merge and let the build pipeline run - [ ] Verify docs.eblu.me serves pages correctly with full page loads - [ ] Verify non-existent URLs return 404 - [ ] Monitor crawler traffic — should drop to near zero for fabricated URLs Reviewed-on: #290	2026-03-09 11:59:43 -07:00
Erich Blume	770a7b2d6a	Add JobSync reference card, observability docs, and RAPIDAPI_KEY plumbing (#289 ) ## Summary - Add JobSync service reference card (`docs/reference/services/jobsync.md`) with architecture, secrets, observability, and JSearch API docs - Add JobSync and Ollama to ringtail's workloads table (both were missing) - Add JobSync to the reference index - Wire `RAPIDAPI_KEY` through ExternalSecret and deployment env var for JSearch job search automation - Document Loki log queries for observability (no metrics endpoint exists) - Update deploy-jobsync how-to with new env var, observability section, and reference card link ## Deployment and Testing - [ ] Sign up for RapidAPI JSearch API (free tier: 500 req/month) - [ ] Add `rapidapi_key` field to "JobSync" 1Password item - [ ] Merge PR - [ ] `argocd app sync jobsync` to pick up new env var - [ ] Verify job search works at https://jobsync.ops.eblu.me/dashboard/automations Reviewed-on: #289	2026-03-08 15:06:52 -07:00
Erich Blume	3a811fb188	Deploy JobSync — job search tracker on ringtail k3s (#288 ) ## Summary C2 Mikado chain to deploy [JobSync](https://github.com/Gsync/jobsync) — a self-hosted job application tracker — to ringtail's k3s cluster. ### Mikado Graph ``` deploy-jobsync (goal) ├── build-jobsync-container │ └── mirror-jobsync └── integrate-jobsync-ollama ``` ### What is JobSync? Next.js app with SQLite for tracking job applications. Features resume management, application pipeline tracking, and AI-powered resume review/job matching. ### Key Decisions - Ringtail k3s (not minikube-indri) — colocates with Ollama for zero-latency AI - Nix container via `buildLayeredImage` — no Dockerfile, mirrors upstream source on forge - Ollama for AI — uses existing deployment, no API keys needed for AI features - No upstream fork — vanilla JobSync, Anthropic AI deferred to future work if needed ### Current Status Planning phase — cards committed, ready for review before implementation begins. Reviewed-on: #288	2026-03-08 11:02:05 -07:00
Erich Blume	1c3bf35dad	Fix mikado invariant check rejecting close without impl A close commit with zero preceding impl commits is valid — some leaf nodes involve operational steps (e.g., creating a mirror) with no code changes. Removed the false-positive check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 20:41:03 -08:00

342 commits