blumeops

Author	SHA1	Message	Date
Erich Blume	a32c99a252	Limit ollama to one loaded model and one parallel request Prevents OOM when switching between models — only one 14B model fits in 16GB VRAM at a time with KV cache for context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 21:23:12 -08:00
Erich Blume	203e3cd567	Add NodePort service for ollama LAN access Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 20:57:18 -08:00
Erich Blume	31d925814f	Deploy Ollama LLM server on ringtail (#277 ) ## Summary - Deploy Ollama as a new ArgoCD-managed service on ringtail's k3s cluster with GPU acceleration - Declarative model management via `models.txt` + sidecar sync script (mirrors kiwix torrent pattern) - Initial models: `qwen2.5:14b`, `deepseek-r1:14b`, `phi4:14b`, `gemma3:12b` - hostPath PV on `/mnt/storage1/ollama` for fast local model storage (200Gi) - Tailscale ingress at `ollama.ops.eblu.me` for API access from tailnet - Enable GPU time-slicing (`replicas: 2`) on nvidia-device-plugin so Frigate and Ollama share the RTX 4080 ## Deployment and Testing - [ ] Deploy nvidia-device-plugin changes first: `argocd app sync nvidia-device-plugin` - [ ] Verify GPU time-slicing: `kubectl describe node ringtail --context=k3s-ringtail` shows `nvidia.com/gpu: 2` - [ ] Sync `apps` app with `--revision feature/ollama-ringtail` - [ ] Set ollama app to branch: `argocd app set ollama --revision feature/ollama-ringtail && argocd app sync ollama` - [ ] Verify model-sync sidecar pulls models: `kubectl logs -n ollama deploy/ollama -c model-sync --context=k3s-ringtail` - [ ] Test API: `curl https://ollama.ops.eblu.me/api/tags` - [ ] Test inference: `curl https://ollama.ops.eblu.me/api/generate -d '{"model":"qwen2.5:14b","prompt":"Hello"}'` - [ ] Verify Frigate still works after GPU sharing change - [ ] After merge: `argocd app set ollama --revision main && argocd app sync ollama` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/277	2026-03-02 20:39:51 -08:00
Forgejo Actions	0f79c61c42	Update docs release to v1.12.1 - Built changelog from towncrier fragments [skip ci]	2026-03-02 18:17:07 -08:00
Forgejo Actions	847e47eaf3	Update docs release to v1.12.0 - Built changelog from towncrier fragments [skip ci]	2026-03-01 17:24:09 -08:00
Erich Blume	503775085d	Deploy authentik 2026.2.0 with migration ordering fix Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 16:32:10 -08:00
Erich Blume	90621e4155	Deploy authentik 2026.2.0 with entry_points fix Update image tag to v2026.2.0-78027eb-nix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 16:04:29 -08:00
Erich Blume	e2c650b027	Deploy authentik 2026.2.0 with BASE_DIR fix Update image tag to v2026.2.0-e49d966-nix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 15:55:50 -08:00
Erich Blume	c0e29476f3	Deploy authentik 2026.2.0 with TMPDIR fix Update image tag to v2026.2.0-b7bfb0b-nix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 15:53:09 -08:00
Erich Blume	38da372f94	Deploy authentik 2026.2.0 with /tmp fix Update image tag to v2026.2.0-2ac353b-nix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 15:51:17 -08:00
Erich Blume	098f3e517c	Deploy authentik 2026.2.0 (source-built) to ArgoCD Update image tag to v2026.2.0-efa9806-nix — the first source-built authentik container from the build-authentik-from-source chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 15:44:35 -08:00
Erich Blume	02eb169403	Pin blumeops-pg to PostgreSQL 18.3 Replace floating :18 tag with pinned :18.3 (upstream out-of-cycle release fixing 18.2 regressions). Stamps service as reviewed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 16:25:32 -08:00
Erich Blume	776caa87f5	Sync Frigate zone coordinates from live API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 07:52:09 -08:00
Forgejo Actions	fa223f8e3b	Update docs release to v1.11.5 - Built changelog from towncrier fragments [skip ci]	2026-02-26 07:56:02 -08:00
Erich Blume	be3cdad1cb	Add HA for CV and Docs: zero-downtime deploys (#273 ) ## Summary - Set `replicas: 2` with `maxUnavailable: 0` / `maxSurge: 1` on CV and Docs deployments so rolling updates never drop below 2 ready pods - Add PodDisruptionBudgets (`minAvailable: 1`) to protect against node drains and cluster maintenance - Add Fly.io cache purge step to `cv-deploy.yaml` workflow (docs already had this) so CV deploys don't serve stale cached content ## Deployment and Testing - [ ] `argocd app diff cv` / `argocd app diff docs` from branch - [ ] Deploy from branch: `argocd app set cv --revision feature/ha-cv-docs-zero-downtime && argocd app sync cv` - [ ] Verify 2 pods running: `kubectl get pods -n cv --context=minikube-indri` - [ ] Test rolling restart: `kubectl rollout restart deployment/cv -n cv --context=minikube-indri` - [ ] During rollout, confirm continuous availability via `curl -I https://cv.eblu.me` - [ ] After merge: reset ArgoCD to main, re-sync both apps Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/273	2026-02-26 07:53:21 -08:00
Erich Blume	fb83c5c577	Add explicit ExternalSecret defaults for SSA sync parity The external-secrets webhook injects conversionStrategy, decodingStrategy, and metadataPolicy defaults on admission. Declaring them explicitly prevents ArgoCD SSA from flagging the resource as OutOfSync. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 07:02:54 -08:00
Erich Blume	db561c6b0e	Upgrade ArgoCD v3.2.6 → v3.3.2 with Server-Side Apply (#272 ) ## Summary - Upgrade ArgoCD from v3.2.6 to v3.3.2 - Enable `ServerSideApply=true` sync option (required by v3.3 — ApplicationSet CRD exceeds client-side apply annotation limit) - Update service-versions.yaml with review for argocd and 1password-connect ## Breaking changes reviewed - Server-Side Apply required: Added to syncOptions ✅ - Source Hydrator git notes: Not used — N/A - Application path cleaning removed: Not used — N/A - Settings API field restriction: Authenticated access only — N/A ## Deployment and Testing - [ ] Sync the `apps` app first (picks up SSA syncOption change) - [ ] `argocd app set argocd --revision feature/argocd-v3.3.2` - [ ] `argocd app sync argocd` - [ ] Verify all argocd pods running with v3.3.2 images - [ ] Verify other apps still sync correctly - [ ] After merge: `argocd app set argocd --revision main && argocd app sync argocd` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/272	2026-02-26 06:51:50 -08:00
Erich Blume	95c8424e62	Add Transmission metrics exporter and Grafana dashboard (#271 ) ## Summary - Add `metalmatze/transmission-exporter` as a sidecar container in the torrent deployment, exposing Prometheus metrics on port 19091 - Add metrics port to the torrent service for Prometheus scraping - Add Prometheus scrape job targeting the transmission exporter - Create Grafana dashboard with: - Overview stats (download/upload speed, active/total torrents) - Transfer speed timeseries (download + upload over time) - Transfer volume stats (total downloaded/uploaded in selected range) - Per-torrent download and upload rate timeseries - Per-torrent details table (ratio, uploaded, percent done) ## Deployment and Testing - [ ] Sync ArgoCD `torrent` app from branch — verify exporter sidecar starts - [ ] Verify exporter metrics: `kubectl exec` into pod, `curl localhost:19091/metrics` - [ ] Verify Prometheus scrapes it: check targets at prometheus.ops.eblu.me - [ ] Open Grafana, find "Transmission" dashboard, verify panels populate - [ ] Sync ArgoCD `prometheus` app from branch - [ ] Sync ArgoCD `grafana-config` app from branch Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/271	2026-02-25 22:23:33 -08:00
Erich Blume	03d71544ec	Add multi-cluster observability with ringtail metrics and dashboards (#270 ) ## Summary - Add `cluster` label (indri/ringtail) to all Prometheus scrape jobs, Alloy k8s metrics/logs, and Alloy host metrics/logs - Deploy kube-state-metrics on ringtail's k3s cluster (ArgoCD app + manifests) - Deploy Alloy on ringtail to collect pod metrics and logs, remote-writing to indri's Prometheus and Loki - Replace single-cluster "Minikube Kubernetes" and "K8s Services Health" dashboards with: - Kubernetes Clusters dashboard — multi-cluster with `cluster` and `namespace` template variables - Ringtail (k3s) dashboard — dedicated ringtail view with GPU usage panels ## Deployment and Testing 1. Sync `apps` on indri ArgoCD to pick up new app definitions (`kube-state-metrics-ringtail`, `alloy-ringtail`) 2. Sync `prometheus` → verify `cluster` label on scraped metrics 3. Sync `alloy-k8s` → verify `cluster=indri` on remote-written metrics and logs 4. Run `mise run provision-indri -- --tags alloy` → verify `cluster=indri` on host Alloy metrics/logs 5. Sync `kube-state-metrics-ringtail` → verify pods running on ringtail 6. Sync `alloy-ringtail` → verify pods running, check Prometheus for `kube_pod_info{cluster="ringtail"}` 7. Sync `grafana-config` → verify dashboards appear, cluster variable populates both values 8. Check Loki for `{cluster="ringtail"}` logs from ringtail pods ## Notes - Alloy on ringtail uses `insecure_skip_verify=true` for TLS to Prometheus/Loki (Tailscale-managed certs not in container trust store) — tighten later - DNS resolution for `*.tail8d86e.ts.net` from ringtail pods depends on CoreDNS inheriting host's MagicDNS resolver; may need CoreDNS forwarding rules if pods can't resolve - The old services dashboard (blackbox probes) is removed — those probes are still running in alloy-k8s and the data is still in Prometheus, just not in a dedicated dashboard Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/270	2026-02-25 22:01:00 -08:00
Erich Blume	2243f2e0a1	Filter driveway zone to person/dog/cat only in Frigate Parked car was being re-detected every few minutes at night due to IR illumination noise triggering motion detection. Restrict the driveway zone to [person, dog, cat] so cars and birds no longer create events there. Cars still alert via the driveway_entrance zone. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 20:45:07 -08:00
Erich Blume	de54b4e33d	Port CloudNative-PG off Helm to direct release manifest (#268 ) ## Summary - Point ArgoCD app directly at forge-mirrored upstream repo (`mirrors/cloudnative-pg`) instead of the Helm charts repo - Use `directory.include` to select the specific release manifest (`cnpg-1.27.1.yaml`) from the `releases/` directory - No vendored files, no Helm — upgrades are a two-line change (`targetRevision` + `directory.include`) - Delete unused `values.yaml` (was empty, all Helm defaults) ## Deployment and Testing - [ ] Register mirror repo in ArgoCD: `argocd repo add ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git --ssh-private-key-path <key>` - [ ] `argocd app set cloudnative-pg --revision feature/cnpg-direct-source && argocd app sync cloudnative-pg` - [ ] Verify operator pod running: `kubectl get pods -n cnpg-system --context=minikube-indri` - [ ] Verify CRDs exist: `kubectl get crd --context=minikube-indri | grep cnpg` - [ ] Verify existing clusters healthy: `kubectl get clusters -A --context=minikube-indri` - [ ] After merge: `argocd app set cloudnative-pg --revision main && argocd app sync cloudnative-pg` ## Notes - The forge mirror was created via `mise run mirror-create` from `https://github.com/cloudnative-pg/cloudnative-pg.git` - ArgoCD may need the mirror repo added to its known repositories if the credential template doesn't already match `mirrors/*` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/268	2026-02-25 17:37:53 -08:00
Erich Blume	285ad4141f	Fix Frigate detection events rate metric name in Grafana dashboard The panel queried frigate_camera_events but the actual metric exposed by Frigate is frigate_camera_events_total with a "camera" label (not "camera_name"). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 16:51:57 -08:00
Forgejo Actions	4736c7e9bd	Update docs release to v1.11.4 - Built changelog from towncrier fragments [skip ci]	2026-02-25 07:04:23 -08:00
Erich Blume	5f9bc20345	Fix mirror org refs in ArgoCD apps and widen credential template (#266 ) ## Summary - Widen `repo-creds-forge` URL prefix from `/eblume/` to host-wide `/` so it matches repos in all forge orgs (fixes `mirrors/` repos not getting SSH credentials) - Update 8 ArgoCD app definitions from `eblume/<mirror>` → `mirrors/<mirror>` (immich-charts, cloudnative-pg-charts, external-secrets, connect-helm-charts) - Fix stale alloy clone comment in Ansible defaults - Bump immich v2.5.2 → v2.5.6 (bug-fix patches only) - Update ArgoCD README bootstrap command and credential docs ## Context Mirrors were migrated from `forge.ops.eblu.me/eblume/` to `forge.ops.eblu.me/mirrors/` in commit ``cd57814``. Container Dockerfiles and image tags were updated, but ArgoCD app definitions and the repo credential template were missed, causing `ComparisonError` on apps that source Helm charts from mirrored repos. ## Deployment 1. Sync the ArgoCD `argocd` app first (picks up the widened credential template) 2. Sync the `apps` app (picks up new repo URLs for all 8 apps) 3. Verify immich resolves its ComparisonError: `argocd app get immich` 4. Sync immich to deploy v2.5.6: `argocd app sync immich` 5. Spot-check: `argocd app get external-secrets`, `argocd app get cloudnative-pg`, `argocd app get 1password-connect` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/266	2026-02-25 06:55:53 -08:00
Erich Blume	4f8f2985c1	Update prometheus and teslamate image tags after mirror migration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:18:15 -08:00
Erich Blume	e0f9ebebdf	Update homepage, navidrome, ntfy, miniflux image tags after mirror migration Prometheus and teslamate builds still in progress — will update in a follow-up commit once their `33b7f0f` tags land. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:06:08 -08:00
Erich Blume	61ee6a4d38	Fix Grafana ConfigMap labels lost in configMapGenerator migration The hand-written configmap.yaml had app.kubernetes.io/name and app.kubernetes.io/instance labels; configMapGenerator dropped them. Add options.labels to both generator entries to restore parity. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 14:46:20 -08:00
Erich Blume	9b44a8ec51	Add kustomize images: and configMapGenerator: across services (#264 ) ## Summary - Move hardcoded image tags to kustomization.yaml `images:` transformer across 22 services — image names in manifests become version-agnostic templates, with tags centralized in one place per service - Replace hand-written ConfigMap manifests with `configMapGenerator:` in 12 services — config data extracted to standalone files, generated ConfigMaps include content hashes that trigger automatic pod rollouts on changes - Create new `kustomization.yaml` for forgejo-runner and nvidia-device-plugin (switches ArgoCD from directory mode to kustomize mode, rendered output identical) ### Services modified Images only (8): cv, devpi, docs, kube-state-metrics, miniflux, navidrome, teslamate, torrent Images + configMapGenerator (10): alloy-k8s, forgejo-runner, frigate, grafana, homepage, kiwix, loki, mosquitto, ntfy, prometheus Images only, no configMapGenerator (4): authentik (skip blueprints — special YAML tags), tailscale-operator-base (Deployment only, CRD image fields left as-is) Skipped entirely (6): argocd (remote upstream), databases (no image fields), external-secrets, grafana-config (cross-kustomization dashboards), immich (Helm-managed), 1password-connect/cloudnative-pg (no kustomization.yaml) ### What changes at deploy time - images: — no functional diff, `kustomize build` produces identical output with tags - configMapGenerator: — ConfigMap names gain hash suffixes (e.g., `prometheus-config` → `prometheus-config-6f42fhctcb`) and all Deployment/StatefulSet/DaemonSet references are updated automatically. Pods will restart once per service on first sync due to the name change ## Test plan - [x] `kubectl kustomize` builds all 30 service directories successfully - [x] Image tags verified in rendered output for all modified services - [x] ConfigMap hash suffixes verified in rendered output - [x] ConfigMap references in Deployments/StatefulSets confirmed to use hashed names - [x] All pre-commit hooks pass (yamllint, shellcheck, prettier, etc.) - [ ] `argocd app diff` each service to confirm only expected ConfigMap name changes - [ ] Deploy from branch starting with a low-risk service (e.g., mosquitto) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/264	2026-02-24 14:25:19 -08:00
Erich Blume	86aeb60ec9	Fix TeslaMate dashboards: add database to PostgreSQL jsonData Grafana 12.x's grafana-postgresql-datasource plugin requires the database name in jsonData, not just the top-level database field. Without it, the frontend blocks all queries with "no default database configured", causing all TeslaMate panels to show "No Data." Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 13:49:07 -08:00
Erich Blume	495c3e8496	Fix Grafana OAuth role mapping from Authentik groups The INI parser was stripping outer single quotes from role_attribute_path = 'Admin', causing Grafana to evaluate 'Admin' as a JMESPath field identifier instead of a string literal. This resulted in all OAuth users getting the default Viewer role. Replaced with a proper group-based expression that checks for the 'admins' Authentik group and maps to Admin/Viewer accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 13:41:08 -08:00
Erich Blume	4acd2e58d4	Update prometheus and grafana to main-SHA container tags Prometheus: v3.9.1-74029e1 [branch] -> v3.9.1-2ba5d8a [main] Grafana: v12.3.3-09ac36b [branch] -> v12.3.3-d05d2fb [main] These images were built during PR development and referenced branch commits that won't survive branch cleanup. The [main] tags are identical rebuilds from the squash-merge commit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 09:58:09 -08:00
Erich Blume	2ba5d8a8aa	Port Prometheus to local container build (#262 ) ## Summary - Add three-stage Dockerfile for Prometheus v3.9.1 (Node UI → Go binaries → Alpine runtime) - Produces `prometheus` and `promtool` binaries with embedded web UI assets - Follows navidrome/ntfy pattern for supply chain control via Zot registry ## Deployment and Testing - [ ] `dagger call build --src=. --container-name=prometheus` succeeds - [ ] Container reports correct version via `prometheus --version` - [ ] `promtool --version` works - [ ] Update statefulset image reference after successful build - [ ] Deploy from branch: `argocd app set prometheus --revision <branch> && argocd app sync prometheus` - [ ] Health probes pass (`/-/healthy`, `/-/ready`) - [ ] Web UI loads, scrape targets work, remote write functions Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/262	2026-02-24 09:15:57 -08:00
Forgejo Actions	2f78d180e8	Update docs release to v1.11.3 - Built changelog from towncrier fragments [skip ci]	2026-02-23 21:04:33 -08:00
Erich Blume	d05d2fbaff	C2: Upgrade Grafana to 12.x with Nix container and Kustomize (#260 ) ## Summary Mikado chain to upgrade Grafana from 11.4.0 (Helm chart) to 12.x with: - Home-built Nix container image (`forge.ops.eblu.me/eblume/grafana`) - Kustomize manifests replacing the Helm chart - Single-source ArgoCD app ## Chain Goal: `upgrade-grafana` Leaves: `build-grafana-container`, `kustomize-grafana-deployment` Track with: `mise run docs-mikado upgrade-grafana` ## Test plan - [ ] Container builds successfully via Nix - [ ] Container pushed to registry - [ ] Kustomize manifests produce equivalent resources to current Helm - [ ] Pod runs, UI loads, OIDC works, datasources healthy - [ ] `mise run services-check` passes Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/260	2026-02-23 18:07:18 -08:00
Erich Blume	9b419abf24	Update RUNNER_LABELS to use runner-job-image:v0.19.11-4c5e0f0 Now that the image is built under the new name, point the forgejo runner at it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 17:47:14 -08:00
Erich Blume	75fde54355	Fix Grafana TeslaMate dashboard folder provisioning (#253 ) ## Summary - `foldersFromFilesStructure` was `false` in Grafana's sidecar provider config, causing Grafana to ignore the subdirectory structure the sidecar creates from `grafana_folder` annotations - All 18 TeslaMate dashboards were appearing in the root "Dashboards" folder despite having `grafana_folder: "TeslaMate"` annotations on their ConfigMaps - Flipping to `true` makes Grafana replicate the sidecar's directory structure as UI folders ## Deployment and Testing - [ ] Sync `grafana` app: `argocd app sync grafana` - [ ] Verify TeslaMate dashboards appear under a "TeslaMate" folder in Grafana's dashboard list - [ ] Verify other dashboards remain in the root "Dashboards" folder Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/253	2026-02-22 18:38:51 -08:00
Erich Blume	6871bf32a8	Remove unused gpu panel in frigate dashboard	2026-02-22 18:26:47 -08:00
Erich Blume	2c6c6a244a	Fix Frigate Prometheus metrics & rebuild Grafana dashboard (#252 ) ## Summary - Prometheus scrape target: Changed from `frigate.frigate.svc.cluster.local:5000` (broken after ringtail migration) to `nvr.ops.eblu.me` via HTTPS through Caddy on indri - Grafana dashboard: Rebuilt for Frigate 0.17 metrics — 12 panels total: - Row 1 (stats): Uptime, Inference Speed, Camera FPS, Detection FPS, GPU Usage, GPU Temp - Row 2 (timeseries): CPU Usage, Memory Usage - Row 3 (timeseries): Camera FPS + Skipped FPS, GPU Usage + Memory over time - Row 4 (timeseries): Storage Usage, Detection Events (rate by camera/label) ## Deployment and Testing 1. Sync prometheus app on branch: ``` argocd app set prometheus --revision fix/frigate-metrics-dashboard && argocd app sync prometheus ``` 2. Check `prometheus.ops.eblu.me/targets` — frigate job should show UP 3. Sync grafana-config: ``` argocd app sync grafana-config ``` 4. Check `grafana.ops.eblu.me` — Frigate NVR dashboard should show live data 5. After merge: reset both apps to `--revision main` and sync Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/252	2026-02-22 18:14:17 -08:00
Forgejo Actions	dda7d719b3	Update docs release to v1.11.2 - Built changelog from towncrier fragments [skip ci]	2026-02-22 17:52:05 -08:00
Erich Blume	e655f4556e	Upgrade k8s forgejo-runner from v6.3.1 to v12.7.0 (#251 ) ## Summary Completes the `upgrade-k8s-runner` mikado chain. Both prerequisites (workflow validation in Dagger, config review against v12 defaults) were resolved in #250. - Bump runner image `code.forgejo.org/forgejo/runner:6.3.1` → `12.7.0` - Update `service-versions.yaml` to track new version - Mark goal card complete (remove `status: active`) ## Deployment and Testing After merge: 1. `argocd app sync forgejo-runner` 2. Verify runner registers in Forgejo admin → runners 3. Trigger a test workflow (e.g. `branch-cleanup.yaml` manual dispatch) Rollback: revert image tag to `6.3.1`, push, sync. Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/251	2026-02-22 17:43:39 -08:00
Erich Blume	0f6a1898f0	Prepare forgejo-runner v12 upgrade (leaf nodes) (#250 ) ## Summary - Review runner config against v12.7.0 defaults — added `shutdown_timeout: 3h`, no breaking changes found - Add `validate_workflows` Dagger function using `forgejo-runner validate --directory .` inside upstream container - All 6 workflows pass v12.7.0 schema validation - Wire `mise run validate-workflows` task and pre-commit hook on `.forgejo/workflows/` changes - Mark both leaf Mikado cards (`review-runner-config-v12`, `validate-workflows-against-v12`) complete ## Mikado State After merge, `upgrade-k8s-runner` goal card has no unmet dependencies — ready to execute the actual image bump in a follow-up PR. ## Test Plan - [x] `dagger call validate-workflows --src=.` passes (all 6 workflows OK) - [x] Pre-commit hooks pass - [ ] Reviewer: confirm `shutdown_timeout: 3h` addition to ConfigMap looks reasonable 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/250	2026-02-22 17:38:32 -08:00
Erich Blume	d51c180fe6	Switch Frigate detection model from YOLO-NAS-S to YOLOv9-c (#246 ) ## Summary - Replace abandoned YOLO-NAS-S (320x320, `yolonas`) with YOLOv9-c (640x640, `yolo-generic`) - YOLOv9-c benefits from CUDA Graphs in Frigate 0.17 on the RTX 4080 - Add `export_yolov9` Dagger pipeline and `frigate-export-model` mise task for reproducible model exports - Model already deployed to `sifaka:/volume1/frigate/models/yolov9-c-640.onnx` ## Config changes - `model_type: yolonas` → `yolo-generic` - `input_dtype: int` → `float` - `width/height: 320` → `640` - `path:` → `yolov9-c-640.onnx` ## Deployment and Testing - [ ] Merge and sync Frigate ArgoCD app: `argocd app sync frigate` - [ ] Verify Frigate starts and detects objects at https://nvr.ops.eblu.me - [ ] Confirm GPU inference via Frigate system metrics Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/246	2026-02-22 15:14:45 -08:00
Erich Blume	2c081eed28	Add Forgejo repository health metrics and Grafana dashboard (#245 ) ## Summary - New `forgejo_metrics` Ansible role that queries the Forgejo REST API every 60s and writes Prometheus textfile metrics (open PRs, issues, languages, releases, commits, Actions runs/duration/success) - Grafana dashboard "Forgejo Repository Health" with 12 panels across 4 rows: overview stats, CI/CD health, repository info, and staleness tracking - Deletes superseded `forgejo-actions-dashboard` plan doc (this implementation covers a broader scope) ## Deployment and Testing - [ ] `mise run provision-indri -- --tags forgejo_metrics` to deploy the collector - [ ] `ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/forgejo.prom'` to verify metrics - [ ] `argocd app sync grafana-config` to deploy the dashboard - [ ] Check Grafana dashboard "Forgejo Repository Health" loads with data - [ ] `mise run services-check` passes Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/245	2026-02-22 11:16:03 -08:00
Forgejo Actions	c21cf54847	Update docs release to v1.11.1 - Built changelog from towncrier fragments [skip ci]	2026-02-22 10:21:19 -08:00
Erich Blume	c897fc8e1f	Use Zot registry icon on homepage dashboard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 09:19:46 -08:00
Forgejo Actions	627caeb61f	Update docs release to v1.11.0 - Built changelog from towncrier fragments [skip ci]	2026-02-22 09:16:00 -08:00
Erich Blume	529ba10939	Fix frigate-notify: webapi polling, dedup, hi-res snapshots (#242 ) ## Summary - Switch from MQTT to webapi polling (v0.5.4 requires only one method) - Poll every 15s for responsive alerts - `notify_once: true` — one notification per event instead of repeats as object changes zones - `nosnap: drop` — skip events without snapshots (was causing all events to be dropped on v0.3.5) - `snap_hires: true` — use recording stream for higher quality snapshot images ## Deployment and Testing - [ ] Sync: `argocd app set frigate --revision fix/frigate-notify-config && argocd app sync frigate` - [ ] Verify pod starts: `kubectl --context=k3s-ringtail -n frigate get pods -l app=frigate-notify` - [ ] Check logs for successful startup and event processing (no "No snapshot" drops) - [ ] Wait for a motion event and confirm single ntfy notification with hi-res snapshot - [ ] After merge: `argocd app set frigate --revision main && argocd app sync frigate` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/242	2026-02-22 09:05:45 -08:00
Erich Blume	7dcab826fa	Upgrade frigate-notify from v0.3.5 to v0.5.4 (#241 ) ## Summary - Service review: upgrade frigate-notify from v0.3.5 to v0.5.4 - No breaking changes for current MQTT + ntfy config - Notable additions: high-res snapshots, MQTT topic parsing fixes, env var parsing fixes ## Deployment and Testing - [ ] Sync frigate app on ringtail: `argocd app set frigate --revision review/frigate-notify-v0.5.4 && argocd app sync frigate` - [ ] Verify pod starts cleanly: `kubectl --context=k3s-ringtail -n frigate get pods` - [ ] Trigger a test alert (motion event) and confirm ntfy notification arrives - [ ] After merge: `argocd app set frigate --revision main && argocd app sync frigate` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/241	2026-02-22 08:42:47 -08:00
Erich Blume	07fb48626d	Add Authentik SSO integration for Jellyfin (#239 ) ## Summary - Add Authentik OIDC provider + application for Jellyfin via blueprint (all authenticated users allowed, no policy binding) - Wire `jellyfin-client-secret` through ExternalSecret and Authentik worker deployment - Install [jellyfin-plugin-sso](https://github.com/9p4/jellyfin-plugin-sso) v4.0.0.3 via Ansible, with OIDC config template - Authentik `admins` group maps to Jellyfin administrator role - Local login left enabled; SSO is additive ## Deployment and Testing - [ ] Sync ArgoCD `authentik` app on branch — verify provider + application appear in Authentik admin - [ ] `mise run provision-indri -- --tags jellyfin --check --diff` (dry run) - [ ] `mise run provision-indri -- --tags jellyfin` (deploy plugin + config) - [ ] Test SSO flow: `https://jellyfin.ops.eblu.me/sso/OID/start/authentik` - [ ] Verify `eblume` account auto-links via `preferred_username` match - [ ] Verify admins group → Jellyfin admin - [ ] Reset ArgoCD app revision to main after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/239	2026-02-21 20:05:44 -08:00
Erich Blume	e1c2892878	Fix container tags deleted during old-tag cleanup Five container manifests were removed when deleting old-style tags (shared digests). Rebuild on `a72a0d8` and update references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 16:26:29 -08:00

230 commits