blumeops

Author	SHA1	Message	Date
Erich Blume	b876e39981	Replace Homepage Helm chart with kustomize manifests and custom Dockerfile (#221 ) ## Summary - Replace third-party Helm chart (jameswynn/homepage v2.1.0, pinned at app v1.2.0) with plain kustomize manifests and a custom Dockerfile building from forge mirror at v1.10.1 - Adds Dockerfile (`containers/homepage/`) with multi-stage build (node:22-slim builder, node:22-alpine runtime) - Creates kustomize manifests: Deployment, Service, ConfigMap (6 config files), ServiceAccount, ClusterRole, ClusterRoleBinding - Keeps existing ingress-tailscale.yaml and all 6 ExternalSecret resources unchanged - Updates ArgoCD app definition from multi-source Helm to single directory source ## Prerequisite - Homepage source mirrored at forge.ops.eblu.me/eblume/homepage.git ✅ - Container must be built and pushed before syncing: `mise run container-release homepage v1.10.1` ## Deployment and Testing - [ ] Build and push container image: `mise run container-release homepage v1.10.1` - [ ] Branch-test via ArgoCD: `argocd app set homepage --revision feature/homepage-kustomize && argocd app sync homepage` - [ ] Verify dashboard loads at go.ops.eblu.me / go.tail8d86e.ts.net - [ ] Verify k8s autodiscovery works (services appear on dashboard) - [ ] Verify widgets load (weather, Forgejo, Jellyfin, etc.) - [ ] After merge: `argocd app set homepage --revision main && argocd app sync homepage` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/221	2026-02-19 18:29:19 -08:00
Erich Blume	cabd0bc9cf	Update Frigate zone masks and expand alert notifications (#219 ) ## Summary - Synced driveway_entrance zone coordinates from live Frigate config (adjusted mask boundaries) - Added `inertia: 3` and `loitering_time: 0` to driveway_entrance zone - Expanded review alerts to require either `driveway_entrance` or `driveway` zone (was entrance only) - Updated frigate-notify config to allow alerts from both `driveway_entrance` and `driveway` zones ## Deployment and Testing - [ ] Merge and sync frigate ArgoCD app on ringtail - [ ] Sync frigate-notify (restart pod to pick up ConfigMap change) - [ ] Verify alerts fire for person/car in driveway zone Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/219	2026-02-19 17:32:02 -08:00
Erich Blume	d5d32fe91f	Port Frigate NVR to ringtail k3s with GPU acceleration (#217) ## Summary - Enable NVIDIA container toolkit on ringtail NixOS and configure k3s containerd with nvidia runtime - Add NVIDIA device plugin ArgoCD app (RuntimeClass + DaemonSet) to expose `nvidia.com/gpu` resources - Re-target Frigate from indri minikube (arm64, ZMQ detector) to ringtail k3s (x86_64, TensorRT/ONNX) - Switch Frigate image to `-tensorrt` variant with GPU resource limits and increased shared memory ## Manual Prerequisites 1. NFS access: Verify ringtail can mount `sifaka:/volume1/frigate` ```fish ssh ringtail 'sudo mount -t nfs sifaka:/volume1/frigate /mnt/storage1 && ls /mnt/storage1 && sudo umount /mnt/storage1' ``` 2. YOLO model: Verify `/volume1/frigate/models/yolov9m.onnx` exists on sifaka ## Deployment Steps 1. Provision ringtail: `mise run provision-ringtail` 2. Sync ArgoCD apps: `argocd app sync apps --prune` 3. Deploy NVIDIA device plugin: `argocd app sync nvidia-device-plugin` 4. Verify GPU: `kubectl --context=k3s-ringtail get nodes -o json | jq '.items[].status.capacity'` 5. Deploy Frigate: `argocd app sync frigate` ## Verification - [ ] `nvidia.com/gpu: 1` visible in node capacity - [ ] Frigate pod running with GPU allocated - [ ] Frigate UI loads at `https://nvr.ops.eblu.me` - [ ] Detector shows ONNX/TensorRT on System page - [ ] Camera feed with bounding boxes in live view - [ ] TensorRT engine build completes (watch logs on first start) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/217	2026-02-19 14:27:04 -08:00
Erich Blume	16a4a9a616	Port Mosquitto and ntfy to ringtail k3s, retire Apple Silicon Detector (#216 ) ## Summary - Delete `ansible/roles/frigate_detector/` and remove from indri playbook — the Apple Silicon Detector is retired - Move Mosquitto (MQTT) ArgoCD app from indri minikube to ringtail k3s - Move ntfy ArgoCD app from indri minikube to ringtail k3s - Update Frigate docs to reflect detector removal and planned RTX 4080 migration - Manifests are reused as-is (same `argocd/manifests/mosquitto/` and `argocd/manifests/ntfy/`), just pointed at ringtail ## Deployment After merge: 1. Sync indri ArgoCD `apps` app with prune to remove old mosquitto/ntfy apps: ``` argocd app sync apps --prune ``` 2. Sync new ringtail apps: ``` argocd app sync mosquitto-ringtail argocd app sync ntfy-ringtail ``` 3. Manually clean up the detector LaunchAgent on indri: ``` ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist' ssh indri 'rm ~/Library/LaunchAgents/mcquack.eblume.frigate-detector.plist' ``` ## Notes - Frigate on indri will lose MQTT/ntfy connectivity — this is expected (user confirmed no downtime concerns) - ntfy Tailscale Ingress hostname `ntfy` will transfer from indri ProxyGroup to ringtail ProxyGroup - Caddy on indri proxies `ntfy.ops.eblu.me` → `ntfy.tail8d86e.ts.net`, so no Caddy changes needed - Frigate + frigate-notify will be ported to ringtail in a follow-up PR 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/216	2026-02-19 11:22:44 -08:00
Erich Blume	61ca1ca305	Deploy Tailscale operator on ringtail k3s cluster (#215 ) ## Summary - Extract shared Tailscale operator resources (CRDs, RBAC, Deployment, ProxyClass, DNSConfig) into `tailscale-operator-base/` so both clusters reference the same manifests - Add `tailscale-operator-ringtail/` overlay with 1-replica ProxyGroup and ExternalSecret for the shared OAuth client - Add ArgoCD Application targeting `ringtail.tail8d86e.ts.net:6443` - Update `.yamllint.yaml` ignore path for the moved `operator.yaml` ## Deployment and Testing - [ ] Sync `apps` app to pick up the new Application definition - [ ] `argocd app sync tailscale-operator-ringtail` - [ ] Verify ExternalSecret syncs: `kubectl --context=k3s-ringtail -n tailscale get externalsecret` - [ ] Verify operator pod runs: `kubectl --context=k3s-ringtail -n tailscale get pods` - [ ] Verify ProxyGroup ready: `kubectl --context=k3s-ringtail -n tailscale get proxygroups` - [ ] Verify indri operator still works: `argocd app diff tailscale-operator` - [ ] Check Tailscale admin for new operator device with `tag:k8s-operator` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/215	2026-02-19 09:33:05 -08:00
Erich Blume	918df9e642	Add k3s, 1Password Connect, and systemd nix-container-builder to ringtail (#209) ## Summary Extends ringtail from a desktop/gaming NixOS box into an infrastructure node with a k3s cluster, secrets management, and a Forgejo Actions runner for building containers with Nix. ### K3s cluster - Single-node k3s with Traefik/ServiceLB/metrics-server disabled (minimal footprint) - TLS SAN set to `ringtail.tail8d86e.ts.net` so ArgoCD on indri can manage it via Tailscale - Containerd registry mirrors pull through Zot on indri (`k3s-registries.yaml`) - Tailscale interface added to `trustedInterfaces` for cross-node ArgoCD access - `kubectl` added to system packages ### 1Password Connect + External Secrets Operator - Four new ArgoCD apps targeting `k3s-ringtail`: `1password-connect-ringtail`, `external-secrets-crds-ringtail`, `external-secrets-ringtail`, `external-secrets-config-ringtail` - Reuses the same Helm charts/values as indri, just pointed at ringtail's k3s API server - Bootstrap secrets (`op-credentials`, `onepassword-token`) provisioned by Ansible pre_tasks via `op read`, then applied to the `1password` namespace in post_tasks ### Systemd Forgejo Actions runner - Native `services.gitea-actions-runner` with `forgejo-runner` package: no DinD, no k8s pod, runs directly on the NixOS host - Label `nix-container-builder:host`: jobs execute on the host with `nix`, `skopeo`, `nodejs`, etc. in PATH - Registration token fetched from 1Password (`Forgejo Secrets/runner_reg`) by Ansible and written to `/etc/forgejo-runner/token.env` - Runner's dynamic user (`gitea-runner`) added to `nix.settings.trusted-users` for nix daemon access ### Nix container build workflow - New `.forgejo/workflows/build-container-nix.yaml` triggers on `-nix-v[0-9]` tags (e.g. `nettest-nix-v1.0.0`) - Builds with `nix build -f containers/<name>/default.nix`, pushes to Zot via `skopeo copy` - Existing Dockerfile workflow guarded with `if: !contains(github.ref_name, '-nix-v')` to avoid double-triggering ### Mise task updates - `container-tag-and-release` auto-detects `default.nix` vs `Dockerfile` and uses the appropriate tag format (`-nix-v` vs `-v`) - `container-list` shows build type indicator (`[nix]` / `[dockerfile]`) ## Post-merge 1. `mise run provision-ringtail`: deploys k3s token, runner token, NixOS rebuild 2. Register k3s cluster in ArgoCD (first time only): ```fish ssh ringtail 'sudo cat /etc/rancher/k3s/k3s.yaml' | sed 's|127.0.0.1|ringtail.tail8d86e.ts.net|' > /tmp/k3s-ringtail.yaml set -x KUBECONFIG /tmp/k3s-ringtail.yaml argocd cluster add default --name k3s-ringtail ``` 3. Sync ArgoCD apps in order: `1password-connect-ringtail` -> `external-secrets-crds-ringtail` -> `external-secrets-ringtail` -> `external-secrets-config-ringtail` 4. Verify runner: `ssh ringtail 'systemctl status gitea-runner-nix-container-builder'` 5. Check Forgejo admin panel for ringtail-nix-builder runner online 6. Test: create `containers/<name>/default.nix`, tag with `<name>-nix-v0.1.0` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/209	2026-02-18 21:15:30 -08:00
Erich Blume	5f9b024b4a	Add Apple Silicon ZMQ detector for Frigate (#206 ) ## Summary - New `frigate_detector` ansible role deploys the [apple-silicon-detector](https://github.com/frigate-nvr/apple-silicon-detector) as a LaunchAgent on indri - Switches Frigate from ONNX CPU detector (~117ms) to ZMQ detector backed by CoreML/Neural Engine (~15ms) - Removes detect FPS cap (no longer needed with fast inference) - Updates Frigate docs and adds changelog fragment ## Deployment ### Phase 1: Deploy detector on indri (one-time setup + ansible) ```fish ssh indri 'git clone https://github.com/frigate-nvr/apple-silicon-detector.git ~/code/3rd/apple-silicon-detector' ssh indri 'cd ~/code/3rd/apple-silicon-detector && make install' mise run provision-indri -- --tags frigate_detector --check --diff # dry run mise run provision-indri -- --tags frigate_detector # apply ssh indri 'launchctl list mcquack.eblume.frigate-detector' # verify running ssh indri 'tail ~/Library/Logs/mcquack.frigate-detector.out.log' # verify bound ``` ### Phase 2: Test connectivity ```fish kubectl --context=minikube-indri -n frigate exec deploy/frigate -- nc -vz host.minikube.internal 5555 ``` ### Phase 3: Deploy Frigate config (branch workflow) ```fish argocd app set frigate --revision feature/frigate-zmq-detector && argocd app sync frigate ``` ### Phase 4: Post-deploy checks - [ ] Pod starts, no config errors - [ ] `/api/stats` shows detector type zmq, inference_speed ~15ms - [ ] detect_fps uncapped - [ ] Recordings and MQTT events flowing - [ ] After merge: `argocd app set frigate --revision main && argocd app sync frigate` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/206	2026-02-17 19:03:28 -08:00
Erich Blume	f45897b7c7	Upgrade Frigate 0.16.4 → 0.17.0-rc2 (#205 ) ## Summary - Bump Frigate image from `0.16.4-standard-arm64` to `0.17.0-rc2-standard-arm64` - Adapt `record` config to 0.17 schema: `retain.days`/`mode: all` → `continuous.days` - Update service docs and version tracker This is the first step toward the Apple Silicon ZMQ detector. The existing ONNX detector is kept so we can validate the upgrade independently. ## What is NOT changing - Detector config (still `type: onnx` with YOLO-NAS-s) - go2rtc streams, MQTT, cameras, zones, review rules - frigate-notify, storage PVs, Grafana dashboard ## Deployment and Testing - [ ] `argocd app set frigate --revision upgrade-frigate-0.17 && argocd app sync frigate` - [ ] Pod starts, `/api/version` returns `0.17.0-rc2` - [ ] No config errors in pod logs - [ ] Frigate web UI loads at `https://nvr.ops.eblu.me` - [ ] Live view works, detection running (`/api/stats` shows `detection_fps > 0`) - [ ] Recordings being created (`/api/recordings/summary`) - [ ] MQTT events flowing (check frigate-notify logs) - [ ] After merge: `argocd app set frigate --revision main && argocd app sync frigate` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/205	2026-02-17 16:56:12 -08:00
Erich Blume	acd213559e	Fix frigate live view by capping detect FPS (#204 ) ## Summary - Cap detect FPS to 2 to prevent recording segment backlog from ONNX inference bottleneck (~750ms/frame on ARM64 CPU) - Sync motion masks from live config (added second mask area) - Update driveway_entrance zone coordinates from live config - Add explicit alert labels `[person, car]` while keeping `required_zones: [driveway_entrance]` ## Context The "No frames have been received" error on the gablecam live view was caused by the detect stream falling behind — ONNX YOLO-NAS-s takes ~750ms per inference on ARM64 CPU, but the sub-stream sends 5 FPS. This caused recording segments to pile up and the ffmpeg watchdog to repeatedly kill/restart the process, creating gaps in the live view. ## Test plan - [ ] Sync ArgoCD `frigate` app to branch and verify pod restarts cleanly - [ ] Check `/api/stats` — `skipped_fps` should drop significantly, `process_fps` should be close to 2 - [ ] Verify live view at https://nvr.ops.eblu.me/#gablecam no longer shows "No frames" error - [ ] Verify detections and alerts still work in the driveway_entrance zone 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/204	2026-02-17 16:18:02 -08:00
Erich Blume	105a2c8c08	Update External Secrets Helm chart 1.3.1 → 2.0.0 (#203 ) ## Summary - Bump External Secrets Operator Helm chart from `helm-chart-1.3.1` to `helm-chart-2.0.0` (operator v1.3.2) - Updates both the operator app and CRDs app `targetRevision` - No Helm values changes needed — `installCRDs`, `resources`, `webhook`, `certController` keys are unchanged ## Breaking changes in chart 2.0.0 - Removed providers: Alibaba and Device42 (unmaintained) — does not affect our 1Password setup - Templating engine v1 deprecated — our ExternalSecrets don't set `engineVersion`, so they use the default (v2) - Webhook `failurePolicy` for SecretStore is now dynamic ## Deployment 1. Sync CRDs first: `argocd app set external-secrets-crds --revision update/external-secrets-helm-2.0.0 && argocd app sync external-secrets-crds` 2. Sync operator: `argocd app set external-secrets --revision update/external-secrets-helm-2.0.0 && argocd app sync external-secrets` 3. Verify: `kubectl --context=minikube-indri -n external-secrets get pods` 4. After merge, set both apps back to `--revision main` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/203	2026-02-17 10:43:21 -08:00
Erich Blume	5fbe70d1ba	Port ntfy to locally built container image (#202) (CI: all checks successful; Build Container / build (push) succeeded in 6m28s) ## Summary - Add `containers/ntfy/Dockerfile`: three-stage build (Node web UI, Go+CGO server, Alpine runtime) pinned to commit SHA `a03a37fe` (v2.17.0), sourced from forge mirror - Update ntfy deployment image from `binwiederhier/ntfy:v2.17.0` to `registry.ops.eblu.me/blumeops/ntfy:v1.0.0` - Note fish shell in CLAUDE.md ## Deployment After merge, release the container image: ```fish mise run container-tag-and-release ntfy v1.0.0 ``` Then sync: ```fish argocd app sync ntfy ``` ## Test plan - [x] `docker build` succeeds - [x] `dagger call build --src=. --container-name=ntfy` succeeds (exit 0, container ID printed) - [x] `ntfy --help` works in built container - [ ] Tag and release `ntfy-v1.0.0` after merge - [ ] Verify ntfy pod starts with new image - [ ] Verify health endpoint responds at `ntfy.ops.eblu.me/v1/health` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/202	2026-02-17 10:18:20 -08:00
Erich Blume	3e604d8fdc	Review ntfy: upgrade to v2.17.0 and add reference docs (#201 ) ## Summary - Upgrade ntfy from v2.11.0 to v2.17.0 (6 minor releases, no breaking changes) - Add reference doc for ntfy service - Add reference doc for frigate service (ntfy's sole producer via frigate-notify) - Update reference index and service-versions.yaml tracking ## Notable upstream changes (v2.12.0–v2.17.0) - v2.14.0: Declarative users/ACL config in files - v2.15.0: `require-login` flag for topic-level auth - v2.16.0: Dead man's switch (heartbeat) notifications, notification update/delete - v2.17.0: Priority templating, crash fixes (nil pointer panics) ## Deployment and Testing - [ ] ArgoCD sync ntfy after merge - [ ] Verify ntfy pod healthy with new image - [ ] Send a test notification via `curl -d "test" https://ntfy.ops.eblu.me/test` - [ ] Verify frigate-notify still delivers alerts to ntfy Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/201	2026-02-17 09:51:40 -08:00
Forgejo Actions	530460171a	Update docs release to v1.9.4 - Built changelog from towncrier fragments [skip ci]	2026-02-17 07:30:39 -08:00
Forgejo Actions	8a48171acf	Update docs release to v1.9.3 - Built changelog from towncrier fragments [skip ci]	2026-02-16 21:25:47 -08:00
Erich Blume	627e2b7894	Add UniFi admin link to homepage dashboard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 19:15:46 -08:00
Erich Blume	d35c26d2b0	Fix mosquitto image tag: use 2.0.22 instead of nonexistent 2.1.2 (#198 ) ## Summary - The `eclipse-mosquitto:2.1.2` tag doesn't exist on Docker Hub — the 2.1.x series only publishes `-alpine` variants - Corrects the pinned tag to `2.0.22`, the latest non-alpine version (matching what the old floating `:2` tag was resolving to) - Updates tracking file and changelog fragment accordingly ## Context The previous PR #197 pinned mosquitto from floating `:2` to `2.1.2`, but the new pod failed with `ErrImagePull` ("manifest unknown"). The old pod is still running on `:2`. ## Test plan - [ ] Verify `eclipse-mosquitto:2.0.22` pulls successfully - [ ] Verify mosquitto pod restarts and passes readiness/liveness probes - [ ] `mise run services-check` passes Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/198	2026-02-16 17:19:32 -08:00
Erich Blume	0aab73af40	Bump mosquitto to 2.1.2 and tailscale-operator to v1.94.2 (#197) ## Summary - Pin mosquitto from floating `:2` tag to `2.1.2` (latest upstream, released Feb 9 2026) - Bump tailscale k8s-operator and proxy images from `v1.94.1` to `v1.94.2` - Record 7 reviewed services in `service-versions.yaml` (first service review pass) ## Services reviewed (11 total) | Service | Deployed | Latest | Status | |---------|----------|--------|--------| | prometheus | v3.9.1 | v3.9.1 | Current | | loki | 3.6.5 | 3.6.5 | Current | | kube-state-metrics | v2.18.0 | v2.18.0 | Current | | mosquitto | :2 (floating) | 2.1.2 | Pinned in this PR | | frigate | 0.16.4 | 0.16.4 | Current | | alloy-k8s | v1.13.1 | v1.13.1 | Current | | tailscale-operator | v1.94.1 | v1.94.2 | Bumped in this PR | | ntfy | v2.11.0 | v2.17.0 | Stale (future PR) | | frigate-notify | v0.3.5 | v0.5.4 | Stale (future PR) | | homepage | chart 2.1.0 | app v1.10.1 | Stale (future PR) | | grafana | chart 8.8.2 | chart 10.5.15 | Stale (future PR) | ## Deployment and Testing - [ ] `argocd app sync apps` - [ ] `argocd app set mosquitto --revision service-review/mosquitto-tailscale-operator && argocd app sync mosquitto` - [ ] `argocd app set tailscale-operator --revision service-review/mosquitto-tailscale-operator && argocd app sync tailscale-operator` - [ ] Verify mosquitto pod restarts with pinned image - [ ] Verify tailscale operator and proxy pods update - [ ] `mise run services-check` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/197	2026-02-16 17:14:38 -08:00
Forgejo Actions	994bed0693	Update docs release to v1.9.2 - Built changelog from towncrier fragments [skip ci]	2026-02-16 15:51:12 -08:00
Erich Blume	74294094e3	Fix navidrome custom container image v1.0.2 (#194 ) ## Summary - Switch navidrome deployment from upstream `deluan/navidrome:0.60.3` back to custom image `registry.ops.eblu.me/blumeops/navidrome:v1.0.2` - The v1.0.1 image was tagged before the `USER 65534` removal commit, so it still ran as a non-root user that couldn't write to the SQLite data directory - v1.0.2 is built from current main which includes both the `zlib-dev` build fix and the non-root user removal ## Deployment and Testing - [ ] Wait for CI to build `navidrome:v1.0.2` image - [ ] Sync via ArgoCD and verify pod starts without CrashLoopBackOff - [ ] Verify navidrome UI accessible at https://navidrome.ops.eblu.me Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/194	2026-02-16 08:24:33 -08:00
Erich Blume	7ffbd12ac8	Fix Frigate parked car re-detection and enable writable config (#193) (CI: all checks successful; Build Container / build (push) succeeded in 12s) ## Summary - Remove car-specific `max_frames: 150` which was causing a forget-and-re-detect loop on parked cars (every ~30 seconds at 5fps) - Set `stationary.interval: 0` so Frigate never re-runs detection on stationary objects - Replace read-only configmap subPath mount with initContainer + emptyDir, so Frigate UI changes (zones, masks) persist at runtime ## Context Frigate was spamming notifications because `max_frames` for cars caused it to "forget" a parked car after 150 frames, then immediately re-detect it as a brand new object. The fix follows [Frigate's official parked cars guide](https://docs.frigate.video/guides/parked_cars/). The writable config change also unblocks using `required_zones` for car alerts: zones can now be drawn in the Frigate UI and will survive until pod reschedule (at which point they should be baked into the configmap via IaC). ## Test plan - [ ] Sync frigate app via ArgoCD and verify pod starts with initContainer - [ ] Confirm parked cars no longer trigger repeated alerts - [ ] Draw a zone/mask in Frigate UI, save, verify it persists after Frigate restart - [ ] Set up `driveway_entrance` required zone for car alerts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/193	2026-02-15 17:48:14 -08:00
Erich Blume	6c41338b36	Revert navidrome to upstream image pending container fix Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 08:22:04 -08:00
Erich Blume	accbb80683	Update navidrome image to v1.0.1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 08:18:17 -08:00
Erich Blume	996441876d	Document container build pattern and port navidrome (#192) (CI: some checks failed; Build Container / build (push) failed after 4m28s) ## Summary - Add how-to guide (`docs/how-to/build-container-image.md`) covering the full container build workflow: directory layout, Dagger local builds, mise release task, and common patterns with links to existing containers - Port navidrome from upstream `deluan/navidrome:0.60.3` to a custom three-stage build (`containers/navidrome/Dockerfile`) using Node + Go + Alpine - Update navidrome deployment to use `registry.ops.eblu.me/blumeops/navidrome:v1.0.0` ## Deployment and Testing - [x] `dagger call build --src=. --container-name=navidrome` builds successfully - [ ] After merge: `mise run container-tag-and-release navidrome v1.0.0` - [ ] After image published: `argocd app sync navidrome` and verify pod starts Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/192	2026-02-15 08:05:11 -08:00
Forgejo Actions	26c1ff5ce6	Update docs release to v1.9.1 - Built changelog from towncrier fragments [skip ci]	2026-02-15 07:43:00 -08:00
Erich Blume	22f418d0dc	Doc review: connect-to-postgres, create-release-artifact-workflow, deploy-k8s-service (#191 ) ## Summary Review session covering 3 docs, plus a codebase-wide cleanup: ### Docs reviewed - connect-to-postgres — verified end-to-end (psql connection tested), stamped - create-release-artifact-workflow — clarified that `build-blumeops.yaml` is only a version bump example (not a packages API example) - deploy-k8s-service — fixed stale repoURL (`indri:2200` → `forge.ops.eblu.me:2222`), wrong Caddy config keys (`upstream` → `backend`, added missing `host`), updated Homepage group to "Services", added Tailscale tag documentation ### Codebase cleanup - Migrated all remaining `op item get --fields` calls to `op read` URI syntax across 7 files (docs, READMEs, YAML comments) - Simplified the `op read` vs `op item get` guidance in CLAUDE.md ## Side findings (not addressed) - New `immich-pg` CNPG cluster not yet documented in the postgresql reference card ## Test plan - [x] `psql` connection to `pg.ops.eblu.me` verified - [x] All pre-commit hooks pass - [x] `docs-check-links`, `docs-check-index`, `docs-check-frontmatter` pass Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/191	2026-02-15 07:42:01 -08:00
Forgejo Actions	b2b5879e3c	Update docs release to v1.9.0 - Built changelog from towncrier fragments [skip ci]	2026-02-14 21:32:27 -08:00
Erich Blume	04c7f3c45a	Deploy Frigate NVR stack with Mosquitto, Ntfy, and frigate-notify (#190 ) ## Summary Deploy a cloud-free NVR stack for the GableCam (ReoLink Elite Floodlight at 192.168.1.159): - Mosquitto — shared MQTT broker in `mqtt` namespace (cluster-internal, no auth) - Ntfy — self-hosted push notifications in `ntfy` namespace, exposed at `ntfy.tail8d86e.ts.net` / `ntfy.ops.eblu.me` - Frigate — NVR with GableCam via HTTP-FLV, ONNX CPU detection, NFS recordings on sifaka, exposed at `nvr.tail8d86e.ts.net` / `nvr.ops.eblu.me` - frigate-notify — bridges Frigate detection events (person, car, dog, cat) to Ntfy alerts via MQTT Also includes: - Prometheus scrape target for Frigate metrics - Grafana dashboard for Frigate (status, inference speed, FPS, CPU/memory, storage) - Caddy reverse proxy entries for `nvr.ops.eblu.me` and `ntfy.ops.eblu.me` ## Prerequisites - [ ] Create NFS share `frigate` on sifaka (`/volume1/frigate`, RW for indri) - [ ] Create 1Password item "Reolink Floodlight Camera" in `blumeops` vault with `username` and `password` fields ## Deployment (after merge) ```bash argocd app sync apps argocd app sync mosquitto argocd app sync ntfy argocd app sync frigate argocd app sync grafana-config argocd app sync prometheus mise run provision-indri -- --tags caddy mise run services-check ``` ## Verification - [ ] Mosquitto pod running, accepting connections on 1883 - [ ] Ntfy web UI accessible at `ntfy.ops.eblu.me` - [ ] Frigate web UI at `nvr.ops.eblu.me` showing GableCam live feed - [ ] Object detection working (ONNX, person/car/dog/cat) - [ ] Recordings appearing in NFS share on sifaka - [ ] frigate-notify sending detection alerts to Ntfy - [ ] Prometheus scraping Frigate metrics - [ ] Grafana dashboard showing Frigate data Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/190	2026-02-14 21:27:44 -08:00
Erich Blume	b77ae19f20	Fix 1Password Connect credentials for chart 2.3.0 Chart 2.3.0 mounts credentials as a file with standard k8s base64 encoding. The old double-encoding workaround (credentials-base64 in stringData) now produces invalid JSON. Use raw JSON (credentials-file) instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 17:30:45 -08:00
Erich Blume	8f4708e26f	Fix navidrome image tag: remove v prefix (0.60.3 not v0.60.3) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 17:23:12 -08:00
Erich Blume	b3747f6c95	Tier 1 version bumps (#186) (CI: all checks successful; Build Container / build (push) succeeded in 8s) ## Summary Audit and upgrade of all deployed images, helm charts, and custom container Dockerfiles to latest stable versions. This PR covers Tier 1 (low-risk minor/patch bumps only). ### Upstream images | Image | Old | New | |-------|-----|-----| | kube-state-metrics | v2.13.0 | v2.18.0 | | prometheus | v3.2.1 | v3.9.1 | | loki | 3.3.2 | 3.6.5 | | alloy | v1.5.1 | v1.13.1 | | tailscale (proxy + operator) | v1.92.5 | v1.94.1 | | navidrome | :latest | v0.60.3 (pinned) | ### Helm charts | Chart | Old | New | |-------|-----|-----| | CloudNativePG | v0.27.0 | v0.27.1 | | 1Password Connect | 2.2.1 | 2.3.0 | ### Custom containers (Dockerfiles updated, images not yet tagged) | Container | Changes | New tag | |-----------|---------|---------| | miniflux | 2.2.16→2.2.17 (security), alpine 3.22 | v1.1.0 | | kubectl | v1.34.1→v1.34.4, alpine 3.22 | v1.1.0 | | kiwix-serve | alpine 3.22 | v1.1.0 | | nettest | alpine 3.22 | v0.14.0 | | transmission | alpine 3.22, pkg 4.0.6-r4 | v1.1.0 | All custom containers verified with local `dagger call build`. ### Deferred to Tier 2 (separate PRs) - Forgejo runner 6→12 (major version scheme change) - Docker DinD 27→29 - Grafana chart 8→11 (repo migration) - External Secrets 1→2 (breaking changes) - Python 3.12→3.13, Elixir 1.18→1.19, Node 22→24 - Transmission 4.0.6→4.1.0 (not in Alpine yet) ## Deployment After merge: 1. Tag custom containers: `mise run container-tag-and-release <name> <version>` for each 2. Wait for CI builds to complete 3. `argocd app sync apps` then sync individual apps, or let ArgoCD auto-detect Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/186	2026-02-13 17:16:37 -08:00
Erich Blume	d5c00192d5	Configure DinD to use Zot as pull-through registry mirror (#183 ) ## Summary - Add `daemon.json` with `registry-mirrors` to the forgejo-runner ConfigMap, pointing DinD at `http://host.minikube.internal:5050` - Mount `daemon.json` into the DinD sidecar at `/etc/docker/daemon.json` via `subPath` - Docker Hub pulls during Dagger CI builds will now route through Zot's pull-through cache, reducing bandwidth and avoiding rate limits ## Deployment and Testing - [ ] `argocd app sync forgejo-runner` - [ ] Exec into DinD container: `docker info` should show the registry mirror - [ ] Trigger a workflow build and check Zot logs for cache hits Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/183	2026-02-13 12:36:03 -08:00
Erich Blume	ba9b251759	Update forgejo-runner image to v3.2.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 12:16:52 -08:00
Erich Blume	d0c18043b7	Revert forgejo-runner image to v3.1.0 v3.2.0 build failed (GitHub download timeout), rolling back to working image while it rebuilds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 12:07:51 -08:00
Erich Blume	fdd3f6483a	Update forgejo-runner image to v3.2.0 (CI: all checks successful; Build Container / build (push) succeeded in 7m31s) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 11:08:57 -08:00
Forgejo Actions	02b1397f1a	Update docs release to v1.8.2 - Built changelog from towncrier fragments [skip ci]	2026-02-13 10:36:04 -08:00
Erich Blume	0098ac37e0	Move non-secret runner env vars to deployment spec (#181 ) ## Summary - Move FORGEJO_URL, RUNNER_NAME, and RUNNER_LABELS from ExternalSecret template to deployment env vars - ExternalSecret now only contains the actual secret (RUNNER_TOKEN) - Image version changes in RUNNER_LABELS now trigger automatic pod rollouts ## Deployment 1. Merge this PR 2. `argocd app sync forgejo-runner` — the deployment spec change will auto-roll the pod No manual restart needed — that's the whole point :) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/181	2026-02-13 10:29:23 -08:00
Erich Blume	52bbf88aa6	Update forgejo-runner image to v3.1.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 10:21:43 -08:00
Erich Blume	4942dee182	Update homepage layout for new Content/Misc groups Replace old Apps/Observability/Infrastructure layout entries with Content and Misc to match the recategorized ingress annotations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 09:16:40 -08:00
Erich Blume	ca6a845604	Move ArgoCD to Misc homepage group and rename ingress file ArgoCD's tailscale ingress was missed in the recategorization (filed as service-tailscale.yaml instead of ingress-tailscale.yaml). Fix the group annotation and rename the file to match the convention used by all other services. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 09:13:32 -08:00
Erich Blume	48ce5b4120	Recategorize homepage into Content and Misc groups (#179 ) ## Summary - Replace the three homepage groups (Apps, Observability, Infrastructure) with two cleaner groups - Content: Immich, Kiwix, Miniflux, DJ, Grafana - Misc: CV, TeslaMate, Transmission, Docs, Prometheus, PyPI ## Deployment and Testing - [ ] Sync affected ingresses via ArgoCD (all 11 services) - [ ] Verify homepage shows the two new groups correctly Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/179	2026-02-13 09:09:22 -08:00
Forgejo Actions	e21277ae83	Update docs release to v1.8.0 - Built changelog from towncrier fragments [skip ci]	2026-02-12 19:20:27 -08:00
Erich Blume	9c789a1868	Fix cache hit rate on APM and Fly.io dashboards (#177 ) ## Summary - Remove `match_all = true` from `flyio_nginx_cache_requests_total` in Alloy so the metric only counts requests that go through the proxy cache (excludes health checks with empty `cache_status`) - Change dashboard queries from `rate(...[5m])` to `increase(...[$__range])`, which aggregates over the full dashboard time window instead of a 5-minute sliding window, giving meaningful ratios for low-traffic static sites - Add null/NaN value mapping to show "No traffic" in neutral color instead of blank/red ## Root cause Health check requests from Fly.io hit the default nginx server block (no `proxy_cache`), producing entries with empty `upstream_cache_status`. With `match_all = true`, these were counted in the cache metric, diluting the Fly.io dashboard ratio. For APM dashboards, `rate()[5m]` on low-traffic sites with 24h cache validity almost always returns either all-HITs (100%) or no data (blank → red background). ## Deployment - Fly.io proxy redeploy needed for Alloy config change - ArgoCD sync for dashboard ConfigMap changes ## Test plan - [ ] Redeploy Fly.io proxy - [ ] Sync grafana-config in ArgoCD - [ ] Verify CV APM cache hit ratio shows a real percentage (not 100%) - [ ] Verify Docs APM shows "No traffic" in neutral color when idle, real ratio when visited - [ ] Verify Fly.io proxy dashboard cache ratio excludes health checks Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/177	2026-02-12 18:40:48 -08:00
Erich Blume	9717863f65	Update CV release to v1.0.3, add X-Clacks-Overhead header (#176 ) ## Summary - Update CV release URL from v1.0.2 to v1.0.3 - Add `X-Clacks-Overhead: GNU Terry Pratchett` header to both `docs.eblu.me` and `cv.eblu.me` server blocks in the Fly.io proxy nginx config ## Deployment and Testing - [ ] Sync CV app: `argocd app sync cv` - [ ] Verify CV is serving v1.0.3 content - [ ] Deploy fly proxy (workflow or `mise run fly-deploy`) - [ ] Verify header: `curl -sI https://docs.eblu.me | grep -i clacks` - [ ] Verify header: `curl -sI https://cv.eblu.me | grep -i clacks` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/176	2026-02-12 17:08:22 -08:00
Erich Blume	ed5c9c9b48	Update CV release to v1.0.2 (#175 ) ## Summary - Update `CV_RELEASE_URL` in cv deployment from v1.0.1 to v1.0.2 ## Deployment and Testing - [ ] `argocd app sync cv` after merge - [ ] Verify cv.eblu.me serves updated content Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/175	2026-02-12 16:18:55 -08:00
Forgejo Actions	70d8881959	Update docs release to v1.7.1 - Built changelog from towncrier fragments [skip ci]	2026-02-12 14:13:12 -08:00
Erich Blume	7dc03c0af1	Add CV to services-check, update homepage link (#174 ) ## Summary - Add CV to services-check (tailnet endpoint + public cv.eblu.me) - Update CV homepage annotation to point to cv.eblu.me instead of cv.ops.eblu.me ## Deployment and Testing - [ ] `argocd app sync cv` (homepage link change) - [ ] `mise run services-check` passes Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/174	2026-02-12 14:10:03 -08:00
Erich Blume	df372fccb6	Expose CV publicly at cv.eblu.me (#173 ) ## Summary - Add nginx server block for `cv.eblu.me` (static site, same pattern as docs) - Add DNS CNAME record in Pulumi (`cv.eblu.me` → `blumeops-proxy.fly.dev`) - Add `cv.eblu.me` cert to `fly-setup` mise task - Tag CV Tailscale ingress with `tag:flyio-target` for ACL access - Remove `/_error` test endpoint from docs proxy ## Deployment and Testing - [ ] `argocd app set cv --revision cv/public-cv-eblu-me && argocd app sync cv` - [ ] `fly certs add cv.eblu.me -a blumeops-proxy` - [ ] `mise run fly-deploy` - [ ] Verify proxy: `curl -I -H "Host: cv.eblu.me" https://blumeops-proxy.fly.dev/` - [ ] `mise run dns-preview` then `mise run dns-up` - [ ] Verify live: `curl -I https://cv.eblu.me` - [ ] Merge, then `argocd app set cv --revision main && argocd app sync cv` Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/173	2026-02-12 14:05:00 -08:00
Erich Blume	a68542a602	Update CV release to v1.0.1 (#172 ) ## Summary - Update `CV_RELEASE_URL` from v0.1.0 to v1.0.1 in the CV deployment manifest ## Deployment and Testing - [ ] `argocd app sync cv` after merge - [ ] Verify cv.ops.eblu.me serves updated resume Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/172	2026-02-12 13:38:05 -08:00
Forgejo Actions	200be39492	Update docs release to v1.7.0 - Built changelog from towncrier fragments [skip ci]	2026-02-12 11:46:38 -08:00
Erich Blume	01e19023ee	Add CV/resume web app at cv.ops.eblu.me (#169 ) ## Summary - nginx container (`containers/cv/`) downloads and serves a content tarball at startup (same pattern as quartz) - ArgoCD app + k8s manifests (deployment, service, Tailscale ingress) - Caddy route for `cv.ops.eblu.me` - Deploy workflow: resolves "latest" or specific version from Forgejo packages, updates deployment, syncs ArgoCD - Content is built and released from the separate [cv repo](https://forge.ops.eblu.me/eblume/cv) ## Deployment steps (after merge) 1. `mise run container-tag-and-release cv v1.0.0` 2. Run "Release CV" workflow in cv repo (SPECIFIC_VERSION `v0.1.0`) 3. Run "Deploy CV" workflow in blumeops (default: latest) 4. `mise run provision-indri -- --tags caddy` 5. Verify at `https://cv.ops.eblu.me/` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/169	2026-02-12 11:09:41 -08:00

172 commits