Fix Frigate Prometheus metrics & rebuild Grafana dashboard (#252)

## Summary

- **Prometheus scrape target:** Changed from `frigate.frigate.svc.cluster.local:5000` (broken after ringtail migration) to `nvr.ops.eblu.me` via HTTPS through Caddy on indri
- **Grafana dashboard:** Rebuilt for Frigate 0.17 metrics — 12 panels total:
  - Row 1 (stats): Uptime, Inference Speed, Camera FPS, Detection FPS, GPU Usage, GPU Temp
  - Row 2 (timeseries): CPU Usage, Memory Usage
  - Row 3 (timeseries): Camera FPS + Skipped FPS, GPU Usage + Memory over time
  - Row 4 (timeseries): Storage Usage, Detection Events (rate by camera/label)

## Deployment and Testing

1. Point the prometheus app at the branch and sync it:
   ```
   argocd app set prometheus --revision fix/frigate-metrics-dashboard && argocd app sync prometheus
   ```
2. Check `prometheus.ops.eblu.me/targets` — frigate job should show UP
3. Sync grafana-config:
   ```
   argocd app sync grafana-config
   ```
4. Check `grafana.ops.eblu.me` — Frigate NVR dashboard should show live data
5. After merge: reset both apps to `--revision main` and sync
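
Step 5 can be sketched as a small shell helper. This is only a sketch, assuming "both apps" means `prometheus` and `grafana-config`. The function name `reset_to_main` and the `ARGOCD` override variable are hypothetical, added here so the commands can be previewed before they are run. The `argocd` subcommands are the same ones used in steps 1 and 3.

```shell
# Hypothetical helper: point each named Argo CD app back at main, then sync it.
# ARGOCD is an illustrative override (not an argocd feature); set ARGOCD=echo
# to print the arguments instead of running argocd.
reset_to_main() {
  for app in "$@"; do
    "${ARGOCD:-argocd}" app set "$app" --revision main &&
      "${ARGOCD:-argocd}" app sync "$app" || return 1
  done
}
```

Calling `ARGOCD=echo reset_to_main prometheus grafana-config` prints the arguments that would be passed to `argocd`. Dropping the `ARGOCD=echo` prefix runs them for real and stops at the first failure.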

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/252
Erich Blume 2026-02-22 18:14:17 -08:00
commit 2c6c6a244a
3 changed files with 258 additions and 106 deletions

@@ -0,0 +1 @@
Fix Frigate Prometheus scrape target to route via Caddy (nvr.ops.eblu.me) after migration to ringtail, and rebuild Grafana dashboard with updated Frigate 0.17 metrics (GPU usage, temperature, skipped FPS, detection events).