Observability cleanup and k8s service monitoring (#43)

Merged
eblume merged 9 commits from feature/observability-cleanup into main 2026-01-22 13:51:02 -08:00

9 commits

SHA1 Message Date
5f9ed589d4 Fix macOS dashboard instance variable to only show macOS hosts
Use node_memory_wired_bytes (macOS-specific metric) instead of
node_uname_info to populate the instance dropdown. This prevents
Linux hosts like sifaka from appearing in the macOS dashboard.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 13:44:43 -08:00
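
The effect of switching the variable's source metric in 5f9ed589d4 can be sketched in Python. This is only an illustration under stated assumptions, not the dashboard's actual query: the sample data, the host name `macbook`, the port `9100`, and the helper `instances_with_metric` are hypothetical. Only the two metric names and the host `sifaka` come from the commit message.

```python
# Hypothetical scrape samples as (metric name, labels) pairs. Only the metric
# names and the host "sifaka" come from the commit message; the rest is made up.
SAMPLES = [
    ("node_uname_info", {"instance": "macbook:9100"}),
    ("node_uname_info", {"instance": "sifaka:9100"}),
    ("node_memory_wired_bytes", {"instance": "macbook:9100"}),
]


def instances_with_metric(samples, metric):
    """Return the sorted, de-duplicated instance labels that expose `metric`."""
    return sorted(
        {labels["instance"] for name, labels in samples if name == metric}
    )


# node_uname_info is exported by every host, so the Linux host leaks into the list.
all_hosts = instances_with_metric(SAMPLES, "node_uname_info")

# In this sketch only the macOS host exports node_memory_wired_bytes,
# so the dropdown built from it contains macOS hosts only.
macos_hosts = instances_with_metric(SAMPLES, "node_memory_wired_bytes")
```

Here `all_hosts` contains both hosts, while `macos_hosts` contains only the hypothetical macOS instance.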
369a1aa881 Remove transmission metrics exporter (incompatible with Transmission 4)
The metalmatze/transmission-exporter is unmaintained and has JSON parsing
issues with Transmission 4's API changes. Removing:
- Exporter sidecar from transmission deployment
- Transmission dashboard from Grafana
- Prometheus scrape config for transmission
- Metrics port from transmission service

TODO: Write custom transmission exporter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 13:36:15 -08:00
9a69cbebec Enhance minikube dashboard with namespace filter and resource metrics
- Add namespace dropdown variable for filtering all panels
- Add kube-state-metrics for k8s resource tracking
- Show pods, deployments, statefulsets counts
- Show memory/CPU requests by namespace
- Show namespace resource summary table
- Filter pod logs by selected namespace
2026-01-22 13:26:39 -08:00
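
As a rough illustration of the "requests by namespace" panels and the namespace filter in 9a69cbebec, the aggregation might look like the Python sketch below. It assumes samples shaped like kube-state-metrics' `kube_pod_container_resource_requests` series (labels `namespace`, `pod`, `resource`). The namespaces, pod names, values, and the helper `requests_by_namespace` are hypothetical and not taken from the repository.

```python
from collections import defaultdict

MIB = 1024 ** 2

# Hypothetical samples as (labels, value) pairs, loosely modelled on
# kube_pod_container_resource_requests. All names and values are made up.
SAMPLES = [
    ({"namespace": "monitoring", "pod": "alloy-abc", "resource": "memory"}, 128 * MIB),
    ({"namespace": "monitoring", "pod": "ksm-def", "resource": "memory"}, 64 * MIB),
    ({"namespace": "media", "pod": "transmission-0", "resource": "memory"}, 256 * MIB),
    ({"namespace": "media", "pod": "transmission-0", "resource": "cpu"}, 0.5),
]


def requests_by_namespace(samples, resource, namespace=None):
    """Sum requests for one resource per namespace.

    If `namespace` is given, only that namespace is included, which mimics
    selecting a single value in a dashboard dropdown.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        if labels["resource"] != resource:
            continue
        if namespace is not None and labels["namespace"] != namespace:
            continue
        totals[labels["namespace"]] += value
    return dict(totals)


memory_all = requests_by_namespace(SAMPLES, "memory")
memory_media = requests_by_namespace(SAMPLES, "memory", namespace="media")
```

Passing `namespace=None` corresponds to an "All" selection; passing a name restricts every total to that namespace.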
49cce87ff8 Add kube-state-metrics for k8s resource monitoring
2026-01-22 13:21:41 -08:00
0ec16dd72a Fix dashboard labels to match blackbox probe job names
2026-01-22 13:08:59 -08:00
2dc43c5ae1 Fix alloy-k8s: add pods/log permission for log collection
2026-01-22 13:07:11 -08:00
e8ec538f28 Fix alloy-k8s: add fsGroup for data directory permissions
2026-01-22 13:04:56 -08:00
b5bf773d95 Add transmission exporter, update dashboards for k8s services
- Add transmission-exporter sidecar for full metrics (speed, torrents, etc.)
- Add Prometheus scrape config for transmission metrics
- Update devpi dashboard to use blackbox probe metrics (status, response time, uptime)
- Restore transmission dashboard (will use exporter metrics)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 13:01:58 -08:00
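
The three values that b5bf773d95 mentions for the devpi dashboard (status, response time, uptime) can be derived from blackbox probe results roughly as sketched below. This assumes each probe yields a `probe_success` value (1 or 0) and a `probe_duration_seconds` value, as the Prometheus blackbox exporter reports. The helper `summarize_probe` and the sample data are hypothetical, and the actual dashboard queries may differ.

```python
def summarize_probe(samples):
    """Summarize probe results given in chronological order.

    `samples` is a list of (probe_success, probe_duration_seconds) tuples.
    Returns the latest status, the fraction of successful probes, and the
    mean duration of successful probes (None if there were none).
    """
    if not samples:
        raise ValueError("at least one probe sample is required")
    successes = [success for success, _ in samples]
    ok_durations = [duration for success, duration in samples if success == 1]
    return {
        "status": "up" if successes[-1] == 1 else "down",
        "uptime": sum(successes) / len(successes),
        "avg_response_seconds": (
            sum(ok_durations) / len(ok_durations) if ok_durations else None
        ),
    }


# Hypothetical probe history: three successes and one failed probe.
HISTORY = [(1, 0.25), (1, 0.5), (0, 5.0), (1, 0.75)]
summary = summarize_probe(HISTORY)
```

Failed probes are excluded from the response-time average here so that timeouts do not skew it; that is a design choice of this sketch, not something the commit states.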
d1007d29f1 Add k8s observability: Alloy for pod logs, service health probes
- Remove stale /opt/homebrew/var/loki from borgmatic backup (Loki migrated to k8s)
- Add Alloy k8s DaemonSet for automatic pod log collection to Loki
- Add blackbox probes for miniflux, kiwix, transmission, devpi, argocd
- Replace stale devpi/transmission dashboards with unified services health dashboard
- The new Alloy k8s deployment auto-discovers all pods including new ones

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 12:51:54 -08:00