Add multi-cluster observability with ringtail metrics and dashboards #270

Merged

eblume merged 4 commits from feature/ringtail-metrics-dashboards into main

2026-02-25 22:01:01 -08:00

Author	SHA1	Message	Date
Erich Blume	55d1760c28	Revert "Fix numeric log levels showing as errors in Grafana". This reverts commit `7a4719ed0c`.	2026-02-25 22:00:20 -08:00
Erich Blume	7a4719ed0c	Fix numeric log levels showing as errors in Grafana. 1Password Connect uses numeric log levels (1=error, 2=warn, 3=info, 4=debug), which Grafana's logs panel doesn't recognize, so it renders all lines with error styling. Add `stage.replace` rules in both Alloy configs (indri + ringtail) to map numeric levels to standard strings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 21:57:46 -08:00
Erich Blume	0e2ad47433	Ringtail dashboard: add host metrics, rename to general system health. Add `prometheus.exporter.unix` to ringtail Alloy with host /proc, /sys, and rootfs mounts so `node_*` metrics flow from the NixOS host. Rewrite the ringtail dashboard from k8s-only to full system health: uptime, CPU usage by mode, memory usage, filesystem table, network traffic, GPU overview, and k8s summary, matching the macOS dashboard pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 21:44:48 -08:00
Erich Blume	cd832cceee	Add multi-cluster observability with ringtail metrics and dashboards. Add cluster labels (indri/ringtail) to all Prometheus scrape jobs, Alloy k8s `remote_write` and pod logs, and Alloy host metrics/logs. Deploy kube-state-metrics and Alloy on ringtail's k3s cluster to collect pod metrics and logs, remote-writing to indri's Prometheus and Loki. Replace single-cluster minikube and services dashboards with a multi-cluster Kubernetes dashboard (cluster + namespace variables) and a dedicated Ringtail dashboard with GPU monitoring. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 21:21:18 -08:00
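
For reference, the level mapping described in commit `7a4719ed0c` (later reverted by `55d1760c28`) can be sketched outside Alloy. This is a minimal Python illustration of the rewrite, not the Alloy rules from the PR. It assumes each log line is JSON with the level as a bare number in a field named `level`; that field name, the regex, and the `normalize_level` helper are hypothetical.

```python
import re

# Mapping as stated in the commit message for 1Password Connect.
NUMERIC_LEVELS = {"1": "error", "2": "warn", "3": "info", "4": "debug"}

# Assumption: the level is a bare JSON number in a field named "level".
# The lookahead keeps multi-digit values such as 12 from matching.
LEVEL_PATTERN = re.compile(r'("level"\s*:\s*)([1-4])(?=\s*[,}])')


def normalize_level(line: str) -> str:
    """Replace a numeric level with its string name; leave other lines as-is."""
    return LEVEL_PATTERN.sub(
        lambda m: f'{m.group(1)}"{NUMERIC_LEVELS[m.group(2)]}"', line
    )


if __name__ == "__main__":
    print(normalize_level('{"level":3,"msg":"token refreshed"}'))
```

Lines without a matching numeric `level` field pass through unchanged, so the same function could be applied to mixed log streams under the stated assumption.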