Add multi-cluster observability with ringtail metrics and dashboards #270

Merged
eblume merged 4 commits from feature/ringtail-metrics-dashboards into main 2026-02-25 22:01:01 -08:00

4 commits

Author SHA1 Message Date
55d1760c28 Revert "Fix numeric log levels showing as errors in Grafana"
This reverts commit 7a4719ed0c.
2026-02-25 22:00:20 -08:00
7a4719ed0c Fix numeric log levels showing as errors in Grafana
1Password Connect uses numeric log levels (1=error, 2=warn, 3=info,
4=debug) which Grafana's logs panel doesn't recognize, rendering all
lines with error styling. Add stage.replace rules in both Alloy
configs (indri + ringtail) to map numeric levels to standard strings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:57:46 -08:00
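
For illustration only, the numeric-to-string mapping described in the commit message above could be sketched in Python as follows. This is not the Alloy stage.replace configuration from the commit, which is not shown on this page. The JSON field name "level" and the helper name normalize_level are assumptions made for the example; only the 1=error, 2=warn, 3=info, 4=debug mapping comes from the commit message.

```python
import re

# Mapping stated in the commit message: 1=error, 2=warn, 3=info, 4=debug.
NUMERIC_LEVELS = {"1": "error", "2": "warn", "3": "info", "4": "debug"}

# Assumption: the level appears as a JSON field such as "level":3.
# The actual 1Password Connect log format may use a different field name.
LEVEL_FIELD = re.compile(r'("level"\s*:\s*)([1-4])\b')


def normalize_level(line: str) -> str:
    """Rewrite a numeric level field to its string name.

    Lines without a matching numeric level field are returned unchanged.
    """
    return LEVEL_FIELD.sub(
        lambda m: f'{m.group(1)}"{NUMERIC_LEVELS[m.group(2)]}"',
        line,
    )


if __name__ == "__main__":
    print(normalize_level('{"level":3,"log_message":"ok"}'))
```

A log viewer that only recognizes string levels would then see "info" rather than a bare 3. Lines whose level is already a string, or a number outside 1 to 4, pass through untouched.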
0e2ad47433 Ringtail dashboard: add host metrics, rename to general system health
Add prometheus.exporter.unix to ringtail Alloy with host /proc, /sys,
and rootfs mounts so node_* metrics flow from the NixOS host. Rewrite
the ringtail dashboard from k8s-only to full system health: uptime,
CPU usage by mode, memory usage, filesystem table, network traffic,
GPU overview, and k8s summary, matching the macOS dashboard pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:44:48 -08:00
cd832cceee Add multi-cluster observability with ringtail metrics and dashboards
Add cluster labels (indri/ringtail) to all Prometheus scrape jobs,
Alloy k8s remote_write and pod logs, and Alloy host metrics/logs.
Deploy kube-state-metrics and Alloy on ringtail's k3s cluster to
collect pod metrics and logs, remote-writing to indri's Prometheus
and Loki. Replace single-cluster minikube and services dashboards
with a multi-cluster Kubernetes dashboard (cluster + namespace
variables) and a dedicated Ringtail dashboard with GPU monitoring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:21:18 -08:00