Add Pyroscope server (StatefulSet on ringtail k3s) and Alloy profiling DaemonSet with pyroscope.ebpf collection. Grafana datasource with traces-to-profiles cross-linking. Docs updated across observability reference card, Alloy, Grafana, apps registry, and README. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| title | modified | tags |
|---|---|---|
| Observability | 2026-03-26 | |
Observability
The four(+) pillars of observability (metrics, logs, traces, and profiles) are collected and visualized via the Grafana ecosystem.
Components
| Pillar | Backend | Collector | Cluster |
|---|---|---|---|
| Metrics | prometheus | alloy | indri |
| Logs | loki | alloy | indri |
| Traces | tempo | alloy (Beyla eBPF) | indri (backend), ringtail (collection) |
| Profiles | pyroscope | alloy (pyroscope.ebpf) | ringtail |
All four are visualized in grafana with cross-signal linking (traces → logs, traces → profiles, traces → metrics).
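As a rough illustration of the cross-signal linking above, the sketch below builds the `jsonData` block a Tempo datasource could carry so that spans link out to logs, metrics, and profiles. The key names (`tracesToLogsV2`, `tracesToMetrics`, `tracesToProfiles`, `datasourceUid`, `profileTypeId`) follow Grafana's Tempo datasource provisioning format as I understand it; the datasource UIDs are hypothetical placeholders and the profile type is an assumed CPU profile, so check both against the real provisioning files.

```python
import json

# Hypothetical datasource UIDs; replace with the UIDs provisioned in Grafana.
LOKI_UID = "loki"
PROMETHEUS_UID = "prometheus"
PYROSCOPE_UID = "pyroscope"


def tempo_json_data(loki_uid, prometheus_uid, pyroscope_uid):
    """Build a Tempo datasource jsonData dict linking spans to the other signals."""
    return {
        # traces -> logs
        "tracesToLogsV2": {"datasourceUid": loki_uid},
        # traces -> metrics
        "tracesToMetrics": {"datasourceUid": prometheus_uid},
        # traces -> profiles (assumed CPU profile type)
        "tracesToProfiles": {
            "datasourceUid": pyroscope_uid,
            "profileTypeId": "process_cpu:cpu:nanoseconds:cpu:nanoseconds",
        },
    }


json_data = tempo_json_data(LOKI_UID, PROMETHEUS_UID, PYROSCOPE_UID)
print(json.dumps(json_data, indent=2))
```

Keeping the three links in one function makes it easy to see that every cross-signal jump starts from a trace, which matches the linking described above.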
Future: Frontend Monitoring (RUM)
Grafana Faro is a Real User Monitoring SDK that captures page loads, web vitals, errors, and network timings from the browser, feeding into Loki (logs) and Tempo (traces) via Alloy's faro.receiver component. This would add an "outside-in" view of service health from the user's perspective.
Faro is not currently deployed. RUM captures browsing behavior from visitors to public services, which creates a data retention liability; the captured data would need careful sanitization before deployment.
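To make the sanitization concern concrete, here is a minimal sketch of the kind of scrubbing a RUM event might need before it is stored. The event shape and the field names (`user_id`, `email`, `ip`, `url`) are hypothetical and do not reproduce Faro's actual payload schema; a real deployment would more likely apply this in the collection pipeline than in a standalone script.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical field names; Faro's real payload schema is not reproduced here.
SENSITIVE_KEYS = {"user_id", "email", "ip"}


def scrub_url(url):
    """Drop the query string and fragment, which can carry tokens or search terms."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def sanitize_event(event):
    """Return a copy of a flat event dict with sensitive keys removed and its URL scrubbed."""
    clean = {k: v for k, v in event.items() if k not in SENSITIVE_KEYS}
    if "url" in clean:
        clean["url"] = scrub_url(clean["url"])
    return clean


example = {
    "kind": "page_load",
    "url": "https://example.com/search?q=private+term#results",
    "user_id": "abc123",
    "duration_ms": 840,
}
print(sanitize_event(example))
```

The sketch removes the listed fields outright instead of masking them, on the assumption that data never stored cannot become a retention liability.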
Alerting
- deploy-infra-alerting - Alerting pipeline (Grafana Unified Alerting → ntfy)
- runbook-service-probe-failure - Service health check failure runbook
- runbook-postgres-unhealthy - PostgreSQL cluster health runbook
- runbook-pod-not-ready - Pod not ready runbook
- runbook-textfile-stale - Metrics textfile freshness runbook
- runbook-frigate-camera-down - Frigate camera health runbook
- runbook-argocd-out-of-sync - ArgoCD sync status runbook
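For orientation, the sketch below shows one way a Grafana alert webhook payload could be turned into an ntfy notification. It is not taken from deploy-infra-alerting; the endpoint URL and topic are hypothetical, and the payload is assumed to follow the Alertmanager-style shape (`alerts[].status`, `alerts[].labels.alertname`) that Grafana webhook contact points send. The `Title` and `Priority` headers are ntfy's standard publish headers. The request is only constructed here, not sent.

```python
import urllib.request

# Hypothetical ntfy server and topic; not taken from the actual deployment.
NTFY_URL = "https://ntfy.example.internal/alerts"


def ntfy_request(payload, url=NTFY_URL):
    """Build (but do not send) an ntfy POST summarizing a Grafana webhook payload."""
    alerts = payload.get("alerts", [])
    firing = [a for a in alerts if a.get("status") == "firing"]
    names = sorted({a.get("labels", {}).get("alertname", "unknown") for a in firing})
    title = f"{len(firing)} firing" if firing else "resolved"
    body = ", ".join(names) or "all alerts resolved"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Title": title, "Priority": "high" if firing else "default"},
        method="POST",
    )


sample = {
    "alerts": [
        {"status": "firing", "labels": {"alertname": "PodNotReady"}},
        {"status": "resolved", "labels": {"alertname": "TextfileStale"}},
    ]
}
req = ntfy_request(sample)
print(req.get_method(), req.full_url, req.get_header("Title"), req.data)
```

Passing the returned request to `urllib.request.urlopen` would publish it; splitting construction from sending keeps the mapping logic easy to test without a running ntfy server.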