Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) #286

Merged
eblume merged 9 commits from feature/otel-tracing into main 2026-03-05 10:51:07 -08:00

Author SHA1 Message Date
5bc8b7ed8c Fix local-blocks processor: add traces_storage path
The local-blocks processor requires its own dedicated traces WAL
(traces_storage.path), separate from the ingester WAL and the
metrics generator WAL. Without it, the processor fails with
"local blocks processor requires traces wal".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:42:43 -08:00
7b8c50a107 Enable local-blocks processor for Grafana Traces Drilldown
Required by Grafana's TraceQL metrics queries. Keeps recent
traces on the metrics generator for query-time aggregation
without duplicating them into the trace storage backend.
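The Drilldown views issue TraceQL metrics queries, which Tempo answers from the local-blocks data. The sketch below builds such a query against Tempo's `/api/metrics/query_range` endpoint. The base URL `http://tempo:3200`, the `service.name` value for ntfy, and the time range are assumptions; check the parameter names against the HTTP API docs for your Tempo version.

```python
from urllib.parse import urlencode

# Assumed address of the Tempo query API (port 3200 is Tempo's HTTP port).
TEMPO_URL = "http://tempo:3200"


def traceql_metrics_url(query: str, start: int, end: int, step: str = "60s") -> str:
    """Build a TraceQL metrics query_range URL; start/end are Unix seconds."""
    params = urlencode({"q": query, "start": start, "end": end, "step": step})
    return f"{TEMPO_URL}/api/metrics/query_range?{params}"


# Request rate for a service over one hour (service name is an assumption).
url = traceql_metrics_url('{resource.service.name="ntfy"} | rate()', 1772730000, 1772733600)
```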

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:27:00 -08:00
dfef127233 Add Tempo health dashboard to Grafana
Panels: Storage Used, PVC Utilization (% of 10Gi), Total
Blocks, Heap Usage, Storage Over Time, Span Ingestion Rate,
Ingestion Throughput, and Query Latency (p50/p95).
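As a rough idea of what some of these panels could query, here is a sketch that collects PromQL expressions in a Python dict. The metric names (`tempo_distributor_spans_received_total`, `tempo_distributor_bytes_received_total`, `tempo_request_duration_seconds_bucket`, `go_memstats_heap_inuse_bytes`) and the `job="tempo"` label are assumptions about Tempo's default instrumentation and the scrape job. Confirm them against Tempo's `/metrics` output before use.

```python
# Assumed PromQL expressions for a subset of the dashboard panels.
PANELS = {
    "Span Ingestion Rate": "sum(rate(tempo_distributor_spans_received_total[5m]))",
    "Ingestion Throughput": "sum(rate(tempo_distributor_bytes_received_total[5m]))",
    "Heap Usage": 'go_memstats_heap_inuse_bytes{job="tempo"}',
}


def latency_quantile(q: float, window: str = "5m") -> str:
    """Quantile of Tempo request latency across all HTTP routes."""
    return (
        f"histogram_quantile({q}, sum by (le) "
        f"(rate(tempo_request_duration_seconds_bucket[{window}])))"
    )


PANELS["Query Latency p50"] = latency_quantile(0.5)
PANELS["Query Latency p95"] = latency_quantile(0.95)
```

Aggregating with `sum by (le)` keeps the bucket label that `histogram_quantile` needs. A real query-latency panel would probably also filter on a route label to exclude ingestion traffic.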

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:19:03 -08:00
3eb1bb70a0 Document Tempo storage monitoring query
Add PromQL query for checking Tempo storage utilization
against PVC capacity using tempodb_backend_bytes_total.
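A sketch of the utilization arithmetic, assuming the 10Gi PVC from the dashboard commit and summing `tempodb_backend_bytes_total` across tenants. The exact query committed to the docs is not shown here, so treat this expression as illustrative.

```python
# 10Gi PVC in bytes (Kubernetes "Gi" means 2**30 bytes).
PVC_CAPACITY_BYTES = 10 * 1024**3

# PromQL: backend bytes as a percentage of the PVC size.
STORAGE_UTILIZATION_PROMQL = f"sum(tempodb_backend_bytes_total) / {PVC_CAPACITY_BYTES} * 100"


def utilization_percent(backend_bytes: float, capacity_bytes: int = PVC_CAPACITY_BYTES) -> float:
    """Same arithmetic as the PromQL expression, for a single sample."""
    return backend_bytes / capacity_bytes * 100
```

For example, 5 GiB of stored blocks corresponds to 50% utilization. The WALs likely share the same PVC without being counted by this metric, so the real disk usage may run somewhat higher.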

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:16:58 -08:00
309119f7a4 Add Tempo and alloy-tracing-ringtail to service registry and docs
Updates service-versions.yaml, Grafana datasources table,
and ArgoCD apps registry, and sets the Tempo image version to 2.10.1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:09:09 -08:00
3512eb10b6 Add Beyla eBPF tracing DaemonSet for ringtail
Deploys a privileged Alloy DaemonSet on ringtail's k3s that
uses Beyla eBPF to auto-instrument HTTP services (Frigate,
ntfy, Ollama, Immich) without code changes. Traces are
exported via OTLP HTTP to Tempo on indri.

Separate from the existing unprivileged alloy-ringtail to
preserve least-privilege for metrics/logs collection.
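A minimal sketch of the least-privilege split, using heavily trimmed, hypothetical pod specs for the two DaemonSets. `hostPID` and `securityContext.privileged` are real Kubernetes fields. Whether Beyla needs full `privileged` or only specific capabilities depends on the deployment, so the values here are assumptions.

```python
# Hypothetical, trimmed pod specs for the two Alloy DaemonSets on ringtail.
alloy_ringtail = {  # existing metrics/logs collector
    "hostPID": False,
    "containers": [{"name": "alloy", "securityContext": {"privileged": False}}],
}

alloy_tracing_ringtail = {  # Beyla eBPF tracer
    "hostPID": True,  # lets Beyla see processes running in other pods
    "containers": [{"name": "alloy", "securityContext": {"privileged": True}}],
}


def is_privileged(pod_spec: dict) -> bool:
    """True if the pod shares the host PID namespace or runs a privileged container."""
    return bool(pod_spec.get("hostPID", False)) or any(
        c.get("securityContext", {}).get("privileged", False)
        for c in pod_spec.get("containers", [])
    )
```

Keeping these as two DaemonSets means that only the tracer carries the elevated settings, while the metrics/logs collector stays unprivileged.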

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:08:17 -08:00
ba1c0645e3 Add Tempo datasource to Grafana and Prometheus scrape target
Grafana: Tempo datasource with trace-to-log (Loki) and
trace-to-metrics (Prometheus) correlation. Loki gets
derivedFields to link trace IDs back to Tempo.
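To show how a derived field turns a log line into a trace link, here is a Loki datasource fragment written as a Python dict, along with the same regex applied with `re`. The field keys (`name`, `matcherRegex`, `datasourceUid`, `url`) follow Grafana's Loki provisioning format. The regex and the `tempo` UID are assumptions. In provisioning YAML the URL is usually written as `'$${__value.raw}'` so that the `$` is not treated as environment interpolation.

```python
import re

# Hypothetical Loki datasource provisioning fragment (regex and UID assumed).
loki_datasource = {
    "name": "Loki",
    "type": "loki",
    "jsonData": {
        "derivedFields": [
            {
                "name": "TraceID",
                "matcherRegex": r"trace_?[iI][dD]=(\w+)",  # traceID=, trace_id=, traceId=
                "datasourceUid": "tempo",
                "url": "${__value.raw}",  # the captured trace ID
            }
        ]
    },
}


def extract_trace_id(line: str):
    """Return the first capture group of the derived-field regex, or None."""
    pattern = loki_datasource["jsonData"]["derivedFields"][0]["matcherRegex"]
    match = re.search(pattern, line)
    return match.group(1) if match else None
```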

Prometheus: scrape Tempo operational metrics on port 3200.
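A sketch of the scrape job, written as a Python dict that mirrors `prometheus.yml`. The in-cluster DNS name `tempo.tempo.svc.cluster.local` is an assumption. Only port 3200 comes from the change itself.

```python
# Hypothetical Prometheus scrape job for Tempo (service DNS name assumed).
tempo_scrape_job = {
    "job_name": "tempo",
    "metrics_path": "/metrics",
    "static_configs": [{"targets": ["tempo.tempo.svc.cluster.local:3200"]}],
}


def scrape_urls(job: dict) -> list[str]:
    """Expand a static scrape job into the URLs Prometheus would fetch."""
    scheme = job.get("scheme", "http")
    path = job.get("metrics_path", "/metrics")
    return [
        f"{scheme}://{target}{path}"
        for static in job["static_configs"]
        for target in static["targets"]
    ]
```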

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:06:18 -08:00
3fc06cda88 Add Tempo manifests and ArgoCD Application
Deploys Grafana Tempo 2.10.1 on minikube-indri for distributed
trace storage. Includes OTLP receivers (gRPC + HTTP), local
filesystem storage with 7d retention, and metrics_generator
that remote-writes span-metrics to Prometheus.
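For orientation, here is a minimal sketch of the pieces described above, written as a Python dict that mirrors Tempo's YAML layout. Ports 4317/4318 are the standard OTLP gRPC/HTTP ports, and 3200 is Tempo's HTTP port. The paths, the Prometheus URL, and expressing 7d retention as `168h` are assumptions. `parse_hours` is a hypothetical helper that handles only that one duration form.

```python
from datetime import timedelta

# Hypothetical Tempo config fragment (paths and Prometheus URL are assumptions).
tempo_server_config = {
    "server": {"http_listen_port": 3200},
    "distributor": {
        "receivers": {
            "otlp": {
                "protocols": {
                    "grpc": {"endpoint": "0.0.0.0:4317"},
                    "http": {"endpoint": "0.0.0.0:4318"},
                }
            }
        }
    },
    "compactor": {"compaction": {"block_retention": "168h"}},  # 7 days
    "storage": {"trace": {"backend": "local", "local": {"path": "/var/tempo/blocks"}}},
    "metrics_generator": {
        "storage": {
            "path": "/var/tempo/generator/wal",
            "remote_write": [{"url": "http://prometheus:9090/api/v1/write"}],
        }
    },
}


def parse_hours(duration: str) -> timedelta:
    """Parse an '<n>h' duration string (the only form used above)."""
    if not duration.endswith("h"):
        raise ValueError(f"unsupported duration: {duration}")
    return timedelta(hours=int(duration[:-1]))


retention = parse_hours(tempo_server_config["compactor"]["compaction"]["block_retention"])
```

Prometheus accepts these span-metrics writes only when it runs with `--web.enable-remote-write-receiver`.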

Two Tailscale Ingresses: tempo (query API) and tempo-otlp
(OTLP HTTP receiver for cross-cluster trace ingestion).
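To smoke-test the cross-cluster path, you can POST a single span in OTLP/JSON to the `/v1/traces` path of the OTLP HTTP receiver. The hostname `tempo-otlp.example.ts.net` is a placeholder for whatever Tailscale name the `tempo-otlp` Ingress actually gets, and `build_test_span` and `send` are hypothetical helpers. Trace and span IDs are hex strings, as the OTLP JSON encoding requires.

```python
import json
import os
import time
import urllib.request

# Placeholder Tailscale hostname for the tempo-otlp Ingress (assumption).
OTLP_ENDPOINT = "https://tempo-otlp.example.ts.net/v1/traces"


def build_test_span(service: str, name: str) -> dict:
    """Build a one-span OTLP/JSON payload with random trace and span IDs."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}}
            ]},
            "scopeSpans": [{
                "scope": {"name": "smoke-test"},
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 32 hex characters
                    "spanId": os.urandom(8).hex(),    # 16 hex characters
                    "name": name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(now - 1_000_000),  # 1 ms earlier
                    "endTimeUnixNano": str(now),
                }],
            }],
        }]
    }


def send(payload: dict, endpoint: str = OTLP_ENDPOINT) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Usage: `send(build_test_span("smoke-test", "ping"))` from a machine on the tailnet, then search for the returned trace ID in Grafana's Tempo datasource.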

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:03:17 -08:00
3be9fa948c Add Tempo reference doc and changelog fragment (docs-first)
Tempo is the new distributed tracing backend for BlumeOps,
completing the third observability pillar alongside Prometheus
(metrics) and Loki (logs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:01:47 -08:00