## Summary
Adds the third observability pillar — **distributed tracing** — alongside existing metrics (Prometheus) and logs (Loki).
- **Grafana Tempo 2.10.1** on minikube-indri for trace storage, with 7d retention, OTLP receivers, and a `metrics_generator` that remote-writes span-metrics (RED) to Prometheus
- **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes
- **Grafana integration** — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking
- **Prometheus** scrapes Tempo operational metrics
### Architecture
```
ringtail (k3s)                               indri (minikube)
┌──────────────────────┐                     ┌─────────────────────┐
│ Alloy+Beyla (eBPF)   │──OTLP HTTP────────→ │ Tempo               │
│  ↳ Frigate, ntfy,    │   via tailnet       │  ↳ trace storage    │
│    Ollama, Immich    │                     │  ↳ RED → Prometheus │
└──────────────────────┘                     │                     │
                                             │ Grafana             │
                                             │  ↳ Tempo datasource │
                                             └─────────────────────┘
```
### New files (12)
- `docs/reference/services/tempo.md` — reference doc
- `docs/changelog.d/feature-otel-tracing.feature.md`
- `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files)
- `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files)
### Modified files (6)
- `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields
- `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target
- `service-versions.yaml` — tempo + alloy-tracing-ringtail entries
- `docs/reference/services/grafana.md` — Tempo in datasources table
- `docs/reference/reference.md` — Tempo in services index
- `docs/reference/operations/observability.md` — Tempo in components list
## Deployment and Testing
- [ ] Sync `apps` app to pick up new Application definitions
- [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo`
- [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo`
- [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready`
- [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring`
- [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail`
- [ ] Check Beyla discovery in alloy-tracing logs on ringtail
- [ ] Sync grafana-config for updated datasources
- [ ] Sync prometheus for updated scrape config
- [ ] Test Grafana Tempo datasource connection
- [ ] Generate test traffic and search traces in Grafana Explore → Tempo
- [ ] After merge: reset all ArgoCD app revisions back to main
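
The readiness step in the checklist above can be scripted. The following is a minimal sketch, not part of this PR: `retry_until_ok` and `tempo_ready` are hypothetical helper names, and the Service name `svc/tempo` in the usage comment is an assumption about how the manifests name the Service.

```shell
#!/bin/sh
# retry_until_ok ATTEMPTS DELAY CMD...
# Runs CMD until it exits 0 or ATTEMPTS tries have been made.
retry_until_ok() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep when another attempt is still coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# tempo_ready URL
# Exits non-zero on connection errors or an HTTP status >= 400.
tempo_ready() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Usage (assumes the Service is called "tempo"):
#   kubectl --context=minikube-indri -n monitoring port-forward svc/tempo 3200:3200 &
#   retry_until_ok 10 3 tempo_ready http://localhost:3200/ready
```

Retrying is useful here because a freshly started Tempo pod may not report ready on the first probe.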
Reviewed-on: #286
| title | modified | tags |
|---|---|---|
| Tempo | 2026-03-05 | |
## Grafana Tempo
Distributed tracing backend for BlumeOps infrastructure. Receives traces via OTLP, stores them on the local filesystem, and generates RED metrics (rate, errors, duration) for Prometheus.
### Quick Reference
| Property | Value |
|---|---|
| URL | https://tempo.ops.eblu.me (when Caddy route added) |
| Tailscale URL | https://tempo.tail8d86e.ts.net |
| OTLP Endpoint | https://tempo-otlp.tail8d86e.ts.net |
| Namespace | monitoring |
| Image | grafana/tempo:2.10.1 |
| Storage | 10Gi PVC (local filesystem) |
| Retention | 7 days |
### Architecture
- Single-node deployment with local filesystem storage
- OTLP receivers: gRPC (4317) and HTTP (4318)
- `metrics_generator` produces span-metrics and service-graphs, remote-written to Prometheus
- Queried via the Grafana Tempo datasource
- Two Tailscale Ingresses: one for query API (3200), one for OTLP HTTP receiver (4318)
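
To exercise the OTLP HTTP receiver by hand, a single span can be posted as OTLP/JSON. This is a sketch under assumptions: the helper names, the `manual-smoke-test` scope name, and the one-second span duration are made up for illustration, and it presumes the OTLP ingress serves the standard `/v1/traces` path.

```shell
#!/bin/sh
# rand_hex N: print N random bytes as 2N lowercase hex characters.
rand_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -dc '0-9a-f'
}

# otlp_span_json SERVICE SPAN_NAME: print a one-span OTLP/JSON payload.
otlp_span_json() {
  trace_id=$(rand_hex 16)   # trace IDs are 16 bytes (32 hex chars)
  span_id=$(rand_hex 8)     # span IDs are 8 bytes (16 hex chars)
  end_s=$(date +%s)
  start_s=$((end_s - 1))    # pretend the span lasted one second
  cat <<EOF
{
  "resourceSpans": [{
    "resource": {
      "attributes": [
        {"key": "service.name", "value": {"stringValue": "$1"}}
      ]
    },
    "scopeSpans": [{
      "scope": {"name": "manual-smoke-test"},
      "spans": [{
        "traceId": "$trace_id",
        "spanId": "$span_id",
        "name": "$2",
        "kind": 1,
        "startTimeUnixNano": "${start_s}000000000",
        "endTimeUnixNano": "${end_s}000000000"
      }]
    }]
  }]
}
EOF
}

# send_span BASE_URL SERVICE SPAN_NAME: POST the payload to the OTLP/HTTP traces path.
send_span() {
  otlp_span_json "$2" "$3" |
    curl --silent --show-error --fail \
      -H 'Content-Type: application/json' \
      --data-binary @- "$1/v1/traces"
}

# Usage:
#   send_span https://tempo-otlp.tail8d86e.ts.net smoke-test manual-span
```

The service name passed as the second argument is what should appear in Grafana Explore when searching for the span.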
### Trace Sources
From ringtail (via Beyla eBPF in Alloy):
| Service | Protocol | Coverage |
|---|---|---|
| frigate | HTTP REST | Request rate, error rate, latency, trace spans |
| ntfy | HTTP | Same |
| ollama | HTTP REST | Same (model inference latency) |
| immich | HTTP REST | Same |
Beyla auto-instruments HTTP services via eBPF kernel hooks — no code changes needed. MQTT (Mosquitto) is not instrumented (no eBPF parser for MQTT).
**Future: SDK instrumentation.** Services with OTel SDK support (e.g., Hermes) can send traces directly to the OTLP endpoint for deeper internal spans (DB queries, business logic) alongside eBPF envelope traces.
### Storage Monitoring
Tempo exposes `tempodb_backend_bytes_total` via its `/metrics` endpoint (scraped by Prometheus). To check storage utilization against the 10Gi PVC:
`tempodb_backend_bytes_total / 10737418240 * 100`
Full PVC-level monitoring (via kubelet volume stats) is not yet available — see backlog.
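
The same arithmetic can be run from a shell, which may be handy for an ad hoc check. A rough sketch: `pvc_percent` and `query_backend_bytes` are hypothetical helpers, the Prometheus base URL is whatever you pass in, and wrapping the metric in `sum()` is an assumption in case it is exported with multiple label sets. The query helper needs `curl` and `jq`.

```shell
#!/bin/sh
PVC_BYTES=10737418240   # 10 GiB = 10 * 1024^3, the PVC size from the table above

# pvc_percent USED_BYTES [CAPACITY_BYTES]
# Prints utilization as a percentage with one decimal place.
pvc_percent() {
  awk -v used="$1" -v cap="${2:-$PVC_BYTES}" \
    'BEGIN { printf "%.1f\n", used / cap * 100 }'
}

# query_backend_bytes PROM_URL
# Asks the Prometheus HTTP API for the current value of the metric.
query_backend_bytes() {
  curl --silent --fail --get "$1/api/v1/query" \
    --data-urlencode 'query=sum(tempodb_backend_bytes_total)' |
    jq -r '.data.result[0].value[1]'
}

# Usage (PROM_URL is a placeholder for your Prometheus base URL):
#   pvc_percent "$(query_backend_bytes "$PROM_URL")"
```

For example, `pvc_percent 5368709120` prints `50.0`, since 5 GiB is half of the 10 GiB volume.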
### Grafana Integration
- Tempo datasource with trace-to-log and trace-to-metrics correlation
- Service map and node graph visualization
- Loki derived fields link trace IDs in logs back to Tempo
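
Conceptually, a derived field applies a regex to each log line and turns the captured trace ID into a link to Tempo. The sketch below mimics that outside Grafana. The `trace_id=<32 hex chars>` log format and both helper names are assumptions for illustration, and the actual matcher lives in `datasources.yaml`. The `/api/traces/<id>` path is Tempo's trace-by-ID query endpoint.

```shell
#!/bin/sh
# extract_trace_id LINE
# Prints a 32-hex-char trace ID that follows "trace_id=" in LINE,
# or nothing if the line has no such field (hypothetical log format).
extract_trace_id() {
  printf '%s\n' "$1" | sed -nE 's/.*trace_id=([0-9a-f]{32}).*/\1/p'
}

# tempo_trace_url BASE_URL TRACE_ID
# Builds the Tempo query-API URL for a trace, tolerating a trailing slash.
tempo_trace_url() {
  printf '%s/api/traces/%s\n' "${1%/}" "$2"
}

# Usage:
#   id=$(extract_trace_id "$log_line")
#   curl --silent "$(tempo_trace_url https://tempo.tail8d86e.ts.net "$id")"
```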
### Related
- alloy - Trace collector (Beyla eBPF on ringtail)
- prometheus - Receives span-metrics from Tempo
- loki - Log correlation via trace IDs
- grafana - Trace visualization