blumeops/docs/changelog.d
Erich Blume c281fb5403 Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286)
## Summary

Adds the third observability pillar — **distributed tracing** — alongside existing metrics (Prometheus) and logs (Loki).

- **Grafana Tempo 2.10.1** on minikube-indri for trace storage with 7d retention, OTLP receivers, and a `metrics_generator` that remote-writes span metrics (RED) to Prometheus
- **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes
- **Grafana integration** — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking
- **Prometheus** scrapes Tempo operational metrics
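One way to spot-check that the span metrics (RED) are reaching Prometheus is to query its HTTP API. This is a minimal sketch, not part of this PR: the `red_rate` helper is hypothetical, the Prometheus base URL is whatever your setup exposes, and the metric and label names (`traces_spanmetrics_calls_total`, `service`) assume Tempo's default span-metrics processor settings.

```shell
# red_rate PROM_URL [WINDOW]
# Hypothetical helper: asks Prometheus for the per-service request rate
# derived from Tempo span metrics. The metric name assumes the default
# span-metrics processor configuration.
red_rate() {
  local prom_url=$1 window=${2:-5m}
  # -G sends the url-encoded data as a query string on a GET request.
  curl -fsS -G "${prom_url%/}/api/v1/query" \
    --data-urlencode "query=sum by (service) (rate(traces_spanmetrics_calls_total[${window}]))"
}
```

For example, `red_rate http://localhost:9090 1m` against a port-forwarded Prometheus should return a JSON result once Beyla has reported some traffic; an empty `result` array suggests the generator or the remote write is not working yet.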

### Architecture

```
ringtail (k3s)                                indri (minikube)
┌──────────────────────┐                      ┌─────────────────────┐
│ Alloy+Beyla (eBPF)   │──OTLP HTTP────────→ │ Tempo               │
│  ↳ Frigate, ntfy,    │  via tailnet         │  ↳ trace storage    │
│    Ollama, Immich    │                      │  ↳ RED → Prometheus │
└──────────────────────┘                      │                     │
                                              │ Grafana             │
                                              │  ↳ Tempo datasource │
                                              └─────────────────────┘
```

### New files (12)
- `docs/reference/services/tempo.md` — reference doc
- `docs/changelog.d/feature-otel-tracing.feature.md`
- `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files)
- `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files)

### Modified files (6)
- `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields
- `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target
- `service-versions.yaml` — tempo + alloy-tracing-ringtail entries
- `docs/reference/services/grafana.md` — Tempo in datasources table
- `docs/reference/reference.md` — Tempo in services index
- `docs/reference/operations/observability.md` — Tempo in components list

## Deployment and Testing

- [ ] Sync `apps` app to pick up new Application definitions
- [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo`
- [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo`
- [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready`
- [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring`
- [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail`
- [ ] Check Beyla discovery in alloy-tracing logs on ringtail
- [ ] Sync grafana-config for updated datasources
- [ ] Sync prometheus for updated scrape config
- [ ] Test Grafana Tempo datasource connection
- [ ] Generate test traffic and search traces in Grafana Explore → Tempo
- [ ] After merge: reset all ArgoCD app revisions back to main
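The Tempo readiness step in the checklist above could be scripted roughly as follows. This is a sketch under assumptions: the Service name `svc/tempo` is a guess (the PR only states the `monitoring` namespace, the `app=tempo` label, and port 3200), and both `wait_ready` and `check_tempo` are hypothetical helpers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_ready URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as the URL answers successfully, 1 if it never does.
wait_ready() {
  local url=$1 attempts=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# check_tempo
# Opens a temporary port-forward and polls Tempo's /ready endpoint.
# Assumption: a Service named "tempo" exists in the monitoring namespace.
check_tempo() {
  kubectl --context=minikube-indri -n monitoring \
    port-forward svc/tempo 3200:3200 >/dev/null 2>&1 &
  local pf_pid=$! status=0
  wait_ready "http://localhost:3200/ready" || status=$?
  # Always stop the port-forward, whether or not Tempo became ready.
  kill "$pf_pid" 2>/dev/null || true
  return "$status"
}
```

After sourcing the file, `check_tempo && echo "tempo ready"` runs the check. The retry loop is there because the port-forward takes a moment to come up and Tempo can take a while to report ready after the pod starts.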

Reviewed-on: #286
2026-03-05 10:51:07 -08:00
+changelog-subdir-hook.infra.md Use towncrier orphan fragment naming for C0 changes 2026-03-03 15:30:00 -08:00
+frigate-db-path.bugfix.md Use towncrier orphan fragment naming for C0 changes 2026-03-03 15:30:00 -08:00
+kiwix-382.infra.md Bump kiwix-serve from 3.8.1 to 3.8.2 2026-03-05 08:12:32 -08:00
+mikado-finalization-cleanup.doc.md Remove mikado frontmatter from closed chains, clarify finalization rules 2026-03-04 20:43:19 -08:00
+ollama-reference-card.doc.md Add Ollama reference card and update indexes 2026-03-04 19:43:14 -08:00
+oomkill-dashboard.infra.md Add OOMKill observability to Kubernetes Clusters dashboard 2026-03-04 20:53:07 -08:00
+orphan-fragment-convention.doc.md Use towncrier orphan fragment naming for C0 changes 2026-03-03 15:30:00 -08:00
+retire-plans-directory.doc.md Retire plans directory, convert migrate-forgejo-from-brew to mikado card 2026-03-04 20:28:14 -08:00
+review-upgrade-grafana.doc.md Review upgrade-grafana doc: fix image tag ref, add sidecar link 2026-03-04 07:53:22 -08:00
+transmission-rate-metrics.bugfix.md Fix per-torrent rate panels showing cumulative bytes instead of rates 2026-03-05 08:01:37 -08:00
.gitkeep Add towncrier changelog system (#86) 2026-02-03 11:48:13 -08:00
changelog-all-levels.doc.md Clarify that changelog fragments apply to all change levels (C0–C2) 2026-03-03 13:15:06 -08:00
feature-argocd-authentik-oidc.feature.md Add Authentik OIDC login for ArgoCD (#284) 2026-03-05 09:07:25 -08:00
feature-dagger-v0.20.infra.md Upgrade Dagger from v0.19.11 to v0.20.0 (#285) 2026-03-05 09:32:13 -08:00
feature-forge-public.feature.md Expose Forgejo publicly at forge.eblu.me (#278) 2026-03-03 08:40:41 -08:00
feature-grafana-sidecar.infra.md Home-build grafana-sidecar container (#281) 2026-03-03 13:48:24 -08:00
feature-ha-cv-docs-zero-downtime.infra.md Add pre-commit check for changelog fragment placement 2026-03-03 10:49:01 -08:00
feature-loki-container.infra.md Build Loki container image locally (#280) 2026-03-03 13:00:43 -08:00
feature-ollama-ringtail.feature.md Deploy Ollama LLM server on ringtail (#277) 2026-03-02 20:39:51 -08:00
feature-otel-tracing.feature.md Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286) 2026-03-05 10:51:07 -08:00
feature-transmission-exporter-python.infra.md Replace transmission-exporter with homegrown Python exporter (#283) 2026-03-04 21:55:00 -08:00
feature-transmission-review.infra.md Upgrade Transmission to 4.1.1 (#282) 2026-03-04 07:44:33 -08:00
forgejo-proxy-dashboard.feature.md Add fly.io proxy observability and app logs to Forgejo dashboard 2026-03-03 10:24:53 -08:00
frigate-memory.infra.md Add changelog fragment for Frigate memory limit bump 2026-03-03 13:58:35 -08:00
gandi-bookmark.infra.md Add changelog fragment for Gandi bookmark 2026-03-03 13:06:02 -08:00
implicit-octal.infra.md Allow implicit octals in yamllint and normalize k8s mode values 2026-03-03 13:10:44 -08:00
upgrade-teslamate-v3.0.0.infra.md Upgrade TeslaMate v2.2.0 → v3.0.0 (#279) 2026-03-03 11:56:40 -08:00