---
title: Observability
modified: 2026-02-07
tags:
- operations
---
# Observability
Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286)

## Summary

Adds the third observability pillar, **distributed tracing**, alongside existing metrics (Prometheus) and logs (Loki).

- **Grafana Tempo 2.10.1** on minikube-indri for trace storage with 7d retention, OTLP receivers, and `metrics_generator` that remote-writes span-metrics (RED) to Prometheus
- **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail, which instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes
- **Grafana integration**: Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking
- **Prometheus** scrapes Tempo operational metrics

### Architecture

```
ringtail (k3s)                               indri (minikube)
┌──────────────────────┐                     ┌─────────────────────┐
│ Alloy+Beyla (eBPF)   │──OTLP HTTP────────→ │ Tempo               │
│  ↳ Frigate, ntfy,    │  via tailnet        │  ↳ trace storage    │
│    Ollama, Immich    │                     │  ↳ RED → Prometheus │
└──────────────────────┘                     │                     │
                                             │ Grafana             │
                                             │  ↳ Tempo datasource │
                                             └─────────────────────┘
```

### New files (12)

- `docs/reference/services/tempo.md` - reference doc
- `docs/changelog.d/feature-otel-tracing.feature.md`
- `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files)
- `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files)

### Modified files (6)

- `argocd/manifests/grafana/datasources.yaml` - Tempo datasource + Loki derivedFields
- `argocd/manifests/prometheus/prometheus.yml` - Tempo scrape target
- `service-versions.yaml` - tempo + alloy-tracing-ringtail entries
- `docs/reference/services/grafana.md` - Tempo in datasources table
- `docs/reference/reference.md` - Tempo in services index
- `docs/reference/operations/observability.md` - Tempo in components list

## Deployment and Testing

- [ ] Sync `apps` app to pick up new Application definitions
- [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo`
- [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo`
- [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready`
- [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring`
- [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail`
- [ ] Check Beyla discovery in alloy-tracing logs on ringtail
- [ ] Sync grafana-config for updated datasources
- [ ] Sync prometheus for updated scrape config
- [ ] Test Grafana Tempo datasource connection
- [ ] Generate test traffic and search traces in Grafana Explore → Tempo
- [ ] After merge: reset all ArgoCD app revisions back to main

Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/286

2026-03-05 10:51:07 -08:00
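The "Verify Tempo ready" checklist step (port-forward 3200, then `curl localhost:3200/ready`) can also be scripted so it keeps retrying while the pod starts. Below is a minimal sketch using only the Python standard library. It assumes the port-forward is already running on `localhost:3200` and that Tempo answers `/ready` with HTTP 200 once it can serve traffic; `tempo_is_ready` and `wait_for_tempo` are hypothetical helper names, not part of the repo.

```python
import time
import urllib.error
import urllib.request

# Assumption: Tempo's HTTP port (3200, per the checklist) is port-forwarded to localhost.
TEMPO_READY_URL = "http://localhost:3200/ready"

# A port-forward is a direct local connection, so bypass any configured HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def tempo_is_ready(url: str = TEMPO_READY_URL, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers HTTP 200, False on any error."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (e.g. 503 while starting), connection refused, or timeout.
        return False


def wait_for_tempo(url: str = TEMPO_READY_URL, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll tempo_is_ready() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if tempo_is_ready(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Call `wait_for_tempo()` once the port-forward is up; with the defaults it returns `True` as soon as Tempo reports ready, or `False` after roughly a minute of retries.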
Metrics, logs, traces, and dashboards for BlumeOps infrastructure.
## Components
- [[prometheus]] - Metrics storage and querying
- [[loki]] - Log aggregation
- [[tempo]] - Distributed tracing
- [[alloy|Alloy]] - Metrics, log, and trace collection
- [[grafana]] - Dashboards and visualization
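Alloy forwards the spans Beyla captures to Tempo over OTLP. To smoke-test Tempo's ingest path on its own, one option is to POST a single hand-built span in OTLP/JSON to an OTLP/HTTP `/v1/traces` endpoint and then search for its trace ID in Grafana Explore → Tempo. This technique is swapped in for illustration: it is not part of the documented setup and bypasses Beyla entirely. The sketch assumes Tempo's OTLP/HTTP receiver listens on the conventional port 4318 and has been port-forwarded to `localhost`. The URL, the `otlp-smoke-test` service name, and the helper names are all hypothetical.

```python
from __future__ import annotations

import json
import secrets
import time
import urllib.request

# Hypothetical endpoint: assumes Tempo's OTLP/HTTP receiver (conventional port 4318)
# is port-forwarded to localhost. Adjust for the real ingress.
OTLP_TRACES_URL = "http://localhost:4318/v1/traces"


def build_test_span(service_name: str = "otlp-smoke-test",
                    span_name: str = "smoke-test-span") -> tuple[str, dict]:
    """Return (trace_id, body) for a one-span OTLP/JSON trace export request."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 lowercase hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 lowercase hex chars
    end_ns = time.time_ns()
    start_ns = end_ns - 50_000_000    # pretend the span lasted 50 ms
    body = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    "traceId": trace_id,  # OTLP/JSON uses hex, not base64, for IDs
                    "spanId": span_id,
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL; enums are integers in OTLP/JSON
                    # 64-bit timestamps are sent as decimal strings
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }
    return trace_id, body


def send_span(body: dict, url: str = OTLP_TRACES_URL, timeout: float = 5.0) -> int:
    """POST the body as JSON and return the HTTP status (non-2xx raises HTTPError)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Direct connection to the port-forward; ignore any configured HTTP proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=timeout) as resp:
        return resp.status
```

For example, `trace_id, body = build_test_span()` followed by `send_span(body)` should return a 2xx status if the receiver accepted the span. You can then paste `trace_id` into the Tempo query in Grafana Explore.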