blumeops/argocd/manifests/grafana/datasources.yaml

69 lines
1.8 KiB
YAML
Raw Normal View History

Add kustomize images: and configMapGenerator: across services (#264) ## Summary - Move hardcoded image tags to kustomization.yaml `images:` transformer across **22 services** — image names in manifests become version-agnostic templates, with tags centralized in one place per service - Replace hand-written ConfigMap manifests with `configMapGenerator:` in **12 services** — config data extracted to standalone files, generated ConfigMaps include content hashes that trigger automatic pod rollouts on changes - Create new `kustomization.yaml` for **forgejo-runner** and **nvidia-device-plugin** (switches ArgoCD from directory mode to kustomize mode, rendered output identical) ### Services modified **Images only (8):** cv, devpi, docs, kube-state-metrics, miniflux, navidrome, teslamate, torrent **Images + configMapGenerator (10):** alloy-k8s, forgejo-runner, frigate, grafana, homepage, kiwix, loki, mosquitto, ntfy, prometheus **Images only, no configMapGenerator (4):** authentik (skip blueprints — special YAML tags), tailscale-operator-base (Deployment only, CRD image fields left as-is) **Skipped entirely (6):** argocd (remote upstream), databases (no image fields), external-secrets, grafana-config (cross-kustomization dashboards), immich (Helm-managed), 1password-connect/cloudnative-pg (no kustomization.yaml) ### What changes at deploy time - **images:** — no functional diff, `kustomize build` produces identical output with tags - **configMapGenerator:** — ConfigMap names gain hash suffixes (e.g., `prometheus-config` → `prometheus-config-6f42fhctcb`) and all Deployment/StatefulSet/DaemonSet references are updated automatically. Pods will restart once per service on first sync due to the name change ## Test plan - [x] `kubectl kustomize` builds all 30 service directories successfully - [x] Image tags verified in rendered output for all modified services - [x] ConfigMap hash suffixes verified in rendered output - [x] ConfigMap references in Deployments/StatefulSets confirmed to use hashed names - [x] All pre-commit hooks pass (yamllint, shellcheck, prettier, etc.) - [ ] `argocd app diff` each service to confirm only expected ConfigMap name changes - [ ] Deploy from branch starting with a low-risk service (e.g., mosquitto) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/264
2026-02-24 14:25:19 -08:00
apiVersion: 1
datasources:
- access: proxy
editable: false
isDefault: true
name: Prometheus
orgId: 1
type: prometheus
uid: prometheus
url: http://prometheus.monitoring.svc.cluster.local:9090
- access: proxy
editable: false
name: Loki
orgId: 1
type: loki
uid: loki
url: http://loki.monitoring.svc.cluster.local:3100
Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286) ## Summary Adds the third observability pillar — **distributed tracing** — alongside existing metrics (Prometheus) and logs (Loki). - **Grafana Tempo 2.10.1** on minikube-indri for trace storage with 7d retention, OTLP receivers, and `metrics_generator` that remote-writes span-metrics (RED) to Prometheus - **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes - **Grafana integration** — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking - **Prometheus** scrapes Tempo operational metrics ### Architecture ``` ringtail (k3s) indri (minikube) ┌──────────────────────┐ ┌─────────────────────┐ │ Alloy+Beyla (eBPF) │──OTLP HTTP────────→ │ Tempo │ │ ↳ Frigate, ntfy, │ via tailnet │ ↳ trace storage │ │ Ollama, Immich │ │ ↳ RED → Prometheus │ └──────────────────────┘ │ │ │ Grafana │ │ ↳ Tempo datasource │ └─────────────────────┘ ``` ### New files (12) - `docs/reference/services/tempo.md` — reference doc - `docs/changelog.d/feature-otel-tracing.feature.md` - `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files) - `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files) ### Modified files (6) - `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields - `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target - `service-versions.yaml` — tempo + alloy-tracing-ringtail entries - `docs/reference/services/grafana.md` — Tempo in datasources table - `docs/reference/reference.md` — Tempo in services index - `docs/reference/operations/observability.md` — Tempo in components list ## Deployment and Testing - [ ] Sync `apps` app to pick up new Application definitions - [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo` - [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo` - [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready` - [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring` - [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail` - [ ] Check Beyla discovery in alloy-tracing logs on ringtail - [ ] Sync grafana-config for updated datasources - [ ] Sync prometheus for updated scrape config - [ ] Test Grafana Tempo datasource connection - [ ] Generate test traffic and search traces in Grafana Explore → Tempo - [ ] After merge: reset all ArgoCD app revisions back to main Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/286
2026-03-05 10:51:07 -08:00
jsonData:
derivedFields:
- datasourceUid: tempo
matcherRegex: '"traceID":"(\w+)"'
name: TraceID
url: "$${__value.raw}"
- access: proxy
editable: false
name: Tempo
orgId: 1
type: tempo
uid: tempo
url: http://tempo.monitoring.svc.cluster.local:3200
jsonData:
tracesToLogsV2:
datasourceUid: loki
filterByTraceID: true
filterBySpanID: false
tracesToMetrics:
datasourceUid: prometheus
spanStartTimeShift: "-1h"
spanEndTimeShift: "1h"
queries:
- name: Request rate
query: "sum(rate(traces_spanmetrics_calls_total{$$__tags}[5m]))"
- name: Error rate
query: "sum(rate(traces_spanmetrics_calls_total{$$__tags, status_code=\"STATUS_CODE_ERROR\"}[5m]))"
- name: Duration (p95)
query: "histogram_quantile(0.95, sum(rate(traces_spanmetrics_duration_seconds_bucket{$$__tags}[5m])) by (le))"
serviceMap:
datasourceUid: prometheus
nodeGraph:
enabled: true
Add kustomize images: and configMapGenerator: across services (#264) ## Summary - Move hardcoded image tags to kustomization.yaml `images:` transformer across **22 services** — image names in manifests become version-agnostic templates, with tags centralized in one place per service - Replace hand-written ConfigMap manifests with `configMapGenerator:` in **12 services** — config data extracted to standalone files, generated ConfigMaps include content hashes that trigger automatic pod rollouts on changes - Create new `kustomization.yaml` for **forgejo-runner** and **nvidia-device-plugin** (switches ArgoCD from directory mode to kustomize mode, rendered output identical) ### Services modified **Images only (8):** cv, devpi, docs, kube-state-metrics, miniflux, navidrome, teslamate, torrent **Images + configMapGenerator (10):** alloy-k8s, forgejo-runner, frigate, grafana, homepage, kiwix, loki, mosquitto, ntfy, prometheus **Images only, no configMapGenerator (4):** authentik (skip blueprints — special YAML tags), tailscale-operator-base (Deployment only, CRD image fields left as-is) **Skipped entirely (6):** argocd (remote upstream), databases (no image fields), external-secrets, grafana-config (cross-kustomization dashboards), immich (Helm-managed), 1password-connect/cloudnative-pg (no kustomization.yaml) ### What changes at deploy time - **images:** — no functional diff, `kustomize build` produces identical output with tags - **configMapGenerator:** — ConfigMap names gain hash suffixes (e.g., `prometheus-config` → `prometheus-config-6f42fhctcb`) and all Deployment/StatefulSet/DaemonSet references are updated automatically. Pods will restart once per service on first sync due to the name change ## Test plan - [x] `kubectl kustomize` builds all 30 service directories successfully - [x] Image tags verified in rendered output for all modified services - [x] ConfigMap hash suffixes verified in rendered output - [x] ConfigMap references in Deployments/StatefulSets confirmed to use hashed names - [x] All pre-commit hooks pass (yamllint, shellcheck, prettier, etc.) - [ ] `argocd app diff` each service to confirm only expected ConfigMap name changes - [ ] Deploy from branch starting with a low-risk service (e.g., mosquitto) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/264
2026-02-24 14:25:19 -08:00
- access: proxy
database: teslamate
editable: false
jsonData:
database: teslamate
connMaxLifetime: 14400
maxIdleConns: 2
maxOpenConns: 5
sslmode: disable
name: TeslaMate
orgId: 1
secureJsonData:
password: $TESLAMATE_DB_PASSWORD
type: postgres
uid: TeslaMate
# teslamate DB migrated to ringtail blumeops-pg (wave-1); reached via the
# Caddy L4 route on indri (pg.ops.eblu.me:5434 -> blumeops-pg-ringtail).
url: pg.ops.eblu.me:5434
Add kustomize images: and configMapGenerator: across services (#264) ## Summary - Move hardcoded image tags to kustomization.yaml `images:` transformer across **22 services** — image names in manifests become version-agnostic templates, with tags centralized in one place per service - Replace hand-written ConfigMap manifests with `configMapGenerator:` in **12 services** — config data extracted to standalone files, generated ConfigMaps include content hashes that trigger automatic pod rollouts on changes - Create new `kustomization.yaml` for **forgejo-runner** and **nvidia-device-plugin** (switches ArgoCD from directory mode to kustomize mode, rendered output identical) ### Services modified **Images only (8):** cv, devpi, docs, kube-state-metrics, miniflux, navidrome, teslamate, torrent **Images + configMapGenerator (10):** alloy-k8s, forgejo-runner, frigate, grafana, homepage, kiwix, loki, mosquitto, ntfy, prometheus **Images only, no configMapGenerator (4):** authentik (skip blueprints — special YAML tags), tailscale-operator-base (Deployment only, CRD image fields left as-is) **Skipped entirely (6):** argocd (remote upstream), databases (no image fields), external-secrets, grafana-config (cross-kustomization dashboards), immich (Helm-managed), 1password-connect/cloudnative-pg (no kustomization.yaml) ### What changes at deploy time - **images:** — no functional diff, `kustomize build` produces identical output with tags - **configMapGenerator:** — ConfigMap names gain hash suffixes (e.g., `prometheus-config` → `prometheus-config-6f42fhctcb`) and all Deployment/StatefulSet/DaemonSet references are updated automatically. Pods will restart once per service on first sync due to the name change ## Test plan - [x] `kubectl kustomize` builds all 30 service directories successfully - [x] Image tags verified in rendered output for all modified services - [x] ConfigMap hash suffixes verified in rendered output - [x] ConfigMap references in Deployments/StatefulSets confirmed to use hashed names - [x] All pre-commit hooks pass (yamllint, shellcheck, prettier, etc.) - [ ] `argocd app diff` each service to confirm only expected ConfigMap name changes - [ ] Deploy from branch starting with a low-risk service (e.g., mosquitto) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/264
2026-02-24 14:25:19 -08:00
user: teslamate