Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack. Summary - Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal) - Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses - Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics - Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net) - Add ACL rule for port 9187 (CNPG metrics) - Delete obsolete ansible roles for prometheus and loki Changes - argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications - argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS - argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint - argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics - ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints - pulumi/policy.hujson - ACL for port 9187 - Deleted ansible/roles/prometheus/ and ansible/roles/loki/ Deployment and Testing - Stop prometheus and loki on indri - Sync ArgoCD apps (apps, prometheus, loki, grafana) - Run mise run provision-indri -- --tags alloy - Verify Grafana dashboards show data 🤖 Generated with https://claude.ai/claude-code Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42
38 lines
1.2 KiB
YAML
38 lines
1.2 KiB
YAML
apiVersion: v1
|
|
kind: ConfigMap
|
|
metadata:
|
|
name: prometheus-config
|
|
namespace: monitoring
|
|
data:
|
|
prometheus.yml: |
|
|
global:
|
|
scrape_interval: 15s
|
|
evaluation_interval: 15s
|
|
|
|
# Indri system metrics are pushed via Alloy remote_write
|
|
# K8s services are scraped directly
|
|
|
|
scrape_configs:
|
|
# Sifaka NAS node-exporter (via LAN - Docker NATs through indri)
|
|
# Using LAN IP since k8s pods can reach LAN via Docker NAT (same as NFS mounts)
|
|
# If IP changes, fallback: create Tailscale egress in tailscale-operator/egress-sifaka.yaml
|
|
- job_name: "node-exporter-sifaka"
|
|
static_configs:
|
|
- targets: ["192.168.1.203:9100"]
|
|
|
|
# CNPG PostgreSQL metrics (k8s internal)
|
|
- job_name: "cnpg-postgres"
|
|
static_configs:
|
|
- targets: ["blumeops-pg-metrics-tailscale.databases.svc.cluster.local:9187"]
|
|
labels:
|
|
instance: "blumeops-pg"
|
|
|
|
# Prometheus self-monitoring
|
|
- job_name: "prometheus"
|
|
static_configs:
|
|
- targets: ["localhost:9090"]
|
|
|
|
# Loki metrics
|
|
- job_name: "loki"
|
|
static_configs:
|
|
- targets: ["loki.monitoring.svc.cluster.local:3100"]
|