Migrate observability stack to Kubernetes (#42)

Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack.

Summary

  - Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal)
  - Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses
  - Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics
  - Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net)
  - Add ACL rule for port 9187 (CNPG metrics)
  - Delete obsolete ansible roles for prometheus and loki

Changes

  - argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress
  - argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress
  - argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications
  - argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS
  - argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint
  - argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics
  - ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints
  - pulumi/policy.hujson - ACL for port 9187
  - Deleted ansible/roles/prometheus/ and ansible/roles/loki/

Deployment and Testing

  - Stop prometheus and loki on indri
  - Sync ArgoCD apps (apps, prometheus, loki, grafana)
  - Run mise run provision-indri -- --tags alloy
  - Verify Grafana dashboards show data

🤖 Generated with https://claude.ai/claude-code

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42
This commit is contained in:
Erich Blume 2026-01-22 12:06:02 -08:00
commit 17023085cb
36 changed files with 569 additions and 270 deletions

View file

@ -74,11 +74,11 @@
"dst": ["tag:homelab"],
"ip": ["tcp:3001", "tcp:2200"],
},
// Homelab can reach k8s PostgreSQL for borgmatic backups
// Homelab can reach k8s services: PostgreSQL, CNPG metrics, Prometheus/Loki
{
"src": ["tag:homelab"],
"dst": ["tag:k8s"],
"ip": ["tcp:5432"],
"ip": ["tcp:443", "tcp:5432", "tcp:9187"],
},
],
@ -141,10 +141,10 @@
"accept": ["tag:kiwix:443", "tag:forge:443", "tag:feed:443", "tag:pg:5432"],
"deny": ["tag:grafana:443", "tag:loki:3100", "tag:nas:445", "tag:registry:443", "tag:k8s-api:443"],
},
// Homelab can reach homelab and NAS
// Homelab can reach homelab, NAS, and k8s services (postgres, metrics, prometheus/loki)
{
"src": "tag:homelab",
"accept": ["tag:homelab:22", "tag:nas:445"],
"accept": ["tag:homelab:22", "tag:nas:445", "tag:k8s:443", "tag:k8s:5432", "tag:k8s:9187"],
},
// K8s workloads can reach registry and forge (on indri:3001 HTTP, :2200 SSH)
{