Migrate observability stack to Kubernetes #42

Merged
eblume merged 11 commits from fix/grafana-datasource-hostname into main 2026-01-22 12:06:03 -08:00

Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack.

Summary

  • Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal)
  • Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses
  • Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics
  • Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net)
  • Add ACL rule for port 9187 (CNPG metrics)
  • Delete obsolete ansible roles for prometheus and loki

Changes

  • argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress
  • argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress
  • argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications
  • argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS
  • argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint
  • argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics
  • ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints
  • pulumi/policy.hujson - ACL for port 9187
  • Deleted ansible/roles/prometheus/ and ansible/roles/loki/
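The Grafana change itself lives in values.yaml and is not reproduced here. The following rough sketch shows what "k8s internal DNS" means for the data source URLs. It assumes Services named prometheus and loki in a monitoring namespace, on their default ports (9090 and 3100), with the default cluster.local domain. All of these names are hypothetical.

```python
# Assumptions (not stated in the PR): services "prometheus" and "loki" in a
# "monitoring" namespace, default ports, and the default cluster.local domain.
def cluster_url(service: str, namespace: str, port: int) -> str:
    """Build an in-cluster URL from Kubernetes service DNS: <svc>.<ns>.svc.cluster.local."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"


# Shape of Grafana provisioned data source entries (name/type/url) as Python data.
DATASOURCES = [
    {"name": "Prometheus", "type": "prometheus",
     "url": cluster_url("prometheus", "monitoring", 9090)},
    {"name": "Loki", "type": "loki",
     "url": cluster_url("loki", "monitoring", 3100)},
]
```

Because these names resolve only inside the cluster, Grafana no longer depends on a host alias to reach services outside minikube.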

Deployment and Testing

  • Stop prometheus and loki on indri
  • Sync ArgoCD apps (apps, prometheus, loki, grafana)
  • Run mise run provision-indri -- --tags alloy
  • Verify Grafana dashboards show data
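The last step is a manual check. A scripted complement could query the standard Prometheus HTTP API for the "up" series and report health per job. This minimal sketch assumes Prometheus is reachable over HTTPS at prometheus.tail8d86e.ts.net through its Tailscale Ingress:

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Tailscale Ingress serves Prometheus over HTTPS on this host.
PROMETHEUS = "https://prometheus.tail8d86e.ts.net"


def query_up(base: str = PROMETHEUS) -> dict:
    """Run the instant query `up` against the Prometheus HTTP API."""
    url = f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def jobs_up(response: dict) -> dict:
    """Map each scrape job to True only if every one of its targets reports up == 1."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    jobs = {}
    for sample in response["data"]["result"]:
        job = sample["metric"].get("job", "<no job>")
        # Instant-vector samples carry [timestamp, "value-as-string"].
        jobs[job] = jobs.get(job, True) and sample["value"][1] == "1"
    return jobs


def main() -> None:
    for job, ok in sorted(jobs_up(query_up()).items()):
        print(f"{job:24} {'up' if ok else 'DOWN'}")
```

Call main() from a machine on the tailnet. A job reported as DOWN is a scrape target worth investigating before looking at the dashboards.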

🤖 Generated with https://claude.ai/claude-code

After the minikube migration from the podman driver to the docker driver, the hostname
host.containers.internal no longer resolves. Use host.minikube.internal,
which is the correct hostname for the docker driver.
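The fix amounts to choosing the host alias by driver. In this tiny illustration, the mapping reflects this commit message and port 9090 is a hypothetical example:

```python
# Host alias used to reach the host machine from inside minikube, per driver,
# as described in this commit. The port passed below is only an example.
HOST_ALIASES = {
    "podman": "host.containers.internal",
    "docker": "host.minikube.internal",
}


def host_alias(driver: str) -> str:
    """Return the host alias for a minikube driver, or raise if it is unknown."""
    try:
        return HOST_ALIASES[driver]
    except KeyError:
        raise ValueError(f"no known host alias for driver {driver!r}") from None


def datasource_url(driver: str, port: int) -> str:
    """Build a data source URL pointing at a service on the host."""
    return f"http://{host_alias(driver)}:{port}"
```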

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Tailscale service exposing CNPG metrics on port 9187 (cnpg-metrics.tail8d86e.ts.net)
- Add Prometheus scrape config for cnpg-postgres job
- Update PostgreSQL dashboard to use CNPG metric names (cnpg_* prefix)
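One way to see which cnpg_* series the dashboard can use is to read the raw exporter output. This minimal sketch assumes the Tailscale service serves plain HTTP at cnpg-metrics.tail8d86e.ts.net:9187 on the usual /metrics path:

```python
import urllib.request

# Host and port come from this commit; plain HTTP and the /metrics path are assumptions.
CNPG_METRICS = "http://cnpg-metrics.tail8d86e.ts.net:9187/metrics"


def metric_names(exposition: str, prefix: str = "cnpg_") -> set:
    """Collect metric names with the given prefix from Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or whitespace (value).
        name = line.split("{", 1)[0].split(None, 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return names


def main() -> None:
    with urllib.request.urlopen(CNPG_METRICS, timeout=10) as resp:
        text = resp.read().decode()
    for name in sorted(metric_names(text)):
        print(name)
```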

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add tcp:9187 to the tag:homelab → tag:k8s ACL rule so that Prometheus
can scrape the CloudNativePG metrics endpoint.
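Assuming the rule uses Tailscale's grants syntax (where tcp:9187 appears in an "ip" list), a deliberately simplified check can model its effect. This is not Tailscale's policy engine, which also handles groups, autogroups, and port ranges. The tcp:443 entry is hypothetical.

```python
# Simplified model of one grant; only tcp:9187 comes from this commit.
GRANT = {
    "src": ["tag:homelab"],
    "dst": ["tag:k8s"],
    "ip": ["tcp:443", "tcp:9187"],
}


def allows(grant: dict, src_tag: str, dst_tag: str, proto: str, port: int) -> bool:
    """Return True if the grant lets src_tag reach dst_tag on proto/port."""
    if src_tag not in grant["src"] or dst_tag not in grant["dst"]:
        return False
    for spec in grant["ip"]:
        if spec == "*":
            return True
        spec_proto, _, spec_port = spec.partition(":")
        if spec_proto == proto and spec_port in ("*", str(port)):
            return True
    return False
```

Grants are directional, so this rule opens 9187 from tag:homelab to tag:k8s but not in the reverse direction.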

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major observability stack migration:
- Deploy Prometheus in k8s with 20Gi PVC, Tailscale Ingress
- Deploy Loki in k8s with 20Gi PVC, Tailscale Ingress
- Update Grafana to use k8s-internal endpoints for data sources
- Update Alloy on indri to push to k8s via Tailscale endpoints
- Prometheus scrapes sifaka via LAN IP (Docker NAT, same as NFS)
- Deprecate ansible prometheus/loki roles

Alloy on indri continues to collect:
- System metrics (via prometheus.exporter.unix)
- Textfile metrics (borgmatic, plex)
- Logs (forgejo, tailscale, borgmatic, zot, plex)
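Textfile metrics are .prom files that the unix exporter picks up from a directory. The following sketch shows how a job such as borgmatic could publish one. The directory and metric name are hypothetical. Writing to a temporary file and then renaming it keeps Alloy from reading a partially written file.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical directory; it must match the textfile collector directory
# configured for prometheus.exporter.unix in Alloy.
TEXTFILE_DIR = Path.home() / ".local/share/alloy/textfile"


def write_textfile_metric(directory: Path, name: str, help_text: str, value: float) -> Path:
    """Atomically write one gauge in Prometheus text format to <directory>/<name>.prom."""
    directory.mkdir(parents=True, exist_ok=True)
    body = f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{name} {value}\n"
    # Write to a temp file in the same directory, then rename over the target,
    # so the collector never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(body)
    target = directory / f"{name}.prom"
    os.replace(tmp, target)
    return target
```

A backup hook might call write_textfile_metric(TEXTFILE_DIR, "borgmatic_last_success_timestamp_seconds", "Unix time of the last successful run.", time.time()) after a successful run, where the metric name is again hypothetical.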

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
eblume force-pushed fix/grafana-datasource-hostname from b7f5988ea7 to e4b11a9c25 2026-01-22 07:51:40 -08:00
eblume force-pushed fix/grafana-datasource-hostname from e4b11a9c25 to 7633a9b7a4 2026-01-22 07:52:05 -08:00
eblume changed title from Fix Grafana datasource URLs for docker driver to Migrate observability stack to Kubernetes 2026-01-22 08:31:42 -08:00
A CGO-enabled build is required so that Alloy uses the macOS native DNS resolver (needed for Tailscale MagicDNS).
The Homebrew bottle is built with CGO_ENABLED=0, which makes it use Go's pure-Go DNS resolver;
that resolver doesn't respect /etc/resolver/* on macOS.
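Go 1.18 and later embed build settings in the binary. One way to confirm that ~/.local/bin/alloy was built with cgo is therefore to inspect the output of "go version -m". This sketch assumes a Go toolchain is installed on indri:

```python
import subprocess
from pathlib import Path
from typing import Optional

ALLOY = Path.home() / ".local/bin/alloy"


def cgo_enabled(go_version_m_output: str) -> Optional[bool]:
    """Return the CGO_ENABLED build setting from `go version -m` output, or None if absent."""
    for line in go_version_m_output.splitlines():
        fields = line.split()
        # Build settings appear as lines like "\tbuild\tCGO_ENABLED=1".
        if len(fields) == 2 and fields[0] == "build" and fields[1].startswith("CGO_ENABLED="):
            return fields[1].split("=", 1)[1] == "1"
    return None


def main() -> None:
    out = subprocess.run(
        ["go", "version", "-m", str(ALLOY)],
        capture_output=True, text=True, check=True,
    ).stdout
    print("built with cgo" if cgo_enabled(out) else "not built with cgo (or setting not recorded)")
```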

- Remove Homebrew installation, use ~/.local/bin/alloy
- Add LaunchAgent plist (mcquack.eblume.alloy)
- Update config paths to ~/.config/grafana-alloy
- Add build instructions in defaults/main.yml
- Add alloy's own logs to mcquack_logs collection
- Remove checks for local prometheus/loki/grafana (now in k8s)
- Update alloy check to use launchctl (no longer brew service)
- Add k8s pod health checks for monitoring stack
- Update HTTP endpoints to use Tailscale URLs
- Reorganize sections for clarity
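The new pod health checks could look roughly like the following sketch. It reads the output of "kubectl get pods -o json" and flags any pod that is not Running with all containers ready. The monitoring namespace is an assumption.

```python
import json
import subprocess

# Hypothetical namespace; the PR does not say where the monitoring pods run.
NAMESPACE = "monitoring"


def unhealthy_pods(pod_list: dict) -> list:
    """Return names of pods that are not Running or have a container that is not ready."""
    bad = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        ready = bool(containers) and all(c.get("ready") for c in containers)
        if status.get("phase") != "Running" or not ready:
            bad.append(pod["metadata"]["name"])
    return bad


def main(namespace: str = NAMESPACE) -> None:
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    bad = unhealthy_pods(json.loads(out))
    print("all pods healthy" if not bad else "unhealthy: " + ", ".join(bad))
```

Pods from completed Jobs (phase Succeeded) would also be flagged by this simple rule.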
Shows app name, sync status, health, and revision (truncated to 7 chars)
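The following sketch produces that summary by reading Argo CD Application resources through kubectl. It assumes the argocd namespace, which is the default for a standard install. Multi-source applications, which report revisions differently, are not handled.

```python
import json
import subprocess

ARGOCD_NAMESPACE = "argocd"  # assumption: default Argo CD install namespace


def summarize(app_list: dict) -> list:
    """Return (name, sync status, health, revision[:7]) for each Argo CD Application."""
    rows = []
    for app in app_list.get("items", []):
        status = app.get("status", {})
        sync = status.get("sync", {})
        rows.append((
            app["metadata"]["name"],
            sync.get("status", "Unknown"),
            status.get("health", {}).get("status", "Unknown"),
            sync.get("revision", "")[:7],  # short commit SHA
        ))
    return rows


def main() -> None:
    out = subprocess.run(
        ["kubectl", "get", "applications.argoproj.io", "-n", ARGOCD_NAMESPACE, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name, sync, health, rev in summarize(json.loads(out)):
        print(f"{name:<20} {sync:<10} {health:<12} {rev}")
```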
The previous replace_all edit corrupted the variable definition from
"kubectl" to "$KUBECTL", causing an unbound variable error.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add HOME and PATH environment variables to the LaunchAgent plist.
Minikube needs HOME to find its config files (~/.minikube/) and
PATH to find docker for status checks.
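The following sketch renders such a plist with Python's plistlib. The label, program arguments, and PATH entries are hypothetical. Only the EnvironmentVariables key (HOME and PATH) reflects this commit.

```python
import plistlib
from pathlib import Path

HOME = str(Path.home())


def launch_agent(label: str, program: list, path_dirs: list) -> bytes:
    """Render a LaunchAgent plist that sets HOME and PATH explicitly for the job.

    Per this commit, minikube needs HOME to find ~/.minikube/ and PATH to find docker.
    """
    return plistlib.dumps({
        "Label": label,
        "ProgramArguments": program,
        "EnvironmentVariables": {"HOME": HOME, "PATH": ":".join(path_dirs)},
        "RunAtLoad": True,
    })
```

The result would be written to ~/Library/LaunchAgents/<label>.plist and loaded with launchctl.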

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
eblume merged commit 17023085cb into main 2026-01-22 12:06:03 -08:00

Reference
eblume/blumeops!42