Migrate observability stack to Kubernetes #42

Merged
eblume merged 11 commits from fix/grafana-datasource-hostname into main 2026-01-22 12:06:03 -08:00

11 commits

Author SHA1 Message Date
bd8ac77d67 Fix minikube-metrics LaunchAgent environment
Add HOME and PATH environment variables to the LaunchAgent plist.
Minikube needs HOME to find its config files (~/.minikube/) and
PATH to find docker for status checks.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 12:04:57 -08:00
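
For illustration, a LaunchAgent definition carrying HOME and PATH could be generated as sketched below. This is a minimal sketch, not the plist from this commit: the label, program path, home directory, and PATH value are all hypothetical placeholders. `EnvironmentVariables` is the standard launchd key for this purpose.

```python
import plistlib


def build_launch_agent(label, program_args, home, path):
    """Return a launchd job dictionary that sets HOME and PATH explicitly."""
    return {
        "Label": label,
        "ProgramArguments": list(program_args),
        # launchd starts jobs with a minimal environment, so both
        # variables are set explicitly here.
        "EnvironmentVariables": {"HOME": home, "PATH": path},
        "RunAtLoad": True,
    }


# All values below are hypothetical placeholders, not taken from the repository.
agent = build_launch_agent(
    label="example.minikube-metrics",
    program_args=["/usr/local/bin/minikube-metrics"],
    home="/Users/example",
    path="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin",
)

# plistlib.dumps returns XML bytes by default.
plist_xml = plistlib.dumps(agent).decode("utf-8")
print(plist_xml)
```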
9c7cdf2481 Fix KUBECTL variable definition in minikube-metrics script
The previous replace_all edit corrupted the variable definition from
"kubectl" to "$KUBECTL", causing an unbound variable error.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 11:43:46 -08:00
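
The failure mode described in this commit can be reproduced on a toy script. The sketch below assumes a hypothetical script body (the real minikube-metrics script is not shown here) and compares a blanket replace-all with a variant that skips the line defining the variable.

```python
import re

# Hypothetical stand-in for the real script.
SCRIPT = '''#!/bin/bash
set -u
KUBECTL="kubectl"
kubectl get nodes
kubectl get pods -A
'''


def naive_replace_all(text):
    """Replace every occurrence, including the one inside the definition."""
    return text.replace("kubectl", "$KUBECTL")


def guarded_replace(text):
    """Replace call sites only; leave the KUBECTL= definition line untouched."""
    out = []
    for line in text.splitlines(keepends=True):
        if re.match(r"\s*KUBECTL=", line):
            out.append(line)
        else:
            out.append(re.sub(r"\bkubectl\b", "$KUBECTL", line))
    return "".join(out)


print(naive_replace_all(SCRIPT))
print(guarded_replace(SCRIPT))
```

In the naive result the definition becomes `KUBECTL="$KUBECTL"`, which references the variable before it is set; under `set -u` bash reports that as an unbound variable.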
7ee0410f27 Show target branch instead of commit hash in ArgoCD status
2026-01-22 11:28:34 -08:00
f6a15745bd Add ArgoCD sync status to services check
Shows app name, sync status, health, and revision (truncated to 7 chars)
2026-01-22 11:27:31 -08:00
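
A status line of this kind could be derived from the JSON of ArgoCD Application resources roughly as follows. This is a sketch under assumptions: the sample document and app name are made up, and the actual check script is not shown in this PR page. The field paths (`status.sync.status`, `status.health.status`, `status.sync.revision`) are those of the ArgoCD Application resource.

```python
import json

# Hypothetical sample shaped like a list of ArgoCD Application resources.
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "example-app"},
      "status": {
        "sync": {"status": "Synced",
                 "revision": "0123456789abcdef0123456789abcdef01234567"},
        "health": {"status": "Healthy"}
      }
    }
  ]
}
""")


def summarize_app(app):
    """Extract name, sync status, health, and a 7-character revision."""
    status = app.get("status", {})
    sync = status.get("sync", {})
    return {
        "name": app.get("metadata", {}).get("name", "?"),
        "sync": sync.get("status", "Unknown"),
        "health": status.get("health", {}).get("status", "Unknown"),
        "revision": sync.get("revision", "")[:7],
    }


def format_apps(doc):
    """Render one line per application."""
    rows = [summarize_app(app) for app in doc.get("items", [])]
    return [f"{r['name']}: {r['sync']} / {r['health']} @ {r['revision']}" for r in rows]


for line in format_apps(SAMPLE):
    print(line)
```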
b457e45d9a Update indri-services-check for k8s observability stack
- Remove checks for local prometheus/loki/grafana (now in k8s)
- Update alloy check to use launchctl (no longer brew service)
- Add k8s pod health checks for monitoring stack
- Update HTTP endpoints to use Tailscale URLs
- Reorganize sections for clarity
2026-01-22 11:25:42 -08:00
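
One way a pod health check for the monitoring stack might work is to inspect the JSON form of a pod list. The sketch below is an assumption about the approach, not the code from indri-services-check; the pod names are hypothetical, while `status.phase` and `status.containerStatuses[].ready` are standard Kubernetes pod status fields.

```python
def pod_is_healthy(pod):
    """A pod counts as healthy if it is Running with every container ready."""
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    containers = status.get("containerStatuses", [])
    return bool(containers) and all(c.get("ready") for c in containers)


def unhealthy_pods(pod_list):
    """Return the names of pods that fail the health test."""
    return [
        pod.get("metadata", {}).get("name", "?")
        for pod in pod_list.get("items", [])
        if not pod_is_healthy(pod)
    ]


# Hypothetical pod list; names are placeholders.
SAMPLE_PODS = {
    "items": [
        {"metadata": {"name": "prometheus-0"},
         "status": {"phase": "Running", "containerStatuses": [{"ready": True}]}},
        {"metadata": {"name": "loki-0"},
         "status": {"phase": "Running", "containerStatuses": [{"ready": False}]}},
        {"metadata": {"name": "grafana-0"},
         "status": {"phase": "Pending"}},
    ]
}

print(unhealthy_pods(SAMPLE_PODS))
```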
3f9d4aefce Switch Alloy from Homebrew to source-built binary with LaunchAgent
A CGO-enabled build is required to use the macOS native DNS resolver (needed for Tailscale MagicDNS).
The Homebrew bottle is built with CGO_ENABLED=0, so it uses Go's pure-Go DNS resolver,
which doesn't respect /etc/resolver/* on macOS.

- Remove Homebrew installation, use ~/.local/bin/alloy
- Add LaunchAgent plist (mcquack.eblume.alloy)
- Update config paths to ~/.config/grafana-alloy
- Add build instructions in defaults/main.yml
- Add alloy's own logs to mcquack_logs collection
2026-01-22 10:52:13 -08:00
45519f2cd2 Add port 443 to homelab->k8s ACL for Prometheus/Loki
2026-01-22 10:33:18 -08:00
7633a9b7a4 Migrate Prometheus and Loki to Kubernetes
Major observability stack migration:
- Deploy Prometheus in k8s with 20Gi PVC, Tailscale Ingress
- Deploy Loki in k8s with 20Gi PVC, Tailscale Ingress
- Update Grafana to use k8s-internal endpoints for data sources
- Update Alloy on indri to push to k8s via Tailscale endpoints
- Prometheus scrapes sifaka via LAN IP (Docker NAT, same as NFS)
- Deprecate ansible prometheus/loki roles

Alloy on indri continues to collect:
- System metrics (via prometheus.exporter.unix)
- Textfile metrics (borgmatic, plex)
- Logs (forgejo, tailscale, borgmatic, zot, plex)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 07:52:00 -08:00
74c218063d Allow homelab to scrape CNPG metrics on port 9187
Add tcp:9187 to tag:homelab → tag:k8s ACL rule for Prometheus
to scrape CloudNativePG metrics endpoint.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 07:17:07 -08:00
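
The "tcp:9187" notation matches the `ip` field of a Tailscale grant, so the change may look something like the sketch below. The policy file itself is not shown in this PR page, so the rule shape and the pre-existing port entry are assumptions; only the tags and port 9187 come from the commit message.

```python
import json


def allow_port(grant, port_spec):
    """Return a copy of the grant with port_spec added to its "ip" list."""
    ports = list(grant.get("ip", []))
    if port_spec not in ports:
        ports.append(port_spec)
    return {**grant, "ip": ports}


# Assumed starting rule; "tcp:80" is a hypothetical pre-existing entry.
grant = {"src": ["tag:homelab"], "dst": ["tag:k8s"], "ip": ["tcp:80"]}

updated = allow_port(grant, "tcp:9187")
print(json.dumps(updated, indent=2))
```

Adding the same port twice leaves the rule unchanged, so the helper is safe to re-run.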
329f58499b Add CNPG metrics collection for PostgreSQL dashboard
- Add Tailscale service exposing CNPG metrics on port 9187 (cnpg-metrics.tail8d86e.ts.net)
- Add Prometheus scrape config for cnpg-postgres job
- Update PostgreSQL dashboard to use CNPG metric names (cnpg_* prefix)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 22:03:28 -08:00
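
A scrape job for this endpoint might be rendered along these lines. The job name, hostname, and port come from the commit message; the use of `static_configs` and the default metrics path are assumptions, since the actual Prometheus configuration is not shown here.

```python
def scrape_job_yaml(job_name, target):
    """Render a minimal Prometheus scrape job as a YAML list item."""
    return (
        f"- job_name: {job_name}\n"
        f"  static_configs:\n"
        f"    - targets: ['{target}']\n"
    )


job = scrape_job_yaml("cnpg-postgres", "cnpg-metrics.tail8d86e.ts.net:9187")
print(job)
```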
0c9c306917 Fix Grafana datasource URLs for docker driver
After the minikube migration from the podman driver to the docker driver, the hostname
host.containers.internal no longer resolves. Use host.minikube.internal,
which is the correct hostname for the docker driver.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 21:46:28 -08:00
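
The hostname swap can be expressed as a small URL rewrite. This is a sketch only: the two hostnames come from the commit message, but the example URLs and ports are hypothetical, and the real change was presumably made directly in the Grafana datasource configuration.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "host.containers.internal"  # hostname used with the podman driver
NEW_HOST = "host.minikube.internal"    # hostname used with the docker driver


def rewrite_host(url):
    """Swap the old host for the new one, keeping scheme, port, and path."""
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    netloc = NEW_HOST if parts.port is None else f"{NEW_HOST}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Hypothetical datasource URLs; the ports are placeholders.
for url in ("http://host.containers.internal:9090",
            "http://host.containers.internal:3100/loki"):
    print(rewrite_host(url))
```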