blumeops

Author	SHA1	Message	Date
Erich Blume	85e36cd807	Operations and observability for sifaka NAS (#135 ) ## Summary - Add `smartctl_exporter` Docker container to sifaka for SMART disk health monitoring - Formalize existing `node_exporter` container under Ansible management - Route both exporters through Caddy L4 TCP proxy (`nas.ops.eblu.me:9100`, `nas.ops.eblu.me:9633`), replacing the hardcoded LAN IP in Prometheus - Create "Sifaka Disk Health" Grafana dashboard (health status, temperature, wear indicators, lifetime) - Introduce `ansible/playbooks/sifaka.yml` and `mise run provision-sifaka` — first Ansible playbook for the NAS - Shared exporter port variables in `group_vars/all.yml` to avoid duplication between Caddy and sifaka roles ## Prerequisites before deploy - [ ] Enable SSH on sifaka (DSM Control Panel > Terminal & SNMP) - [ ] Verify `ssh eblume@sifaka 'docker ps'` works - [ ] Run `mise run provision-sifaka` to deploy containers - [ ] Run `mise run provision-indri -- --tags caddy` to add L4 routes - [ ] `argocd app sync prometheus` + `argocd app sync grafana-config` ## Test plan - [ ] Verify smartctl_exporter metrics: `curl http://nas.ops.eblu.me:9633/metrics` - [ ] Verify Prometheus targets page shows both sifaka jobs as UP - [ ] Verify Grafana "Sifaka Disk Health" dashboard loads with data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/135	2026-02-09 17:44:05 -08:00
Erich Blume	3415cad38c	Log real client IPs via Fly-Client-IP header (#130 ) ## Summary - Add `client_ip` field to the Fly.io nginx JSON log format, sourced from `Fly-Client-IP` header - Extract `client_ip` in the Alloy pipeline so it's available as a parsed field in Loki - Keeps `remote_addr` (the internal proxy IP) for debugging Fixes: Grafana access logs for docs.eblu.me showing 172.16.11.178 for every request instead of real visitor IPs. ## Deployment and Testing - [ ] Deploy updated fly.io proxy: `fly deploy` from `fly/` directory - [ ] Verify in Grafana that new log lines include `client_ip` with real IPs - [ ] Confirm `remote_addr` still shows the proxy IP (preserved for debugging) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/130	2026-02-09 11:02:06 -08:00
Erich Blume	e6cf7e47e0	Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126 ) ## Summary - Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy - Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test - Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses - Switch Alloy push endpoints from `.ops.eblu.me` (Caddy) to `.tail8d86e.ts.net` (Tailscale Ingress) - Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly ## Manual step (not in PR) Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes. ## Deployment order 1. Pulumi ACLs — `mise run tailnet-preview && mise run tailnet-up` 2. OAuth client — Manual update in Tailscale admin console 3. K8s Ingresses — `argocd app sync apps && argocd app sync docs loki prometheus` 4. Fly.io proxy — `mise run fly-deploy` 5. Verify — `mise run services-check`, check Grafana dashboards ## Test plan - [ ] `mise run tailnet-preview` shows clean diff - [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions - [ ] After deploy: Grafana dashboards show continued log/metric flow - [ ] `curl -sf https://docs.eblu.me` returns 200 - [ ] `mise run services-check` passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126	2026-02-08 21:54:18 -08:00
Erich Blume	cc54b4f565	Add Fly.io proxy observability via embedded Alloy (#123 ) ## Summary - Embed Grafana Alloy in the Fly.io proxy container to collect nginx JSON access logs (→ Loki) and derive request rate, latency histogram, cache status, and bandwidth metrics (→ Prometheus) - Add nginx `stub_status` endpoint for connection-level metrics (active/reading/writing/waiting) - Create two Grafana dashboards: Docs APM (per-service view filtered by `host="docs.eblu.me"`) and Fly.io Proxy Health (aggregate proxy health across all upstream services) ## Changed Files \| File \| Change \| \|------\|--------\| \| `fly/nginx.conf` \| Add JSON `log_format` + `access_log`, add `stub_status` endpoint \| \| `fly/Dockerfile` \| COPY Alloy binary from `grafana/alloy:v1.5.1`, COPY `alloy.river` config \| \| `fly/alloy.river` \| New — Alloy config: log tailing, metric extraction, remote_write \| \| `fly/start.sh` \| Start Alloy after Tailscale, before nginx \| \| `argocd/manifests/grafana-config/dashboards/configmap-docs-apm.yaml` \| New — Docs APM dashboard \| \| `argocd/manifests/grafana-config/dashboards/configmap-flyio.yaml` \| New — Fly.io Proxy Health dashboard \| \| `argocd/manifests/grafana-config/kustomization.yaml` \| Register new dashboard configmaps \| \| `docs/reference/services/flyio-proxy.md` \| Document observability setup \| ## Deployment and Testing - [ ] `mise run fly-deploy` — rebuild container with Alloy - [ ] `curl https://docs.eblu.me/` — generate traffic - [ ] `fly logs -a blumeops-proxy` — verify Alloy startup - [ ] Query Prometheus: `flyio_nginx_http_requests_total{instance="flyio-proxy"}` - [ ] Query Loki: `{instance="flyio-proxy", job="flyio-nginx"}` - [ ] `argocd app sync grafana-config` — deploy dashboards - [ ] Verify dashboards show data in Grafana - [ ] `mise run services-check` — no regressions Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/123	2026-02-08 10:05:38 -08:00
Erich Blume	737371ab59	Add pod state observability to minikube dashboard (#83 ) ## Summary - Add "Unhealthy Pods" stat panel showing count of pods in error states (ImagePullBackOff, CrashLoopBackOff, etc.) with red background when > 0 - Add "Pods by Waiting Reason" time series chart showing container waiting states over time - Provides visibility into stuck pods that ArgoCD doesn't track (since it manages CronJobs, not the Jobs/Pods they spawn) ## Context This addresses the issue where a `zim-watcher` cronjob pod was stuck in `ImagePullBackOff` for 11 days without any alerting. ArgoCD showed the CronJob as "Synced, Healthy" because it only manages the CronJob resource, not its spawned Jobs/Pods. ## Deployment and Testing - [ ] Sync grafana-config app to test branch - [ ] Verify dashboard renders correctly - [ ] Confirm "Unhealthy Pods" shows 0 (green) when no issues - [ ] Reset to main after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/83	2026-02-03 07:20:05 -08:00
Erich Blume	4d97ac4c26	Expand homepage widgets and info panels (#81 ) ## Summary - Add greeting and datetime info widgets to homepage header - Add Miniflux widget showing unread/read counts (via existing API key in 1Password) - Add Grafana widget showing dashboards/datasources/alerts (via existing credentials in 1Password) - Add ArgoCD to bookmarks section - Add TODO comments for widgets needing additional setup (Forgejo, Caddy, UniFi, Glances, Navidrome, Transmission, Immich) ## Deployment and Testing - [ ] Sync homepage app to deploy new ExternalSecrets - [ ] Verify greeting and datetime appear in header - [ ] Verify Miniflux widget shows unread/read counts - [ ] Verify Grafana widget shows dashboard stats - [ ] Check that services without credentials still display (just without widgets) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/81	2026-02-02 16:11:20 -08:00
Erich Blume	b8b33b76c8	Remove Plex media server (#78 ) ## Summary - Remove plex_metrics ansible role - Remove Plex Grafana dashboard - Remove Plex log collection from Alloy config - Update indri-services-check to check Jellyfin instead of Plex ## Deployment and Testing - [x] Unloaded plex-metrics LaunchAgent on indri - [x] Deleted plex-metrics plist and script - [x] Deleted plex.prom textfile - [ ] Deploy Alloy config update - [ ] Sync grafana-config to remove dashboard 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/78	2026-01-30 17:06:00 -08:00
Erich Blume	bcc8685316	Add Jellyfin media server deployment (#77 ) ## Summary - Add Jellyfin ansible role for native macOS deployment via Homebrew cask - Add jellyfin_metrics role for Prometheus textfile metrics collection - Add Caddy routing for jellyfin.ops.eblu.me - Add Alloy log collection for Jellyfin stdout/stderr - Add Grafana dashboard for Jellyfin monitoring ## Architecture Jellyfin runs natively on indri (not in k8s) for full VideoToolbox hardware transcoding support. The M1 Mac Mini can handle ~3 concurrent 4K HDR→SDR transcoding streams. ## Deployment and Testing - [ ] Deploy Jellyfin: `mise run provision-indri -- --tags jellyfin,jellyfin_metrics,caddy,alloy` - [ ] Sync Grafana dashboard: `argocd app sync grafana-config` - [ ] Complete Jellyfin setup wizard at https://jellyfin.ops.eblu.me - [ ] Generate API key and save to `~/.jellyfin-api-key` - [ ] Add media libraries (/Volumes/allisonflix/Movies, /Volumes/allisonflix/TV) - [ ] Enable VideoToolbox hardware transcoding - [ ] Verify metrics in Grafana dashboard - [ ] Verify logs in Loki: `{service="jellyfin"}` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/77	2026-01-30 16:57:26 -08:00
Erich Blume	38538ad5f0	Replace hajimari with gethomepage (#75 ) ## Summary - Remove hajimari (unmaintained since Oct 2022, broken helm deps) - Add gethomepage (28k stars, actively maintained, monthly releases) - Migrate custom apps, bookmarks, and search config - Enable k8s RBAC for service autodiscovery - Configure Tailscale ingress at go.tail8d86e.ts.net ## Why the switch Hajimari hasn't released since October 2022. The helm chart has a broken dependency (bjw-s/common URL is 404), and unreleased code on main has bugs. gethomepage has similar k8s autodiscovery via ingress annotations and is very actively maintained. ## Deployment and Testing - [ ] Delete hajimari app from ArgoCD - [ ] Delete hajimari namespace - [ ] Sync apps to pick up new homepage app - [ ] Sync homepage app - [ ] Verify go.ops.eblu.me loads 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/75	2026-01-30 13:21:12 -08:00
Erich Blume	316a4c4e42	Shorten Hajimari info descriptions and hide URLs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 16:34:46 -08:00
Erich Blume	d1164c8aac	Add Hajimari service dashboard (#73 ) ## Summary - Add Hajimari as a service dashboard/start page at `go.ops.eblu.me` - Auto-discovers k8s services from ingress annotations - Custom apps for non-k8s services: Forgejo, Registry, Sifaka NAS - Add `nas.ops.eblu.me` Caddy proxy to Synology dashboard ## Services Configured Auto-discovered (k8s ingresses with hajimari.io annotations): - Grafana, ArgoCD, Prometheus, Loki (Observability) - Miniflux, Kiwix, Transmission, TeslaMate, Immich (Apps) - PyPI/devpi (Infrastructure) Custom apps (non-k8s): - Forgejo (forge.ops.eblu.me) - Registry (registry.ops.eblu.me) - Sifaka NAS (nas.ops.eblu.me) Bookmarks: - Tailscale Admin, 1Password, Pulumi ## Deployment and Testing - [ ] Sync `apps` application to pick up new Hajimari Application - [ ] Sync `hajimari` application - [ ] Run `mise run provision-indri -- --tags caddy` for go/nas proxy entries - [ ] Re-sync all k8s apps with hajimari annotations (or wait for natural drift) - [ ] Verify https://go.ops.eblu.me shows dashboard with all services 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/73	2026-01-29 15:51:42 -08:00
Erich Blume	0d8eb651d4	Fix XID Age graph to show threshold context (#69 ) ## Summary - Add fixed Y-axis (0-220M) so the 200M autovacuum threshold is always visible - Add dashed threshold lines at 150M (yellow warning) and 200M (red danger) - Update title to clarify the threshold ## Context The raw XID age naturally trends upward between vacuum freezes, which looked alarming without context. Current values (~143K-216K) are at 0.1% of the threshold - completely healthy. ## Deployment and Testing - [ ] Sync grafana-config app to feature branch - [ ] Verify threshold lines appear on PostgreSQL dashboard 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/69	2026-01-29 07:08:21 -08:00
Erich Blume	0604877db2	Add 'Tesla' prefix to all TeslaMate dashboard titles (#68 ) ## Summary - Renamed all 18 TeslaMate Grafana dashboards to include "Tesla" prefix - Improves organization and discoverability in the dashboard list ## Deployment and Testing - [ ] Sync grafana-config app: `argocd app set grafana-config --revision feature/rename-tesla-dashboards && argocd app sync grafana-config` - [ ] Verify dashboards display with "Tesla" prefix in Grafana 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/68	2026-01-29 06:55:44 -08:00
Erich Blume	9114aac8f6	Switch all ExternalSecrets to creationPolicy: Owner ESO now has full ownership of these secrets. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 20:27:16 -08:00
Erich Blume	dd6cf20d51	Remove obsolete secret templates - Delete 13 .yaml.tpl files replaced by ExternalSecrets - Update immich/README.md with direct CNPG secret copy instructions - Update miniflux/README.md with context flag and ESO note Only 1password-connect/secret-credentials.yaml.tpl remains (bootstrap). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 20:26:37 -08:00
Erich Blume	351528474c	Add ExternalSecrets for remaining k8s secrets Migrate 10 secret templates to ESO ExternalSecrets with 1Password Connect: - databases: eblume, borgmatic, teslamate passwords - tailscale-operator: OAuth client credentials - grafana-config: admin password, teslamate datasource - teslamate: db password, encryption key - forgejo-runner: runner registration token - argocd: forge SSH credentials All use creationPolicy: Merge for safe migration from existing secrets. Skipped: - miniflux/secret-db: Uses CNPG secret, not 1Password directly - immich/secret-db: Requires 1Password item creation first - 1password-connect: Bootstrap secret, must stay as template Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 19:50:38 -08:00
Erich Blume	272ddb213b	Add TeslaMate deployment for Tesla Model Y data logging (#47 ) ## Summary - Add TeslaMate k8s deployment with Tailscale ingress at tesla.tail8d86e.ts.net - Add teslamate user to CloudNativePG blumeops-pg cluster - Add TeslaMate PostgreSQL datasource to Grafana - Import 18 TeslaMate Grafana dashboards for charging, drives, efficiency, etc. - Add teslamate database to borgmatic backup configuration ## Deployment and Testing - [ ] Create 1Password items: "TeslaMate DB Password" and "TeslaMate Encryption Key" - [ ] Apply database user secret: `op inject -i argocd/manifests/databases/secret-teslamate.yaml.tpl \| kubectl apply -f -` - [ ] Sync blumeops-pg: `argocd app sync blumeops-pg` - [ ] Create teslamate database - [ ] Apply teslamate secrets (encryption key, db connection) - [ ] Apply Grafana datasource secret: `op inject -i argocd/manifests/grafana-config/secret-teslamate-datasource.yaml.tpl \| kubectl apply -f -` - [ ] Sync apps and teslamate: `argocd app sync apps teslamate grafana grafana-config` - [ ] Complete Tesla API OAuth flow at https://tesla.tail8d86e.ts.net - [ ] Verify data collection starts - [ ] Verify Grafana dashboards show data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/47	2026-01-22 21:25:44 -08:00
Erich Blume	57bf8512dc	Log filtering cleanup and observability improvements (#45 ) ## Summary - Suppress noisy storage-provisioner Endpoints deprecation warning (upstream minikube issue) - Disable thermal collector on indri Alloy (not supported on macOS M1) - Add macOS power/thermal metrics collection via powermetrics LaunchDaemon - Add Power & Thermal section to macOS Grafana dashboard - Add logfmt parser for k8s log level extraction (Loki, Prometheus, etc.) - Extract more fields from JSON logs (zot compatibility - uses "message" not "msg") - Silence logfmt parse errors for non-logfmt logs - Fix JSON escaping in devpi dashboard ## Deployment and Testing - [x] Deployed Alloy config changes to indri via ansible - [x] Synced alloy-k8s and grafana-config via ArgoCD - [x] Verified power metrics appearing in Prometheus - [x] Verified thermal collector errors stopped - [x] Verified logfmt parse errors silenced - [x] Verified devpi dashboard loads correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/45	2026-01-22 17:30:08 -08:00
Erich Blume	e4a8405de7	Observability cleanup and k8s service monitoring (#43 ) (#43 ) ## Summary - Remove stale `/opt/homebrew/var/loki` from borgmatic backup (Loki migrated to k8s) - Add Alloy k8s DaemonSet for automatic pod log collection with auto-discovery - Add blackbox probes for miniflux, kiwix, transmission, devpi, argocd - Add transmission-exporter sidecar for full metrics (speed, torrent counts, ratios) - Replace stale devpi dashboard with probe-based metrics (status, response time, uptime) - Add unified "K8s Services Health" dashboard for service uptime/response monitoring ## Manual cleanup already performed - Deleted stale textfile metrics on indri: `devpi.prom`, `transmission.prom` - Deleted stale data directories on indri: `/opt/homebrew/var/loki/`, `/opt/homebrew/var/prometheus/` ## Deployment and Testing - [x] Sync `apps` application to pick up new alloy-k8s app - [x] Deploy alloy-k8s on feature branch: `argocd app set alloy-k8s --revision feature/observability-cleanup && argocd app sync alloy-k8s` - [x] Deploy torrent on feature branch (for transmission exporter): `argocd app set torrent --revision feature/observability-cleanup && argocd app sync torrent` - [x] Deploy prometheus on feature branch (for new scrape config): `argocd app set prometheus --revision feature/observability-cleanup && argocd app sync prometheus` - [x] Deploy grafana-config on feature branch (for dashboards): `argocd app set grafana-config --revision feature/observability-cleanup && argocd app sync grafana-config` - [x] Verify pod logs appear in Loki/Grafana - [x] Verify transmission metrics appear in Prometheus - [x] Verify service probe metrics appear in Prometheus - [x] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic config - [ ] After merge, reset apps to main and resync 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/43	2026-01-22 13:51:01 -08:00
Erich Blume	17023085cb	Migrate observability stack to Kubernetes (#42 ) Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack. Summary - Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal) - Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses - Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics - Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net) - Add ACL rule for port 9187 (CNPG metrics) - Delete obsolete ansible roles for prometheus and loki Changes - argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications - argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS - argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint - argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics - ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints - pulumi/policy.hujson - ACL for port 9187 - Deleted ansible/roles/prometheus/ and ansible/roles/loki/ Deployment and Testing - Stop prometheus and loki on indri - Sync ArgoCD apps (apps, prometheus, loki, grafana) - Run mise run provision-indri -- --tags alloy - Verify Grafana dashboards show data 🤖 Generated with https://claude.ai/claude-code Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42	2026-01-22 12:06:02 -08:00
Erich Blume	21848a7919	P5.1: Migrate minikube from podman to QEMU2 driver (#38 ) ## Summary - Migrate minikube from podman driver to qemu2 driver for proper NFS/SMB volume mount support - Update ansible minikube role with qemu installation and containerd runtime - Remove podman role dependency from indri.yml - Add synology user creation steps and post-migration zot reconfiguration notes ## Why Phase 6 (Kiwix/Transmission migration) was blocked because the podman driver lacks kernel capabilities for filesystem mounts. QEMU2 creates an actual VM with full mount support. ## Deployment and Testing - [ ] Create k8s-storage user on Synology DSM - [ ] Store credentials in 1Password (synology-k8s-storage) - [ ] Export current k8s state - [ ] Stop and delete podman-based minikube cluster - [ ] Run ansible to create QEMU2 cluster - [ ] Test NFS volume mount with test pod - [ ] Redeploy ArgoCD and all apps - [ ] Verify all services healthy - [ ] Reconfigure zot registry mirrors for containerd (post-migration) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/38	2026-01-21 16:03:37 -08:00
Erich Blume	7e6742ad24	K8s Migration Phase 2: Grafana to Kubernetes (#30 ) ## Summary - Migrate Grafana from Homebrew/Ansible to Kubernetes deployment - Switch CloudNativePG to use forge-mirrored Helm chart (HTTPS, no auth needed) - Add Grafana Helm chart deployment via ArgoCD with multi-source pattern - Add Grafana config (Tailscale Ingress, 9 dashboard ConfigMaps) - Update Loki to bind 0.0.0.0 for k8s pod access via `host.containers.internal` ## Key Changes - `argocd/apps/grafana.yaml` - Grafana Helm chart Application - `argocd/apps/grafana-config.yaml` - Ingress + dashboard ConfigMaps - `argocd/apps/cloudnative-pg.yaml` - Now uses forge mirror instead of external Helm repo - `ansible/roles/loki/templates/loki-config.yaml.j2` - Bind 0.0.0.0 ## Deployment and Testing - [x] Deploy Loki config change: `mise run provision-indri -- --tags loki` - [x] Create namespace: `ki create namespace monitoring` - [x] Create secret: `op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl \| ki apply -f -` - [x] Sync ArgoCD apps (grafana, grafana-config) - [x] Verify Grafana works at https://grafana.tail8d86e.ts.net - [x] Remove svc:grafana from ansible tailscale_serve - [x] Stop brew grafana: `ssh indri 'brew services stop grafana'` - [x] Delete ansible grafana role 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/30	2026-01-19 14:40:25 -08:00

22 commits