blumeops/argocd/manifests/prometheus/ingress-tailscale.yaml

# Tailscale Ingress for Prometheus
# Allows Alloy on indri to push metrics via remote_write
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-tailscale
  namespace: monitoring
  annotations:
    tailscale.com/funnel: "false"
    tailscale.com/proxy-group: "ingress"
    tailscale.com/tags: "tag:k8s,tag:flyio-target"
    gethomepage.dev/enabled: "true"
    gethomepage.dev/name: "Prometheus"
    gethomepage.dev/group: "Infrastructure"
    gethomepage.dev/icon: "prometheus.png"
    gethomepage.dev/description: "Metrics storage"
    gethomepage.dev/href: "https://prometheus.ops.eblu.me"
    gethomepage.dev/pod-selector: "app=prometheus"
spec:
  ingressClassName: tailscale
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus
                port:
                  number: 9090
  tls:
    - hosts:
        - prometheus