## Summary

- Add `smartctl_exporter` Docker container to sifaka for SMART disk health monitoring
- Formalize existing `node_exporter` container under Ansible management
- Route both exporters through Caddy L4 TCP proxy (`nas.ops.eblu.me:9100`, `nas.ops.eblu.me:9633`), replacing the hardcoded LAN IP in Prometheus
- Create "Sifaka Disk Health" Grafana dashboard (health status, temperature, wear indicators, lifetime)
- Introduce `ansible/playbooks/sifaka.yml` and `mise run provision-sifaka`, the first Ansible playbook for the NAS
- Define shared exporter port variables in `group_vars/all.yml` to avoid duplication between the Caddy and sifaka roles

## Prerequisites before deploy

- [ ] Enable SSH on sifaka (DSM Control Panel > Terminal & SNMP)
- [ ] Verify `ssh eblume@sifaka 'docker ps'` works
- [ ] Run `mise run provision-sifaka` to deploy containers
- [ ] Run `mise run provision-indri -- --tags caddy` to add L4 routes
- [ ] Run `argocd app sync prometheus` and `argocd app sync grafana-config`

## Test plan

- [ ] Verify smartctl_exporter metrics: `curl http://nas.ops.eblu.me:9633/metrics`
- [ ] Verify the Prometheus targets page shows both sifaka jobs as UP
- [ ] Verify the Grafana "Sifaka Disk Health" dashboard loads with data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/135
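The "both sifaka jobs as UP" test-plan step can also be scripted. The sketch below is a hedged example and is not part of this PR. It reads the Prometheus HTTP API endpoint `/api/v1/targets` and reports the `health` field for the two sifaka job names defined in the ConfigMap. The Prometheus base URL is an assumption, because this PR does not say where Prometheus is exposed. `sifaka_target_health` and `fetch_targets` are hypothetical helper names.

```python
import json
import urllib.request

# Job names as defined in the prometheus-config ConfigMap in this PR.
SIFAKA_JOBS = ("node-exporter-sifaka", "smartctl-sifaka")


def sifaka_target_health(targets_response: dict) -> dict[str, str]:
    """Map each sifaka job to the health Prometheus reports for it.

    `targets_response` is the decoded JSON body of GET /api/v1/targets.
    A job with no active target is reported as "missing". If a job had
    several targets, the last one listed would win (each job here has one).
    """
    health = {job: "missing" for job in SIFAKA_JOBS}
    for target in targets_response["data"]["activeTargets"]:
        job = target["labels"].get("job")
        if job in health:
            health[job] = target["health"]  # "up", "down" or "unknown"
    return health


def fetch_targets(prometheus_url: str) -> dict:
    """Fetch active scrape targets; `prometheus_url` is an assumed base URL."""
    url = f"{prometheus_url.rstrip('/')}/api/v1/targets?state=active"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Once the deploy steps are done, calling `sifaka_target_health(fetch_targets(<your Prometheus URL>))` should report `"up"` for both jobs. The parsing lives in a pure function so it can be checked without a live server.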
Prometheus ConfigMap (YAML, 46 lines, 1.3 KiB):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    # Indri system metrics are pushed via Alloy remote_write
    # K8s services are scraped directly

    scrape_configs:
      # Sifaka NAS exporters (via Caddy L4 TCP proxy on indri)
      - job_name: "node-exporter-sifaka"
        static_configs:
          - targets: ["nas.ops.eblu.me:9100"]

      - job_name: "smartctl-sifaka"
        scrape_interval: 60s
        static_configs:
          - targets: ["nas.ops.eblu.me:9633"]

      # CNPG PostgreSQL metrics (k8s internal)
      - job_name: "cnpg-postgres"
        static_configs:
          - targets: ["blumeops-pg-metrics-tailscale.databases.svc.cluster.local:9187"]
            labels:
              instance: "blumeops-pg"

      # Prometheus self-monitoring
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]

      # Loki metrics
      - job_name: "loki"
        static_configs:
          - targets: ["loki.monitoring.svc.cluster.local:3100"]

      # Kubernetes state metrics (pods, deployments, resource usage, etc.)
      - job_name: "kube-state-metrics"
        static_configs:
          - targets: ["kube-state-metrics.monitoring.svc.cluster.local:8080"]
```
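Only `smartctl-sifaka` sets its own `scrape_interval` (60s). Every other job inherits the global 15s, which is Prometheus's documented fallback for a job without its own value; if `global.scrape_interval` were also unset, Prometheus would use 1m. The sketch below is a hypothetical helper and is not included in this PR. It does two things:

- reports each job's effective interval;
- flags any static target whose host is a raw IP address, the kind of hardcoded LAN IP that the switch to `nas.ops.eblu.me` removes.

It assumes `prometheus.yml` has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. The `SAMPLE` dict mirrors only the two sifaka jobs.

```python
import ipaddress


def effective_intervals(config: dict) -> dict[str, str]:
    """Return each job's scrape interval, falling back to the global value."""
    # Prometheus defaults global.scrape_interval to 1m when it is unset.
    global_interval = config.get("global", {}).get("scrape_interval", "1m")
    return {
        job["job_name"]: job.get("scrape_interval", global_interval)
        for job in config.get("scrape_configs", [])
    }


def ip_literal_targets(config: dict) -> list[str]:
    """List static targets whose host part is an IP address, not a DNS name.

    Assumes targets use the usual host:port form; IPv6 hosts may be bracketed.
    """
    found = []
    for job in config.get("scrape_configs", []):
        for static in job.get("static_configs", []):
            for target in static.get("targets", []):
                host = target.rsplit(":", 1)[0].strip("[]")
                try:
                    ipaddress.ip_address(host)
                except ValueError:
                    continue  # a hostname such as nas.ops.eblu.me
                found.append(target)
    return found


# Mirrors the sifaka part of the ConfigMap above.
SAMPLE = {
    "global": {"scrape_interval": "15s", "evaluation_interval": "15s"},
    "scrape_configs": [
        {"job_name": "node-exporter-sifaka",
         "static_configs": [{"targets": ["nas.ops.eblu.me:9100"]}]},
        {"job_name": "smartctl-sifaka", "scrape_interval": "60s",
         "static_configs": [{"targets": ["nas.ops.eblu.me:9633"]}]},
    ],
}

print(effective_intervals(SAMPLE))  # {'node-exporter-sifaka': '15s', 'smartctl-sifaka': '60s'}
print(ip_literal_targets(SAMPLE))   # []
```

A longer interval for the SMART job is a reasonable trade-off, since disk health attributes change slowly.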