C2: Deploy infrastructure alerting pipeline (#303)
## Summary

Mikado chain to replace `mise run services-check` with Grafana Unified Alerting backed by ntfy push notifications.

**Design:**
- Grafana Unified Alerting evaluates rules against Prometheus/Loki
- ntfy webhook contact point delivers iOS notifications
- Anti-noise policy: page once per 24h per alert group
- Every alert links to a runbook in `docs/how-to/alerts/`
- services-check eventually queries the alerting API instead of doing its own probes

**Chain (bottom-up):**
1. `configure-grafana-alerting-pipeline`: enable alerting, ntfy contact point, notification policy
2. `first-alert-and-runbook`: end-to-end proof of concept with blackbox probe failure
3. `port-services-check-alerts`: migrate all services-check probes to alert rules + runbooks
4. `refactor-services-check-to-query-alerts`: rewrite services-check to query Grafana API
5. `deploy-infra-alerting`: goal card

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #303
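The last design point and chain step 4 (services-check querying the alerting API instead of probing) could look roughly like the sketch below. It is not the implementation from this commit. It assumes Grafana's Alertmanager-compatible endpoint `/api/alertmanager/grafana/api/v2/alerts`, a service-account token, and that each alert rule carries a `runbook_url` annotation. The function names `fetch_alerts` and `firing_alerts` are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumption: Grafana-managed alerts are exposed through the
# Alertmanager-compatible v2 API under this path.
ALERTS_PATH = "/api/alertmanager/grafana/api/v2/alerts"


def fetch_alerts(base_url, token):
    """Fetch current alerts from Grafana (hypothetical helper, does network I/O)."""
    req = Request(
        base_url.rstrip("/") + ALERTS_PATH,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def firing_alerts(alerts):
    """Return sorted (alertname, runbook_url) pairs for alerts in the 'active' state."""
    result = []
    for alert in alerts:
        # Skip suppressed (silenced/inhibited) and unprocessed alerts.
        if alert.get("status", {}).get("state") != "active":
            continue
        name = alert.get("labels", {}).get("alertname", "<unnamed>")
        # Assumption: the runbook link is stored in a `runbook_url` annotation.
        runbook = alert.get("annotations", {}).get("runbook_url", "")
        result.append((name, runbook))
    return sorted(result)
```

A services-check wrapper could call `firing_alerts(fetch_alerts(url, token))` and exit non-zero when the list is non-empty, printing each runbook link next to the alert name.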
parent f1620abb17
commit 6d65e6928c
20 changed files with 1259 additions and 46 deletions
@@ -169,6 +169,43 @@ prometheus.exporter.blackbox "services" {
    address = "http://argocd-server.argocd.svc.cluster.local:80/healthz"
    module = "http_2xx"
  }

  target {
    name = "prometheus"
    address = "http://prometheus.monitoring.svc.cluster.local:9090/-/healthy"
    module = "http_2xx"
  }

  target {
    name = "loki"
    address = "http://loki.monitoring.svc.cluster.local:3100/ready"
    module = "http_2xx"
  }

  target {
    name = "grafana"
    address = "http://grafana.monitoring.svc.cluster.local:80/api/health"
    module = "http_2xx"
  }

  target {
    name = "teslamate"
    address = "http://teslamate.teslamate.svc.cluster.local:4000/"
    module = "http_2xx"
  }

  target {
    name = "immich"
    address = "http://immich-server.immich.svc.cluster.local:2283/api/server/ping"
    module = "http_2xx"
  }

  target {
    name = "navidrome"
    address = "http://navidrome.navidrome.svc.cluster.local:4533/"
    module = "http_2xx"
  }

}

// Scrape blackbox probe results