## Summary

Mikado chain to replace `mise run services-check` with Grafana Unified Alerting backed by ntfy push notifications.

**Design:**

- Grafana Unified Alerting evaluates rules against Prometheus/Loki
- ntfy webhook contact point delivers iOS notifications
- Anti-noise policy: page once per 24h per alert group
- Every alert links to a runbook in `docs/how-to/alerts/`
- services-check eventually queries the alerting API instead of doing its own probes

**Chain (bottom-up):**

1. `configure-grafana-alerting-pipeline`: enable alerting, ntfy contact point, notification policy
2. `first-alert-and-runbook`: end-to-end proof of concept with blackbox probe failure
3. `port-services-check-alerts`: migrate all services-check probes to alert rules + runbooks
4. `refactor-services-check-to-query-alerts`: rewrite services-check to query Grafana API
5. `deploy-infra-alerting`: goal card

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #303
| title | modified | tags |
|---|---|---|
| Refactor services-check to Query Alerts | 2026-03-22 | |
Refactor services-check to Query Alerts
Change `mise run services-check` from doing its own health probes to querying the Grafana alerting API for currently firing alerts. The script becomes a CLI view into the same alerting system that sends ntfy notifications.
What to Do
1. Query the Grafana Alerting API
Grafana exposes alert state via:
- `GET /api/v1/provisioning/alert-rules`: all configured rules
- `GET /api/prometheus/grafana/api/v1/alerts`: currently firing alerts (Prometheus-compatible format)
The second endpoint is simpler: it returns only active alerts with their labels and annotations, in the same shape as Prometheus's `/api/v1/alerts`.
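A minimal sketch of querying the second endpoint, using only the Python standard library. It assumes the response follows the Prometheus `/api/v1/alerts` shape (`{"data": {"alerts": [...]}}`) and that a firing alert carries a `state` of `firing` or `alerting`. Both are assumptions to check against the deployed Grafana version, and the function names are illustrative.

```python
import json
import urllib.request

ALERTS_PATH = "/api/prometheus/grafana/api/v1/alerts"

# Assumption: state values that should count as "firing" (compared in lowercase).
FIRING_STATES = {"firing", "alerting"}


def fetch_alerts(base_url: str, token: str, timeout: float = 5.0) -> dict:
    """GET the Prometheus-compatible alerts endpoint and decode the JSON body."""
    request = urllib.request.Request(
        base_url.rstrip("/") + ALERTS_PATH,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def firing_alerts(payload: dict) -> list:
    """Return only the alerts whose state is in FIRING_STATES."""
    alerts = payload.get("data", {}).get("alerts", [])
    return [a for a in alerts if str(a.get("state", "")).lower() in FIRING_STATES]
```

Keeping the filter separate from the HTTP call lets the parsing be tested against a saved response without a running Grafana.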
2. Rewrite services-check
The new services-check should:
- Query the Grafana alerting API for firing alerts
- Display them in a table with service name, alert name, duration, and runbook link
- If no alerts are firing, print a green "all clear" message
- Exit 0 if no alerts, exit 1 if any are firing
- Optionally keep a few checks that don't map to alerting (e.g., the ArgoCD sync status table as a summary view)
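The display and exit-code rules above could be sketched as follows. This is an illustration, not the actual script: the `service` label and `runbook_url` annotation names are assumptions about how the alert rules are labelled, and `activeAt` is assumed to be an RFC 3339 timestamp as in the Prometheus alert format.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches RFC 3339 timestamps; fractional seconds are dropped (seconds precision is enough here).
_TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?(Z|[+-]\d{2}:\d{2})$")


def parse_active_at(value: str) -> datetime:
    """Parse an activeAt timestamp into a timezone-aware datetime."""
    match = _TS.match(value)
    if match is None:
        raise ValueError(f"unrecognised timestamp: {value!r}")
    base, offset = match.groups()
    return datetime.fromisoformat(base + ("+00:00" if offset == "Z" else offset))


def format_duration(active_at: datetime, now: datetime) -> str:
    """Render elapsed time as hours and minutes, e.g. '2h05m'."""
    minutes = max(0, int((now - active_at).total_seconds() // 60))
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h{minutes:02d}m"


def render(alerts: list, now: Optional[datetime] = None) -> tuple:
    """Return (text to print, exit code): 0 when clear, 1 when anything fires."""
    now = now or datetime.now(timezone.utc)
    if not alerts:
        # ANSI escape codes colour the message green.
        return "\033[32mAll clear: no alerts firing\033[0m", 0
    header = ("SERVICE", "ALERT", "DURATION", "RUNBOOK")
    rows = [header]
    for alert in alerts:
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        rows.append((
            labels.get("service", "-"),            # assumed label name
            labels.get("alertname", "-"),
            format_duration(parse_active_at(alert["activeAt"]), now),
            annotations.get("runbook_url", "-"),   # assumed annotation name
        ))
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = ["  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip() for row in rows]
    return "\n".join(lines), 1
```

A caller would print the returned text and pass the code to `sys.exit`, so the exit status matches what was displayed.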
3. Handle Authentication
services-check will need a Grafana API token or service account token. Options:
- Use the existing Grafana admin credentials from 1Password (`op read`)
- Create a dedicated read-only service account in Grafana
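One way the token lookup might look, as a sketch covering the service account token case. The `GRAFANA_TOKEN` environment variable name is hypothetical, and the 1Password secret reference (an `op://vault/item/field` string) must be supplied by the caller; neither comes from the existing script.

```python
import os
import subprocess
from typing import Optional


def read_token(env_var: str = "GRAFANA_TOKEN", op_reference: Optional[str] = None) -> str:
    """Return a Grafana token from the environment, else from `op read`.

    Both the env var name and the secret reference are illustrative assumptions.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    if op_reference:
        # `op read <secret-reference>` prints the secret value on stdout.
        result = subprocess.run(
            ["op", "read", op_reference],
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout.strip()
    raise RuntimeError(f"no Grafana token: set {env_var} or pass a 1Password reference")


def auth_headers(token: str) -> dict:
    """Build the bearer header used with Grafana service account tokens."""
    return {"Authorization": f"Bearer {token}"}
```

Checking the environment first keeps the script usable in CI or on machines without the 1Password CLI. Admin username and password credentials would need HTTP basic auth instead of a bearer header.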
4. Preserve the ArgoCD Summary
The ArgoCD sync/health table in services-check is a useful quick view even when nothing is alerting. Consider keeping it as a separate section that always displays, independent of the alert query.
Verification
- `mise run services-check` queries Grafana instead of doing direct probes
- Firing alerts are displayed with service name, alert name, and runbook link
- Exit code reflects alert state (0 = clear, 1 = firing)
- Works when Grafana is unreachable (graceful error, not a crash)
- ArgoCD summary table still works
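The last two checks suggest a control flow along these lines: a failed Grafana query is reported rather than raised, and the ArgoCD summary runs regardless. This is a sketch under assumptions. The two callables stand in for the real query and table code, and exit code 2 for "Grafana unreachable" is an illustrative choice that the card does not specify.

```python
import sys
from typing import Callable

EXIT_CLEAR = 0
EXIT_FIRING = 1
EXIT_UNREACHABLE = 2  # assumption: the card only defines 0 and 1


def run_check(fetch_firing: Callable[[], list], show_argocd: Callable[[], None]) -> int:
    """Query alerts, always show the ArgoCD summary, and return an exit code."""
    try:
        firing = fetch_firing()
    except (OSError, ValueError) as exc:
        # urllib's URLError and socket timeouts are OSError subclasses;
        # a malformed JSON body raises a ValueError subclass.
        print(f"services-check: could not query Grafana: {exc}", file=sys.stderr)
        code = EXIT_UNREACHABLE
    else:
        code = EXIT_FIRING if firing else EXIT_CLEAR
    # The ArgoCD section is independent of the alert query outcome.
    show_argocd()
    return code
```

Using a distinct exit code for the unreachable case lets callers tell "nothing is firing" apart from "alert state unknown".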
Related
- port-services-check-alerts — Prerequisite: alerts must exist to query
- deploy-infra-alerting — Parent goal