| title | modified | tags |
|---|---|---|
| Runbook: Textfile Stale | 2026-03-22 | |
Runbook: Textfile Stale
Alert name: TextfileStale
A `.prom` file read by the Prometheus textfile collector on indri has not been updated for more than 1 hour, which indicates that the metrics exporter script responsible for writing it has stopped running.
Affected Textfiles
| File | LaunchAgent | What it monitors |
|---|---|---|
| `borgmatic.prom` | `mcquack.eblume.borgmatic` | Backup status |
| `zot.prom` | `mcquack.eblume.zot` | Container registry |
| `minikube.prom` | `mcquack.minikube-metrics` | Minikube cluster status |
| `jellyfin.prom` | `mcquack.eblume.jellyfin-metrics` | Media server |
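To go from the alert's `file` label to the agent that should be writing that file, the table above can be encoded as a small lookup. This is only a sketch: the function name `agent_for_file` is hypothetical, and the mapping simply mirrors the table, so it would need updating whenever a textfile is added.

```shell
# Hypothetical helper: map a stale .prom file name to its LaunchAgent label.
# The pairs below are copied from the "Affected Textfiles" table.
agent_for_file() {
  case "$1" in
    borgmatic.prom) echo "mcquack.eblume.borgmatic" ;;
    zot.prom)       echo "mcquack.eblume.zot" ;;
    minikube.prom)  echo "mcquack.minikube-metrics" ;;
    jellyfin.prom)  echo "mcquack.eblume.jellyfin-metrics" ;;
    *)
      # Anything else is not covered by this runbook.
      echo "unknown textfile: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `agent_for_file zot.prom` prints `mcquack.eblume.zot`, which can then be used in the `launchctl` and log commands in the diagnostic steps.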
Diagnostic Steps
1. Check which file is stale: the `file` label in the alert tells you. Verify on indri: `ssh indri 'ls -la /opt/homebrew/var/node_exporter/textfile/'`
2. Check if the LaunchAgent is running: `ssh indri 'launchctl list | grep mcquack'`
3. Check the LaunchAgent logs (the plist defines the stdout/stderr paths): `ssh indri 'cat ~/Library/Logs/mcquack/<agent-name>.log'`
4. Try running the exporter manually. Print the plist, find the `ProgramArguments`, and run them by hand: `ssh indri 'cat ~/Library/LaunchAgents/mcquack.<agent>.plist'`
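The first diagnostic step can be narrowed from a full directory listing to only the files that are actually stale. The sketch below assumes a hypothetical helper named `stale_prom_files` and relies on `find -mmin`, which compares modification times in whole minutes, so the 60-minute default only roughly matches the alert's 1-hour threshold.

```shell
# Hypothetical helper: print .prom files in DIR whose modification time is
# more than MINUTES minutes old (default 60, to approximate the alert).
stale_prom_files() {
  local dir="$1"
  local minutes="${2:-60}"
  # -maxdepth 1 keeps the search to the textfile directory itself.
  find "$dir" -maxdepth 1 -type f -name '*.prom' -mmin +"$minutes"
}
```

Run on indri as `stale_prom_files /opt/homebrew/var/node_exporter/textfile`, an empty result would suggest that every textfile has been written within the last hour.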
Common Causes
- LaunchAgent not loaded: `launchctl load ~/Library/LaunchAgents/mcquack.<agent>.plist`
- Script error: the exporter script crashed; check logs
- Permissions: the textfile directory is not writable
- Indri reboot: some LaunchAgents may not auto-start
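To tell "not loaded" apart from "loaded but not currently running", the output of `launchctl list` can be parsed rather than just grepped. The sketch below assumes the usual three-column layout (PID, last exit status, label); `agent_state` is a hypothetical name, and it reads the listing from stdin so it can be fed from an `ssh` call.

```shell
# Hypothetical helper: classify one LaunchAgent from `launchctl list` output
# supplied on stdin. Assumes columns: PID, last exit status, label.
agent_state() {
  local label="$1"
  awk -v label="$label" '
    $3 == label {
      found = 1
      if ($1 == "-")
        print "loaded, not running (last exit status " $2 ")"
      else
        print "running (pid " $1 ")"
    }
    END { if (!found) print "not loaded" }
  '
}
```

A possible invocation is `ssh indri 'launchctl list' | agent_state mcquack.eblume.zot`. An agent that runs periodically may legitimately have no PID between runs, so a non-zero last exit status is usually the more telling sign of a script error.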
Related
- alloy: collects textfile metrics via `prometheus.exporter.unix`
- deploy-infra-alerting: alerting pipeline overview