blumeops/docs/how-to/runbooks/runbook-textfile-stale.md
Erich Blume 6d65e6928c C2: Deploy infrastructure alerting pipeline (#303)
2026-03-22 14:52:56 -07:00

title: Runbook: Textfile Stale
modified: 2026-03-22
tags: how-to, alerting, runbook
Runbook: Textfile Stale

Alert name: TextfileStale

A Prometheus textfile collector .prom file on indri has not been updated for over 1 hour, which usually means the exporter script that writes it has stopped running.

Affected Textfiles

| File | LaunchAgent | What it monitors |
|---|---|---|
| borgmatic.prom | mcquack.eblume.borgmatic | Backup status |
| zot.prom | mcquack.eblume.zot | Container registry |
| minikube.prom | mcquack.minikube-metrics | Minikube cluster status |
| jellyfin.prom | mcquack.eblume.jellyfin-metrics | Media server |

Diagnostic Steps

  1. Identify the stale file. The file label in the alert names it; confirm the modification times on indri:

    ssh indri 'ls -la /opt/homebrew/var/node_exporter/textfile/'
    
  2. Check if the LaunchAgent is running:

    ssh indri 'launchctl list | grep mcquack'
    
  3. Check LaunchAgent logs (plist defines stdout/stderr paths):

    ssh indri 'cat ~/Library/Logs/mcquack/<agent-name>.log'
    
  4. Try running the exporter manually:

    ssh indri 'cat ~/Library/LaunchAgents/mcquack.<agent>.plist'
    # Find the ProgramArguments, run them manually
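
The listing in step 1 can be narrowed to only the files old enough to trip the alert. The following is a minimal sketch, assuming the exporters write files ending in `.prom` and that the one-hour threshold from the alert description applies. The function name `stale_prom_files` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: list .prom files whose mtime is older than a threshold.
# Usage: stale_prom_files DIR [MINUTES]   (MINUTES defaults to 60)
stale_prom_files() {
    dir=$1
    minutes=${2:-60}
    # -mmin +N matches files last modified more than N minutes ago
    find "$dir" -type f -name '*.prom' -mmin +"$minutes"
}
```

Run on indri, `stale_prom_files /opt/homebrew/var/node_exporter/textfile` prints one path per stale file. The same check works remotely as a one-liner: `ssh indri 'find /opt/homebrew/var/node_exporter/textfile -name "*.prom" -mmin +60'`. Empty output means no file is older than the threshold by modification time.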
    

Common Causes

  • LaunchAgent not loaded: load it with launchctl load ~/Library/LaunchAgents/mcquack.<agent>.plist
  • Script error: the exporter script crashed; check the logs
  • Permissions: the exporter cannot write to the textfile directory
  • Indri reboot: some LaunchAgents may not auto-start
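
After a reboot it can be quicker to ask which of the expected agents are absent than to read the full `launchctl list` output. The sketch below assumes that the launchd labels match the LaunchAgent names in the table above exactly, and that the label is the third column of `launchctl list` output. The function name `missing_agents` is hypothetical.

```shell
# Hypothetical helper: print each expected label that does not appear
# in `launchctl list` output supplied on stdin.
# Usage: launchctl list | missing_agents LABEL [LABEL...]
missing_agents() {
    listing=$(cat)
    for label in "$@"; do
        # Columns are PID, Status, Label; compare the label exactly
        if ! printf '%s\n' "$listing" |
            awk -v l="$label" '$3 == l { found = 1 } END { exit !found }'; then
            printf '%s\n' "$label"
        fi
    done
}
```

For example, `ssh indri 'launchctl list' | missing_agents mcquack.eblume.borgmatic mcquack.eblume.zot mcquack.minikube-metrics mcquack.eblume.jellyfin-metrics` prints only the labels that are not loaded. Each one can then be loaded with the `launchctl load` command listed under the first cause.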