blumeops/docs/how-to/alerts/runbook-textfile-stale.md
Erich Blume 2fa536e547 C2(deploy-infra-alerting): impl add textfile staleness and Frigate alerts
- TextfileStale: fires when a .prom textfile on indri hasn't been
  updated in 1 hour (node_textfile_mtime_seconds). Covers borgmatic,
  zot, minikube, jellyfin exporters.
- FrigateCameraDown: fires when frigate_camera_fps drops to 0 for 5m.
- Add runbooks for both alerts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 13:43:16 -07:00

title: Runbook: Textfile Stale
modified: 2026-03-22
tags: how-to, alerting, runbook

Runbook: Textfile Stale

Alert name: TextfileStale

A Prometheus textfile collector .prom file on indri has not been updated for over 1 hour, which usually means the exporter script that writes it has stopped running or is failing.
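
The same condition can be approximated locally on indri. The sketch below is not the alert rule itself (that is evaluated in Prometheus); it lists .prom files whose modification time is older than a threshold. The function name `stale_textfiles` is hypothetical, and the default directory is taken from the diagnostic steps in this runbook.

```shell
# Sketch: list .prom files not modified within the last N minutes.
# Assumption: the textfile directory is the one used elsewhere in this runbook.
stale_textfiles() {
    dir="${1:-/opt/homebrew/var/node_exporter/textfile}"
    minutes="${2:-60}"
    # -mmin +N matches files last modified more than N minutes ago
    find "$dir" -maxdepth 1 -type f -name '*.prom' -mmin "+$minutes" | sort
}
```

Run on indri, `stale_textfiles` with no arguments prints nothing when every file is fresh; any path it prints is a candidate for the alert.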

Affected Textfiles

| File | LaunchAgent | What it monitors |
| --- | --- | --- |
| borgmatic.prom | mcquack.eblume.borgmatic | Backup status |
| zot.prom | mcquack.eblume.zot | Container registry |
| minikube.prom | mcquack.minikube-metrics | Minikube cluster status |
| jellyfin.prom | mcquack.eblume.jellyfin-metrics | Media server |
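
When triaging, the table above can be encoded as a small lookup so the alert's file label maps straight to a LaunchAgent label. This is a sketch only: `agent_for_textfile` is a hypothetical helper, and it assumes the label may arrive either as a bare file name or as a full path, so it strips any directory prefix first.

```shell
# Sketch: map a textfile name (or full path) to its LaunchAgent label.
# The mapping mirrors the "Affected Textfiles" table; update both together.
agent_for_textfile() {
    # ${1##*/} drops everything up to the last slash, leaving the file name
    case "${1##*/}" in
        borgmatic.prom) echo "mcquack.eblume.borgmatic" ;;
        zot.prom)       echo "mcquack.eblume.zot" ;;
        minikube.prom)  echo "mcquack.minikube-metrics" ;;
        jellyfin.prom)  echo "mcquack.eblume.jellyfin-metrics" ;;
        *)              echo "unknown textfile: $1" >&2; return 1 ;;
    esac
}
```

For example, `agent_for_textfile zot.prom` prints `mcquack.eblume.zot`, and an unrecognized name returns a nonzero status.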

Diagnostic Steps

  1. Identify the stale file from the file label on the alert, then compare modification times on indri:

    ssh indri 'ls -la /opt/homebrew/var/node_exporter/textfile/'
    
  2. Check if the LaunchAgent is running:

    ssh indri 'launchctl list | grep mcquack'
    
  3. Check the LaunchAgent logs (the plist's StandardOutPath and StandardErrorPath keys define the actual paths):

    ssh indri 'cat ~/Library/Logs/mcquack/<agent-name>.log'
    
  4. Try running the exporter manually. Print the plist, find its ProgramArguments, and run that command by hand:

    ssh indri 'cat ~/Library/LaunchAgents/mcquack.<agent>.plist'
    # Run the command listed under ProgramArguments and watch for errors
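
The output of step 2 can be narrowed to a single agent. The following is a minimal sketch, assuming the usual three-column `launchctl list` output (PID, last exit status, label); `agent_state` is a hypothetical helper, not something shipped with blumeops.

```shell
# Sketch: read `launchctl list` output on stdin and summarize one agent.
# Assumption: columns are PID, last exit status, and label, in that order.
agent_state() {
    label="$1"
    awk -v label="$label" '
        $3 == label { found = 1; printf "loaded pid=%s last_exit=%s\n", $1, $2 }
        END { if (!found) print "not-loaded" }
    '
}
```

A possible invocation is `ssh indri 'launchctl list' | agent_state mcquack.eblume.borgmatic`. A PID of `-` only means the job is not running at that moment, which is expected for jobs that run on a schedule and exit; a nonzero last exit status is usually the more useful signal.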
    

Common Causes

  • LaunchAgent not loaded: reload it with launchctl load ~/Library/LaunchAgents/mcquack.<agent>.plist
  • Script error: the exporter script crashed; check the logs
  • Permissions: the textfile directory is not writable by the user the exporter runs as
  • Indri reboot: some LaunchAgents may not start automatically after a reboot
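
The permissions cause can be ruled out quickly. This sketch assumes the textfile directory shown in the diagnostic steps and should be run in a shell on indri as the same user that owns the LaunchAgents; `check_textfile_dir` is a hypothetical name.

```shell
# Sketch: confirm the textfile directory exists and is writable by the current user.
# Assumption: the default path matches the directory used earlier in this runbook.
check_textfile_dir() {
    dir="${1:-/opt/homebrew/var/node_exporter/textfile}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}
```

If this reports `ok` but a file is still stale, the cause is more likely a script error or an agent that is not loaded.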