blumeops/docs/how-to/runbooks/runbook-textfile-stale.md
Erich Blume 67883950c3 C2(deploy-infra-alerting): finalize rewrite cards as historical docs
2026-03-22 14:40:21 -07:00


title: Runbook: Textfile Stale
modified: 2026-03-22
tags: how-to, alerting, runbook

Runbook: Textfile Stale

Alert name: TextfileStale

A Prometheus textfile collector .prom file on indri has not been updated for over 1 hour, indicating the metrics exporter script has stopped running.

Affected Textfiles

File            LaunchAgent                      What it monitors
borgmatic.prom  mcquack.eblume.borgmatic         Backup status
zot.prom        mcquack.eblume.zot               Container registry
minikube.prom   mcquack.minikube-metrics         Minikube cluster status
jellyfin.prom   mcquack.eblume.jellyfin-metrics  Media server

Diagnostic Steps

  1. Check which file is stale — the file label in the alert tells you. Verify on indri:

    ssh indri 'ls -la /opt/homebrew/var/node_exporter/textfile/'
    
  2. Check if the LaunchAgent is running:

    ssh indri 'launchctl list | grep mcquack'
    
  3. Check LaunchAgent logs (plist defines stdout/stderr paths):

    ssh indri 'cat ~/Library/Logs/mcquack/<agent-name>.log'
    
  4. Try running the exporter manually. Print the agent's plist, find its ProgramArguments, and run that command by hand:

    ssh indri 'cat ~/Library/LaunchAgents/mcquack.<agent>.plist'
    # Run the command listed under ProgramArguments manually
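
Step 1 can be narrowed down so that only the stale files are listed. The following is a minimal sketch, not part of the existing tooling: the function name stale_textfiles is hypothetical, and the 60-minute default is taken from the alert description above.

```shell
#!/bin/sh
# Hypothetical helper: list .prom files in DIR whose modification time
# is more than MINUTES minutes in the past (default 60, matching the
# "over 1 hour" condition described for the TextfileStale alert).
# Usage: stale_textfiles DIR [MINUTES]
stale_textfiles() {
    dir="$1"
    minutes="${2:-60}"
    # -mmin +N matches files last modified more than N minutes ago.
    # -maxdepth 1 keeps the search to the directory itself.
    find "$dir" -maxdepth 1 -type f -name '*.prom' -mmin "+$minutes" | sort
}
```

Run on indri, stale_textfiles /opt/homebrew/var/node_exporter/textfile/ prints one path per stale file and prints nothing when every file is fresh.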
    

Common Causes

  • LaunchAgent not loaded: run launchctl load ~/Library/LaunchAgents/mcquack.<agent>.plist
  • Script error: the exporter script crashed; check the LaunchAgent logs
  • Permissions: the textfile directory is not writable
  • Indri reboot: some LaunchAgents may not auto-start
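
To tell the first two causes apart, it can help to interpret the launchctl list output rather than only grepping it. launchctl list prints three columns (PID, Status, Label), with "-" in the PID column when the job is not currently running. The sketch below reads that output on stdin. The function name agent_state is hypothetical, and the wording of its messages is illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: classify one LaunchAgent from `launchctl list`
# output supplied on stdin (columns: PID, Status, Label).
# Usage: launchctl list | agent_state LABEL
agent_state() {
    awk -v label="$1" '
        $3 == label {
            found = 1
            if ($1 == "-")
                # Loaded but idle; Status holds the last exit status.
                printf "loaded, not running (last exit status %s)\n", $2
            else
                printf "running (pid %s)\n", $1
        }
        END { if (!found) print "not loaded" }
    '
}
```

For example, ssh indri 'launchctl list' | agent_state mcquack.eblume.zot. A result of "not loaded" points at the first cause above. A non-zero last exit status suggests a script error. If these agents run on a schedule (an assumption, since the runbook does not say), "loaded, not running" with status 0 is likely the normal state between runs.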