## Summary

Mikado chain to replace `mise run services-check` with Grafana Unified Alerting backed by ntfy push notifications.

**Design:**

- Grafana Unified Alerting evaluates rules against Prometheus/Loki
- ntfy webhook contact point delivers iOS notifications
- Anti-noise policy: page once per 24h per alert group
- Every alert links to a runbook in `docs/how-to/alerts/`
- services-check eventually queries the alerting API instead of doing its own probes

**Chain (bottom-up):**

1. `configure-grafana-alerting-pipeline`: enable alerting, ntfy contact point, notification policy
2. `first-alert-and-runbook`: end-to-end proof of concept with blackbox probe failure
3. `port-services-check-alerts`: migrate all services-check probes to alert rules + runbooks
4. `refactor-services-check-to-query-alerts`: rewrite services-check to query Grafana API
5. `deploy-infra-alerting`: goal card

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #303

---
title: "Runbook: Textfile Stale"
modified: 2026-03-22
tags:
- how-to
- alerting
- runbook
---

# Runbook: Textfile Stale

**Alert name:** `TextfileStale`

A Prometheus textfile collector `.prom` file on indri has not been updated for over 1 hour, indicating the metrics exporter script has stopped running.

## Affected Textfiles

| File | LaunchAgent | What it monitors |
|------|-------------|------------------|
| `borgmatic.prom` | `mcquack.eblume.borgmatic` | Backup status |
| `zot.prom` | `mcquack.eblume.zot` | Container registry |
| `minikube.prom` | `mcquack.minikube-metrics` | Minikube cluster status |
| `jellyfin.prom` | `mcquack.eblume.jellyfin-metrics` | Media server |
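
The table can also be encoded as a small lookup, so that the `file` label from an alert resolves directly to the LaunchAgent to inspect. The following is a minimal sketch in POSIX sh (not fish); the function name `agent_for_textfile` is hypothetical, and the mappings are copied from the table rather than from any actual configuration.

```sh
# Hypothetical lookup: map a stale textfile name to its LaunchAgent label.
# The mappings mirror the "Affected Textfiles" table and nothing else.
agent_for_textfile() {
    case "$1" in
        borgmatic.prom) echo "mcquack.eblume.borgmatic" ;;
        zot.prom)       echo "mcquack.eblume.zot" ;;
        minikube.prom)  echo "mcquack.minikube-metrics" ;;
        jellyfin.prom)  echo "mcquack.eblume.jellyfin-metrics" ;;
        *)
            # Unknown files are reported on stderr and signalled with a non-zero status.
            echo "unknown textfile: $1" >&2
            return 1
            ;;
    esac
}

# Example: agent_for_textfile zot.prom
```

If a new textfile is added to the table, the lookup would need a matching `case` arm.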

## Diagnostic Steps

1. **Check which file is stale**: the `file` label in the alert identifies it. Verify the modification times on indri:
```fish
ssh indri 'ls -la /opt/homebrew/var/node_exporter/textfile/'
```

2. **Check if the LaunchAgent is running**:
```fish
ssh indri 'launchctl list | grep mcquack'
```

3. **Check LaunchAgent logs** (the plist defines the stdout/stderr paths):
```fish
ssh indri 'cat ~/Library/Logs/mcquack/<agent-name>.log'
```

4. **Try running the exporter manually**:
```fish
ssh indri 'cat ~/Library/LaunchAgents/mcquack.<agent>.plist'
# Find the ProgramArguments in the output, then run that command manually on indri
```
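
Step 1 can be narrowed to only the files that are actually stale. The sketch below is POSIX sh (not fish) and assumes it runs on indri itself. The function name `stale_textfiles` is hypothetical, and the 60-minute default simply mirrors the 1-hour threshold described at the top of this runbook.

```sh
# Hypothetical helper: print .prom files in a directory that have not been
# modified for more than N minutes (default 60, mirroring the alert threshold).
stale_textfiles() {
    dir="$1"
    minutes="${2:-60}"
    # -mmin +N selects files whose age in minutes exceeds N
    # (available in both BSD find on macOS and GNU find).
    find "$dir" -maxdepth 1 -type f -name '*.prom' -mmin "+$minutes" | sort
}

# Usage on indri, with the directory from step 1:
# stale_textfiles /opt/homebrew/var/node_exporter/textfile
```

An empty result means every `.prom` file was written within the window, which would suggest the alert has already resolved or is evaluating a different host.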

## Common Causes

- **LaunchAgent not loaded**: load it with `launchctl load ~/Library/LaunchAgents/mcquack.<agent>.plist`
- **Script error**: the exporter script crashed; check the logs from step 3
- **Permissions**: the textfile directory is not writable
- **Indri reboot**: some LaunchAgents may not auto-start after a reboot
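
The permissions cause can be ruled in or out quickly. This is a minimal sketch in POSIX sh (not fish), assuming it is run on indri as the same user the LaunchAgent runs as; the function name `check_textfile_dir` is hypothetical.

```sh
# Hypothetical check: report whether the textfile directory exists and is
# writable by the current user.
check_textfile_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Usage on indri, with the directory from the diagnostic steps:
# check_textfile_dir /opt/homebrew/var/node_exporter/textfile
```

Because `-w` tests the invoking user, running this as a different user (for example via `sudo`) would not reflect what the exporter script sees.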

## Related

- [[alloy]]: Collects textfile metrics via `prometheus.exporter.unix`
- [[deploy-infra-alerting]]: Alerting pipeline overview