7 alert rules covering services-check probes:

- ServiceProbeFailure (11 HTTP probes via Alloy blackbox)
- PodNotReady (kube-state-metrics, both clusters)
- PostgresClusterUnhealthy (CNPG collector)
- TextfileStale (node_textfile_mtime_seconds)
- FrigateCameraDown (frigate_camera_fps)
- ArgoCDAppOutOfSync (argocd_app_info)

7 runbooks in docs/how-to/alerts/.

Remaining uncovered: local indri services (brew/launchctl), ringtail SSH/tailscale, public Fly.io endpoints, k8s API health, frigate storage. These are effectively covered by downstream alerts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| Directory |
|---|
| alerts |
| authentik |
| configuration |
| dagger |
| deployment |
| forgejo-runner |
| grafana |
| jobsync |
| knowledgebase |
| mealie |
| operations |
| ringtail |
| zot |