C2(deploy-infra-alerting): close port-services-check-alerts

7 alert rules covering services-check probes:
- ServiceProbeFailure (11 HTTP probes via Alloy blackbox)
- PodNotReady (kube-state-metrics, both clusters)
- PostgresClusterUnhealthy (CNPG collector)
- TextfileStale (node_textfile_mtime_seconds)
- FrigateCameraDown (frigate_camera_fps)
- ArgoCDAppOutOfSync (argocd_app_info)

7 runbooks in docs/how-to/alerts/.

Remaining uncovered: local indri services (brew/launchctl), ringtail
SSH/tailscale, public Fly.io endpoints, k8s API health, frigate
storage. These are effectively covered by downstream alerts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Erich Blume 2026-03-22 14:23:42 -07:00
commit 2e2a33d7ca

View file

@ -1,7 +1,6 @@
---
title: Port services-check Alerts to Grafana
modified: 2026-03-22
status: active
requires:
- first-alert-and-runbook
tags: