blumeops/docs/changelog.d
Erich Blume 6d65e6928c C2: Deploy infrastructure alerting pipeline (#303)
## Summary

Mikado chain to replace `mise run services-check` with Grafana Unified Alerting backed by ntfy push notifications.

**Design:**
- Grafana Unified Alerting evaluates rules against Prometheus/Loki
- ntfy webhook contact point delivers iOS notifications
- Anti-noise policy: page once per 24h per alert group
- Every alert links to a runbook in `docs/how-to/alerts/`
- `services-check` eventually queries the Grafana alerting API instead of running its own probes
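
The anti-noise policy above can be illustrated with a small sketch. This is not how Grafana implements it (in Grafana the behavior would come from the notification policy's repeat settings); it only shows the intended "page once per 24h per alert group" semantics. The names `REPEAT_INTERVAL`, `should_notify`, and the in-memory `last_sent` mapping are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the 24h window from the design notes, expressed as a constant.
REPEAT_INTERVAL = timedelta(hours=24)


def should_notify(last_sent, group_key, now):
    """Return True if this alert group has not been paged within REPEAT_INTERVAL.

    last_sent maps a group key to the time of its most recent page and is
    updated in place whenever a page is allowed.
    """
    previous = last_sent.get(group_key)
    if previous is not None and now - previous < REPEAT_INTERVAL:
        # Still inside the quiet window for this group: stay silent.
        return False
    last_sent[group_key] = now
    return True
```

Each alert group keeps its own window, so a page for one group does not silence a different group that starts firing an hour later.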

**Chain (bottom-up):**
1. `configure-grafana-alerting-pipeline` — enable alerting, ntfy contact point, notification policy
2. `first-alert-and-runbook` — end-to-end proof of concept with blackbox probe failure
3. `port-services-check-alerts` — migrate all services-check probes to alert rules + runbooks
4. `refactor-services-check-to-query-alerts` — rewrite services-check to query Grafana API
5. `deploy-infra-alerting` — goal card
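
As a rough illustration of step 4, a rewritten services-check could list the alerts Grafana currently reports as firing rather than probing each service itself. This is a minimal sketch and not the repository's implementation: it assumes Grafana's Alertmanager-compatible endpoint for its built-in alertmanager and a service account token, and the helper names `fetch_alerts` and `firing_alert_names` are hypothetical.

```python
import json
import urllib.request

# Assumption: Alertmanager-compatible path served by Grafana's built-in
# alertmanager. Verify against the deployed Grafana version.
ALERTS_PATH = "/api/alertmanager/grafana/api/v2/alerts"


def fetch_alerts(base_url, token):
    """Fetch current alerts from Grafana; token is a service account token."""
    request = urllib.request.Request(
        base_url.rstrip("/") + ALERTS_PATH,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def firing_alert_names(alerts):
    """Return the sorted, de-duplicated alert names whose state is 'active'."""
    names = set()
    for alert in alerts:
        if alert.get("status", {}).get("state") == "active":
            names.add(alert.get("labels", {}).get("alertname", "<unnamed>"))
    return sorted(names)
```

A check built this way would exit non-zero when `firing_alert_names(fetch_alerts(...))` is non-empty, so the alert rules become the single source of truth for service health.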

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #303
2026-03-22 14:52:56 -07:00
+claude-code-subagents.ai.md Add Claude Code subagents for infrastructure workflows 2026-03-18 11:57:36 -07:00
+fix-borgmatic-kubectl-context.bugfix.md Fix borgmatic backup: use correct kubectl context on indri 2026-03-18 06:07:44 -07:00
+fix-frigate-mqtt-config.bugfix.md Fix Frigate crash: re-add required mqtt config section 2026-03-17 18:10:23 -07:00
+frigate-health-checks.infra.md Improve Frigate health checks to catch NFS and camera failures 2026-03-22 09:55:53 -07:00
+frigate-retention-and-check.infra.md Bump Frigate retention and add recording health check 2026-03-17 18:24:11 -07:00
+increase-retention.infra.md Update retention changelog to reflect final PVC decision 2026-03-18 06:46:55 -07:00
+oci-labels.infra.md Add consistent OCI labels to all container Dockerfiles 2026-03-18 20:42:00 -07:00
+usage-pragma-consistency.misc.md Standardize USAGE pragmas and typer parsing across mise tasks 2026-03-18 11:42:01 -07:00
.gitkeep Add towncrier changelog system (#86) 2026-02-03 11:48:13 -08:00
feature-localize-alloy-container.infra.md Localize Alloy container image (#300) 2026-03-17 16:42:53 -07:00
mikado-deploy-infra-alerting.feature.md C2: Deploy infrastructure alerting pipeline (#303) 2026-03-22 14:52:56 -07:00
upgrade-prometheus-3.10.0.infra.md Upgrade Prometheus to v3.10.0 (#301) 2026-03-18 07:47:46 -07:00