C2: Deploy infrastructure alerting pipeline #303

Merged
eblume merged 18 commits from mikado/deploy-infra-alerting into main 2026-03-22 14:52:56 -07:00
Showing only changes of commit cdd85c7ac9

C2(deploy-infra-alerting): impl fix TextfileStale to always return data

Query the age of every textfile (time() - node_textfile_mtime_seconds)
and apply the > 3600s threshold in the alert condition, instead of
filtering with > 3600 in the query itself, which returns an empty
result when every textfile is fresh.

This means:
- Fresh textfiles: query returns low values, threshold not met → OK
- Stale textfiles: query returns high values, threshold met → Alerting
- Missing textfiles: series vanishes, noDataState=Alerting → Alerting
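
The three cases above can be sketched as a small Python model. This is only an illustration of the reasoning, not Grafana's evaluation code: the function names, the dict-of-mtimes input, and the file name are hypothetical, and the `for: 15m` pending period and the intermediate reduce step (refId B) are ignored.

```python
STALE_AFTER_SECONDS = 3600  # threshold used by the rule


def evaluate_old(mtimes, now):
    """Old rule: filter in the query, threshold > 0, noDataState=OK."""
    ages = [now - mtime for mtime in mtimes.values()]
    # The query-side filter drops every fresh series.
    filtered = [age for age in ages if age > STALE_AFTER_SECONDS]
    if not filtered:
        # Empty result: fresh and missing files look identical here.
        return "OK"
    return "Alerting" if any(age > 0 for age in filtered) else "OK"


def evaluate_new(mtimes, now):
    """New rule: unfiltered query, threshold > 3600, noDataState=Alerting."""
    ages = [now - mtime for mtime in mtimes.values()]
    if not ages:
        # No series at all: the textfiles (or the exporter) are gone.
        return "Alerting"
    return "Alerting" if any(age > STALE_AFTER_SECONDS for age in ages) else "OK"


if __name__ == "__main__":
    now = 1_000_000  # arbitrary fixed timestamp for the example
    cases = {
        "fresh": {"example.prom": now - 60},
        "stale": {"example.prom": now - 7200},
        "missing": {},
    }
    for name, mtimes in cases.items():
        print(name, evaluate_old(mtimes, now), evaluate_new(mtimes, now))
```

Under this model the old rule cannot tell "everything is fresh" apart from "the series is missing", because both produce an empty result that maps to OK. The new rule always has data while the textfiles exist, so an empty result can safely be treated as an alert.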

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Erich Blume 2026-03-22 14:13:20 -07:00

@@ -93,7 +93,7 @@ groups:
title: TextfileStale
condition: C
for: 15m
- noDataState: OK
+ noDataState: Alerting
execErrState: Alerting
annotations:
summary: >-
@@ -110,7 +110,7 @@ groups:
to: 0
model:
expr: >-
- time() - node_textfile_mtime_seconds > 3600
+ time() - node_textfile_mtime_seconds
interval: ""
refId: A
- refId: B
@@ -137,7 +137,7 @@ groups:
- evaluator:
type: gt
params:
- - 0
+ - 3600
operator:
type: and
refId: C