C2(deploy-infra-alerting): impl fix alert rule multi-series evaluation

Add reduce step between Prometheus query and threshold to preserve
per-service labels. Without it, Grafana can't distinguish the 5
probe_success series and errors with "duplicate results with labels {}".

Chain: A (prometheus query) → B (reduce last) → C (threshold < 1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Erich Blume 2026-03-22 11:00:21 -07:00
commit 94413f73ba

View file

@ -34,7 +34,7 @@ groups:
rules:
- uid: service-probe-failure
title: ServiceProbeFailure
condition: B
condition: C
for: 2m
noDataState: Alerting
execErrState: Alerting
@ -62,8 +62,20 @@ groups:
from: 0
to: 0
model:
type: threshold
type: reduce
expression: A
reducer: last
settings:
mode: dropNN
refId: B
- refId: C
datasourceUid: "__expr__"
relativeTimeRange:
from: 0
to: 0
model:
type: threshold
expression: B
conditions:
- evaluator:
type: lt
@ -71,9 +83,7 @@ groups:
- 1
operator:
type: and
reducer:
type: last
refId: B
refId: C
templates:
- orgId: 1