C2(deploy-infra-alerting): impl fix alert rule multi-series evaluation
Add reduce step between Prometheus query and threshold to preserve
per-service labels. Without it, Grafana can't distinguish the 5
probe_success series and errors with "duplicate results with labels {}".
Chain: A (prometheus query) → B (reduce last) → C (threshold < 1)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
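
The A → B → C chain described above can be pictured with a small Python toy model. This is only a sketch of the idea, not Grafana's implementation: the `service` label, the sample values, and the helper names `reduce_last` and `threshold_lt` are hypothetical, and skipping a series that has no numeric samples left is an assumption of the sketch.

```python
import math


def reduce_last(series, drop_non_numeric=True):
    """Step B (toy): collapse each series to its last sample, keeping its labels."""
    reduced = []
    for labels, samples in series:
        if drop_non_numeric:
            # Rough analogue of "mode: dropNN": ignore NaN/Inf samples.
            samples = [v for v in samples if math.isfinite(v)]
        if samples:  # assumption: a series with nothing left is skipped
            reduced.append((labels, samples[-1]))
    return reduced


def threshold_lt(reduced, limit):
    """Step C (toy): one firing decision per distinct label set."""
    return {
        tuple(sorted(labels.items())): value < limit
        for labels, value in reduced
    }


# Step A (toy): two labelled series standing in for the probe_success results.
query_a = [
    ({"service": "api"}, [1.0, 1.0, 0.0]),
    ({"service": "web"}, [1.0, float("nan"), 1.0]),
]

firing = threshold_lt(reduce_last(query_a), limit=1)
for labels, is_firing in firing.items():
    print(dict(labels), "firing" if is_firing else "ok")
```

Because every reduced value keeps its own label set, the threshold step yields one result per service instead of several results that all carry the empty label set `{}`.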
parent 549c57ab82
commit 94413f73ba
1 changed file with 15 additions and 5 deletions
@@ -34,7 +34,7 @@ groups:
     rules:
       - uid: service-probe-failure
         title: ServiceProbeFailure
-        condition: B
+        condition: C
         for: 2m
         noDataState: Alerting
         execErrState: Alerting
@@ -62,8 +62,20 @@ groups:
             from: 0
             to: 0
           model:
-            type: threshold
+            type: reduce
             expression: A
+            reducer: last
+            settings:
+              mode: dropNN
+            refId: B
+        - refId: C
+          datasourceUid: "__expr__"
+          relativeTimeRange:
+            from: 0
+            to: 0
+          model:
+            type: threshold
+            expression: B
             conditions:
               - evaluator:
                   type: lt
@@ -71,9 +83,7 @@ groups:
                     - 1
                 operator:
                   type: and
-                reducer:
-                  type: last
-            refId: B
+            refId: C
 
 templates:
   - orgId: 1