C2: Deploy infrastructure alerting pipeline #303

Merged
eblume merged 18 commits from mikado/deploy-infra-alerting into main 2026-03-22 14:52:56 -07:00
4 changed files with 51 additions and 0 deletions
Showing only changes of commit 261f20601a

C2(deploy-infra-alerting): impl configure grafana alerting pipeline

- Enable unified alerting in grafana.ini
- Create alerting.yaml provisioning file with:
  - ntfy-infra webhook contact point (POST to ntfy.ops.eblu.me/infra-alerts)
  - Notification policy: group_wait 1m, group_interval 12h, repeat_interval 24h
  - Message templates for title and runbook links
- Mount alerting provisioning into Grafana deployment
- Add alerting.yaml to kustomization configMapGenerator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Erich Blume 2026-03-22 10:35:36 -07:00

@@ -0,0 +1,42 @@
apiVersion: 1
contactPoints:
  - orgId: 1
    name: ntfy-infra
    receivers:
      - uid: ntfy-infra-webhook
        type: webhook
        settings:
          url: https://ntfy.ops.eblu.me/infra-alerts
          httpMethod: POST
          title: >-
            {{ template "ntfy-infra.title" . }}
          message: >-
            {{ template "ntfy-infra.message" . }}
          maxAlerts: "0"
        disableResolveMessage: false
policies:
  - orgId: 1
    receiver: ntfy-infra
    group_by:
      - alertname
      - service
    group_wait: 1m
    group_interval: 12h
    repeat_interval: 24h
templates:
  - orgId: 1
    name: ntfy-infra
    template: |
      {{ define "ntfy-infra.title" -}}
      [{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}
      {{- end }}
      {{ define "ntfy-infra.message" -}}
      {{ range .Alerts -}}
      {{ .Annotations.summary }}
      {{ if .Annotations.runbook_url }}Runbook: {{ .Annotations.runbook_url }}{{ end }}
      {{ end -}}
      {{- end }}

@@ -277,6 +277,9 @@ spec:
  - name: config
    mountPath: /etc/grafana/provisioning/datasources/datasources.yaml
    subPath: datasources.yaml
  - name: config
    mountPath: /etc/grafana/provisioning/alerting/alerting.yaml
    subPath: alerting.yaml
  - name: storage
    mountPath: /var/lib/grafana
  - name: sc-dashboard-volume

@@ -30,3 +30,8 @@ allow_embedding = false
[server]
root_url = https://grafana.ops.eblu.me
[unified_alerting]
enabled = true
evaluation_timeout = 30s
min_interval = 10s

@@ -25,6 +25,7 @@ configMapGenerator:
  files:
    - grafana.ini
    - datasources.yaml
    - alerting.yaml
  options:
    labels:
      app.kubernetes.io/name: grafana