Add Forgejo repository health metrics and Grafana dashboard (#245)

## Summary
- New `forgejo_metrics` Ansible role: a LaunchAgent runs a script every 60s that queries the Forgejo REST API and writes Prometheus textfile metrics (open PRs and issues, language bytes, release count and latest release, latest commit, and Actions run counts, durations, and last-success times)
- Grafana dashboard "Forgejo Repository Health" with 12 panels across 4 rows: overview stats, CI/CD health, repository info, and staleness tracking
- Deletes the superseded `forgejo-actions-dashboard` plan doc and its entry in the plans index (this implementation covers a broader scope)

## Deployment and Testing
- [ ] `mise run provision-indri -- --tags forgejo_metrics` to deploy the collector
- [ ] `ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/forgejo.prom'` to verify metrics
- [ ] `argocd app sync grafana-config` to deploy the dashboard
- [ ] Check Grafana dashboard "Forgejo Repository Health" loads with data
- [ ] `mise run services-check` passes
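
A slightly stronger check than reading the `cat` output by eye is to confirm that every sample line parses and that `forgejo_up` is 1. The sketch below assumes bash plus standard `grep`; `check_prom_file` is a hypothetical helper, and its regex only covers the simple `name{labels} value` lines this collector writes, not the full exposition format (no timestamps, `NaN`, or `+Inf`).

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for a node_exporter textfile (.prom) file.
# Returns non-zero if any sample line is malformed or forgejo_up is not 1.
check_prom_file() {
  local file=$1 bad
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }

  # Drop comments (# HELP / # TYPE) and blank lines, then keep only lines that do
  # NOT look like: metric_name, optional {labels}, one space, a numeric value.
  bad=$(grep -v -e '^#' -e '^$' "$file" |
    grep -E -v '^[a-zA-Z_:][a-zA-Z0-9_:]*([{][^}]*[}])? -?[0-9]+([.][0-9]+)?([eE][-+]?[0-9]+)?$' || true)
  if [ -n "$bad" ]; then
    printf 'malformed lines:\n%s\n' "$bad" >&2
    return 1
  fi

  # The collector always writes forgejo_up; 0 means the /version probe failed.
  grep -qx 'forgejo_up 1' "$file" || { echo "forgejo_up is not 1" >&2; return 1; }
}
```

For example, after saving the function to a file together with a call such as `check_prom_file /opt/homebrew/var/node_exporter/textfile/forgejo.prom`, it can be run on indri with `ssh indri 'bash -s' < check-prom.sh` (the filename is illustrative).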

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/245
Erich Blume 2026-02-22 11:16:03 -08:00
commit 2c081eed28
12 changed files with 989 additions and 201 deletions

@@ -195,6 +195,23 @@
no_log: true
tags: [jellyfin_metrics]
# Forgejo API token for metrics collection
- name: Fetch Forgejo API token for metrics
ansible.builtin.command:
cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/api-token"
delegate_to: localhost
register: _forgejo_metrics_api_token
changed_when: false
no_log: true
check_mode: false
tags: [forgejo_metrics]
- name: Set Forgejo metrics API token fact
ansible.builtin.set_fact:
forgejo_metrics_api_key: "{{ _forgejo_metrics_api_token.stdout }}"
no_log: true
tags: [forgejo_metrics]
roles:
- role: alloy
tags: alloy
@@ -218,5 +235,7 @@
tags: jellyfin
- role: jellyfin_metrics
tags: jellyfin_metrics
- role: forgejo_metrics
tags: forgejo_metrics
- role: caddy
tags: caddy

@@ -0,0 +1,20 @@
---
# Forgejo metrics collection configuration
# Forgejo server URL
forgejo_metrics_url: "http://localhost:3001"
# Path to file containing Forgejo API token (should have 600 permissions)
forgejo_metrics_api_key_file: "/Users/erichblume/.forgejo-api-key"
# Metrics collection interval in seconds
forgejo_metrics_interval: 60
# Output directory for prometheus textfile collector
forgejo_metrics_dir: /opt/homebrew/var/node_exporter/textfile
# Script installation path
forgejo_metrics_script: /Users/erichblume/.local/bin/forgejo-metrics
# Log directory for metrics script output
forgejo_metrics_log_dir: /opt/homebrew/var/log

@@ -0,0 +1,6 @@
---
- name: Reload forgejo-metrics
  ansible.builtin.shell: |
    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo-metrics.plist 2>/dev/null || true
    launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo-metrics.plist
  changed_when: true

@@ -0,0 +1,55 @@
---
- name: Fetch Forgejo API token (when running with --tags forgejo_metrics)
  ansible.builtin.command:
    cmd: op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/w3663ffnvkewbftncqxtcpeavy/api-token"
  delegate_to: localhost
  register: forgejo_metrics_api_key_fallback
  changed_when: false
  no_log: true
  check_mode: false
  when: forgejo_metrics_api_key is not defined

- name: Set Forgejo API token fact (fallback)
  ansible.builtin.set_fact:
    forgejo_metrics_api_key: "{{ forgejo_metrics_api_key_fallback.stdout }}"
  no_log: true
  when: forgejo_metrics_api_key is not defined

- name: Write Forgejo API token file
  ansible.builtin.copy:
    content: "{{ forgejo_metrics_api_key }}"
    dest: "{{ forgejo_metrics_api_key_file }}"
    mode: '0600'
  no_log: true

- name: Ensure bin directory exists
  ansible.builtin.file:
    path: "{{ forgejo_metrics_script | dirname }}"
    state: directory
    mode: '0755'

- name: Deploy forgejo metrics collection script
  ansible.builtin.template:
    src: forgejo-metrics.sh.j2
    dest: "{{ forgejo_metrics_script }}"
    mode: '0755'
  notify: Reload forgejo-metrics

- name: Deploy forgejo-metrics LaunchAgent plist
  ansible.builtin.template:
    src: forgejo-metrics.plist.j2
    dest: ~/Library/LaunchAgents/mcquack.eblume.forgejo-metrics.plist
    mode: '0644'
  notify: Reload forgejo-metrics

- name: Check if forgejo-metrics LaunchAgent is loaded
  ansible.builtin.command: launchctl list mcquack.eblume.forgejo-metrics
  register: forgejo_metrics_launchctl_check
  changed_when: false
  failed_when: false

- name: Load forgejo-metrics LaunchAgent if not loaded
  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo-metrics.plist
  when: forgejo_metrics_launchctl_check.rc != 0
  changed_when: true
  failed_when: false

@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- {{ ansible_managed }} -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mcquack.eblume.forgejo-metrics</string>
<key>EnvironmentVariables</key>
<dict>
<key>PATH</key>
<string>/opt/homebrew/bin:/usr/bin:/bin</string>
</dict>
<key>ProgramArguments</key>
<array>
<string>{{ forgejo_metrics_script }}</string>
</array>
<key>StartInterval</key>
<integer>{{ forgejo_metrics_interval }}</integer>
<key>RunAtLoad</key>
<true/>
<key>StandardErrorPath</key>
<string>{{ forgejo_metrics_log_dir }}/mcquack.forgejo-metrics.err.log</string>
<key>StandardOutPath</key>
<string>{{ forgejo_metrics_log_dir }}/mcquack.forgejo-metrics.out.log</string>
</dict>
</plist>

@@ -0,0 +1,162 @@
#!/bin/bash
# {{ ansible_managed }}
# Collects Forgejo repository health metrics for node_exporter textfile collector
set -euo pipefail
FORGEJO_URL="{{ forgejo_metrics_url }}"
API_KEY_FILE="{{ forgejo_metrics_api_key_file }}"
OUTPUT_FILE="{{ forgejo_metrics_dir }}/forgejo.prom"
TEMP_FILE="${OUTPUT_FILE}.tmp"
TOKEN=$(cat "$API_KEY_FILE" 2>/dev/null | tr -d '\n' || true)
# Authenticated API request; returns empty string on failure
api() {
curl -sf -H "Authorization: token ${TOKEN}" -H "Accept: application/json" \
"${FORGEJO_URL}/api/v1${1}" 2>/dev/null || echo ""
}
# jq helper: convert ISO 8601 timestamp (with any tz offset) to epoch seconds
# jq's fromdate only handles Z, so we parse the offset and apply it manually
JQ_EPOCH='def epoch: sub("[.][0-9]+"; "") | if test("[+-][0-9]{2}:[0-9]{2}$") then capture("^(?<dt>.*)(?<sign>[+-])(?<h>[0-9]{2}):(?<m>[0-9]{2})$") | (.dt + "Z" | fromdate) as $base | ((.h | tonumber) * 3600 + (.m | tonumber) * 60) as $off | if .sign == "-" then $base + $off else $base - $off end else sub("Z$"; "") + "Z" | fromdate end;'
forgejo_up=0
if curl -sf "${FORGEJO_URL}/api/v1/version" >/dev/null 2>&1; then
forgejo_up=1
fi
{
# --- Metric type declarations ---
cat << 'HEADER'
# HELP forgejo_up Forgejo server is up and responding
# TYPE forgejo_up gauge
# HELP forgejo_repo_open_pull_requests Number of open pull requests
# TYPE forgejo_repo_open_pull_requests gauge
# HELP forgejo_repo_open_issues Number of open issues
# TYPE forgejo_repo_open_issues gauge
# HELP forgejo_repo_language_bytes Repository language size in bytes
# TYPE forgejo_repo_language_bytes gauge
# HELP forgejo_repo_releases_total Number of releases (counted from the first 50 returned by the API)
# TYPE forgejo_repo_releases_total gauge
# HELP forgejo_repo_latest_release_timestamp_seconds Unix timestamp of the latest release
# TYPE forgejo_repo_latest_release_timestamp_seconds gauge
# HELP forgejo_repo_latest_commit_timestamp_seconds Unix timestamp of the latest commit on default branch
# TYPE forgejo_repo_latest_commit_timestamp_seconds gauge
# HELP forgejo_actions_runs_total Number of Actions runs per status among the 30 most recent runs
# TYPE forgejo_actions_runs_total gauge
# HELP forgejo_actions_run_duration_seconds Duration of the latest completed run per workflow in seconds
# TYPE forgejo_actions_run_duration_seconds gauge
# HELP forgejo_actions_last_success_timestamp_seconds Unix timestamp of last successful run per workflow
# TYPE forgejo_actions_last_success_timestamp_seconds gauge
# HELP forgejo_actions_jobs_waiting Number of action runs currently waiting or queued
# TYPE forgejo_actions_jobs_waiting gauge
# HELP forgejo_actions_jobs_running Number of action runs currently in progress
# TYPE forgejo_actions_jobs_running gauge
HEADER
echo "forgejo_up ${forgejo_up}"
if [ "$forgejo_up" -eq 1 ] && [ -n "$TOKEN" ]; then
# Discover all repos accessible to the token owner
repos_json=$(api "/repos/search?limit=50")
[ -z "$repos_json" ] && repos_json='{"data":[]}'
repo_count=$(echo "$repos_json" | jq '.data | length' 2>/dev/null || echo "0")
for i in $(seq 0 $((repo_count - 1))); do
repo_data=$(echo "$repos_json" | jq ".data[$i]")
full_name=$(echo "$repo_data" | jq -r '.full_name')
[ -z "$full_name" ] || [ "$full_name" = "null" ] && continue
r="$full_name"
# Basic repo metrics (from search results — no extra API call)
echo "forgejo_repo_open_pull_requests{repo=\"${r}\"} $(echo "$repo_data" | jq '.open_pr_counter // 0')"
echo "forgejo_repo_open_issues{repo=\"${r}\"} $(echo "$repo_data" | jq '.open_issues_count // 0')"
default_branch=$(echo "$repo_data" | jq -r '.default_branch // "main"')
# --- Languages ---
langs=$(api "/repos/${r}/languages")
if [ -n "$langs" ] && echo "$langs" | jq -e 'type == "object" and length > 0' >/dev/null 2>&1; then
echo "$langs" | jq -r --arg r "$r" \
'to_entries[] | "forgejo_repo_language_bytes{repo=\"\($r)\",language=\"\(.key)\"} \(.value)"' \
2>/dev/null || true
fi
# --- Releases ---
releases=$(api "/repos/${r}/releases?limit=50")
if [ -n "$releases" ] && echo "$releases" | jq -e 'type == "array"' >/dev/null 2>&1; then
echo "forgejo_repo_releases_total{repo=\"${r}\"} $(echo "$releases" | jq 'length')"
# Latest release timestamp and version
echo "$releases" | jq -r --arg r "$r" "${JQ_EPOCH}"'
if length > 0 then
.[0] |
"forgejo_repo_latest_release_timestamp_seconds{repo=\"\($r)\",version=\"\(.tag_name)\"} \((.published_at // .created_at // .created) | epoch)"
else empty end' 2>/dev/null || true
else
echo "forgejo_repo_releases_total{repo=\"${r}\"} 0"
fi
# --- Latest commit on default branch ---
commits=$(api "/repos/${r}/commits?limit=1&sha=${default_branch}")
if [ -n "$commits" ] && echo "$commits" | jq -e 'type == "array" and length > 0' >/dev/null 2>&1; then
echo "$commits" | jq -r --arg r "$r" "${JQ_EPOCH}"'
.[0] |
"forgejo_repo_latest_commit_timestamp_seconds{repo=\"\($r)\"} \((.created // .commit.committer.date) | epoch)"' \
2>/dev/null || true
fi
# --- Action runs ---
runs_json=$(api "/repos/${r}/actions/runs?limit=30")
if [ -n "$runs_json" ] && echo "$runs_json" | jq -e '.workflow_runs | type == "array"' >/dev/null 2>&1; then
# Count by status
echo "$runs_json" | jq -r --arg r "$r" '
.workflow_runs | group_by(.status) | .[] |
"forgejo_actions_runs_total{repo=\"\($r)\",status=\"\(.[0].status)\"} \(length)"' \
2>/dev/null || true
# Jobs waiting/running
waiting=$(echo "$runs_json" | jq '[.workflow_runs[] | select(.status == "waiting" or .status == "queued")] | length' 2>/dev/null || echo "0")
running=$(echo "$runs_json" | jq '[.workflow_runs[] | select(.status == "running")] | length' 2>/dev/null || echo "0")
echo "forgejo_actions_jobs_waiting{repo=\"${r}\"} ${waiting}"
echo "forgejo_actions_jobs_running{repo=\"${r}\"} ${running}"
# Discover current workflow files on the default branch (.forgejo/ or .github/)
current_wfs=""
for wf_dir in .forgejo/workflows .github/workflows; do
wf_list=$(api "/repos/${r}/contents/${wf_dir}?ref=${default_branch}")
if [ -n "$wf_list" ] && echo "$wf_list" | jq -e 'type == "array"' >/dev/null 2>&1; then
current_wfs=$(echo "$wf_list" | jq -r '[.[].name] | join(",")' 2>/dev/null || true)
break
fi
done
# Per-workflow: latest completed run duration and last success timestamp
# Only include workflows that currently exist on the default branch
# Forgejo fields: workflow_id (filename), created/stopped, duration (nanoseconds)
if [ -n "$current_wfs" ]; then
echo "$runs_json" | jq -r --arg r "$r" --arg wfs "$current_wfs" "${JQ_EPOCH}"'
($wfs | split(",")) as $current |
[.workflow_runs[] | select((.status == "success" or .status == "failure") and (.workflow_id | IN($current[])))] |
if length > 0 then
group_by(.workflow_id) | .[] |
(sort_by(.created) | reverse) as $sorted |
($sorted[0]) as $latest |
($latest.workflow_id | sub("[.]ya?ml$"; "")) as $wf |
"forgejo_actions_run_duration_seconds{repo=\"\($r)\",workflow=\"\($wf)\"} \(($latest.duration // 0) / 1000000000 | floor)",
([$sorted[] | select(.status == "success")] |
if length > 0 then
.[0] as $last_ok |
"forgejo_actions_last_success_timestamp_seconds{repo=\"\($r)\",workflow=\"\($wf)\"} \($last_ok.stopped | epoch)"
else empty end)
else empty end' 2>/dev/null || true
fi
fi
done
fi
} > "$TEMP_FILE"
# Atomic move
mv "$TEMP_FILE" "$OUTPUT_FILE"
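
`JQ_EPOCH` exists because jq's `fromdate` only accepts a trailing `Z`, so offsets such as `-08:00` are applied by hand. The same arithmetic can be sanity-checked outside jq with the pure-bash sketch below. `iso_to_epoch` is a hypothetical name; it assumes bash 3.2 or later (`[[ =~ ]]`, `BASH_REMATCH`) and years 0001 onward, and unlike `JQ_EPOCH` it rejects timestamps with no zone suffix. It converts the calendar date with Howard Hinnant's `days_from_civil` algorithm instead of calling `date`, because BSD and GNU `date` take different flags for parsing.

```bash
#!/usr/bin/env bash
# Hypothetical pure-bash counterpart to JQ_EPOCH: ISO 8601 (Z or +/-HH:MM) -> epoch seconds.
iso_to_epoch() {
  local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})([.][0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})$'
  [[ $1 =~ $re ]] || return 1
  # 10# forces base 10 so "08" and "09" are not parsed as octal.
  local y=$((10#${BASH_REMATCH[1]})) m=$((10#${BASH_REMATCH[2]})) d=$((10#${BASH_REMATCH[3]}))
  local hh=$((10#${BASH_REMATCH[4]})) mi=$((10#${BASH_REMATCH[5]})) ss=$((10#${BASH_REMATCH[6]}))
  local tz=${BASH_REMATCH[8]}   # fractional seconds (group 7) are dropped, as in JQ_EPOCH

  # days_from_civil: count the year from March so Feb 29 falls at the end.
  if (( m <= 2 )); then y=$(( y - 1 )); fi
  local era=$(( y / 400 ))
  local yoe=$(( y - era * 400 ))
  local doy=$(( (153 * ((m + 9) % 12) + 2) / 5 + d - 1 ))
  local doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
  local days=$(( era * 146097 + doe - 719468 ))
  local epoch=$(( days * 86400 + hh * 3600 + mi * 60 + ss ))

  # local time = UTC + offset, so UTC = local - offset ("-08:00" adds 8 hours).
  if [[ $tz != Z ]]; then
    local off=$(( 10#${tz:1:2} * 3600 + 10#${tz:4:2} * 60 ))
    if [[ ${tz:0:1} == - ]]; then epoch=$(( epoch + off )); else epoch=$(( epoch - off )); fi
  fi
  echo "$epoch"
}
```

For example, `iso_to_epoch 2026-02-22T11:16:03-08:00` (this commit's timestamp) prints `1771787763`, the same value as for `2026-02-22T19:16:03Z`.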

@@ -0,0 +1,699 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboard-forgejo
namespace: monitoring
labels:
grafana_dashboard: "1"
data:
forgejo.json: |
{
"annotations": {
"list": []
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"mappings": [
{ "options": { "0": { "color": "red", "index": 0, "text": "DOWN" } }, "type": "value" },
{ "options": { "1": { "color": "green", "index": 1, "text": "UP" } }, "type": "value" }
],
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "red", "value": null },
{ "color": "green", "value": 1 }
]
}
},
"overrides": []
},
"gridPos": { "h": 4, "w": 4, "x": 0, "y": 0 },
"id": 1,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "auto"
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "forgejo_up",
"refId": "A"
}
],
"title": "Forgejo Status",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 5 },
{ "color": "red", "value": 10 }
]
}
},
"overrides": []
},
"gridPos": { "h": 4, "w": 4, "x": 4, "y": 0 },
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "auto"
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "sum(forgejo_repo_open_pull_requests{repo=~\"$repo\"})",
"refId": "A"
}
],
"title": "Open PRs",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 10 },
{ "color": "red", "value": 25 }
]
}
},
"overrides": []
},
"gridPos": { "h": 4, "w": 4, "x": 8, "y": 0 },
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "auto"
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "sum(forgejo_repo_open_issues{repo=~\"$repo\"})",
"refId": "A"
}
],
"title": "Open Issues",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 604800 },
{ "color": "red", "value": 2592000 }
]
},
"unit": "dtdurations"
},
"overrides": []
},
"gridPos": { "h": 4, "w": 4, "x": 12, "y": 0 },
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "auto"
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "time() - max(forgejo_repo_latest_release_timestamp_seconds{repo=~\"$repo\"})",
"refId": "A"
}
],
"title": "Latest Release Age",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 1 },
{ "color": "red", "value": 3 }
]
}
},
"overrides": []
},
"gridPos": { "h": 4, "w": 4, "x": 16, "y": 0 },
"id": 5,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "auto"
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "sum(forgejo_actions_jobs_running{repo=~\"$repo\"})",
"legendFormat": "Running",
"refId": "A"
},
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "sum(forgejo_actions_jobs_waiting{repo=~\"$repo\"})",
"legendFormat": "Waiting",
"refId": "B"
}
],
"title": "Jobs Running / Waiting",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "thresholds" },
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 43200 },
{ "color": "red", "value": 86400 }
]
},
"unit": "dtdurations"
},
"overrides": []
},
"gridPos": { "h": 4, "w": 4, "x": 20, "y": 0 },
"id": 6,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
"textMode": "auto"
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "time() - max(forgejo_actions_last_success_timestamp_seconds{repo=~\"$repo\"})",
"refId": "A"
}
],
"title": "Last Successful Build",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "bars",
"fillOpacity": 80,
"gradientMode": "none",
"hideFrom": { "legend": false, "tooltip": false, "viz": false },
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": { "type": "linear" },
"showPoints": "never",
"spanNulls": false,
"stacking": { "group": "A", "mode": "normal" },
"thresholdsStyle": { "mode": "off" }
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [{ "color": "green", "value": null }]
}
},
"overrides": [
{
"matcher": { "id": "byName", "options": "success" },
"properties": [{ "id": "color", "value": { "fixedColor": "green", "mode": "fixed" } }]
},
{
"matcher": { "id": "byName", "options": "failure" },
"properties": [{ "id": "color", "value": { "fixedColor": "red", "mode": "fixed" } }]
}
]
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 4 },
"id": 7,
"options": {
"legend": {
"calcs": ["lastNotNull"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": { "mode": "multi", "sort": "desc" }
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "sum by (status) (forgejo_actions_runs_total{repo=~\"$repo\"})",
"legendFormat": "{{status}}",
"refId": "A"
}
],
"title": "Actions Runs by Status",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": { "legend": false, "tooltip": false, "viz": false },
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": { "type": "linear" },
"showPoints": "never",
"spanNulls": false,
"stacking": { "group": "A", "mode": "none" },
"thresholdsStyle": { "mode": "off" }
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [{ "color": "green", "value": null }]
},
"unit": "s"
},
"overrides": []
},
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 4 },
"id": 8,
"options": {
"legend": {
"calcs": ["mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": { "mode": "multi", "sort": "desc" }
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "forgejo_actions_run_duration_seconds{repo=~\"$repo\"}",
"legendFormat": "{{repo}} / {{workflow}}",
"refId": "A"
}
],
"title": "Run Duration by Workflow",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 30,
"gradientMode": "none",
"hideFrom": { "legend": false, "tooltip": false, "viz": false },
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": { "type": "linear" },
"showPoints": "never",
"spanNulls": false,
"stacking": { "group": "A", "mode": "normal" },
"thresholdsStyle": { "mode": "off" }
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [{ "color": "green", "value": null }]
},
"unit": "bytes"
},
"overrides": []
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 12 },
"id": 9,
"options": {
"legend": {
"calcs": ["lastNotNull"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": { "mode": "multi", "sort": "desc" }
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "sum by (language) (forgejo_repo_language_bytes{repo=~\"$repo\"})",
"legendFormat": "{{language}}",
"refId": "A"
}
],
"title": "Language Distribution",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": { "legend": false, "tooltip": false, "viz": false },
"insertNulls": false,
"lineInterpolation": "stepAfter",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": { "type": "linear" },
"showPoints": "never",
"spanNulls": false,
"stacking": { "group": "A", "mode": "none" },
"thresholdsStyle": { "mode": "off" }
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [{ "color": "green", "value": null }]
}
},
"overrides": []
},
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 12 },
"id": 10,
"options": {
"legend": {
"calcs": ["lastNotNull"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": { "mode": "multi", "sort": "desc" }
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "forgejo_repo_releases_total{repo=~\"$repo\"}",
"legendFormat": "{{repo}}",
"refId": "A"
}
],
"title": "Releases Total by Repository",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": { "legend": false, "tooltip": false, "viz": false },
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": { "type": "linear" },
"showPoints": "never",
"spanNulls": false,
"stacking": { "group": "A", "mode": "none" },
"thresholdsStyle": { "mode": "off" }
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 43200 },
{ "color": "red", "value": 86400 }
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 20 },
"id": 11,
"options": {
"legend": {
"calcs": ["lastNotNull"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": { "mode": "multi", "sort": "desc" }
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "time() - forgejo_actions_last_success_timestamp_seconds{repo=~\"$repo\"}",
"legendFormat": "{{repo}} / {{workflow}}",
"refId": "A"
}
],
"title": "Time Since Last Success",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 30,
"gradientMode": "none",
"hideFrom": { "legend": false, "tooltip": false, "viz": false },
"insertNulls": false,
"lineInterpolation": "stepAfter",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": { "type": "linear" },
"showPoints": "never",
"spanNulls": false,
"stacking": { "group": "A", "mode": "normal" },
"thresholdsStyle": { "mode": "off" }
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [{ "color": "green", "value": null }]
}
},
"overrides": [
{
"matcher": { "id": "byName", "options": "Waiting" },
"properties": [{ "id": "color", "value": { "fixedColor": "yellow", "mode": "fixed" } }]
},
{
"matcher": { "id": "byName", "options": "Running" },
"properties": [{ "id": "color", "value": { "fixedColor": "blue", "mode": "fixed" } }]
}
]
},
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 20 },
"id": 12,
"options": {
"legend": {
"calcs": ["mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": { "mode": "multi", "sort": "desc" }
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "sum(forgejo_actions_jobs_waiting{repo=~\"$repo\"})",
"legendFormat": "Waiting",
"refId": "A"
},
{
"datasource": { "type": "prometheus", "uid": "prometheus" },
"expr": "sum(forgejo_actions_jobs_running{repo=~\"$repo\"})",
"legendFormat": "Running",
"refId": "B"
}
],
"title": "Queue Depth",
"type": "timeseries"
}
],
"refresh": "30s",
"schemaVersion": 38,
"tags": ["forgejo", "ci-cd", "repository"],
"templating": {
"list": [
{
"current": { "selected": true, "text": "All", "value": "$__all" },
"datasource": { "type": "prometheus", "uid": "prometheus" },
"definition": "label_values(forgejo_repo_open_pull_requests, repo)",
"includeAll": true,
"multi": true,
"name": "repo",
"label": "Repository",
"query": { "query": "label_values(forgejo_repo_open_pull_requests, repo)", "refId": "StandardVariableQuery" },
"refresh": 2,
"sort": 1,
"type": "query"
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Forgejo Repository Health",
"uid": "forgejo",
"version": 1,
"weekStart": ""
}

@@ -23,6 +23,7 @@ resources:
- dashboards/configmap-docs-apm.yaml
- dashboards/configmap-flyio.yaml
- dashboards/configmap-sifaka-disks.yaml
- dashboards/configmap-forgejo.yaml
# TeslaMate dashboards
- dashboards/configmap-teslamate-overview.yaml
- dashboards/configmap-teslamate-charges.yaml

@@ -0,0 +1 @@
Add a Forgejo repository health metrics collector and Grafana dashboard with CI/CD, release, and language tracking across the repositories visible to the collector's API token.

@@ -60,7 +60,6 @@ Migration and transition plans for upcoming infrastructure changes.
| [[adopt-dagger-ci]] | Adopt Dagger as CI/CD build engine |
| [[upstream-fork-strategy]] | Stacked-branch forking strategy for upstream projects |
| [[adopt-oidc-provider]] | Deploy OIDC identity provider for SSO across services |
| [[forgejo-actions-dashboard]] | Grafana dashboard for Forgejo Actions CI metrics |
| [[upgrade-grafana-helm-chart]] | Upgrade Grafana Helm chart from 8.8.2 to 11.x |
| [[operationalize-reolink-camera]] | Cloud-free NVR with Frigate and ring buffer recording |

@@ -1,199 +0,0 @@
---
title: "Plan: Forgejo Actions Dashboard"
modified: 2026-02-11
tags:
- how-to
- plans
- forgejo
- monitoring
- grafana
---
# Plan: Forgejo Actions Dashboard
> **Status:** Planned (not yet executed)
## Background
BlumeOps CI/CD runs on Forgejo Actions. There is currently no visibility into CI health — no metrics on job success/failure rates, durations, queue depth, or runner status. When a build fails silently or takes longer than expected, the only way to notice is to check the Forgejo web UI manually.
### Goals
- **Grafana dashboard** showing CI health at a glance: recent runs, pass/fail rates, durations, queue depth
- **Prometheus metrics** for Forgejo Actions data, following the established textfile exporter pattern
- **Alerting foundation** — once metrics exist, alerts can be added later (e.g., "no successful build in 24h")
## Current State
### What Forgejo Exposes
**Built-in `/metrics` endpoint:** No Actions data. The Prometheus endpoint (currently disabled in `app.ini`) only exposes platform-level counters (`gitea_repositories`, `gitea_issues`, etc.). There is an [open feature request](https://codeberg.org/forgejo/forgejo/issues/4803) to add Actions metrics, but it is not yet implemented.
**API (v11+):** Rich Actions data is available via REST API. BlumeOps runs Forgejo v14.0.2, so all endpoints are available:
| Endpoint | Data |
|----------|------|
| `GET /api/v1/repos/{owner}/{repo}/actions/runs` | Workflow runs: status, duration, timestamps, workflow ID, event, commit SHA |
| `GET /api/v1/repos/{owner}/{repo}/actions/tasks` | Tasks: status, timestamps, workflow ID, run number |
| `GET /api/v1/admin/actions/runners/jobs` | Global job search: status, runner labels, dependencies |
| `GET /api/v1/repos/{owner}/{repo}/actions/runners/jobs` | Per-repo job search |
### Existing Metrics Pattern
Custom exporters on indri follow a consistent pattern:
1. **Bash script** polls a local API and writes `.prom` files
2. **LaunchAgent** runs the script on a schedule (e.g., every 60s)
3. **node_exporter textfile collector** picks up `.prom` files from `/opt/homebrew/var/node_exporter/textfile/`
4. **Alloy** scrapes node_exporter and remote-writes to Prometheus
5. **Grafana dashboard** in a ConfigMap auto-discovered by the sidecar
Examples: `ansible/roles/zot_metrics/`, `ansible/roles/borgmatic_metrics/`, `ansible/roles/jellyfin_metrics/`
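
Because node_exporter may read the textfile directory at any moment, the script in step 1 should never expose a half-written file. A common approach, and the one the new `forgejo-metrics.sh.j2` uses with its `.tmp` file and final `mv`, is to write a sibling temp file and rename it over the target; a rename is only atomic within one filesystem, hence the same directory. node_exporter's textfile collector only reads files ending in `.prom`, so the temp file is ignored. `write_prom_atomically` below is a hypothetical generalization of that step into a reusable helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read metrics from stdin and publish them atomically.
# Usage: some_collector | write_prom_atomically /path/to/textfile/name.prom
write_prom_atomically() {
  local out=$1 tmp
  # Temp file in the same directory so the final mv is a rename, not a copy.
  # Its name ends in random characters, not ".prom", so the collector skips it.
  tmp=$(mktemp "${out}.tmp.XXXXXX") || return 1
  if ! cat > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  chmod 0644 "$tmp"   # mktemp creates the file as 0600
  mv -f "$tmp" "$out"
}
```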
### Grafana Dashboard Pattern
Dashboards are stored as ConfigMaps in `argocd/manifests/grafana-config/dashboards/` with label `grafana_dashboard: "1"`. The Grafana sidecar auto-discovers and provisions them. See `configmap-zot.yaml` or `configmap-services.yaml` for examples.
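
Wrapping a dashboard exported from the Grafana UI in this shape is mechanical. A minimal sketch in bash and `sed`; `dashboard_configmap` is hypothetical, and the `grafana-dashboard-<name>` naming, `monitoring` namespace, and label mirror the new `configmap-forgejo.yaml`, so adjust them if other dashboards follow a different convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap a Grafana dashboard JSON file in a sidecar-discoverable ConfigMap.
# Usage: dashboard_configmap <short-name> <dashboard.json>
dashboard_configmap() {
  local name=$1 json=$2
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-${name}
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  ${name}.json: |
EOF
  # Block scalar content must be indented deeper than its key (4 spaces here).
  sed 's/^/    /' "$json"
}
```

For example, `dashboard_configmap forgejo forgejo.json > argocd/manifests/grafana-config/dashboards/configmap-forgejo.yaml`, followed by adding the file to the kustomization's `resources:` list.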
## Plan
### 1. Create `forgejo_actions_metrics` Ansible Role
A new role following the established pattern:
```
ansible/roles/forgejo_actions_metrics/
├── defaults/main.yml # API URL, token var, output dir, repos list
├── tasks/main.yml # Deploy script + LaunchAgent
└── templates/
├── forgejo-actions-metrics.sh.j2 # Collection script
└── forgejo-actions-metrics.plist.j2 # LaunchAgent
```
**The collection script** polls the Forgejo API and writes Prometheus-format metrics:
```
# HELP forgejo_actions_runs_total Total workflow runs by status
# TYPE forgejo_actions_runs_total gauge
forgejo_actions_runs_total{repo="blumeops",status="success"} 42
forgejo_actions_runs_total{repo="blumeops",status="failure"} 3
forgejo_actions_runs_total{repo="blumeops",status="running"} 1
# HELP forgejo_actions_run_duration_seconds Duration of the most recent run per workflow
# TYPE forgejo_actions_run_duration_seconds gauge
forgejo_actions_run_duration_seconds{repo="blumeops",workflow="build-blumeops",status="success"} 127
# HELP forgejo_actions_jobs_waiting Number of jobs waiting in queue
# TYPE forgejo_actions_jobs_waiting gauge
forgejo_actions_jobs_waiting 0
# HELP forgejo_actions_jobs_running Number of jobs currently running
# TYPE forgejo_actions_jobs_running gauge
forgejo_actions_jobs_running 1
# HELP forgejo_actions_last_success_timestamp_seconds Unix timestamp of last successful run
# TYPE forgejo_actions_last_success_timestamp_seconds gauge
forgejo_actions_last_success_timestamp_seconds{repo="blumeops",workflow="build-blumeops"} 1707600000
# HELP forgejo_actions_up Forgejo Actions API is reachable
# TYPE forgejo_actions_up gauge
forgejo_actions_up 1
```
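
The format above is plain text, so the renderer can be small. Below is a Python sketch that emits one metric family with `# HELP`/`# TYPE` headers. It escapes label values as the exposition format requires (backslash, double quote, newline). The function names are hypothetical.

```python
def escape_label_value(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, help_text: str, samples,
                  metric_type: str = "gauge") -> str:
    """Render one metric family; `samples` is a list of (labels, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{k}="{escape_label_value(str(v))}"' for k, v in labels.items()
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Reproduces the first four lines of the example output above.
print(render_metric(
    "forgejo_actions_runs_total",
    "Total workflow runs by status",
    [({"repo": "blumeops", "status": "success"}, 42),
     ({"repo": "blumeops", "status": "failure"}, 3)],
), end="")
```
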
**Metrics to expose** (refine during implementation):
| Metric | Type | Labels | Source |
|--------|------|--------|--------|
| `forgejo_actions_up` | gauge | — | API reachability check |
| `forgejo_actions_runs_total` | gauge | `repo`, `status` | `/actions/runs` filtered by status |
| `forgejo_actions_run_duration_seconds` | gauge | `repo`, `workflow`, `status` | Most recent run per workflow |
| `forgejo_actions_jobs_waiting` | gauge | — | `/actions/runners/jobs` filtered by status |
| `forgejo_actions_jobs_running` | gauge | — | `/actions/runners/jobs` filtered by status |
| `forgejo_actions_last_success_timestamp_seconds` | gauge | `repo`, `workflow` | Most recent successful run timestamp |
| `forgejo_actions_last_run_status` | gauge | `repo`, `workflow` | 1=success, 0=failure (last run per workflow) |
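
Below is a sketch of how the per-workflow gauges in this table could be derived from a list of runs. It assumes the script first normalizes each API record into a dict with `workflow`, `status`, `started` and `stopped` keys (Unix seconds), ordered newest first. These keys are illustrative, not Forgejo's actual response schema. For `last_run_status`, terminal states other than success and failure (such as a cancelled run) are skipped. That is a design choice of this sketch, not something the plan specifies.

```python
from collections import Counter


def summarize_runs(runs):
    """Derive per-repo gauge values from normalized run records (newest first).

    Each run is assumed to look like:
      {"workflow": str, "status": str, "started": int, "stopped": int | None}
    """
    status_counts = Counter(run["status"] for run in runs)
    last_duration = {}  # workflow -> (status, seconds) of newest finished run
    last_success = {}   # workflow -> stop time of newest successful run
    last_status = {}    # workflow -> 1 (success) or 0 (failure)
    for run in runs:  # newest first, so setdefault keeps the first match
        wf, status = run["workflow"], run["status"]
        if run["stopped"] is None:
            continue  # still running: no duration or final status yet
        last_duration.setdefault(wf, (status, run["stopped"] - run["started"]))
        if status in ("success", "failure"):
            last_status.setdefault(wf, 1 if status == "success" else 0)
        if status == "success":
            last_success.setdefault(wf, run["stopped"])
    return status_counts, last_duration, last_success, last_status


runs = [
    {"workflow": "build-blumeops", "status": "running", "started": 1707600300, "stopped": None},
    {"workflow": "build-blumeops", "status": "success", "started": 1707599873, "stopped": 1707600000},
    {"workflow": "build-blumeops", "status": "failure", "started": 1707590000, "stopped": 1707590100},
]
counts, durations, successes, statuses = summarize_runs(runs)
# durations → {"build-blumeops": ("success", 127)}
```
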
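**Authentication:** The script needs a Forgejo API token. Either reuse the existing `_forgejo_api_token` pattern from the playbook's `pre_tasks`, or create a dedicated read-only token and store it in 1Password. The `/api/v1/admin/...` endpoints are restricted to site administrators, so the global job search needs an admin account's token. The per-repo endpoints only need read access to the repository.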
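
Below is a sketch of the authenticated request, using only the Python standard library. Forgejo, like Gitea, accepts an `Authorization: token <TOKEN>` header. The `FORGEJO_API_TOKEN` environment variable is a hypothetical way to pass the token in. The real role might instead template it into the script or read it from a file.

```python
import json
import os
import urllib.request


def forgejo_request(url: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the Forgejo API."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"token {token}", "Accept": "application/json"},
    )


def fetch_json(url: str, token: str, timeout: float = 10.0):
    """GET `url` and decode the JSON body; raises on HTTP or network errors."""
    with urllib.request.urlopen(forgejo_request(url, token), timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical variable name; an empty token would make authenticated calls fail.
token = os.environ.get("FORGEJO_API_TOKEN", "")
```
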
**Repos to monitor:** Start with `eblume/blumeops` (the only repo with active workflows). The role should accept a list of repos so more can be added later.
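
To keep the role list-driven, the script can loop over `owner/repo` strings and fold API failures into `forgejo_actions_up`. In this sketch, `fetch_runs` is a hypothetical callable that returns one repo's runs and raises on any API error. A single failing repo sets the gauge to 0 without aborting collection for the others. Results are keyed by bare repo name to match the `repo="blumeops"` label used in the example output, so two owners with the same repo name would collide.

```python
def collect_all(repos, fetch_runs):
    """Poll each "owner/name" repo and return (results_by_name, up_value)."""
    results, up = {}, 1
    for full_name in repos:
        owner, _, name = full_name.partition("/")
        if not owner or not name:
            raise ValueError(f"expected owner/repo, got {full_name!r}")
        try:
            results[name] = fetch_runs(owner, name)
        except Exception:
            up = 0  # report the failure via forgejo_actions_up, keep going
    return results, up
```
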
**Collection interval:** 60 seconds (same as zot_metrics, jellyfin_metrics).
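
The role renders the LaunchAgent from a Jinja template. To sketch the keys that template needs, the Python below produces an equivalent plist with `plistlib`. `StartInterval` reruns the job every 60 seconds, and `RunAtLoad` runs it once when the agent is loaded (for example, at login). The label and script path are hypothetical.

```python
import plistlib


def launch_agent_plist(label: str, script_path: str, interval: int = 60) -> bytes:
    """Render a launchd LaunchAgent that runs a script every `interval` seconds."""
    return plistlib.dumps({
        "Label": label,
        "ProgramArguments": ["/bin/bash", script_path],
        "StartInterval": interval,  # seconds between runs
        "RunAtLoad": True,          # also run once as soon as it is loaded
    })


xml = launch_agent_plist(
    "me.eblu.forgejo-actions-metrics",               # hypothetical label
    "/opt/homebrew/bin/forgejo-actions-metrics.sh",  # hypothetical path
)
```
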
### 2. Create Grafana Dashboard ConfigMap
Add `argocd/manifests/grafana-config/dashboards/configmap-forgejo-actions.yaml` with a dashboard showing:
- **Overview row:** jobs running, jobs waiting, last build status
- **Success/failure trend:** runs by status over time
- **Duration trend:** run duration over time, per workflow
- **Staleness:** time since last successful build per workflow
- **Table:** recent runs with status, duration, commit
The specific dashboard layout will be designed during implementation — this plan focuses on the data pipeline.
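
Below is one possible PromQL mapping for the first four items in the list above. It is written as a Python builder for a minimal Grafana dashboard JSON model (`type`, `title`, `gridPos`, `targets`). The panel titles, layout, and reliance on the default data source are assumptions. The staleness panel subtracts the last-success timestamp from `time()`, which yields seconds since the last successful run. Passing the result to `json.dumps` gives dashboard JSON suitable for a ConfigMap.

```python
# (title, Grafana panel type, PromQL expression); titles are illustrative.
PANELS = [
    ("Jobs running", "stat", "forgejo_actions_jobs_running"),
    ("Jobs waiting", "stat", "forgejo_actions_jobs_waiting"),
    ("Last build status", "stat", "forgejo_actions_last_run_status"),
    ("Runs by status", "timeseries", "sum by (status) (forgejo_actions_runs_total)"),
    ("Run duration", "timeseries", "forgejo_actions_run_duration_seconds"),
    ("Time since last success", "stat",
     "time() - forgejo_actions_last_success_timestamp_seconds"),
]


def build_dashboard(title: str = "Forgejo Actions") -> dict:
    """Minimal dashboard model: two panels per row on Grafana's 24-column grid."""
    panels = []
    for i, (panel_title, panel_type, expr) in enumerate(PANELS):
        panels.append({
            "id": i + 1,
            "type": panel_type,
            "title": panel_title,
            "gridPos": {"x": (i % 2) * 12, "y": (i // 2) * 8, "w": 12, "h": 8},
            "targets": [{"refId": "A", "expr": expr}],
        })
    return {"title": title, "panels": panels}
```
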
### 3. Wire Into Ansible Playbook
Add the new role to `ansible/playbooks/indri.yml` alongside the other metrics roles:
```yaml
- role: forgejo_actions_metrics
tags: forgejo_actions_metrics
```
No changes are needed to Alloy: it already picks up every `.prom` file in the textfile directory.
## Execution Steps
1. **Create the Ansible role** (`ansible/roles/forgejo_actions_metrics/`)
- Write collection script that queries the Forgejo API
- Write LaunchAgent plist
- Add to `indri.yml` playbook
2. **Create or reuse API token**
- Check whether the existing Forgejo API token has sufficient permissions
- If not, create a dedicated read-only token and store it in 1Password
3. **Deploy and verify metrics collection**
- `mise run provision-indri -- --tags forgejo_actions_metrics`
- Verify `.prom` file appears in textfile directory
- Verify metrics appear in Prometheus: `curl 'https://prometheus.ops.eblu.me/api/v1/query?query=forgejo_actions_up'`
4. **Create Grafana dashboard ConfigMap**
- Build dashboard JSON (can use Grafana UI, then export)
- Wrap in ConfigMap with `grafana_dashboard: "1"` label
- Sync via ArgoCD
5. **Update documentation**
- Add changelog fragment
- Update `docs/reference/services/forgejo.md` if it exists; otherwise, add a note to the plan's reference card
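
The Prometheus check in step 3 can also be scripted. This sketch uses the Prometheus HTTP API's `/api/v1/query` endpoint. `query_url` builds the URL, and `instant_values` pulls the sample values out of the documented instant-vector response shape, where each result's `value` is `[timestamp, "value"]`. Both function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def query_url(base: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API."""
    return f"{base.rstrip('/')}/api/v1/query?" + urlencode({"query": promql})


def instant_values(body: str) -> list:
    """Sample values from an instant-vector response body, as floats."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each result carries "value": [<unix time>, "<value as string>"].
    return [float(r["value"][1]) for r in payload["data"]["result"]]


url = query_url("https://prometheus.ops.eblu.me", "forgejo_actions_up")
# → "https://prometheus.ops.eblu.me/api/v1/query?query=forgejo_actions_up"
```

An empty list means the metric is not in Prometheus yet. A value of `0.0` means the exporter ran but could not reach the Forgejo API.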
## Verification Checklist
- [ ] Collection script runs without errors on indri
- [ ] `.prom` file in `/opt/homebrew/var/node_exporter/textfile/` has expected metrics
- [ ] Metrics queryable in Prometheus
- [ ] Grafana dashboard loads and shows data
- [ ] LaunchAgent survives indri restart
- [ ] `mise run services-check` passes
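
For the second checklist item, the sketch below reports which expected metric names are missing from the `.prom` file. The expected set mirrors the metrics table above. The parsing handles only the simple `name{labels} value` and `name value` lines this exporter would write, not the full exposition grammar.

```python
EXPECTED_METRICS = {
    "forgejo_actions_up",
    "forgejo_actions_runs_total",
    "forgejo_actions_run_duration_seconds",
    "forgejo_actions_jobs_waiting",
    "forgejo_actions_jobs_running",
    "forgejo_actions_last_success_timestamp_seconds",
    "forgejo_actions_last_run_status",
}


def metric_names(prom_text: str) -> set:
    """Names of metrics that have at least one sample line (comments ignored)."""
    names = set()
    for line in prom_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Sample lines are `name{labels} value` or `name value`.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(path: str, expected=EXPECTED_METRICS) -> list:
    """Sorted list of expected metric names absent from the file at `path`."""
    with open(path) as f:
        return sorted(set(expected) - metric_names(f.read()))
```
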
## Open Questions
- **Scope of repos:** Start with `eblume/blumeops` only, or also monitor mirrored repos that have workflows?
- **Historical depth:** How far back should the script query? The API paginates — querying the last N runs (e.g., 50) per repo is likely sufficient rather than scanning all history.
- **Runner health:** The Forgejo API does not expose a runner list endpoint. Runner health could be inferred (if jobs stay in "waiting" too long, the runner is likely down), but direct runner metrics would require querying the Forgejo database.
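
The historical-depth and runner-health questions both have simple mechanical answers if the plan goes ahead. The sketch below caps pagination at the last N records, assuming newest-first pages and that a short page marks the end. It also computes the age of the oldest waiting job as a runner-health proxy. `fetch_page` and the `status`/`created` job keys are hypothetical.

```python
import time


def fetch_recent(fetch_page, max_items=50, page_size=50):
    """Collect at most `max_items` records from a paginated endpoint.

    `fetch_page(page, limit)` is a hypothetical callable returning one page,
    newest first; a page shorter than `page_size` is treated as the last one.
    """
    items, page = [], 1
    while len(items) < max_items:
        batch = fetch_page(page, page_size)
        items.extend(batch)
        if len(batch) < page_size:
            break
        page += 1
    return items[:max_items]


def oldest_wait_seconds(jobs, now=None):
    """Age in seconds of the longest-waiting job, or 0 if none are waiting.

    Assumes each job dict has "status" and "created" (Unix seconds). A value
    that keeps growing suggests no runner is picking up work.
    """
    now = time.time() if now is None else now
    waits = [now - job["created"] for job in jobs if job["status"] == "waiting"]
    return max(waits, default=0)
```
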
## Reference Pattern Files
| File | Purpose |
|------|---------|
| `ansible/roles/zot_metrics/` | Textfile exporter role pattern (simplest example) |
| `ansible/roles/borgmatic_metrics/` | More complex exporter with multiple metrics |
| `ansible/roles/jellyfin_metrics/` | Exporter with API key authentication |
| `argocd/manifests/grafana-config/dashboards/configmap-zot.yaml` | Dashboard ConfigMap pattern |
| `argocd/manifests/grafana-config/dashboards/configmap-services.yaml` | Multi-panel dashboard example |
| `ansible/roles/forgejo/templates/app.ini.j2` | Forgejo configuration |
| `ansible/roles/alloy/templates/config.alloy.j2` | Alloy config (textfile collector) |
## Related
- [[forgejo]] — Forgejo service reference
- [[cluster]] — Grafana and Prometheus run here
- [[grafana]] — Dashboard host

@ -18,7 +18,6 @@ Plans differ from regular how-to guides in that they describe work that has been
| [[add-unifi-pulumi-stack]] | Abandoned | Add Pulumi IaC for UniFi Express 7 (provider bugs — see doc) |
| [[upstream-fork-strategy]] | Planned | Stacked-branch forking strategy for tracking upstream projects |
| [[adopt-oidc-provider]] | Completed | Deploy OIDC identity provider for SSO across services |
| [[forgejo-actions-dashboard]] | Planned | Grafana dashboard and custom Prometheus exporter for Forgejo Actions CI metrics |
| [[upgrade-grafana-helm-chart]] | Planned | Upgrade Grafana Helm chart from 8.8.2 to 11.x (3 phases) |
| [[deploy-authentik]] | Active (C2) | Deploy Authentik IdP — Mikado chain tracked in `how-to/authentik/` |
| [[operationalize-reolink-camera]] | Planned | Cloud-free NVR with Frigate, object detection, and ring buffer recording to sifaka |