Update CI/CD plan: mark Phase 1 complete, add runner observability
All checks were successful
Test CI / test (push) Successful in 0s
- Mark Phase 1 (Enable Actions) as completed with date
- Check off all verification items in P1
- Add Step 6 to Phase 4 for runner logging and metrics
- Update overview table with status column

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent 7893c41020
commit 6a436d141a
3 changed files with 108 additions and 14 deletions
|
|
@ -49,12 +49,12 @@ This plan details the setup of Forgejo Actions as the CI/CD system for blumeops,
|
||||||
|
|
||||||
## Phases
|
## Phases
|
||||||
|
|
||||||
| Phase | Name | Description |
|
| Phase | Name | Description | Status |
|
||||||
|-------|------|-------------|
|
|-------|------|-------------|--------|
|
||||||
| 1 | [Enable Actions](P1_enable_actions.md) | Configure Forgejo for Actions, deploy runner |
|
| 1 | [Enable Actions](P1_enable_actions.md) | Configure Forgejo for Actions, deploy runner | ✅ Complete |
|
||||||
| 2 | [Mirror & Build](P2_mirror_and_build.md) | Mirror upstream Forgejo, create build workflow |
|
| 2 | [Mirror & Build](P2_mirror_and_build.md) | Mirror upstream Forgejo, create build workflow | Planning |
|
||||||
| 3 | [Self-Deploy](P3_self_deploy.md) | Forgejo deploys itself, transition to mcquack |
|
| 3 | [Self-Deploy](P3_self_deploy.md) | Forgejo deploys itself, transition to mcquack | Planning |
|
||||||
| 4 | [Container Builds](P4_container_builds.md) | Build custom container images (devpi, etc.) |
|
| 4 | [Container Builds](P4_container_builds.md) | Build custom container images, runner observability | Planning |
|
||||||
|
|
||||||
## The Bootstrap Problem
|
## The Bootstrap Problem
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -2,7 +2,7 @@
|
||||||
|
|
||||||
**Goal**: Configure Forgejo to support Actions workflows and deploy a runner in k8s
|
**Goal**: Configure Forgejo to support Actions workflows and deploy a runner in k8s
|
||||||
|
|
||||||
**Status**: Planning
|
**Status**: Completed (2026-01-23)
|
||||||
|
|
||||||
**Prerequisites**: None (uses existing brew-based Forgejo)
|
**Prerequisites**: None (uses existing brew-based Forgejo)
|
||||||
|
|
||||||
|
|
@ -281,13 +281,13 @@ Check https://forge.tail8d86e.ts.net/eblume/blumeops/actions for the workflow ru
|
||||||
|
|
||||||
## Verification Checklist
|
## Verification Checklist
|
||||||
|
|
||||||
- [ ] Actions enabled in app.ini
|
- [x] Actions enabled in app.ini
|
||||||
- [ ] Forgejo restarted successfully
|
- [x] Forgejo restarted successfully
|
||||||
- [ ] Runner token stored in 1Password
|
- [x] Runner token stored in 1Password
|
||||||
- [ ] Runner deployment created in ArgoCD
|
- [x] Runner deployment created in ArgoCD
|
||||||
- [ ] Runner pod running in k8s
|
- [x] Runner pod running in k8s
|
||||||
- [ ] Runner shows as online in Forgejo admin
|
- [x] Runner shows as online in Forgejo admin
|
||||||
- [ ] Test workflow runs successfully
|
- [x] Test workflow runs successfully
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -455,6 +455,96 @@ Add Trivy or similar:
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
## Step 6: Runner Observability (Logging & Metrics)
|
||||||
|
|
||||||
|
### 6.1 Problem
|
||||||
|
|
||||||
|
The forgejo-runner pod generates logs and metrics that should be collected for:
|
||||||
|
- Debugging failed workflow runs
|
||||||
|
- Monitoring runner health and capacity
|
||||||
|
- Alerting on runner failures
|
||||||
|
|
||||||
|
### 6.2 Log Collection via Alloy
|
||||||
|
|
||||||
|
The forgejo-runner namespace needs to be included in Alloy's k8s log collection. Alloy is already configured to scrape logs from k8s pods; verify that the runner namespace is included.
|
||||||
|
|
||||||
|
Check current Alloy config:
|
||||||
|
```bash
# Show the Kubernetes discovery block of the Alloy config on indri
ssh indri 'grep -A20 discovery.kubernetes ~/.config/alloy/config.alloy'
```
|
||||||
|
|
||||||
|
If using namespace filtering, ensure `forgejo-runner` is included.
|
||||||
|
|
||||||
|
### 6.3 Metrics Collection
|
||||||
|
|
||||||
|
The forgejo-runner exposes Prometheus metrics. Add a ServiceMonitor or configure Alloy to scrape them:
|
||||||
|
|
||||||
|
**Option A: ServiceMonitor (if using Prometheus Operator)**
|
||||||
|
|
||||||
|
Create `argocd/manifests/forgejo-runner/servicemonitor.yaml`:
|
||||||
|
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
spec:
  selector:
    matchLabels:
      app: forgejo-runner
  endpoints:
    - port: metrics
      interval: 30s
```
|
||||||
|
|
||||||
|
**Option B: Alloy scrape config**
|
||||||
|
|
||||||
|
Extend Alloy's k8s scrape config so that it discovers the runner pod's metrics endpoint.
|
||||||
|
|
||||||
|
### 6.4 Create Runner Service for Metrics
|
||||||
|
|
||||||
|
Add `argocd/manifests/forgejo-runner/service.yaml`:
|
||||||
|
```yaml
apiVersion: v1
kind: Service
metadata:
  name: forgejo-runner-metrics
  namespace: forgejo-runner
  labels:
    app: forgejo-runner
spec:
  selector:
    app: forgejo-runner
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
```
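
The Service only routes traffic to pods whose labels match its selector, and the ServiceMonitor in Option A finds the Service through the same `app: forgejo-runner` label. The fragment below sketches the matching part of the runner's pod template. It assumes the runner is managed by a Deployment; the container name `runner` and the listen port 8080 are assumptions to check against the actual manifest and runner configuration.

```yaml
# Fragment of the runner Deployment (sketch, not the full manifest)
spec:
  template:
    metadata:
      labels:
        app: forgejo-runner        # must match the Service selector
    spec:
      containers:
        - name: runner             # hypothetical container name
          ports:
            - name: metrics
              containerPort: 8080  # assumed; should equal the Service targetPort
```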
|
||||||
|
|
||||||
|
Update kustomization.yaml to include the service.
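
A minimal sketch of the updated `kustomization.yaml`, assuming it sits next to the manifests in `argocd/manifests/forgejo-runner/`. The `deployment.yaml` entry is a hypothetical placeholder for whatever resources the file already lists; only `service.yaml` and `servicemonitor.yaml` come from this step.

```yaml
# argocd/manifests/forgejo-runner/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml      # hypothetical: keep the existing entries unchanged
  - service.yaml         # added in 6.4
  - servicemonitor.yaml  # only if Option A (Prometheus Operator) is used
```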
|
||||||
|
|
||||||
|
### 6.5 Grafana Dashboard
|
||||||
|
|
||||||
|
Consider creating a dashboard for:
|
||||||
|
- Runner status (online/offline)
|
||||||
|
- Job queue depth
|
||||||
|
- Job execution time
|
||||||
|
- Success/failure rates
|
||||||
|
|
||||||
|
### 6.6 Verification
|
||||||
|
|
||||||
|
```bash
# Check runner logs are appearing in Loki
# Go to Grafana → Explore → Loki
# Query: {namespace="forgejo-runner"}

# Check metrics are being scraped
# Go to Grafana → Explore → Prometheus
# Query: {__name__=~"forgejo_runner_.*"}
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## Verification Checklist
|
## Verification Checklist
|
||||||
|
|
||||||
- [ ] devpi build workflow created
|
- [ ] devpi build workflow created
|
||||||
|
|
@ -464,6 +554,8 @@ Add Trivy or similar:
|
||||||
- [ ] Reusable container workflow created
|
- [ ] Reusable container workflow created
|
||||||
- [ ] (Optional) Python build workflow created
|
- [ ] (Optional) Python build workflow created
|
||||||
- [ ] (Optional) Scheduled builds configured
|
- [ ] (Optional) Scheduled builds configured
|
||||||
|
- [ ] Runner logs visible in Loki
|
||||||
|
- [ ] Runner metrics scraped by Prometheus/Alloy
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -474,8 +566,10 @@ With this phase complete, we have:
|
||||||
2. **Forgejo self-deploys** from CI on tagged releases
|
2. **Forgejo self-deploys** from CI on tagged releases
|
||||||
3. **Container images** built automatically on push
|
3. **Container images** built automatically on push
|
||||||
4. Infrastructure for Python package builds
|
4. Infrastructure for Python package builds
|
||||||
|
5. **Runner observability** with logs in Loki and metrics in Prometheus
|
||||||
|
|
||||||
The CI/CD bootstrap is complete. Future work:
|
The CI/CD bootstrap is complete. Future work:
|
||||||
- Add more container builds as needed
|
- Add more container builds as needed
|
||||||
- Add Python package publishing for internal tools
|
- Add Python package publishing for internal tools
|
||||||
- Consider adding a macOS runner on indri for native builds
|
- Consider adding a macOS runner on indri for native builds
|
||||||
|
- Create Grafana dashboards for CI/CD monitoring
|
||||||
|
|
|
||||||