Update CI/CD plan: mark Phase 1 complete, add runner observability
All checks were successful
Test CI / test (push) Successful in 0s

- Mark Phase 1 (Enable Actions) as completed with date
- Check off all verification items in P1
- Add Step 6 to Phase 4 for runner logging and metrics
- Update overview table with status column

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

parent 7893c41020
commit 6a436d141a
3 changed files with 108 additions and 14 deletions

@@ -49,12 +49,12 @@ This plan details the setup of Forgejo Actions as the CI/CD system for blumeops,
 ## Phases

-| Phase | Name | Description |
-|-------|------|-------------|
-| 1 | [Enable Actions](P1_enable_actions.md) | Configure Forgejo for Actions, deploy runner |
-| 2 | [Mirror & Build](P2_mirror_and_build.md) | Mirror upstream Forgejo, create build workflow |
-| 3 | [Self-Deploy](P3_self_deploy.md) | Forgejo deploys itself, transition to mcquack |
-| 4 | [Container Builds](P4_container_builds.md) | Build custom container images (devpi, etc.) |
+| Phase | Name | Description | Status |
+|-------|------|-------------|--------|
+| 1 | [Enable Actions](P1_enable_actions.md) | Configure Forgejo for Actions, deploy runner | ✅ Complete |
+| 2 | [Mirror & Build](P2_mirror_and_build.md) | Mirror upstream Forgejo, create build workflow | Planning |
+| 3 | [Self-Deploy](P3_self_deploy.md) | Forgejo deploys itself, transition to mcquack | Planning |
+| 4 | [Container Builds](P4_container_builds.md) | Build custom container images, runner observability | Planning |

 ## The Bootstrap Problem

@@ -2,7 +2,7 @@
 **Goal**: Configure Forgejo to support Actions workflows and deploy a runner in k8s

-**Status**: Planning
+**Status**: Completed (2026-01-23)

 **Prerequisites**: None (uses existing brew-based Forgejo)

@@ -281,13 +281,13 @@ Check https://forge.tail8d86e.ts.net/eblume/blumeops/actions for the workflow ru
 ## Verification Checklist

-- [ ] Actions enabled in app.ini
-- [ ] Forgejo restarted successfully
-- [ ] Runner token stored in 1Password
-- [ ] Runner deployment created in ArgoCD
-- [ ] Runner pod running in k8s
-- [ ] Runner shows as online in Forgejo admin
-- [ ] Test workflow runs successfully
+- [x] Actions enabled in app.ini
+- [x] Forgejo restarted successfully
+- [x] Runner token stored in 1Password
+- [x] Runner deployment created in ArgoCD
+- [x] Runner pod running in k8s
+- [x] Runner shows as online in Forgejo admin
+- [x] Test workflow runs successfully

 ---

@@ -455,6 +455,96 @@ Add Trivy or similar:

---

## Step 6: Runner Observability (Logging & Metrics)

### 6.1 Problem

The forgejo-runner pod generates logs and metrics that should be collected for:
- Debugging failed workflow runs
- Monitoring runner health and capacity
- Alerting on runner failures

### 6.2 Log Collection via Alloy

The forgejo-runner namespace needs to be included in Alloy's k8s log collection. Alloy is already configured to scrape logs from k8s pods - verify the runner namespace is included.

Check current Alloy config:
```bash
ssh indri 'cat ~/.config/alloy/config.alloy | grep -A20 discovery.kubernetes'
```

If using namespace filtering, ensure `forgejo-runner` is included.

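One way to make the namespace check in 6.2 repeatable is a small shell helper that classifies the Alloy config. This is a heuristic sketch, not a parser: it assumes any namespace filter is written as a single-line `names = [...]` list (as in a `namespaces { }` block), and the function name `alloy_namespace_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Heuristic check of an Alloy config read from stdin.
# Assumption: a namespace filter, if present, appears as a one-line
# `names = ["ns1", "ns2"]` list. Other filtering styles are not detected.
alloy_namespace_status() {
  local ns="$1" config names_lines
  config="$(cat)"
  # Collect every line that looks like a `names = ...` assignment.
  names_lines="$(printf '%s\n' "$config" | grep -E '^[[:space:]]*names[[:space:]]*=' || true)"
  if [ -z "$names_lines" ]; then
    echo "no-filter"   # no `names = ...` list found
  elif printf '%s\n' "$names_lines" | grep -Fq -- "\"$ns\""; then
    echo "included"    # the namespace is quoted in a names list
  else
    echo "missing"     # a names list exists but omits the namespace
  fi
}
```

It could be fed from the command above, for example `ssh indri 'cat ~/.config/alloy/config.alloy' | alloy_namespace_status forgejo-runner`. A result of `no-filter` only means no restriction was found in that form; filtering may be absent or done another way, such as relabel rules.
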
### 6.3 Metrics Collection

The forgejo-runner exposes Prometheus metrics. Add a ServiceMonitor or configure Alloy to scrape:

**Option A: ServiceMonitor (if using Prometheus Operator)**

Create `argocd/manifests/forgejo-runner/servicemonitor.yaml`:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
spec:
  selector:
    matchLabels:
      app: forgejo-runner
  endpoints:
    - port: metrics
      interval: 30s
```

**Option B: Alloy scrape config**

Add to Alloy's k8s scrape config to discover the runner pod's metrics endpoint.

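Which option applies depends on whether the Prometheus Operator CRDs are installed in the cluster. A minimal sketch of that check follows. `servicemonitors.monitoring.coreos.com` is the CRD name the operator registers for the `ServiceMonitor` kind used in Option A; the helper name `scrape_option` is hypothetical, and the check assumes `kubectl` already points at the right cluster.

```shell
#!/usr/bin/env bash
# Prints "A" if the ServiceMonitor CRD can be fetched (Prometheus Operator
# present), otherwise "B" (fall back to an Alloy scrape config).
# Note: any kubectl failure, including no cluster access, also yields "B".
scrape_option() {
  if kubectl get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1; then
    echo "A"
  else
    echo "B"
  fi
}
```

Calling `scrape_option` with no arguments prints the suggested option letter.
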
### 6.4 Create Runner Service for Metrics

Add `argocd/manifests/forgejo-runner/service.yaml`:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: forgejo-runner-metrics
  namespace: forgejo-runner
  labels:
    app: forgejo-runner
spec:
  selector:
    app: forgejo-runner
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
```

Update kustomization.yaml to include the service.

### 6.5 Grafana Dashboard

Consider creating a dashboard for:
- Runner status (online/offline)
- Job queue depth
- Job execution time
- Success/failure rates

### 6.6 Verification

```bash
# Check runner logs are appearing in Loki
# Go to Grafana → Explore → Loki
# Query: {namespace="forgejo-runner"}

# Check metrics are being scraped
# Go to Grafana → Explore → Prometheus
# Query: forgejo_runner_*
```

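The checks in 6.6 can also be run from a shell against the Loki and Prometheus HTTP APIs instead of the Grafana UI. The sketch below rests on several assumptions: the base URLs are supplied by the caller (none are given in this plan), the `namespace` label and the `forgejo_runner_` prefix are taken from the queries in 6.6 and not verified against a running runner, and all function names are hypothetical. PromQL has no glob syntax, so the prefix is matched with a regex on `__name__`.

```shell
#!/usr/bin/env bash
# LogQL stream selector for one namespace, e.g. {namespace="forgejo-runner"}.
logql_for_namespace() {
  printf '{namespace="%s"}\n' "$1"
}

# PromQL selector matching every metric whose name starts with the prefix.
promql_for_prefix() {
  printf '{__name__=~"%s.*"}\n' "$1"
}

# Fetch up to 5 recent log lines from Loki (default window: the last hour).
# $1 = Loki base URL (assumed reachable from here), $2 = namespace.
check_loki() {
  curl -fsS -G "$1/loki/api/v1/query_range" \
    --data-urlencode "query=$(logql_for_namespace "$2")" \
    --data-urlencode "limit=5"
}

# Run an instant query against Prometheus for the given metric-name prefix.
# $1 = Prometheus base URL (assumed reachable from here), $2 = prefix.
check_prometheus() {
  curl -fsS -G "$1/api/v1/query" \
    --data-urlencode "query=$(promql_for_prefix "$2")"
}
```

With placeholder URLs, usage would look like `check_loki "$LOKI_URL" forgejo-runner` and `check_prometheus "$PROM_URL" forgejo_runner_`. An empty `data.result` array in either JSON response would suggest nothing has been collected yet.
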
---

## Verification Checklist

- [ ] devpi build workflow created

@@ -464,6 +554,8 @@ Add Trivy or similar:
- [ ] Reusable container workflow created
- [ ] (Optional) Python build workflow created
- [ ] (Optional) Scheduled builds configured
- [ ] Runner logs visible in Loki
- [ ] Runner metrics scraped by Prometheus/Alloy

---

@@ -474,8 +566,10 @@ With this phase complete, we have:
2. **Forgejo self-deploys** from CI on tagged releases
3. **Container images** built automatically on push
4. Infrastructure for Python package builds
5. **Runner observability** with logs in Loki and metrics in Prometheus

The CI/CD bootstrap is complete. Future work:
- Add more container builds as needed
- Add Python package publishing for internal tools
- Consider adding a macOS runner on indri for native builds
- Create Grafana dashboards for CI/CD monitoring