blumeops/plans/ci-cd-bootstrap/P5_container_builds.md
Erich Blume 5fcd122494
Reorganize CI/CD bootstrap phases and add custom runner Dockerfile (#50)
## Summary
- Reorder CI/CD bootstrap phases to address chicken-and-egg problem
- P2 is now "Custom Runner Image" (stock runner lacks Node.js)
- Add P3 for "Mirror Forgejo & Build from Source"
- Rename P3 -> P4 (Self-Deploy), P4 -> P5 (Container Builds)
- Add Dockerfile for custom runner with Node.js, npm, docker, build tools
- Update overview with new phase structure, host mode notes, and cross-compilation challenge

## Key Changes

### Phase Reordering
| Old | New | Name |
|-----|-----|------|
| P1 | P1 | Enable Actions (complete) |
| P2 | P2 | **Custom Runner Image** (new focus) |
| - | P3 | **Mirror Forgejo & Build** (new) |
| P3 | P4 | Self-Deploy |
| P4 | P5 | Container Builds |

### Custom Runner Dockerfile
The stock `forgejo/runner:3.5.1` image lacks Node.js, so `actions/checkout@v4` doesn't work. The new Dockerfile adds:
- Node.js + npm (for GitHub Actions)
- Docker CLI (for container builds)
- Build tools (gcc, make, curl, jq)

### Bootstrap Strategy
1. Build custom runner image manually on gilbert (podman build)
2. Push to zot registry
3. Update deployment to use custom image
4. Then enable auto-build workflow for runner

## Deployment and Testing
- [x] Review plan changes
- [x] Build custom runner image manually and verify
- [x] Update runner deployment
- [x] Test `actions/checkout@v4` works

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/50
2026-01-23 18:50:27 -08:00


Phase 5: Container Image Builds

Goal: Set up CI workflows to build custom container images and push to zot registry

Status: Planning

Prerequisites: Phase 4 complete (Forgejo self-deploying, Actions working)


Overview

With Forgejo Actions operational (including the custom runner from P2), CI can now build:

  • A custom devpi image with pre-installed plugins
  • Any other custom images needed for k8s services
  • Release artifacts for Python packages

Note: The custom runner image build is covered in Phase 2. This phase focuses on application container builds.


Use Case 1: devpi Custom Image

Current State

devpi runs from registry.tail8d86e.ts.net/blumeops/devpi:latest, built manually:

  • Base image: python
  • Adds: devpi-server, devpi-web
  • Startup script for auto-initialization

Goal

Automate builds triggered by:

  • Push to the blumeops repo that touches the devpi build files
  • Manual workflow dispatch
  • Optionally: upstream devpi release (via schedule check)

Step 1: Create Workflow for devpi

1.1 Confirm the Dockerfile Location

The Dockerfile already exists at argocd/manifests/devpi/Dockerfile. We'll create a workflow in the blumeops repo that builds it.

1.2 Create Build Workflow

Create .forgejo/workflows/build-devpi.yml in blumeops repo:

name: Build devpi Image

on:
  push:
    paths:
      - 'argocd/manifests/devpi/Dockerfile'
      - 'argocd/manifests/devpi/start.sh'
      - '.forgejo/workflows/build-devpi.yml'
  workflow_dispatch:
    inputs:
      tag:
        description: 'Image tag (default: latest)'
        required: false
        default: 'latest'

env:
  REGISTRY: registry.tail8d86e.ts.net
  IMAGE_NAME: blumeops/devpi

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

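      - name: Determine tag
        id: tag
        env:
          # Pass values through env instead of interpolating them into the script,
          # so a crafted tag input cannot inject shell commands
          EVENT_NAME: ${{ github.event_name }}
          INPUT_TAG: ${{ github.event.inputs.tag }}
        run: |
          if [ "$EVENT_NAME" = "workflow_dispatch" ] && [ -n "$INPUT_TAG" ]; then
            TAG="$INPUT_TAG"
          else
            TAG="latest"
          fi
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"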

      - name: Build image
        uses: docker/build-push-action@v5
        with:
          context: argocd/manifests/devpi
          file: argocd/manifests/devpi/Dockerfile
          platforms: linux/arm64
          load: true
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}

      - name: Push to registry
        run: |
          # Zot has no auth, just push
          docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}

      - name: Verify push
        run: |
          # Check image exists in registry
          curl -sf "https://${{ env.REGISTRY }}/v2/${{ env.IMAGE_NAME }}/tags/list" | jq .

1.3 Runner Needs Registry Access

The runner needs to reach registry.tail8d86e.ts.net. This should work via Tailscale egress (same as Forgejo access).
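
A quick way to confirm this from inside a job is a throwaway step that requests the registry's OCI API root. This is a sketch only: the step name is made up, and it assumes curl is present in the runner image (the custom runner from P2 includes it) and that the workflow defines the REGISTRY env var as in 1.2.

```yaml
# Hypothetical smoke-test step; add it temporarily to any job's steps
- name: Check registry reachability
  run: |
    # /v2/ is the base endpoint of the OCI distribution API;
    # -f makes curl exit non-zero on an HTTP error status
    curl -sf "https://${{ env.REGISTRY }}/v2/" > /dev/null
    echo "registry reachable"
```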

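If not, add an egress proxy for the registry in argocd/manifests/tailscale-operator/. A Connector's advertiseRoutes field only accepts CIDR ranges, so a tailnet hostname is exposed through an ExternalName Service annotated for the operator instead:

apiVersion: v1
kind: Service
metadata:
  name: egress-registry
  namespace: tailscale-operator
  annotations:
    # Tailnet FQDN the operator should proxy to
    tailscale.com/tailnet-fqdn: registry.tail8d86e.ts.net
spec:
  type: ExternalName
  # Placeholder value; the operator rewrites it to point at its egress proxy
  externalName: placeholder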

Step 2: Test Build Workflow

2.1 Push and Trigger

# Make a small change to trigger
echo "# Build $(date)" >> argocd/manifests/devpi/Dockerfile
git add argocd/manifests/devpi/Dockerfile
git commit -m "Trigger devpi image rebuild"
git push

2.2 Monitor Build

  1. Go to https://forge.tail8d86e.ts.net/eblume/blumeops/actions
  2. Watch "Build devpi Image" workflow
  3. Verify success

2.3 Verify Image in Registry

curl -s https://registry.tail8d86e.ts.net/v2/blumeops/devpi/tags/list | jq .
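
The listing above succeeds even when the expected tag is missing. A stricter check, sketched here as a replacement for the workflow's "Verify push" step, fails the job unless the tag that was just pushed appears in the response. It assumes jq is available on the runner and reuses the REGISTRY, IMAGE_NAME, and steps.tag names from the workflow in 1.2.

```yaml
- name: Verify push
  run: |
    # jq -e exits non-zero when the final result is false or null,
    # so a missing tag fails the step
    curl -sf "https://${{ env.REGISTRY }}/v2/${{ env.IMAGE_NAME }}/tags/list" \
      | jq -e --arg tag "${{ steps.tag.outputs.tag }}" '.tags | index($tag) != null'
```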

2.4 Restart devpi to Use New Image

kubectl --context=minikube-indri -n devpi rollout restart statefulset/devpi
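
A restart only picks up the new build if the kubelet pulls the image again. Kubernetes defaults imagePullPolicy to Always when a container's tag is latest and no policy is set, but stating it explicitly keeps the behavior from depending on the tag name. The fragment below is a sketch; the container name devpi is an assumption about the existing StatefulSet manifest.

```yaml
# Fragment of the devpi StatefulSet (container name is assumed)
spec:
  template:
    spec:
      containers:
        - name: devpi
          image: registry.tail8d86e.ts.net/blumeops/devpi:latest
          # Re-pull on every pod start so a rollout restart sees the new push
          imagePullPolicy: Always
```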

Step 3: Reusable Container Build Workflow

3.1 Create Reusable Workflow

Create .forgejo/workflows/build-container.yml:

name: Build Container Image

on:
  workflow_call:
    inputs:
      context:
        description: 'Build context path'
        required: true
        type: string
      dockerfile:
        description: 'Dockerfile path (relative to context)'
        required: false
        type: string
        default: 'Dockerfile'
      image_name:
        description: 'Image name (without registry)'
        required: true
        type: string
      tag:
        description: 'Image tag'
        required: false
        type: string
        default: 'latest'
      platforms:
        description: 'Target platforms'
        required: false
        type: string
        default: 'linux/arm64'

env:
  REGISTRY: registry.tail8d86e.ts.net

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: ${{ inputs.context }}
          file: ${{ inputs.context }}/${{ inputs.dockerfile }}
          platforms: ${{ inputs.platforms }}
          push: true
          tags: ${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}

      - name: Verify push
        run: |
          curl -sf "https://${{ env.REGISTRY }}/v2/${{ inputs.image_name }}/tags/list" | jq .

3.2 Use in devpi Workflow

Simplify .forgejo/workflows/build-devpi.yml:

name: Build devpi Image

on:
  push:
    paths:
      - 'argocd/manifests/devpi/**'
  workflow_dispatch:

jobs:
  build:
    uses: ./.forgejo/workflows/build-container.yml
    with:
      context: argocd/manifests/devpi
      image_name: blumeops/devpi

Step 4: Python Package Builds (Optional)

4.1 Use Case

Build Python packages from forge repos and publish to devpi.

Example: mcquack package (LaunchAgent management library)

4.2 Create Python Build Workflow

Create .forgejo/workflows/build-python.yml:

name: Build Python Package

on:
  workflow_call:
    inputs:
      package_path:
        description: 'Path to package (contains pyproject.toml)'
        required: false
        type: string
        default: '.'
      python_version:
        description: 'Python version'
        required: false
        type: string
        default: '3.12'
      publish:
        description: 'Publish to devpi'
        required: false
        type: boolean
        default: false
    secrets:
      DEVPI_PASSWORD:
        required: false

env:
  DEVPI_URL: https://pypi.tail8d86e.ts.net

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ inputs.python_version }}

      - name: Install uv
        run: pip install uv

      - name: Build package
        run: |
          cd ${{ inputs.package_path }}
          uv build

      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: ${{ inputs.package_path }}/dist/

      - name: Publish to devpi
        if: inputs.publish
        env:
          # uv reads the password from this variable, keeping it off the command line
          UV_PUBLISH_PASSWORD: ${{ secrets.DEVPI_PASSWORD }}
        run: |
          cd "${{ inputs.package_path }}"
          uv publish \
            --publish-url "${{ env.DEVPI_URL }}/eblume/dev/" \
            --username eblume
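
A repository could then call the reusable workflow on tagged releases. The caller below is a sketch: the workflow name, the v* tag trigger, and the job name are hypothetical, and it assumes a DEVPI_PASSWORD secret has been configured and that the caller lives in the same repo as build-python.yml.

```yaml
name: Publish mcquack

on:
  push:
    tags:
      - 'v*'

jobs:
  package:
    uses: ./.forgejo/workflows/build-python.yml
    with:
      package_path: .
      publish: true
    secrets:
      # Forward the repo secret to the reusable workflow
      DEVPI_PASSWORD: ${{ secrets.DEVPI_PASSWORD }}
```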

Step 5: Scheduled Builds (Cron)

5.1 Weekly Rebuild

Keep images fresh with weekly rebuilds:

name: Weekly Image Rebuilds

on:
  schedule:
    # Every Sunday at 3 AM UTC
    - cron: '0 3 * * 0'
  workflow_dispatch:

jobs:
  devpi:
    uses: ./.forgejo/workflows/build-container.yml
    with:
      context: argocd/manifests/devpi
      image_name: blumeops/devpi

Future Improvements

Multi-Arch Builds

For images that need both ARM64 and AMD64:

platforms: linux/arm64,linux/amd64

Building for a non-native platform requires QEMU binfmt handlers to be registered on the runner host; buildx uses them automatically once they are present.
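
One common way to register the emulators is docker/setup-qemu-action, run before buildx is set up. The steps below are a sketch, assuming the runner host is arm64 and that the action resolves and runs on the Forgejo runner the same way the other docker/* actions in this plan do.

```yaml
- name: Set up QEMU
  uses: docker/setup-qemu-action@v3
  with:
    # Only the non-native architecture needs emulation (arm64 host assumed)
    platforms: amd64

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
```

A multi-platform result is a manifest list, which the classic Docker image store cannot load, so such builds should use push: true as in the reusable workflow rather than load: true followed by docker push.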

Build Caching

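Use the cache action to persist a local buildx cache directory between runs. The step below only saves and restores /tmp/.buildx-cache; the build step must also read and write that directory through its cache-from and cache-to inputs: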

- name: Cache Docker layers
  uses: actions/cache@v4
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}
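
A sketch of a build step that actually uses the restored directory, based on the local cache backend of build-push-action. The -new suffix and the rotate step are a common pattern to stop the cache from growing without bound; they are not something this plan specifies.

```yaml
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    context: ${{ inputs.context }}
    push: true
    tags: ${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}
    # Read layers restored by the cache step
    cache-from: type=local,src=/tmp/.buildx-cache
    # Write the fresh cache to a separate directory
    cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max

- name: Rotate cache
  run: |
    # Replace the old cache so stale layers are not kept forever
    rm -rf /tmp/.buildx-cache
    mv /tmp/.buildx-cache-new /tmp/.buildx-cache
```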

Security Scanning

Add Trivy or similar:

- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: '${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}'
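
By default the scan only reports findings. To make it gate the build, the action's exit-code and severity inputs can be set. The thresholds below are illustrative, and pinning to a release tag or commit SHA instead of master is advisable once a version has been chosen.

```yaml
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: '${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}'
    # Fail the step when matching vulnerabilities are found
    exit-code: '1'
    severity: 'CRITICAL,HIGH'
    # Skip vulnerabilities that have no fix available yet
    ignore-unfixed: true
```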

Step 6: Runner Observability (Logging & Metrics)

6.1 Problem

The forgejo-runner pod generates logs and metrics that should be collected for:

  • Debugging failed workflow runs
  • Monitoring runner health and capacity
  • Alerting on runner failures

6.2 Log Collection via Alloy

The forgejo-runner namespace needs to be included in Alloy's k8s log collection. Alloy is already configured to scrape logs from k8s pods - verify the runner namespace is included.

Check current Alloy config:

ssh indri 'grep -A20 discovery.kubernetes ~/.config/alloy/config.alloy'

If using namespace filtering, ensure forgejo-runner is included.

6.3 Metrics Collection

This step assumes the forgejo-runner exposes a Prometheus metrics endpoint; confirm that for the deployed runner version first. If it does, add a ServiceMonitor or configure Alloy to scrape it:

Option A: ServiceMonitor (if using Prometheus Operator)

Create argocd/manifests/forgejo-runner/servicemonitor.yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
spec:
  selector:
    matchLabels:
      app: forgejo-runner
  endpoints:
    - port: metrics
      interval: 30s

Option B: Alloy scrape config

Add to Alloy's k8s scrape config to discover the runner pod's metrics endpoint.

6.4 Create Runner Service for Metrics

Add argocd/manifests/forgejo-runner/service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: forgejo-runner-metrics
  namespace: forgejo-runner
  labels:
    app: forgejo-runner
spec:
  selector:
    app: forgejo-runner
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080

Update kustomization.yaml to include the service.
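
A sketch of the resulting kustomization.yaml, assuming the directory currently lists a single deployment.yaml (that file name is a placeholder for whatever resources are already there) and that the ServiceMonitor from Option A is used:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: forgejo-runner
resources:
  - deployment.yaml       # placeholder for the existing runner manifest(s)
  - service.yaml          # metrics Service from 6.4
  - servicemonitor.yaml   # only if Option A is used
```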

6.5 Grafana Dashboard

Consider creating a dashboard for:

  • Runner status (online/offline)
  • Job queue depth
  • Job execution time
  • Success/failure rates

6.6 Verification

# Check runner logs are appearing in Loki
# Go to Grafana → Explore → Loki
# Query: {namespace="forgejo-runner"}

# Check metrics are being scraped
# Go to Grafana → Explore → Prometheus
# Query: {__name__=~"forgejo_runner_.*"}

Verification Checklist

  • devpi build workflow created
  • devpi image builds successfully
  • Image pushed to zot registry
  • devpi pod uses new image
  • Reusable container workflow created
  • (Optional) Python build workflow created
  • (Optional) Scheduled builds configured
  • Runner logs visible in Loki
  • Runner metrics scraped by Prometheus/Alloy

Summary

With this phase complete, we have:

  1. Forgejo Actions running with k8s runner
  2. Forgejo self-deploys from CI on tagged releases
  3. Container images built automatically on push
  4. Infrastructure for Python package builds
  5. Runner observability with logs in Loki and metrics in Prometheus

The CI/CD bootstrap is complete. Future work:

  • Add more container builds as needed
  • Add Python package publishing for internal tools
  • Consider adding a macOS runner on indri for native builds
  • Create Grafana dashboards for CI/CD monitoring