diff --git a/plans/ci-cd-bootstrap/00_overview.md b/plans/ci-cd-bootstrap/00_overview.md new file mode 100644 index 0000000..4b4ed08 --- /dev/null +++ b/plans/ci-cd-bootstrap/00_overview.md @@ -0,0 +1,146 @@ +# Forgejo Actions CI/CD Bootstrap Plan + +This plan details the setup of Forgejo Actions as the CI/CD system for blumeops, starting with the bootstrapping problem: using Forgejo to build and deploy Forgejo itself. + +## Goals + +1. **Forgejo Actions** as the primary CI system (replaces Woodpecker from original plan) +2. **Self-hosted Forgejo** built from source, deployed as mcquack LaunchAgent on indri +3. **Container builds** for ArgoCD manifests (devpi, etc.) +4. **Cron-scheduled tasks** via k8s CronJobs (not Actions) +5. **Local development** parity using `act` for workflow testing + +## Why Forgejo Actions over Woodpecker? + +- Native integration with Forgejo (no OAuth setup, automatic repo detection) +- GitHub Actions compatible syntax (huge ecosystem of reusable actions) +- `act` tool for local testing on gilbert +- Single system to maintain instead of two + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ INDRI │ +│ ┌─────────────────────┐ │ +│ │ Forgejo │ ← Built from source │ +│ │ (mcquack agent) │ ← Deploys itself via CI │ +│ │ │ │ +│ │ - Web UI (3001) │ │ +│ │ - SSH (2200) │ │ +│ │ - Actions enabled │ │ +│ └─────────────────────┘ │ +└─────────────────────────────────────────────────────────────────┘ + │ + │ SSH deploy + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ KUBERNETES (minikube) │ +│ ┌─────────────────────┐ ┌─────────────────────┐ │ +│ │ Forgejo Runner │ │ Other Services │ │ +│ │ (act_runner) │ │ (via ArgoCD) │ │ +│ │ │ │ │ │ +│ │ - Polls Forgejo │ │ │ │ +│ │ - Runs workflows │ │ │ │ +│ │ - Docker-in-Docker │ │ │ │ +│ └─────────────────────┘ └─────────────────────┘ │ +└─────────────────────────────────────────────────────────────────┘ +``` + +## Phases + +| 
Phase | Name | Description | +|-------|------|-------------| +| 1 | [Enable Actions](P1_enable_actions.md) | Configure Forgejo for Actions, deploy runner | +| 2 | [Mirror & Build](P2_mirror_and_build.md) | Mirror upstream Forgejo, create build workflow | +| 3 | [Self-Deploy](P3_self_deploy.md) | Forgejo deploys itself, transition to mcquack | +| 4 | [Container Builds](P4_container_builds.md) | Build custom container images (devpi, etc.) | + +## The Bootstrap Problem + +**Chicken-and-egg**: We need Forgejo Actions to build Forgejo, but Forgejo must be running first. + +**Solution**: +1. Keep current brew-based Forgejo running during setup +2. Enable Actions, deploy runner +3. Mirror upstream Forgejo, create build workflow +4. First CI build creates the binary +5. CI deploys binary to indri as mcquack service +6. `brew services stop forgejo` and uninstall +7. Future builds: Forgejo builds and deploys itself + +**Risk mitigation**: If self-deployment breaks Forgejo: +- blumeops is mirrored to GitHub +- Manual recovery: build on gilbert, scp to indri, restart service +- See Disaster Recovery section in P3 + +## Ansible Role Strategy + +The forgejo ansible role will follow the zot/alloy pattern: + +1. **Check binary exists** at expected path +2. **If missing**: Fail with message pointing to CI trigger instructions +3. 
**If present**: Deploy config, ensure LaunchAgent loaded + +Ansible does NOT: +- Build the binary (that's CI's job) +- Deploy new versions (that's CI's job) + +Ansible DOES: +- Manage app.ini configuration (sans secrets) +- Manage mcquack LaunchAgent plist +- Ensure service is running +- Collect logs via Alloy + +## Files Summary + +### New Files + +| Path | Purpose | +|------|---------| +| `argocd/apps/forgejo-runner.yaml` | ArgoCD Application for runner | +| `argocd/manifests/forgejo-runner/` | Runner k8s manifests | +| `.forgejo/workflows/build-forgejo.yml` | Build workflow in blumeops repo | +| (on forge) `eblume/forgejo/.forgejo/workflows/` | Build workflow in forgejo mirror | + +### Modified Files + +| Path | Change | +|------|--------| +| `ansible/roles/forgejo/` | Complete rewrite for mcquack pattern | +| `ansible/roles/alloy/defaults/main.yml` | Update forgejo log paths | +| zk cards | Update forgejo, argocd, blumeops cards | + +### Credentials Needed + +| Item | Purpose | Storage | +|------|---------|---------| +| Runner registration token | Runner auth to Forgejo | 1Password | +| SSH deploy key | Runner SSH to indri | 1Password + k8s secret | + +## Related Plans + +- [P7_forgejo.md](../k8s-migration/P7_forgejo.md) - Original k8s migration plan (superseded for Forgejo itself, but SSH hostname split info still relevant) +- [P8_woodpecker.md](../k8s-migration/P8_woodpecker.md) - Original Woodpecker plan (superseded by Forgejo Actions) + +## Decision Log + +### 2026-01-23: Forgejo Actions over Woodpecker + +**Decision**: Use Forgejo Actions instead of Woodpecker CI + +**Rationale**: +- Native Forgejo integration (Actions is built-in) +- GitHub Actions compatible (reuse existing actions) +- `act` for local testing +- One less system to deploy and maintain + +### 2026-01-23: Keep Forgejo on indri (not k8s) + +**Decision**: Forgejo stays on indri as mcquack service, not migrated to k8s + +**Rationale**: +- Avoid circular dependency (ArgoCD needs Forgejo to 
deploy Forgejo) +- Simpler SSH handling (direct port, no k8s networking complexity) +- Forgejo is critical infrastructure, benefits from isolation +- Can still use Tailscale serve for external access diff --git a/plans/ci-cd-bootstrap/P1_enable_actions.md b/plans/ci-cd-bootstrap/P1_enable_actions.md new file mode 100644 index 0000000..ce1d252 --- /dev/null +++ b/plans/ci-cd-bootstrap/P1_enable_actions.md @@ -0,0 +1,322 @@ +# Phase 1: Enable Forgejo Actions + +**Goal**: Configure Forgejo to support Actions workflows and deploy a runner in k8s + +**Status**: Planning + +**Prerequisites**: None (uses existing brew-based Forgejo) + +--- + +## Current State + +- Forgejo runs via `brew services` on indri +- Config at `/opt/homebrew/var/forgejo/custom/conf/app.ini` +- Actions not enabled +- No runners deployed + +--- + +## Step 1: Enable Actions in Forgejo + +### 1.1 Update app.ini + +SSH to indri and edit the Forgejo config: + +```bash +ssh indri 'vim /opt/homebrew/var/forgejo/custom/conf/app.ini' +``` + +Add the following sections: + +```ini +[actions] +ENABLED = true +DEFAULT_ACTIONS_URL = https://code.forgejo.org + +[repository] +; Allow workflows to be stored in .forgejo/workflows +DEFAULT_REPO_UNITS = repo.code,repo.issues,repo.pulls,repo.releases,repo.wiki,repo.projects,repo.packages,repo.actions +``` + +### 1.2 Restart Forgejo + +```bash +ssh indri 'brew services restart forgejo' +``` + +### 1.3 Verify Actions Enabled + +1. Go to https://forge.tail8d86e.ts.net +2. Navigate to any repo → Settings → Actions +3. Should see "Enable Repository Actions" option + +--- + +## Step 2: Create Runner Registration Token + +### 2.1 Generate Token in Forgejo UI + +1. Go to https://forge.tail8d86e.ts.net/admin/actions/runners +2. Click "Create new Runner" +3. Copy the registration token +4. 
Store in 1Password (blumeops vault) as "Forgejo Runner Token" + +### 2.2 Create k8s Secret Template + +Create `argocd/manifests/forgejo-runner/secret-token.yaml.tpl`: + +```yaml +# Template for op inject +apiVersion: v1 +kind: Secret +metadata: + name: forgejo-runner-token + namespace: forgejo-runner +type: Opaque +stringData: + token: "op://blumeops//token" +``` + +--- + +## Step 3: Deploy Runner to Kubernetes + +### 3.1 Create ArgoCD Application + +Create `argocd/apps/forgejo-runner.yaml`: + +```yaml +apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: forgejo-runner + namespace: argocd +spec: + project: default + source: + repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git + targetRevision: main + path: argocd/manifests/forgejo-runner + destination: + server: https://kubernetes.default.svc + namespace: forgejo-runner + syncPolicy: + syncOptions: + - CreateNamespace=true +``` + +### 3.2 Create Runner Manifests + +Create directory `argocd/manifests/forgejo-runner/` with: + +**kustomization.yaml**: +```yaml +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization +namespace: forgejo-runner +resources: + - namespace.yaml + - deployment.yaml + - serviceaccount.yaml + - secret-token.yaml +``` + +**namespace.yaml**: +```yaml +apiVersion: v1 +kind: Namespace +metadata: + name: forgejo-runner +``` + +**serviceaccount.yaml**: +```yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + name: forgejo-runner + namespace: forgejo-runner +``` + +**deployment.yaml**: +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: forgejo-runner + namespace: forgejo-runner +spec: + replicas: 1 + selector: + matchLabels: + app: forgejo-runner + template: + metadata: + labels: + app: forgejo-runner + spec: + serviceAccountName: forgejo-runner + containers: + - name: runner + image: code.forgejo.org/forgejo/runner:3.5.1 + env: + - name: FORGEJO_INSTANCE_URL + value: "https://forge.tail8d86e.ts.net" + - name: RUNNER_NAME + value: 
"k8s-runner-1" + - name: RUNNER_TOKEN + valueFrom: + secretKeyRef: + name: forgejo-runner-token + key: token + command: + - /bin/sh + - -c + - | + # Register runner if not already registered + if [ ! -f /data/.runner ]; then + forgejo-runner register \ + --instance "$FORGEJO_INSTANCE_URL" \ + --token "$RUNNER_TOKEN" \ + --name "$RUNNER_NAME" \ + --labels "ubuntu-latest:docker://node:20-bookworm,ubuntu-22.04:docker://ubuntu:22.04" \ + --no-interactive + fi + # Start the runner daemon + forgejo-runner daemon + volumeMounts: + - name: runner-data + mountPath: /data + - name: docker-sock + mountPath: /var/run/docker.sock + resources: + requests: + memory: "256Mi" + cpu: "100m" + limits: + memory: "1Gi" + cpu: "1000m" + volumes: + - name: runner-data + emptyDir: {} + - name: docker-sock + hostPath: + path: /var/run/docker.sock + type: Socket +``` + +**Note**: The runner needs access to Docker to run workflow jobs in containers. In minikube with docker driver, `/var/run/docker.sock` is available. 
+ +--- + +## Step 4: Deploy and Verify + +### 4.1 Inject Secrets and Deploy + +```bash +# Inject secrets +op inject -i argocd/manifests/forgejo-runner/secret-token.yaml.tpl \ + -o argocd/manifests/forgejo-runner/secret-token.yaml + +# Sync apps +argocd app sync apps +argocd app sync forgejo-runner +``` + +### 4.2 Verify Runner Registration + +```bash +# Check runner pod +kubectl --context=minikube-indri -n forgejo-runner get pods + +# Check runner logs +kubectl --context=minikube-indri -n forgejo-runner logs -f deployment/forgejo-runner + +# Verify in Forgejo UI +# Go to https://forge.tail8d86e.ts.net/admin/actions/runners +# Should see "k8s-runner-1" as online +``` + +--- + +## Step 5: Test with Simple Workflow + +### 5.1 Create Test Workflow + +In the blumeops repo, create `.forgejo/workflows/test.yml`: + +```yaml +name: Test CI + +on: + push: + branches: [main] + pull_request: + workflow_dispatch: + +jobs: + test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Hello World + run: | + echo "Hello from Forgejo Actions!" + echo "Runner: ${{ runner.name }}" + echo "Repo: ${{ github.repository }}" +``` + +### 5.2 Push and Verify + +```bash +git add .forgejo/ +git commit -m "Add test workflow for Forgejo Actions" +git push +``` + +Check https://forge.tail8d86e.ts.net/eblume/blumeops/actions for the workflow run. + +--- + +## Verification Checklist + +- [ ] Actions enabled in app.ini +- [ ] Forgejo restarted successfully +- [ ] Runner token stored in 1Password +- [ ] Runner deployment created in ArgoCD +- [ ] Runner pod running in k8s +- [ ] Runner shows as online in Forgejo admin +- [ ] Test workflow runs successfully + +--- + +## Troubleshooting + +### Runner Can't Connect to Forgejo + +The runner needs to reach `forge.tail8d86e.ts.net` from inside k8s. This should work via Tailscale operator egress (already configured for ArgoCD). 
+ +If not working: +```bash +# Test from inside k8s +kubectl --context=minikube-indri run -it --rm curl --image=curlimages/curl -- \ + curl -v https://forge.tail8d86e.ts.net/api/v1/version +``` + +### Docker Socket Permission Denied + +The runner container needs to access the Docker socket. In minikube with docker driver, this should work. If permission denied: + +```bash +# Check socket permissions +kubectl --context=minikube-indri -n forgejo-runner exec deployment/forgejo-runner -- ls -la /var/run/docker.sock +``` + +May need to run runner as root or adjust security context. + +--- + +## Next Phase + +Once runner is working, proceed to [Phase 2: Mirror & Build](P2_mirror_and_build.md). diff --git a/plans/ci-cd-bootstrap/P2_mirror_and_build.md b/plans/ci-cd-bootstrap/P2_mirror_and_build.md new file mode 100644 index 0000000..1fef474 --- /dev/null +++ b/plans/ci-cd-bootstrap/P2_mirror_and_build.md @@ -0,0 +1,376 @@ +# Phase 2: Mirror Forgejo & Create Build Workflow + +**Goal**: Mirror upstream Forgejo to forge and create a workflow that builds it from source + +**Status**: Planning + +**Prerequisites**: [Phase 1](P1_enable_actions.md) complete (Actions enabled, runner deployed) + +--- + +## Current State + +- Forgejo Actions enabled with runner in k8s +- Upstream Forgejo at https://codeberg.org/forgejo/forgejo +- No local mirror yet + +--- + +## Step 1: Mirror Upstream Forgejo + +### 1.1 User Action: Create Mirror on Forge + +**Manual step** (hairpinning doesn't work from indri): + +1. Go to https://forge.tail8d86e.ts.net +2. Click "+" → "New Migration" +3. Select "Gitea" as clone source +4. URL: `https://codeberg.org/forgejo/forgejo.git` +5. Repository name: `forgejo` +6. Check "This repository will be a mirror" +7. 
Click "Migrate Repository" + +### 1.2 Clone Mirror Locally + +```bash +git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/forgejo.git ~/code/3rd/forgejo +cd ~/code/3rd/forgejo +``` + +--- + +## Step 2: Understand Forgejo Build Process + +### 2.1 Build Requirements + +From Forgejo's `Makefile` and docs: + +- **Go**: 1.23+ (check `go.mod` for exact version) +- **Node.js**: 20+ (for frontend) +- **Make**: GNU Make +- **Git**: For version embedding + +### 2.2 Build Commands + +```bash +# Install frontend dependencies and build +make deps-frontend +make frontend + +# Build backend +TAGS="bindata sqlite sqlite_unlock_notify" make backend + +# Or all-in-one +TAGS="bindata sqlite sqlite_unlock_notify" make build +``` + +### 2.3 Output + +Binary at `gitea` (yes, the binary is still named `gitea` for compatibility). + +--- + +## Step 3: Create Build Workflow + +### 3.1 SSH Deploy Key for Runner + +The runner needs SSH access to indri to deploy the binary. + +**Generate key on gilbert**: +```bash +ssh-keygen -t ed25519 -C "forgejo-runner-deploy" -f ~/.ssh/forgejo-runner-deploy +``` + +**Add public key to indri's authorized_keys**: +```bash +cat ~/.ssh/forgejo-runner-deploy.pub | ssh indri 'cat >> ~/.ssh/authorized_keys' +``` + +**Store private key in 1Password** (blumeops vault) as "Forgejo Runner Deploy Key" + +**Add to k8s as secret**: + +Create `argocd/manifests/forgejo-runner/secret-ssh.yaml.tpl`: +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: forgejo-runner-ssh + namespace: forgejo-runner +type: Opaque +stringData: + id_ed25519: | + op://blumeops//private-key + known_hosts: | + indri.tail8d86e.ts.net ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIxxxxxx +``` + +Get indri's host key: +```bash +ssh-keyscan indri.tail8d86e.ts.net 2>/dev/null | grep ed25519 +``` + +### 3.2 Create Workflow File + +Create `.forgejo/workflows/build.yml` in the forgejo mirror repo: + +```yaml +name: Build Forgejo + +on: + push: + tags: + - 'v*' + workflow_dispatch: + inputs: + deploy: + 
description: 'Deploy to indri after build' + required: false + default: 'true' + type: boolean + +env: + GOPROXY: "https://proxy.golang.org,direct" + CGO_ENABLED: "1" + TAGS: "bindata sqlite sqlite_unlock_notify" + +jobs: + build: + runs-on: ubuntu-latest + outputs: + version: ${{ steps.version.outputs.version }} # consumed by the deploy job via needs.build.outputs.version + steps: + - name: Checkout + uses: actions/checkout@v4 + with: + fetch-depth: 0 # Need full history for version + + - name: Setup Go + uses: actions/setup-go@v5 + with: + go-version-file: 'go.mod' + + - name: Setup Node + uses: actions/setup-node@v4 + with: + node-version: '20' + + - name: Get version + id: version + run: | + if [[ "${{ github.ref }}" == refs/tags/* ]]; then + VERSION="${{ github.ref_name }}" + else + VERSION="$(git describe --tags --always)-dev" + fi + echo "version=$VERSION" >> "$GITHUB_OUTPUT" + echo "Building version: $VERSION" + + - name: Build frontend + run: | + make deps-frontend + make frontend + + - name: Build backend + run: | + TAGS="${{ env.TAGS }}" make backend + ./gitea --version + + - name: Rename binary + run: | + mv gitea forgejo-${{ steps.version.outputs.version }}-darwin-arm64 + ls -la forgejo-* + + - name: Upload artifact + uses: actions/upload-artifact@v4 + with: + name: forgejo-${{ steps.version.outputs.version }}-darwin-arm64 + path: forgejo-${{ steps.version.outputs.version }}-darwin-arm64 + + deploy: + needs: build + runs-on: ubuntu-latest + if: github.event_name == 'push' || (github.event_name == 'workflow_dispatch' && github.event.inputs.deploy == 'true') + steps: + - name: Download artifact + uses: actions/download-artifact@v4 + with: + name: forgejo-${{ needs.build.outputs.version }}-darwin-arm64 + + - name: Setup SSH + run: | + mkdir -p ~/.ssh + echo "${{ secrets.DEPLOY_SSH_KEY }}" > ~/.ssh/id_ed25519 + chmod 600 ~/.ssh/id_ed25519 + echo "${{ secrets.DEPLOY_KNOWN_HOSTS }}" > ~/.ssh/known_hosts + + - name: Deploy to indri + run: | + BINARY="forgejo-*-darwin-arm64" + chmod +x $BINARY + + # Copy binary to indri + scp $BINARY 
erichblume@indri.tail8d86e.ts.net:~/.local/bin/forgejo-new + + # Atomic swap and restart + ssh erichblume@indri.tail8d86e.ts.net << 'EOF' + set -e + cd ~/.local/bin + + # Verify the new binary runs + ./forgejo-new --version + + # Stop current service + launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist 2>/dev/null || true + + # Atomic swap + mv forgejo forgejo-old 2>/dev/null || true + mv forgejo-new forgejo + + # Start new service + launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist + + # Verify it's running + sleep 5 + curl -sf http://localhost:3001/api/v1/version || exit 1 + + echo "Deploy successful!" + ./forgejo --version + EOF +``` + +### 3.3 Add Repository Secrets + +In Forgejo, go to the forgejo repo → Settings → Actions → Secrets: + +1. **DEPLOY_SSH_KEY**: Private key from 1Password +2. **DEPLOY_KNOWN_HOSTS**: Output of `ssh-keyscan indri.tail8d86e.ts.net` + +--- + +## Step 4: Build Cross-Platform Consideration + +**Important**: The runner runs Linux containers, but indri is macOS ARM64. + +**Options**: + +### Option A: Cross-compile (Simpler, may have issues) + +Add to build job: +```yaml +env: + GOOS: darwin + GOARCH: arm64 +``` + +CGO cross-compilation is tricky. May need to disable CGO or use a cross-compiler. + +### Option B: Build on macOS (More reliable) + +Run a macOS runner on indri itself (not in k8s). + +```bash +# Install forgejo-runner on indri via mise +ssh indri 'mise use forgejo-runner' + +# Register as a macOS runner +ssh indri 'forgejo-runner register --labels "macos-arm64:host" ...' +``` + +Then workflow uses: +```yaml +runs-on: macos-arm64 +``` + +**Recommendation**: Option B is more reliable for native macOS builds. Consider deploying a runner directly on indri for macOS-specific builds. + +--- + +## Step 5: Test the Build + +### 5.1 Manual Workflow Dispatch + +1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions +2. Select "Build Forgejo" workflow +3. Click "Run workflow" +4. 
Set deploy=false for first test +5. Monitor the run + +### 5.2 Verify Artifact + +Download the artifact from the workflow run and verify it's a valid binary: +```bash +# If downloaded to gilbert +file forgejo-*-darwin-arm64 +# Should show: Mach-O 64-bit executable arm64 +``` + +--- + +## Alternative: Build on Gilbert, Deploy via CI + +If cross-compilation proves difficult, consider a hybrid approach: + +1. **Build on gilbert** (has Go, Node, is macOS ARM64) +2. **CI just deploys** the built binary + +Workflow in blumeops repo: +```yaml +name: Deploy Forgejo + +on: + workflow_dispatch: + inputs: + binary_path: + description: 'Path to binary on gilbert' + required: true + +jobs: + deploy: + runs-on: ubuntu-latest + steps: + # Fetch binary from gilbert and deploy to indri + # (requires SSH access to both) +``` + +This is less elegant but more pragmatic for macOS targets. + +--- + +## Verification Checklist + +- [ ] Forgejo mirrored to forge +- [ ] SSH deploy key created and stored in 1Password +- [ ] Deploy key added to indri authorized_keys +- [ ] SSH secret added to k8s +- [ ] Workflow file created in forgejo mirror +- [ ] Repository secrets configured +- [ ] Test build completes successfully +- [ ] Binary is valid macOS ARM64 executable + +--- + +## Troubleshooting + +### CGO Cross-Compilation Fails + +If building Linux→macOS fails: +``` +# runtime/cgo +gcc: error: unrecognized command line option '-arch' +``` + +Either: +1. Use Option B (macOS runner on indri) +2. Build with `CGO_ENABLED=0` (loses some features) +3. Use a Docker image with macOS cross-compiler (complex) + +### Artifact Too Large + +Forgejo binary is ~100MB. If upload fails: +- Check Forgejo's artifact size limit in app.ini +- Consider compressing: `gzip -9 forgejo-*` + +--- + +## Next Phase + +Once build is working and produces valid binaries, proceed to [Phase 3: Self-Deploy](P3_self_deploy.md). 
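The "valid binaries" gate above can be scripted rather than checked by eye. A minimal sketch, assuming the `file` output format quoted in Step 5.2; the helper name `is_macos_arm64` is hypothetical and not part of Forgejo or the runner:

```bash
# is_macos_arm64 DESCRIPTION
# DESCRIPTION is the text printed by `file -b <binary>`.
# Succeeds only when it describes a 64-bit Mach-O arm64 binary.
# (Hypothetical helper for illustration only.)
is_macos_arm64() {
  case "$1" in
    *"Mach-O 64-bit"*arm64*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use on gilbert after downloading the artifact: `is_macos_arm64 "$(file -b forgejo-*-darwin-arm64)" || echo "not a macOS ARM64 binary" >&2`. A Linux ELF binary produced by the k8s runner would fail this check, which is the case Step 4 warns about.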
diff --git a/plans/ci-cd-bootstrap/P3_self_deploy.md b/plans/ci-cd-bootstrap/P3_self_deploy.md new file mode 100644 index 0000000..0c2a616 --- /dev/null +++ b/plans/ci-cd-bootstrap/P3_self_deploy.md @@ -0,0 +1,409 @@ +# Phase 3: Self-Deploy & Transition to mcquack + +**Goal**: Complete the bootstrap - Forgejo deploys itself, transition from brew to mcquack LaunchAgent + +**Status**: Planning + +**Prerequisites**: [Phase 2](P2_mirror_and_build.md) complete (build workflow produces valid binaries) + +--- + +## Overview + +This phase completes the bootstrap: +1. First successful CI deploy creates the binary +2. Transition from brew service to mcquack LaunchAgent +3. Update ansible role to mcquack pattern +4. Remove brew forgejo + +After this phase, Forgejo builds and deploys itself on every tagged release. + +--- + +## Step 1: Prepare indri for mcquack + +### 1.1 Create Directory Structure + +```bash +ssh indri << 'EOF' + mkdir -p ~/.local/bin + mkdir -p ~/.config/forgejo + mkdir -p ~/Library/Logs +EOF +``` + +### 1.2 Prepare Data Directory + +The existing data is at `/opt/homebrew/var/forgejo`. We'll keep it there for now (simpler), or optionally migrate to `~/forgejo`. + +**Option A: Keep existing path** (recommended for simplicity) +- Data stays at `/opt/homebrew/var/forgejo` +- Binary moves to `~/.local/bin/forgejo` + +**Option B: Full migration** +- Move data to `~/forgejo` +- Requires updating app.ini paths + +For this plan, we'll use Option A. + +--- + +## Step 2: First CI Deploy + +### 2.1 Trigger Build with Deploy + +1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions +2. Select "Build Forgejo" workflow +3. Click "Run workflow" +4. Set deploy=true +5. 
Monitor the run + +### 2.2 Verify Binary Deployed + +```bash +ssh indri 'ls -la ~/.local/bin/forgejo && ~/.local/bin/forgejo --version' +``` + +At this point: +- New binary is at `~/.local/bin/forgejo` +- Brew forgejo is still running +- LaunchAgent doesn't exist yet + +--- + +## Step 3: Create mcquack LaunchAgent + +### 3.1 Create Plist Manually (One-Time Bootstrap) + +```bash +ssh indri << 'EOF' +cat > ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist << 'PLIST' +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> +<plist version="1.0"> +<dict> + <key>Label</key> + <string>mcquack.eblume.forgejo</string> + <key>ProgramArguments</key> + <array> + <string>/Users/erichblume/.local/bin/forgejo</string> + <string>web</string> + <string>--config</string> + <string>/opt/homebrew/var/forgejo/custom/conf/app.ini</string> + <string>--work-path</string> + <string>/opt/homebrew/var/forgejo</string> + </array> + <key>RunAtLoad</key> + <true/> + <key>KeepAlive</key> + <true/> + <key>StandardOutPath</key> + <string>/Users/erichblume/Library/Logs/mcquack.forgejo.out.log</string> + <key>StandardErrorPath</key> + <string>/Users/erichblume/Library/Logs/mcquack.forgejo.err.log</string> + <key>EnvironmentVariables</key> + <dict> + <key>HOME</key> + <string>/Users/erichblume</string> + <key>USER</key> + <string>erichblume</string> + </dict> +</dict> +</plist> +PLIST +EOF +``` + +--- + +## Step 4: Cutover from Brew to mcquack + +### 4.1 Stop Brew Service + +```bash +ssh indri 'brew services stop forgejo' +``` + +### 4.2 Start mcquack Service + +```bash +ssh indri 'launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist' +``` + +### 4.3 Verify Service Running + +```bash +# Check process +ssh indri 'launchctl list | grep forgejo' + +# Check logs +ssh indri 'tail -20 ~/Library/Logs/mcquack.forgejo.err.log' + +# Check HTTP +curl -s https://forge.tail8d86e.ts.net/api/v1/version +``` + +### 4.4 Verify Git Operations + +```bash +# SSH test +ssh -T forgejo@forge.tail8d86e.ts.net + +# Clone test +git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/blumeops.git /tmp/test-clone +rm -rf /tmp/test-clone +``` + +--- + +## Step 5: Update Ansible Role + +### 5.1 Rewrite forgejo Role + +Replace `ansible/roles/forgejo/tasks/main.yml`: + +```yaml +--- +# Forgejo is built from source via CI and deployed automatically. +# This role manages the configuration and LaunchAgent only. 
+# +# BINARY DEPLOYMENT: +# The binary at ~/.local/bin/forgejo is deployed by Forgejo Actions CI. +# If missing, trigger a build at: +# https://forge.tail8d86e.ts.net/eblume/forgejo/actions +# +# CONFIGURATION: +# app.ini at /opt/homebrew/var/forgejo/custom/conf/app.ini contains secrets +# and is NOT managed by ansible. It is backed up by borgmatic. + +- name: Verify forgejo binary exists + ansible.builtin.stat: + path: "{{ forgejo_binary }}" + register: forgejo_binary_stat + +- name: Fail if forgejo binary not found + ansible.builtin.fail: + msg: | + Forgejo binary not found at {{ forgejo_binary }}. + + The binary is deployed by Forgejo Actions CI. To build and deploy: + 1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions + 2. Select "Build Forgejo" workflow + 3. Click "Run workflow" with deploy=true + + Alternatively, build manually on gilbert and scp to indri. + when: not forgejo_binary_stat.stat.exists + +- name: Check forgejo config exists + ansible.builtin.stat: + path: "{{ forgejo_config }}" + register: forgejo_config_stat + +- name: Fail if forgejo config is missing + ansible.builtin.fail: + msg: | + Forgejo config not found at {{ forgejo_config }} + This file contains secrets and is not managed by ansible. 
+ To restore from backup, run: + borgmatic --config ~/.config/borgmatic/config.yaml extract --archive latest \ + --path {{ forgejo_config }} + when: not forgejo_config_stat.stat.exists + +- name: Deploy forgejo LaunchAgent plist + ansible.builtin.template: + src: forgejo.plist.j2 + dest: ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist + mode: '0644' + notify: Restart forgejo + +- name: Check if forgejo LaunchAgent is loaded + ansible.builtin.command: launchctl list mcquack.eblume.forgejo + register: forgejo_launchctl_check + changed_when: false + failed_when: false + +- name: Load forgejo LaunchAgent if not loaded + ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist + when: forgejo_launchctl_check.rc != 0 + changed_when: true + failed_when: false +``` + +### 5.2 Create defaults/main.yml + +```yaml +--- +# Forgejo binary and paths +forgejo_binary: /Users/erichblume/.local/bin/forgejo +forgejo_work_path: /opt/homebrew/var/forgejo +forgejo_config: "{{ forgejo_work_path }}/custom/conf/app.ini" +forgejo_log_dir: /Users/erichblume/Library/Logs + +# HTTP and SSH ports (must match app.ini) +forgejo_http_port: 3001 +forgejo_ssh_port: 2200 +``` + +### 5.3 Create templates/forgejo.plist.j2 + +```xml +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> + +<plist version="1.0"> +<dict> + <key>Label</key> + <string>mcquack.eblume.forgejo</string> + <key>ProgramArguments</key> + <array> + <string>{{ forgejo_binary }}</string> + <string>web</string> + <string>--config</string> + <string>{{ forgejo_config }}</string> + <string>--work-path</string> + <string>{{ forgejo_work_path }}</string> + </array> + <key>RunAtLoad</key> + <true/> + <key>KeepAlive</key> + <true/> + <key>StandardOutPath</key> + <string>{{ forgejo_log_dir }}/mcquack.forgejo.out.log</string> + <key>StandardErrorPath</key> + <string>{{ forgejo_log_dir }}/mcquack.forgejo.err.log</string> + <key>EnvironmentVariables</key> + <dict> + <key>HOME</key> + <string>/Users/erichblume</string> + <key>USER</key> + <string>erichblume</string> + </dict> +</dict> +</plist> +``` + +### 5.4 Update handlers/main.yml + +```yaml +--- +- name: Restart forgejo + ansible.builtin.shell: | + launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist 2>/dev/null || true + launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist + changed_when: true +``` + +--- + +## Step 6: Update Alloy Log 
Collection + +Update `ansible/roles/alloy/defaults/main.yml`: + +Change forgejo log paths from brew to mcquack: +```yaml +alloy_brew_logs: + # Remove forgejo from here + - path: /opt/homebrew/var/log/tailscaled.log + service: tailscale + stream: stdout + +alloy_mcquack_logs: + # ... existing entries ... + - path: /Users/erichblume/Library/Logs/mcquack.forgejo.out.log + service: forgejo + stream: stdout + - path: /Users/erichblume/Library/Logs/mcquack.forgejo.err.log + service: forgejo + stream: stderr +``` + +--- + +## Step 7: Remove Brew Forgejo + +### 7.1 Uninstall Brew Package + +```bash +ssh indri 'brew uninstall forgejo' +``` + +### 7.2 Remove Old Logs + +```bash +ssh indri 'rm -f /opt/homebrew/var/log/forgejo.log' +``` + +--- + +## Step 8: Run Ansible + +```bash +mise run provision-indri -- --tags forgejo,alloy +``` + +--- + +## Disaster Recovery + +### If CI Deploy Breaks Forgejo + +1. **Build manually on gilbert**: + ```bash + cd ~/code/3rd/forgejo + git pull + mise use go node + TAGS="bindata sqlite sqlite_unlock_notify" make build + scp gitea indri:~/.local/bin/forgejo + ``` + +2. **Restart service**: + ```bash + ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist; launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist' + ``` + +3. **Verify**: + ```bash + curl https://forge.tail8d86e.ts.net/api/v1/version + ``` + +### If Forgejo Won't Start + +1. Check logs: `ssh indri 'tail -100 ~/Library/Logs/mcquack.forgejo.err.log'` +2. Check binary: `ssh indri '~/.local/bin/forgejo --version'` +3. Check config: `ssh indri 'cat /opt/homebrew/var/forgejo/custom/conf/app.ini | head -50'` +4. 
Try running manually: `ssh indri '~/.local/bin/forgejo web --config /opt/homebrew/var/forgejo/custom/conf/app.ini --work-path /opt/homebrew/var/forgejo'` + +### Switch ArgoCD to GitHub (Nuclear Option) + +If Forgejo is down and you need to deploy fixes: + +```bash +argocd repo add https://github.com/eblume/blumeops.git --username eblume --password $GITHUB_PAT +argocd app set apps --repo https://github.com/eblume/blumeops.git +argocd app sync apps +``` + +After recovery, switch back to Forgejo. + +--- + +## Verification Checklist + +- [ ] CI deploy completed successfully +- [ ] Binary at `~/.local/bin/forgejo` +- [ ] mcquack LaunchAgent created +- [ ] Brew service stopped +- [ ] mcquack service started +- [ ] HTTP works (`curl https://forge.tail8d86e.ts.net/api/v1/version`) +- [ ] SSH works (`ssh -T forgejo@forge.tail8d86e.ts.net`) +- [ ] Git clone/push works +- [ ] Ansible role updated +- [ ] Alloy logs updated +- [ ] Brew package uninstalled +- [ ] `mise run provision-indri` succeeds + +--- + +## Next Phase + +After bootstrap is complete, proceed to [Phase 4: Container Builds](P4_container_builds.md) to set up container image building for ArgoCD. 
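The HTTP item in the verification checklist above could be scripted with a bounded retry rather than a fixed `sleep`, since Forgejo may take a variable amount of time to come up after `launchctl load`. A minimal sketch; `wait_for` is a hypothetical helper, and the attempt count and delay in the usage line are assumptions to tune:

```bash
# wait_for ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
# (Hypothetical helper for illustration only.)
wait_for() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example: `wait_for 10 3 curl -sf https://forge.tail8d86e.ts.net/api/v1/version`. The same helper would also fit the deploy step in Phase 2, where the workflow currently sleeps for five seconds before probing `localhost:3001`.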
diff --git a/plans/ci-cd-bootstrap/P4_container_builds.md b/plans/ci-cd-bootstrap/P4_container_builds.md new file mode 100644 index 0000000..6e4297c --- /dev/null +++ b/plans/ci-cd-bootstrap/P4_container_builds.md @@ -0,0 +1,409 @@ +# Phase 4: Container Image Builds + +**Goal**: Set up CI workflows to build custom container images and push to zot registry + +**Status**: Planning + +**Prerequisites**: [Phase 3](P3_self_deploy.md) complete (Forgejo self-deploying, Actions working) + +--- + +## Overview + +With Forgejo Actions operational, we can now build container images for: +- Custom devpi with pre-installed plugins +- Any other custom images needed for k8s services +- Release artifacts for Python packages + +--- + +## Use Case 1: devpi Custom Image + +### Current State + +devpi runs from `registry.tail8d86e.ts.net/blumeops/devpi:latest`, built manually: +- Base image: python +- Adds: devpi-server, devpi-web +- Startup script for auto-initialization + +### Goal + +Automate builds triggered by: +- Push to devpi repo on forge +- Manual workflow dispatch +- Optionally: upstream devpi release (via schedule check) + +--- + +## Step 1: Create Workflow for devpi + +### 1.1 Ensure devpi Repo Has Dockerfile + +The Dockerfile already exists at `argocd/manifests/devpi/Dockerfile`. We'll create a workflow in the blumeops repo that builds it. 
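Before wiring the Dockerfile into CI, it may be worth confirming that it still builds locally. A minimal sketch, assuming Docker is available on the machine running it; `image_ref` and `build_devpi_local` are hypothetical helpers, and the `dev` tag is an assumption chosen so a local build does not shadow `latest`:

```bash
# image_ref REGISTRY NAME [TAG]
# Prints REGISTRY/NAME:TAG, with TAG defaulting to "latest".
# (Hypothetical helper for illustration only.)
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "${3:-latest}"
}

# build_devpi_local
# Builds the devpi image from the repo root; nothing is pushed.
build_devpi_local() {
  docker build \
    -f argocd/manifests/devpi/Dockerfile \
    -t "$(image_ref registry.tail8d86e.ts.net blumeops/devpi dev)" \
    argocd/manifests/devpi
}
```

Run `build_devpi_local` from the root of the blumeops checkout. Keeping the reference construction in one function mirrors how the workflow composes `REGISTRY`, `IMAGE_NAME`, and the tag.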
+
+### 1.2 Create Build Workflow
+
+Create `.forgejo/workflows/build-devpi.yml` in blumeops repo:
+
+```yaml
+name: Build devpi Image
+
+on:
+  push:
+    paths:
+      - 'argocd/manifests/devpi/Dockerfile'
+      - 'argocd/manifests/devpi/start.sh'
+      - '.forgejo/workflows/build-devpi.yml'
+  workflow_dispatch:
+    inputs:
+      tag:
+        description: 'Image tag (default: latest)'
+        required: false
+        default: 'latest'
+
+env:
+  REGISTRY: registry.tail8d86e.ts.net
+  IMAGE_NAME: blumeops/devpi
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Determine tag
+        id: tag
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            TAG="${{ github.event.inputs.tag }}"
+          else
+            TAG="latest"
+          fi
+          echo "tag=$TAG" >> "$GITHUB_OUTPUT"
+
+      - name: Build image
+        uses: docker/build-push-action@v5
+        with:
+          context: argocd/manifests/devpi
+          file: argocd/manifests/devpi/Dockerfile
+          platforms: linux/arm64
+          load: true
+          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}
+
+      - name: Push to registry
+        run: |
+          # Zot has no auth, just push
+          docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}
+
+      - name: Verify push
+        run: |
+          # Check image exists in registry
+          curl -sf "https://${{ env.REGISTRY }}/v2/${{ env.IMAGE_NAME }}/tags/list" | jq .
+```
+
+### 1.3 Runner Needs Registry Access
+
+The runner needs to reach `registry.tail8d86e.ts.net`. This should work via Tailscale egress (same as Forgejo access).
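+
+A quick way to confirm this from inside a job is a step like the following. It is a minimal sketch, assuming the runner image ships `curl` and that zot answers the standard `/v2/` base endpoint without authentication; the step name is illustrative:
+
+```yaml
+- name: Check registry reachability
+  run: |
+    # GET /v2/ is the OCI distribution API base endpoint; -f makes curl exit
+    # non-zero on HTTP errors, so the step fails if the registry is unreachable
+    curl -sf --max-time 10 "https://${{ env.REGISTRY }}/v2/" > /dev/null
+```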
+
+If not, add an egress Service for the registry in `argocd/manifests/tailscale-operator/` (a sketch, assuming the operator's annotation-based egress to a tailnet FQDN):
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: egress-registry
+  namespace: tailscale-operator
+  annotations:
+    # Tailnet device the operator's egress proxy should forward to
+    tailscale.com/tailnet-fqdn: registry.tail8d86e.ts.net
+spec:
+  type: ExternalName
+  # Placeholder value; the Tailscale operator rewrites it to the proxy address
+  externalName: placeholder
+```
+
+---
+
+## Step 2: Test Build Workflow
+
+### 2.1 Push and Trigger
+
+```bash
+# Make a small change to trigger
+echo "# Build $(date)" >> argocd/manifests/devpi/Dockerfile
+git add argocd/manifests/devpi/Dockerfile
+git commit -m "Trigger devpi image rebuild"
+git push
+```
+
+### 2.2 Monitor Build
+
+1. Go to https://forge.tail8d86e.ts.net/eblume/blumeops/actions
+2. Watch "Build devpi Image" workflow
+3. Verify success
+
+### 2.3 Verify Image in Registry
+
+```bash
+curl -s https://registry.tail8d86e.ts.net/v2/blumeops/devpi/tags/list | jq .
+```
+
+### 2.4 Restart devpi to Use New Image
+
+```bash
+kubectl --context=minikube-indri -n devpi rollout restart statefulset/devpi
+```
+
+---
+
+## Step 3: Reusable Container Build Workflow
+
+### 3.1 Create Reusable Workflow
+
+Create `.forgejo/workflows/build-container.yml`:
+
+```yaml
+name: Build Container Image
+
+on:
+  workflow_call:
+    inputs:
+      context:
+        description: 'Build context path'
+        required: true
+        type: string
+      dockerfile:
+        description: 'Dockerfile path (relative to context)'
+        required: false
+        type: string
+        default: 'Dockerfile'
+      image_name:
+        description: 'Image name (without registry)'
+        required: true
+        type: string
+      tag:
+        description: 'Image tag'
+        required: false
+        type: string
+        default: 'latest'
+      platforms:
+        description: 'Target platforms'
+        required: false
+        type: string
+        default: 'linux/arm64'
+
+env:
+  REGISTRY: registry.tail8d86e.ts.net
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Build and push
+        uses: docker/build-push-action@v5
+        with:
+          context: ${{ inputs.context }}
+          file: ${{ inputs.context }}/${{ inputs.dockerfile }}
+          platforms: ${{ inputs.platforms }}
+          push: true
+          tags: ${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}
+
+      - name: Verify push
+        run: |
+          curl -sf "https://${{ env.REGISTRY }}/v2/${{ inputs.image_name }}/tags/list" | jq .
+```
+
+### 3.2 Use in devpi Workflow
+
+Simplify `.forgejo/workflows/build-devpi.yml`:
+
+```yaml
+name: Build devpi Image
+
+on:
+  push:
+    paths:
+      - 'argocd/manifests/devpi/**'
+  workflow_dispatch:
+
+jobs:
+  build:
+    uses: ./.forgejo/workflows/build-container.yml
+    with:
+      context: argocd/manifests/devpi
+      image_name: blumeops/devpi
+```
+
+---
+
+## Step 4: Python Package Builds (Optional)
+
+### 4.1 Use Case
+
+Build Python packages from forge repos and publish to devpi.
+
+Example: `mcquack` package (LaunchAgent management library)
+
+### 4.2 Create Python Build Workflow
+
+Create `.forgejo/workflows/build-python.yml`:
+
+```yaml
+name: Build Python Package
+
+on:
+  workflow_call:
+    inputs:
+      package_path:
+        description: 'Path to package (contains pyproject.toml)'
+        required: false
+        type: string
+        default: '.'
+      python_version:
+        description: 'Python version'
+        required: false
+        type: string
+        default: '3.12'
+      publish:
+        description: 'Publish to devpi'
+        required: false
+        type: boolean
+        default: false
+    secrets:
+      DEVPI_PASSWORD:
+        required: false
+
+env:
+  DEVPI_URL: https://pypi.tail8d86e.ts.net
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ inputs.python_version }}
+
+      - name: Install uv
+        run: pip install uv
+
+      - name: Build package
+        run: |
+          cd "${{ inputs.package_path }}"
+          uv build
+
+      - name: Upload artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: dist
+          path: ${{ inputs.package_path }}/dist/
+
+      - name: Publish to devpi
+        if: inputs.publish
+        run: |
+          cd "${{ inputs.package_path }}"
+          uv publish \
+            --publish-url ${{ env.DEVPI_URL }}/eblume/dev/ \
+            --username eblume \
+            --password "${{ secrets.DEVPI_PASSWORD }}"
+```
+
+---
+
+## Step 5: Scheduled Builds (Cron)
+
+### 5.1 Weekly Rebuild
+
+Keep images fresh with weekly rebuilds:
+
+```yaml
+name: Weekly Image Rebuilds
+
+on:
+  schedule:
+    # Every Sunday at 3 AM UTC
+    - cron: '0 3 * * 0'
+  workflow_dispatch:
+
+jobs:
+  devpi:
+    uses: ./.forgejo/workflows/build-container.yml
+    with:
+      context: argocd/manifests/devpi
+      image_name: blumeops/devpi
+```
+
+---
+
+## Future Improvements
+
+### Multi-Arch Builds
+
+For images that need both ARM64 and AMD64:
+
+```yaml
+platforms: linux/arm64,linux/amd64
+```
+
+Requires QEMU emulation on the runner: buildx can target non-native platforms, but only once QEMU binfmt handlers are registered.
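+
+One way to provide that emulation is to register QEMU handlers before the Buildx step. This is a minimal sketch, assuming the runner's Docker-in-Docker setup allows privileged containers and that `docker/setup-qemu-action` is reachable from the runner; the step placement is an assumption:
+
+```yaml
+# Place before "Set up Docker Buildx" in build-container.yml
+- name: Set up QEMU
+  uses: docker/setup-qemu-action@v3
+  with:
+    # Register only the non-native architecture; the runner itself is arm64
+    platforms: amd64
+```
+
+With emulation registered, callers can pass `platforms: linux/arm64,linux/amd64` to the reusable workflow. Emulated builds are typically much slower than native ones.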
+
+### Build Caching
+
+Use GitHub/Forgejo cache actions:
+
+```yaml
+- name: Cache Docker layers
+  uses: actions/cache@v4
+  with:
+    path: /tmp/.buildx-cache
+    key: ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}
+```
+
+The cache only takes effect once the build step reads and writes that directory, e.g. `cache-from: type=local,src=/tmp/.buildx-cache` and `cache-to: type=local,dest=/tmp/.buildx-cache` on `docker/build-push-action`.
+
+### Security Scanning
+
+Add Trivy or similar:
+
+```yaml
+- name: Run Trivy vulnerability scanner
+  uses: aquasecurity/trivy-action@master
+  with:
+    image-ref: '${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}'
+```
+
+---
+
+## Verification Checklist
+
+- [ ] devpi build workflow created
+- [ ] devpi image builds successfully
+- [ ] Image pushed to zot registry
+- [ ] devpi pod uses new image
+- [ ] Reusable container workflow created
+- [ ] (Optional) Python build workflow created
+- [ ] (Optional) Scheduled builds configured
+
+---
+
+## Summary
+
+With this phase complete, we have:
+1. **Forgejo Actions** running with k8s runner
+2. **Forgejo self-deploys** from CI on tagged releases
+3. **Container images** built automatically on push
+4. Infrastructure for Python package builds
+
+The CI/CD bootstrap is complete. Future work:
+- Add more container builds as needed
+- Add Python package publishing for internal tools
+- Consider adding a macOS runner on indri for native builds
diff --git a/plans/completed/k8s-migration/00_overview.md b/plans/completed/k8s-migration/00_overview.md
new file mode 100644
index 0000000..5e336c0
--- /dev/null
+++ b/plans/completed/k8s-migration/00_overview.md
@@ -0,0 +1,79 @@
+# Blumeops Minikube Migration Plan
+
+**Status**: Completed (2026-01-23)
+
+This plan detailed the phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster. The migration is now complete for all services that will be migrated.
+
+## Final Status
+
+| Phase | Name | Status | Notes |
+|-------|------|--------|-------|
+| 0 | [Foundation](P0_foundation.complete.md) | ✅ Complete | Container registry (zot) + minikube cluster |
+| 1 | [K8s Infrastructure](P1_k8s_infrastructure.complete.md) | ✅ Complete | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
+| 2 | [Grafana](P2_grafana.complete.md) | ✅ Complete | Migrated Grafana via ArgoCD |
+| 3 | [PostgreSQL](P3_postgresql.complete.md) | ✅ Complete | Data migration to k8s PostgreSQL |
+| 4 | [Miniflux](P4_miniflux.complete.md) | ✅ Complete | Migrated Miniflux via ArgoCD |
+| 5 | [devpi](P5_devpi.complete.md) | ✅ Complete | Migrated devpi via ArgoCD |
+| 5.1 | [Docker Migration](P5.1_docker_migration.complete.md) | ✅ Complete | Switched minikube to docker driver (not QEMU2) |
+| 6 | [Kiwix](P6_kiwix.complete.md) | ✅ Complete | Migrated Kiwix + Transmission via ArgoCD |
+| 7 | [Forgejo](P7_forgejo.md) | ⏭️ Won't Do | Forgejo stays on indri - see [CI/CD Bootstrap](../../ci-cd-bootstrap/) |
+| 8 | [Woodpecker](P8_woodpecker.md) | ⏭️ Won't Do | Replaced by Forgejo Actions - see [CI/CD Bootstrap](../../ci-cd-bootstrap/) |
+| 9 | [Cleanup](P9_cleanup.md) | ⏭️ Won't Do | Observability cleanup done separately (2026-01-22) |
+
+## What Was Migrated to K8s
+
+| Service | Status | Notes |
+|---------|--------|-------|
+| Grafana | ✅ In k8s | Helm chart via ArgoCD |
+| PostgreSQL | ✅ In k8s | CloudNativePG operator |
+| Miniflux | ✅ In k8s | Using k8s PostgreSQL |
+| devpi | ✅ In k8s | Custom container image |
+| Kiwix | ✅ In k8s | NFS mount from sifaka |
+| Transmission | ✅ In k8s | NFS mount from sifaka |
+| Prometheus | ✅ In k8s | Migrated 2026-01-22 |
+| Loki | ✅ In k8s | Migrated 2026-01-22 |
+| Alloy (k8s) | ✅ In k8s | DaemonSet for pod logs |
+| TeslaMate | ✅ In k8s | Added 2026-01-23 |
+
+## What Stays on Indri
+
+| Service | Reason |
+|---------|--------|
+| **Forgejo** | Critical infrastructure, avoids circular dependency with ArgoCD |
+| **Zot Registry** | K8s needs images to start - must be outside k8s |
+| **Alloy (host)** | Collects host-level metrics and logs |
+| **Borgmatic** | Backup system must survive k8s failures |
+| **Plex** | Uses own NAT traversal, not Tailscale |
+
+## Architecture Decisions Made
+
+### Minikube Driver: Docker (not QEMU2/Podman)
+- Original plan called for QEMU2, but docker driver proved simpler
+- NFS mounts work via Docker NAT through indri's LAN IP
+- API server accessible via Tailscale TCP passthrough
+
+### Forgejo: Stays on Indri
+- Original P7 planned k8s migration
+- Decision changed: Forgejo is critical infrastructure
+- Will be built from source via Forgejo Actions CI
+- See [CI/CD Bootstrap Plan](../../ci-cd-bootstrap/) for details
+
+### CI/CD: Forgejo Actions (not Woodpecker)
+- Original P8 planned Woodpecker deployment
+- Decision changed: Use Forgejo's native Actions instead
+- Simpler (one less system), GitHub Actions compatible
+- See [CI/CD Bootstrap Plan](../../ci-cd-bootstrap/) for details
+
+### Observability: Migrated to K8s
+- Original plan kept Prometheus/Loki on indri
+- Changed: Migrated both to k8s (2026-01-22)
+- Alloy on indri pushes to k8s endpoints
+- Alloy DaemonSet in k8s collects pod logs
+
+## Lessons Learned
+
+1. **Docker driver is simpler than QEMU2** - Direct NFS mounts work, no VM complexity
+2. **Tailscale operator works well** - Easy service exposure with automatic TLS
+3. **CloudNativePG is production-ready** - Good operator, easy backups
+4. **Keep critical infra outside k8s** - Forgejo and zot must survive k8s failures
+5. **CGO matters on macOS** - Alloy needed `CGO_ENABLED=1` for Tailscale DNS resolution
diff --git a/plans/k8s-migration/P0_foundation.complete.md b/plans/completed/k8s-migration/P0_foundation.complete.md
similarity index 100%
rename from plans/k8s-migration/P0_foundation.complete.md
rename to plans/completed/k8s-migration/P0_foundation.complete.md
diff --git a/plans/k8s-migration/P1_k8s_infrastructure.complete.md b/plans/completed/k8s-migration/P1_k8s_infrastructure.complete.md
similarity index 100%
rename from plans/k8s-migration/P1_k8s_infrastructure.complete.md
rename to plans/completed/k8s-migration/P1_k8s_infrastructure.complete.md
diff --git a/plans/k8s-migration/P2_grafana.complete.md b/plans/completed/k8s-migration/P2_grafana.complete.md
similarity index 100%
rename from plans/k8s-migration/P2_grafana.complete.md
rename to plans/completed/k8s-migration/P2_grafana.complete.md
diff --git a/plans/k8s-migration/P3_postgresql.complete.md b/plans/completed/k8s-migration/P3_postgresql.complete.md
similarity index 100%
rename from plans/k8s-migration/P3_postgresql.complete.md
rename to plans/completed/k8s-migration/P3_postgresql.complete.md
diff --git a/plans/k8s-migration/P4_miniflux.complete.md b/plans/completed/k8s-migration/P4_miniflux.complete.md
similarity index 100%
rename from plans/k8s-migration/P4_miniflux.complete.md
rename to plans/completed/k8s-migration/P4_miniflux.complete.md
diff --git a/plans/k8s-migration/P5.1_docker_migration.complete.md b/plans/completed/k8s-migration/P5.1_docker_migration.complete.md
similarity index 100%
rename from plans/k8s-migration/P5.1_docker_migration.complete.md
rename to plans/completed/k8s-migration/P5.1_docker_migration.complete.md
diff --git a/plans/k8s-migration/P5_devpi.complete.md b/plans/completed/k8s-migration/P5_devpi.complete.md
similarity index 100%
rename from plans/k8s-migration/P5_devpi.complete.md
rename to plans/completed/k8s-migration/P5_devpi.complete.md
diff --git a/plans/k8s-migration/P6_kiwix.complete.md b/plans/completed/k8s-migration/P6_kiwix.complete.md
similarity index 100%
rename from plans/k8s-migration/P6_kiwix.complete.md
rename to plans/completed/k8s-migration/P6_kiwix.complete.md
diff --git a/plans/k8s-migration/P7_forgejo.md b/plans/completed/k8s-migration/P7_forgejo.md
similarity index 100%
rename from plans/k8s-migration/P7_forgejo.md
rename to plans/completed/k8s-migration/P7_forgejo.md
diff --git a/plans/k8s-migration/P8_woodpecker.md b/plans/completed/k8s-migration/P8_woodpecker.md
similarity index 100%
rename from plans/k8s-migration/P8_woodpecker.md
rename to plans/completed/k8s-migration/P8_woodpecker.md
diff --git a/plans/k8s-migration/P9_cleanup.md b/plans/completed/k8s-migration/P9_cleanup.md
similarity index 100%
rename from plans/k8s-migration/P9_cleanup.md
rename to plans/completed/k8s-migration/P9_cleanup.md
diff --git a/plans/k8s-migration/00_overview.md b/plans/k8s-migration/00_overview.md
deleted file mode 100644
index 643122c..0000000
--- a/plans/k8s-migration/00_overview.md
+++ /dev/null
@@ -1,151 +0,0 @@
-# Blumeops Minikube Migration Plan
-
-This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes.
-
-## Phases
-
-| Phase | Name | Status | Description |
-|-------|------|--------|-------------|
-| 0 | [Foundation](P0_foundation.complete.md) | Complete | Container registry + minikube cluster |
-| 1 | [K8s Infrastructure](P1_k8s_infrastructure.md) | Complete | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
-| 2 | [Grafana](P2_grafana.complete.md) | Complete | Migrate Grafana (pilot) via ArgoCD |
-| 3 | [PostgreSQL](P3_postgresql.complete.md) | Complete | Data migration to k8s PostgreSQL |
-| 4 | [Miniflux](P4_miniflux.complete.md) | Complete | Migrate Miniflux via ArgoCD |
-| 5 | [devpi](P5_devpi.complete.md) | Complete | Migrate devpi via ArgoCD |
-| 5.1 | [QEMU2 Migration](P5.1_qemu2_migration.md) | Pending | Switch minikube from podman to qemu2 driver |
-| 6 | [Kiwix](P6_kiwix.md) | Blocked | Migrate Kiwix + Transmission via ArgoCD (blocked on P5.1) |
-| 7 | [Forgejo](P7_forgejo.md) | Pending | Migrate Forgejo (highest risk) via ArgoCD |
-| 8 | [Woodpecker](P8_woodpecker.md) | Pending | Deploy CI/CD via ArgoCD |
-| 9 | [Cleanup](P9_cleanup.md) | Pending | Remove deprecated services |
-
-## Architecture Overview
-
-### Services Staying on Indri (Outside K8s)
-| Service | Reason |
-|---------|--------|
-| **Zot Registry** (NEW) | Avoid circular dependency - k8s needs images to start |
-| **Prometheus** | Observability backbone must survive k8s failures |
-| **Loki** | Log aggregation backbone |
-| **Borgmatic** | Backup system |
-| **Grafana-alloy** | Metrics/logs collector on host |
-| **Plex** | Until Jellyfin replacement |
-
-### Services Moving to K8s
-| Service | Complexity | Dependencies |
-|---------|------------|--------------|
-| Grafana | LOW | Phase 1 |
-| Kiwix | MEDIUM | Phase 5.1 (QEMU2), shared storage |
-| Transmission | MEDIUM | Phase 5.1 (QEMU2), shared storage |
-| Miniflux | MEDIUM | PostgreSQL |
-| devpi | MEDIUM | Registry |
-| PostgreSQL | HIGH | Phase 1 |
-| Forgejo | HIGH | PostgreSQL |
-| Woodpecker CI | MEDIUM | Forgejo |
-
-## Technical Decisions
-
-### Container Registry: Zot
-- OCI-native, lightweight
-- Native support for proxying multiple registries (Docker Hub, GHCR, Quay)
-- Built from source at `~/code/3rd/zot` (not in homebrew)
-- Binary: `~/code/3rd/zot/bin/zot-darwin-arm64`
-- Config: `~/.config/zot/config.json`
-- Data: `~/zot/`
-
-### Minikube Driver: QEMU2 (migrating from Podman)
-- **Original choice (Podman)** proved unable to mount external volumes (NFS, SMB, hostPath)
-- Podman's rootless containers lack CAP_SYS_ADMIN for filesystem mounts
-- **QEMU2** creates an actual VM with full kernel capabilities
-- Phase 5.1 handles the migration from podman to qemu2
-- `minikube start --driver=qemu2 --container-runtime=containerd`
-
-### PostgreSQL: CloudNativePG Operator
-- Production-grade operator
-- Built-in backup/restore
-- Prometheus metrics
-- PITR support
-
-### K8s Service Exposure: Tailscale Operator
-- `loadBalancerClass: tailscale` on Services
-- Automatic TLS and MagicDNS names
-- ACL-controlled access
-
-### LaunchAgent Requirements (Critical)
-LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths**:
-- `/Users/erichblume/code/3rd/zot/bin/zot-darwin-arm64` for zot (built from source)
-- `/opt/homebrew/opt/mise/bin/mise x --` for mise-managed tools
-- `/opt/homebrew/opt/postgresql@18/bin/pg_dump` for postgres tools
-
-This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors).
-`brew services` handles this automatically but those aren't tracked in ansible.
-
-### Backup Strategy
-
-Borgmatic remains on indri (outside k8s), writing to sifaka NAS via SMB at `/Volumes/backups`. This ensures backups continue even if k8s is down.
-
-| Service | Backup Approach |
-|---------|-----------------|
-| **Zot Registry** | No backup needed - pull-through cache is re-fetchable, private images rebuilt from source control |
-| **Minikube** | No backup of cluster state - declarative manifests in git, can recreate |
-| **PostgreSQL (k8s)** | CloudNativePG scheduled backups to sifaka (Phase 1) |
-| **Grafana (k8s)** | Dashboards in ansible source control, no runtime backup needed |
-| **Miniflux (k8s)** | Database backed up via CloudNativePG |
-| **Forgejo (k8s)** | Git repos are distributed, config in ansible; data dir backed up via borgmatic before migration |
-| **devpi (k8s)** | Private packages backed up, PyPI cache re-fetchable |
-| **Kiwix (k8s)** | ZIM files re-downloadable via torrent, no backup needed |
-
-**Borgmatic config changes:** None required for Phase 0. Future phases may add k8s PV paths if needed.
-
----
-
-## Critical Files
-
-| File | Purpose |
-|------|---------|
-| `ansible/playbooks/indri.yml` | Main playbook - add k8s roles, remove migrated services |
-| `ansible/roles/tailscale_serve/defaults/main.yml` | Transition services to Tailscale operator |
-| `pulumi/policy.hujson` | Add tags: k8s, registry, ci |
-| `ansible/roles/borgmatic/defaults/main.yml` | Update PostgreSQL endpoint |
-| `mise-tasks/indri-services-check` | Add k8s health checks |
-
-## New Directory Structure
-
-```
-ansible/
-  k8s/
-    operators/
-      tailscale-operator.yaml
-      cloudnative-pg.yaml
-    databases/
-      blumeops-pg.yaml
-    apps/
-      grafana/
-      miniflux/
-      forgejo/
-      devpi/
-      kiwix/
-      woodpecker/
-  roles/
-    zot/           # NEW
-    podman/        # NEW
-    minikube/      # NEW
-```
-
-## Risk Mitigation
-
-- **Circular dependency prevention**: Zot registry runs outside k8s
-- **Observability**: Prometheus/Loki stay on indri
-- **Data loss prevention**: borgmatic + manual backups before each phase
-- **Recovery**: Can manually push images, restore from backups
-
-## Container Images (All ARM64)
-
-| Service | Image |
-|---------|-------|
-| Miniflux | `ghcr.io/miniflux/miniflux:latest` |
-| Forgejo | `codeberg.org/forgejo/forgejo:10` |
-| Grafana | `grafana/grafana:latest` |
-| Kiwix | `ghcr.io/kiwix/kiwix-serve:3.8.1` |
-| Woodpecker | `woodpeckerci/woodpecker-server` |
-
-Note: Zot runs as a native binary on indri (built from source at `~/code/3rd/zot`), not as a container.