Retire k8s-migration plan and create ci-cd-bootstrap plan

Erich Blume 2026-01-23 14:13:01 -08:00
commit 016f1043c8
18 changed files with 1741 additions and 151 deletions


@ -0,0 +1,146 @@
# Forgejo Actions CI/CD Bootstrap Plan
This plan details the setup of Forgejo Actions as the CI/CD system for blumeops, starting with the bootstrapping problem: using Forgejo to build and deploy Forgejo itself.
## Goals
1. **Forgejo Actions** as the primary CI system (replacing Woodpecker from the original plan)
2. **Self-hosted Forgejo** built from source, deployed as mcquack LaunchAgent on indri
3. **Container builds** for ArgoCD manifests (devpi, etc.)
4. **Cron-scheduled tasks** via k8s CronJobs (not Actions)
5. **Local development** parity using `act` for workflow testing
## Why Forgejo Actions over Woodpecker?
- Native integration with Forgejo (no OAuth setup, automatic repo detection)
- GitHub Actions compatible syntax (huge ecosystem of reusable actions)
- `act` tool for local testing on gilbert
- Single system to maintain instead of two
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ INDRI                                                           │
│  ┌─────────────────────┐                                        │
│  │ Forgejo             │ ← Built from source                    │
│  │ (mcquack agent)     │ ← Deploys itself via CI                │
│  │                     │                                        │
│  │ - Web UI (3001)     │                                        │
│  │ - SSH (2200)        │                                        │
│  │ - Actions enabled   │                                        │
│  └─────────────────────┘                                        │
└─────────────────────────────────────────────────────────────────┘
          │ SSH deploy
┌─────────────────────────────────────────────────────────────────┐
│ KUBERNETES (minikube)                                           │
│  ┌─────────────────────┐  ┌─────────────────────┐               │
│  │ Forgejo Runner      │  │ Other Services      │               │
│  │ (act_runner)        │  │ (via ArgoCD)        │               │
│  │                     │  │                     │               │
│  │ - Polls Forgejo     │  │                     │               │
│  │ - Runs workflows    │  │                     │               │
│  │ - Host docker.sock  │  │                     │               │
│  └─────────────────────┘  └─────────────────────┘               │
└─────────────────────────────────────────────────────────────────┘
```
## Phases
| Phase | Name | Description |
|-------|------|-------------|
| 1 | [Enable Actions](P1_enable_actions.md) | Configure Forgejo for Actions, deploy runner |
| 2 | [Mirror & Build](P2_mirror_and_build.md) | Mirror upstream Forgejo, create build workflow |
| 3 | [Self-Deploy](P3_self_deploy.md) | Forgejo deploys itself, transition to mcquack |
| 4 | [Container Builds](P4_container_builds.md) | Build custom container images (devpi, etc.) |
## The Bootstrap Problem
**Chicken-and-egg**: We need Forgejo Actions to build Forgejo, but Forgejo must be running first.
**Solution**:
1. Keep current brew-based Forgejo running during setup
2. Enable Actions, deploy runner
3. Mirror upstream Forgejo, create build workflow
4. First CI build creates the binary
5. CI deploys binary to indri as mcquack service
6. Stop and uninstall brew Forgejo (`brew services stop forgejo`, then `brew uninstall forgejo`)
7. Future builds: Forgejo builds and deploys itself
**Risk mitigation**: If self-deployment breaks Forgejo:
- blumeops is mirrored to GitHub
- Manual recovery: build on gilbert, scp to indri, restart service
- See Disaster Recovery section in P3
## Ansible Role Strategy
The forgejo ansible role will follow the zot/alloy pattern:
1. **Check binary exists** at expected path
2. **If missing**: Fail with message pointing to CI trigger instructions
3. **If present**: Deploy config, ensure LaunchAgent loaded
Ansible does NOT:
- Build the binary (that's CI's job)
- Deploy new versions (that's CI's job)
Ansible DOES:
- Manage app.ini configuration (sans secrets)
- Manage mcquack LaunchAgent plist
- Ensure service is running
- Collect logs via Alloy
## Files Summary
### New Files
| Path | Purpose |
|------|---------|
| `argocd/apps/forgejo-runner.yaml` | ArgoCD Application for runner |
| `argocd/manifests/forgejo-runner/` | Runner k8s manifests |
| `.forgejo/workflows/build-forgejo.yml` | Build workflow in blumeops repo |
| (on forge) `eblume/forgejo/.forgejo/workflows/` | Build workflow in forgejo mirror |
### Modified Files
| Path | Change |
|------|--------|
| `ansible/roles/forgejo/` | Complete rewrite for mcquack pattern |
| `ansible/roles/alloy/defaults/main.yml` | Update forgejo log paths |
| zk cards | Update forgejo, argocd, blumeops cards |
### Credentials Needed
| Item | Purpose | Storage |
|------|---------|---------|
| Runner registration token | Runner auth to Forgejo | 1Password + k8s secret |
| SSH deploy key | Runner SSH to indri | 1Password + k8s secret |
## Related Plans
- [P7_forgejo.md](../k8s-migration/P7_forgejo.md) - Original k8s migration plan (superseded for Forgejo itself, but SSH hostname split info still relevant)
- [P8_woodpecker.md](../k8s-migration/P8_woodpecker.md) - Original Woodpecker plan (superseded by Forgejo Actions)
## Decision Log
### 2026-01-23: Forgejo Actions over Woodpecker
**Decision**: Use Forgejo Actions instead of Woodpecker CI
**Rationale**:
- Native Forgejo integration (Actions is built-in)
- GitHub Actions compatible (reuse existing actions)
- `act` for local testing
- One less system to deploy and maintain
### 2026-01-23: Keep Forgejo on indri (not k8s)
**Decision**: Forgejo stays on indri as mcquack service, not migrated to k8s
**Rationale**:
- Avoid circular dependency (ArgoCD needs Forgejo to deploy Forgejo)
- Simpler SSH handling (direct port, no k8s networking complexity)
- Forgejo is critical infrastructure, benefits from isolation
- Can still use Tailscale serve for external access


@ -0,0 +1,322 @@
# Phase 1: Enable Forgejo Actions
**Goal**: Configure Forgejo to support Actions workflows and deploy a runner in k8s

**Status**: Planning

**Prerequisites**: None (uses existing brew-based Forgejo)

---
## Current State
- Forgejo runs via `brew services` on indri
- Config at `/opt/homebrew/var/forgejo/custom/conf/app.ini`
- Actions not enabled
- No runners deployed
---
## Step 1: Enable Actions in Forgejo
### 1.1 Update app.ini
SSH to indri and edit the Forgejo config:
```bash
ssh -t indri 'vim /opt/homebrew/var/forgejo/custom/conf/app.ini'
```
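Add the following sections. If a `[repository]` section already exists, add the key there instead of creating a second section: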
```ini
[actions]
ENABLED = true
DEFAULT_ACTIONS_URL = https://code.forgejo.org
[repository]
; Enable the Actions unit (plus the usual units) on newly created repos; existing repos are switched on in their own settings
DEFAULT_REPO_UNITS = repo.code,repo.issues,repo.pulls,repo.releases,repo.wiki,repo.projects,repo.packages,repo.actions
```
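Before restarting, a quick read-back can confirm that each key landed in the intended section. This is a minimal sketch of a hypothetical `ini_get` helper (plain awk, not a Forgejo command). It assumes simple `KEY = value` lines like the snippet above:

```bash
#!/usr/bin/env bash
# ini_get FILE SECTION KEY
# Prints the value of KEY inside [SECTION] of an INI-style file (e.g. app.ini).
# Whitespace around keys and values is trimmed; lines starting with ";" or "#"
# are treated as comments. Returns 1 if the key is not found in that section.
ini_get() {
  awk -v want_section="$2" -v want_key="$3" '
    /^[ \t]*[;#]/ { next }                  # comment line
    /^[ \t]*\[/ {                           # section header, e.g. [actions]
      s = $0
      gsub(/^[ \t]*\[|\][ \t]*$/, "", s)
      in_section = (s == want_section)
      next
    }
    in_section && index($0, "=") > 0 {
      eq = index($0, "=")
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == want_key) { print val; found = 1; exit }
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

On indri, `ini_get /opt/homebrew/var/forgejo/custom/conf/app.ini actions ENABLED` should print `true`. A non-zero exit means the key is missing from that section.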
### 1.2 Restart Forgejo
```bash
ssh indri 'brew services restart forgejo'
```
### 1.3 Verify Actions Enabled
1. Go to https://forge.tail8d86e.ts.net
2. Navigate to any repo → Settings → Actions
3. Should see "Enable Repository Actions" option
---
## Step 2: Create Runner Registration Token
### 2.1 Generate Token in Forgejo UI
1. Go to https://forge.tail8d86e.ts.net/admin/actions/runners
2. Click "Create new Runner"
3. Copy the registration token
4. Store in 1Password (blumeops vault) as "Forgejo Runner Token"
### 2.2 Create k8s Secret Template
Create `argocd/manifests/forgejo-runner/secret-token.yaml.tpl`:
```yaml
# Template for op inject
apiVersion: v1
kind: Secret
metadata:
  name: forgejo-runner-token
  namespace: forgejo-runner
type: Opaque
stringData:
  token: "op://blumeops/<runner-token-item>/token"
```
---
## Step 3: Deploy Runner to Kubernetes
### 3.1 Create ArgoCD Application
Create `argocd/apps/forgejo-runner.yaml`:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: forgejo-runner
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/forgejo-runner
  destination:
    server: https://kubernetes.default.svc
    namespace: forgejo-runner
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```
### 3.2 Create Runner Manifests
Create directory `argocd/manifests/forgejo-runner/` with:
**kustomization.yaml**:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: forgejo-runner
resources:
- namespace.yaml
- deployment.yaml
- serviceaccount.yaml
- secret-token.yaml
```
**namespace.yaml**:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: forgejo-runner
```
**serviceaccount.yaml**:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
```
**deployment.yaml**:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: forgejo-runner
  template:
    metadata:
      labels:
        app: forgejo-runner
    spec:
      serviceAccountName: forgejo-runner
      containers:
        - name: runner
          image: code.forgejo.org/forgejo/runner:3.5.1
          env:
            - name: FORGEJO_INSTANCE_URL
              value: "https://forge.tail8d86e.ts.net"
            - name: RUNNER_NAME
              value: "k8s-runner-1"
            - name: RUNNER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: forgejo-runner-token
                  key: token
          command:
            - /bin/sh
            - -c
            - |
              # The runner writes and reads its .runner registration file in
              # the current directory, so work from the /data mount
              cd /data
              # Register runner if not already registered
              if [ ! -f /data/.runner ]; then
                forgejo-runner register \
                  --instance "$FORGEJO_INSTANCE_URL" \
                  --token "$RUNNER_TOKEN" \
                  --name "$RUNNER_NAME" \
                  --labels "ubuntu-latest:docker://node:20-bookworm,ubuntu-22.04:docker://ubuntu:22.04" \
                  --no-interactive
              fi
              # Start the runner daemon
              forgejo-runner daemon
          volumeMounts:
            - name: runner-data
              mountPath: /data
            - name: docker-sock
              mountPath: /var/run/docker.sock
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
      volumes:
        - name: runner-data
          emptyDir: {}
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
            type: Socket
```
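**Note**: The runner needs access to Docker to run workflow jobs in containers. In minikube with the docker driver, the node's `/var/run/docker.sock` is available through the `hostPath` mount. Because `runner-data` is an `emptyDir`, the `.runner` registration file is lost whenever the pod is deleted or rescheduled, and the runner will try to register again as a new entry. A PersistentVolumeClaim would avoid that.

---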
## Step 4: Deploy and Verify
### 4.1 Inject Secrets and Deploy
```bash
# Inject secrets
op inject -i argocd/manifests/forgejo-runner/secret-token.yaml.tpl \
-o argocd/manifests/forgejo-runner/secret-token.yaml
# Sync apps
argocd app sync apps
argocd app sync forgejo-runner
```
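If the template is ever copied instead of injected, ArgoCD would sync a Secret whose value is the literal `op://` reference. A minimal guard can be run before the sync. `check_rendered_secret` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# check_rendered_secret FILE
# Fails if FILE is missing or still contains a 1Password secret reference
# ("op://..."), i.e. the template was not (fully) rendered by `op inject`.
check_rendered_secret() {
  if [ ! -f "$1" ]; then
    echo "missing rendered file: $1" >&2
    return 1
  fi
  # Print offending lines (with line numbers) to stderr if any remain
  if grep -n 'op://' "$1" >&2; then
    echo "unresolved op:// reference(s) in $1" >&2
    return 1
  fi
  return 0
}
```

For example: `check_rendered_secret argocd/manifests/forgejo-runner/secret-token.yaml && argocd app sync forgejo-runner`.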
### 4.2 Verify Runner Registration
```bash
# Check runner pod
kubectl --context=minikube-indri -n forgejo-runner get pods
# Check runner logs
kubectl --context=minikube-indri -n forgejo-runner logs -f deployment/forgejo-runner
# Verify in Forgejo UI
# Go to https://forge.tail8d86e.ts.net/admin/actions/runners
# Should see "k8s-runner-1" as online
```
---
## Step 5: Test with Simple Workflow
### 5.1 Create Test Workflow
In the blumeops repo, create `.forgejo/workflows/test.yml`:
```yaml
name: Test CI
on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Hello World
        run: |
          echo "Hello from Forgejo Actions!"
          echo "Runner: ${{ runner.name }}"
          echo "Repo: ${{ github.repository }}"
```
### 5.2 Push and Verify
```bash
git add .forgejo/
git commit -m "Add test workflow for Forgejo Actions"
git push
```
Check https://forge.tail8d86e.ts.net/eblume/blumeops/actions for the workflow run.

---
## Verification Checklist
- [ ] Actions enabled in app.ini
- [ ] Forgejo restarted successfully
- [ ] Runner token stored in 1Password
- [ ] Runner deployment created in ArgoCD
- [ ] Runner pod running in k8s
- [ ] Runner shows as online in Forgejo admin
- [ ] Test workflow runs successfully
---
## Troubleshooting
### Runner Can't Connect to Forgejo
The runner needs to reach `forge.tail8d86e.ts.net` from inside k8s. This should work via Tailscale operator egress (already configured for ArgoCD).
If not working:
```bash
# Test from inside k8s
kubectl --context=minikube-indri run -it --rm curl --image=curlimages/curl -- \
curl -v https://forge.tail8d86e.ts.net/api/v1/version
```
### Docker Socket Permission Denied
The runner container needs to access the Docker socket. In minikube with docker driver, this should work. If permission denied:
```bash
# Check socket permissions
kubectl --context=minikube-indri -n forgejo-runner exec deployment/forgejo-runner -- ls -la /var/run/docker.sock
```
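For a clearer diagnosis than `ls -la`, a hypothetical `check_docker_socket` helper (a minimal sketch, not part of forgejo-runner) separates three failure cases: a missing mount, a path that is not a socket, and a permissions problem for the current uid.

```bash
#!/usr/bin/env bash
# check_docker_socket [PATH]
# Reports whether the current user can use the Docker socket at PATH
# (default /var/run/docker.sock). Returns 0 only if it is an accessible socket.
check_docker_socket() {
  local sock="${1:-/var/run/docker.sock}"
  if [ ! -e "$sock" ]; then
    echo "no such path: $sock (is the hostPath volume mounted?)"
    return 1
  fi
  if [ ! -S "$sock" ]; then
    echo "$sock exists but is not a socket"
    return 1
  fi
  if [ ! -r "$sock" ] || [ ! -w "$sock" ]; then
    echo "$sock not readable/writable by uid $(id -u) (groups: $(id -G))"
    return 1
  fi
  echo "$sock is accessible by uid $(id -u)"
  return 0
}
```

To use it, paste it into a shell opened with `kubectl --context=minikube-indri -n forgejo-runner exec -it deployment/forgejo-runner -- sh`, then run `check_docker_socket`.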
You may need to run the runner as root or adjust its security context (e.g. add the socket's group via `securityContext.supplementalGroups`).

---
## Next Phase
Once runner is working, proceed to [Phase 2: Mirror & Build](P2_mirror_and_build.md).


@ -0,0 +1,376 @@
# Phase 2: Mirror Forgejo & Create Build Workflow
**Goal**: Mirror upstream Forgejo to forge and create a workflow that builds it from source

**Status**: Planning

**Prerequisites**: [Phase 1](P1_enable_actions.md) complete (Actions enabled, runner deployed)

---
## Current State
- Forgejo Actions enabled with runner in k8s
- Upstream Forgejo at https://codeberg.org/forgejo/forgejo
- No local mirror yet
---
## Step 1: Mirror Upstream Forgejo
### 1.1 User Action: Create Mirror on Forge
**Manual step** (hairpinning doesn't work from indri):
1. Go to https://forge.tail8d86e.ts.net
2. Click "+" → "New Migration"
3. Select "Gitea" as clone source
4. URL: `https://codeberg.org/forgejo/forgejo.git`
5. Repository name: `forgejo`
6. Check "This repository will be a mirror"
7. Click "Migrate Repository"
### 1.2 Clone Mirror Locally
```bash
git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/forgejo.git ~/code/3rd/forgejo
cd ~/code/3rd/forgejo
```
---
## Step 2: Understand Forgejo Build Process
### 2.1 Build Requirements
From Forgejo's `Makefile` and docs:
- **Go**: 1.23+ (check `go.mod` for exact version)
- **Node.js**: 20+ (for frontend)
- **Make**: GNU Make
- **Git**: For version embedding
### 2.2 Build Commands
```bash
# Install frontend dependencies and build
make deps-frontend
make frontend
# Build backend
TAGS="bindata sqlite sqlite_unlock_notify" make backend
# Or all-in-one
TAGS="bindata sqlite sqlite_unlock_notify" make build
```
### 2.3 Output
The build writes the binary to `./gitea` in the repo root (yes, it is still named `gitea` for compatibility).

---
## Step 3: Create Build Workflow
### 3.1 SSH Deploy Key for Runner
The runner needs SSH access to indri to deploy the binary.
**Generate key on gilbert**:
```bash
ssh-keygen -t ed25519 -C "forgejo-runner-deploy" -f ~/.ssh/forgejo-runner-deploy
```
**Add public key to indri's authorized_keys**:
```bash
cat ~/.ssh/forgejo-runner-deploy.pub | ssh indri 'cat >> ~/.ssh/authorized_keys'
```
**Store private key in 1Password** (blumeops vault) as "Forgejo Runner Deploy Key"
**Add to k8s as secret**:
Create `argocd/manifests/forgejo-runner/secret-ssh.yaml.tpl`:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: forgejo-runner-ssh
  namespace: forgejo-runner
type: Opaque
stringData:
  id_ed25519: |
    op://blumeops/<deploy-key-item>/private-key
  known_hosts: |
    indri.tail8d86e.ts.net ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIxxxxxx
```
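`op inject` substitutes the reference text in place, so the second and later lines of a multi-line private key will not be indented under `id_ed25519: |`. Check the rendered YAML, or create this secret with `kubectl create secret generic --from-file` instead.

Get indri's host key: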
```bash
ssh-keyscan indri.tail8d86e.ts.net 2>/dev/null | grep ed25519
```
### 3.2 Create Workflow File
Create `.forgejo/workflows/build.yml` in the forgejo mirror repo:
```yaml
name: Build Forgejo
on:
  push:
    tags:
      - 'v*'
  workflow_dispatch:
    inputs:
      deploy:
        description: 'Deploy to indri after build'
        required: false
        default: 'true'
        type: boolean
env:
  GOPROXY: "https://proxy.golang.org,direct"
  CGO_ENABLED: "1"
  TAGS: "bindata sqlite sqlite_unlock_notify"
jobs:
  build:
    runs-on: ubuntu-latest
    # Expose the version so the deploy job can find the artifact by name
    outputs:
      version: ${{ steps.version.outputs.version }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Need full history for version
      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version-file: 'go.mod'
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Get version
        id: version
        run: |
          if [[ "${{ github.ref }}" == refs/tags/* ]]; then
            VERSION="${{ github.ref_name }}"
          else
            VERSION="$(git describe --tags --always)-dev"
          fi
          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
          echo "Building version: $VERSION"
      - name: Build frontend
        run: |
          make deps-frontend
          make frontend
      - name: Build backend
        run: |
          TAGS="${{ env.TAGS }}" make backend
          ./gitea --version
      - name: Rename binary
        run: |
          # Without GOOS/GOARCH (see Step 4), a Linux runner produces a Linux
          # binary here despite the darwin-arm64 name
          mv gitea forgejo-${{ steps.version.outputs.version }}-darwin-arm64
          ls -la forgejo-*
      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: forgejo-${{ steps.version.outputs.version }}-darwin-arm64
          path: forgejo-${{ steps.version.outputs.version }}-darwin-arm64
  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.event_name == 'push' || (github.event_name == 'workflow_dispatch' && github.event.inputs.deploy == 'true')
    steps:
      - name: Download artifact
        uses: actions/download-artifact@v4
        with:
          name: forgejo-${{ needs.build.outputs.version }}-darwin-arm64
      - name: Setup SSH
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.DEPLOY_SSH_KEY }}" > ~/.ssh/id_ed25519
          chmod 600 ~/.ssh/id_ed25519
          echo "${{ secrets.DEPLOY_KNOWN_HOSTS }}" > ~/.ssh/known_hosts
      - name: Deploy to indri
        run: |
          BINARY="forgejo-*-darwin-arm64"
          chmod +x $BINARY
          # Copy binary to indri
          scp $BINARY erichblume@indri.tail8d86e.ts.net:~/.local/bin/forgejo-new
          # Swap binaries and restart
          ssh erichblume@indri.tail8d86e.ts.net << 'EOF'
          set -e
          cd ~/.local/bin
          # Verify the new binary runs
          ./forgejo-new --version
          # Stop current service
          launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist 2>/dev/null || true
          # Swap binaries, keeping the previous one as forgejo-old for rollback
          mv forgejo forgejo-old 2>/dev/null || true
          mv forgejo-new forgejo
          # Start new service
          launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
          # Verify it's running
          sleep 5
          curl -sf http://localhost:3001/api/v1/version || exit 1
          echo "Deploy successful!"
          ./forgejo --version
          EOF
```
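The remote half of the deploy step can also be written as small functions. That makes the swap testable locally and adds the rollback path the heredoc lacks. This is a minimal sketch; `swap_binary` and `rollback_binary` are hypothetical names, and stopping or starting the LaunchAgent is left to the caller:

```bash
#!/usr/bin/env bash
# swap_binary DIR
# Replaces DIR/forgejo with DIR/forgejo-new, keeping the previous binary as
# DIR/forgejo-old. Refuses to swap if the new binary fails `--version`.
swap_binary() {
  local dir="$1"
  if [ ! -x "$dir/forgejo-new" ]; then
    echo "no executable $dir/forgejo-new" >&2
    return 1
  fi
  if ! "$dir/forgejo-new" --version >/dev/null 2>&1; then
    echo "new binary failed --version; leaving current binary in place" >&2
    return 1
  fi
  if [ -e "$dir/forgejo" ]; then
    mv -f "$dir/forgejo" "$dir/forgejo-old"
  fi
  mv -f "$dir/forgejo-new" "$dir/forgejo"
}

# rollback_binary DIR
# Restores DIR/forgejo-old as DIR/forgejo (e.g. after a failed health check).
rollback_binary() {
  local dir="$1"
  if [ ! -e "$dir/forgejo-old" ]; then
    echo "nothing to roll back to in $dir" >&2
    return 1
  fi
  mv -f "$dir/forgejo-old" "$dir/forgejo"
}
```

On indri, the sequence would be:

1. Unload the plist.
2. Run `swap_binary ~/.local/bin`.
3. Load the plist.
4. If the `curl` health check fails, run `rollback_binary ~/.local/bin` and load again.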
### 3.3 Add Repository Secrets
In Forgejo, go to the forgejo repo → Settings → Actions → Secrets:
1. **DEPLOY_SSH_KEY**: Private key from 1Password
2. **DEPLOY_KNOWN_HOSTS**: Output of `ssh-keyscan indri.tail8d86e.ts.net`
---
## Step 4: Cross-Platform Build Considerations
**Important**: The runner runs Linux containers, but indri is macOS ARM64.
**Options**:
### Option A: Cross-compile (Simpler, may have issues)
Add to build job:
```yaml
env:
  GOOS: darwin
  GOARCH: arm64
```
CGO cross-compilation from Linux to macOS needs a macOS-targeting C toolchain, so this may require disabling CGO or setting up a cross-compiler.
### Option B: Build on macOS (More reliable)
Run a macOS runner on indri itself (not in k8s).
```bash
# Install forgejo-runner on indri via mise
ssh indri 'mise use forgejo-runner'
# Register as a macOS runner
ssh indri 'forgejo-runner register --labels "macos-arm64:host" ...'
```
Then workflow uses:
```yaml
runs-on: macos-arm64
```
**Recommendation**: Option B, a runner directly on indri, is the more reliable path for native macOS builds.

---
## Step 5: Test the Build
### 5.1 Manual Workflow Dispatch
1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions
2. Select "Build Forgejo" workflow
3. Click "Run workflow"
4. Set deploy=false for first test
5. Monitor the run
### 5.2 Verify Artifact
Download the artifact from the workflow run and verify it's a valid binary:
```bash
# If downloaded to gilbert
file forgejo-*-darwin-arm64
# Should show: Mach-O 64-bit executable arm64
```
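If `file` isn't available, or the check needs to be scriptable in CI, the first eight bytes of the binary answer the same question. This is a minimal sketch of a hypothetical `is_macho_arm64` helper. It assumes a thin (non-universal) binary, which is what a single-target Go build produces:

```bash
#!/usr/bin/env bash
# is_macho_arm64 FILE
# Returns 0 if FILE starts with a thin 64-bit arm64 Mach-O header:
# magic 0xfeedfacf then CPU type 0x0100000c, both stored little-endian
# (bytes cf fa ed fe 0c 00 00 01). ELF and universal binaries return 1.
# Uses only od and tr, so it works where `file` is not installed.
is_macho_arm64() {
  local header
  header=$(od -An -tx1 -N8 "$1" 2>/dev/null | tr -d ' \n')
  [ "$header" = "cffaedfe0c000001" ]
}
```

For example: `is_macho_arm64 forgejo-*-darwin-arm64 && echo "looks like macOS arm64"`.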
---
## Alternative: Build on Gilbert, Deploy via CI
If cross-compilation proves difficult, consider a hybrid approach:
1. **Build on gilbert** (has Go, Node, is macOS ARM64)
2. **CI just deploys** the built binary
Workflow in blumeops repo:
```yaml
name: Deploy Forgejo
on:
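  workflow_dispatch:
    inputs:
      binary_path:
        description: 'Path to binary on gilbert'
        required: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Fetch binary from gilbert and deploy to indri
      # (requires SSH access to both; not set up by this plan yet)
      - name: Deploy binary from gilbert (placeholder)
        run: |
          echo "TODO: fetch ${{ github.event.inputs.binary_path }} from gilbert and deploy to indri"
          exit 1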
```
This is less elegant but more pragmatic for macOS targets.

---
## Verification Checklist
- [ ] Forgejo mirrored to forge
- [ ] SSH deploy key created and stored in 1Password
- [ ] Deploy key added to indri authorized_keys
- [ ] SSH secret added to k8s
- [ ] Workflow file created in forgejo mirror
- [ ] Repository secrets configured
- [ ] Test build completes successfully
- [ ] Binary is valid macOS ARM64 executable
---
## Troubleshooting
### CGO Cross-Compilation Fails
If building Linux→macOS fails:
```
# runtime/cgo
gcc: error: unrecognized command line option '-arch'
```
Either:
1. Use Option B (macOS runner on indri)
2. Build with `CGO_ENABLED=0` (likely drops the cgo-based SQLite driver that the `sqlite` build tag relies on)
3. Use a Docker image with macOS cross-compiler (complex)
### Artifact Too Large
Forgejo binary is ~100MB. If upload fails:
- Check Forgejo's artifact size limit in app.ini
- Consider compressing: `gzip -9 forgejo-*`
---
## Next Phase
Once build is working and produces valid binaries, proceed to [Phase 3: Self-Deploy](P3_self_deploy.md).


@ -0,0 +1,409 @@
# Phase 3: Self-Deploy & Transition to mcquack
**Goal**: Complete the bootstrap - Forgejo deploys itself, transition from brew to mcquack LaunchAgent

**Status**: Planning

**Prerequisites**: [Phase 2](P2_mirror_and_build.md) complete (build workflow produces valid binaries)

---
## Overview
This phase completes the bootstrap:
1. First successful CI deploy creates the binary
2. Transition from brew service to mcquack LaunchAgent
3. Update ansible role to mcquack pattern
4. Remove brew forgejo
After this phase, Forgejo builds and deploys itself on every tagged release.

---
## Step 1: Prepare indri for mcquack
### 1.1 Create Directory Structure
```bash
ssh indri << 'EOF'
mkdir -p ~/.local/bin
mkdir -p ~/.config/forgejo
mkdir -p ~/Library/Logs
EOF
```
### 1.2 Prepare Data Directory
The existing data is at `/opt/homebrew/var/forgejo`. We'll keep it there for now (simpler), or optionally migrate to `~/forgejo`.
**Option A: Keep existing path** (recommended for simplicity)
- Data stays at `/opt/homebrew/var/forgejo`
- Binary moves to `~/.local/bin/forgejo`
**Option B: Full migration**
- Move data to `~/forgejo`
- Requires updating app.ini paths
For this plan, we'll use Option A.

---
## Step 2: First CI Deploy
### 2.1 Trigger Build with Deploy
1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions
2. Select "Build Forgejo" workflow
3. Click "Run workflow"
4. Set deploy=true
5. Monitor the run
### 2.2 Verify Binary Deployed
```bash
ssh indri 'ls -la ~/.local/bin/forgejo && ~/.local/bin/forgejo --version'
```
At this point:
- New binary is at `~/.local/bin/forgejo`
- Brew forgejo is still running
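- LaunchAgent doesn't exist yet, so the workflow's `launchctl load` had nothing to start. If its health check passed, that was brew Forgejo answering on port 3001.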
---
## Step 3: Create mcquack LaunchAgent
### 3.1 Create Plist Manually (One-Time Bootstrap)
```bash
ssh indri << 'EOF'
cat > ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist << 'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mcquack.eblume.forgejo</string>
<key>ProgramArguments</key>
<array>
<string>/Users/erichblume/.local/bin/forgejo</string>
<string>web</string>
<string>--config</string>
<string>/opt/homebrew/var/forgejo/custom/conf/app.ini</string>
<string>--work-path</string>
<string>/opt/homebrew/var/forgejo</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/Users/erichblume/Library/Logs/mcquack.forgejo.out.log</string>
<key>StandardErrorPath</key>
<string>/Users/erichblume/Library/Logs/mcquack.forgejo.err.log</string>
<key>EnvironmentVariables</key>
<dict>
<key>HOME</key>
<string>/Users/erichblume</string>
<key>USER</key>
<string>erichblume</string>
</dict>
</dict>
</plist>
PLIST
EOF
```
---
## Step 4: Cutover from Brew to mcquack
### 4.1 Stop Brew Service
```bash
ssh indri 'brew services stop forgejo'
```
### 4.2 Start mcquack Service
```bash
ssh indri 'launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist'
```
### 4.3 Verify Service Running
```bash
# Check process
ssh indri 'launchctl list | grep forgejo'
# Check logs
ssh indri 'tail -20 ~/Library/Logs/mcquack.forgejo.err.log'
# Check HTTP
curl -s https://forge.tail8d86e.ts.net/api/v1/version
```
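Forgejo may need a few seconds to bind its ports after `launchctl load`. A one-shot `curl`, or the fixed `sleep 5` in the deploy step, can therefore give a false result. Below is a minimal sketch of a hypothetical `retry` helper:

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs are used up, sleeping
# DELAY_SECONDS between tries. Returns COMMAND's last exit status.
retry() {
  local attempts="$1" delay="$2" n=1 status
  shift 2
  while :; do
    "$@" && return 0
    status=$?
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempt(s): $*" >&2
      return "$status"
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example: `retry 10 3 curl -sf https://forge.tail8d86e.ts.net/api/v1/version`.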
### 4.4 Verify Git Operations
```bash
# SSH test
ssh -T forgejo@forge.tail8d86e.ts.net
# Clone test
git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/blumeops.git /tmp/test-clone
rm -rf /tmp/test-clone
```
---
## Step 5: Update Ansible Role
### 5.1 Rewrite forgejo Role
Replace `ansible/roles/forgejo/tasks/main.yml`:
```yaml
---
# Forgejo is built from source via CI and deployed automatically.
# This role manages the configuration and LaunchAgent only.
#
# BINARY DEPLOYMENT:
# The binary at ~/.local/bin/forgejo is deployed by Forgejo Actions CI.
# If missing, trigger a build at:
# https://forge.tail8d86e.ts.net/eblume/forgejo/actions
#
# CONFIGURATION:
# app.ini at /opt/homebrew/var/forgejo/custom/conf/app.ini contains secrets
# and is NOT managed by ansible. It is backed up by borgmatic.
- name: Verify forgejo binary exists
  ansible.builtin.stat:
    path: "{{ forgejo_binary }}"
  register: forgejo_binary_stat
- name: Fail if forgejo binary not found
  ansible.builtin.fail:
    msg: |
      Forgejo binary not found at {{ forgejo_binary }}.
      The binary is deployed by Forgejo Actions CI. To build and deploy:
        1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions
        2. Select "Build Forgejo" workflow
        3. Click "Run workflow" with deploy=true
      Alternatively, build manually on gilbert and scp to indri.
  when: not forgejo_binary_stat.stat.exists
- name: Check forgejo config exists
  ansible.builtin.stat:
    path: "{{ forgejo_config }}"
  register: forgejo_config_stat
- name: Fail if forgejo config is missing
  ansible.builtin.fail:
    msg: |
      Forgejo config not found at {{ forgejo_config }}
      This file contains secrets and is not managed by ansible.
      To restore from backup, run:
        borgmatic --config ~/.config/borgmatic/config.yaml extract --archive latest \
          --path {{ forgejo_config }}
  when: not forgejo_config_stat.stat.exists
- name: Deploy forgejo LaunchAgent plist
  ansible.builtin.template:
    src: forgejo.plist.j2
    dest: ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
    mode: '0644'
  notify: Restart forgejo
- name: Check if forgejo LaunchAgent is loaded
  ansible.builtin.command: launchctl list mcquack.eblume.forgejo
  register: forgejo_launchctl_check
  changed_when: false
  failed_when: false
- name: Load forgejo LaunchAgent if not loaded
  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
  when: forgejo_launchctl_check.rc != 0
  changed_when: true
  failed_when: false
```
### 5.2 Create defaults/main.yml
```yaml
---
# Forgejo binary and paths
forgejo_binary: /Users/erichblume/.local/bin/forgejo
forgejo_work_path: /opt/homebrew/var/forgejo
forgejo_config: "{{ forgejo_work_path }}/custom/conf/app.ini"
forgejo_log_dir: /Users/erichblume/Library/Logs
# HTTP and SSH ports (must match app.ini)
forgejo_http_port: 3001
forgejo_ssh_port: 2200
```
### 5.3 Create templates/forgejo.plist.j2
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- {{ ansible_managed }} -->
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>mcquack.eblume.forgejo</string>
<key>ProgramArguments</key>
<array>
<string>{{ forgejo_binary }}</string>
<string>web</string>
<string>--config</string>
<string>{{ forgejo_config }}</string>
<string>--work-path</string>
<string>{{ forgejo_work_path }}</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>{{ forgejo_log_dir }}/mcquack.forgejo.out.log</string>
<key>StandardErrorPath</key>
<string>{{ forgejo_log_dir }}/mcquack.forgejo.err.log</string>
<key>EnvironmentVariables</key>
<dict>
<key>HOME</key>
<string>/Users/erichblume</string>
<key>USER</key>
<string>erichblume</string>
</dict>
</dict>
</plist>
```
### 5.4 Update handlers/main.yml
```yaml
---
- name: Restart forgejo
  ansible.builtin.shell: |
    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist 2>/dev/null || true
    launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
  changed_when: true
```
---
## Step 6: Update Alloy Log Collection
Update `ansible/roles/alloy/defaults/main.yml`:
Change forgejo log paths from brew to mcquack:
```yaml
alloy_brew_logs:
  # Remove forgejo from here
  - path: /opt/homebrew/var/log/tailscaled.log
    service: tailscale
    stream: stdout
alloy_mcquack_logs:
  # ... existing entries ...
  - path: /Users/erichblume/Library/Logs/mcquack.forgejo.out.log
    service: forgejo
    stream: stdout
  - path: /Users/erichblume/Library/Logs/mcquack.forgejo.err.log
    service: forgejo
    stream: stderr
```
---
## Step 7: Remove Brew Forgejo
### 7.1 Uninstall Brew Package
```bash
ssh indri 'brew uninstall forgejo'
```
### 7.2 Remove Old Logs
```bash
ssh indri 'rm -f /opt/homebrew/var/log/forgejo.log'
```
---
## Step 8: Run Ansible
```bash
mise run provision-indri -- --tags forgejo,alloy
```
---
## Disaster Recovery
### If CI Deploy Breaks Forgejo
1. **Build manually on gilbert**:
```bash
cd ~/code/3rd/forgejo
git pull
mise use go node
TAGS="bindata sqlite sqlite_unlock_notify" make build
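# Copy under a temporary name, then rename: overwriting a running
# binary in place can get it killed on macOS
scp gitea indri:~/.local/bin/forgejo-new
ssh indri 'mv ~/.local/bin/forgejo-new ~/.local/bin/forgejo'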
```
2. **Restart service**:
```bash
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist; launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist'
```
3. **Verify**:
```bash
curl https://forge.tail8d86e.ts.net/api/v1/version
```
### If Forgejo Won't Start
1. Check logs: `ssh indri 'tail -100 ~/Library/Logs/mcquack.forgejo.err.log'`
2. Check binary: `ssh indri '~/.local/bin/forgejo --version'`
3. Check config: `ssh indri 'cat /opt/homebrew/var/forgejo/custom/conf/app.ini | head -50'`
4. Try running manually: `ssh indri '~/.local/bin/forgejo web --config /opt/homebrew/var/forgejo/custom/conf/app.ini --work-path /opt/homebrew/var/forgejo'`
### Switch ArgoCD to GitHub (Nuclear Option)
If Forgejo is down and you need to deploy fixes:
```bash
argocd repo add https://github.com/eblume/blumeops.git --username eblume --password $GITHUB_PAT
argocd app set apps --repo https://github.com/eblume/blumeops.git
argocd app sync apps
```
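`argocd app set apps` only changes the parent `apps` Application. Child Applications keep their own `repoURL`; for example, `argocd/apps/forgejo-runner.yaml` points at Forgejo. If `apps` syncs those manifests from git, a parent sync may also put the Forgejo URL back. Below is a sketch of a hypothetical helper that prints the per-app command, or runs it when `APPLY=1` is set. The app list is supplied by hand, for example from `argocd app list`:

```bash
#!/usr/bin/env bash
# repoint_apps REPO_URL APP...
# Prints (default) or, with APPLY=1, runs `argocd app set APP --repo REPO_URL`
# for each named Application. App names are not discovered automatically.
repoint_apps() {
  local repo="$1" app
  shift
  for app in "$@"; do
    if [ "${APPLY:-0}" = "1" ]; then
      argocd app set "$app" --repo "$repo" || return 1
    else
      echo "argocd app set $app --repo $repo"
    fi
  done
}
```

For example: `repoint_apps https://github.com/eblume/blumeops.git apps forgejo-runner` prints two commands to review. Prefixing the call with `APPLY=1` runs them.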
After recovery, switch back to Forgejo.

---
## Verification Checklist
- [ ] CI deploy completed successfully
- [ ] Binary at `~/.local/bin/forgejo`
- [ ] mcquack LaunchAgent created
- [ ] Brew service stopped
- [ ] mcquack service started
- [ ] HTTP works (`curl https://forge.tail8d86e.ts.net/api/v1/version`)
- [ ] SSH works (`ssh -T forgejo@forge.tail8d86e.ts.net`)
- [ ] Git clone/push works
- [ ] Ansible role updated
- [ ] Alloy logs updated
- [ ] Brew package uninstalled
- [ ] `mise run provision-indri` succeeds
---
## Next Phase
After bootstrap is complete, proceed to [Phase 4: Container Builds](P4_container_builds.md) to set up container image building for ArgoCD.


@ -0,0 +1,409 @@
# Phase 4: Container Image Builds
**Goal**: Set up CI workflows to build custom container images and push to zot registry

**Status**: Planning

**Prerequisites**: [Phase 3](P3_self_deploy.md) complete (Forgejo self-deploying, Actions working)

---
## Overview
With Forgejo Actions operational, we can now build container images for:
- Custom devpi with pre-installed plugins
- Any other custom images needed for k8s services
- Release artifacts for Python packages
---
## Use Case 1: devpi Custom Image
### Current State
devpi runs from `registry.tail8d86e.ts.net/blumeops/devpi:latest`, built manually:
- Base image: python
- Adds: devpi-server, devpi-web
- Startup script for auto-initialization
### Goal
Automate builds triggered by:
- Push of the devpi build files (Dockerfile, `start.sh`, the workflow itself) to the blumeops repo on forge
- Manual workflow dispatch
- Optionally: upstream devpi release (via schedule check)
---
## Step 1: Create Workflow for devpi
### 1.1 Locate the devpi Dockerfile
The Dockerfile already exists at `argocd/manifests/devpi/Dockerfile`. We'll create a workflow in the blumeops repo that builds it.
### 1.2 Create Build Workflow
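Create `.forgejo/workflows/build-devpi.yml` in the blumeops repo. It assumes the job container has the Docker CLI and can reach the Docker daemon. The `node:20-bookworm` image behind the `ubuntu-latest` label does not ship the Docker CLI, so the runner labels or job image may need adjusting: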
```yaml
name: Build devpi Image
on:
  push:
    paths:
      - 'argocd/manifests/devpi/Dockerfile'
      - 'argocd/manifests/devpi/start.sh'
      - '.forgejo/workflows/build-devpi.yml'
  workflow_dispatch:
    inputs:
      tag:
        description: 'Image tag (default: latest)'
        required: false
        default: 'latest'
env:
  REGISTRY: registry.tail8d86e.ts.net
  IMAGE_NAME: blumeops/devpi
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Determine tag
        id: tag
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            TAG="${{ github.event.inputs.tag }}"
          else
            TAG="latest"
          fi
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"
      - name: Build image
        uses: docker/build-push-action@v5
        with:
          context: argocd/manifests/devpi
          file: argocd/manifests/devpi/Dockerfile
          platforms: linux/arm64
          load: true
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}
      - name: Push to registry
        run: |
          # Zot has no auth, just push
          docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}
      - name: Verify push
        run: |
          # Check image exists in registry
          curl -sf "https://${{ env.REGISTRY }}/v2/${{ env.IMAGE_NAME }}/tags/list" | jq .
```
### 1.3 Runner Needs Registry Access
The runner needs to reach `registry.tail8d86e.ts.net`. This should work via Tailscale egress (same as Forgejo access).
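If not, add an egress Service in `argocd/manifests/tailscale-operator/`. The Tailscale operator reaches tailnet hosts from inside the cluster through an `ExternalName` Service annotated with the target's tailnet FQDN. A `Connector` with `subnetRouter` is the wrong tool here: it advertises cluster routes *to* the tailnet and accepts only CIDR ranges, not hostnames.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: egress-registry
  namespace: tailscale-operator
  annotations:
    tailscale.com/tailnet-fqdn: registry.tail8d86e.ts.net
spec:
  type: ExternalName
  externalName: placeholder  # overwritten by the Tailscale operator
  # Pods address the registry via egress-registry.tailscale-operator.svc.cluster.local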
```
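A small manually triggered workflow can check connectivity before you run a full build. It exercises both network paths a build uses: the job container, which runs the `curl` verification, and the Docker daemon, which runs `docker push`. This is a sketch. The workflow name is hypothetical, and it assumes zot answers the OCI base endpoint `GET /v2/` without credentials, which matches the no-auth setup noted in the build workflow.

```yaml
name: Registry Connectivity Check  # hypothetical workflow name

on:
  workflow_dispatch:

env:
  REGISTRY: registry.tail8d86e.ts.net

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - name: Reach registry API from the job container
        # GET /v2/ is the OCI distribution base endpoint; assumed to return 200 without auth
        run: curl -sf "https://${{ env.REGISTRY }}/v2/" && echo "registry API reachable"
      - name: Reach registry from the Docker daemon
        # docker push/pull go through the daemon, which may use a different network path (e.g. a DinD sidecar)
        run: docker pull "${{ env.REGISTRY }}/blumeops/devpi:latest"
```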
---
## Step 2: Test Build Workflow
### 2.1 Push and Trigger
```bash
# Make a small change to trigger
echo "# Build $(date)" >> argocd/manifests/devpi/Dockerfile
git add argocd/manifests/devpi/Dockerfile
git commit -m "Trigger devpi image rebuild"
git push
```
### 2.2 Monitor Build
1. Go to https://forge.tail8d86e.ts.net/eblume/blumeops/actions
2. Watch "Build devpi Image" workflow
3. Verify success
### 2.3 Verify Image in Registry
```bash
curl -s https://registry.tail8d86e.ts.net/v2/blumeops/devpi/tags/list | jq .
```
### 2.4 Restart devpi to Use New Image
```bash
kubectl --context=minikube-indri -n devpi rollout restart statefulset/devpi
```
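A rollout restart picks up the rebuilt image only if the kubelet pulls `latest` again. When no `imagePullPolicy` is set, Kubernetes defaults it to `Always` for `:latest` tags. Setting it explicitly in the devpi manifest still makes that dependency visible. Below is a sketch of the relevant pod template fragment; the container name `devpi` is an assumption, so match the existing manifest.

```yaml
# Fragment of the devpi StatefulSet pod template (spec.template.spec)
containers:
  - name: devpi  # assumed container name
    image: registry.tail8d86e.ts.net/blumeops/devpi:latest
    # Re-pull on every pod start so a restart picks up the newly pushed image
    imagePullPolicy: Always
```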
---
## Step 3: Reusable Container Build Workflow
### 3.1 Create Reusable Workflow
Create `.forgejo/workflows/build-container.yml`:
```yaml
name: Build Container Image

on:
  workflow_call:
    inputs:
      context:
        description: 'Build context path'
        required: true
        type: string
      dockerfile:
        description: 'Dockerfile path (relative to context)'
        required: false
        type: string
        default: 'Dockerfile'
      image_name:
        description: 'Image name (without registry)'
        required: true
        type: string
      tag:
        description: 'Image tag'
        required: false
        type: string
        default: 'latest'
      platforms:
        description: 'Target platforms'
        required: false
        type: string
        default: 'linux/arm64'

env:
  REGISTRY: registry.tail8d86e.ts.net

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: ${{ inputs.context }}
          file: ${{ inputs.context }}/${{ inputs.dockerfile }}
          platforms: ${{ inputs.platforms }}
          push: true
          tags: ${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}
      - name: Verify push
        run: |
          curl -sf "https://${{ env.REGISTRY }}/v2/${{ inputs.image_name }}/tags/list" | jq .
```
### 3.2 Use in devpi Workflow
Simplify `.forgejo/workflows/build-devpi.yml`:
```yaml
name: Build devpi Image

on:
  push:
    paths:
      - 'argocd/manifests/devpi/**'
      - '.forgejo/workflows/build-devpi.yml'
      - '.forgejo/workflows/build-container.yml'
  workflow_dispatch:

jobs:
  build:
    uses: ./.forgejo/workflows/build-container.yml
    with:
      context: argocd/manifests/devpi
      image_name: blumeops/devpi
```
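Each caller can override the optional inputs. For example, a second job could call the same workflow with the commit SHA to publish an immutable per-commit tag, which helps with rollbacks. This is a sketch. The job name `build-sha` is hypothetical, and the job runs a second, separate build rather than retagging the first image.

```yaml
jobs:
  build:
    uses: ./.forgejo/workflows/build-container.yml
    with:
      context: argocd/manifests/devpi
      image_name: blumeops/devpi
  build-sha:  # hypothetical second job
    uses: ./.forgejo/workflows/build-container.yml
    with:
      context: argocd/manifests/devpi
      image_name: blumeops/devpi
      # Immutable per-commit tag alongside "latest"
      tag: ${{ github.sha }}
```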
---
## Step 4: Python Package Builds (Optional)
### 4.1 Use Case
Build Python packages from forge repos and publish to devpi.
Example: `mcquack` package (LaunchAgent management library)
### 4.2 Create Python Build Workflow
Create `.forgejo/workflows/build-python.yml`:
```yaml
name: Build Python Package

on:
  workflow_call:
    inputs:
      package_path:
        description: 'Path to package (contains pyproject.toml)'
        required: false
        type: string
        default: '.'
      python_version:
        description: 'Python version'
        required: false
        type: string
        default: '3.12'
      publish:
        description: 'Publish to devpi'
        required: false
        type: boolean
        default: false
    secrets:
      DEVPI_PASSWORD:
        required: false

env:
  DEVPI_URL: https://pypi.tail8d86e.ts.net

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ inputs.python_version }}
      - name: Install uv
        run: pip install uv
      - name: Build package
        working-directory: ${{ inputs.package_path }}
        run: uv build
      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: ${{ inputs.package_path }}/dist/
      - name: Publish to devpi
        if: inputs.publish
        working-directory: ${{ inputs.package_path }}
        env:
          # uv reads credentials from these variables, keeping the secret off the command line
          UV_PUBLISH_USERNAME: eblume
          UV_PUBLISH_PASSWORD: ${{ secrets.DEVPI_PASSWORD }}
        run: uv publish --publish-url ${{ env.DEVPI_URL }}/eblume/dev/
```
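A package repository such as mcquack could then call this workflow on version tags. This sketch makes three assumptions: a copy of `build-python.yml` lives in that repository's own `.forgejo/workflows/` (cross-repository reuse is not covered by this plan), and a `DEVPI_PASSWORD` secret is configured on the repository. The workflow name and tag pattern are hypothetical.

```yaml
name: Release Package  # hypothetical workflow name

on:
  push:
    tags:
      - 'v*'  # hypothetical release tag pattern

jobs:
  package:
    uses: ./.forgejo/workflows/build-python.yml
    with:
      publish: true
    secrets:
      DEVPI_PASSWORD: ${{ secrets.DEVPI_PASSWORD }}
```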
---
## Step 5: Scheduled Builds (Cron)
### 5.1 Weekly Rebuild
Keep images fresh with weekly rebuilds:
```yaml
name: Weekly Image Rebuilds

on:
  schedule:
    # Every Sunday at 3 AM UTC
    - cron: '0 3 * * 0'
  workflow_dispatch:

jobs:
  devpi:
    uses: ./.forgejo/workflows/build-container.yml
    with:
      context: argocd/manifests/devpi
      image_name: blumeops/devpi
```
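A scheduled rebuild refreshes the base image only if Buildx fetches it again instead of reusing a copy it already has. One way to ensure that is the `pull` input of `docker/build-push-action`, added to the build step in `build-container.yml`. This is a sketch of one possible extension, and it applies to every build, not only scheduled ones.

```yaml
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    context: ${{ inputs.context }}
    file: ${{ inputs.context }}/${{ inputs.dockerfile }}
    platforms: ${{ inputs.platforms }}
    # Always attempt to pull a newer version of the base image
    pull: true
    push: true
    tags: ${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}
```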
---
## Future Improvements
### Multi-Arch Builds
For images that need both ARM64 and AMD64:
```yaml
platforms: linux/arm64,linux/amd64
```
Requires QEMU emulation setup in runner (already supported by buildx).
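Buildx can cross-build only after QEMU binfmt handlers are registered on the Docker host that runs the build. In a workflow this is usually done with `docker/setup-qemu-action` before the Buildx setup step. The sketch below shows that step order for `build-container.yml`. It assumes the runner's Docker daemon allows privileged containers, which the binfmt installer needs.

```yaml
- name: Set up QEMU
  # Registers binfmt handlers so the arm64 host can emulate amd64 during builds
  uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
```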
### Build Caching
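Use GitHub/Forgejo cache actions. The cached directory only fills if the build step also reads from and writes to it through Buildx's local cache exporter:
```yaml
- name: Cache Docker layers
  uses: actions/cache@v4
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}
    # Fall back to the most recent cache when the Dockerfile changes
    restore-keys: ${{ runner.os }}-buildx-
# The docker/build-push-action step must point at the same directory:
#   cache-from: type=local,src=/tmp/.buildx-cache
#   cache-to: type=local,dest=/tmp/.buildx-cache,mode=max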
```
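An alternative to the `actions/cache` approach is Buildx's registry cache backend, which stores the cache in zot next to the image and avoids managing a cache directory. The sketch below is for the build step in `build-container.yml`. The `buildcache` tag is an assumption. `image-manifest=true`, which must be paired with `oci-mediatypes=true`, stores the cache as a plain OCI image manifest; zot may need this to accept it.

```yaml
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    context: ${{ inputs.context }}
    file: ${{ inputs.context }}/${{ inputs.dockerfile }}
    platforms: ${{ inputs.platforms }}
    push: true
    tags: ${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}
    # Reuse layers from the previous build's cache image (tag name is an assumption)
    cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ inputs.image_name }}:buildcache
    # Export all intermediate layers (mode=max) back to the same cache image
    cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ inputs.image_name }}:buildcache,mode=max,image-manifest=true,oci-mediatypes=true
```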
### Security Scanning
Add Trivy or similar as a step in the reusable container workflow, after the image is pushed:
```yaml
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: '${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}'
```
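The action's `severity` and `exit-code` inputs can fail the job on serious findings, so the scan gates the workflow instead of only reporting. Because the image is already pushed at that point, a failure flags a bad image rather than keeping it out of the registry. This is a sketch, and the severity threshold is an assumption.

```yaml
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: '${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}'
    # Only report HIGH and CRITICAL findings (assumed threshold)
    severity: 'CRITICAL,HIGH'
    # Skip vulnerabilities that have no fix available yet
    ignore-unfixed: true
    # Fail the step when any reported vulnerability is found
    exit-code: '1'
```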
---
## Verification Checklist
- [ ] devpi build workflow created
- [ ] devpi image builds successfully
- [ ] Image pushed to zot registry
- [ ] devpi pod uses new image
- [ ] Reusable container workflow created
- [ ] (Optional) Python build workflow created
- [ ] (Optional) Scheduled builds configured
---
## Summary
With this phase complete, we have:
1. **Forgejo Actions** running with k8s runner
2. **Forgejo self-deploys** from CI on tagged releases
3. **Container images** built automatically on push
4. Infrastructure for Python package builds
The CI/CD bootstrap is complete. Future work:
- Add more container builds as needed
- Add Python package publishing for internal tools
- Consider adding a macOS runner on indri for native builds

@ -0,0 +1,79 @@
# Blumeops Minikube Migration Plan
**Status**: Completed (2026-01-23)
This plan detailed the phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster. The migration is complete for every service that was slated to move.
## Final Status
| Phase | Name | Status | Notes |
|-------|------|--------|-------|
| 0 | [Foundation](P0_foundation.complete.md) | ✅ Complete | Container registry (zot) + minikube cluster |
| 1 | [K8s Infrastructure](P1_k8s_infrastructure.complete.md) | ✅ Complete | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
| 2 | [Grafana](P2_grafana.complete.md) | ✅ Complete | Migrated Grafana via ArgoCD |
| 3 | [PostgreSQL](P3_postgresql.complete.md) | ✅ Complete | Data migration to k8s PostgreSQL |
| 4 | [Miniflux](P4_miniflux.complete.md) | ✅ Complete | Migrated Miniflux via ArgoCD |
| 5 | [devpi](P5_devpi.complete.md) | ✅ Complete | Migrated devpi via ArgoCD |
| 5.1 | [Docker Migration](P5.1_docker_migration.complete.md) | ✅ Complete | Switched minikube to docker driver (not QEMU2) |
| 6 | [Kiwix](P6_kiwix.complete.md) | ✅ Complete | Migrated Kiwix + Transmission via ArgoCD |
| 7 | [Forgejo](P7_forgejo.md) | ⏭️ Won't Do | Forgejo stays on indri - see [CI/CD Bootstrap](../../ci-cd-bootstrap/) |
| 8 | [Woodpecker](P8_woodpecker.md) | ⏭️ Won't Do | Replaced by Forgejo Actions - see [CI/CD Bootstrap](../../ci-cd-bootstrap/) |
| 9 | [Cleanup](P9_cleanup.md) | ⏭️ Won't Do | Observability cleanup done separately (2026-01-22) |
## What Was Migrated to K8s
| Service | Status | Notes |
|---------|--------|-------|
| Grafana | ✅ In k8s | Helm chart via ArgoCD |
| PostgreSQL | ✅ In k8s | CloudNativePG operator |
| Miniflux | ✅ In k8s | Using k8s PostgreSQL |
| devpi | ✅ In k8s | Custom container image |
| Kiwix | ✅ In k8s | NFS mount from sifaka |
| Transmission | ✅ In k8s | NFS mount from sifaka |
| Prometheus | ✅ In k8s | Migrated 2026-01-22 |
| Loki | ✅ In k8s | Migrated 2026-01-22 |
| Alloy (k8s) | ✅ In k8s | DaemonSet for pod logs |
| TeslaMate | ✅ In k8s | Added 2026-01-23 |
## What Stays on Indri
| Service | Reason |
|---------|--------|
| **Forgejo** | Critical infrastructure, avoids circular dependency with ArgoCD |
| **Zot Registry** | K8s needs images to start - must be outside k8s |
| **Alloy (host)** | Collects host-level metrics and logs |
| **Borgmatic** | Backup system must survive k8s failures |
| **Plex** | Uses own NAT traversal, not Tailscale |
## Architecture Decisions Made
### Minikube Driver: Docker (not QEMU2/Podman)
- Original plan called for QEMU2, but docker driver proved simpler
- NFS mounts work via Docker NAT through indri's LAN IP
- API server accessible via Tailscale TCP passthrough
### Forgejo: Stays on Indri
- Original P7 planned k8s migration
- Decision changed: Forgejo is critical infrastructure
- Will be built from source via Forgejo Actions CI
- See [CI/CD Bootstrap Plan](../../ci-cd-bootstrap/) for details
### CI/CD: Forgejo Actions (not Woodpecker)
- Original P8 planned Woodpecker deployment
- Decision changed: Use Forgejo's native Actions instead
- Simpler (one less system), GitHub Actions compatible
- See [CI/CD Bootstrap Plan](../../ci-cd-bootstrap/) for details
### Observability: Migrated to K8s
- Original plan kept Prometheus/Loki on indri
- Changed: Migrated both to k8s (2026-01-22)
- Alloy on indri pushes to k8s endpoints
- Alloy DaemonSet in k8s collects pod logs
## Lessons Learned
1. **Docker driver is simpler than QEMU2** - Direct NFS mounts work, no VM complexity
2. **Tailscale operator works well** - Easy service exposure with automatic TLS
3. **CloudNativePG is production-ready** - Good operator, easy backups
4. **Keep critical infra outside k8s** - Forgejo and zot must survive k8s failures
5. **CGO matters on macOS** - Alloy had to be built with `CGO_ENABLED=1` for Tailscale DNS resolution to work

@ -1,151 +0,0 @@
# Blumeops Minikube Migration Plan
This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes.
## Phases
| Phase | Name | Status | Description |
|-------|------|--------|-------------|
| 0 | [Foundation](P0_foundation.complete.md) | Complete | Container registry + minikube cluster |
| 1 | [K8s Infrastructure](P1_k8s_infrastructure.md) | Complete | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
| 2 | [Grafana](P2_grafana.complete.md) | Complete | Migrate Grafana (pilot) via ArgoCD |
| 3 | [PostgreSQL](P3_postgresql.complete.md) | Complete | Data migration to k8s PostgreSQL |
| 4 | [Miniflux](P4_miniflux.complete.md) | Complete | Migrate Miniflux via ArgoCD |
| 5 | [devpi](P5_devpi.complete.md) | Complete | Migrate devpi via ArgoCD |
| 5.1 | [QEMU2 Migration](P5.1_qemu2_migration.md) | Pending | Switch minikube from podman to qemu2 driver |
| 6 | [Kiwix](P6_kiwix.md) | Blocked | Migrate Kiwix + Transmission via ArgoCD (blocked on P5.1) |
| 7 | [Forgejo](P7_forgejo.md) | Pending | Migrate Forgejo (highest risk) via ArgoCD |
| 8 | [Woodpecker](P8_woodpecker.md) | Pending | Deploy CI/CD via ArgoCD |
| 9 | [Cleanup](P9_cleanup.md) | Pending | Remove deprecated services |
## Architecture Overview
### Services Staying on Indri (Outside K8s)
| Service | Reason |
|---------|--------|
| **Zot Registry** (NEW) | Avoid circular dependency - k8s needs images to start |
| **Prometheus** | Observability backbone must survive k8s failures |
| **Loki** | Log aggregation backbone |
| **Borgmatic** | Backup system |
| **Grafana-alloy** | Metrics/logs collector on host |
| **Plex** | Until Jellyfin replacement |
### Services Moving to K8s
| Service | Complexity | Dependencies |
|---------|------------|--------------|
| Grafana | LOW | Phase 1 |
| Kiwix | MEDIUM | Phase 5.1 (QEMU2), shared storage |
| Transmission | MEDIUM | Phase 5.1 (QEMU2), shared storage |
| Miniflux | MEDIUM | PostgreSQL |
| devpi | MEDIUM | Registry |
| PostgreSQL | HIGH | Phase 1 |
| Forgejo | HIGH | PostgreSQL |
| Woodpecker CI | MEDIUM | Forgejo |
## Technical Decisions
### Container Registry: Zot
- OCI-native, lightweight
- Native support for proxying multiple registries (Docker Hub, GHCR, Quay)
- Built from source at `~/code/3rd/zot` (not in homebrew)
- Binary: `~/code/3rd/zot/bin/zot-darwin-arm64`
- Config: `~/.config/zot/config.json`
- Data: `~/zot/`
### Minikube Driver: QEMU2 (migrating from Podman)
- **Original choice (Podman)** proved unable to mount external volumes (NFS, SMB, hostPath)
- Podman's rootless containers lack CAP_SYS_ADMIN for filesystem mounts
- **QEMU2** creates an actual VM with full kernel capabilities
- Phase 5.1 handles the migration from podman to qemu2
- `minikube start --driver=qemu2 --container-runtime=containerd`
### PostgreSQL: CloudNativePG Operator
- Production-grade operator
- Built-in backup/restore
- Prometheus metrics
- PITR support
### K8s Service Exposure: Tailscale Operator
- `loadBalancerClass: tailscale` on Services
- Automatic TLS and MagicDNS names
- ACL-controlled access
### LaunchAgent Requirements (Critical)
LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths**:
- `/Users/erichblume/code/3rd/zot/bin/zot-darwin-arm64` for zot (built from source)
- `/opt/homebrew/opt/mise/bin/mise x --` for mise-managed tools
- `/opt/homebrew/opt/postgresql@18/bin/pg_dump` for postgres tools
This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors).
`brew services` handles this automatically, but brew-managed services aren't tracked in ansible.
### Backup Strategy
Borgmatic remains on indri (outside k8s), writing to sifaka NAS via SMB at `/Volumes/backups`. This ensures backups continue even if k8s is down.
| Service | Backup Approach |
|---------|-----------------|
| **Zot Registry** | No backup needed - pull-through cache is re-fetchable, private images rebuilt from source control |
| **Minikube** | No backup of cluster state - declarative manifests in git, can recreate |
| **PostgreSQL (k8s)** | CloudNativePG scheduled backups to sifaka (Phase 1) |
| **Grafana (k8s)** | Dashboards in ansible source control, no runtime backup needed |
| **Miniflux (k8s)** | Database backed up via CloudNativePG |
| **Forgejo (k8s)** | Git repos are distributed, config in ansible; data dir backed up via borgmatic before migration |
| **devpi (k8s)** | Private packages backed up, PyPI cache re-fetchable |
| **Kiwix (k8s)** | ZIM files re-downloadable via torrent, no backup needed |
**Borgmatic config changes:** None required for Phase 0. Future phases may add k8s PV paths if needed.
---
## Critical Files
| File | Purpose |
|------|---------|
| `ansible/playbooks/indri.yml` | Main playbook - add k8s roles, remove migrated services |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Transition services to Tailscale operator |
| `pulumi/policy.hujson` | Add tags: k8s, registry, ci |
| `ansible/roles/borgmatic/defaults/main.yml` | Update PostgreSQL endpoint |
| `mise-tasks/indri-services-check` | Add k8s health checks |
## New Directory Structure
```
ansible/
  k8s/
    operators/
      tailscale-operator.yaml
      cloudnative-pg.yaml
    databases/
      blumeops-pg.yaml
    apps/
      grafana/
      miniflux/
      forgejo/
      devpi/
      kiwix/
      woodpecker/
  roles/
    zot/        # NEW
    podman/     # NEW
    minikube/   # NEW
```
## Risk Mitigation
- **Circular dependency prevention**: Zot registry runs outside k8s
- **Observability**: Prometheus/Loki stay on indri
- **Data loss prevention**: borgmatic + manual backups before each phase
- **Recovery**: Can manually push images, restore from backups
## Container Images (All ARM64)
| Service | Image |
|---------|-------|
| Miniflux | `ghcr.io/miniflux/miniflux:latest` |
| Forgejo | `codeberg.org/forgejo/forgejo:10` |
| Grafana | `grafana/grafana:latest` |
| Kiwix | `ghcr.io/kiwix/kiwix-serve:3.8.1` |
| Woodpecker | `woodpeckerci/woodpecker-server` |
Note: Zot runs as a native binary on indri (built from source at `~/code/3rd/zot`), not as a container.