Remove plans, they don't seem to work
All checks were successful
Test CI / test (push) Successful in 3s
This commit is contained in:
parent 8ca8798121
commit ceba6b3c2c
18 changed files with 0 additions and 6816 deletions
@ -1,179 +0,0 @@
# Forgejo Actions CI/CD Bootstrap Plan

This plan details the setup of Forgejo Actions as the CI/CD system for blumeops, starting with the bootstrapping problem: using Forgejo to build and deploy Forgejo itself.

## Goals

1. **Forgejo Actions** as the primary CI system (replaces Woodpecker from original plan)
2. **Self-hosted Forgejo** built from source, deployed as mcquack LaunchAgent on indri
3. **Container builds** for ArgoCD manifests (devpi, etc.)
4. **Cron-scheduled tasks** via k8s CronJobs (not Actions)
5. **Local development** parity using `act` for workflow testing

## Why Forgejo Actions over Woodpecker?

- Native integration with Forgejo (no OAuth setup, automatic repo detection)
- GitHub Actions compatible syntax (huge ecosystem of reusable actions)
- `act` tool for local testing on gilbert
- Single system to maintain instead of two

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                              INDRI                              │
│  ┌─────────────────────┐                                        │
│  │      Forgejo        │  ← Built from source                   │
│  │   (mcquack agent)   │  ← Deploys itself via CI               │
│  │                     │                                        │
│  │  - Web UI (3001)    │                                        │
│  │  - SSH (2200)       │                                        │
│  │  - Actions enabled  │                                        │
│  └─────────────────────┘                                        │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ SSH deploy
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      KUBERNETES (minikube)                      │
│  ┌─────────────────────┐    ┌─────────────────────┐             │
│  │   Forgejo Runner    │    │   Other Services    │             │
│  │   (host mode)       │    │   (via ArgoCD)      │             │
│  │                     │    │                     │             │
│  │  - Custom image     │    │                     │             │
│  │  - Node.js + tools  │    │                     │             │
│  │  - Docker builds    │    │                     │             │
│  └─────────────────────┘    └─────────────────────┘             │
└─────────────────────────────────────────────────────────────────┘
```

## Phases

| Phase | Name | Description | Status |
|-------|------|-------------|--------|
| 1 | [Enable Actions](P1_enable_actions.md) | Configure Forgejo for Actions, deploy runner in host mode | ✅ Complete |
| 2 | [Custom Runner Image](P2_mirror_and_build.md) | Build custom runner with Node.js/tools, enable standard Actions | ✅ Complete |
| 3 | [Mirror Forgejo & Build](P3_mirror_forgejo.md) | Mirror upstream Forgejo, create build workflow | Planning |
| 4 | [Self-Deploy](P4_self_deploy.md) | Forgejo deploys itself, transition to mcquack | Planning |
| 5 | [Container Builds](P5_container_builds.md) | Build custom container images (devpi, etc.) | Planning |

## The Bootstrap Problem

**Chicken-and-egg**: We need Forgejo Actions to build Forgejo, but Forgejo must be running first.

**Additional complication**: The stock runner image lacks Node.js, so standard GitHub Actions don't work.

**Solution**:
1. Keep current brew-based Forgejo running during setup ✅
2. Enable Actions, deploy runner in host mode ✅
3. **Build custom runner image** with Node.js and tools (bootstrap manually, then automate)
4. Mirror upstream Forgejo, create build workflow
5. Address cross-compilation challenge (Linux runner → macOS target)
6. First CI build creates the binary
7. CI deploys binary to indri as mcquack service
8. `brew services stop forgejo` and uninstall
9. Future builds: Forgejo builds and deploys itself

**Cross-compilation challenge**:
The runner runs in Linux containers (k8s), but Forgejo needs to run on indri (macOS ARM64). Options:
- Cross-compile with CGO_ENABLED=1 (complex, needs OSX toolchain)
- Cross-compile with CGO_ENABLED=0 (breaks Tailscale DNS resolution)
- Build on gilbert manually, use CI only for deploy
- Run a native macOS runner on indri (outside k8s)

This will be addressed in Phase 3.

**Risk mitigation**: If self-deployment breaks Forgejo:
- blumeops is mirrored to GitHub
- Manual recovery: build on gilbert, scp to indri, restart service
- See Disaster Recovery section in P4

## Host Mode Runner

The runner uses **host mode** (`ubuntu-latest:host`), meaning:
- Jobs run directly in the runner container (no Docker/k8s pods spawned)
- Tools must be pre-installed in the runner image
- Stock image lacks Node.js, so `actions/checkout@v4` doesn't work
- Solution: Build custom runner image with necessary tools (Phase 2)

## Ansible Role Strategy

The forgejo ansible role will follow the zot/alloy pattern:

1. **Check binary exists** at expected path
2. **If missing**: Fail with message pointing to CI trigger instructions
3. **If present**: Deploy config, ensure LaunchAgent loaded

Ansible does NOT:
- Build the binary (that's CI's job)
- Deploy new versions (that's CI's job)

Ansible DOES:
- Manage app.ini configuration (via template with secrets from 1Password)
- Manage mcquack LaunchAgent plist
- Ensure service is running
- Collect logs via Alloy
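
The gate described above can be sketched in shell. This is only an illustration of the check-then-configure split, not the role itself (which would be written as Ansible tasks); the function name `check_forgejo_binary` and the path in the usage line are hypothetical.

```bash
# Hypothetical helper mirroring the role's "check binary exists" gate.
# Returns 0 when the binary is present and executable, 1 otherwise.
check_forgejo_binary() {
    binary_path="$1"
    if [ -x "$binary_path" ]; then
        echo "found: $binary_path (safe to deploy config and load the LaunchAgent)"
        return 0
    fi
    # The role never builds the binary; it only points at CI.
    echo "missing: $binary_path (trigger the CI build, then re-run)" >&2
    return 1
}
```

A caller might run `check_forgejo_binary "$HOME/.local/bin/forgejo" || exit 1` before touching any configuration, so a missing binary stops the run early instead of leaving a half-configured service.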

## Files Summary

### New Files

| Path | Purpose |
|------|---------|
| `argocd/apps/forgejo-runner.yaml` | ArgoCD Application for runner ✅ |
| `argocd/manifests/forgejo-runner/` | Runner k8s manifests ✅ |
| `argocd/manifests/forgejo-runner/Dockerfile` | Custom runner image (P2) |
| `.forgejo/workflows/build-runner.yml` | Auto-rebuild runner image (P2) |
| `.forgejo/workflows/test.yml` | Test workflow ✅ |
| (on forge) `eblume/forgejo/.forgejo/workflows/` | Build workflow in forgejo mirror (P3) |

### Modified Files

| Path | Change |
|------|--------|
| `ansible/roles/forgejo/` | Complete rewrite for mcquack pattern (P4) |
| `ansible/roles/alloy/defaults/main.yml` | Update forgejo log paths (P4) |
| zk cards | Update forgejo, argocd, blumeops cards |

### Credentials Needed

| Item | Purpose | Storage |
|------|---------|---------|
| Runner registration token | Runner auth to Forgejo | 1Password ✅ |
| SSH deploy key | Runner SSH to indri (for Forgejo deploy) | 1Password + k8s secret (P3) |

## Related Plans

- [P7_forgejo.md](../k8s-migration/P7_forgejo.md) - Original k8s migration plan (superseded for Forgejo itself, but SSH hostname split info still relevant)
- [P8_woodpecker.md](../k8s-migration/P8_woodpecker.md) - Original Woodpecker plan (superseded by Forgejo Actions)

## Decision Log

### 2026-01-23: Custom runner image as Phase 2

**Decision**: Move custom runner image work from P4 to P2

**Rationale**:
- Stock runner lacks Node.js, can't run `actions/checkout@v4`
- Need working GitHub Actions before building Forgejo
- Bootstrap manually (podman build on gilbert), then automate

### 2026-01-23: Forgejo Actions over Woodpecker

**Decision**: Use Forgejo Actions instead of Woodpecker CI

**Rationale**:
- Native Forgejo integration (Actions is built-in)
- GitHub Actions compatible (reuse existing actions)
- `act` for local testing
- One less system to deploy and maintain

### 2026-01-23: Keep Forgejo on indri (not k8s)

**Decision**: Forgejo stays on indri as mcquack service, not migrated to k8s

**Rationale**:
- Avoid circular dependency (ArgoCD needs Forgejo to deploy Forgejo)
- Simpler SSH handling (direct port, no k8s networking complexity)
- Forgejo is critical infrastructure, benefits from isolation
- Can still use Tailscale serve for external access
@ -1,322 +0,0 @@
# Phase 1: Enable Forgejo Actions

**Goal**: Configure Forgejo to support Actions workflows and deploy a runner in k8s

**Status**: Completed (2026-01-23)

**Prerequisites**: None (uses existing brew-based Forgejo)

---

## Current State

- Forgejo runs via `brew services` on indri
- Config at `/opt/homebrew/var/forgejo/custom/conf/app.ini`
- Actions not enabled
- No runners deployed

---

## Step 1: Enable Actions in Forgejo

### 1.1 Update app.ini

SSH to indri and edit the Forgejo config:

```bash
# -t allocates a terminal, which vim needs when run as a remote command
ssh -t indri 'vim /opt/homebrew/var/forgejo/custom/conf/app.ini'
```

Add the following sections:

```ini
[actions]
ENABLED = true
DEFAULT_ACTIONS_URL = https://code.forgejo.org

[repository]
; Enable the Actions unit (repo.actions) by default for new repositories
DEFAULT_REPO_UNITS = repo.code,repo.issues,repo.pulls,repo.releases,repo.wiki,repo.projects,repo.packages,repo.actions
```

### 1.2 Restart Forgejo

```bash
ssh indri 'brew services restart forgejo'
```

### 1.3 Verify Actions Enabled

1. Go to https://forge.tail8d86e.ts.net
2. Navigate to any repo → Settings → Actions
3. Should see "Enable Repository Actions" option
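
The UI check can be complemented by checking the config file directly. A minimal sketch, assuming `app.ini` uses the exact `[actions]` / `ENABLED = true` spelling shown in 1.1; the helper name `actions_enabled` is hypothetical.

```bash
# Hypothetical helper: succeeds when "ENABLED = true" appears inside the
# [actions] section of the given app.ini, and fails otherwise.
# Example (path from 1.1, run on indri):
#   actions_enabled /opt/homebrew/var/forgejo/custom/conf/app.ini && echo "Actions on"
actions_enabled() {
    awk '
        /^\[/ { in_actions = ($0 == "[actions]") }   # track the current section
        in_actions && /^[ \t]*ENABLED[ \t]*=[ \t]*true[ \t]*$/ { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```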

---

## Step 2: Create Runner Registration Token

### 2.1 Generate Token in Forgejo UI

1. Go to https://forge.tail8d86e.ts.net/admin/actions/runners
2. Click "Create new Runner"
3. Copy the registration token
4. Store in 1Password (blumeops vault) as "Forgejo Runner Token"

### 2.2 Create k8s Secret Template

Create `argocd/manifests/forgejo-runner/secret-token.yaml.tpl`:

```yaml
# Template for op inject
apiVersion: v1
kind: Secret
metadata:
  name: forgejo-runner-token
  namespace: forgejo-runner
type: Opaque
stringData:
  token: "op://blumeops/<runner-token-item>/token"
```
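
Before applying the rendered secret, it may be worth confirming that no `op://` reference survived injection. A minimal sketch; the helper name `has_unresolved_refs` is hypothetical, and it assumes a successful `op inject` leaves no literal `op://` string in the output.

```bash
# Hypothetical helper: succeeds when the given file still contains an
# unresolved op:// reference, i.e. injection did not replace it.
# Example:
#   has_unresolved_refs argocd/manifests/forgejo-runner/secret-token.yaml \
#     && echo "secret not rendered" >&2
has_unresolved_refs() {
    grep -q 'op://' "$1"
}
```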

---

## Step 3: Deploy Runner to Kubernetes

### 3.1 Create ArgoCD Application

Create `argocd/apps/forgejo-runner.yaml`:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: forgejo-runner
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/forgejo-runner
  destination:
    server: https://kubernetes.default.svc
    namespace: forgejo-runner
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```

### 3.2 Create Runner Manifests

Create directory `argocd/manifests/forgejo-runner/` with:

**kustomization.yaml**:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: forgejo-runner
resources:
  - namespace.yaml
  - deployment.yaml
  - serviceaccount.yaml
  - secret-token.yaml
```

**namespace.yaml**:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: forgejo-runner
```

**serviceaccount.yaml**:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
```

**deployment.yaml**:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: forgejo-runner
  template:
    metadata:
      labels:
        app: forgejo-runner
    spec:
      serviceAccountName: forgejo-runner
      containers:
        - name: runner
          image: code.forgejo.org/forgejo/runner:3.5.1
          env:
            - name: FORGEJO_INSTANCE_URL
              value: "https://forge.tail8d86e.ts.net"
            - name: RUNNER_NAME
              value: "k8s-runner-1"
            - name: RUNNER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: forgejo-runner-token
                  key: token
          command:
            - /bin/sh
            - -c
            - |
              # Register runner if not already registered
              if [ ! -f /data/.runner ]; then
                forgejo-runner register \
                  --instance "$FORGEJO_INSTANCE_URL" \
                  --token "$RUNNER_TOKEN" \
                  --name "$RUNNER_NAME" \
                  --labels "ubuntu-latest:docker://node:20-bookworm,ubuntu-22.04:docker://ubuntu:22.04" \
                  --no-interactive
              fi
              # Start the runner daemon
              forgejo-runner daemon
          volumeMounts:
            - name: runner-data
              mountPath: /data
            - name: docker-sock
              mountPath: /var/run/docker.sock
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
      volumes:
        - name: runner-data
          emptyDir: {}
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
            type: Socket
```

**Note**: The runner needs access to Docker to run workflow jobs in containers. In minikube with docker driver, `/var/run/docker.sock` is available.

---

## Step 4: Deploy and Verify

### 4.1 Inject Secrets and Deploy

```bash
# Inject secrets
op inject -i argocd/manifests/forgejo-runner/secret-token.yaml.tpl \
  -o argocd/manifests/forgejo-runner/secret-token.yaml

# Sync apps
argocd app sync apps
argocd app sync forgejo-runner
```

### 4.2 Verify Runner Registration

```bash
# Check runner pod
kubectl --context=minikube-indri -n forgejo-runner get pods

# Check runner logs
kubectl --context=minikube-indri -n forgejo-runner logs -f deployment/forgejo-runner

# Verify in Forgejo UI
# Go to https://forge.tail8d86e.ts.net/admin/actions/runners
# Should see "k8s-runner-1" as online
```
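
The pod check can also be scripted rather than read by eye. A minimal sketch, assuming the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...); the helper name `all_pods_running` is hypothetical.

```bash
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and succeeds only if at least one pod is listed and every pod is Running.
# Example:
#   kubectl --context=minikube-indri -n forgejo-runner get pods --no-headers \
#     | all_pods_running && echo "runner pod is up"
all_pods_running() {
    awk '
        NF > 0 { total++; if ($3 == "Running") running++ }   # STATUS is column 3
        END { exit(total > 0 && running == total ? 0 : 1) }
    '
}
```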

---

## Step 5: Test with Simple Workflow

### 5.1 Create Test Workflow

In the blumeops repo, create `.forgejo/workflows/test.yml`:

```yaml
name: Test CI

on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Hello World
        run: |
          echo "Hello from Forgejo Actions!"
          echo "Runner: ${{ runner.name }}"
          echo "Repo: ${{ github.repository }}"
```

### 5.2 Push and Verify

```bash
git add .forgejo/
git commit -m "Add test workflow for Forgejo Actions"
git push
```

Check https://forge.tail8d86e.ts.net/eblume/blumeops/actions for the workflow run.

---

## Verification Checklist

- [x] Actions enabled in app.ini
- [x] Forgejo restarted successfully
- [x] Runner token stored in 1Password
- [x] Runner deployment created in ArgoCD
- [x] Runner pod running in k8s
- [x] Runner shows as online in Forgejo admin
- [x] Test workflow runs successfully

---

## Troubleshooting

### Runner Can't Connect to Forgejo

The runner needs to reach `forge.tail8d86e.ts.net` from inside k8s. This should work via Tailscale operator egress (already configured for ArgoCD).

If not working:
```bash
# Test from inside k8s
kubectl --context=minikube-indri run -it --rm curl --image=curlimages/curl -- \
  curl -v https://forge.tail8d86e.ts.net/api/v1/version
```

### Docker Socket Permission Denied

The runner container needs to access the Docker socket. In minikube with docker driver, this should work. If permission denied:

```bash
# Check socket permissions
kubectl --context=minikube-indri -n forgejo-runner exec deployment/forgejo-runner -- ls -la /var/run/docker.sock
```

May need to run runner as root or adjust security context.

---

## Next Phase

Once the runner is working, proceed to [Phase 2: Custom Runner Image](P2_mirror_and_build.md).
@ -1,347 +0,0 @@
# Phase 2: Custom Runner Image

**Goal**: Build a custom forgejo-runner image with necessary tools, enabling standard GitHub Actions

**Status**: Complete (2026-01-23)

**Prerequisites**: [Phase 1](P1_enable_actions.md) complete (Actions enabled, runner deployed in host mode)

---

## Problem Statement

The stock `code.forgejo.org/forgejo/runner:3.5.1` image lacks tools needed for standard GitHub Actions:
- **Node.js** - Required by most actions (checkout, setup-*, etc.)
- **Git** - For repository operations (present but minimal)
- **Common build tools** - make, gcc, curl, jq, etc.

In host mode, jobs run directly in the runner container, so these tools must be pre-installed.

### Chicken-and-Egg Problem

We can't use `actions/checkout@v4` to build the custom runner because that action requires Node.js, which we don't have yet. Solution: Bootstrap manually, then automate.

---

## Step 1: Create Dockerfile for Custom Runner

Create `argocd/manifests/forgejo-runner/Dockerfile`:

```dockerfile
FROM code.forgejo.org/forgejo/runner:3.5.1

# The base image is Alpine-based, so packages are installed with apk
# (if the base image runs as a non-root user, add `USER root` before this step)
# Install tools needed for GitHub Actions and builds
RUN apk add --no-cache \
    # Required for actions/checkout and other Node-based actions
    nodejs \
    npm \
    # Build essentials
    git \
    curl \
    wget \
    jq \
    make \
    gcc \
    g++ \
    # For container builds (if we add Docker-in-Docker later)
    ca-certificates

# Verify Node.js is available
RUN node --version && npm --version
```

---

## Step 2: Bootstrap - Build Image Manually

Since we can't use CI yet, build the image manually on gilbert and push to zot.

### 2.1 Build with Podman

```bash
cd ~/code/personal/blumeops/argocd/manifests/forgejo-runner

# Build for linux/arm64 (minikube on M1 Mac)
podman build --platform linux/arm64 -t registry.tail8d86e.ts.net/blumeops/forgejo-runner:latest .

# Push to zot (no auth required)
podman push registry.tail8d86e.ts.net/blumeops/forgejo-runner:latest
```

### 2.2 Verify Image in Registry

```bash
curl -s https://registry.tail8d86e.ts.net/v2/blumeops/forgejo-runner/tags/list | jq .
```

---

## Step 3: Update Runner Deployment

### 3.1 Update deployment.yaml

Change the image from stock to custom:

```yaml
# Before
image: code.forgejo.org/forgejo/runner:3.5.1

# After
image: registry.tail8d86e.ts.net/blumeops/forgejo-runner:latest
```

### 3.2 Update kustomization.yaml

No change to `resources` is needed. The Dockerfile is kept next to the runner manifests for reference, but kustomize does not deploy it; a comment in `kustomization.yaml` can record this:

```yaml
# Note: Dockerfile is for building, not k8s deployment
# It lives here for co-location with the runner manifests
```

### 3.3 Sync Deployment

```bash
argocd app sync forgejo-runner

# Verify new image is running
kubectl --context=minikube-indri -n forgejo-runner get pods -o jsonpath='{.items[*].spec.containers[*].image}'
```

---

## Step 4: Test with Real GitHub Action

Now that we have Node.js, test with `actions/checkout@v4`.

### 4.1 Update Test Workflow

Update `.forgejo/workflows/test.yml`:

```yaml
name: Test CI

on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Verify tools
        run: |
          echo "Node.js: $(node --version)"
          echo "npm: $(npm --version)"
          echo "Git: $(git --version)"
          echo "Make: $(make --version | head -1)"

      - name: Show repo info
        run: |
          echo "Repository: ${{ github.repository }}"
          echo "Branch: ${{ github.ref_name }}"
          ls -la
```

### 4.2 Push and Verify

```bash
git add .forgejo/workflows/test.yml
git commit -m "Test checkout action with custom runner"
git push
```

Check https://forge.tail8d86e.ts.net/eblume/blumeops/actions - should see successful run with `actions/checkout@v4`.

---

## Step 5: Create Auto-Build Workflow for Runner

Now that Actions work properly, create a workflow to rebuild the runner image automatically.

### 5.1 Create Build Workflow

Create `.forgejo/workflows/build-runner.yml`:

```yaml
name: Build Runner Image

on:
  push:
    paths:
      - 'argocd/manifests/forgejo-runner/Dockerfile'
      - '.forgejo/workflows/build-runner.yml'
  workflow_dispatch:

env:
  REGISTRY: registry.tail8d86e.ts.net
  IMAGE_NAME: blumeops/forgejo-runner

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Build image
        run: |
          cd argocd/manifests/forgejo-runner
          # Use docker build (available in runner container)
          # Note: This builds for the runner's native arch
          docker build -t ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} .
          docker tag ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest

      - name: Push to registry
        run: |
          # Zot has no auth, just push
          docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest

      - name: Verify push
        run: |
          curl -sf "https://${{ env.REGISTRY }}/v2/${{ env.IMAGE_NAME }}/tags/list" | jq .
          echo "Image pushed: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
```
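
The workflow publishes every build under two references: one pinned to the commit SHA and one floating `latest`. The naming scheme can be sketched as a small shell function; `image_refs` is a hypothetical helper for illustration, not part of the workflow.

```bash
# Hypothetical helper: prints the two image references the workflow pushes,
# given the registry host, the image name, and the commit SHA.
image_refs() {
    registry="$1"
    image_name="$2"
    sha="$3"
    printf '%s/%s:%s\n' "$registry" "$image_name" "$sha"   # immutable, per-commit tag
    printf '%s/%s:latest\n' "$registry" "$image_name"      # floating tag
}
```

For example, `image_refs registry.tail8d86e.ts.net blumeops/forgejo-runner "$(git rev-parse HEAD)"` lists both names for the current commit. Pointing the deployment at the SHA tag instead of `latest` is one way to make rollbacks explicit, though the plan as written deploys `latest`.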

### 5.2 Note on Docker-in-Docker

The runner runs in host mode, so we need Docker CLI available. Options:

1. **Add Docker CLI to the custom image** (see Dockerfile update below)
2. **Mount Docker socket from minikube** (requires deployment change)
3. **Use Podman instead** (rootless, no socket needed)

For now, we'll add Docker CLI to the image and mount the socket.

### 5.3 Update Dockerfile for Docker Builds

```dockerfile
FROM code.forgejo.org/forgejo/runner:3.5.1

# Alpine-based image: install with apk
RUN apk add --no-cache \
    nodejs \
    npm \
    git \
    curl \
    wget \
    jq \
    make \
    gcc \
    g++ \
    ca-certificates \
    # Docker CLI for building container images
    docker-cli

RUN node --version && npm --version && docker --version
```

### 5.4 Update Deployment for Docker Socket

Add Docker socket mount to `deployment.yaml`:

```yaml
          volumeMounts:
            - name: runner-data
              mountPath: /data
            - name: runner-config
              mountPath: /config
            - name: docker-sock
              mountPath: /var/run/docker.sock
      volumes:
        - name: runner-data
          emptyDir: {}
        - name: runner-config
          configMap:
            name: forgejo-runner-config
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
            type: Socket
```

---

## Step 6: Verification

### 6.1 Manual Image Build Works

```bash
# On gilbert
podman build --platform linux/arm64 -t registry.tail8d86e.ts.net/blumeops/forgejo-runner:test .
podman push registry.tail8d86e.ts.net/blumeops/forgejo-runner:test
```

### 6.2 Runner Uses Custom Image

```bash
kubectl --context=minikube-indri -n forgejo-runner get pods -o jsonpath='{.items[*].spec.containers[*].image}'
# Should show: registry.tail8d86e.ts.net/blumeops/forgejo-runner:latest
```

### 6.3 GitHub Actions Work

- `actions/checkout@v4` succeeds
- Test workflow shows Node.js, npm, git versions

### 6.4 Auto-Build Workflow Works

Push a change to the Dockerfile and verify:
1. Workflow triggers
2. Image builds successfully
3. Image pushed to zot

---

## Verification Checklist

- [x] Dockerfile created for custom runner (Alpine-based with apk)
- [x] Image built manually on gilbert (podman build)
- [x] Image pushed to zot registry
- [x] Runner deployment updated to use custom image
- [x] Runner pod running with new image
- [x] `actions/checkout@v4` works in test workflow
- [ ] Auto-build workflow created (deferred - needs Docker socket)
- [ ] Docker socket mounted (for container builds)
- [ ] Auto-build workflow successfully rebuilds runner

---

## Troubleshooting

### Image Pull Fails in Minikube

Minikube needs to be able to pull from zot. Check registry mirror config:
```bash
ssh indri 'minikube ssh -- cat /etc/containerd/certs.d/registry.tail8d86e.ts.net/hosts.toml'
```

### Docker Build Fails in Workflow

If Docker socket mount doesn't work:
1. Check socket exists in minikube: `minikube ssh -- ls -la /var/run/docker.sock`
2. Check permissions: runner may need to be in docker group
3. Alternative: Use `podman` (rootless) instead of Docker

### Node.js Actions Still Fail

Ensure the runner pod restarted after image update:
```bash
kubectl --context=minikube-indri -n forgejo-runner rollout restart deployment/forgejo-runner
kubectl --context=minikube-indri -n forgejo-runner logs -f deployment/forgejo-runner
```

---

## Next Phase

Once the custom runner is working with auto-build, proceed to [Phase 3: Mirror Forgejo & Build](P3_mirror_forgejo.md) to set up Forgejo source builds.
@ -1,349 +0,0 @@
# Phase 3: Mirror Forgejo & Build from Source

**Goal**: Mirror upstream Forgejo to forge and create a workflow that builds it for macOS ARM64

**Status**: Planning

**Prerequisites**: [Phase 2](P2_mirror_and_build.md) complete (custom runner image with Node.js/tools)

---

## Problem Statement

We want to build Forgejo from source to:
1. Have full control over the binary running on indri
2. Enable self-deployment via CI
3. Ensure proper macOS DNS resolution (requires CGO_ENABLED=1)

### The Cross-Compilation Challenge

The runner runs in a Linux container (k8s on indri), but the target is macOS ARM64 (indri itself).

**Options**:

| Option | Pros | Cons |
|--------|------|------|
| A. Cross-compile CGO_ENABLED=0 | Simple, no special toolchain | Breaks Tailscale MagicDNS resolution |
| B. Cross-compile CGO_ENABLED=1 | Proper DNS | Needs OSX cross-compiler (osxcross), complex |
| C. Build on gilbert manually | Works now, simple | Not automated, manual step |
| D. Native macOS runner on indri | Full native build | Runner outside k8s, different architecture |
| E. Hybrid: build on gilbert, deploy via CI | Uses existing tools | Partial automation |

**Recommendation**: Start with Option C/E (manual build on gilbert, CI just deploys), then consider Option D if we want full automation.

---

## Step 1: Mirror Upstream Forgejo

### 1.1 User Action: Create Mirror on Forge

**Manual step** (hairpinning doesn't work from indri):

1. Go to https://forge.tail8d86e.ts.net
2. Click "+" → "New Migration"
3. Select "Gitea" as clone source
4. URL: `https://codeberg.org/forgejo/forgejo.git`
5. Repository name: `forgejo`
6. Check "This repository will be a mirror"
7. Click "Migrate Repository"

### 1.2 Clone Mirror Locally

```bash
git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/forgejo.git ~/code/3rd/forgejo
cd ~/code/3rd/forgejo
```

---

## Step 2: Understand Forgejo Build Process

### 2.1 Build Requirements

From Forgejo's `Makefile` and docs:

- **Go**: 1.23+ (check `go.mod` for exact version)
- **Node.js**: 20+ (for frontend)
- **Make**: GNU Make
- **Git**: For version embedding

### 2.2 Build Commands

```bash
# Install frontend dependencies and build
make deps-frontend
make frontend

# Build backend (with CGO for proper DNS on macOS)
CGO_ENABLED=1 TAGS="bindata sqlite sqlite_unlock_notify" make backend

# Or all-in-one
CGO_ENABLED=1 TAGS="bindata sqlite sqlite_unlock_notify" make build
```

### 2.3 Output

Binary at `gitea` (yes, the binary is still named `gitea` for compatibility).

---

## Step 3: Build on Gilbert (Manual Bootstrap)

For the initial bootstrap, build on gilbert (macOS ARM64 native).

### 3.1 Setup Build Environment

```bash
cd ~/code/3rd/forgejo
mise use go@1.23 node@20

# Verify tools
go version
node --version
make --version
```

### 3.2 Build

```bash
# Clean build
make clean

# Build frontend
make deps-frontend
make frontend

# Build backend with CGO (important for macOS DNS!)
CGO_ENABLED=1 TAGS="bindata sqlite sqlite_unlock_notify" make backend

# Verify binary
./gitea --version
file gitea  # Should show: Mach-O 64-bit executable arm64
```
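
The `file` check at the end of the build is the guard against shipping a Linux binary to indri by mistake. A minimal sketch of turning it into a hard failure; the helper name `is_macos_arm64` is hypothetical, and the match string is the one quoted in the comment above.

```bash
# Hypothetical helper: takes the output of `file <binary>` as its argument
# and succeeds only if it describes a native macOS ARM64 executable.
is_macos_arm64() {
    case "$1" in
        *"Mach-O 64-bit executable arm64"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A deploy script could then run `is_macos_arm64 "$(file gitea)" || { echo "wrong target, not deploying" >&2; exit 1; }` before copying anything to indri.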
|
||||
|
||||
### 3.3 Deploy to Indri
|
||||
|
||||
```bash
|
||||
# Copy binary
|
||||
scp gitea indri:~/.local/bin/forgejo-new
|
||||
|
||||
# Verify on indri
|
||||
ssh indri '~/.local/bin/forgejo-new --version'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Create Deploy Workflow (Option E)
|
||||
|
||||
Since cross-compilation is complex, use a hybrid approach:
|
||||
1. Build on gilbert (manual trigger or pre-built)
|
||||
2. CI workflow fetches and deploys
|
||||
|
||||
### 4.1 SSH Deploy Key for Runner
|
||||
|
||||
The runner needs SSH access to indri to deploy the binary.
|
||||
|
||||
**Generate key on gilbert**:
|
||||
```bash
|
||||
ssh-keygen -t ed25519 -C "forgejo-runner-deploy" -f ~/.ssh/forgejo-runner-deploy -N ""
|
||||
```
|
||||
|
||||
**Add public key to indri's authorized_keys**:
|
||||
```bash
|
||||
cat ~/.ssh/forgejo-runner-deploy.pub | ssh indri 'cat >> ~/.ssh/authorized_keys'
|
||||
```
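
Appending with `cat >>` adds a duplicate entry every time the command is re-run. A minimal sketch of an idempotent alternative follows; `add_line_once` is a hypothetical helper, not an existing tool, and it assumes the target file ends with a newline.

```bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so re-running the setup stays idempotent.
add_line_once() {
  local file="$1" line="$2"
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a literal string.
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}
```

Run on indri, this would be called with `~/.ssh/authorized_keys` as the file and the contents of `forgejo-runner-deploy.pub` as the line.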
|
||||
|
||||
**Store private key in 1Password** (blumeops vault) as "Forgejo Runner Deploy Key"
|
||||
|
||||
### 4.2 Create k8s Secret
|
||||
|
||||
Create `argocd/manifests/forgejo-runner/secret-ssh.yaml.tpl`:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Secret
|
||||
metadata:
|
||||
name: forgejo-runner-ssh
|
||||
namespace: forgejo-runner
|
||||
type: Opaque
|
||||
stringData:
|
||||
id_ed25519: |
|
||||
op://blumeops/<deploy-key-item>/private-key
|
||||
known_hosts: |
|
||||
# Get with: ssh-keyscan indri.tail8d86e.ts.net 2>/dev/null | grep ed25519
|
||||
indri.tail8d86e.ts.net ssh-ed25519 AAAAC3...
|
||||
```
|
||||
|
||||
### 4.3 Update Deployment for SSH
|
||||
|
||||
Add SSH secret mount to `deployment.yaml`:
|
||||
|
||||
```yaml
|
||||
volumeMounts:
|
||||
- name: ssh-key
|
||||
mountPath: /root/.ssh
|
||||
readOnly: true
|
||||
volumes:
|
||||
- name: ssh-key
|
||||
secret:
|
||||
secretName: forgejo-runner-ssh
|
||||
defaultMode: 0600
|
||||
```
|
||||
|
||||
### 4.4 Create Deploy-Only Workflow
|
||||
|
||||
Create `.forgejo/workflows/deploy-forgejo.yml` in blumeops:
|
||||
|
||||
```yaml
|
||||
name: Deploy Forgejo
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
version:
|
||||
description: 'Version to deploy (tag or commit)'
|
||||
required: true
|
||||
default: 'v10.0.0'
|
||||
|
||||
jobs:
|
||||
deploy:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Deploy to indri
|
||||
env:
|
||||
VERSION: ${{ github.event.inputs.version }}
|
||||
run: |
|
||||
# SSH config
|
||||
mkdir -p ~/.ssh
|
||||
cp /root/.ssh/id_ed25519 ~/.ssh/
|
||||
cp /root/.ssh/known_hosts ~/.ssh/
|
||||
chmod 600 ~/.ssh/id_ed25519
|
||||
|
||||
# Deploy script
|
||||
ssh erichblume@indri.tail8d86e.ts.net "VERSION='$VERSION' bash -s" << 'EOF'
|
||||
set -e
|
||||
cd ~/.local/bin
|
||||
|
||||
# Verify the new binary exists and runs
|
||||
if [ ! -f forgejo-new ]; then
|
||||
echo "ERROR: forgejo-new not found. Build on gilbert first:"
|
||||
echo " cd ~/code/3rd/forgejo && git checkout $VERSION"
|
||||
echo " CGO_ENABLED=1 TAGS='bindata sqlite sqlite_unlock_notify' make build"
|
||||
echo " scp gitea indri:~/.local/bin/forgejo-new"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
./forgejo-new --version
|
||||
|
||||
# Stop current service
|
||||
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist 2>/dev/null || true
|
||||
|
||||
# Swap binaries, keeping the previous one as forgejo-old for rollback
|
||||
mv forgejo forgejo-old 2>/dev/null || true
|
||||
mv forgejo-new forgejo
|
||||
|
||||
# Start new service
|
||||
launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
|
||||
|
||||
# Verify it's running
|
||||
sleep 5
|
||||
curl -sf http://localhost:3001/api/v1/version || exit 1
|
||||
|
||||
echo "Deploy successful!"
|
||||
./forgejo --version
|
||||
EOF
|
||||
```
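
The workflow keeps the previous binary as `forgejo-old` but never uses it. The swap and a matching rollback could be factored out roughly as follows. This is a sketch under assumptions: `promote_binary` and `rollback_binary` are hypothetical names, and restarting the LaunchAgent around them is left to the caller.

```bash
# Sketch only: promote forgejo-new to forgejo inside directory $1,
# keeping the previous binary (if any) as forgejo-old.
promote_binary() {
  local dir="$1"
  if [ ! -f "$dir/forgejo-new" ]; then
    echo "forgejo-new not found in $dir" >&2
    return 1
  fi
  if [ -f "$dir/forgejo" ]; then
    mv -f "$dir/forgejo" "$dir/forgejo-old"
  fi
  mv -f "$dir/forgejo-new" "$dir/forgejo"
}

# Sketch only: put forgejo-old back, e.g. after a failed health check.
rollback_binary() {
  local dir="$1"
  if [ ! -f "$dir/forgejo-old" ]; then
    echo "no forgejo-old to restore in $dir" >&2
    return 1
  fi
  mv -f "$dir/forgejo-old" "$dir/forgejo"
}
```

In the deploy script, a failing version check after the restart could call `rollback_binary ~/.local/bin` and reload the LaunchAgent instead of exiting with the broken binary in place.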
|
||||
|
||||
---
|
||||
|
||||
## Future: Full CI Build (Option D)
|
||||
|
||||
If we want full automation, consider running a native macOS runner on indri:
|
||||
|
||||
### Native Runner on Indri
|
||||
|
||||
```bash
|
||||
# Install forgejo-runner on indri via mise
|
||||
ssh indri 'mise use forgejo-runner'
|
||||
|
||||
# Register as a macOS runner
|
||||
ssh indri 'forgejo-runner register \
|
||||
--instance https://forge.tail8d86e.ts.net \
|
||||
--token "$TOKEN" \
|
||||
--name "indri-native" \
|
||||
--labels "macos-arm64:host" \
|
||||
--no-interactive'
|
||||
|
||||
# Create LaunchAgent for runner
|
||||
# (similar to other mcquack services)
|
||||
```
|
||||
|
||||
Then workflow uses:
|
||||
```yaml
|
||||
runs-on: macos-arm64
|
||||
```
|
||||
|
||||
This enables full native builds in CI. Document in a future phase if needed.
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
- [ ] Forgejo mirrored to forge
|
||||
- [ ] Mirror cloned to ~/code/3rd/forgejo
|
||||
- [ ] Build succeeds on gilbert
|
||||
- [ ] Binary is valid macOS ARM64 executable
|
||||
- [ ] Binary deployed to indri ~/.local/bin/
|
||||
- [ ] SSH deploy key created and stored in 1Password
|
||||
- [ ] Deploy key added to indri authorized_keys
|
||||
- [ ] (Optional) k8s SSH secret created
|
||||
- [ ] (Optional) Deploy workflow created
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Build Fails: Node.js Version
|
||||
|
||||
```
|
||||
error: engine "node" is incompatible
|
||||
```
|
||||
|
||||
Update Node.js: `mise use node@20`
|
||||
|
||||
### Build Fails: Go Version
|
||||
|
||||
```
|
||||
go: go.mod requires go >= 1.23
|
||||
```
|
||||
|
||||
Update Go: `mise use go@1.23`
|
||||
|
||||
### Binary Crashes on indri
|
||||
|
||||
Check if CGO was enabled:
|
||||
```bash
|
||||
# If built without CGO, DNS resolution may fail
|
||||
./forgejo --version # Should work
|
||||
./forgejo web # May fail to resolve Tailscale hostnames
|
||||
```
|
||||
|
||||
Rebuild with `CGO_ENABLED=1`.
|
||||
|
||||
### SSH Deploy Fails
|
||||
|
||||
Check runner has SSH access:
|
||||
```bash
|
||||
# Test from inside runner pod
|
||||
kubectl --context=minikube-indri -n forgejo-runner exec deployment/forgejo-runner -- \
|
||||
ssh -i /root/.ssh/id_ed25519 erichblume@indri.tail8d86e.ts.net 'echo ok'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Next Phase
|
||||
|
||||
Once Forgejo is building and deploying successfully, proceed to [Phase 4: Self-Deploy](P4_self_deploy.md) for the full mcquack transition.
|
||||
|
|
@ -1,409 +0,0 @@
|
|||
# Phase 4: Self-Deploy & Transition to mcquack
|
||||
|
||||
**Goal**: Complete the bootstrap - Forgejo deploys itself, transition from brew to mcquack LaunchAgent
|
||||
|
||||
**Status**: Planning
|
||||
|
||||
**Prerequisites**: [Phase 3](P3_mirror_forgejo.md) complete (Forgejo builds and deploys to indri)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This phase completes the bootstrap:
|
||||
1. First successful CI deploy creates the binary
|
||||
2. Transition from brew service to mcquack LaunchAgent
|
||||
3. Update ansible role to mcquack pattern
|
||||
4. Remove brew forgejo
|
||||
|
||||
After this phase, Forgejo builds and deploys itself on every tagged release.
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Prepare indri for mcquack
|
||||
|
||||
### 1.1 Create Directory Structure
|
||||
|
||||
```bash
|
||||
ssh indri << 'EOF'
|
||||
mkdir -p ~/.local/bin
|
||||
mkdir -p ~/.config/forgejo
|
||||
mkdir -p ~/Library/Logs
|
||||
EOF
|
||||
```
|
||||
|
||||
### 1.2 Prepare Data Directory
|
||||
|
||||
The existing data is at `/opt/homebrew/var/forgejo`. Either keep it there (simpler) or migrate it to `~/forgejo`.
|
||||
|
||||
**Option A: Keep existing path** (recommended for simplicity)
|
||||
- Data stays at `/opt/homebrew/var/forgejo`
|
||||
- Binary moves to `~/.local/bin/forgejo`
|
||||
|
||||
**Option B: Full migration**
|
||||
- Move data to `~/forgejo`
|
||||
- Requires updating app.ini paths
|
||||
|
||||
For this plan, we'll use Option A.
|
||||
|
||||
---
|
||||
|
||||
## Step 2: First CI Deploy
|
||||
|
||||
### 2.1 Trigger Build with Deploy
|
||||
|
||||
1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions
|
||||
2. Select "Build Forgejo" workflow
|
||||
3. Click "Run workflow"
|
||||
4. Set deploy=true
|
||||
5. Monitor the run
|
||||
|
||||
### 2.2 Verify Binary Deployed
|
||||
|
||||
```bash
|
||||
ssh indri 'ls -la ~/.local/bin/forgejo && ~/.local/bin/forgejo --version'
|
||||
```
|
||||
|
||||
At this point:
|
||||
- New binary is at `~/.local/bin/forgejo`
|
||||
- Brew forgejo is still running
|
||||
- LaunchAgent doesn't exist yet
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Create mcquack LaunchAgent
|
||||
|
||||
### 3.1 Create Plist Manually (One-Time Bootstrap)
|
||||
|
||||
```bash
|
||||
ssh indri << 'EOF'
|
||||
cat > ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist << 'PLIST'
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||
<plist version="1.0">
|
||||
<dict>
|
||||
<key>Label</key>
|
||||
<string>mcquack.eblume.forgejo</string>
|
||||
<key>ProgramArguments</key>
|
||||
<array>
|
||||
<string>/Users/erichblume/.local/bin/forgejo</string>
|
||||
<string>web</string>
|
||||
<string>--config</string>
|
||||
<string>/opt/homebrew/var/forgejo/custom/conf/app.ini</string>
|
||||
<string>--work-path</string>
|
||||
<string>/opt/homebrew/var/forgejo</string>
|
||||
</array>
|
||||
<key>RunAtLoad</key>
|
||||
<true/>
|
||||
<key>KeepAlive</key>
|
||||
<true/>
|
||||
<key>StandardOutPath</key>
|
||||
<string>/Users/erichblume/Library/Logs/mcquack.forgejo.out.log</string>
|
||||
<key>StandardErrorPath</key>
|
||||
<string>/Users/erichblume/Library/Logs/mcquack.forgejo.err.log</string>
|
||||
<key>EnvironmentVariables</key>
|
||||
<dict>
|
||||
<key>HOME</key>
|
||||
<string>/Users/erichblume</string>
|
||||
<key>USER</key>
|
||||
<string>erichblume</string>
|
||||
</dict>
|
||||
</dict>
|
||||
</plist>
|
||||
PLIST
|
||||
EOF
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Cutover from Brew to mcquack
|
||||
|
||||
### 4.1 Stop Brew Service
|
||||
|
||||
```bash
|
||||
ssh indri 'brew services stop forgejo'
|
||||
```
|
||||
|
||||
### 4.2 Start mcquack Service
|
||||
|
||||
```bash
|
||||
ssh indri 'launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist'
|
||||
```
|
||||
|
||||
### 4.3 Verify Service Running
|
||||
|
||||
```bash
|
||||
# Check process
|
||||
ssh indri 'launchctl list | grep forgejo'
|
||||
|
||||
# Check logs
|
||||
ssh indri 'tail -20 ~/Library/Logs/mcquack.forgejo.err.log'
|
||||
|
||||
# Check HTTP
|
||||
curl -s https://forge.tail8d86e.ts.net/api/v1/version
|
||||
```
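
Forgejo may need a few seconds after `launchctl load` before it answers HTTP, so a single check can fail spuriously. A small retry wrapper is one way to handle that. The sketch below is an assumption, not part of the plan: `retry` is a hypothetical helper that takes an attempt count, a delay in seconds, and the command to run.

```bash
# Hypothetical helper: run a command until it succeeds, up to $1
# attempts, sleeping $2 seconds between attempts.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    # Do not sleep after the final failed attempt.
    if [ "$n" -lt "$attempts" ]; then sleep "$delay"; fi
    n=$((n + 1))
  done
  return 1
}
```

With this in place, the HTTP check could be written as `retry 10 2 curl -sf https://forge.tail8d86e.ts.net/api/v1/version`.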
|
||||
|
||||
### 4.4 Verify Git Operations
|
||||
|
||||
```bash
|
||||
# SSH test
|
||||
ssh -T forgejo@forge.tail8d86e.ts.net
|
||||
|
||||
# Clone test
|
||||
git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/blumeops.git /tmp/test-clone
|
||||
rm -rf /tmp/test-clone
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Update Ansible Role
|
||||
|
||||
### 5.1 Rewrite forgejo Role
|
||||
|
||||
Replace `ansible/roles/forgejo/tasks/main.yml`:
|
||||
|
||||
```yaml
|
||||
---
|
||||
# Forgejo is built from source via CI and deployed automatically.
|
||||
# This role manages the configuration and LaunchAgent only.
|
||||
#
|
||||
# BINARY DEPLOYMENT:
|
||||
# The binary at ~/.local/bin/forgejo is deployed by Forgejo Actions CI.
|
||||
# If missing, trigger a build at:
|
||||
# https://forge.tail8d86e.ts.net/eblume/forgejo/actions
|
||||
#
|
||||
# CONFIGURATION:
|
||||
# app.ini at /opt/homebrew/var/forgejo/custom/conf/app.ini contains secrets
|
||||
# and is NOT managed by ansible. It is backed up by borgmatic.
|
||||
|
||||
- name: Verify forgejo binary exists
|
||||
ansible.builtin.stat:
|
||||
path: "{{ forgejo_binary }}"
|
||||
register: forgejo_binary_stat
|
||||
|
||||
- name: Fail if forgejo binary not found
|
||||
ansible.builtin.fail:
|
||||
msg: |
|
||||
Forgejo binary not found at {{ forgejo_binary }}.
|
||||
|
||||
The binary is deployed by Forgejo Actions CI. To build and deploy:
|
||||
1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions
|
||||
2. Select "Build Forgejo" workflow
|
||||
3. Click "Run workflow" with deploy=true
|
||||
|
||||
Alternatively, build manually on gilbert and scp to indri.
|
||||
when: not forgejo_binary_stat.stat.exists
|
||||
|
||||
- name: Check forgejo config exists
|
||||
ansible.builtin.stat:
|
||||
path: "{{ forgejo_config }}"
|
||||
register: forgejo_config_stat
|
||||
|
||||
- name: Fail if forgejo config is missing
|
||||
ansible.builtin.fail:
|
||||
msg: |
|
||||
Forgejo config not found at {{ forgejo_config }}
|
||||
This file contains secrets and is not managed by ansible.
|
||||
To restore from backup, run:
|
||||
borgmatic --config ~/.config/borgmatic/config.yaml extract --archive latest \
|
||||
--path {{ forgejo_config }}
|
||||
when: not forgejo_config_stat.stat.exists
|
||||
|
||||
- name: Deploy forgejo LaunchAgent plist
|
||||
ansible.builtin.template:
|
||||
src: forgejo.plist.j2
|
||||
dest: ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
|
||||
mode: '0644'
|
||||
notify: Restart forgejo
|
||||
|
||||
- name: Check if forgejo LaunchAgent is loaded
|
||||
ansible.builtin.command: launchctl list mcquack.eblume.forgejo
|
||||
register: forgejo_launchctl_check
|
||||
changed_when: false
|
||||
failed_when: false
|
||||
|
||||
- name: Load forgejo LaunchAgent if not loaded
|
||||
ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
|
||||
when: forgejo_launchctl_check.rc != 0
|
||||
changed_when: true
|
||||
failed_when: false
|
||||
```
|
||||
|
||||
### 5.2 Create defaults/main.yml
|
||||
|
||||
```yaml
|
||||
---
|
||||
# Forgejo binary and paths
|
||||
forgejo_binary: /Users/erichblume/.local/bin/forgejo
|
||||
forgejo_work_path: /opt/homebrew/var/forgejo
|
||||
forgejo_config: "{{ forgejo_work_path }}/custom/conf/app.ini"
|
||||
forgejo_log_dir: /Users/erichblume/Library/Logs
|
||||
|
||||
# HTTP and SSH ports (must match app.ini)
|
||||
forgejo_http_port: 3001
|
||||
forgejo_ssh_port: 2200
|
||||
```
|
||||
|
||||
### 5.3 Create templates/forgejo.plist.j2
|
||||
|
||||
```xml
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!-- {{ ansible_managed }} -->
|
||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||
<plist version="1.0">
|
||||
<dict>
|
||||
<key>Label</key>
|
||||
<string>mcquack.eblume.forgejo</string>
|
||||
<key>ProgramArguments</key>
|
||||
<array>
|
||||
<string>{{ forgejo_binary }}</string>
|
||||
<string>web</string>
|
||||
<string>--config</string>
|
||||
<string>{{ forgejo_config }}</string>
|
||||
<string>--work-path</string>
|
||||
<string>{{ forgejo_work_path }}</string>
|
||||
</array>
|
||||
<key>RunAtLoad</key>
|
||||
<true/>
|
||||
<key>KeepAlive</key>
|
||||
<true/>
|
||||
<key>StandardOutPath</key>
|
||||
<string>{{ forgejo_log_dir }}/mcquack.forgejo.out.log</string>
|
||||
<key>StandardErrorPath</key>
|
||||
<string>{{ forgejo_log_dir }}/mcquack.forgejo.err.log</string>
|
||||
<key>EnvironmentVariables</key>
|
||||
<dict>
|
||||
<key>HOME</key>
|
||||
<string>/Users/erichblume</string>
|
||||
<key>USER</key>
|
||||
<string>erichblume</string>
|
||||
</dict>
|
||||
</dict>
|
||||
</plist>
|
||||
```
|
||||
|
||||
### 5.4 Update handlers/main.yml
|
||||
|
||||
```yaml
|
||||
---
|
||||
- name: Restart forgejo
|
||||
ansible.builtin.shell: |
|
||||
launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist 2>/dev/null || true
|
||||
launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
|
||||
changed_when: true
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Update Alloy Log Collection
|
||||
|
||||
Update `ansible/roles/alloy/defaults/main.yml`:
|
||||
|
||||
Change forgejo log paths from brew to mcquack:
|
||||
```yaml
|
||||
alloy_brew_logs:
|
||||
# Remove forgejo from here
|
||||
- path: /opt/homebrew/var/log/tailscaled.log
|
||||
service: tailscale
|
||||
stream: stdout
|
||||
|
||||
alloy_mcquack_logs:
|
||||
# ... existing entries ...
|
||||
- path: /Users/erichblume/Library/Logs/mcquack.forgejo.out.log
|
||||
service: forgejo
|
||||
stream: stdout
|
||||
- path: /Users/erichblume/Library/Logs/mcquack.forgejo.err.log
|
||||
service: forgejo
|
||||
stream: stderr
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 7: Remove Brew Forgejo
|
||||
|
||||
### 7.1 Uninstall Brew Package
|
||||
|
||||
```bash
|
||||
ssh indri 'brew uninstall forgejo'
|
||||
```
|
||||
|
||||
### 7.2 Remove Old Logs
|
||||
|
||||
```bash
|
||||
ssh indri 'rm -f /opt/homebrew/var/log/forgejo.log'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 8: Run Ansible
|
||||
|
||||
```bash
|
||||
mise run provision-indri -- --tags forgejo,alloy
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Disaster Recovery
|
||||
|
||||
### If CI Deploy Breaks Forgejo
|
||||
|
||||
1. **Build manually on gilbert**:
|
||||
```bash
|
||||
cd ~/code/3rd/forgejo
|
||||
git pull
|
||||
mise use go node
|
||||
CGO_ENABLED=1 TAGS="bindata sqlite sqlite_unlock_notify" make build
|
||||
scp gitea indri:~/.local/bin/forgejo
|
||||
```
|
||||
|
||||
2. **Restart service**:
|
||||
```bash
|
||||
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist; launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist'
|
||||
```
|
||||
|
||||
3. **Verify**:
|
||||
```bash
|
||||
curl https://forge.tail8d86e.ts.net/api/v1/version
|
||||
```
|
||||
|
||||
### If Forgejo Won't Start
|
||||
|
||||
1. Check logs: `ssh indri 'tail -100 ~/Library/Logs/mcquack.forgejo.err.log'`
|
||||
2. Check binary: `ssh indri '~/.local/bin/forgejo --version'`
|
||||
3. Check config: `ssh indri 'cat /opt/homebrew/var/forgejo/custom/conf/app.ini | head -50'`
|
||||
4. Try running manually: `ssh indri '~/.local/bin/forgejo web --config /opt/homebrew/var/forgejo/custom/conf/app.ini --work-path /opt/homebrew/var/forgejo'`
|
||||
|
||||
### Switch ArgoCD to GitHub (Nuclear Option)
|
||||
|
||||
If Forgejo is down and you need to deploy fixes:
|
||||
|
||||
```bash
|
||||
argocd repo add https://github.com/eblume/blumeops.git --username eblume --password "$GITHUB_PAT"
|
||||
argocd app set apps --repo https://github.com/eblume/blumeops.git
|
||||
argocd app sync apps
|
||||
```
|
||||
|
||||
After recovery, switch back to Forgejo.
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
- [ ] CI deploy completed successfully
|
||||
- [ ] Binary at `~/.local/bin/forgejo`
|
||||
- [ ] mcquack LaunchAgent created
|
||||
- [ ] Brew service stopped
|
||||
- [ ] mcquack service started
|
||||
- [ ] HTTP works (`curl https://forge.tail8d86e.ts.net/api/v1/version`)
|
||||
- [ ] SSH works (`ssh -T forgejo@forge.tail8d86e.ts.net`)
|
||||
- [ ] Git clone/push works
|
||||
- [ ] Ansible role updated
|
||||
- [ ] Alloy logs updated
|
||||
- [ ] Brew package uninstalled
|
||||
- [ ] `mise run provision-indri` succeeds
|
||||
|
||||
---
|
||||
|
||||
## Next Phase
|
||||
|
||||
After bootstrap is complete, proceed to [Phase 5: Container Builds](P5_container_builds.md) to set up container image building for ArgoCD.
|
||||
|
|
@ -1,505 +0,0 @@
|
|||
# Phase 5: Container Image Builds
|
||||
|
||||
**Goal**: Set up CI workflows to build custom container images and push to zot registry
|
||||
|
||||
**Status**: Planning
|
||||
|
||||
**Prerequisites**: [Phase 4](P4_self_deploy.md) complete (Forgejo self-deploying, Actions working)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
With Forgejo Actions operational (including custom runner from P2), we can now build container images for:
|
||||
- Custom devpi with pre-installed plugins
|
||||
- Any other custom images needed for k8s services
|
||||
- Release artifacts for Python packages
|
||||
|
||||
**Note**: The custom runner image build is covered in [Phase 2](P2_mirror_and_build.md). This phase focuses on application container builds.
|
||||
|
||||
---
|
||||
|
||||
## Use Case 1: devpi Custom Image
|
||||
|
||||
### Current State
|
||||
|
||||
devpi runs from `registry.tail8d86e.ts.net/blumeops/devpi:latest`, built manually:
|
||||
- Base image: python
|
||||
- Adds: devpi-server, devpi-web
|
||||
- Startup script for auto-initialization
|
||||
|
||||
### Goal
|
||||
|
||||
Automate builds triggered by:
|
||||
- Push to devpi repo on forge
|
||||
- Manual workflow dispatch
|
||||
- Optionally: upstream devpi release (via schedule check)
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Create Workflow for devpi
|
||||
|
||||
### 1.1 Ensure devpi Repo Has Dockerfile
|
||||
|
||||
The Dockerfile already exists at `argocd/manifests/devpi/Dockerfile`. We'll create a workflow in the blumeops repo that builds it.
|
||||
|
||||
### 1.2 Create Build Workflow
|
||||
|
||||
Create `.forgejo/workflows/build-devpi.yml` in blumeops repo:
|
||||
|
||||
```yaml
|
||||
name: Build devpi Image
|
||||
|
||||
on:
|
||||
push:
|
||||
paths:
|
||||
- 'argocd/manifests/devpi/Dockerfile'
|
||||
- 'argocd/manifests/devpi/start.sh'
|
||||
- '.forgejo/workflows/build-devpi.yml'
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
tag:
|
||||
description: 'Image tag (default: latest)'
|
||||
required: false
|
||||
default: 'latest'
|
||||
|
||||
env:
|
||||
REGISTRY: registry.tail8d86e.ts.net
|
||||
IMAGE_NAME: blumeops/devpi
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
|
||||
- name: Determine tag
|
||||
id: tag
|
||||
run: |
|
||||
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
|
||||
TAG="${{ github.event.inputs.tag }}"
|
||||
else
|
||||
TAG="latest"
|
||||
fi
|
||||
echo "tag=$TAG" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Build image
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: argocd/manifests/devpi
|
||||
file: argocd/manifests/devpi/Dockerfile
|
||||
platforms: linux/arm64
|
||||
load: true
|
||||
tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}
|
||||
|
||||
- name: Push to registry
|
||||
run: |
|
||||
# Zot has no auth, just push
|
||||
docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}
|
||||
|
||||
- name: Verify push
|
||||
run: |
|
||||
# Check image exists in registry
|
||||
curl -sf "https://${{ env.REGISTRY }}/v2/${{ env.IMAGE_NAME }}/tags/list" | jq .
|
||||
```
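
The "Determine tag" step can be tested locally if its logic lives in a function. The following sketch mirrors that step; `resolve_tag` is a hypothetical name, and falling back to `latest` when a manual run supplies an empty tag is an added assumption that the workflow itself does not make.

```bash
# Sketch of the "Determine tag" step as a function: use the manual input
# only for workflow_dispatch runs, and fall back to "latest" otherwise.
# Assumption: an empty manual input also falls back to "latest".
resolve_tag() {
  local event_name="$1" input_tag="${2:-}"
  if [ "$event_name" = "workflow_dispatch" ] && [ -n "$input_tag" ]; then
    printf '%s\n' "$input_tag"
  else
    printf '%s\n' "latest"
  fi
}
```

Inside the workflow, the step would then reduce to writing `tag=$(resolve_tag ...)` to `$GITHUB_OUTPUT`, with the event name and input passed as arguments.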
|
||||
|
||||
### 1.3 Runner Needs Registry Access
|
||||
|
||||
The runner needs to reach `registry.tail8d86e.ts.net`. This should work via Tailscale egress (same as Forgejo access).
|
||||
|
||||
If not, add egress for registry in `argocd/manifests/tailscale-operator/`:
|
||||
```yaml
|
||||
apiVersion: tailscale.com/v1alpha1
|
||||
kind: Connector
|
||||
metadata:
|
||||
name: egress-registry
|
||||
namespace: tailscale-operator
|
||||
spec:
|
||||
hostname: egress-registry
|
||||
subnetRouter:
|
||||
advertiseRoutes:
|
||||
- <registry-tailscale-ip>/32  # advertiseRoutes takes CIDR ranges, not hostnames
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Test Build Workflow
|
||||
|
||||
### 2.1 Push and Trigger
|
||||
|
||||
```bash
|
||||
# Make a small change to trigger
|
||||
echo "# Build $(date)" >> argocd/manifests/devpi/Dockerfile
|
||||
git add argocd/manifests/devpi/Dockerfile
|
||||
git commit -m "Trigger devpi image rebuild"
|
||||
git push
|
||||
```
|
||||
|
||||
### 2.2 Monitor Build
|
||||
|
||||
1. Go to https://forge.tail8d86e.ts.net/eblume/blumeops/actions
|
||||
2. Watch "Build devpi Image" workflow
|
||||
3. Verify success
|
||||
|
||||
### 2.3 Verify Image in Registry
|
||||
|
||||
```bash
|
||||
curl -s https://registry.tail8d86e.ts.net/v2/blumeops/devpi/tags/list | jq .
|
||||
```
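
Listing the tags shows what is in the registry but does not fail when the expected tag is missing. A quick check could look like the sketch below. `has_tag` is a hypothetical helper, and it is a plain substring match on the JSON text, not a JSON parser, so treat it as a smoke test only.

```bash
# Hypothetical helper: succeed when the tags/list JSON in $1 mentions
# the tag $2 as a quoted string. Substring match only, not a JSON parser.
has_tag() {
  case "$1" in
    *"\"$2\""*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_tag "$(curl -s https://registry.tail8d86e.ts.net/v2/blumeops/devpi/tags/list)" latest` would return non-zero if `latest` was never pushed.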
|
||||
|
||||
### 2.4 Restart devpi to Use New Image
|
||||
|
||||
```bash
|
||||
kubectl --context=minikube-indri -n devpi rollout restart statefulset/devpi
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Reusable Container Build Workflow
|
||||
|
||||
### 3.1 Create Reusable Workflow
|
||||
|
||||
Create `.forgejo/workflows/build-container.yml`:
|
||||
|
||||
```yaml
|
||||
name: Build Container Image
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
context:
|
||||
description: 'Build context path'
|
||||
required: true
|
||||
type: string
|
||||
dockerfile:
|
||||
description: 'Dockerfile path (relative to context)'
|
||||
required: false
|
||||
type: string
|
||||
default: 'Dockerfile'
|
||||
image_name:
|
||||
description: 'Image name (without registry)'
|
||||
required: true
|
||||
type: string
|
||||
tag:
|
||||
description: 'Image tag'
|
||||
required: false
|
||||
type: string
|
||||
default: 'latest'
|
||||
platforms:
|
||||
description: 'Target platforms'
|
||||
required: false
|
||||
type: string
|
||||
default: 'linux/arm64'
|
||||
|
||||
env:
|
||||
REGISTRY: registry.tail8d86e.ts.net
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
|
||||
- name: Build and push
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ${{ inputs.context }}
|
||||
file: ${{ inputs.context }}/${{ inputs.dockerfile }}
|
||||
platforms: ${{ inputs.platforms }}
|
||||
push: true
|
||||
tags: ${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}
|
||||
|
||||
- name: Verify push
|
||||
run: |
|
||||
curl -sf "https://${{ env.REGISTRY }}/v2/${{ inputs.image_name }}/tags/list" | jq .
|
||||
```
|
||||
|
||||
### 3.2 Use in devpi Workflow
|
||||
|
||||
Simplify `.forgejo/workflows/build-devpi.yml`:
|
||||
|
||||
```yaml
|
||||
name: Build devpi Image
|
||||
|
||||
on:
|
||||
push:
|
||||
paths:
|
||||
- 'argocd/manifests/devpi/**'
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
build:
|
||||
uses: ./.forgejo/workflows/build-container.yml
|
||||
with:
|
||||
context: argocd/manifests/devpi
|
||||
image_name: blumeops/devpi
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Python Package Builds (Optional)
|
||||
|
||||
### 4.1 Use Case
|
||||
|
||||
Build Python packages from forge repos and publish to devpi.
|
||||
|
||||
Example: `mcquack` package (LaunchAgent management library)
|
||||
|
||||
### 4.2 Create Python Build Workflow
|
||||
|
||||
Create `.forgejo/workflows/build-python.yml`:
|
||||
|
||||
```yaml
|
||||
name: Build Python Package
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
package_path:
|
||||
description: 'Path to package (contains pyproject.toml)'
|
||||
required: false
|
||||
type: string
|
||||
default: '.'
|
||||
python_version:
|
||||
description: 'Python version'
|
||||
required: false
|
||||
type: string
|
||||
default: '3.12'
|
||||
publish:
|
||||
description: 'Publish to devpi'
|
||||
required: false
|
||||
type: boolean
|
||||
default: false
|
||||
secrets:
|
||||
DEVPI_PASSWORD:
|
||||
required: false
|
||||
|
||||
env:
|
||||
DEVPI_URL: https://pypi.tail8d86e.ts.net
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Setup Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: ${{ inputs.python_version }}
|
||||
|
||||
- name: Install uv
|
||||
run: pip install uv
|
||||
|
||||
- name: Build package
|
||||
run: |
|
||||
cd ${{ inputs.package_path }}
|
||||
uv build
|
||||
|
||||
- name: Upload artifact
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.package_path }}/dist/
|
||||
|
||||
- name: Publish to devpi
|
||||
if: inputs.publish
|
||||
run: |
|
||||
cd ${{ inputs.package_path }}
|
||||
uv publish \
|
||||
--publish-url ${{ env.DEVPI_URL }}/eblume/dev/ \
|
||||
--username eblume \
|
||||
--password "${{ secrets.DEVPI_PASSWORD }}"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Scheduled Builds (Cron)
|
||||
|
||||
### 5.1 Weekly Rebuild
|
||||
|
||||
Keep images fresh with weekly rebuilds:
|
||||
|
||||
```yaml
|
||||
name: Weekly Image Rebuilds
|
||||
|
||||
on:
|
||||
schedule:
|
||||
# Every Sunday at 3 AM UTC
|
||||
- cron: '0 3 * * 0'
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
devpi:
|
||||
uses: ./.forgejo/workflows/build-container.yml
|
||||
with:
|
||||
context: argocd/manifests/devpi
|
||||
image_name: blumeops/devpi
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Future Improvements
|
||||
|
||||
### Multi-Arch Builds
|
||||
|
||||
For images that need both ARM64 and AMD64:
|
||||
|
||||
```yaml
|
||||
platforms: linux/arm64,linux/amd64
|
||||
```
|
||||
|
||||
Requires QEMU emulation to be registered in the runner (for example via `docker/setup-qemu-action`); buildx can then build for both platforms.
|
||||
|
||||
### Build Caching
|
||||
|
||||
Use GitHub/Forgejo cache actions:
|
||||
|
||||
```yaml
|
||||
- name: Cache Docker layers
|
||||
uses: actions/cache@v4
|
||||
with:
|
||||
path: /tmp/.buildx-cache
|
||||
key: ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}
|
||||
```
|
||||
|
||||
### Security Scanning
|
||||
|
||||
Add Trivy or similar:
|
||||
|
||||
```yaml
|
||||
- name: Run Trivy vulnerability scanner
|
||||
uses: aquasecurity/trivy-action@master
|
||||
with:
|
||||
image-ref: '${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Runner Observability (Logging & Metrics)
|
||||
|
||||
### 6.1 Problem
|
||||
|
||||
The forgejo-runner pod generates logs and metrics that should be collected for:
|
||||
- Debugging failed workflow runs
|
||||
- Monitoring runner health and capacity
|
||||
- Alerting on runner failures
|
||||
|
||||
### 6.2 Log Collection via Alloy
|
||||
|
||||
The forgejo-runner namespace needs to be included in Alloy's k8s log collection. Alloy is already configured to scrape logs from k8s pods - verify the runner namespace is included.
|
||||
|
||||
Check current Alloy config:
|
||||
```bash
|
||||
ssh indri 'cat ~/.config/alloy/config.alloy | grep -A20 discovery.kubernetes'
|
||||
```
|
||||
|
||||
If using namespace filtering, ensure `forgejo-runner` is included.
|
||||
|
||||
### 6.3 Metrics Collection
|
||||
|
||||
The forgejo-runner exposes Prometheus metrics. Add a ServiceMonitor or configure Alloy to scrape:
|
||||
|
||||
**Option A: ServiceMonitor (if using Prometheus Operator)**
|
||||
|
||||
Create `argocd/manifests/forgejo-runner/servicemonitor.yaml`:
|
||||
```yaml
|
||||
apiVersion: monitoring.coreos.com/v1
|
||||
kind: ServiceMonitor
|
||||
metadata:
|
||||
name: forgejo-runner
|
||||
namespace: forgejo-runner
|
||||
spec:
|
||||
selector:
|
||||
matchLabels:
|
||||
app: forgejo-runner
|
||||
endpoints:
|
||||
- port: metrics
|
||||
interval: 30s
|
||||
```
|
||||
|
||||
**Option B: Alloy scrape config**
|
||||
|
||||
Add to Alloy's k8s scrape config to discover the runner pod's metrics endpoint.
|
||||
|
||||
### 6.4 Create Runner Service for Metrics
|
||||
|
||||
Add `argocd/manifests/forgejo-runner/service.yaml`:
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: forgejo-runner-metrics
|
||||
namespace: forgejo-runner
|
||||
labels:
|
||||
app: forgejo-runner
|
||||
spec:
|
||||
selector:
|
||||
app: forgejo-runner
|
||||
ports:
|
||||
- name: metrics
|
||||
port: 8080
|
||||
targetPort: 8080
|
||||
```
|
||||
|
||||
Update kustomization.yaml to include the service.
|
||||
|
||||
### 6.5 Grafana Dashboard
|
||||
|
||||
Consider creating a dashboard for:
|
||||
- Runner status (online/offline)
|
||||
- Job queue depth
|
||||
- Job execution time
|
||||
- Success/failure rates
|
||||
|
||||
### 6.6 Verification
|
||||
|
||||
```bash
|
||||
# Check runner logs are appearing in Loki
|
||||
# Go to Grafana → Explore → Loki
|
||||
# Query: {namespace="forgejo-runner"}
|
||||
|
||||
# Check metrics are being scraped
|
||||
# Go to Grafana → Explore → Prometheus
|
||||
# Query: forgejo_runner_*
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
- [ ] devpi build workflow created
|
||||
- [ ] devpi image builds successfully
|
||||
- [ ] Image pushed to zot registry
|
||||
- [ ] devpi pod uses new image
|
||||
- [ ] Reusable container workflow created
|
||||
- [ ] (Optional) Python build workflow created
|
||||
- [ ] (Optional) Scheduled builds configured
|
||||
- [ ] Runner logs visible in Loki
|
||||
- [ ] Runner metrics scraped by Prometheus/Alloy
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
With this phase complete, we have:
|
||||
1. **Forgejo Actions** running with k8s runner
|
||||
2. **Forgejo self-deploys** from CI on tagged releases
|
||||
3. **Container images** built automatically on push
|
||||
4. Infrastructure for Python package builds
|
||||
5. **Runner observability** with logs in Loki and metrics in Prometheus
|
||||
|
||||
The CI/CD bootstrap is complete. Future work:
|
||||
- Add more container builds as needed
|
||||
- Add Python package publishing for internal tools
|
||||
- Consider adding a macOS runner on indri for native builds
|
||||
- Create Grafana dashboards for CI/CD monitoring
|
||||
|
|
@ -1,79 +0,0 @@
|
|||
# Blumeops Minikube Migration Plan
|
||||
|
||||
**Status**: Completed (2026-01-23)
|
||||
|
||||
This plan detailed the phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster. The migration is now complete for all services that will be migrated.
|
||||
|
||||
## Final Status
|
||||
|
||||
| Phase | Name | Status | Notes |
|-------|------|--------|-------|
| 0 | [Foundation](P0_foundation.complete.md) | ✅ Complete | Container registry (zot) + minikube cluster |
| 1 | [K8s Infrastructure](P1_k8s_infrastructure.complete.md) | ✅ Complete | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
| 2 | [Grafana](P2_grafana.complete.md) | ✅ Complete | Migrated Grafana via ArgoCD |
| 3 | [PostgreSQL](P3_postgresql.complete.md) | ✅ Complete | Data migration to k8s PostgreSQL |
| 4 | [Miniflux](P4_miniflux.complete.md) | ✅ Complete | Migrated Miniflux via ArgoCD |
| 5 | [devpi](P5_devpi.complete.md) | ✅ Complete | Migrated devpi via ArgoCD |
| 5.1 | [Docker Migration](P5.1_docker_migration.complete.md) | ✅ Complete | Switched minikube to docker driver (not QEMU2) |
| 6 | [Kiwix](P6_kiwix.complete.md) | ✅ Complete | Migrated Kiwix + Transmission via ArgoCD |
| 7 | [Forgejo](P7_forgejo.md) | ⏭️ Won't Do | Forgejo stays on indri - see [CI/CD Bootstrap](../../ci-cd-bootstrap/) |
| 8 | [Woodpecker](P8_woodpecker.md) | ⏭️ Won't Do | Replaced by Forgejo Actions - see [CI/CD Bootstrap](../../ci-cd-bootstrap/) |
| 9 | [Cleanup](P9_cleanup.md) | ⏭️ Won't Do | Observability cleanup done separately (2026-01-22) |
|
||||
|
||||
## What Was Migrated to K8s
|
||||
|
||||
| Service | Status | Notes |
|---------|--------|-------|
| Grafana | ✅ In k8s | Helm chart via ArgoCD |
| PostgreSQL | ✅ In k8s | CloudNativePG operator |
| Miniflux | ✅ In k8s | Using k8s PostgreSQL |
| devpi | ✅ In k8s | Custom container image |
| Kiwix | ✅ In k8s | NFS mount from sifaka |
| Transmission | ✅ In k8s | NFS mount from sifaka |
| Prometheus | ✅ In k8s | Migrated 2026-01-22 |
| Loki | ✅ In k8s | Migrated 2026-01-22 |
| Alloy (k8s) | ✅ In k8s | DaemonSet for pod logs |
| TeslaMate | ✅ In k8s | Added 2026-01-23 |
|
||||
|
||||
## What Stays on Indri
|
||||
|
||||
| Service | Reason |
|---------|--------|
| **Forgejo** | Critical infrastructure, avoids circular dependency with ArgoCD |
| **Zot Registry** | K8s needs images to start - must be outside k8s |
| **Alloy (host)** | Collects host-level metrics and logs |
| **Borgmatic** | Backup system must survive k8s failures |
| **Plex** | Uses own NAT traversal, not Tailscale |
|
||||
|
||||
## Architecture Decisions Made
|
||||
|
||||
### Minikube Driver: Docker (not QEMU2/Podman)
|
||||
- Original plan called for QEMU2, but docker driver proved simpler
|
||||
- NFS mounts work via Docker NAT through indri's LAN IP
|
||||
- API server accessible via Tailscale TCP passthrough
|
||||
|
||||
### Forgejo: Stays on Indri
|
||||
- Original P7 planned k8s migration
|
||||
- Decision changed: Forgejo is critical infrastructure
|
||||
- Will be built from source via Forgejo Actions CI
|
||||
- See [CI/CD Bootstrap Plan](../../ci-cd-bootstrap/) for details
|
||||
|
||||
### CI/CD: Forgejo Actions (not Woodpecker)
|
||||
- Original P8 planned Woodpecker deployment
|
||||
- Decision changed: Use Forgejo's native Actions instead
|
||||
- Simpler (one less system), GitHub Actions compatible
|
||||
- See [CI/CD Bootstrap Plan](../../ci-cd-bootstrap/) for details
|
||||
|
||||
### Observability: Migrated to K8s
|
||||
- Original plan kept Prometheus/Loki on indri
|
||||
- Changed: Migrated both to k8s (2026-01-22)
|
||||
- Alloy on indri pushes to k8s endpoints
|
||||
- Alloy DaemonSet in k8s collects pod logs
|
||||
|
||||
## Lessons Learned
|
||||
|
||||
1. **Docker driver is simpler than QEMU2** - Direct NFS mounts work, no VM complexity
|
||||
2. **Tailscale operator works well** - Easy service exposure with automatic TLS
|
||||
3. **CloudNativePG is production-ready** - Good operator, easy backups
|
||||
4. **Keep critical infra outside k8s** - Forgejo and zot must survive k8s failures
|
||||
5. **CGO matters on macOS** - Alloy needed `CGO_ENABLED=1` for Tailscale DNS resolution
|
||||
File diff suppressed because it is too large
|
|
@ -1,657 +0,0 @@
|
|||
# Phase 1: Kubernetes Infrastructure
|
||||
|
||||
**Goal**: Tailscale operator, ArgoCD, CloudNativePG operator, PostgreSQL cluster
|
||||
|
||||
**Status**: In Progress
|
||||
|
||||
**Prerequisites**: [Phase 0](P0_foundation.complete.md) complete
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Phase 1 establishes the k8s control plane infrastructure:
|
||||
1. **Tailscale operator** - Exposes services on the tailnet
|
||||
2. **ArgoCD** - GitOps continuous delivery
|
||||
3. **CloudNativePG** - PostgreSQL operator
|
||||
4. **PostgreSQL cluster** - Database for future app migrations
|
||||
|
||||
The deployment follows a bootstrap pattern:
|
||||
- First two components deployed via `kubectl apply -k` (no GitOps yet)
|
||||
- ArgoCD then takes over management of all components including itself
|
||||
- All subsequent deployments use ArgoCD
|
||||
|
||||
---
|
||||
|
||||
## Kubernetes Tags Overview
|
||||
|
||||
| Tag | Purpose | Applied To |
|-----|---------|------------|
| `tag:k8s-api` | Controls access to the K8s API server | indri (Phase 0.14) |
| `tag:k8s-operator` | Identifies the Tailscale K8s Operator | OAuth client for operator |
| `tag:k8s` | Default tag for operator-managed resources | Proxies, services, ingresses created by operator |
|
||||
|
||||
**Ownership chain**: `tag:k8s-operator` must own `tag:k8s` so the operator can assign that tag to devices it creates.
|
||||
|
||||
---
|
||||
|
||||
## PostgreSQL Migration Strategy
|
||||
|
||||
The k8s PostgreSQL cluster will eventually replace the brew PostgreSQL on indri.
|
||||
|
||||
| Phase | `pg.tail8d86e.ts.net` points to | Miniflux connects to |
|-------|--------------------------------|---------------------|
| Current | brew PostgreSQL (indri) | `pg.tail8d86e.ts.net` |
| Phase 1 | brew PostgreSQL (indri) | `pg.tail8d86e.ts.net` (no change) |
| Phase 4 | brew PostgreSQL (indri) | k8s PG (internal, after miniflux migrates to k8s) |
| Post-Phase 4 | k8s PostgreSQL | k8s PG (internal) |
| Cleanup | k8s PostgreSQL | k8s PG (internal) |
|
||||
|
||||
This allows zero-downtime migration - the Tailscale service switches after apps are migrated.
|
||||
|
||||
---
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Update Pulumi ACLs for k8s workloads ✓
|
||||
|
||||
**Status**: Complete
|
||||
|
||||
Added to `pulumi/policy.hujson`:
|
||||
- `tag:k8s-operator` - for the operator OAuth client
|
||||
- `tag:k8s` - for operator-managed resources (owned by `tag:k8s-operator`)
|
||||
- Grant for `tag:k8s` → `tag:registry` access
|
||||
|
||||
---
|
||||
|
||||
### 2. Create Tailscale OAuth client ✓
|
||||
|
||||
**Status**: Complete
|
||||
|
||||
OAuth client stored in 1Password (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `2it22lavwgbxdskoaxanej354q`)
|
||||
|
||||
**Configuration used:**
|
||||
- Tags: `tag:k8s-operator`
|
||||
- Devices write scope tag: `tag:k8s`
|
||||
- Scopes: Devices Core (R/W), Auth Keys (R/W), Services (Write)
|
||||
|
||||
---
|
||||
|
||||
### 3. Deploy Tailscale Kubernetes Operator (Bootstrap)
|
||||
|
||||
Deploy via `kubectl apply -k` - will be migrated to ArgoCD management in Step 5.
|
||||
|
||||
**Setup manifests directory:**
|
||||
```bash
|
||||
mkdir -p argocd/manifests/tailscale-operator/crds  # crds/ must exist before curl -o writes into it
|
||||
cd argocd/manifests/tailscale-operator
|
||||
|
||||
# Download static manifest from Tailscale repo
|
||||
curl -sL https://raw.githubusercontent.com/tailscale/tailscale/main/cmd/k8s-operator/deploy/manifests/operator.yaml -o operator.yaml
|
||||
|
||||
# Download CRDs
|
||||
curl -sL https://raw.githubusercontent.com/tailscale/tailscale/main/cmd/k8s-operator/deploy/crds/tailscale.com_connectors.yaml -o crds/connectors.yaml
|
||||
curl -sL https://raw.githubusercontent.com/tailscale/tailscale/main/cmd/k8s-operator/deploy/crds/tailscale.com_proxyclasses.yaml -o crds/proxyclasses.yaml
|
||||
# ... (other CRDs as needed)
|
||||
```
|
||||
|
||||
**Create kustomization.yaml:**
|
||||
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: tailscale-system
resources:
  - operator.yaml
secretGenerator:
  - name: operator-oauth
    namespace: tailscale-system
    literals:
      - client_id=PLACEHOLDER
      - client_secret=PLACEHOLDER
generatorOptions:
  disableNameSuffixHash: true
```
|
||||
|
||||
**Deploy:**
|
||||
```bash
|
||||
# Get credentials from 1Password and create secret manually (kustomize secretGenerator is for reference)
|
||||
CLIENT_ID=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 2it22lavwgbxdskoaxanej354q --fields client-id --reveal)
|
||||
CLIENT_SECRET=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 2it22lavwgbxdskoaxanej354q --fields client-secret --reveal)
|
||||
|
||||
kubectl create namespace tailscale-system
|
||||
kubectl create secret generic operator-oauth \
  --namespace tailscale-system \
  --from-literal=client_id="$CLIENT_ID" \
  --from-literal=client_secret="$CLIENT_SECRET"
|
||||
|
||||
# Apply operator manifests
|
||||
kubectl apply -k argocd/manifests/tailscale-operator/
|
||||
```
|
||||
|
||||
**Verification:**
|
||||
```bash
|
||||
kubectl get pods -n tailscale-system
|
||||
# Expected: operator pod Running
|
||||
|
||||
kubectl logs -n tailscale-system -l app.kubernetes.io/name=tailscale-operator
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Deploy ArgoCD
|
||||
|
||||
Deploy ArgoCD and expose via Tailscale as `argocd.tail8d86e.ts.net`.
|
||||
|
||||
**Prerequisites:**
|
||||
- Add `tag:argocd` to Pulumi ACLs
|
||||
- Create Tailscale service `argocd` in admin console
|
||||
|
||||
**Setup manifests:**
|
||||
```bash
|
||||
mkdir -p argocd/manifests/argocd
|
||||
|
||||
# Download ArgoCD install manifest
|
||||
curl -sL https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml -o argocd/manifests/argocd/install.yaml
|
||||
```
|
||||
|
||||
**Create kustomization.yaml:**
|
||||
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
resources:
  - install.yaml
  - service-tailscale.yaml  # LoadBalancer for Tailscale exposure
```
|
||||
|
||||
**Create service-tailscale.yaml:**
|
||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-server-tailscale
  namespace: argocd
  annotations:
    tailscale.com/hostname: "argocd"
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale
  selector:
    app.kubernetes.io/name: argocd-server
  ports:
    - name: https
      port: 443
      targetPort: 8080
```
|
||||
|
||||
**Deploy:**
|
||||
```bash
|
||||
kubectl create namespace argocd
|
||||
kubectl apply -k argocd/manifests/argocd/
|
||||
```
|
||||
|
||||
**Get initial admin password:**
|
||||
```bash
|
||||
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
|
||||
```
|
||||
|
||||
**Verification:**
|
||||
- https://argocd.tail8d86e.ts.net loads
|
||||
- Can login with admin / <initial-password>
|
||||
|
||||
**Post-setup:**
|
||||
1. Change admin password, store in 1Password
|
||||
2. Configure git repo connection to `github.com/eblume/blumeops` (public, no auth needed)
|
||||
- Note: Using GitHub mirror since ArgoCD can't easily reach forge without additional networking
|
||||
|
||||
---
|
||||
|
||||
### 5. Migrate Tailscale Operator to ArgoCD
|
||||
|
||||
Create ArgoCD Application to manage the Tailscale operator.
|
||||
|
||||
**Create argocd/apps/tailscale-operator.yaml:**
|
||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tailscale-operator
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/tailscale-operator
  destination:
    server: https://kubernetes.default.svc
    namespace: tailscale-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
|
||||
|
||||
**Apply:**
|
||||
```bash
|
||||
kubectl apply -f argocd/apps/tailscale-operator.yaml
|
||||
```
|
||||
|
||||
**Note on secrets:** The OAuth secret was created manually in Step 3. For GitOps, consider:
|
||||
- Sealed Secrets
|
||||
- External Secrets Operator
|
||||
- SOPS
|
||||
|
||||
For now, the secret remains manually managed outside of ArgoCD.
|
||||
|
||||
---
|
||||
|
||||
### 6. Deploy CloudNativePG via ArgoCD
|
||||
|
||||
**Setup manifests:**
|
||||
```bash
|
||||
mkdir -p argocd/manifests/cloudnative-pg
|
||||
|
||||
# Download CNPG operator manifest
|
||||
curl -sL https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml -o argocd/manifests/cloudnative-pg/operator.yaml
|
||||
```
|
||||
|
||||
**Create kustomization.yaml:**
|
||||
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - operator.yaml
```
|
||||
|
||||
**Create ArgoCD Application (argocd/apps/cloudnative-pg.yaml):**
|
||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudnative-pg
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/cloudnative-pg
  destination:
    server: https://kubernetes.default.svc
    namespace: cnpg-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
|
||||
|
||||
**Apply:**
|
||||
```bash
|
||||
kubectl apply -f argocd/apps/cloudnative-pg.yaml
|
||||
```
|
||||
|
||||
**Verification:**
|
||||
```bash
|
||||
kubectl get pods -n cnpg-system
|
||||
# Expected: cnpg-controller-manager Running
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 7. Create PostgreSQL Cluster via ArgoCD
|
||||
|
||||
Create the database cluster. **Not exposed via Tailscale yet** - internal only until apps migrate.
|
||||
|
||||
**Create argocd/manifests/databases/blumeops-pg.yaml:**
|
||||
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: blumeops-pg
  namespace: databases
spec:
  instances: 1
  storage:
    size: 10Gi
    storageClass: standard
  monitoring:
    enablePodMonitor: true
  bootstrap:
    initdb:
      database: miniflux
      owner: miniflux
```
|
||||
|
||||
**Create kustomization.yaml:**
|
||||
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: databases
resources:
  - blumeops-pg.yaml
```
|
||||
|
||||
**Create ArgoCD Application (argocd/apps/blumeops-pg.yaml):**
|
||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: blumeops-pg
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/databases
  destination:
    server: https://kubernetes.default.svc
    namespace: databases
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
|
||||
|
||||
**Apply:**
|
||||
```bash
|
||||
kubectl apply -f argocd/apps/blumeops-pg.yaml
|
||||
```
|
||||
|
||||
**Verification:**
|
||||
```bash
|
||||
kubectl get cluster -n databases
|
||||
# Expected: blumeops-pg with STATUS "Cluster in healthy state"
|
||||
|
||||
kubectl get pods -n databases
|
||||
# Expected: blumeops-pg-1 Running
|
||||
|
||||
# Get connection secret
|
||||
kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 8. Create App-of-Apps Root Application
|
||||
|
||||
Once all components are deployed, create a root application to manage all apps.
|
||||
|
||||
**Create argocd/apps/root.yaml:**
|
||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/eblume/blumeops.git
    targetRevision: main
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
|
||||
|
||||
**Apply:**
|
||||
```bash
|
||||
kubectl apply -f argocd/apps/root.yaml
|
||||
```
|
||||
|
||||
ArgoCD now manages every Application defined under `argocd/apps/` (including `root` itself) via the app-of-apps pattern.
|
||||
|
||||
---
|
||||
|
||||
## New Files Summary
|
||||
|
||||
```
argocd/
  apps/
    root.yaml                    # App-of-apps root
    tailscale-operator.yaml      # Tailscale operator app
    cloudnative-pg.yaml          # CNPG operator app
    blumeops-pg.yaml             # PostgreSQL cluster app
  manifests/
    tailscale-operator/
      kustomization.yaml
      operator.yaml
    argocd/
      kustomization.yaml
      install.yaml
      service-tailscale.yaml
    cloudnative-pg/
      kustomization.yaml
      operator.yaml
    databases/
      kustomization.yaml
      blumeops-pg.yaml
```
|
||||
|
||||
---
|
||||
|
||||
## Pulumi ACL Updates Required
|
||||
|
||||
Add to `pulumi/policy.hujson`:
|
||||
```hujson
|
||||
"tag:argocd": ["autogroup:admin", "tag:blumeops"],
|
||||
```
|
||||
|
||||
Add to Erich's test accept list:
|
||||
```hujson
|
||||
"accept": [..., "tag:argocd:443"],
|
||||
```
|
||||
|
||||
Add to Allison's deny list:
|
||||
```hujson
|
||||
"deny": [..., "tag:argocd:443"],
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
```bash
|
||||
# 1. Tailscale operator running
|
||||
kubectl get pods -n tailscale-system
|
||||
|
||||
# 2. ArgoCD accessible
|
||||
curl -k https://argocd.tail8d86e.ts.net/healthz
|
||||
|
||||
# 3. CloudNativePG operator running
|
||||
kubectl get pods -n cnpg-system
|
||||
|
||||
# 4. PostgreSQL cluster healthy
|
||||
kubectl get cluster -n databases
|
||||
|
||||
# 5. All ArgoCD apps synced
|
||||
kubectl get applications -n argocd
|
||||
# All should show STATUS: Synced, HEALTH: Healthy
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Rollback
|
||||
|
||||
```bash
|
||||
# Remove ArgoCD apps (managed resources are cascade-deleted only if the Application has the resources-finalizer.argocd.argoproj.io finalizer)
|
||||
kubectl delete application -n argocd root
|
||||
kubectl delete application -n argocd blumeops-pg
|
||||
kubectl delete application -n argocd cloudnative-pg
|
||||
kubectl delete application -n argocd tailscale-operator
|
||||
|
||||
# Remove ArgoCD
|
||||
kubectl delete -k argocd/manifests/argocd/
|
||||
kubectl delete namespace argocd
|
||||
|
||||
# Remove namespaces
|
||||
kubectl delete namespace databases
|
||||
kubectl delete namespace cnpg-system
|
||||
kubectl delete namespace tailscale-system
|
||||
|
||||
# Revert ACL changes
|
||||
git checkout pulumi/policy.hujson
|
||||
mise run tailnet-up
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Notes (Deviations from Plan)
|
||||
|
||||
*Added during implementation for retrospective review*
|
||||
|
||||
### Git Source: Forge Instead of GitHub
|
||||
|
||||
**Plan**: Use GitHub mirror (`github.com/eblume/blumeops`)
|
||||
**Actual**: Use internal Forgejo (`ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git`)
|
||||
|
||||
**Why**: User preference to use internal infrastructure, accepting the circular dependency for now and deferring its resolution.
|
||||
|
||||
**Required changes**:
|
||||
- Deploy key added to forge for ArgoCD SSH access
|
||||
- Repository secret `repo-forge` with SSH private key from 1Password
|
||||
- Discovered: `op read` requires `?ssh-format=openssh` query parameter for ArgoCD-compatible key format
|
||||
- Egress proxy service to reach forge from cluster (targets `indri.tail8d86e.ts.net` not `forge.tail8d86e.ts.net` due to Tailscale Serve limitation)
|
||||
- DNSConfig CRD for cluster-to-tailnet MagicDNS resolution
|
||||
- ACL grant: `tag:k8s` → `tag:homelab` on ports 3001 (HTTP) and 2200 (SSH)
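
To make the egress proxy and DNS pieces above more concrete, here is a minimal sketch of what `egress-forge.yaml` and `dnsconfig.yaml` might contain, following the shapes documented for the Tailscale Kubernetes operator. The resource names (`forge-egress`, `ts-dns`), the namespace, and the nameserver image tag are assumptions for illustration; the actual files are not reproduced in this plan.

```yaml
# egress-forge.yaml (sketch): ExternalName Service that the operator turns into an egress proxy
apiVersion: v1
kind: Service
metadata:
  name: forge-egress            # hypothetical name
  namespace: tailscale          # assumed: same namespace as the operator
  annotations:
    tailscale.com/tailnet-fqdn: indri.tail8d86e.ts.net
spec:
  type: ExternalName
  externalName: placeholder     # overwritten by the operator once the proxy is ready
---
# dnsconfig.yaml (sketch): in-cluster nameserver for tailnet MagicDNS names
apiVersion: tailscale.com/v1alpha1
kind: DNSConfig
metadata:
  name: ts-dns                  # hypothetical name
spec:
  nameserver:
    image:
      repo: docker.io/tailscale/k8s-nameserver   # fully qualified for CRI-O
      tag: unstable                              # assumption: pin to a tag you have tested
```

The `DNSConfig` alone does not change cluster DNS; the cluster resolver still has to be pointed at the nameserver for `ts.net` lookups, which is outside the scope of this sketch.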
|
||||
|
||||
### ArgoCD Exposure: Ingress Instead of LoadBalancer
|
||||
|
||||
**Plan**: LoadBalancer service with `tailscale.com/hostname` annotation
|
||||
**Actual**: Tailscale Ingress with Let's Encrypt TLS termination
|
||||
|
||||
**Why**: Ingress provides automatic TLS certificates and is the recommended approach.
|
||||
|
||||
**File**: `argocd/manifests/argocd/service-tailscale.yaml` uses `kind: Ingress` with `ingressClassName: tailscale`
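
For orientation, a Tailscale Ingress of this kind might look roughly like the following. This is a sketch under assumptions rather than the actual file: the resource name is carried over from the planned LoadBalancer Service, and the backend port assumes `argocd-server` serves plain HTTP on port 80 (consistent with disabling the HTTPS redirect in `argocd-cmd-params-cm.yaml`).

```yaml
# Sketch of an Ingress-based service-tailscale.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server-tailscale   # assumption: name reused from the planned Service
  namespace: argocd
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: argocd-server
      port:
        number: 80                # assumption: plain HTTP backend, TLS terminated by the proxy
  tls:
    - hosts:
        - argocd                  # becomes argocd.<tailnet>.ts.net
```

With this shape the operator provisions the certificate for the tailnet hostname, so no separate certificate management is needed in the cluster.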
|
||||
|
||||
### Namespace: `tailscale` Instead of `tailscale-system`
|
||||
|
||||
**Plan**: `tailscale-system` namespace
|
||||
**Actual**: `tailscale` namespace
|
||||
|
||||
**Why**: Matches upstream Tailscale operator defaults.
|
||||
|
||||
### Sync Policy: Manual Instead of Automated
|
||||
|
||||
**Plan**: `syncPolicy.automated` with prune and selfHeal
|
||||
**Actual**: Manual sync policy for workload apps; auto-sync only for app-of-apps
|
||||
|
||||
**Why**: User preference for explicit control over deployments during initial migration phase.
|
||||
|
||||
**Pattern**:
|
||||
- `apps.yaml` (app-of-apps): auto-sync to pick up new Application manifests
|
||||
- All workload apps: manual sync requires `argocd app sync <name>`
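
The split above could be expressed roughly as follows. This is an illustrative sketch of `apps.yaml`, assuming it keeps the same source path as the planned `root.yaml`; workload Applications would look the same minus the `automated` block.

```yaml
# argocd/apps/apps.yaml (sketch): the only Application with auto-sync
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
    targetRevision: main
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:        # picks up new Application manifests automatically
      prune: true
      selfHeal: true
# Workload Applications omit syncPolicy.automated, so they only change
# when someone runs `argocd app sync <name>`.
```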
|
||||
|
||||
### CloudNativePG: Helm Chart Instead of Raw Manifest
|
||||
|
||||
**Plan**: Download raw CNPG manifest
|
||||
**Actual**: Multi-source Application using official Helm chart from `https://cloudnative-pg.github.io/charts`
|
||||
|
||||
**Why**: Helm chart is the officially supported distribution method.
|
||||
|
||||
**Additional fix**: Required `ServerSideApply=true` sync option due to large CRD exceeding annotation size limit.
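
The relevant part of the Application would look something like this fragment (a sketch, not the full manifest). Client-side apply stores the whole object in the `kubectl.kubernetes.io/last-applied-configuration` annotation, which is what large CRDs overflow; server-side apply avoids that annotation.

```yaml
# Fragment of argocd/apps/cloudnative-pg.yaml (sketch)
spec:
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true   # avoids the oversized last-applied-configuration annotation
```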
|
||||
|
||||
### App-of-Apps: Named `apps` Instead of `root`
|
||||
|
||||
**Plan**: `argocd/apps/root.yaml`
|
||||
**Actual**: `argocd/apps/apps.yaml` with Application named `apps`
|
||||
|
||||
**Why**: Clearer naming; `apps` manages apps, `argocd` manages itself.
|
||||
|
||||
### ArgoCD Self-Management Added
|
||||
|
||||
**Plan**: Not explicitly planned
|
||||
**Actual**: `argocd/apps/argocd.yaml` Application for ArgoCD self-management
|
||||
|
||||
**Why**: Standard GitOps pattern - ArgoCD manages its own deployment after bootstrap.
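
A self-management Application along these lines would be one way to express it. The sketch below assumes it points at the same `argocd/manifests/argocd` directory used for the bootstrap and follows the manual-sync convention described earlier; the actual `argocd.yaml` is not shown in this plan.

```yaml
# argocd/apps/argocd.yaml (sketch)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/argocd   # same kustomization used for the bootstrap apply
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  # No syncPolicy.automated: ArgoCD upgrades itself only on an explicit sync.
```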
|
||||
|
||||
### CRI-O Registry Mirror for Zot
|
||||
|
||||
**Plan**: Not in original plan
|
||||
**Actual**: Configured CRI-O to use zot as pull-through cache for docker.io, ghcr.io, quay.io
|
||||
|
||||
**Why**: Reduces external bandwidth, speeds up pulls, avoids rate limits.
|
||||
|
||||
**Implementation**: Ansible `minikube` role applies `/etc/containers/registries.conf.d/zot-mirror.conf` inside minikube VM using stable hostname `host.containers.internal:5050`.
|
||||
|
||||
### ProxyClass for CRI-O Image Compatibility
|
||||
|
||||
**Plan**: Not mentioned
|
||||
**Actual**: Required `ProxyClass` with fully-qualified image paths (`docker.io/tailscale/...`)
|
||||
|
||||
**Why**: CRI-O requires fully-qualified image references; default Tailscale operator uses short names.
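
A ProxyClass for this purpose might look like the sketch below. The name `crio-compat` matches the proxy class referenced in the Grafana phase; the image tag is an assumption and should be pinned to whatever version the operator is actually running.

```yaml
# proxyclass.yaml (sketch)
apiVersion: tailscale.com/v1alpha1
kind: ProxyClass
metadata:
  name: crio-compat
spec:
  statefulSet:
    pod:
      tailscaleContainer:
        image: docker.io/tailscale/tailscale:stable       # assumption: pin a tested tag
      tailscaleInitContainer:
        image: docker.io/tailscale/tailscale:stable       # same image, fully qualified
```

Proxies opt in per resource, for example with a `tailscale.com/proxy-class: crio-compat` label on a Service or Ingress.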
|
||||
|
||||
### Actual File Structure
|
||||
|
||||
```
argocd/
  apps/
    apps.yaml                    # App-of-apps (auto-sync)
    argocd.yaml                  # ArgoCD self-management (manual sync)
    tailscale-operator.yaml      # Tailscale operator (manual sync)
    cloudnative-pg.yaml          # CNPG operator via Helm (manual sync)
  manifests/
    tailscale-operator/
      kustomization.yaml
      operator.yaml
      proxyclass.yaml            # CRI-O compatibility
      dnsconfig.yaml             # Cluster-to-tailnet DNS
      egress-forge.yaml          # Egress proxy for forge
      secret.yaml.tpl            # OAuth secret template (manual)
      README.md
    argocd/
      kustomization.yaml         # Uses remote base from upstream
      service-tailscale.yaml     # Ingress (not LoadBalancer)
      argocd-cmd-params-cm.yaml  # Disable HTTPS redirect
      repo-forge-secret.yaml.tpl # SSH key template (manual)
      README.md
    cloudnative-pg/
      values.yaml                # Helm values (currently minimal)
      README.md
```
|
||||
|
||||
### Bootstrap Commands (Actual)
|
||||
|
||||
```bash
|
||||
# 1. Create namespaces
|
||||
kubectl create namespace tailscale
|
||||
kubectl create namespace argocd
|
||||
|
||||
# 2. Apply secrets (manual, uses 1Password)
|
||||
op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
|
||||
|
||||
PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n' && \
  kubectl create secret generic repo-forge -n argocd \
    --from-literal=type=git \
    --from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
    --from-literal=insecure=true \
    --from-literal=sshPrivateKey="$PRIV_KEY" && \
  kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
|
||||
|
||||
# 3. Bootstrap tailscale-operator
|
||||
kubectl apply -k argocd/manifests/tailscale-operator/
|
||||
|
||||
# 4. Bootstrap ArgoCD
|
||||
kubectl apply -k argocd/manifests/argocd/
|
||||
|
||||
# 5. Login and change password
|
||||
argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
|
||||
argocd account update-password
|
||||
|
||||
# 6. Apply ArgoCD Applications
|
||||
kubectl apply -f argocd/apps/argocd.yaml
|
||||
kubectl apply -f argocd/apps/apps.yaml
|
||||
|
||||
# 7. Sync workloads
|
||||
argocd app sync tailscale-operator
|
||||
argocd app sync cloudnative-pg
|
||||
```
|
||||
|
|
@ -1,396 +0,0 @@
|
|||
# Phase 2: Grafana Migration (Pilot)
|
||||
|
||||
**Goal**: Migrate Grafana as lowest-risk pilot service
|
||||
|
||||
**Status**: Complete (2026-01-19)
|
||||
|
||||
**Prerequisites**: [Phase 1](P1_k8s_infrastructure.complete.md) complete
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This phase migrates Grafana from Homebrew/Ansible on indri to Kubernetes, establishing the pattern for future service migrations. Additionally, we establish the pattern of mirroring Helm chart repositories to forge for resilience and GitOps consistency.
|
||||
|
||||
---
|
||||
|
||||
## Key Decisions
|
||||
|
||||
### Helm Chart Mirroring
|
||||
|
||||
**Problem**: P1 uses external Helm repos which creates external dependencies.
|
||||
|
||||
**Solution**: Mirror Helm chart Git repositories to forge, reference charts from git path.
|
||||
|
||||
ArgoCD auto-detects Helm charts when a directory contains `Chart.yaml`. No build step needed.
|
||||
|
||||
| Chart | Upstream Git Repo | Forge Mirror | Chart Path |
|-------|-------------------|--------------|------------|
| cloudnative-pg | `github.com/cloudnative-pg/charts` | `forge/eblume/cloudnative-pg-charts` | `charts/cloudnative-pg/` |
| grafana | `github.com/grafana/helm-charts` | `forge/eblume/grafana-helm-charts` | `charts/grafana/` |
|
||||
|
||||
### Database Storage
|
||||
|
||||
Use SQLite with 1Gi PVC (not k8s PostgreSQL). Grafana stores minimal persistent data and dashboards are git-provisioned.
|
||||
|
||||
### Datasource URLs
|
||||
|
||||
From k8s pods, use `host.containers.internal` to reach indri services:
|
||||
- Prometheus: `http://host.containers.internal:9090`
|
||||
- Loki: `http://host.containers.internal:3100` (requires ansible change to bind 0.0.0.0)
|
||||
|
||||
### Ingress
|
||||
|
||||
Tailscale Ingress with Let's Encrypt TLS (following ArgoCD pattern), with `crio-compat` proxy class.
|
||||
|
||||
### Secrets Management
|
||||
|
||||
Admin password stored in 1Password, injected manually via `op inject`. Future: migrate to External Secrets Operator or similar.
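
As an illustration of the `op inject` approach, a template such as `secret-admin.yaml.tpl` could look like the sketch below. The Secret name and keys match the `admin.existingSecret`, `userKey`, and `passwordKey` values used for the chart; the `<vault>` and `<item>` parts of the secret reference are placeholders, not real identifiers, and the `admin` username is an assumption.

```yaml
# secret-admin.yaml.tpl (sketch): rendered by `op inject`, never committed with real values
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin
  namespace: monitoring
type: Opaque
stringData:
  admin-user: admin                                      # assumption: default admin username
  admin-password: "{{ op://<vault>/<item>/password }}"   # placeholder 1Password reference
```

It would be applied the same way as the other templates in this repo, by piping `op inject -i <template>` into `kubectl apply -f -`.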
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### 0.1 Mirror Helm Chart Repos to Forge
|
||||
|
||||
**User action**: Create mirrors in forge:
|
||||
|
||||
1. **CloudNativePG charts** (fix existing P1 app):
|
||||
- Mirror: `https://github.com/cloudnative-pg/charts`
|
||||
- To: `forge.tail8d86e.ts.net/eblume/cloudnative-pg-charts`
|
||||
|
||||
2. **Grafana helm-charts** (new):
|
||||
- Mirror: `https://github.com/grafana/helm-charts`
|
||||
- To: `forge.tail8d86e.ts.net/eblume/grafana-helm-charts`
|
||||
|
||||
### 0.2 Update Loki to Bind 0.0.0.0
|
||||
|
||||
**File**: `ansible/roles/loki/templates/loki-config.yaml.j2`
|
||||
|
||||
Add under `server:`:
|
||||
```yaml
|
||||
http_listen_address: 0.0.0.0
|
||||
```
|
||||
|
||||
Deploy: `mise run provision-indri -- --tags loki`
|
||||
|
||||
---
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Fix CloudNativePG to Use Forge Mirror
|
||||
|
||||
Update `argocd/apps/cloudnative-pg.yaml` to use forge-mirrored chart:
|
||||
|
||||
```yaml
sources:
  - repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/cloudnative-pg-charts.git
    targetRevision: cloudnative-pg-0.23.0  # git tag
    path: charts/cloudnative-pg
    helm:
      releaseName: cloudnative-pg
      valueFiles:
        - $values/argocd/manifests/cloudnative-pg/values.yaml
  - repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
    targetRevision: main
    ref: values
```
|
||||
|
||||
---
|
||||
|
||||
### 2. Create Grafana Helm Values
|
||||
|
||||
**File**: `argocd/manifests/grafana/values.yaml`
|
||||
|
||||
```yaml
admin:
  existingSecret: grafana-admin
  userKey: admin-user
  passwordKey: admin-password

persistence:
  enabled: true
  type: pvc
  size: 1Gi

grafana.ini:
  server:
    root_url: https://grafana.tail8d86e.ts.net
  analytics:
    check_for_updates: false
    reporting_enabled: false

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        uid: prometheus
        url: http://host.containers.internal:9090
        isDefault: true
        editable: false
      - name: Loki
        type: loki
        access: proxy
        uid: loki
        url: http://host.containers.internal:3100
        editable: false

sidecar:
  dashboards:
    enabled: true
    label: grafana_dashboard
    labelValue: "1"

service:
  type: ClusterIP
  port: 80

resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
|
||||
|
||||
---
|
||||
|
||||
### 3. Create Grafana ArgoCD Application
|
||||
|
||||
**File**: `argocd/apps/grafana.yaml`
|
||||
|
||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana
  namespace: argocd
spec:
  project: default
  sources:
    - repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/grafana-helm-charts.git
      targetRevision: grafana-8.8.2
      path: charts/grafana
      helm:
        releaseName: grafana
        valueFiles:
          - $values/argocd/manifests/grafana/values.yaml
    - repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
      targetRevision: main
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Create Grafana Config Application
|
||||
|
||||
**File**: `argocd/apps/grafana-config.yaml`
|
||||
|
||||
Deploys Tailscale Ingress and Dashboard ConfigMaps from `argocd/manifests/grafana-config/`.
|
||||
|
||||
---
|
||||
|
||||
### 5. Create Grafana Config Manifests
|
||||
|
||||
**Directory**: `argocd/manifests/grafana-config/`
|
||||
|
||||
Contents:
|
||||
- `kustomization.yaml`
|
||||
- `ingress-tailscale.yaml` - Tailscale Ingress for `grafana.tail8d86e.ts.net`
|
||||
- `secret-admin.yaml.tpl` - Admin password template (1Password-backed)
|
||||
- `README.md` - Notes on secrets management
|
||||
- `dashboards/configmap-*.yaml` - 9 dashboard ConfigMaps
|
||||
|
||||
**Ingress**:
|
||||
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-tailscale
  namespace: monitoring
  annotations:
    tailscale.com/proxy-class: "crio-compat"
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: grafana
      port:
        number: 80
  tls:
    - hosts:
        - grafana
```
|
||||
|
||||
**Secret template** (`secret-admin.yaml.tpl`):
|
||||
```yaml
# Apply: op inject -i secret-admin.yaml.tpl | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin
  namespace: monitoring
type: Opaque
stringData:
  admin-user: admin
  admin-password: {{ op://vg6xf6vvfmoh5hqjjhlhbeoaie/oxkcr3xtxnewy7noep2izvyr6y/password }}
```
|
||||
|
||||
**Dashboard ConfigMaps**: Convert each JSON from `ansible/roles/grafana/files/dashboards/` to ConfigMap with label `grafana_dashboard: "1"`.
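
The conversion described above could be scripted roughly as follows. This is a minimal sketch, not necessarily how the ConfigMaps were actually produced: the function name `dashboard_to_configmap`, the `grafana-dashboard-` name prefix, and the file-name normalization are assumptions for illustration. Only the `monitoring` namespace and the `grafana_dashboard: "1"` label come from the plan.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap one dashboard JSON file in a ConfigMap manifest
# and print the manifest on stdout.
dashboard_to_configmap() {
  local json_file="$1"
  local base name
  base="$(basename "$json_file" .json)"
  # ConfigMap names must be lowercase DNS-style names, so normalize the file name.
  name="$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]' | tr '_' '-')"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-${name}
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  ${base}.json: |
EOF
  # Indent the JSON so it sits inside the YAML block scalar.
  sed 's/^/    /' "$json_file"
}
```

A loop such as `for f in ansible/roles/grafana/files/dashboards/*.json; do dashboard_to_configmap "$f" > "argocd/manifests/grafana-config/dashboards/configmap-$(basename "$f" .json).yaml"; done` would then write one manifest per dashboard. The sidecar picks them up by label, so the exact ConfigMap names are not significant.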
|
||||
|
||||
---
|
||||
|
||||
### 6. Deploy to Kubernetes
|
||||
|
||||
```bash
|
||||
# Create namespace and secret
|
||||
ki create namespace monitoring
|
||||
op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | ki apply -f -
|
||||
|
||||
# Push changes and sync
|
||||
argocd app sync grafana
|
||||
argocd app sync grafana-config
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 7. Tailscale Service Cutover
|
||||
|
||||
Remove `svc:grafana` from `ansible/roles/tailscale_serve/defaults/main.yml`, then:
|
||||
|
||||
```bash
|
||||
mise run provision-indri -- --tags tailscale-serve
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 8. Stop Brew Grafana
|
||||
|
||||
```bash
|
||||
ssh indri 'brew services stop grafana'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 9. Retire Ansible Grafana Role
|
||||
|
||||
Once k8s Grafana is verified working:
|
||||
|
||||
1. **Remove role from playbook** - Delete grafana role entry from `ansible/playbooks/indri.yml`
|
||||
|
||||
2. **Delete the role directory** - `rm -rf ansible/roles/grafana/`
|
||||
|
||||
3. **Update zk documentation** - Note in `~/code/personal/zk/1767747119-YCPO.md` that Grafana is now k8s-hosted
|
||||
|
||||
---
|
||||
|
||||
## New Files
|
||||
|
||||
| Path | Purpose |
|------|---------|
| `argocd/apps/grafana.yaml` | Grafana Helm chart Application |
| `argocd/apps/grafana-config.yaml` | Grafana config Application |
| `argocd/manifests/grafana/values.yaml` | Helm values |
| `argocd/manifests/grafana-config/kustomization.yaml` | Kustomize config |
| `argocd/manifests/grafana-config/ingress-tailscale.yaml` | Tailscale Ingress |
| `argocd/manifests/grafana-config/secret-admin.yaml.tpl` | Admin password template |
| `argocd/manifests/grafana-config/README.md` | Secrets management notes |
| `argocd/manifests/grafana-config/dashboards/configmap-*.yaml` | 9 dashboard ConfigMaps |
|
||||
|
||||
## Modified Files
|
||||
|
||||
| Path | Change |
|------|--------|
| `argocd/apps/cloudnative-pg.yaml` | Switch to forge-mirrored chart |
| `ansible/roles/loki/templates/loki-config.yaml.j2` | Add `http_listen_address: 0.0.0.0` |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Remove `svc:grafana` |
| `ansible/playbooks/indri.yml` | Remove grafana role |
|
||||
|
||||
## Deleted Files
|
||||
|
||||
| Path | Reason |
|------|--------|
| `ansible/roles/grafana/` | Replaced by k8s deployment |
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
- [x] Loki accessible from k8s pods
|
||||
- [x] Prometheus accessible from k8s pods
|
||||
- [x] Grafana pod running in `monitoring` namespace
|
||||
- [x] Grafana Ingress active
|
||||
- [x] https://grafana.tail8d86e.ts.net loads
|
||||
- [x] All 9 dashboards visible
|
||||
- [x] Prometheus datasource queries work
|
||||
- [x] Loki datasource queries work
|
||||
|
||||
---
|
||||
|
||||
## Rollback
|
||||
|
||||
1. Re-add `svc:grafana` to ansible tailscale_serve
|
||||
2. `mise run provision-indri -- --tags tailscale-serve,grafana`
|
||||
3. `argocd app delete grafana grafana-config --cascade`
|
||||
|
||||
---
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
*Added during implementation for retrospective review*
|
||||
|
||||
### SSH Credential Management
|
||||
|
||||
**Issue**: The initial plan used HTTPS URLs for the forge-mirrored Helm chart repos, but ArgoCD running inside the cluster couldn't resolve `forge.tail8d86e.ts.net` (MagicDNS is not available inside the cluster).
|
||||
|
||||
**Solution**: Use SSH URLs for all forge repos. Created a **credential template** (`repo-creds-forge`) that matches all repos under `ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/` using URL prefix matching. This allows a single SSH key (added to Forgejo user, not as deploy key) to work for all repos.
|
||||
|
||||
### SSH Host Key for ArgoCD
|
||||
|
||||
**Issue**: ArgoCD's known_hosts didn't include indri's SSH host key, causing `knownhosts: key is unknown` errors.
|
||||
|
||||
**Solution**: Added `argocd-ssh-known-hosts-cm.yaml` as a kustomize patch to include indri's host key alongside the upstream defaults.
|
||||
|
||||
**Gotcha**: Kustomize patches must **not specify namespace** - the namespace transformation happens *after* patch matching. Our patch had `namespace: argocd` which caused "no matches for Id" errors until removed.
|
||||
|
||||
### Tailscale Hostname Cutover
|
||||
|
||||
**Issue**: After removing `svc:grafana` from ansible's tailscale_serve config, the k8s Ingress still got a numbered hostname (`grafana-1.tail8d86e.ts.net`).
|
||||
|
||||
**Solution**: The old `svc:grafana` service remained registered in Tailscale admin console even after clearing its serve config. **Manual deletion in Tailscale admin console** was required to free the `grafana` hostname for the k8s Ingress to claim. After deletion, recreating the Ingress picked up the correct hostname.
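
A quick way to spot this situation is to compare the hostname the Ingress was actually assigned against the one requested. The helper below is a hedged sketch: the function name `has_numbered_suffix` is hypothetical, and it only inspects a string, so it makes no calls to Tailscale or the cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the assigned FQDN looks like
# "<wanted>-<number>.<tailnet>", i.e. the operator fell back to a numbered name.
has_numbered_suffix() {
  local assigned="$1" wanted="$2"
  local host="${assigned%%.*}"   # first DNS label, e.g. "grafana-1"
  [[ "$host" =~ ^${wanted}-[0-9]+$ ]]
}
```

For example, `has_numbered_suffix "grafana-1.tail8d86e.ts.net" grafana` succeeds, which signals that the old `grafana` device still needs to be deleted in the admin console. Feeding it the live value, for instance from `kubectl -n monitoring get ingress grafana-tailscale -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'`, assumes the operator publishes the hostname in the Ingress status.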
|
||||
|
||||
### ArgoCD Workflow Decision
|
||||
|
||||
During implementation, we established the pattern for GitOps workflow:
|
||||
|
||||
- **All apps target `main` branch** (not feature branches)
|
||||
- Manual sync policy on workload apps = merge doesn't auto-deploy
|
||||
- Workflow: feature branch → PR → merge to main → `argocd app sync <name>`
|
||||
- For testing: temporarily set one app to feature branch via `argocd app set --revision`
|
||||
|
||||
This avoids the friction of switching `targetRevision` in manifests during development.
|
||||
|
||||
### Bootstrap Dependencies
|
||||
|
||||
Some resources must be applied manually before ArgoCD can manage itself:
|
||||
|
||||
1. **SSH known_hosts** - chicken-and-egg: ArgoCD can't sync the config that adds the host key
|
||||
2. **Credential secrets** - `repo-creds-forge` must exist before ArgoCD can pull from forge
|
||||
|
||||
These are documented in `argocd/manifests/argocd/README.md` as bootstrap steps.
|
||||
|
||||
### Actual Versions Used
|
||||
|
||||
- Grafana Helm chart: `grafana-8.8.2` (tag in grafana-helm-charts repo)
|
||||
- CloudNativePG Helm chart: `cloudnative-pg-v0.23.0` (tag in cloudnative-pg-charts repo)
|
||||
- Grafana version: 11.4.0
|
||||
|
|
@ -1,359 +0,0 @@
|
|||
# Phase 3: PostgreSQL Disaster Recovery & Backup
|
||||
|
||||
**Goal**: Test disaster recovery and configure borgmatic backups for k8s-pg
|
||||
|
||||
**Status**: Complete (2026-01-19)
|
||||
|
||||
**Prerequisites**: [Phase 2](P2_grafana.complete.md) complete
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Phase 3 establishes disaster recovery capabilities for the k8s PostgreSQL cluster:
|
||||
1. **Fix borgmatic backup issues** - Resolve `borg: command not found` error
|
||||
2. **Test disaster recovery** - Restore miniflux data from borgmatic backup to k8s-pg
|
||||
3. **Create borgmatic user** - Read-only backup user in k8s-pg via CloudNativePG
|
||||
4. **Configure dual database backup** - Back up both brew PostgreSQL and k8s-pg during migration
|
||||
|
||||
This phase prepares for Phase 4 (miniflux migration) by verifying we can restore data to k8s-pg.
|
||||
|
||||
---
|
||||
|
||||
## Key Decisions
|
||||
|
||||
### Backup Both Databases During Transition
|
||||
|
||||
**Decision**: Configure borgmatic to back up both `localhost:5432/miniflux` (brew) and `k8s-pg.tail8d86e.ts.net:5432/miniflux` (k8s) until the migration is complete.
|
||||
|
||||
**Why**: Provides redundancy during migration. After Phase 4, remove localhost entry.
|
||||
|
||||
### Reuse Existing borgmatic Password
|
||||
|
||||
**Decision**: Use same borgmatic password from 1Password for k8s-pg user.
|
||||
|
||||
**Why**: Simpler credential management, password already proven secure.
|
||||
|
||||
### CloudNativePG Managed Roles
|
||||
|
||||
**Decision**: Declare borgmatic user via CloudNativePG `managed.roles` instead of SQL commands.
|
||||
|
||||
**Why**: Declarative, version-controlled, matches eblume user pattern.
|
||||
|
||||
### Disable selfHeal on apps App
|
||||
|
||||
**Decision**: Remove `selfHeal: true` from `argocd/apps/apps.yaml`.
|
||||
|
||||
**Why**: Allows temporarily pointing child apps to feature branches during development without ArgoCD reverting the change.
|
||||
|
||||
---
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Fix borgmatic borg path issue
|
||||
|
||||
**Problem**: borgmatic failing with `borg: command not found`
|
||||
|
||||
**Cause**: The LaunchAgent doesn't have Homebrew in its PATH, so the `borg` binary is not found.
|
||||
|
||||
**Solution**: Add `local_path` to borgmatic config template.
|
||||
|
||||
**File**: `ansible/roles/borgmatic/templates/config.yaml.j2`
|
||||
```yaml
|
||||
# Path to borg binary (LaunchAgent doesn't have homebrew in PATH)
|
||||
local_path: {{ borgmatic_local_path }}
|
||||
```
|
||||
|
||||
**File**: `ansible/roles/borgmatic/defaults/main.yml`
|
||||
```yaml
|
||||
borgmatic_local_path: /opt/homebrew/bin/borg
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 2. Run manual backup to verify fix
|
||||
|
||||
```bash
|
||||
mise run provision-indri -- --tags borgmatic
|
||||
ssh indri '/opt/homebrew/bin/borgmatic --verbosity 1'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3. Extract miniflux dump from borgmatic
|
||||
|
||||
```bash
|
||||
ssh indri 'borgmatic list --archive latest'
|
||||
ssh indri 'borgmatic restore --archive latest --destination /tmp/restore'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Add ACL grant for homelab → k8s
|
||||
|
||||
**Problem**: Connection from indri to k8s-pg blocked - Tailscale proxy logs showed "no rules matched"
|
||||
|
||||
**Solution**: Add ACL grant in Pulumi.
|
||||
|
||||
**File**: `pulumi/policy.hujson`
|
||||
```hujson
// Homelab can reach k8s PostgreSQL for borgmatic backups
{
  "src": ["tag:homelab"],
  "dst": ["tag:k8s"],
  "ip": ["tcp:5432"],
},
```
|
||||
|
||||
Deploy: `mise run tailnet-up`
|
||||
|
||||
---
|
||||
|
||||
### 5. Restore data to k8s-pg
|
||||
|
||||
```bash
|
||||
# Using eblume superuser credentials from 1Password
|
||||
ssh indri "psql 'postgres://eblume@k8s-pg.tail8d86e.ts.net:5432/miniflux' -f /tmp/restore/localhost/miniflux/miniflux"
|
||||
```
|
||||
|
||||
**Verification**:
|
||||
```bash
|
||||
psql 'postgres://eblume@k8s-pg.tail8d86e.ts.net:5432/miniflux' -c 'SELECT COUNT(*) FROM users; SELECT COUNT(*) FROM feeds; SELECT COUNT(*) FROM entries;'
|
||||
# Result: 2 users, 2 feeds, 44 entries
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 6. Create borgmatic user in k8s-pg via CloudNativePG
|
||||
|
||||
**File**: `argocd/manifests/databases/secret-borgmatic.yaml.tpl`
|
||||
```yaml
# Template for borgmatic backup user password
# Apply with: op inject -i secret-borgmatic.yaml.tpl | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: blumeops-pg-borgmatic
  namespace: databases
type: kubernetes.io/basic-auth
stringData:
  username: borgmatic
  password: {{ op://vg6xf6vvfmoh5hqjjhlhbeoaie/mw2bv5we7woicjza7hc6s44yvy/db-password }}
```
|
||||
|
||||
**File**: `argocd/manifests/databases/blumeops-pg.yaml` (add to managed roles)
|
||||
```yaml
managed:
  roles:
    # ... existing eblume role ...
    # borgmatic read-only user for backups
    - name: borgmatic
      login: true
      connectionLimit: -1
      ensure: present
      inherit: true
      inRoles:
        - pg_read_all_data
      passwordSecret:
        name: blumeops-pg-borgmatic
```
|
||||
|
||||
**Deploy**:
|
||||
```bash
|
||||
op inject -i argocd/manifests/databases/secret-borgmatic.yaml.tpl | kubectl apply -f -
|
||||
argocd app set blumeops-pg --revision feature/p3-postgresql-borgmatic
|
||||
argocd app sync blumeops-pg
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 7. Configure borgmatic for dual database backup
|
||||
|
||||
**File**: `ansible/roles/borgmatic/defaults/main.yml`
|
||||
```yaml
borgmatic_postgresql_databases:
  # Brew PostgreSQL on indri (current production)
  - name: miniflux
    hostname: localhost
    port: 5432
    username: borgmatic
  # k8s PostgreSQL (CloudNativePG) - backup both during migration
  - name: miniflux
    hostname: k8s-pg.tail8d86e.ts.net
    port: 5432
    username: borgmatic
```
|
||||
|
||||
**File**: `ansible/roles/postgresql/tasks/main.yml` (update .pgpass)
|
||||
```yaml
- name: Write .pgpass file for borgmatic backups
  ansible.builtin.copy:
    content: |
      # Managed by ansible - only read-only roles
      localhost:{{ postgresql_port }}:*:borgmatic:{{ postgresql_user_passwords['borgmatic'] }}
      k8s-pg.tail8d86e.ts.net:5432:*:borgmatic:{{ postgresql_user_passwords['borgmatic'] }}
    dest: ~/.pgpass
    mode: '0600'
  no_log: true
```
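
For reference, the file this task renders follows the standard `hostname:port:database:username:password` layout. The shell sketch below builds the same two entries by hand, which can be useful for checking the format outside Ansible. The function names are hypothetical, and port 5432 for localhost is an assumption standing in for `postgresql_port`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit one .pgpass line (hostname:port:database:username:password).
pgpass_line() {
  local host="$1" port="$2" db="$3" user="$4" password="$5"
  local escaped
  # .pgpass requires backslashes and colons inside a field to be escaped.
  escaped="$(printf '%s' "$password" | sed -e 's/\\/\\\\/g' -e 's/:/\\:/g')"
  printf '%s:%s:%s:%s:%s\n' "$host" "$port" "$db" "$user" "$escaped"
}

# Hypothetical helper: write both borgmatic entries to DEST with owner-only
# permissions; libpq ignores a password file readable by group or others.
write_pgpass() {
  local dest="$1" password="$2"
  (
    umask 077
    {
      pgpass_line localhost 5432 '*' borgmatic "$password"
      pgpass_line k8s-pg.tail8d86e.ts.net 5432 '*' borgmatic "$password"
    } > "$dest"
  )
  chmod 600 "$dest"
}
```

The escaping step matters only if the password contains `:` or `\`. The Ansible template above writes the password verbatim, so a password containing either character would need the same treatment there.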
|
||||
|
||||
---
|
||||
|
||||
### 8. Verify complete backup pipeline
|
||||
|
||||
```bash
|
||||
mise run provision-indri -- --tags borgmatic,postgresql
|
||||
ssh indri '/opt/homebrew/bin/borgmatic --verbosity 1'
|
||||
ssh indri 'borgmatic list --archive latest'
|
||||
```
|
||||
|
||||
**Expected output**: Archive contains both dumps:
|
||||
- `localhost/miniflux/miniflux`
|
||||
- `k8s-pg.tail8d86e.ts.net/miniflux/miniflux`
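
The check for both dumps can be automated instead of eyeballed. The following is a minimal sketch under the assumption that the archive listing contains each dump path as a substring of some line. The function name `check_dumps` is hypothetical; it reads the listing from stdin and does not call borgmatic itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an archive listing on stdin and fail if any of
# the expected dump paths (given as arguments) is absent.
check_dumps() {
  local listing path missing=0
  listing="$(cat)"
  for path in "$@"; do
    # -F treats the path as a fixed string, so dots are not regex wildcards.
    if ! grep -qF -- "$path" <<<"$listing"; then
      echo "missing dump: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use would be `ssh indri 'borgmatic list --archive latest' | check_dumps localhost/miniflux/miniflux k8s-pg.tail8d86e.ts.net/miniflux/miniflux`, which exits non-zero and names the missing path if either dump is absent.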
|
||||
|
||||
---
|
||||
|
||||
### 9. Fix ArgoCD drift from CNPG defaults
|
||||
|
||||
**Problem**: ArgoCD showed blumeops-pg as OutOfSync due to CNPG operator adding default values.
|
||||
|
||||
**Solution**: Add CNPG defaults explicitly to managed roles.
|
||||
|
||||
**File**: `argocd/manifests/databases/blumeops-pg.yaml`
|
||||
```yaml
managed:
  roles:
    - name: eblume
      # ... existing fields ...
      connectionLimit: -1
      ensure: present
      inherit: true
    - name: borgmatic
      # ... existing fields ...
      connectionLimit: -1
      ensure: present
      inherit: true
```
|
||||
|
||||
---
|
||||
|
||||
### 10. Update zk documentation
|
||||
|
||||
Updated:
|
||||
- `~/code/personal/zk/borgmatic.md` - k8s-pg backup documentation and log entry
|
||||
- `~/code/personal/zk/postgresql.md` - k8s PostgreSQL section and log entry
|
||||
|
||||
---
|
||||
|
||||
## New Files
|
||||
|
||||
| Path | Purpose |
|------|---------|
| `argocd/manifests/databases/secret-borgmatic.yaml.tpl` | borgmatic user password template |
|
||||
|
||||
## Modified Files
|
||||
|
||||
| Path | Change |
|------|--------|
| `ansible/roles/borgmatic/defaults/main.yml` | Added `borgmatic_local_path`, k8s-pg database entry |
| `ansible/roles/borgmatic/templates/config.yaml.j2` | Added `local_path` option |
| `ansible/roles/postgresql/tasks/main.yml` | Added k8s-pg to .pgpass |
| `argocd/apps/apps.yaml` | Disabled selfHeal |
| `argocd/manifests/databases/blumeops-pg.yaml` | Added borgmatic managed role, CNPG defaults |
| `pulumi/policy.hujson` | Added ACL grant homelab → k8s on tcp:5432 |
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
- [x] borgmatic backup runs successfully
|
||||
- [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries)
|
||||
- [x] borgmatic user created in k8s-pg with pg_read_all_data role
|
||||
- [x] Both localhost and k8s-pg databases in backup archive
|
||||
- [x] ArgoCD shows blumeops-pg as Synced
|
||||
- [x] zk documentation updated
|
||||
|
||||
---
|
||||
|
||||
## Rollback
|
||||
|
||||
Keep brew PostgreSQL running until Phase 4 verified. To revert:
|
||||
|
||||
1. Remove k8s-pg entry from borgmatic databases
|
||||
2. Remove k8s-pg from .pgpass
|
||||
3. `mise run provision-indri -- --tags borgmatic,postgresql`
|
||||
|
||||
---
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
*Added during implementation for retrospective review*
|
||||
|
||||
### borgmatic LaunchAgent PATH Issue
|
||||
|
||||
**Problem**: borgmatic LaunchAgent failed with `borg: command not found`
|
||||
|
||||
**Root cause**: LaunchAgents run with minimal PATH that doesn't include `/opt/homebrew/bin`
|
||||
|
||||
**Solution**: Added `local_path: /opt/homebrew/bin/borg` to borgmatic config. This was already done for `pg_dump_command` but not for borg itself.
|
||||
|
||||
**Lesson**: Any tool invoked by borgmatic needs absolute path when running from LaunchAgent.
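
To find such tools before the LaunchAgent does, one option is to test name resolution under a deliberately minimal PATH. The sketch below is illustrative only: the function name `resolves_under_path` is hypothetical, and the PATH value passed in should be whatever the agent really runs with (the value in the usage note is an assumption).

```bash
#!/usr/bin/env bash
# Hypothetical check: does TOOL resolve when only MINIMAL_PATH is searched?
# Note: shell builtins and functions resolve regardless of PATH.
resolves_under_path() {
  local minimal_path="$1" tool="$2"
  # The subshell keeps the PATH change from leaking into the caller.
  ( PATH="$minimal_path"; command -v "$tool" >/dev/null 2>&1 )
}
```

For example, `resolves_under_path "/usr/bin:/bin:/usr/sbin:/sbin" borg || echo "borg needs an absolute path"` would have flagged the problem on indri, where `borg` lives in `/opt/homebrew/bin`.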
|
||||
|
||||
### 1Password Field Name Mismatch
|
||||
|
||||
**Issue**: Initial secret template used `password` field but 1Password item had `db-password`.
|
||||
|
||||
**Discovery**: Error message from `op inject` indicated field not found.
|
||||
|
||||
**Fix**: Updated template to use correct field name `db-password`.
|
||||
|
||||
### ACL Grant Discovery
|
||||
|
||||
**Problem**: Connection from indri (tag:homelab) to k8s-pg (tag:k8s) failed.
|
||||
|
||||
**Diagnosis**: Checked Tailscale operator proxy logs which showed "no rules matched" - clear indication of missing ACL.
|
||||
|
||||
**Solution**: Added explicit grant in `pulumi/policy.hujson` for `tag:homelab` → `tag:k8s` on `tcp:5432`.
|
||||
|
||||
### ArgoCD selfHeal and Feature Branch Development
|
||||
|
||||
**Problem**: When testing changes, temporarily pointed blumeops-pg app to feature branch via `argocd app set --revision`. ArgoCD's selfHeal kept reverting it back to main.
|
||||
|
||||
**Discussion**: Two options considered:
|
||||
- Option A: Disable selfHeal on apps app (manual sync required for new apps)
|
||||
- Option B: Keep selfHeal, use different workflow
|
||||
|
||||
**Decision**: Option A chosen. The apps app now only has `prune: true`, not selfHeal. This allows:
|
||||
1. Temporarily testing feature branches
|
||||
2. Manual control over when app manifest changes are applied
|
||||
|
||||
**Trade-off**: Must manually sync apps app when adding/removing Application manifests.
|
||||
|
||||
### CloudNativePG Managed Role Reconciliation
|
||||
|
||||
**Issue**: After creating borgmatic secret with correct password, CNPG didn't immediately update the user.
|
||||
|
||||
**Solution**: Annotated the Cluster to trigger reconciliation:
|
||||
```bash
|
||||
kubectl annotate cluster blumeops-pg -n databases cnpg.io/reconcile=$(date +%s) --overwrite
|
||||
```
|
||||
|
||||
### ArgoCD Drift from CNPG Defaults
|
||||
|
||||
**Problem**: blumeops-pg showed OutOfSync despite successful syncs.
|
||||
|
||||
**Cause**: CNPG operator adds default values (`connectionLimit: -1`, `ensure: present`, `inherit: true`) to managed roles that weren't in our spec.
|
||||
|
||||
**Solution**: Added these defaults explicitly to our spec to match what CNPG generates.
|
||||
|
||||
**Comment added**: Documented in blumeops-pg.yaml that these are "CNPG defaults added to prevent ArgoCD drift".
|
||||
|
||||
### Git Workflow for Phase 3
|
||||
|
||||
1. Created feature branch: `feature/p3-postgresql-borgmatic`
|
||||
2. Made commits throughout implementation
|
||||
3. Pointed blumeops-pg app to feature branch for testing
|
||||
4. Created PR #32 for review
|
||||
5. After merge, reset app to main: `argocd app set blumeops-pg --revision main`
|
||||
|
||||
This workflow was enabled by disabling selfHeal (see above).
|
||||
|
|
@ -1,162 +0,0 @@
|
|||
# Phase 4: Miniflux Migration to Kubernetes
|
||||
|
||||
**Goal**: Migrate Miniflux entirely off indri and onto k8s, retire brew PostgreSQL, rename k8s-pg to pg
|
||||
|
||||
**Status**: Complete (2026-01-20)
|
||||
|
||||
**Prerequisites**: [Phase 3](P3_postgresql.complete.md) complete
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This phase completed the miniflux migration and retired brew PostgreSQL:
|
||||
1. Deployed miniflux container in k8s via ArgoCD
|
||||
2. Exposed via Tailscale Ingress at `feed.tail8d86e.ts.net`
|
||||
3. Removed all miniflux infrastructure from indri (ansible role, brew service, Tailscale serve)
|
||||
4. Retired brew PostgreSQL (no longer needed)
|
||||
5. Renamed k8s-pg to pg (canonical Tailscale hostname)
|
||||
6. Updated borgmatic to back up only `pg.tail8d86e.ts.net`
|
||||
7. Updated all zk documentation
|
||||
|
||||
---
|
||||
|
||||
## New Files
|
||||
|
||||
| Path | Purpose |
|------|---------|
| `argocd/apps/miniflux.yaml` | ArgoCD Application definition |
| `argocd/manifests/miniflux/deployment.yaml` | Miniflux Deployment |
| `argocd/manifests/miniflux/service.yaml` | ClusterIP Service |
| `argocd/manifests/miniflux/ingress-tailscale.yaml` | Tailscale Ingress for `feed.tail8d86e.ts.net` |
| `argocd/manifests/miniflux/secret-db.yaml.tpl` | Database URL secret documentation |
| `argocd/manifests/miniflux/kustomization.yaml` | Kustomize configuration |
| `argocd/manifests/miniflux/README.md` | Setup instructions |
|
||||
|
||||
## Modified Files
|
||||
|
||||
| Path | Change |
|------|--------|
| `ansible/playbooks/indri.yml` | Removed miniflux and postgresql roles, simplified pre_tasks |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Removed `svc:feed` and `svc:pg` entries |
| `ansible/roles/alloy/defaults/main.yml` | Removed miniflux and postgresql logs, disabled postgres metrics |
| `ansible/roles/borgmatic/defaults/main.yml` | Updated to backup only `pg.tail8d86e.ts.net` |
| `ansible/roles/borgmatic/tasks/main.yml` | Added .pgpass file management |
| `argocd/manifests/databases/service-tailscale.yaml` | Renamed hostname from k8s-pg to pg |
|
||||
|
||||
## Deleted Files
|
||||
|
||||
| Path | Reason |
|------|--------|
| `ansible/roles/miniflux/` | Entire role no longer needed |
| `ansible/roles/postgresql/` | Brew PostgreSQL no longer needed |
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
- [x] Miniflux pod healthy in k8s
|
||||
- [x] https://feed.tail8d86e.ts.net accessible
|
||||
- [x] User `eblume` can log in
|
||||
- [x] Feeds visible and entries readable
|
||||
- [x] `pg.tail8d86e.ts.net` resolves to k8s PostgreSQL
|
||||
- [x] Old `k8s-pg` and `feed` devices removed from Tailscale
|
||||
- [x] brew miniflux and postgresql services stopped
|
||||
- [x] Tailscale serve entries cleared from indri
|
||||
- [x] zk documentation updated
|
||||
|
||||
---
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
*Lessons learned and issues encountered*
|
||||
|
||||
### CNPG-Generated Password vs 1Password
|
||||
|
||||
**Problem**: Initial secret template used 1Password for miniflux database password, but CNPG auto-generates the bootstrap owner password.
|
||||
|
||||
**Solution**: Reference the CNPG-generated password from `blumeops-pg-app` secret:
|
||||
```bash
kubectl create secret generic miniflux-db -n miniflux \
  --from-literal=url="$(kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d)"
```
|
||||
|
||||
### Table Ownership Issue After P3 Restore
|
||||
|
||||
**Problem**: Miniflux pod crashed with "permission denied for table schema_version".
|
||||
|
||||
**Root cause**: P3 restore was run as the `eblume` superuser, so all tables were created owned by `eblume`, not `miniflux`.
|
||||
|
||||
**Solution**: Transfer ownership of all tables to miniflux:
|
||||
```sql
DO $$
DECLARE r RECORD;
BEGIN
    FOR r IN (SELECT tablename FROM pg_tables WHERE schemaname = 'public') LOOP
        EXECUTE 'ALTER TABLE public.' || quote_ident(r.tablename) || ' OWNER TO miniflux';
    END LOOP;
END$$;
```
|
||||
|
||||
### Tailscale Ingress Hostname Suffix
|
||||
|
||||
**Behavior**: When requesting a Tailscale hostname that's already taken, the operator adds a suffix (e.g., `feed-1`).
|
||||
|
||||
**Workflow**:
|
||||
1. Deploy initially - gets `feed-1.tail8d86e.ts.net`
|
||||
2. Clear old `svc:feed` from indri
|
||||
3. Delete old `feed` device from Tailscale admin
|
||||
4. Delete and recreate the Ingress - now claims `feed`
|
||||
|
||||
### Renaming Tailscale Service Hostname
|
||||
|
||||
**Problem**: Changing the `tailscale.com/hostname` annotation doesn't automatically update the Tailscale device.
|
||||
|
||||
**Solution**: Delete the service and let ArgoCD recreate it:
|
||||
```bash
|
||||
kubectl -n databases delete service blumeops-pg-tailscale
|
||||
argocd app sync blumeops-pg
|
||||
```
|
||||
|
||||
### .pgpass Management Migration
|
||||
|
||||
**Issue**: The postgresql role managed `~/.pgpass` for borgmatic. With postgresql role deleted, borgmatic couldn't authenticate.
|
||||
|
||||
**Solution**: Moved .pgpass management to the borgmatic role. Password is still fetched in playbook pre_tasks as `borgmatic_db_password`.
|
||||
|
||||
### Ansible Check Mode and Registered Variables
|
||||
|
||||
**Problem**: Running `provision-indri --check --diff` failed in the podman role with "Conditional result (True) was derived from value of type 'str'" errors.
|
||||
|
||||
**Root cause**: Command tasks are skipped in check mode, leaving registered variables undefined or with unexpected types when used in conditionals.
|
||||
|
||||
**Solution**: Added `check_mode: false` to read-only command tasks that gather information:
|
||||
```yaml
- name: Check if podman machine exists
  ansible.builtin.command:
    cmd: podman machine list --format json
  register: podman_machine_list
  changed_when: false
  check_mode: false  # Safe to run in check mode - read-only
```
|
||||
|
||||
**Lesson**: Any task that registers a variable used in conditionals should have `check_mode: false` if the command is read-only/safe.
|
||||
|
||||
### 1Password CLI on Headless Hosts
|
||||
|
||||
**Issue**: Attempted to run `op` commands on indri, but 1Password CLI requires interactive authentication (biometrics/password).
|
||||
|
||||
**Solution**: All `op` commands must be in `pre_tasks` of the playbook with `delegate_to: localhost` so they run on gilbert (the workstation with GUI auth).
|
||||
|
||||
### Git Workflow for Phase 4
|
||||
|
||||
1. Created feature branch: `feature/p4-miniflux`
|
||||
2. Made incremental commits throughout implementation
|
||||
3. Pointed `miniflux` and `blumeops-pg` apps to feature branch for testing
|
||||
4. Created PR #33 for review
|
||||
5. After merge, reset apps to main:
|
||||
```bash
|
||||
argocd app set miniflux --revision main
|
||||
argocd app set blumeops-pg --revision main
|
||||
argocd app sync apps
|
||||
```
|
||||
|
|
@ -1,208 +0,0 @@
|
|||
# Phase 5.1: Migrate Minikube from QEMU2 to Docker Driver
|
||||
|
||||
**Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts
|
||||
|
||||
**Status**: Complete (2026-01-21) - Cluster running, ArgoCD deployed, apps synced
|
||||
|
||||
**Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
|
||||
|
||||
---
|
||||
|
||||
## Background
|
||||
|
||||
### Original Problem (Podman → QEMU2)
|
||||
|
||||
During Phase 6 (Kiwix/Transmission migration), we discovered that the **podman driver has fundamental limitations** that prevent mounting external volumes:
|
||||
|
||||
1. **SMB CSI driver fails** with "Operation not permitted" - the rootless container lacks kernel-level mount capabilities
|
||||
2. **`minikube mount` fails** - 9p mount gets "permission denied" inside the podman VM
|
||||
3. **hostPath volumes** only work for paths inside the minikube container, not the macOS host
|
||||
|
||||
We migrated to QEMU2 to get a full VM with kernel capabilities.
|
||||
|
||||
### New Problem (QEMU2 → Docker)
|
||||
|
||||
The QEMU2 driver introduced a **new problem**: the Kubernetes API server is inside the VM at `192.168.105.2:6443`, and Tailscale's TCP proxy cannot forward to it properly:
|
||||
|
||||
- TCP connections succeed (nc -zv works)
|
||||
- TLS handshake times out
|
||||
- Root cause unknown, but likely related to Tailscale serve's handling of non-localhost upstreams
|
||||
|
||||
Additionally, the volume mount solution with QEMU2 was complex:
|
||||
- Required NFS mount from sifaka → indri
|
||||
- Then `minikube mount` to pass through to VM
|
||||
- Two LaunchAgents/LaunchDaemons for persistence
|
||||
- macOS GUI approval required for network access
|
||||
|
||||
### Why Docker?
|
||||
|
||||
The **docker driver** solves both problems:
|
||||
|
||||
1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` works
|
||||
|
||||
2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers.
|
||||
|
||||
3. **Official Tailscale recommendation**: Tailscale's own [Kubernetes guide](https://tailscale.com/learn/managing-access-to-kubernetes-with-tailscale) uses minikube with the docker driver.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Summary
|
||||
|
||||
### Infrastructure Changes
|
||||
|
||||
1. **Docker Desktop installed** (manual via `brew install --cask docker`)
|
||||
- Configured with 12GB memory in Docker Desktop settings
|
||||
- Kubernetes option disabled (using minikube instead)
|
||||
|
||||
2. **Docker minikube cluster created**:
|
||||
```bash
minikube start \
  --driver=docker \
  --container-runtime=docker \
  --cpus=6 \
  --memory=11264 \
  --disk-size=200g \
  --apiserver-names=k8s.tail8d86e.ts.net,indri \
  --apiserver-port=6443 \
  --listen-address=0.0.0.0
```
|
||||
|
||||
3. **Tailscale serve configured** for k8s API:
|
||||
- API server on localhost (port is dynamic with docker driver)
|
||||
- `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:<PORT>`
|
||||
|
||||
4. **Remote kubectl access working** from gilbert:
|
||||
- Created `mise-tasks/ensure-minikube-indri-kubectl-config` script
|
||||
- Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml`
|
||||
|
||||
### Ansible Roles Updated
|
||||
|
||||
- `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
|
||||
- `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
|
||||
- Containerd registry mirrors configured for zot pull-through cache
|
||||
|
||||
### ArgoCD Bootstrap
|
||||
|
||||
All apps deployed and synced from `feature/p5.1-qemu2-migration` branch:
|
||||
|
||||
| App | Status | Notes |
|-----|--------|-------|
| tailscale-operator | Healthy | Manages Tailscale ingresses |
| argocd | Healthy | Self-managed |
| cloudnative-pg | Healthy | PostgreSQL operator |
| blumeops-pg | Progressing | PostgreSQL cluster starting |
| grafana | Progressing | Needs grafana-admin secret |
| grafana-config | Healthy | Dashboards and ingress |
| miniflux | Progressing | Needs miniflux-config secret |
| devpi | Progressing | Starting up |
|
||||
|
||||
### Secrets Still Needed
|
||||
|
||||
After PR merge, apply these secrets manually:
|
||||
|
||||
```bash
|
||||
# Grafana admin password
|
||||
op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | kubectl --context=minikube-indri apply -f -
|
||||
|
||||
# Miniflux config
|
||||
op inject -i argocd/manifests/miniflux/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Technical Notes
|
||||
|
||||
### API Server Port
|
||||
|
||||
With the docker driver, the API server port is **dynamic**: Docker maps a random host port to 6443 inside the container.
|
||||
|
||||
The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
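
A minimal sketch of that port lookup, assuming the cluster uses minikube's default profile (so the Docker container is named `minikube`) and that `docker port` is how the mapping is discovered. The actual ansible role may do this differently, and the helper name `parse_host_port` is hypothetical.

```bash
# Sketch only: discover the host port Docker mapped to the API server.
# Assumption: the minikube container is named "minikube" (default profile).

# parse_host_port: read `docker port` output on stdin (lines such as
# "127.0.0.1:54321") and print the port taken from the first line.
parse_host_port() {
  local first_line
  IFS= read -r first_line || true
  # Strip everything up to and including the last colon.
  printf '%s\n' "${first_line##*:}"
}

# Example wiring (not run here; needs docker and tailscale on indri):
#   port="$(docker port minikube 6443/tcp | parse_host_port)"
#   tailscale serve --service=svc:k8s --tcp=443 "tcp://localhost:${port}"
```

Only the first line of output is used, because `docker port` may print one line per address family for the same host port.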
|
||||
|
||||
### Registry Mirror Configuration
|
||||
|
||||
Containerd uses `/etc/containerd/certs.d/<registry>/hosts.toml` files. The ansible role configures mirrors for:
|
||||
- `registry.tail8d86e.ts.net` (private images)
|
||||
- `docker.io`
|
||||
- `ghcr.io`
|
||||
- `quay.io`
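
As a rough illustration of what one of these `hosts.toml` files might contain, the sketch below renders a mirror entry for a single registry. The function name `render_hosts_toml` and the mirror address in the usage note are hypothetical; the real role may template these files differently.

```bash
# Sketch only: render a containerd hosts.toml for one registry.
# Assumption: the second argument is whatever address the zot cache is
# reachable at from inside the minikube node (a placeholder in the usage note).

render_hosts_toml() {
  local registry="$1" mirror_url="$2"
  cat <<EOF
server = "https://${registry}"

[host."${mirror_url}"]
  capabilities = ["pull", "resolve"]
EOF
}
```

For example, `render_hosts_toml docker.io http://zot.example:5050` prints a file suitable for `/etc/containerd/certs.d/docker.io/hosts.toml`, where `zot.example:5050` stands in for the real cache address.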
|
||||
|
||||
### ProxyClass Renamed
|
||||
|
||||
Changed from `crio-compat` to `default` - the old name was misleading since we're no longer using CRI-O.
|
||||
|
||||
### Volume Mounts for P6 (Kiwix/Transmission)
|
||||
|
||||
**Solution: Direct NFS from pods to sifaka** ✅ TESTED AND WORKING
|
||||
|
||||
Docker NATs outbound traffic through indri's LAN IP (192.168.1.50), so sifaka's NFS exports need to allow `192.168.1.0/24`.
|
||||
|
||||
Sifaka NFS exports configured:
|
||||
- `192.168.1.0/24` - Docker containers via indri NAT
|
||||
- `100.64.0.0/10` - Tailscale clients
|
||||
|
||||
Pods can mount NFS directly:
|
||||
```yaml
volumes:
  - name: torrents
    nfs:
      server: sifaka
      path: /volume1/torrents
```
|
||||
|
||||
No LaunchAgents, no `minikube mount`, no SMB CSI driver needed.
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
- [x] Docker Desktop installed and running on indri
|
||||
- [x] QEMU2 minikube deleted
|
||||
- [x] Docker minikube running (6 CPUs, 11GB RAM)
|
||||
- [x] API server accessible on localhost
|
||||
- [x] Tailscale serve configured for svc:k8s
|
||||
- [x] Remote kubectl access working from gilbert
|
||||
- [x] Ansible roles updated for docker driver
|
||||
- [x] socket_vmnet stopped
|
||||
- [x] ArgoCD deployed and synced
|
||||
- [x] All apps synced to feature branch
|
||||
- [x] Apply app secrets (grafana-admin, miniflux-db, devpi-root, eblume, borgmatic)
|
||||
- [x] Verify all apps healthy after secrets applied
|
||||
- [x] Miniflux database restored from borgmatic backup
|
||||
- [ ] Merge PR and reset apps to main branch
|
||||
- [ ] `mise run indri-services-check` passes
|
||||
|
||||
---
|
||||
|
||||
## Post-Merge Steps
|
||||
|
||||
After PR is merged:
|
||||
|
||||
```bash
# Reset all blumeops apps to main branch
argocd app set apps --revision main
argocd app set argocd --revision main
argocd app set blumeops-pg --revision main
argocd app set devpi --revision main
argocd app set grafana-config --revision main
argocd app set miniflux --revision main
argocd app set tailscale-operator --revision main

# Sync all apps
argocd app sync apps
argocd app sync argocd
argocd app sync tailscale-operator
argocd app sync blumeops-pg
argocd app sync grafana-config
argocd app sync miniflux
argocd app sync devpi
```
|
||||
|
||||
---
|
||||
|
||||
## Rollback Plan
|
||||
|
||||
If the Docker driver doesn't work:
|
||||
|
||||
1. Delete Docker minikube: `minikube delete`
|
||||
2. Recreate QEMU2 cluster (restore old ansible config from git)
|
||||
3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl
|
||||
|
|
@ -1,102 +0,0 @@
|
|||
# Phase 5: devpi Migration to Kubernetes
|
||||
|
||||
**Goal**: Migrate devpi PyPI caching proxy from indri to k8s
|
||||
|
||||
**Status**: Complete (2026-01-20)
|
||||
|
||||
**Prerequisites**: [Phase 4](P4_miniflux.complete.md) complete
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
Successfully migrated devpi from mcquack LaunchAgent on indri to Kubernetes:
|
||||
- Custom container image with devpi-server + devpi-web + auto-init startup script
|
||||
- StatefulSet with 50Gi PVC for data persistence
|
||||
- Tailscale Ingress at `pypi.tail8d86e.ts.net`
|
||||
- Root password from 1Password secret, auto-initialized on first run
|
||||
- Verified pip caching proxy and mcquack package upload
|
||||
|
||||
---
|
||||
|
||||
## Key Learnings
|
||||
|
||||
### Registry Mirror Configuration
|
||||
- Minikube's CRI-O can't resolve Tailscale hostnames directly
|
||||
- Added registry mirror config to redirect `registry.tail8d86e.ts.net` → `host.containers.internal:5050`
|
||||
- Also added direct insecure registry entry for `host.containers.internal:5050`
|
||||
- Config in `ansible/roles/minikube/files/zot-mirror.conf`
|
||||
|
||||
### Memory Requirements
|
||||
- devpi-web's Whoosh search indexer needs significant memory during PyPI index build
|
||||
- Initial 512Mi limit caused OOMKills
|
||||
- Solution: a high limit (2Gi) with a low request (256Mi); the memory is reclaimed once indexing finishes
|
||||
|
||||
### Environment Variable Conflicts
|
||||
- Kubernetes auto-sets `DEVPI_PORT` for service discovery
|
||||
- Conflicted with our port config - renamed to `DEVPI_LISTEN_PORT`
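
To illustrate the conflict: for a Service named `devpi`, Kubernetes injects Docker-link style variables such as `DEVPI_PORT=tcp://<cluster-ip>:<port>` into pods in the same namespace, so a start script that treats `DEVPI_PORT` as a plain port number breaks. The sketch below shows one way a start script could read only the renamed variable. The function name is hypothetical, and the fallback of 3141 (devpi-server's default port) is an assumption rather than something taken from the real `start.sh`.

```bash
# Sketch only: pick the listen port without touching DEVPI_PORT.

# resolve_listen_port: print the port devpi-server should bind to.
# Reads only DEVPI_LISTEN_PORT, so the injected DEVPI_PORT is ignored.
# Assumption: 3141 is used when DEVPI_LISTEN_PORT is unset or empty.
resolve_listen_port() {
  printf '%s\n' "${DEVPI_LISTEN_PORT:-3141}"
}
```

A start script could then launch the server with something like `devpi-server --port "$(resolve_listen_port)"`.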
|
||||
|
||||
### Tailscale Serve Cleanup
|
||||
- Use `tailscale serve status --json` to see entries (non-JSON output can be empty)
|
||||
- Use `tailscale serve clear svc:<name>` to remove entries
|
||||
|
||||
### ArgoCD Workflow
|
||||
- Changed `apps` to manual sync (was auto-sync with prune)
|
||||
- Workflow: sync apps → set revision to feature branch → sync service → test → reset to main after merge
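
The workflow above can be sketched as two small shell helpers. The function names and the `ARGOCD` override are hypothetical conveniences; the `argocd app set` and `argocd app sync` invocations are the same ones used elsewhere in these plans.

```bash
# Sketch only: the feature-branch cycle, wrapped in helpers.
# ARGOCD can be overridden (e.g. ARGOCD=echo) for a dry run.

# deploy_feature APP BRANCH: sync the `apps` application, point APP at
# BRANCH, then sync APP. Stops at the first failing step.
deploy_feature() {
  local app="$1" branch="$2"
  "${ARGOCD:-argocd}" app sync apps &&
    "${ARGOCD:-argocd}" app set "$app" --revision "$branch" &&
    "${ARGOCD:-argocd}" app sync "$app"
}

# reset_to_main APP: after the PR merges, return APP to main and sync it.
reset_to_main() {
  local app="$1"
  "${ARGOCD:-argocd}" app set "$app" --revision main &&
    "${ARGOCD:-argocd}" app sync "$app"
}
```

Testing the service happens between the two calls, for example `deploy_feature devpi feature/my-branch`, manual verification, then `reset_to_main devpi` once the PR is merged (the branch name here is a placeholder).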
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
- [x] devpi pod healthy in k8s
|
||||
- [x] https://pypi.tail8d86e.ts.net accessible
|
||||
- [x] Web interface shows root/pypi index
|
||||
- [x] `pip install <package>` works through proxy
|
||||
- [x] mcquack v1.0.0 uploaded to eblume/dev
|
||||
- [x] `pip install --index-url https://pypi.tail8d86e.ts.net/eblume/dev/+simple/ mcquack` works
|
||||
- [x] Old devpi service removed from indri
|
||||
- [x] zk documentation updated
|
||||
|
||||
---
|
||||
|
||||
## Files Changed
|
||||
|
||||
### New Files
|
||||
| Path | Purpose |
|------|---------|
| `argocd/apps/devpi.yaml` | ArgoCD Application definition |
| `argocd/manifests/devpi/Dockerfile` | Container image with startup script |
| `argocd/manifests/devpi/start.sh` | Auto-init startup script |
| `argocd/manifests/devpi/statefulset.yaml` | StatefulSet with PVC |
| `argocd/manifests/devpi/service.yaml` | ClusterIP Service |
| `argocd/manifests/devpi/ingress-tailscale.yaml` | Tailscale Ingress |
| `argocd/manifests/devpi/kustomization.yaml` | Kustomize configuration |
| `argocd/manifests/devpi/secret-root.yaml.tpl` | 1Password secret template |
| `argocd/manifests/devpi/README.md` | Setup documentation |
|
||||
|
||||
### Modified Files
|
||||
| Path | Change |
|------|--------|
| `CLAUDE.md` | Added k8s/ArgoCD workflow documentation |
| `ansible/playbooks/indri.yml` | Removed devpi and devpi_metrics roles |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Removed svc:pypi |
| `ansible/roles/alloy/defaults/main.yml` | Removed devpi log collection |
| `ansible/roles/borgmatic/defaults/main.yml` | Removed devpi backup paths |
| `ansible/roles/minikube/files/zot-mirror.conf` | Added registry mirror for Tailscale hostname |
| `argocd/apps/apps.yaml` | Changed to manual sync policy |
|
||||
|
||||
### Roles Kept (not deleted)
|
||||
- `ansible/roles/devpi/` - Kept for reference
|
||||
- `ansible/roles/devpi_metrics/` - Kept for reference
|
||||
|
||||
---
|
||||
|
||||
## Post-Merge Cleanup
|
||||
|
||||
After PR merge, reset ArgoCD apps to main:
|
||||
```fish
|
||||
argocd app set apps --revision main
|
||||
argocd app sync apps
|
||||
argocd app set devpi --revision main
|
||||
argocd app sync devpi
|
||||
```
|
||||
File diff suppressed because it is too large
|
|
@ -1,394 +0,0 @@
|
|||
# Phase 7: Forgejo Migration to Kubernetes
|
||||
|
||||
**Goal**: Migrate Forgejo from indri (macOS Homebrew) to Kubernetes via ArgoCD
|
||||
|
||||
**Status**: Planning (2026-01-21)
|
||||
|
||||
**Prerequisites**: [Phase 6](P6_kiwix.complete.md) complete
|
||||
|
||||
---
|
||||
|
||||
## Critical Risks & Mitigations
|
||||
|
||||
### 1. Circular Dependency (Highest Risk)
|
||||
|
||||
ArgoCD pulls manifests from Forgejo. If k8s Forgejo fails, we cannot redeploy it.
|
||||
|
||||
**Mitigation**: blumeops is mirrored to `github.com/eblume/blumeops`. DR procedure documented to switch ArgoCD to GitHub temporarily (see Disaster Recovery section).
|
||||
|
||||
### 2. Split Hostnames Required
|
||||
|
||||
The Tailscale k8s operator [cannot expose both HTTPS and TCP/SSH on the same hostname](https://github.com/tailscale/tailscale/issues/15539). See also [user comment](https://github.com/tailscale/tailscale/issues/15539#issuecomment-3782368432).
|
||||
|
||||
**Solution**:
|
||||
- **HTTPS (web UI)**: `forge.tail8d86e.ts.net` via Tailscale Ingress
|
||||
- **SSH (git operations)**: `git.tail8d86e.ts.net` via Tailscale LoadBalancer
|
||||
|
||||
---
|
||||
|
||||
## Current State
|
||||
|
||||
### Forgejo on indri
|
||||
|
||||
| Component | Location/Details |
|-----------|------------------|
| Data directory | `/opt/homebrew/var/forgejo/` (~426MB) |
| SQLite database | `/opt/homebrew/var/forgejo/data/forgejo.db` (4.1MB) |
| Git repositories | `/opt/homebrew/var/forgejo/data/forgejo-repositories/` (~418MB) |
| Configuration | `/opt/homebrew/var/forgejo/custom/conf/app.ini` (contains secrets) |
| HTTP port | 3001 (localhost) |
| SSH port | 2200 (localhost) |
| Tailscale | `svc:forge` with tcp:22→2200 and https:443→3001 |
| Backup | borgmatic backs up to sifaka |
|
||||
|
||||
### Hosted Repositories (8 total)
|
||||
|
||||
- blumeops (mirrored to GitHub)
|
||||
- cloudnative-pg-charts
|
||||
- csi-driver-smb
|
||||
- devpi
|
||||
- dotfiles
|
||||
- grafana-helm-charts
|
||||
- mcquack
|
||||
- zot
|
||||
|
||||
---
|
||||
|
||||
## Architecture Decision: Helm Chart via ArgoCD
|
||||
|
||||
Following established pattern from cloudnative-pg and grafana:
|
||||
1. Mirror `https://code.forgejo.org/forgejo-helm/forgejo-helm` to forge
|
||||
2. ArgoCD Application with multi-source (chart + values)
|
||||
3. Values file in `argocd/manifests/forgejo/values.yaml`
|
||||
|
||||
---
|
||||
|
||||
## All `forge` References Requiring Update
|
||||
|
||||
### SSH URLs (change to `git.tail8d86e.ts.net:22`)
|
||||
|
||||
| File | Current | After |
|------|---------|-------|
| `argocd/apps/apps.yaml` | `ssh://forgejo@indri.tail8d86e.ts.net:2200/...` | `ssh://forgejo@git.tail8d86e.ts.net/...` |
| `argocd/apps/argocd.yaml` | same | same |
| `argocd/apps/blumeops-pg.yaml` | same | same |
| `argocd/apps/cloudnative-pg.yaml` | same | same |
| `argocd/apps/devpi.yaml` | same | same |
| `argocd/apps/grafana.yaml` | same | same |
| `argocd/apps/grafana-config.yaml` | same | same |
| `argocd/apps/kiwix.yaml` | same | same |
| `argocd/apps/miniflux.yaml` | same | same |
| `argocd/apps/tailscale-operator.yaml` | same | same |
| `argocd/apps/torrent.yaml` | same | same |
| `argocd/manifests/argocd/repo-forge-secret.yaml.tpl` | `ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/` | `ssh://forgejo@git.tail8d86e.ts.net/eblume/` |
| `ansible/group_vars/all.yml` | `ssh://forgejo@forge.tail8d86e.ts.net/...` | `ssh://forgejo@git.tail8d86e.ts.net/...` |
|
||||
|
||||
### SSH Known Hosts (add `git.tail8d86e.ts.net`)
|
||||
|
||||
| File | Change |
|------|--------|
| `argocd/manifests/argocd/argocd-ssh-known-hosts-cm.yaml` | Add `git.tail8d86e.ts.net ssh-ed25519 AAAA...` |
|
||||
|
||||
### HTTPS URLs (stay as `forge.tail8d86e.ts.net`)
|
||||
|
||||
These remain unchanged:
|
||||
- `CLAUDE.md:135` - Mirror location
|
||||
- `mise-tasks/pr-comments:23` - Forge API base
|
||||
- `mise-tasks/indri-services-check:65` - HTTP health check (update to check k8s)
|
||||
|
||||
### Ansible/Indri Cleanup (remove after migration)
|
||||
|
||||
| File | Action |
|------|--------|
| `ansible/playbooks/indri.yml:36-37` | Remove forgejo role |
| `ansible/roles/tailscale_serve/defaults/main.yml:6` | Remove `svc:forge` entry |
| `ansible/roles/alloy/defaults/main.yml:31-32` | Remove forgejo log collection |
| `ansible/roles/borgmatic/defaults/main.yml:17` | Update backup path |
|
||||
|
||||
### Tailscale/Pulumi (update after hostname cutover)
|
||||
|
||||
| File | Change |
|------|--------|
| `argocd/manifests/tailscale-operator/egress-forge.yaml` | Delete (no longer needed) |
| `pulumi/policy.hujson` | Update `tag:forge` ACLs for k8s source |
|
||||
|
||||
---
|
||||
|
||||
## Pre-Migration Checklist
|
||||
|
||||
- [ ] GitHub mirror verified current
|
||||
- [ ] Full borgmatic backup completed and verified
|
||||
- [ ] Manual backup of `/opt/homebrew/var/forgejo` on indri
|
||||
- [ ] Document all SSH deploy keys and webhooks
|
||||
- [ ] **User action**: Mirror forgejo-helm chart to forge
|
||||
- [ ] Extract secrets from app.ini to 1Password:
|
||||
- `INTERNAL_TOKEN`
|
||||
- `SECRET_KEY`
|
||||
- `JWT_SECRET`
|
||||
- Any OAuth/webhook secrets
|
||||
|
||||
---
|
||||
|
||||
## Steps
|
||||
|
||||
### Phase A: Create k8s Manifests
|
||||
|
||||
**New Files:**
|
||||
```
argocd/apps/forgejo.yaml                      # ArgoCD Application (multi-source Helm)
argocd/manifests/forgejo/values.yaml          # Helm chart values
argocd/manifests/forgejo/kustomization.yaml   # Kustomize config
argocd/manifests/forgejo/pvc.yaml             # 10Gi PersistentVolumeClaim
argocd/manifests/forgejo/secret-app.yaml.tpl  # Secrets from 1Password
```
|
||||
|
||||
**Key values.yaml settings:**
|
||||
```yaml
service:
  ssh:
    type: LoadBalancer
    loadBalancerClass: tailscale
    port: 22
    annotations:
      tailscale.com/hostname: "git-1"  # Test hostname first

ingress:
  enabled: true
  className: tailscale
  hosts:
    - host: forge-1  # Test hostname first

gitea:
  config:
    server:
      DOMAIN: forge-1.tail8d86e.ts.net
      ROOT_URL: https://forge-1.tail8d86e.ts.net/
      SSH_DOMAIN: git-1.tail8d86e.ts.net
      SSH_PORT: 22
    database:
      DB_TYPE: sqlite3
      PATH: /data/forgejo.db
```
|
||||
|
||||
---
|
||||
|
||||
### Phase B: Deploy to Test Hostnames
|
||||
|
||||
1. Create feature branch, push to forge
|
||||
2. Sync ArgoCD apps: `argocd app sync apps`
|
||||
3. Point forgejo app to feature branch: `argocd app set forgejo --revision feature/p7-forgejo`
|
||||
4. Sync forgejo app: `argocd app sync forgejo`
|
||||
5. Verify pods running (empty data initially)
|
||||
|
||||
---
|
||||
|
||||
### Phase C: Data Migration (~10 min downtime)
|
||||
|
||||
1. **Stop indri Forgejo**
|
||||
```bash
|
||||
ssh indri 'brew services stop forgejo'
|
||||
```
|
||||
|
||||
2. **Copy data** (option A: rsync via NFS staging)
|
||||
```bash
|
||||
ssh indri 'rsync -avP /opt/homebrew/var/forgejo/ sifaka:/volume1/forgejo-migration/'
|
||||
```
|
||||
|
||||
3. **Copy to PVC and fix permissions**
|
||||
```bash
|
||||
kubectl exec -n forgejo deployment/forgejo -- rsync -avP /staging/ /data/
|
||||
kubectl exec -n forgejo deployment/forgejo -- chown -R 1000:1000 /data
|
||||
```
|
||||
|
||||
4. **Restart Forgejo**
|
||||
```bash
|
||||
kubectl rollout restart deployment/forgejo -n forgejo
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Phase D: Validation (Critical)
|
||||
|
||||
- [ ] Web UI accessible at `forge-1.tail8d86e.ts.net`
|
||||
- [ ] SSH works: `ssh -T forgejo@git-1.tail8d86e.ts.net`
|
||||
- [ ] All 8 repos visible and accessible
|
||||
- [ ] Git clone works
|
||||
- [ ] Git push works (test on non-critical repo)
|
||||
- [ ] eblume user preserved with correct permissions
|
||||
- [ ] PR history intact
|
||||
- [ ] Webhooks functioning
|
||||
- [ ] GitHub mirror push still works
|
||||
|
||||
---
|
||||
|
||||
### Phase E: Hostname Cutover
|
||||
|
||||
1. **Clear indri Tailscale serve**
|
||||
```bash
|
||||
ssh indri 'tailscale serve clear svc:forge'
|
||||
```
|
||||
|
||||
2. **User action**: Delete `svc:forge` and `forge-1` devices from Tailscale admin
|
||||
|
||||
3. **Update manifests**: Change `forge-1` → `forge`, `git-1` → `git`
|
||||
|
||||
4. **Sync ArgoCD**
|
||||
|
||||
5. **Verify hostnames claimed**
|
||||
```bash
|
||||
curl https://forge.tail8d86e.ts.net/api/v1/version
|
||||
ssh -T forgejo@git.tail8d86e.ts.net
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Phase F: Update ArgoCD to Use New Forgejo
|
||||
|
||||
1. **Get SSH host key from k8s Forgejo**
|
||||
```bash
|
||||
kubectl exec -n forgejo deployment/forgejo -- cat /data/ssh/ssh_host_ed25519_key.pub
|
||||
```
|
||||
|
||||
2. **Update known_hosts ConfigMap** with `git.tail8d86e.ts.net` key
|
||||
|
||||
3. **Update repo-creds-forge secret** (manual kubectl commands)
|
||||
|
||||
4. **Update all ArgoCD Application manifests** with new repoURL
|
||||
|
||||
5. **Delete egress-forge.yaml** (no longer needed)
|
||||
|
||||
6. **Sync ArgoCD** and verify all apps sync successfully
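
Steps 1 and 2 above can be combined: the public key file has the form `type base64 [comment]`, while a known_hosts entry has the form `host type base64`. A minimal sketch of that conversion follows; the helper name `known_hosts_line` is hypothetical.

```bash
# Sketch only: turn an OpenSSH public key into a known_hosts entry.

# known_hosts_line HOST: read a public key ("type base64 [comment]") on
# stdin and print "HOST type base64", dropping any trailing comment.
known_hosts_line() {
  local host="$1" key_type key_data _comment
  read -r key_type key_data _comment || true
  printf '%s %s %s\n' "$host" "$key_type" "$key_data"
}

# Example wiring (not run here; needs cluster access):
#   kubectl exec -n forgejo deployment/forgejo -- \
#     cat /data/ssh/ssh_host_ed25519_key.pub | known_hosts_line git.tail8d86e.ts.net
```

The printed line is what would be added to `argocd-ssh-known-hosts-cm.yaml`.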
|
||||
|
||||
---
|
||||
|
||||
### Phase G: Update Local Git Remotes
|
||||
|
||||
```bash
|
||||
cd ~/code/personal/blumeops
|
||||
git remote set-url origin ssh://forgejo@git.tail8d86e.ts.net/eblume/blumeops.git
|
||||
# Repeat for all 8 repos
|
||||
```
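
Repeating the command for every repository can be scripted. The sketch below assumes all eight repos are owned by `eblume` on the forge and are cloned under `~/code/personal/<repo>`; both are assumptions extrapolated from the single blumeops example, and mirrored repos may live under a different owner.

```bash
# Sketch only: update every local clone to the new SSH hostname.
# Assumptions: owner "eblume" and clones under ~/code/personal/<repo>.

REPOS="blumeops cloudnative-pg-charts csi-driver-smb devpi dotfiles grafana-helm-charts mcquack zot"

# new_remote_url REPO: print the post-migration SSH URL for REPO.
new_remote_url() {
  printf 'ssh://forgejo@git.tail8d86e.ts.net/eblume/%s.git\n' "$1"
}

# update_all_remotes [BASE_DIR]: rewrite origin for each clone found.
update_all_remotes() {
  local base="${1:-$HOME/code/personal}" repo
  for repo in $REPOS; do
    # Skip repos that are not cloned locally.
    [ -d "$base/$repo/.git" ] || continue
    git -C "$base/$repo" remote set-url origin "$(new_remote_url "$repo")"
  done
}
```

Running `update_all_remotes` with no arguments uses the default base directory; pass another path if the clones live elsewhere.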
|
||||
|
||||
---
|
||||
|
||||
### Phase H: Cleanup
|
||||
|
||||
1. Remove forgejo role from `ansible/playbooks/indri.yml`
|
||||
2. Remove `svc:forge` from `ansible/roles/tailscale_serve/defaults/main.yml`
|
||||
3. Remove forgejo log collection from `ansible/roles/alloy/defaults/main.yml`
|
||||
4. Delete `argocd/manifests/tailscale-operator/egress-forge.yaml`
|
||||
5. Update `mise-tasks/indri-services-check`
|
||||
6. Run ansible to clean up indri: `mise run provision-indri -- --tags tailscale-serve,alloy`
|
||||
7. Update zk documentation (forgejo, argocd, blumeops cards)
|
||||
8. Merge PR
|
||||
9. Reset ArgoCD to main
|
||||
|
||||
---
|
||||
|
||||
## Disaster Recovery Procedure
|
||||
|
||||
**Add to [[forgejo]] zk card:**
|
||||
|
||||
### When Forgejo is Unavailable
|
||||
|
||||
1. **Add GitHub repository to ArgoCD**
|
||||
```bash
# Quote the command substitution so the token is passed as a single argument
argocd repo add https://github.com/eblume/blumeops.git \
  --username eblume \
  --password "$(op read "op://<vault>/<item>/github-pat")"
```
|
||||
|
||||
2. **Point critical apps to GitHub**
|
||||
```bash
|
||||
argocd app set apps --repo https://github.com/eblume/blumeops.git
|
||||
argocd app set forgejo --repo https://github.com/eblume/blumeops.git
|
||||
argocd app sync forgejo
|
||||
```
|
||||
|
||||
3. **Fix Forgejo** (restore from backup, fix config, etc.)
|
||||
|
||||
4. **Verify Forgejo is healthy**
|
||||
```bash
|
||||
curl https://forge.tail8d86e.ts.net/api/v1/version
|
||||
ssh -T forgejo@git.tail8d86e.ts.net
|
||||
```
|
||||
|
||||
5. **Switch back to Forgejo**
|
||||
```bash
|
||||
argocd app set apps --repo ssh://forgejo@git.tail8d86e.ts.net/eblume/blumeops.git
|
||||
argocd app set forgejo --repo ssh://forgejo@git.tail8d86e.ts.net/eblume/blumeops.git
|
||||
argocd app sync apps
|
||||
argocd repo rm https://github.com/eblume/blumeops.git
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Files Summary
|
||||
|
||||
### New Files
|
||||
|
||||
| Path | Purpose |
|------|---------|
| `argocd/apps/forgejo.yaml` | ArgoCD Application (multi-source Helm) |
| `argocd/manifests/forgejo/values.yaml` | Helm chart values |
| `argocd/manifests/forgejo/kustomization.yaml` | Kustomize config |
| `argocd/manifests/forgejo/pvc.yaml` | 10Gi PersistentVolumeClaim |
| `argocd/manifests/forgejo/secret-app.yaml.tpl` | Secrets template |
|
||||
|
||||
### Modified Files
|
||||
|
||||
| Path | Change |
|------|--------|
| All `argocd/apps/*.yaml` | Update repoURL to `git.tail8d86e.ts.net` |
| `argocd/manifests/argocd/argocd-ssh-known-hosts-cm.yaml` | Add `git.tail8d86e.ts.net` |
| `argocd/manifests/argocd/repo-forge-secret.yaml.tpl` | Update URL |
| `ansible/playbooks/indri.yml` | Remove forgejo role |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Remove `svc:forge` |
| `ansible/roles/alloy/defaults/main.yml` | Remove forgejo logs |
|
||||
|
||||
### Files to Delete
|
||||
|
||||
| Path | Reason |
|------|--------|
| `argocd/manifests/tailscale-operator/egress-forge.yaml` | No longer needed |
|
||||
|
||||
---
|
||||
|
||||
## Rollback
|
||||
|
||||
If migration fails at any point:
|
||||
|
||||
1. **Delete k8s resources**
|
||||
```bash
|
||||
argocd app delete forgejo --cascade
|
||||
kubectl delete namespace forgejo
|
||||
```
|
||||
|
||||
2. **Restart indri Forgejo**
|
||||
```bash
|
||||
ssh indri 'brew services start forgejo'
|
||||
```
|
||||
|
||||
3. **Re-enable Tailscale serve**
|
||||
```bash
|
||||
mise run provision-indri -- --tags tailscale-serve
|
||||
```
|
||||
|
||||
4. **Revert ArgoCD apps to indri URLs** (if changed)
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
- [ ] GitHub mirror verified current
|
||||
- [ ] Helm chart mirrored to forge
|
||||
- [ ] Secrets extracted to 1Password
|
||||
- [ ] k8s Forgejo pod running
|
||||
- [ ] All 8 repos accessible
|
||||
- [ ] SSH clone/push works via `git.tail8d86e.ts.net`
|
||||
- [ ] HTTPS works via `forge.tail8d86e.ts.net`
|
||||
- [ ] ArgoCD syncs from new URL
|
||||
- [ ] All local remotes updated
|
||||
- [ ] Indri cleanup complete
|
||||
- [ ] zk docs updated
|
||||
- [ ] DR procedure documented in [[forgejo]] card
|
||||
|
|
@ -1,32 +0,0 @@
|
|||
# Phase 8: CI/CD (Woodpecker)
|
||||
|
||||
**Goal**: Deploy Woodpecker CI integrated with Forgejo
|
||||
|
||||
**Status**: Pending
|
||||
|
||||
**Prerequisites**: [Phase 7](P7_forgejo.md) complete
|
||||
|
||||
---
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Create Forgejo OAuth application
|
||||
|
||||
- Callback: https://ci.tail8d86e.ts.net/authorize
|
||||
- Store in 1Password
|
||||
|
||||
---
|
||||
|
||||
### 2. Deploy Woodpecker Server + Agent
|
||||
|
||||
---
|
||||
|
||||
### 3. Configure Tailscale LoadBalancer
|
||||
|
||||
Tag: `svc:ci`
|
||||
|
||||
---
|
||||
|
||||
### 4. Test pipeline
|
||||
|
||||
Create `.woodpecker.yaml` in test repo
|
||||
|
|
@ -1,52 +0,0 @@
|
|||
# Phase 9: Cleanup
|
||||
|
||||
**Goal**: Remove deprecated services, harden system
|
||||
|
||||
**Status**: Pending
|
||||
|
||||
**Prerequisites**: [Phase 8](P8_woodpecker.md) complete
|
||||
|
||||
---
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Stop/remove unused brew services
|
||||
|
||||
- postgresql@18
|
||||
- grafana
|
||||
- miniflux
|
||||
- forgejo
|
||||
|
||||
---
|
||||
|
||||
### 2. Update ansible playbook
|
||||
|
||||
- Remove migrated service roles
|
||||
- Add k8s deployment references
|
||||
|
||||
---
|
||||
|
||||
### 3. Configure Velero backups (optional)
|
||||
|
||||
- Install with MinIO on sifaka
|
||||
- Schedule daily cluster backups
|
||||
|
||||
---
|
||||
|
||||
### 4. Update zk documentation
|
||||
|
||||
- New architecture
|
||||
- Runbooks
|
||||
- DR procedures
|
||||
|
||||
---
|
||||
|
||||
## Plan Completion
|
||||
|
||||
When all phases are complete and verified:
|
||||
|
||||
```bash
|
||||
# Rename this folder to indicate completion
|
||||
git mv plans/k8s-migration plans/k8s-migration.complete
|
||||
git commit -m "Complete k8s migration plan"
|
||||
```
|
||||