Remove plans, they don't seem to work
All checks were successful
Test CI / test (push) Successful in 3s
parent 8ca8798121
commit ceba6b3c2c
18 changed files with 0 additions and 6816 deletions
@ -1,179 +0,0 @@
# Forgejo Actions CI/CD Bootstrap Plan

This plan details the setup of Forgejo Actions as the CI/CD system for blumeops, starting with the bootstrapping problem: using Forgejo to build and deploy Forgejo itself.

## Goals

1. **Forgejo Actions** as the primary CI system (replaces Woodpecker from the original plan)
2. **Self-hosted Forgejo** built from source, deployed as a mcquack LaunchAgent on indri
3. **Container builds** for ArgoCD manifests (devpi, etc.)
4. **Cron-scheduled tasks** via k8s CronJobs (not Actions)
5. **Local development** parity using `act` for workflow testing

## Why Forgejo Actions over Woodpecker?

- Native integration with Forgejo (no OAuth setup, automatic repo detection)
- GitHub Actions compatible syntax (huge ecosystem of reusable actions)
- `act` tool for local testing on gilbert
- Single system to maintain instead of two

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                              INDRI                              │
│  ┌─────────────────────┐                                        │
│  │       Forgejo       │  ← Built from source                   │
│  │   (mcquack agent)   │  ← Deploys itself via CI               │
│  │                     │                                        │
│  │  - Web UI (3001)    │                                        │
│  │  - SSH (2200)       │                                        │
│  │  - Actions enabled  │                                        │
│  └─────────────────────┘                                        │
└─────────────────────────────────────────────────────────────────┘
                                │
                                │ SSH deploy
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      KUBERNETES (minikube)                      │
│  ┌─────────────────────┐    ┌─────────────────────┐             │
│  │   Forgejo Runner    │    │   Other Services    │             │
│  │     (host mode)     │    │    (via ArgoCD)     │             │
│  │                     │    │                     │             │
│  │  - Custom image     │    │                     │             │
│  │  - Node.js + tools  │    │                     │             │
│  │  - Docker builds    │    │                     │             │
│  └─────────────────────┘    └─────────────────────┘             │
└─────────────────────────────────────────────────────────────────┘
```

## Phases

| Phase | Name | Description | Status |
|-------|------|-------------|--------|
| 1 | [Enable Actions](P1_enable_actions.md) | Configure Forgejo for Actions, deploy runner in host mode | ✅ Complete |
| 2 | [Custom Runner Image](P2_mirror_and_build.md) | Build custom runner with Node.js/tools, enable standard Actions | ✅ Complete |
| 3 | [Mirror Forgejo & Build](P3_mirror_forgejo.md) | Mirror upstream Forgejo, create build workflow | Planning |
| 4 | [Self-Deploy](P4_self_deploy.md) | Forgejo deploys itself, transition to mcquack | Planning |
| 5 | [Container Builds](P5_container_builds.md) | Build custom container images (devpi, etc.) | Planning |

## The Bootstrap Problem

**Chicken-and-egg**: We need Forgejo Actions to build Forgejo, but Forgejo must be running first.

**Additional complication**: The stock runner image lacks Node.js, so standard GitHub Actions don't work.

**Solution**:
1. Keep current brew-based Forgejo running during setup ✅
2. Enable Actions, deploy runner in host mode ✅
3. **Build custom runner image** with Node.js and tools (bootstrap manually, then automate)
4. Mirror upstream Forgejo, create build workflow
5. Address cross-compilation challenge (Linux runner → macOS target)
6. First CI build creates the binary
7. CI deploys binary to indri as mcquack service
8. `brew services stop forgejo` and uninstall
9. Future builds: Forgejo builds and deploys itself

**Cross-compilation challenge**:
The runner runs in Linux containers (k8s), but Forgejo needs to run on indri (macOS ARM64). Options:
- Cross-compile with CGO_ENABLED=1 (complex, needs OSX toolchain)
- Cross-compile with CGO_ENABLED=0 (breaks Tailscale DNS resolution)
- Build on gilbert manually, use CI only for deploy
- Run a native macOS runner on indri (outside k8s)

This will be addressed in Phase 3.

**Risk mitigation**: If self-deployment breaks Forgejo:
- blumeops is mirrored to GitHub
- Manual recovery: build on gilbert, scp to indri, restart service
- See Disaster Recovery section in P4

## Host Mode Runner

The runner uses **host mode** (`ubuntu-latest:host`), meaning:
- Jobs run directly in the runner container (no Docker/k8s pods spawned)
- Tools must be pre-installed in the runner image
- Stock image lacks Node.js, so `actions/checkout@v4` doesn't work
- Solution: Build custom runner image with necessary tools (Phase 2)
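
Because host-mode jobs can only use what the runner image ships, a quick preflight inside the runner container can catch a missing tool before a workflow does. The following is a minimal sketch, not part of forgejo-runner: `missing_tools` is a hypothetical helper, and the tool list in the usage comment is an assumption based on the tools named in this plan.

```bash
# Hypothetical preflight helper: prints each requested tool that is not on PATH.
# Exit status is 0 when every tool is present, 1 otherwise.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (assumed tool list; adjust to what the workflows actually call):
#   missing_tools node git curl jq make || echo "runner image is incomplete"
```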

## Ansible Role Strategy

The forgejo ansible role will follow the zot/alloy pattern:

1. **Check binary exists** at expected path
2. **If missing**: Fail with message pointing to CI trigger instructions
3. **If present**: Deploy config, ensure LaunchAgent loaded
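
The gate in the steps above can be sketched in shell to make the intended behaviour concrete. This is an illustration only: the real check would live in the Ansible role, `check_forgejo_binary` is a hypothetical name, and the default path is an assumption because the plan does not state where the binary is installed.

```bash
# Hypothetical install path; the plan does not name the real one.
FORGEJO_BIN="${FORGEJO_BIN:-$HOME/.local/bin/forgejo}"

# Fails with a pointer to CI when the binary is absent, mirroring step 2.
# Succeeds when it is present, so config and LaunchAgent deployment (step 3) can proceed.
check_forgejo_binary() {
    local bin="$1"
    if [ ! -x "$bin" ]; then
        echo "forgejo binary not found at $bin; trigger the CI build first" >&2
        return 1
    fi
    echo "forgejo binary present at $bin"
}

# Usage:
#   check_forgejo_binary "$FORGEJO_BIN" || exit 1
```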

Ansible does NOT:
- Build the binary (that's CI's job)
- Deploy new versions (that's CI's job)

Ansible DOES:
- Manage app.ini configuration (via template with secrets from 1Password)
- Manage mcquack LaunchAgent plist
- Ensure service is running
- Collect logs via Alloy

## Files Summary

### New Files

| Path | Purpose |
|------|---------|
| `argocd/apps/forgejo-runner.yaml` | ArgoCD Application for runner ✅ |
| `argocd/manifests/forgejo-runner/` | Runner k8s manifests ✅ |
| `argocd/manifests/forgejo-runner/Dockerfile` | Custom runner image (P2) |
| `.forgejo/workflows/build-runner.yml` | Auto-rebuild runner image (P2) |
| `.forgejo/workflows/test.yml` | Test workflow ✅ |
| (on forge) `eblume/forgejo/.forgejo/workflows/` | Build workflow in forgejo mirror (P3) |

### Modified Files

| Path | Change |
|------|--------|
| `ansible/roles/forgejo/` | Complete rewrite for mcquack pattern (P4) |
| `ansible/roles/alloy/defaults/main.yml` | Update forgejo log paths (P4) |
| zk cards | Update forgejo, argocd, blumeops cards |

### Credentials Needed

| Item | Purpose | Storage |
|------|---------|---------|
| Runner registration token | Runner auth to Forgejo | 1Password ✅ |
| SSH deploy key | Runner SSH to indri (for Forgejo deploy) | 1Password + k8s secret (P3) |

## Related Plans

- [P7_forgejo.md](../k8s-migration/P7_forgejo.md) - Original k8s migration plan (superseded for Forgejo itself, but SSH hostname split info still relevant)
- [P8_woodpecker.md](../k8s-migration/P8_woodpecker.md) - Original Woodpecker plan (superseded by Forgejo Actions)

## Decision Log

### 2026-01-23: Custom runner image as Phase 2

**Decision**: Move custom runner image work from P4 to P2

**Rationale**:
- Stock runner lacks Node.js, can't run `actions/checkout@v4`
- Need working GitHub Actions before building Forgejo
- Bootstrap manually (podman build on gilbert), then automate

### 2026-01-23: Forgejo Actions over Woodpecker

**Decision**: Use Forgejo Actions instead of Woodpecker CI

**Rationale**:
- Native Forgejo integration (Actions is built-in)
- GitHub Actions compatible (reuse existing actions)
- `act` for local testing
- One less system to deploy and maintain

### 2026-01-23: Keep Forgejo on indri (not k8s)

**Decision**: Forgejo stays on indri as mcquack service, not migrated to k8s

**Rationale**:
- Avoid circular dependency (ArgoCD needs Forgejo to deploy Forgejo)
- Simpler SSH handling (direct port, no k8s networking complexity)
- Forgejo is critical infrastructure, benefits from isolation
- Can still use Tailscale serve for external access
@ -1,322 +0,0 @@
# Phase 1: Enable Forgejo Actions

**Goal**: Configure Forgejo to support Actions workflows and deploy a runner in k8s

**Status**: Completed (2026-01-23)

**Prerequisites**: None (uses existing brew-based Forgejo)

---

## Current State

- Forgejo runs via `brew services` on indri
- Config at `/opt/homebrew/var/forgejo/custom/conf/app.ini`
- Actions not enabled
- No runners deployed

---

## Step 1: Enable Actions in Forgejo

### 1.1 Update app.ini

SSH to indri and edit the Forgejo config (`-t` allocates the terminal that vim needs):

```bash
ssh -t indri 'vim /opt/homebrew/var/forgejo/custom/conf/app.ini'
```

Add the following sections:

```ini
[actions]
ENABLED = true
DEFAULT_ACTIONS_URL = https://code.forgejo.org

[repository]
; Allow workflows to be stored in .forgejo/workflows
DEFAULT_REPO_UNITS = repo.code,repo.issues,repo.pulls,repo.releases,repo.wiki,repo.projects,repo.packages,repo.actions
```
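
Before restarting, it can help to confirm that the edit landed in the right section of the file. The following is a minimal sketch under stated assumptions: `actions_enabled` is a hypothetical helper, and it only understands the plain `KEY = value` layout shown above, not every INI variant Forgejo accepts.

```bash
# Hypothetical check: succeeds only if the [actions] section sets ENABLED = true.
# An ENABLED key in any other section is ignored.
actions_enabled() {
    awk '
        /^\[/ { in_actions = ($0 == "[actions]") }
        in_actions && /^[ \t]*ENABLED[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop the key and the equals sign
            sub(/[ \t]*$/, "")         # drop trailing whitespace
            if (tolower($0) == "true") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Usage (run on indri, path taken from this plan):
#   actions_enabled /opt/homebrew/var/forgejo/custom/conf/app.ini && echo "Actions enabled"
```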

### 1.2 Restart Forgejo

```bash
ssh indri 'brew services restart forgejo'
```

### 1.3 Verify Actions Enabled

1. Go to https://forge.tail8d86e.ts.net
2. Navigate to any repo → Settings → Actions
3. Should see "Enable Repository Actions" option

---

## Step 2: Create Runner Registration Token

### 2.1 Generate Token in Forgejo UI

1. Go to https://forge.tail8d86e.ts.net/admin/actions/runners
2. Click "Create new Runner"
3. Copy the registration token
4. Store in 1Password (blumeops vault) as "Forgejo Runner Token"

### 2.2 Create k8s Secret Template

Create `argocd/manifests/forgejo-runner/secret-token.yaml.tpl`:

```yaml
# Template for op inject
apiVersion: v1
kind: Secret
metadata:
  name: forgejo-runner-token
  namespace: forgejo-runner
type: Opaque
stringData:
  token: "op://blumeops/<runner-token-item>/token"
```

---

## Step 3: Deploy Runner to Kubernetes

### 3.1 Create ArgoCD Application

Create `argocd/apps/forgejo-runner.yaml`:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: forgejo-runner
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/forgejo-runner
  destination:
    server: https://kubernetes.default.svc
    namespace: forgejo-runner
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```

### 3.2 Create Runner Manifests

Create directory `argocd/manifests/forgejo-runner/` with:

**kustomization.yaml**:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: forgejo-runner
resources:
  - namespace.yaml
  - deployment.yaml
  - serviceaccount.yaml
  - secret-token.yaml
```

**namespace.yaml**:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: forgejo-runner
```

**serviceaccount.yaml**:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
```

**deployment.yaml**:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: forgejo-runner
  template:
    metadata:
      labels:
        app: forgejo-runner
    spec:
      serviceAccountName: forgejo-runner
      containers:
        - name: runner
          image: code.forgejo.org/forgejo/runner:3.5.1
          env:
            - name: FORGEJO_INSTANCE_URL
              value: "https://forge.tail8d86e.ts.net"
            - name: RUNNER_NAME
              value: "k8s-runner-1"
            - name: RUNNER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: forgejo-runner-token
                  key: token
          command:
            - /bin/sh
            - -c
            - |
              # Register runner if not already registered
              if [ ! -f /data/.runner ]; then
                forgejo-runner register \
                  --instance "$FORGEJO_INSTANCE_URL" \
                  --token "$RUNNER_TOKEN" \
                  --name "$RUNNER_NAME" \
                  --labels "ubuntu-latest:docker://node:20-bookworm,ubuntu-22.04:docker://ubuntu:22.04" \
                  --no-interactive
              fi
              # Start the runner daemon
              forgejo-runner daemon
          volumeMounts:
            - name: runner-data
              mountPath: /data
            - name: docker-sock
              mountPath: /var/run/docker.sock
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
      volumes:
        - name: runner-data
          emptyDir: {}
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
            type: Socket
```

**Note**: The runner needs access to Docker to run workflow jobs in containers. In minikube with docker driver, `/var/run/docker.sock` is available.

---

## Step 4: Deploy and Verify

### 4.1 Inject Secrets and Deploy

```bash
# Inject secrets
op inject -i argocd/manifests/forgejo-runner/secret-token.yaml.tpl \
  -o argocd/manifests/forgejo-runner/secret-token.yaml

# Sync apps
argocd app sync apps
argocd app sync forgejo-runner
```
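
If `op inject` cannot resolve a reference, the rendered file may still carry the `op://` placeholder, and the runner would then try to register with that literal string. The following is a minimal sketch of a guard that could run between the inject and the sync; `secret_is_rendered` is a hypothetical helper, not part of the 1Password CLI.

```bash
# Hypothetical guard: fails if the rendered secret is missing, empty,
# or still contains an unresolved op:// reference.
secret_is_rendered() {
    local file="$1"
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    if grep -q 'op://' "$file"; then
        echo "unresolved op:// reference in $file" >&2
        return 1
    fi
}

# Usage:
#   secret_is_rendered argocd/manifests/forgejo-runner/secret-token.yaml || exit 1
```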

### 4.2 Verify Runner Registration

```bash
# Check runner pod
kubectl --context=minikube-indri -n forgejo-runner get pods

# Check runner logs
kubectl --context=minikube-indri -n forgejo-runner logs -f deployment/forgejo-runner

# Verify in Forgejo UI
# Go to https://forge.tail8d86e.ts.net/admin/actions/runners
# Should see "k8s-runner-1" as online
```

---

## Step 5: Test with Simple Workflow

### 5.1 Create Test Workflow

In the blumeops repo, create `.forgejo/workflows/test.yml`:

```yaml
name: Test CI

on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Hello World
        run: |
          echo "Hello from Forgejo Actions!"
          echo "Runner: ${{ runner.name }}"
          echo "Repo: ${{ github.repository }}"
```

### 5.2 Push and Verify

```bash
git add .forgejo/
git commit -m "Add test workflow for Forgejo Actions"
git push
```

Check https://forge.tail8d86e.ts.net/eblume/blumeops/actions for the workflow run.

---

## Verification Checklist

- [x] Actions enabled in app.ini
- [x] Forgejo restarted successfully
- [x] Runner token stored in 1Password
- [x] Runner deployment created in ArgoCD
- [x] Runner pod running in k8s
- [x] Runner shows as online in Forgejo admin
- [x] Test workflow runs successfully

---

## Troubleshooting

### Runner Can't Connect to Forgejo

The runner needs to reach `forge.tail8d86e.ts.net` from inside k8s. This should work via Tailscale operator egress (already configured for ArgoCD).

If it does not work:
```bash
# Test from inside k8s
kubectl --context=minikube-indri run -it --rm curl --image=curlimages/curl -- \
  curl -v https://forge.tail8d86e.ts.net/api/v1/version
```

### Docker Socket Permission Denied

The runner container needs to access the Docker socket. In minikube with docker driver, this should work. If permission is denied:

```bash
# Check socket permissions
kubectl --context=minikube-indri -n forgejo-runner exec deployment/forgejo-runner -- ls -la /var/run/docker.sock
```

The runner may need to run as root, or the pod's security context may need adjusting.

---

## Next Phase

Once the runner is working, proceed to [Phase 2: Custom Runner Image](P2_mirror_and_build.md).
@ -1,347 +0,0 @@
# Phase 2: Custom Runner Image

**Goal**: Build a custom forgejo-runner image with necessary tools, enabling standard GitHub Actions

**Status**: Complete (2026-01-23)

**Prerequisites**: [Phase 1](P1_enable_actions.md) complete (Actions enabled, runner deployed in host mode)

---

## Problem Statement

The stock `code.forgejo.org/forgejo/runner:3.5.1` image lacks tools needed for standard GitHub Actions:
- **Node.js** - Required by most actions (checkout, setup-*, etc.)
- **Git** - For repository operations (present but minimal)
- **Common build tools** - make, gcc, curl, jq, etc.

In host mode, jobs run directly in the runner container, so these tools must be pre-installed.

### Chicken-and-Egg Problem

We can't use `actions/checkout@v4` to build the custom runner because that action requires Node.js, which we don't have yet. Solution: Bootstrap manually, then automate.

---

## Step 1: Create Dockerfile for Custom Runner

Create `argocd/manifests/forgejo-runner/Dockerfile`:

```dockerfile
FROM code.forgejo.org/forgejo/runner:3.5.1

# The base image is Alpine-based (see the verification checklist), so use apk
# Install tools needed for GitHub Actions and builds
RUN apk add --no-cache \
    # Required for actions/checkout and other Node-based actions
    nodejs \
    npm \
    # Build essentials
    git \
    curl \
    wget \
    jq \
    make \
    gcc \
    g++ \
    # For container builds (if we add Docker-in-Docker later)
    ca-certificates

# Verify Node.js is available
RUN node --version && npm --version
```

---

## Step 2: Bootstrap - Build Image Manually

Since we can't use CI yet, build the image manually on gilbert and push to zot.

### 2.1 Build with Podman

```bash
cd ~/code/personal/blumeops/argocd/manifests/forgejo-runner

# Build for linux/arm64 (minikube on M1 Mac)
podman build --platform linux/arm64 -t registry.tail8d86e.ts.net/blumeops/forgejo-runner:latest .

# Push to zot (no auth required)
podman push registry.tail8d86e.ts.net/blumeops/forgejo-runner:latest
```

### 2.2 Verify Image in Registry

```bash
curl -s https://registry.tail8d86e.ts.net/v2/blumeops/forgejo-runner/tags/list | jq .
```
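
Reading the tag list by eye works for a one-off push; a scripted check is handy once the push is repeated. The following is a minimal sketch, assuming the registry answers with the usual `{"name": ..., "tags": [...]}` document. `registry_has_tag` is a hypothetical helper, and it does plain string matching rather than real JSON parsing, so it is a smoke test only.

```bash
# Hypothetical smoke test: succeeds if the tags/list response mentions the tag
# as a quoted string. It does not parse JSON, so a key or repository name that
# equals the tag would also match.
registry_has_tag() {
    local json="$1" tag="$2"
    case "$json" in
        *"\"$tag\""*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   tags="$(curl -s https://registry.tail8d86e.ts.net/v2/blumeops/forgejo-runner/tags/list)"
#   registry_has_tag "$tags" latest && echo "latest is published"
```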

---

## Step 3: Update Runner Deployment

### 3.1 Update deployment.yaml

Change the image from stock to custom:

```yaml
# Before
image: code.forgejo.org/forgejo/runner:3.5.1

# After
image: registry.tail8d86e.ts.net/blumeops/forgejo-runner:latest
```

### 3.2 Update kustomization.yaml

The Dockerfile is not a k8s resource, so `resources` stays unchanged. A comment in `kustomization.yaml` can record why the file sits next to the manifests:

```yaml
# Note: Dockerfile is for building, not k8s deployment
# It lives here for co-location with the runner manifests
```

### 3.3 Sync Deployment

```bash
argocd app sync forgejo-runner

# Verify new image is running
kubectl --context=minikube-indri -n forgejo-runner get pods -o jsonpath='{.items[*].spec.containers[*].image}'
```

---

## Step 4: Test with Real GitHub Action

Now that we have Node.js, test with `actions/checkout@v4`.

### 4.1 Update Test Workflow

Update `.forgejo/workflows/test.yml`:

```yaml
name: Test CI

on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Verify tools
        run: |
          echo "Node.js: $(node --version)"
          echo "npm: $(npm --version)"
          echo "Git: $(git --version)"
          echo "Make: $(make --version | head -1)"

      - name: Show repo info
        run: |
          echo "Repository: ${{ github.repository }}"
          echo "Branch: ${{ github.ref_name }}"
          ls -la
```

### 4.2 Push and Verify

```bash
git add .forgejo/workflows/test.yml
git commit -m "Test checkout action with custom runner"
git push
```

Check https://forge.tail8d86e.ts.net/eblume/blumeops/actions - should see successful run with `actions/checkout@v4`.

---

## Step 5: Create Auto-Build Workflow for Runner

Now that Actions work properly, create a workflow to rebuild the runner image automatically.

### 5.1 Create Build Workflow

Create `.forgejo/workflows/build-runner.yml`:

```yaml
name: Build Runner Image

on:
  push:
    paths:
      - 'argocd/manifests/forgejo-runner/Dockerfile'
      - '.forgejo/workflows/build-runner.yml'
  workflow_dispatch:

env:
  REGISTRY: registry.tail8d86e.ts.net
  IMAGE_NAME: blumeops/forgejo-runner

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Build image
        run: |
          cd argocd/manifests/forgejo-runner
          # Use docker build (available in runner container)
          # Note: This builds for the runner's native arch
          docker build -t ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} .
          docker tag ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest

      - name: Push to registry
        run: |
          # Zot has no auth, just push
          docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest

      - name: Verify push
        run: |
          curl -sf "https://${{ env.REGISTRY }}/v2/${{ env.IMAGE_NAME }}/tags/list" | jq .
          echo "Image pushed: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
```

### 5.2 Note on Docker-in-Docker

The runner runs in host mode, so we need Docker CLI available. Options:

1. **Add Docker CLI to the custom image** (see Dockerfile update below)
2. **Mount Docker socket from minikube** (requires deployment change)
3. **Use Podman instead** (rootless, no socket needed)

For now, we'll add Docker CLI to the image and mount the socket.

### 5.3 Update Dockerfile for Docker Builds

```dockerfile
FROM code.forgejo.org/forgejo/runner:3.5.1

# Alpine-based image, so packages come from apk
RUN apk add --no-cache \
    nodejs \
    npm \
    git \
    curl \
    wget \
    jq \
    make \
    gcc \
    g++ \
    ca-certificates \
    # Docker CLI for building container images
    docker-cli

RUN node --version && npm --version && docker --version
```

### 5.4 Update Deployment for Docker Socket

Add Docker socket mount to `deployment.yaml`:

```yaml
          volumeMounts:
            - name: runner-data
              mountPath: /data
            - name: runner-config
              mountPath: /config
            - name: docker-sock
              mountPath: /var/run/docker.sock
      volumes:
        - name: runner-data
          emptyDir: {}
        - name: runner-config
          configMap:
            name: forgejo-runner-config
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
            type: Socket
```

---

## Step 6: Verification

### 6.1 Manual Image Build Works

```bash
# On gilbert
podman build --platform linux/arm64 -t registry.tail8d86e.ts.net/blumeops/forgejo-runner:test .
podman push registry.tail8d86e.ts.net/blumeops/forgejo-runner:test
```

### 6.2 Runner Uses Custom Image

```bash
kubectl --context=minikube-indri -n forgejo-runner get pods -o jsonpath='{.items[*].spec.containers[*].image}'
# Should show: registry.tail8d86e.ts.net/blumeops/forgejo-runner:latest
```

### 6.3 GitHub Actions Work

- `actions/checkout@v4` succeeds
- Test workflow shows Node.js, npm, git versions

### 6.4 Auto-Build Workflow Works

Push a change to the Dockerfile and verify:
1. Workflow triggers
2. Image builds successfully
3. Image pushed to zot

---

## Verification Checklist

- [x] Dockerfile created for custom runner (Alpine-based with apk)
- [x] Image built manually on gilbert (podman build)
- [x] Image pushed to zot registry
- [x] Runner deployment updated to use custom image
- [x] Runner pod running with new image
- [x] `actions/checkout@v4` works in test workflow
- [ ] Auto-build workflow created (deferred - needs Docker socket)
- [ ] Docker socket mounted (for container builds)
- [ ] Auto-build workflow successfully rebuilds runner

---

## Troubleshooting

### Image Pull Fails in Minikube

Minikube needs to be able to pull from zot. Check registry mirror config:
```bash
ssh indri 'minikube ssh -- cat /etc/containerd/certs.d/registry.tail8d86e.ts.net/hosts.toml'
```

### Docker Build Fails in Workflow

If Docker socket mount doesn't work:
1. Check socket exists in minikube: `minikube ssh -- ls -la /var/run/docker.sock`
2. Check permissions: runner may need to be in docker group
3. Alternative: Use `podman` (rootless) instead of Docker

### Node.js Actions Still Fail

Ensure the runner pod restarted after image update:
```bash
kubectl --context=minikube-indri -n forgejo-runner rollout restart deployment/forgejo-runner
kubectl --context=minikube-indri -n forgejo-runner logs -f deployment/forgejo-runner
```

---

## Next Phase

Once the custom runner is working with auto-build, proceed to [Phase 3: Mirror Forgejo & Build](P3_mirror_and_build.md) to set up Forgejo source builds.
@ -1,349 +0,0 @@
# Phase 3: Mirror Forgejo & Build from Source

**Goal**: Mirror upstream Forgejo to forge and create a workflow that builds it for macOS ARM64

**Status**: Planning

**Prerequisites**: [Phase 2](P2_mirror_and_build.md) complete (custom runner image with Node.js/tools)

---

## Problem Statement

We want to build Forgejo from source to:
1. Have full control over the binary running on indri
2. Enable self-deployment via CI
3. Ensure proper macOS DNS resolution (requires CGO_ENABLED=1)
|
|
||||||
|
|
||||||
### The Cross-Compilation Challenge
|
|
||||||
|
|
||||||
The runner runs in a Linux container (k8s on indri), but the target is macOS ARM64 (indri itself).
|
|
||||||
|
|
||||||
**Options**:
|
|
||||||
|
|
||||||
| Option | Pros | Cons |
|--------|------|------|
| A. Cross-compile CGO_ENABLED=0 | Simple, no special toolchain | Breaks Tailscale MagicDNS resolution |
| B. Cross-compile CGO_ENABLED=1 | Proper DNS | Needs OSX cross-compiler (osxcross), complex |
| C. Build on gilbert manually | Works now, simple | Not automated, manual step |
| D. Native macOS runner on indri | Full native build | Runner outside k8s, different architecture |
| E. Hybrid: build on gilbert, deploy via CI | Uses existing tools | Partial automation |
|
|
||||||
|
|
||||||
**Recommendation**: Start with Option C/E (manual build on gilbert, CI just deploys), then consider Option D if we want full automation.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 1: Mirror Upstream Forgejo
|
|
||||||
|
|
||||||
### 1.1 User Action: Create Mirror on Forge
|
|
||||||
|
|
||||||
**Manual step** (hairpinning doesn't work from indri):
|
|
||||||
|
|
||||||
1. Go to https://forge.tail8d86e.ts.net
2. Click "+" → "New Migration"
3. Select "Gitea" as clone source
4. URL: `https://codeberg.org/forgejo/forgejo.git`
5. Repository name: `forgejo`
6. Check "This repository will be a mirror"
7. Click "Migrate Repository"
|
|
||||||
|
|
||||||
### 1.2 Clone Mirror Locally
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/forgejo.git ~/code/3rd/forgejo
|
|
||||||
cd ~/code/3rd/forgejo
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 2: Understand Forgejo Build Process
|
|
||||||
|
|
||||||
### 2.1 Build Requirements
|
|
||||||
|
|
||||||
From Forgejo's `Makefile` and docs:
|
|
||||||
|
|
||||||
- **Go**: 1.23+ (check `go.mod` for exact version)
|
|
||||||
- **Node.js**: 20+ (for frontend)
|
|
||||||
- **Make**: GNU Make
|
|
||||||
- **Git**: For version embedding
|
|
||||||
|
|
||||||
### 2.2 Build Commands
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Install frontend dependencies and build
|
|
||||||
make deps-frontend
|
|
||||||
make frontend
|
|
||||||
|
|
||||||
# Build backend (with CGO for proper DNS on macOS)
|
|
||||||
CGO_ENABLED=1 TAGS="bindata sqlite sqlite_unlock_notify" make backend
|
|
||||||
|
|
||||||
# Or all-in-one
|
|
||||||
CGO_ENABLED=1 TAGS="bindata sqlite sqlite_unlock_notify" make build
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2.3 Output
|
|
||||||
|
|
||||||
The build writes the binary to `gitea` in the repository root (yes, the binary is still named `gitea` for compatibility).
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 3: Build on Gilbert (Manual Bootstrap)
|
|
||||||
|
|
||||||
For the initial bootstrap, build on gilbert (macOS ARM64 native).
|
|
||||||
|
|
||||||
### 3.1 Setup Build Environment
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd ~/code/3rd/forgejo
|
|
||||||
mise use go@1.23 node@20
|
|
||||||
|
|
||||||
# Verify tools
|
|
||||||
go version
|
|
||||||
node --version
|
|
||||||
make --version
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3.2 Build
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Clean build
|
|
||||||
make clean
|
|
||||||
|
|
||||||
# Build frontend
|
|
||||||
make deps-frontend
|
|
||||||
make frontend
|
|
||||||
|
|
||||||
# Build backend with CGO (important for macOS DNS!)
|
|
||||||
CGO_ENABLED=1 TAGS="bindata sqlite sqlite_unlock_notify" make backend
|
|
||||||
|
|
||||||
# Verify binary
|
|
||||||
./gitea --version
|
|
||||||
file gitea # Should show: Mach-O 64-bit executable arm64
|
|
||||||
```
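
The `file` check can be turned into a guard that stops before the copy step when the wrong architecture was built. This is a minimal sketch, assuming `file` reports the string shown in the comment above; the function name `is_macos_arm64` is hypothetical and not part of the original plan.

```bash
# Return success only if the given `file` output describes a macOS ARM64 binary.
# Assumption: `file` prints "Mach-O 64-bit executable arm64" for such binaries.
is_macos_arm64() {
  case "$1" in
    *"Mach-O 64-bit executable arm64"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (not run here):
#   is_macos_arm64 "$(file gitea)" || { echo "wrong architecture" >&2; exit 1; }
```

Matching on the full phrase rather than just `arm64` avoids accepting a Linux ARM64 ELF binary by mistake.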
|
|
||||||
|
|
||||||
### 3.3 Deploy to Indri
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Copy binary
|
|
||||||
scp gitea indri:~/.local/bin/forgejo-new
|
|
||||||
|
|
||||||
# Verify on indri
|
|
||||||
ssh indri '~/.local/bin/forgejo-new --version'
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 4: Create Deploy Workflow (Option E)
|
|
||||||
|
|
||||||
Since cross-compilation is complex, use a hybrid approach:
|
|
||||||
1. Build on gilbert (manual trigger or pre-built)
|
|
||||||
2. CI workflow fetches and deploys
|
|
||||||
|
|
||||||
### 4.1 SSH Deploy Key for Runner
|
|
||||||
|
|
||||||
The runner needs SSH access to indri to deploy the binary.
|
|
||||||
|
|
||||||
**Generate key on gilbert**:
|
|
||||||
```bash
|
|
||||||
ssh-keygen -t ed25519 -C "forgejo-runner-deploy" -f ~/.ssh/forgejo-runner-deploy -N ""
|
|
||||||
```
|
|
||||||
|
|
||||||
**Add public key to indri's authorized_keys**:
|
|
||||||
```bash
|
|
||||||
cat ~/.ssh/forgejo-runner-deploy.pub | ssh indri 'cat >> ~/.ssh/authorized_keys'
|
|
||||||
```
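
Appending with `cat >>` adds a duplicate line every time the command is rerun. The sketch below shows one way to make the append idempotent; `add_key_once` is a hypothetical helper, and it assumes the key is compared as an exact whole line.

```bash
# Append a public key line to an authorized_keys file only if it is not already present.
add_key_once() {
  local key_line="$1" auth_file="$2"
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -x: match whole line, -F: fixed string, -q: quiet
  if ! grep -qxF -- "$key_line" "$auth_file"; then
    printf '%s\n' "$key_line" >> "$auth_file"
  fi
}

# Example use on indri (not run here):
#   add_key_once "$(cat forgejo-runner-deploy.pub)" ~/.ssh/authorized_keys
```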
|
|
||||||
|
|
||||||
**Store private key in 1Password** (blumeops vault) as "Forgejo Runner Deploy Key"
|
|
||||||
|
|
||||||
### 4.2 Create k8s Secret
|
|
||||||
|
|
||||||
Create `argocd/manifests/forgejo-runner/secret-ssh.yaml.tpl`:
|
|
||||||
|
|
||||||
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: forgejo-runner-ssh
  namespace: forgejo-runner
type: Opaque
stringData:
  id_ed25519: |
    op://blumeops/<deploy-key-item>/private-key
  known_hosts: |
    # Get with: ssh-keyscan indri.tail8d86e.ts.net 2>/dev/null | grep ed25519
    indri.tail8d86e.ts.net ssh-ed25519 AAAAC3...
```
|
|
||||||
|
|
||||||
### 4.3 Update Deployment for SSH
|
|
||||||
|
|
||||||
Add SSH secret mount to `deployment.yaml`:
|
|
||||||
|
|
||||||
```yaml
volumeMounts:
  - name: ssh-key
    mountPath: /root/.ssh
    readOnly: true
volumes:
  - name: ssh-key
    secret:
      secretName: forgejo-runner-ssh
      defaultMode: 0600
```
|
|
||||||
|
|
||||||
### 4.4 Create Deploy-Only Workflow
|
|
||||||
|
|
||||||
Create `.forgejo/workflows/deploy-forgejo.yml` in blumeops:
|
|
||||||
|
|
||||||
```yaml
name: Deploy Forgejo

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version to deploy (tag or commit)'
        required: true
        default: 'v10.0.0'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Deploy to indri
        env:
          VERSION: ${{ github.event.inputs.version }}
        run: |
          # SSH config
          mkdir -p ~/.ssh
          cp /root/.ssh/id_ed25519 ~/.ssh/
          cp /root/.ssh/known_hosts ~/.ssh/
          chmod 600 ~/.ssh/id_ed25519

          # Deploy script (unquoted heredoc so $VERSION expands on the runner
          # before the script is sent to indri)
          ssh erichblume@indri.tail8d86e.ts.net << EOF
          set -e
          cd ~/.local/bin

          # Verify the new binary exists and runs
          if [ ! -f forgejo-new ]; then
            echo "ERROR: forgejo-new not found. Build on gilbert first:"
            echo "  cd ~/code/3rd/forgejo && git checkout $VERSION"
            echo "  CGO_ENABLED=1 TAGS='bindata sqlite sqlite_unlock_notify' make build"
            echo "  scp gitea indri:~/.local/bin/forgejo-new"
            exit 1
          fi

          ./forgejo-new --version

          # Stop current service
          launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist 2>/dev/null || true

          # Swap binaries, keeping the previous one as forgejo-old for rollback
          mv forgejo forgejo-old 2>/dev/null || true
          mv forgejo-new forgejo

          # Start new service
          launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist

          # Verify it's running
          sleep 5
          curl -sf http://localhost:3001/api/v1/version || exit 1

          echo "Deploy successful!"
          ./forgejo --version
          EOF
```
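
The swap step leaves the previous binary behind as `forgejo-old`, which makes a rollback possible if the health check fails. The following is a sketch of that swap-and-rollback logic as two shell functions; the names `swap_binary` and `rollback_binary` are hypothetical, and the sketch assumes the same `forgejo`, `forgejo-new`, and `forgejo-old` file names used in the workflow.

```bash
# Promote forgejo-new to forgejo, keeping the current binary as forgejo-old.
swap_binary() {
  local dir="$1"
  if [ ! -f "$dir/forgejo-new" ]; then
    echo "forgejo-new not found in $dir" >&2
    return 1
  fi
  # On a first deploy there may be no existing binary to preserve.
  if [ -f "$dir/forgejo" ]; then
    mv "$dir/forgejo" "$dir/forgejo-old"
  fi
  mv "$dir/forgejo-new" "$dir/forgejo"
}

# Restore the binary saved by the last swap.
rollback_binary() {
  local dir="$1"
  if [ ! -f "$dir/forgejo-old" ]; then
    echo "no forgejo-old to restore in $dir" >&2
    return 1
  fi
  mv "$dir/forgejo-old" "$dir/forgejo"
}
```

In a deploy script, `rollback_binary ~/.local/bin` would run when the version check against port 3001 fails, followed by a reload of the LaunchAgent.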
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Future: Full CI Build (Option D)
|
|
||||||
|
|
||||||
If we want full automation, consider running a native macOS runner on indri:
|
|
||||||
|
|
||||||
### Native Runner on Indri
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Install forgejo-runner on indri via mise
|
|
||||||
ssh indri 'mise use forgejo-runner'
|
|
||||||
|
|
||||||
# Register as a macOS runner
|
|
||||||
ssh indri 'forgejo-runner register \
|
|
||||||
--instance https://forge.tail8d86e.ts.net \
|
|
||||||
--token "$TOKEN" \
|
|
||||||
--name "indri-native" \
|
|
||||||
--labels "macos-arm64:host" \
|
|
||||||
--no-interactive'
|
|
||||||
|
|
||||||
# Create LaunchAgent for runner
|
|
||||||
# (similar to other mcquack services)
|
|
||||||
```
|
|
||||||
|
|
||||||
Then workflow uses:
|
|
||||||
```yaml
|
|
||||||
runs-on: macos-arm64
|
|
||||||
```
|
|
||||||
|
|
||||||
This enables full native builds in CI. Document in a future phase if needed.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification Checklist
|
|
||||||
|
|
||||||
- [ ] Forgejo mirrored to forge
|
|
||||||
- [ ] Mirror cloned to ~/code/3rd/forgejo
|
|
||||||
- [ ] Build succeeds on gilbert
|
|
||||||
- [ ] Binary is valid macOS ARM64 executable
|
|
||||||
- [ ] Binary deployed to indri ~/.local/bin/
|
|
||||||
- [ ] SSH deploy key created and stored in 1Password
|
|
||||||
- [ ] Deploy key added to indri authorized_keys
|
|
||||||
- [ ] (Optional) k8s SSH secret created
|
|
||||||
- [ ] (Optional) Deploy workflow created
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Troubleshooting
|
|
||||||
|
|
||||||
### Build Fails: Node.js Version
|
|
||||||
|
|
||||||
```
|
|
||||||
error: engine "node" is incompatible
|
|
||||||
```
|
|
||||||
|
|
||||||
Update Node.js: `mise use node@20`
|
|
||||||
|
|
||||||
### Build Fails: Go Version
|
|
||||||
|
|
||||||
```
|
|
||||||
go: go.mod requires go >= 1.23
|
|
||||||
```
|
|
||||||
|
|
||||||
Update Go: `mise use go@1.23`
|
|
||||||
|
|
||||||
### Binary Crashes on indri
|
|
||||||
|
|
||||||
Check if CGO was enabled:
|
|
||||||
```bash
|
|
||||||
# If built without CGO, DNS resolution may fail
|
|
||||||
./forgejo --version # Should work
|
|
||||||
./forgejo web # May fail to resolve Tailscale hostnames
|
|
||||||
```
|
|
||||||
|
|
||||||
Rebuild with `CGO_ENABLED=1`.
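
Whether CGO was enabled can also be read from the build metadata embedded in the binary, instead of inferred from DNS failures. A minimal sketch, assuming a Go toolchain is available and that `go version -m` lists a `build CGO_ENABLED=...` line for this binary; the helper name `cgo_setting` is hypothetical.

```bash
# Read `go version -m <binary>` output on stdin and print the CGO_ENABLED value.
# Exits non-zero if no CGO_ENABLED build setting is present.
cgo_setting() {
  awk '
    $1 == "build" && $2 ~ /^CGO_ENABLED=/ {
      sub(/^CGO_ENABLED=/, "", $2)
      print $2
      found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Example use (not run here):
#   go version -m ~/.local/bin/forgejo | cgo_setting
```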
|
|
||||||
|
|
||||||
### SSH Deploy Fails
|
|
||||||
|
|
||||||
Check runner has SSH access:
|
|
||||||
```bash
|
|
||||||
# Test from inside runner pod
|
|
||||||
kubectl --context=minikube-indri -n forgejo-runner exec deployment/forgejo-runner -- \
|
|
||||||
ssh -i /root/.ssh/id_ed25519 erichblume@indri.tail8d86e.ts.net 'echo ok'
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Next Phase
|
|
||||||
|
|
||||||
Once Forgejo is building and deploying successfully, proceed to [Phase 4: Self-Deploy](P4_self_deploy.md) for the full mcquack transition.
|
|
||||||
|
|
@ -1,409 +0,0 @@
|
||||||
# Phase 4: Self-Deploy & Transition to mcquack
|
|
||||||
|
|
||||||
**Goal**: Complete the bootstrap - Forgejo deploys itself, transition from brew to mcquack LaunchAgent
|
|
||||||
|
|
||||||
**Status**: Planning
|
|
||||||
|
|
||||||
**Prerequisites**: [Phase 3](P3_mirror_forgejo.md) complete (Forgejo builds and deploys to indri)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
This phase completes the bootstrap:
|
|
||||||
1. First successful CI deploy creates the binary
|
|
||||||
2. Transition from brew service to mcquack LaunchAgent
|
|
||||||
3. Update ansible role to mcquack pattern
|
|
||||||
4. Remove brew forgejo
|
|
||||||
|
|
||||||
After this phase, Forgejo builds and deploys itself on every tagged release.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 1: Prepare indri for mcquack
|
|
||||||
|
|
||||||
### 1.1 Create Directory Structure
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh indri << 'EOF'
|
|
||||||
mkdir -p ~/.local/bin
|
|
||||||
mkdir -p ~/.config/forgejo
|
|
||||||
mkdir -p ~/Library/Logs
|
|
||||||
EOF
|
|
||||||
```
|
|
||||||
|
|
||||||
### 1.2 Prepare Data Directory
|
|
||||||
|
|
||||||
The existing data is at `/opt/homebrew/var/forgejo`. There are two options: keep it there (simpler), or migrate it to `~/forgejo`.
|
|
||||||
|
|
||||||
**Option A: Keep existing path** (recommended for simplicity)
|
|
||||||
- Data stays at `/opt/homebrew/var/forgejo`
|
|
||||||
- Binary moves to `~/.local/bin/forgejo`
|
|
||||||
|
|
||||||
**Option B: Full migration**
|
|
||||||
- Move data to `~/forgejo`
|
|
||||||
- Requires updating app.ini paths
|
|
||||||
|
|
||||||
For this plan, we'll use Option A.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 2: First CI Deploy
|
|
||||||
|
|
||||||
### 2.1 Trigger Build with Deploy
|
|
||||||
|
|
||||||
1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions
2. Select "Build Forgejo" workflow
3. Click "Run workflow"
4. Set deploy=true
5. Monitor the run
|
|
||||||
|
|
||||||
### 2.2 Verify Binary Deployed
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh indri 'ls -la ~/.local/bin/forgejo && ~/.local/bin/forgejo --version'
|
|
||||||
```
|
|
||||||
|
|
||||||
At this point:
|
|
||||||
- New binary is at `~/.local/bin/forgejo`
|
|
||||||
- Brew forgejo is still running
|
|
||||||
- LaunchAgent doesn't exist yet
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 3: Create mcquack LaunchAgent
|
|
||||||
|
|
||||||
### 3.1 Create Plist Manually (One-Time Bootstrap)
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh indri << 'EOF'
|
|
||||||
cat > ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist << 'PLIST'
|
|
||||||
<?xml version="1.0" encoding="UTF-8"?>
|
|
||||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
|
||||||
<plist version="1.0">
|
|
||||||
<dict>
|
|
||||||
<key>Label</key>
|
|
||||||
<string>mcquack.eblume.forgejo</string>
|
|
||||||
<key>ProgramArguments</key>
|
|
||||||
<array>
|
|
||||||
<string>/Users/erichblume/.local/bin/forgejo</string>
|
|
||||||
<string>web</string>
|
|
||||||
<string>--config</string>
|
|
||||||
<string>/opt/homebrew/var/forgejo/custom/conf/app.ini</string>
|
|
||||||
<string>--work-path</string>
|
|
||||||
<string>/opt/homebrew/var/forgejo</string>
|
|
||||||
</array>
|
|
||||||
<key>RunAtLoad</key>
|
|
||||||
<true/>
|
|
||||||
<key>KeepAlive</key>
|
|
||||||
<true/>
|
|
||||||
<key>StandardOutPath</key>
|
|
||||||
<string>/Users/erichblume/Library/Logs/mcquack.forgejo.out.log</string>
|
|
||||||
<key>StandardErrorPath</key>
|
|
||||||
<string>/Users/erichblume/Library/Logs/mcquack.forgejo.err.log</string>
|
|
||||||
<key>EnvironmentVariables</key>
|
|
||||||
<dict>
|
|
||||||
<key>HOME</key>
|
|
||||||
<string>/Users/erichblume</string>
|
|
||||||
<key>USER</key>
|
|
||||||
<string>erichblume</string>
|
|
||||||
</dict>
|
|
||||||
</dict>
|
|
||||||
</plist>
|
|
||||||
PLIST
|
|
||||||
EOF
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 4: Cutover from Brew to mcquack
|
|
||||||
|
|
||||||
### 4.1 Stop Brew Service
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh indri 'brew services stop forgejo'
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4.2 Start mcquack Service
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh indri 'launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist'
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4.3 Verify Service Running
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check process
|
|
||||||
ssh indri 'launchctl list | grep forgejo'
|
|
||||||
|
|
||||||
# Check logs
|
|
||||||
ssh indri 'tail -20 ~/Library/Logs/mcquack.forgejo.err.log'
|
|
||||||
|
|
||||||
# Check HTTP
|
|
||||||
curl -s https://forge.tail8d86e.ts.net/api/v1/version
|
|
||||||
```
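
Forgejo may take a few seconds to answer HTTP after the LaunchAgent loads, so a single `curl` can fail even when the cutover is fine. The sketch below polls with a bounded number of attempts; `retry` is a hypothetical helper, and the attempt count and delay in the example are assumptions to tune.

```bash
# Run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example use (not run here):
#   retry 10 2 curl -sf https://forge.tail8d86e.ts.net/api/v1/version
```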
|
|
||||||
|
|
||||||
### 4.4 Verify Git Operations
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# SSH test
|
|
||||||
ssh -T forgejo@forge.tail8d86e.ts.net
|
|
||||||
|
|
||||||
# Clone test
|
|
||||||
git clone ssh://forgejo@forge.tail8d86e.ts.net/eblume/blumeops.git /tmp/test-clone
|
|
||||||
rm -rf /tmp/test-clone
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 5: Update Ansible Role
|
|
||||||
|
|
||||||
### 5.1 Rewrite forgejo Role
|
|
||||||
|
|
||||||
Replace `ansible/roles/forgejo/tasks/main.yml`:
|
|
||||||
|
|
||||||
```yaml
---
# Forgejo is built from source via CI and deployed automatically.
# This role manages the configuration and LaunchAgent only.
#
# BINARY DEPLOYMENT:
# The binary at ~/.local/bin/forgejo is deployed by Forgejo Actions CI.
# If missing, trigger a build at:
# https://forge.tail8d86e.ts.net/eblume/forgejo/actions
#
# CONFIGURATION:
# app.ini at /opt/homebrew/var/forgejo/custom/conf/app.ini contains secrets
# and is NOT managed by ansible. It is backed up by borgmatic.

- name: Verify forgejo binary exists
  ansible.builtin.stat:
    path: "{{ forgejo_binary }}"
  register: forgejo_binary_stat

- name: Fail if forgejo binary not found
  ansible.builtin.fail:
    msg: |
      Forgejo binary not found at {{ forgejo_binary }}.

      The binary is deployed by Forgejo Actions CI. To build and deploy:
      1. Go to https://forge.tail8d86e.ts.net/eblume/forgejo/actions
      2. Select "Build Forgejo" workflow
      3. Click "Run workflow" with deploy=true

      Alternatively, build manually on gilbert and scp to indri.
  when: not forgejo_binary_stat.stat.exists

- name: Check forgejo config exists
  ansible.builtin.stat:
    path: "{{ forgejo_config }}"
  register: forgejo_config_stat

- name: Fail if forgejo config is missing
  ansible.builtin.fail:
    msg: |
      Forgejo config not found at {{ forgejo_config }}
      This file contains secrets and is not managed by ansible.
      To restore from backup, run:
      borgmatic --config ~/.config/borgmatic/config.yaml extract --archive latest \
      --path {{ forgejo_config }}
  when: not forgejo_config_stat.stat.exists

- name: Deploy forgejo LaunchAgent plist
  ansible.builtin.template:
    src: forgejo.plist.j2
    dest: ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
    mode: '0644'
  notify: Restart forgejo

- name: Check if forgejo LaunchAgent is loaded
  ansible.builtin.command: launchctl list mcquack.eblume.forgejo
  register: forgejo_launchctl_check
  changed_when: false
  failed_when: false

- name: Load forgejo LaunchAgent if not loaded
  ansible.builtin.command: launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
  when: forgejo_launchctl_check.rc != 0
  changed_when: true
  failed_when: false
```
|
|
||||||
|
|
||||||
### 5.2 Create defaults/main.yml
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
---
|
|
||||||
# Forgejo binary and paths
|
|
||||||
forgejo_binary: /Users/erichblume/.local/bin/forgejo
|
|
||||||
forgejo_work_path: /opt/homebrew/var/forgejo
|
|
||||||
forgejo_config: "{{ forgejo_work_path }}/custom/conf/app.ini"
|
|
||||||
forgejo_log_dir: /Users/erichblume/Library/Logs
|
|
||||||
|
|
||||||
# HTTP and SSH ports (must match app.ini)
|
|
||||||
forgejo_http_port: 3001
|
|
||||||
forgejo_ssh_port: 2200
|
|
||||||
```
|
|
||||||
|
|
||||||
### 5.3 Create templates/forgejo.plist.j2
|
|
||||||
|
|
||||||
```xml
|
|
||||||
<?xml version="1.0" encoding="UTF-8"?>
|
|
||||||
<!-- {{ ansible_managed }} -->
|
|
||||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
|
||||||
<plist version="1.0">
|
|
||||||
<dict>
|
|
||||||
<key>Label</key>
|
|
||||||
<string>mcquack.eblume.forgejo</string>
|
|
||||||
<key>ProgramArguments</key>
|
|
||||||
<array>
|
|
||||||
<string>{{ forgejo_binary }}</string>
|
|
||||||
<string>web</string>
|
|
||||||
<string>--config</string>
|
|
||||||
<string>{{ forgejo_config }}</string>
|
|
||||||
<string>--work-path</string>
|
|
||||||
<string>{{ forgejo_work_path }}</string>
|
|
||||||
</array>
|
|
||||||
<key>RunAtLoad</key>
|
|
||||||
<true/>
|
|
||||||
<key>KeepAlive</key>
|
|
||||||
<true/>
|
|
||||||
<key>StandardOutPath</key>
|
|
||||||
<string>{{ forgejo_log_dir }}/mcquack.forgejo.out.log</string>
|
|
||||||
<key>StandardErrorPath</key>
|
|
||||||
<string>{{ forgejo_log_dir }}/mcquack.forgejo.err.log</string>
|
|
||||||
<key>EnvironmentVariables</key>
|
|
||||||
<dict>
|
|
||||||
<key>HOME</key>
|
|
||||||
<string>/Users/erichblume</string>
|
|
||||||
<key>USER</key>
|
|
||||||
<string>erichblume</string>
|
|
||||||
</dict>
|
|
||||||
</dict>
|
|
||||||
</plist>
|
|
||||||
```
|
|
||||||
|
|
||||||
### 5.4 Update handlers/main.yml
|
|
||||||
|
|
||||||
```yaml
---
- name: Restart forgejo
  ansible.builtin.shell: |
    launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist 2>/dev/null || true
    launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist
  changed_when: true
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 6: Update Alloy Log Collection
|
|
||||||
|
|
||||||
Update `ansible/roles/alloy/defaults/main.yml`:
|
|
||||||
|
|
||||||
Change forgejo log paths from brew to mcquack:
|
|
||||||
```yaml
alloy_brew_logs:
  # Remove forgejo from here
  - path: /opt/homebrew/var/log/tailscaled.log
    service: tailscale
    stream: stdout

alloy_mcquack_logs:
  # ... existing entries ...
  - path: /Users/erichblume/Library/Logs/mcquack.forgejo.out.log
    service: forgejo
    stream: stdout
  - path: /Users/erichblume/Library/Logs/mcquack.forgejo.err.log
    service: forgejo
    stream: stderr
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 7: Remove Brew Forgejo
|
|
||||||
|
|
||||||
### 7.1 Uninstall Brew Package
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh indri 'brew uninstall forgejo'
|
|
||||||
```
|
|
||||||
|
|
||||||
### 7.2 Remove Old Logs
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh indri 'rm -f /opt/homebrew/var/log/forgejo.log'
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 8: Run Ansible
|
|
||||||
|
|
||||||
```bash
|
|
||||||
mise run provision-indri -- --tags forgejo,alloy
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Disaster Recovery
|
|
||||||
|
|
||||||
### If CI Deploy Breaks Forgejo
|
|
||||||
|
|
||||||
1. **Build manually on gilbert**:
|
|
||||||
```bash
|
|
||||||
cd ~/code/3rd/forgejo
|
|
||||||
git pull
|
|
||||||
mise use go node
|
|
||||||
CGO_ENABLED=1 TAGS="bindata sqlite sqlite_unlock_notify" make build
|
|
||||||
scp gitea indri:~/.local/bin/forgejo
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Restart service**:
|
|
||||||
```bash
|
|
||||||
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist; launchctl load ~/Library/LaunchAgents/mcquack.eblume.forgejo.plist'
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Verify**:
|
|
||||||
```bash
|
|
||||||
curl https://forge.tail8d86e.ts.net/api/v1/version
|
|
||||||
```
|
|
||||||
|
|
||||||
### If Forgejo Won't Start
|
|
||||||
|
|
||||||
1. Check logs: `ssh indri 'tail -100 ~/Library/Logs/mcquack.forgejo.err.log'`
|
|
||||||
2. Check binary: `ssh indri '~/.local/bin/forgejo --version'`
|
|
||||||
3. Check config: `ssh indri 'cat /opt/homebrew/var/forgejo/custom/conf/app.ini | head -50'`
|
|
||||||
4. Try running manually: `ssh indri '~/.local/bin/forgejo web --config /opt/homebrew/var/forgejo/custom/conf/app.ini --work-path /opt/homebrew/var/forgejo'`
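
Before reading logs, it helps to know whether launchd has the agent loaded at all, and whether it currently has a process. The sketch below classifies the agent from `launchctl list` output; `agent_state` is a hypothetical helper, and it assumes the usual three-column `PID Status Label` layout with `-` in the PID column when no process is running.

```bash
# Read `launchctl list` output on stdin and report the state of one label:
#   running            - a PID is listed for the label
#   loaded-not-running - the label is listed with "-" as PID
#   not-loaded         - the label does not appear
agent_state() {
  local label="$1"
  awk -v label="$label" '
    $3 == label {
      print ($1 == "-" ? "loaded-not-running" : "running")
      found = 1
    }
    END { if (!found) print "not-loaded" }
  '
}

# Example use (not run here):
#   ssh indri 'launchctl list' | agent_state mcquack.eblume.forgejo
```

A `loaded-not-running` result with `KeepAlive` set suggests the binary is crashing on start, which points to steps 1 and 4 above.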
|
|
||||||
|
|
||||||
### Switch ArgoCD to GitHub (Nuclear Option)
|
|
||||||
|
|
||||||
If Forgejo is down and you need to deploy fixes:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
argocd repo add https://github.com/eblume/blumeops.git --username eblume --password $GITHUB_PAT
|
|
||||||
argocd app set apps --repo https://github.com/eblume/blumeops.git
|
|
||||||
argocd app sync apps
|
|
||||||
```
|
|
||||||
|
|
||||||
After recovery, switch back to Forgejo.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification Checklist
|
|
||||||
|
|
||||||
- [ ] CI deploy completed successfully
|
|
||||||
- [ ] Binary at `~/.local/bin/forgejo`
|
|
||||||
- [ ] mcquack LaunchAgent created
|
|
||||||
- [ ] Brew service stopped
|
|
||||||
- [ ] mcquack service started
|
|
||||||
- [ ] HTTP works (`curl https://forge.tail8d86e.ts.net/api/v1/version`)
|
|
||||||
- [ ] SSH works (`ssh -T forgejo@forge.tail8d86e.ts.net`)
|
|
||||||
- [ ] Git clone/push works
|
|
||||||
- [ ] Ansible role updated
|
|
||||||
- [ ] Alloy logs updated
|
|
||||||
- [ ] Brew package uninstalled
|
|
||||||
- [ ] `mise run provision-indri` succeeds
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Next Phase
|
|
||||||
|
|
||||||
After bootstrap is complete, proceed to [Phase 5: Container Builds](P5_container_builds.md) to set up container image building for ArgoCD.
|
|
||||||
|
|
@ -1,505 +0,0 @@
|
||||||
# Phase 5: Container Image Builds
|
|
||||||
|
|
||||||
**Goal**: Set up CI workflows to build custom container images and push to zot registry
|
|
||||||
|
|
||||||
**Status**: Planning
|
|
||||||
|
|
||||||
**Prerequisites**: [Phase 4](P4_self_deploy.md) complete (Forgejo self-deploying, Actions working)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
With Forgejo Actions operational (including custom runner from P2), we can now build container images for:
|
|
||||||
- Custom devpi with pre-installed plugins
|
|
||||||
- Any other custom images needed for k8s services
|
|
||||||
- Release artifacts for Python packages
|
|
||||||
|
|
||||||
**Note**: The custom runner image build is covered in [Phase 2](P2_mirror_and_build.md). This phase focuses on application container builds.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Use Case 1: devpi Custom Image
|
|
||||||
|
|
||||||
### Current State
|
|
||||||
|
|
||||||
devpi runs from `registry.tail8d86e.ts.net/blumeops/devpi:latest`, built manually:
|
|
||||||
- Base image: python
|
|
||||||
- Adds: devpi-server, devpi-web
|
|
||||||
- Startup script for auto-initialization
|
|
||||||
|
|
||||||
### Goal
|
|
||||||
|
|
||||||
Automate builds triggered by:
|
|
||||||
- Push to devpi repo on forge
|
|
||||||
- Manual workflow dispatch
|
|
||||||
- Optionally: upstream devpi release (via schedule check)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 1: Create Workflow for devpi
|
|
||||||
|
|
||||||
### 1.1 Ensure devpi Repo Has Dockerfile
|
|
||||||
|
|
||||||
The Dockerfile already exists at `argocd/manifests/devpi/Dockerfile`. We'll create a workflow in the blumeops repo that builds it.
|
|
||||||
|
|
||||||
### 1.2 Create Build Workflow
|
|
||||||
|
|
||||||
Create `.forgejo/workflows/build-devpi.yml` in blumeops repo:
|
|
||||||
|
|
||||||
```yaml
name: Build devpi Image

on:
  push:
    paths:
      - 'argocd/manifests/devpi/Dockerfile'
      - 'argocd/manifests/devpi/start.sh'
      - '.forgejo/workflows/build-devpi.yml'
  workflow_dispatch:
    inputs:
      tag:
        description: 'Image tag (default: latest)'
        required: false
        default: 'latest'

env:
  REGISTRY: registry.tail8d86e.ts.net
  IMAGE_NAME: blumeops/devpi

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Determine tag
        id: tag
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            TAG="${{ github.event.inputs.tag }}"
          else
            TAG="latest"
          fi
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"

      - name: Build image
        uses: docker/build-push-action@v5
        with:
          context: argocd/manifests/devpi
          file: argocd/manifests/devpi/Dockerfile
          platforms: linux/arm64
          load: true
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}

      - name: Push to registry
        run: |
          # Zot has no auth, just push
          docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}

      - name: Verify push
        run: |
          # Check image exists in registry
          curl -sf "https://${{ env.REGISTRY }}/v2/${{ env.IMAGE_NAME }}/tags/list" | jq .
```
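
The "Determine tag" step can be checked locally by pulling its logic into a function. This is a sketch only; `resolve_tag` is a hypothetical name, and it additionally falls back to `latest` when a manual dispatch supplies an empty tag, which the inline step does not do.

```bash
# Choose the image tag from the triggering event and the optional dispatch input.
resolve_tag() {
  local event_name="$1" input_tag="${2:-}"
  if [ "$event_name" = "workflow_dispatch" ] && [ -n "$input_tag" ]; then
    printf '%s\n' "$input_tag"
  else
    printf '%s\n' "latest"
  fi
}

# Example use inside a workflow step (not run here):
#   echo "tag=$(resolve_tag "$EVENT_NAME" "$INPUT_TAG")" >> "$GITHUB_OUTPUT"
```

Passing the event name and input through `env:` variables, as in the example, also avoids interpolating `${{ }}` expressions directly into the script text.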
|
|
||||||
|
|
||||||
### 1.3 Runner Needs Registry Access
|
|
||||||
|
|
||||||
The runner needs to reach `registry.tail8d86e.ts.net`. This should work via Tailscale egress (same as Forgejo access).
|
|
||||||
|
|
||||||
If not, add egress for registry in `argocd/manifests/tailscale-operator/`:
|
|
||||||
```yaml
apiVersion: tailscale.com/v1alpha1
kind: Connector
metadata:
  name: egress-registry
  namespace: tailscale-operator
spec:
  hostname: egress-registry
  subnetRouter:
    advertiseRoutes:
      # NOTE: advertiseRoutes expects CIDR ranges; the hostname below is a
      # placeholder for the registry's tailnet IP.
      - registry.tail8d86e.ts.net/32
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 2: Test Build Workflow
|
|
||||||
|
|
||||||
### 2.1 Push and Trigger
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Make a small change to trigger
|
|
||||||
echo "# Build $(date)" >> argocd/manifests/devpi/Dockerfile
|
|
||||||
git add argocd/manifests/devpi/Dockerfile
|
|
||||||
git commit -m "Trigger devpi image rebuild"
|
|
||||||
git push
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2.2 Monitor Build
|
|
||||||
|
|
||||||
1. Go to https://forge.tail8d86e.ts.net/eblume/blumeops/actions
|
|
||||||
2. Watch "Build devpi Image" workflow
|
|
||||||
3. Verify success
|
|
||||||
|
|
||||||
### 2.3 Verify Image in Registry
|
|
||||||
|
|
||||||
```bash
|
|
||||||
curl -s https://registry.tail8d86e.ts.net/v2/blumeops/devpi/tags/list | jq .
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2.4 Restart devpi to Use New Image
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl --context=minikube-indri -n devpi rollout restart statefulset/devpi
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 3: Reusable Container Build Workflow
|
|
||||||
|
|
||||||
### 3.1 Create Reusable Workflow
|
|
||||||
|
|
||||||
Create `.forgejo/workflows/build-container.yml`:
|
|
||||||
|
|
||||||
```yaml
name: Build Container Image

on:
  workflow_call:
    inputs:
      context:
        description: 'Build context path'
        required: true
        type: string
      dockerfile:
        description: 'Dockerfile path (relative to context)'
        required: false
        type: string
        default: 'Dockerfile'
      image_name:
        description: 'Image name (without registry)'
        required: true
        type: string
      tag:
        description: 'Image tag'
        required: false
        type: string
        default: 'latest'
      platforms:
        description: 'Target platforms'
        required: false
        type: string
        default: 'linux/arm64'

env:
  REGISTRY: registry.tail8d86e.ts.net

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: ${{ inputs.context }}
          file: ${{ inputs.context }}/${{ inputs.dockerfile }}
          platforms: ${{ inputs.platforms }}
          push: true
          tags: ${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}

      - name: Verify push
        run: |
          curl -sf "https://${{ env.REGISTRY }}/v2/${{ inputs.image_name }}/tags/list" | jq .
```
|
|
||||||
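To see how the workflow inputs and their defaults combine, it can help to print the roughly equivalent `docker buildx build` invocation locally. This is a sketch, not what `docker/build-push-action` runs internally; `buildx_command` is a hypothetical helper, and its defaults are copied from the `inputs` block above.

```bash
REGISTRY="registry.tail8d86e.ts.net"

# Print a docker buildx command that approximates the "Build and push" step.
# Arguments: context, image_name, [tag], [platforms], [dockerfile]
buildx_command() {
  local context="$1"
  local image_name="$2"
  local tag="${3:-latest}"
  local platforms="${4:-linux/arm64}"
  local dockerfile="${5:-Dockerfile}"
  printf 'docker buildx build --platform %s --file %s/%s --tag %s/%s:%s --push %s\n' \
    "$platforms" "$context" "$dockerfile" "$REGISTRY" "$image_name" "$tag" "$context"
}
```

For example, `buildx_command argocd/manifests/devpi blumeops/devpi` prints the command for the devpi image with the default tag and platform. The helper only prints; it does not build anything.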
|
|
||||||
### 3.2 Use in devpi Workflow
|
|
||||||
|
|
||||||
Simplify `.forgejo/workflows/build-devpi.yml`:
|
|
||||||
|
|
||||||
```yaml
name: Build devpi Image

on:
  push:
    paths:
      - 'argocd/manifests/devpi/**'
  workflow_dispatch:

jobs:
  build:
    uses: ./.forgejo/workflows/build-container.yml
    with:
      context: argocd/manifests/devpi
      image_name: blumeops/devpi
```
|
|
||||||
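The `paths` filter means the workflow only runs when a pushed commit touches something under `argocd/manifests/devpi/`. A local approximation of that check, useful before pushing, might look like the sketch below. `touches_devpi` is a hypothetical helper, and the runner's own glob matching may differ in edge cases.

```bash
# Succeed if any path read from stdin (one per line) lies under
# argocd/manifests/devpi/. In a case pattern, * also matches "/", so nested
# files match as they would with the workflow's ** glob.
touches_devpi() {
  local path
  while IFS= read -r path; do
    case "$path" in
      argocd/manifests/devpi/*) return 0 ;;
    esac
  done
  return 1
}
```

A typical use would be `git diff --name-only HEAD~1 HEAD | touches_devpi && echo "push would trigger the devpi build"`.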
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 4: Python Package Builds (Optional)
|
|
||||||
|
|
||||||
### 4.1 Use Case
|
|
||||||
|
|
||||||
Build Python packages from forge repos and publish to devpi.
|
|
||||||
|
|
||||||
Example: `mcquack` package (LaunchAgent management library)
|
|
||||||
|
|
||||||
### 4.2 Create Python Build Workflow
|
|
||||||
|
|
||||||
Create `.forgejo/workflows/build-python.yml`:
|
|
||||||
|
|
||||||
```yaml
name: Build Python Package

on:
  workflow_call:
    inputs:
      package_path:
        description: 'Path to package (contains pyproject.toml)'
        required: false
        type: string
        default: '.'
      python_version:
        description: 'Python version'
        required: false
        type: string
        default: '3.12'
      publish:
        description: 'Publish to devpi'
        required: false
        type: boolean
        default: false
    secrets:
      DEVPI_PASSWORD:
        required: false

env:
  DEVPI_URL: https://pypi.tail8d86e.ts.net

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ inputs.python_version }}

      - name: Install uv
        run: pip install uv

      - name: Build package
        run: |
          cd "${{ inputs.package_path }}"
          uv build

      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: ${{ inputs.package_path }}/dist/

      - name: Publish to devpi
        if: inputs.publish
        # Pass the password via the environment so it is not interpolated into the script text
        env:
          UV_PUBLISH_PASSWORD: ${{ secrets.DEVPI_PASSWORD }}
        run: |
          cd "${{ inputs.package_path }}"
          uv publish \
            --publish-url ${{ env.DEVPI_URL }}/eblume/dev/ \
            --username eblume
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 5: Scheduled Builds (Cron)
|
|
||||||
|
|
||||||
### 5.1 Weekly Rebuild
|
|
||||||
|
|
||||||
Keep images fresh with weekly rebuilds:
|
|
||||||
|
|
||||||
```yaml
name: Weekly Image Rebuilds

on:
  schedule:
    # Every Sunday at 3 AM UTC
    - cron: '0 3 * * 0'
  workflow_dispatch:

jobs:
  devpi:
    uses: ./.forgejo/workflows/build-container.yml
    with:
      context: argocd/manifests/devpi
      image_name: blumeops/devpi
```
|
|
||||||
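The five cron fields are minute, hour, day of month, month, and day of week, so `0 3 * * 0` fires at 03:00 on day-of-week 0 (Sunday). As a sanity check, the sketch below tests whether a given UTC time falls on that schedule. `matches_weekly_rebuild` is a hypothetical helper and only understands this one expression.

```bash
# Succeed if the given UTC day-of-week (0 = Sunday), hour, and minute match
# the cron expression '0 3 * * 0'. The [ ... -eq ... ] test compares in
# base 10, so zero-padded values such as "03" are accepted.
matches_weekly_rebuild() {
  local dow="$1" hour="$2" minute="$3"
  [ "$dow" -eq 0 ] && [ "$hour" -eq 3 ] && [ "$minute" -eq 0 ]
}
```

To test the current time, the three fields can be passed unquoted from `date`, for example `matches_weekly_rebuild $(date -u +'%w %H %M')`.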
|
|
||||||
---
|
|
||||||
|
|
||||||
## Future Improvements
|
|
||||||
|
|
||||||
### Multi-Arch Builds
|
|
||||||
|
|
||||||
For images that need both ARM64 and AMD64:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
platforms: linux/arm64,linux/amd64
|
|
||||||
```
|
|
||||||
|
|
||||||
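Building for a non-native platform requires QEMU emulation on the runner, for example by running `docker/setup-qemu-action` before the Buildx setup step. Buildx itself already supports multi-platform builds.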
|
|
||||||
|
|
||||||
### Build Caching
|
|
||||||
|
|
||||||
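Use the GitHub/Forgejo cache action to persist the Buildx layer cache between runs. The build step must also point its `cache-from` and `cache-to` inputs at the same path for the cache to be used: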
|
|
||||||
|
|
||||||
```yaml
- name: Cache Docker layers
  uses: actions/cache@v4
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}
```
|
|
||||||
|
|
||||||
### Security Scanning
|
|
||||||
|
|
||||||
Add Trivy or similar:
|
|
||||||
|
|
||||||
```yaml
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: '${{ env.REGISTRY }}/${{ inputs.image_name }}:${{ inputs.tag }}'
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 6: Runner Observability (Logging & Metrics)
|
|
||||||
|
|
||||||
### 6.1 Problem
|
|
||||||
|
|
||||||
The forgejo-runner pod generates logs and metrics that should be collected for:
|
|
||||||
- Debugging failed workflow runs
|
|
||||||
- Monitoring runner health and capacity
|
|
||||||
- Alerting on runner failures
|
|
||||||
|
|
||||||
### 6.2 Log Collection via Alloy
|
|
||||||
|
|
||||||
The forgejo-runner namespace needs to be included in Alloy's k8s log collection. Alloy is already configured to scrape logs from k8s pods - verify the runner namespace is included.
|
|
||||||
|
|
||||||
Check current Alloy config:
|
|
||||||
```bash
|
|
||||||
ssh indri 'cat ~/.config/alloy/config.alloy | grep -A20 discovery.kubernetes'
|
|
||||||
```
|
|
||||||
|
|
||||||
If using namespace filtering, ensure `forgejo-runner` is included.
|
|
||||||
|
|
||||||
### 6.3 Metrics Collection
|
|
||||||
|
|
||||||
The forgejo-runner exposes Prometheus metrics. Add a ServiceMonitor or configure Alloy to scrape:
|
|
||||||
|
|
||||||
**Option A: ServiceMonitor (if using Prometheus Operator)**
|
|
||||||
|
|
||||||
Create `argocd/manifests/forgejo-runner/servicemonitor.yaml`:
|
|
||||||
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: forgejo-runner
  namespace: forgejo-runner
spec:
  selector:
    matchLabels:
      app: forgejo-runner
  endpoints:
    - port: metrics
      interval: 30s
```
|
|
||||||
|
|
||||||
**Option B: Alloy scrape config**
|
|
||||||
|
|
||||||
Add to Alloy's k8s scrape config to discover the runner pod's metrics endpoint.
|
|
||||||
|
|
||||||
### 6.4 Create Runner Service for Metrics
|
|
||||||
|
|
||||||
Add `argocd/manifests/forgejo-runner/service.yaml`:
|
|
||||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: forgejo-runner-metrics
  namespace: forgejo-runner
  labels:
    app: forgejo-runner
spec:
  selector:
    app: forgejo-runner
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
```
|
|
||||||
|
|
||||||
Update kustomization.yaml to include the service.
|
|
||||||
|
|
||||||
### 6.5 Grafana Dashboard
|
|
||||||
|
|
||||||
Consider creating a dashboard for:
|
|
||||||
- Runner status (online/offline)
|
|
||||||
- Job queue depth
|
|
||||||
- Job execution time
|
|
||||||
- Success/failure rates
|
|
||||||
|
|
||||||
### 6.6 Verification
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check runner logs are appearing in Loki
|
|
||||||
# Go to Grafana → Explore → Loki
|
|
||||||
# Query: {namespace="forgejo-runner"}
|
|
||||||
|
|
||||||
# Check metrics are being scraped
|
|
||||||
# Go to Grafana → Explore → Prometheus
|
|
||||||
# Query: {__name__=~"forgejo_runner_.*"}  (PromQL has no glob syntax; match metric names by regex)
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification Checklist
|
|
||||||
|
|
||||||
- [ ] devpi build workflow created
|
|
||||||
- [ ] devpi image builds successfully
|
|
||||||
- [ ] Image pushed to zot registry
|
|
||||||
- [ ] devpi pod uses new image
|
|
||||||
- [ ] Reusable container workflow created
|
|
||||||
- [ ] (Optional) Python build workflow created
|
|
||||||
- [ ] (Optional) Scheduled builds configured
|
|
||||||
- [ ] Runner logs visible in Loki
|
|
||||||
- [ ] Runner metrics scraped by Prometheus/Alloy
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
With this phase complete, we have:
|
|
||||||
1. **Forgejo Actions** running with k8s runner
|
|
||||||
2. **Forgejo self-deploys** from CI on tagged releases
|
|
||||||
3. **Container images** built automatically on push
|
|
||||||
4. Infrastructure for Python package builds
|
|
||||||
5. **Runner observability** with logs in Loki and metrics in Prometheus
|
|
||||||
|
|
||||||
The CI/CD bootstrap is complete. Future work:
|
|
||||||
- Add more container builds as needed
|
|
||||||
- Add Python package publishing for internal tools
|
|
||||||
- Consider adding a macOS runner on indri for native builds
|
|
||||||
- Create Grafana dashboards for CI/CD monitoring
|
|
||||||
|
|
@@ -1,79 +0,0 @@
|
||||||
# Blumeops Minikube Migration Plan
|
|
||||||
|
|
||||||
**Status**: Completed (2026-01-23)
|
|
||||||
|
|
||||||
This plan detailed the phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster. The migration is now complete for all services that will be migrated.
|
|
||||||
|
|
||||||
## Final Status
|
|
||||||
|
|
||||||
| Phase | Name | Status | Notes |
|-------|------|--------|-------|
| 0 | [Foundation](P0_foundation.complete.md) | ✅ Complete | Container registry (zot) + minikube cluster |
| 1 | [K8s Infrastructure](P1_k8s_infrastructure.complete.md) | ✅ Complete | Tailscale operator, ArgoCD, CloudNativePG, PostgreSQL cluster |
| 2 | [Grafana](P2_grafana.complete.md) | ✅ Complete | Migrated Grafana via ArgoCD |
| 3 | [PostgreSQL](P3_postgresql.complete.md) | ✅ Complete | Data migration to k8s PostgreSQL |
| 4 | [Miniflux](P4_miniflux.complete.md) | ✅ Complete | Migrated Miniflux via ArgoCD |
| 5 | [devpi](P5_devpi.complete.md) | ✅ Complete | Migrated devpi via ArgoCD |
| 5.1 | [Docker Migration](P5.1_docker_migration.complete.md) | ✅ Complete | Switched minikube to docker driver (not QEMU2) |
| 6 | [Kiwix](P6_kiwix.complete.md) | ✅ Complete | Migrated Kiwix + Transmission via ArgoCD |
| 7 | [Forgejo](P7_forgejo.md) | ⏭️ Won't Do | Forgejo stays on indri - see [CI/CD Bootstrap](../../ci-cd-bootstrap/) |
| 8 | [Woodpecker](P8_woodpecker.md) | ⏭️ Won't Do | Replaced by Forgejo Actions - see [CI/CD Bootstrap](../../ci-cd-bootstrap/) |
| 9 | [Cleanup](P9_cleanup.md) | ⏭️ Won't Do | Observability cleanup done separately (2026-01-22) |
|
|
||||||
|
|
||||||
## What Was Migrated to K8s
|
|
||||||
|
|
||||||
| Service | Status | Notes |
|---------|--------|-------|
| Grafana | ✅ In k8s | Helm chart via ArgoCD |
| PostgreSQL | ✅ In k8s | CloudNativePG operator |
| Miniflux | ✅ In k8s | Using k8s PostgreSQL |
| devpi | ✅ In k8s | Custom container image |
| Kiwix | ✅ In k8s | NFS mount from sifaka |
| Transmission | ✅ In k8s | NFS mount from sifaka |
| Prometheus | ✅ In k8s | Migrated 2026-01-22 |
| Loki | ✅ In k8s | Migrated 2026-01-22 |
| Alloy (k8s) | ✅ In k8s | DaemonSet for pod logs |
| TeslaMate | ✅ In k8s | Added 2026-01-23 |
|
|
||||||
|
|
||||||
## What Stays on Indri
|
|
||||||
|
|
||||||
| Service | Reason |
|---------|--------|
| **Forgejo** | Critical infrastructure, avoids circular dependency with ArgoCD |
| **Zot Registry** | K8s needs images to start - must be outside k8s |
| **Alloy (host)** | Collects host-level metrics and logs |
| **Borgmatic** | Backup system must survive k8s failures |
| **Plex** | Uses own NAT traversal, not Tailscale |
|
|
||||||
|
|
||||||
## Architecture Decisions Made
|
|
||||||
|
|
||||||
### Minikube Driver: Docker (not QEMU2/Podman)
|
|
||||||
- Original plan called for QEMU2, but docker driver proved simpler
|
|
||||||
- NFS mounts work via Docker NAT through indri's LAN IP
|
|
||||||
- API server accessible via Tailscale TCP passthrough
|
|
||||||
|
|
||||||
### Forgejo: Stays on Indri
|
|
||||||
- Original P7 planned k8s migration
|
|
||||||
- Decision changed: Forgejo is critical infrastructure
|
|
||||||
- Will be built from source via Forgejo Actions CI
|
|
||||||
- See [CI/CD Bootstrap Plan](../../ci-cd-bootstrap/) for details
|
|
||||||
|
|
||||||
### CI/CD: Forgejo Actions (not Woodpecker)
|
|
||||||
- Original P8 planned Woodpecker deployment
|
|
||||||
- Decision changed: Use Forgejo's native Actions instead
|
|
||||||
- Simpler (one less system), GitHub Actions compatible
|
|
||||||
- See [CI/CD Bootstrap Plan](../../ci-cd-bootstrap/) for details
|
|
||||||
|
|
||||||
### Observability: Migrated to K8s
|
|
||||||
- Original plan kept Prometheus/Loki on indri
|
|
||||||
- Changed: Migrated both to k8s (2026-01-22)
|
|
||||||
- Alloy on indri pushes to k8s endpoints
|
|
||||||
- Alloy DaemonSet in k8s collects pod logs
|
|
||||||
|
|
||||||
## Lessons Learned
|
|
||||||
|
|
||||||
1. **Docker driver is simpler than QEMU2** - Direct NFS mounts work, no VM complexity
|
|
||||||
2. **Tailscale operator works well** - Easy service exposure with automatic TLS
|
|
||||||
3. **CloudNativePG is production-ready** - Good operator, easy backups
|
|
||||||
4. **Keep critical infra outside k8s** - Forgejo and zot must survive k8s failures
|
|
||||||
5. **CGO matters on macOS** - Alloy needed to be built with `CGO_ENABLED=1` for Tailscale DNS resolution
|
|
||||||
File diff suppressed because it is too large
|
|
@@ -1,657 +0,0 @@
|
||||||
# Phase 1: Kubernetes Infrastructure
|
|
||||||
|
|
||||||
**Goal**: Tailscale operator, ArgoCD, CloudNativePG operator, PostgreSQL cluster
|
|
||||||
|
|
||||||
**Status**: In Progress
|
|
||||||
|
|
||||||
**Prerequisites**: [Phase 0](P0_foundation.complete.md) complete
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
Phase 1 establishes the k8s control plane infrastructure:
|
|
||||||
1. **Tailscale operator** - Exposes services on the tailnet
|
|
||||||
2. **ArgoCD** - GitOps continuous delivery
|
|
||||||
3. **CloudNativePG** - PostgreSQL operator
|
|
||||||
4. **PostgreSQL cluster** - Database for future app migrations
|
|
||||||
|
|
||||||
The deployment follows a bootstrap pattern:
|
|
||||||
- First two components deployed via `kubectl apply -k` (no GitOps yet)
|
|
||||||
- ArgoCD then takes over management of all components including itself
|
|
||||||
- All subsequent deployments use ArgoCD
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Kubernetes Tags Overview
|
|
||||||
|
|
||||||
| Tag | Purpose | Applied To |
|-----|---------|------------|
| `tag:k8s-api` | Controls access to the K8s API server | indri (Phase 0.14) |
| `tag:k8s-operator` | Identifies the Tailscale K8s Operator | OAuth client for operator |
| `tag:k8s` | Default tag for operator-managed resources | Proxies, services, ingresses created by operator |
|
|
||||||
|
|
||||||
**Ownership chain**: `tag:k8s-operator` must own `tag:k8s` so the operator can assign that tag to devices it creates.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## PostgreSQL Migration Strategy
|
|
||||||
|
|
||||||
The k8s PostgreSQL cluster will eventually replace the brew PostgreSQL on indri.
|
|
||||||
|
|
||||||
| Phase | `pg.tail8d86e.ts.net` points to | Miniflux connects to |
|-------|--------------------------------|---------------------|
| Current | brew PostgreSQL (indri) | `pg.tail8d86e.ts.net` |
| Phase 1 | brew PostgreSQL (indri) | `pg.tail8d86e.ts.net` (no change) |
| Phase 4 | brew PostgreSQL (indri) | k8s PG (internal, after miniflux migrates to k8s) |
| Post-Phase 4 | k8s PostgreSQL | k8s PG (internal) |
| Cleanup | k8s PostgreSQL | k8s PG (internal) |
|
|
||||||
|
|
||||||
This allows zero-downtime migration - the Tailscale service switches after apps are migrated.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Steps
|
|
||||||
|
|
||||||
### 1. Update Pulumi ACLs for k8s workloads ✓
|
|
||||||
|
|
||||||
**Status**: Complete
|
|
||||||
|
|
||||||
Added to `pulumi/policy.hujson`:
|
|
||||||
- `tag:k8s-operator` - for the operator OAuth client
|
|
||||||
- `tag:k8s` - for operator-managed resources (owned by `tag:k8s-operator`)
|
|
||||||
- Grant for `tag:k8s` → `tag:registry` access
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 2. Create Tailscale OAuth client ✓
|
|
||||||
|
|
||||||
**Status**: Complete
|
|
||||||
|
|
||||||
OAuth client stored in 1Password (vault: `vg6xf6vvfmoh5hqjjhlhbeoaie`, item: `2it22lavwgbxdskoaxanej354q`)
|
|
||||||
|
|
||||||
**Configuration used:**
|
|
||||||
- Tags: `tag:k8s-operator`
|
|
||||||
- Devices write scope tag: `tag:k8s`
|
|
||||||
- Scopes: Devices Core (R/W), Auth Keys (R/W), Services (Write)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 3. Deploy Tailscale Kubernetes Operator (Bootstrap)
|
|
||||||
|
|
||||||
Deploy via `kubectl apply -k` - will be migrated to ArgoCD management in Step 5.
|
|
||||||
|
|
||||||
**Setup manifests directory:**
|
|
||||||
```bash
|
|
||||||
mkdir -p argocd/manifests/tailscale-operator/crds
|
|
||||||
cd argocd/manifests/tailscale-operator
|
|
||||||
|
|
||||||
# Download static manifest from Tailscale repo
|
|
||||||
curl -sL https://raw.githubusercontent.com/tailscale/tailscale/main/cmd/k8s-operator/deploy/manifests/operator.yaml -o operator.yaml
|
|
||||||
|
|
||||||
# Download CRDs
|
|
||||||
curl -sL https://raw.githubusercontent.com/tailscale/tailscale/main/cmd/k8s-operator/deploy/crds/tailscale.com_connectors.yaml -o crds/connectors.yaml
|
|
||||||
curl -sL https://raw.githubusercontent.com/tailscale/tailscale/main/cmd/k8s-operator/deploy/crds/tailscale.com_proxyclasses.yaml -o crds/proxyclasses.yaml
|
|
||||||
# ... (other CRDs as needed)
|
|
||||||
```
|
|
||||||
|
|
||||||
**Create kustomization.yaml:**
|
|
||||||
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: tailscale-system
resources:
  - operator.yaml
secretGenerator:
  - name: operator-oauth
    namespace: tailscale-system
    literals:
      - client_id=PLACEHOLDER
      - client_secret=PLACEHOLDER
generatorOptions:
  disableNameSuffixHash: true
```
|
|
||||||
|
|
||||||
**Deploy:**
|
|
||||||
```bash
|
|
||||||
# Get credentials from 1Password and create secret manually (kustomize secretGenerator is for reference)
|
|
||||||
CLIENT_ID=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 2it22lavwgbxdskoaxanej354q --fields client-id --reveal)
|
|
||||||
CLIENT_SECRET=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get 2it22lavwgbxdskoaxanej354q --fields client-secret --reveal)
|
|
||||||
|
|
||||||
kubectl create namespace tailscale-system
|
|
||||||
kubectl create secret generic operator-oauth \
  --namespace tailscale-system \
  --from-literal=client_id="$CLIENT_ID" \
  --from-literal=client_secret="$CLIENT_SECRET"
|
|
||||||
|
|
||||||
# Apply operator manifests
|
|
||||||
kubectl apply -k argocd/manifests/tailscale-operator/
|
|
||||||
```
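If either `op` lookup returns nothing, the secret would be created with an empty value and the operator would likely fail later with a less obvious error. A guard along the following lines could run before `kubectl create secret`. This is a sketch; `require_value` is a hypothetical helper name.

```bash
# Return 1 with a message on stderr if the named value is empty.
require_value() {
  local name="$1" value="$2"
  if [ -z "$value" ]; then
    printf 'error: %s is empty; check the 1Password lookup\n' "$name" >&2
    return 1
  fi
}
```

It would be called as `require_value CLIENT_ID "$CLIENT_ID" && require_value CLIENT_SECRET "$CLIENT_SECRET"` ahead of the `kubectl` commands.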
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
```bash
|
|
||||||
kubectl get pods -n tailscale-system
|
|
||||||
# Expected: operator pod Running
|
|
||||||
|
|
||||||
kubectl logs -n tailscale-system -l app.kubernetes.io/name=tailscale-operator
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 4. Deploy ArgoCD
|
|
||||||
|
|
||||||
Deploy ArgoCD and expose via Tailscale as `argocd.tail8d86e.ts.net`.
|
|
||||||
|
|
||||||
**Prerequisites:**
|
|
||||||
- Add `tag:argocd` to Pulumi ACLs
|
|
||||||
- Create Tailscale service `argocd` in admin console
|
|
||||||
|
|
||||||
**Setup manifests:**
|
|
||||||
```bash
|
|
||||||
mkdir -p argocd/manifests/argocd
|
|
||||||
|
|
||||||
# Download ArgoCD install manifest
|
|
||||||
curl -sL https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml -o argocd/manifests/argocd/install.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
**Create kustomization.yaml:**
|
|
||||||
```yaml
|
|
||||||
apiVersion: kustomize.config.k8s.io/v1beta1
|
|
||||||
kind: Kustomization
|
|
||||||
namespace: argocd
|
|
||||||
resources:
|
|
||||||
- install.yaml
|
|
||||||
- service-tailscale.yaml # LoadBalancer for Tailscale exposure
|
|
||||||
```
|
|
||||||
|
|
||||||
**Create service-tailscale.yaml:**
|
|
||||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-server-tailscale
  namespace: argocd
  annotations:
    tailscale.com/hostname: "argocd"
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale
  selector:
    app.kubernetes.io/name: argocd-server
  ports:
    - name: https
      port: 443
      targetPort: 8080
```
|
|
||||||
|
|
||||||
**Deploy:**
|
|
||||||
```bash
|
|
||||||
kubectl create namespace argocd
|
|
||||||
kubectl apply -k argocd/manifests/argocd/
|
|
||||||
```
|
|
||||||
|
|
||||||
**Get initial admin password:**
|
|
||||||
```bash
|
|
||||||
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
|
|
||||||
```
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
- https://argocd.tail8d86e.ts.net loads
|
|
||||||
- Can log in with admin / <initial-password>
|
|
||||||
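The Tailscale proxy and the ArgoCD server can take a little while to become reachable after `kubectl apply`, so a one-shot check may fail spuriously. A generic polling helper, sketched below under the hypothetical name `retry`, runs any command repeatedly until it succeeds or the attempts run out.

```bash
# retry <attempts> <delay-seconds> <command> [args...]
# Run the command until it exits 0, at most <attempts> times,
# sleeping <delay-seconds> between tries.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}
```

For instance, `retry 30 10 curl -sfk https://argocd.tail8d86e.ts.net/healthz` would poll the health endpoint every 10 seconds for up to 30 attempts. The attempt count and delay are illustrative values.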
|
|
||||||
**Post-setup:**
|
|
||||||
1. Change admin password, store in 1Password
|
|
||||||
2. Configure git repo connection to `github.com/eblume/blumeops` (public, no auth needed)
|
|
||||||
- Note: Using GitHub mirror since ArgoCD can't easily reach forge without additional networking
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 5. Migrate Tailscale Operator to ArgoCD
|
|
||||||
|
|
||||||
Create ArgoCD Application to manage the Tailscale operator.
|
|
||||||
|
|
||||||
**Create argocd/apps/tailscale-operator.yaml:**
|
|
||||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tailscale-operator
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/tailscale-operator
  destination:
    server: https://kubernetes.default.svc
    namespace: tailscale-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
|
|
||||||
|
|
||||||
**Apply:**
|
|
||||||
```bash
|
|
||||||
kubectl apply -f argocd/apps/tailscale-operator.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
**Note on secrets:** The OAuth secret was created manually in Step 3. For GitOps, consider:
|
|
||||||
- Sealed Secrets
|
|
||||||
- External Secrets Operator
|
|
||||||
- SOPS
|
|
||||||
|
|
||||||
For now, the secret remains manually managed outside of ArgoCD.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 6. Deploy CloudNativePG via ArgoCD
|
|
||||||
|
|
||||||
**Setup manifests:**
|
|
||||||
```bash
|
|
||||||
mkdir -p argocd/manifests/cloudnative-pg
|
|
||||||
|
|
||||||
# Download CNPG operator manifest
|
|
||||||
curl -sL https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml -o argocd/manifests/cloudnative-pg/operator.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
**Create kustomization.yaml:**
|
|
||||||
```yaml
|
|
||||||
apiVersion: kustomize.config.k8s.io/v1beta1
|
|
||||||
kind: Kustomization
|
|
||||||
resources:
|
|
||||||
- operator.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
**Create ArgoCD Application (argocd/apps/cloudnative-pg.yaml):**
|
|
||||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudnative-pg
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/cloudnative-pg
  destination:
    server: https://kubernetes.default.svc
    namespace: cnpg-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
|
|
||||||
|
|
||||||
**Apply:**
|
|
||||||
```bash
|
|
||||||
kubectl apply -f argocd/apps/cloudnative-pg.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
```bash
|
|
||||||
kubectl get pods -n cnpg-system
|
|
||||||
# Expected: cnpg-controller-manager Running
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 7. Create PostgreSQL Cluster via ArgoCD
|
|
||||||
|
|
||||||
Create the database cluster. **Not exposed via Tailscale yet** - internal only until apps migrate.
|
|
||||||
|
|
||||||
**Create argocd/manifests/databases/blumeops-pg.yaml:**
|
|
||||||
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: blumeops-pg
  namespace: databases
spec:
  instances: 1
  storage:
    size: 10Gi
    storageClass: standard
  monitoring:
    enablePodMonitor: true
  bootstrap:
    initdb:
      database: miniflux
      owner: miniflux
```
|
|
||||||
|
|
||||||
**Create kustomization.yaml:**
|
|
||||||
```yaml
|
|
||||||
apiVersion: kustomize.config.k8s.io/v1beta1
|
|
||||||
kind: Kustomization
|
|
||||||
namespace: databases
|
|
||||||
resources:
|
|
||||||
- blumeops-pg.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
**Create ArgoCD Application (argocd/apps/blumeops-pg.yaml):**
|
|
||||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: blumeops-pg
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/eblume/blumeops.git
    targetRevision: main
    path: argocd/manifests/databases
  destination:
    server: https://kubernetes.default.svc
    namespace: databases
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
|
|
||||||
|
|
||||||
**Apply:**
|
|
||||||
```bash
|
|
||||||
kubectl apply -f argocd/apps/blumeops-pg.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
```bash
|
|
||||||
kubectl get cluster -n databases
|
|
||||||
# Expected: blumeops-pg with STATUS "Cluster in healthy state"
|
|
||||||
|
|
||||||
kubectl get pods -n databases
|
|
||||||
# Expected: blumeops-pg-1 Running
|
|
||||||
|
|
||||||
# Get connection secret
|
|
||||||
kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d
|
|
||||||
```
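The decoded `uri` value is a standard `postgresql://user:password@host:port/dbname` connection string. When an application needs the pieces separately, they can be split with shell parameter expansion, as in the sketch below. `parse_pg_uri` and the `PG_*` variable names are hypothetical, and the sketch assumes the password is URL-encoded (no literal `@` or `/`) and that the URI carries no query string.

```bash
# Split a postgresql://user:password@host:port/dbname URI into
# PG_USER, PG_HOST, PG_PORT, and PG_DB (the password is deliberately not exported).
parse_pg_uri() {
  local uri="$1" rest creds hostport
  rest="${uri#*://}"        # drop the scheme
  creds="${rest%%@*}"       # user:password
  rest="${rest#*@}"         # host:port/dbname
  hostport="${rest%%/*}"    # host:port
  PG_USER="${creds%%:*}"
  PG_HOST="${hostport%%:*}"
  PG_PORT="${hostport##*:}"
  PG_DB="${rest#*/}"
}
```

With an illustrative value such as `postgresql://miniflux:example@blumeops-pg-rw.databases:5432/miniflux` (the host name here is an assumption, not taken from the cluster), the function sets `PG_USER=miniflux`, `PG_PORT=5432`, and `PG_DB=miniflux`.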
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 8. Create App-of-Apps Root Application
|
|
||||||
|
|
||||||
Once all components are deployed, create a root application to manage all apps.
|
|
||||||
|
|
||||||
**Create argocd/apps/root.yaml:**
|
|
||||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/eblume/blumeops.git
    targetRevision: main
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
|
|
||||||
|
|
||||||
**Apply:**
|
|
||||||
```bash
|
|
||||||
kubectl apply -f argocd/apps/root.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
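ArgoCD now manages every Application defined under `argocd/apps` via the app-of-apps pattern. For ArgoCD to manage itself as well, an Application pointing at `argocd/manifests/argocd` would also be needed.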
|
|
||||||
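The Application manifests in Steps 5 to 7 differ only in name, source path, and destination namespace, so new ones can be stamped out from a template. The sketch below uses a hypothetical `render_app` helper. It always emits automated sync with `CreateNamespace=true`, which matches the CloudNativePG and PostgreSQL apps in this plan but is an assumption for any other app.

```bash
# render_app <name> <path> <namespace>
# Print an ArgoCD Application manifest in the shape used by this plan.
render_app() {
  local name="$1" path="$2" namespace="$3"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${name}
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/eblume/blumeops.git
    targetRevision: main
    path: ${path}
  destination:
    server: https://kubernetes.default.svc
    namespace: ${namespace}
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF
}
```

Writing the output into `argocd/apps/`, for example `render_app cloudnative-pg argocd/manifests/cloudnative-pg cnpg-system > argocd/apps/cloudnative-pg.yaml`, lets the root application pick the new app up on its next sync.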
|
|
||||||
---
|
|
||||||
|
|
||||||
## New Files Summary
|
|
||||||
|
|
||||||
```
argocd/
  apps/
    root.yaml                    # App-of-apps root
    tailscale-operator.yaml      # Tailscale operator app
    cloudnative-pg.yaml          # CNPG operator app
    blumeops-pg.yaml             # PostgreSQL cluster app
  manifests/
    tailscale-operator/
      kustomization.yaml
      operator.yaml
    argocd/
      kustomization.yaml
      install.yaml
      service-tailscale.yaml
    cloudnative-pg/
      kustomization.yaml
      operator.yaml
    databases/
      kustomization.yaml
      blumeops-pg.yaml
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Pulumi ACL Updates Required
|
|
||||||
|
|
||||||
Add to `pulumi/policy.hujson`:
|
|
||||||
```hujson
|
|
||||||
"tag:argocd": ["autogroup:admin", "tag:blumeops"],
|
|
||||||
```
|
|
||||||
|
|
||||||
Add to Erich's test accept list:
|
|
||||||
```hujson
|
|
||||||
"accept": [..., "tag:argocd:443"],
|
|
||||||
```
|
|
||||||
|
|
||||||
Add to Allison's deny list:
|
|
||||||
```hujson
|
|
||||||
"deny": [..., "tag:argocd:443"],
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification Checklist
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# 1. Tailscale operator running
|
|
||||||
kubectl get pods -n tailscale-system
|
|
||||||
|
|
||||||
# 2. ArgoCD accessible
|
|
||||||
curl -k https://argocd.tail8d86e.ts.net/healthz
|
|
||||||
|
|
||||||
# 3. CloudNativePG operator running
|
|
||||||
kubectl get pods -n cnpg-system
|
|
||||||
|
|
||||||
# 4. PostgreSQL cluster healthy
|
|
||||||
kubectl get cluster -n databases
|
|
||||||
|
|
||||||
# 5. All ArgoCD apps synced
|
|
||||||
kubectl get applications -n argocd
|
|
||||||
# All should show STATUS: Synced, HEALTH: Healthy
|
|
||||||
```
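The checklist above can be wrapped in a small runner that reports each check and counts failures instead of relying on reading the output by eye. This is a sketch: `check` and `failures` are hypothetical names, and each check passes or fails purely on the command's exit status, so it does not inspect pod phase or sync state.

```bash
failures=0

# check <name> <command> [args...]
# Run the command quietly, print PASS or FAIL with the check name,
# and count failures in the global "failures" variable.
check() {
  local name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$name"
  else
    printf 'FAIL  %s\n' "$name"
    failures=$((failures + 1))
  fi
}
```

Usage would look like `check "ArgoCD accessible" curl -sfk https://argocd.tail8d86e.ts.net/healthz` followed by `check "PostgreSQL cluster" kubectl get cluster -n databases`, with a final `[ "$failures" -eq 0 ]` deciding the script's exit status.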
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Rollback
|
|
||||||
|
|
||||||
```bash
|
|
||||||
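# Remove ArgoCD apps (managed resources are cascade-deleted only if the Application carries the resources-finalizer.argocd.argoproj.io finalizer; otherwise they stay until the namespace deletions below)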
|
|
||||||
kubectl delete application -n argocd root
|
|
||||||
kubectl delete application -n argocd blumeops-pg
|
|
||||||
kubectl delete application -n argocd cloudnative-pg
|
|
||||||
kubectl delete application -n argocd tailscale-operator
|
|
||||||
|
|
||||||
# Remove ArgoCD
|
|
||||||
kubectl delete -k argocd/manifests/argocd/
|
|
||||||
kubectl delete namespace argocd
|
|
||||||
|
|
||||||
# Remove namespaces
|
|
||||||
kubectl delete namespace databases
|
|
||||||
kubectl delete namespace cnpg-system
|
|
||||||
kubectl delete namespace tailscale-system
|
|
||||||
|
|
||||||
# Revert ACL changes
|
|
||||||
git checkout pulumi/policy.hujson
|
|
||||||
mise run tailnet-up
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Implementation Notes (Deviations from Plan)
|
|
||||||
|
|
||||||
*Added during implementation for retrospective review*
|
|
||||||
|
|
||||||
### Git Source: Forge Instead of GitHub
|
|
||||||
|
|
||||||
**Plan**: Use GitHub mirror (`github.com/eblume/blumeops`)
|
|
||||||
**Actual**: Use internal Forgejo (`ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git`)
|
|
||||||
|
|
||||||
**Why**: User preference for internal infrastructure; the resulting circular dependency is accepted for now and left to resolve later.
|
|
||||||
|
|
||||||
**Required changes**:
|
|
||||||
- Deploy key added to forge for ArgoCD SSH access
|
|
||||||
- Repository secret `repo-forge` with SSH private key from 1Password
|
|
||||||
- Discovered: `op read` requires `?ssh-format=openssh` query parameter for ArgoCD-compatible key format
|
|
||||||
- Egress proxy service to reach forge from cluster (targets `indri.tail8d86e.ts.net` not `forge.tail8d86e.ts.net` due to Tailscale Serve limitation)
|
|
||||||
- DNSConfig CRD for cluster-to-tailnet MagicDNS resolution
|
|
||||||
- ACL grant: `tag:k8s` → `tag:homelab` on ports 3001 (HTTP) and 2200 (SSH)
|
|
||||||
|
|
||||||
### ArgoCD Exposure: Ingress Instead of LoadBalancer
|
|
||||||
|
|
||||||
**Plan**: LoadBalancer service with `tailscale.com/hostname` annotation
|
|
||||||
**Actual**: Tailscale Ingress with Let's Encrypt TLS termination
|
|
||||||
|
|
||||||
**Why**: Ingress provides automatic TLS certificates and is the recommended approach.
|
|
||||||
|
|
||||||
**File**: `argocd/manifests/argocd/service-tailscale.yaml` uses `kind: Ingress` with `ingressClassName: tailscale`
|
|
||||||
|
|
||||||
### Namespace: `tailscale` Instead of `tailscale-system`
|
|
||||||
|
|
||||||
**Plan**: `tailscale-system` namespace
|
|
||||||
**Actual**: `tailscale` namespace
|
|
||||||
|
|
||||||
**Why**: Matches upstream Tailscale operator defaults.
|
|
||||||
|
|
||||||
### Sync Policy: Manual Instead of Automated
|
|
||||||
|
|
||||||
**Plan**: `syncPolicy.automated` with prune and selfHeal
|
|
||||||
**Actual**: Manual sync policy for workload apps; auto-sync only for app-of-apps
|
|
||||||
|
|
||||||
**Why**: User preference for explicit control over deployments during initial migration phase.
|
|
||||||
|
|
||||||
**Pattern**:
|
|
||||||
- `apps.yaml` (app-of-apps): auto-sync to pick up new Application manifests
|
|
||||||
- All workload apps: manual sync; deploying a change requires `argocd app sync <name>`
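
A minimal sketch of the two sync policies described in this pattern. The fragments are illustrative only: the exact fields under `automated` in `apps.yaml` are not shown in this plan, so they are left empty here, and `CreateNamespace=true` is just an example option.

```yaml
# argocd/apps/apps.yaml (app-of-apps): auto-sync picks up new Application manifests.
# Assumption: fields such as prune/selfHeal are omitted because the plan does not list them.
spec:
  syncPolicy:
    automated: {}
---
# Any workload Application: no `automated` block, so nothing deploys
# until someone runs `argocd app sync <name>`.
spec:
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```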
|
|
||||||
|
|
||||||
### CloudNativePG: Helm Chart Instead of Raw Manifest
|
|
||||||
|
|
||||||
**Plan**: Download raw CNPG manifest
|
|
||||||
**Actual**: Multi-source Application using official Helm chart from `https://cloudnative-pg.github.io/charts`
|
|
||||||
|
|
||||||
**Why**: Helm chart is the officially supported distribution method.
|
|
||||||
|
|
||||||
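**Additional fix**: Required the `ServerSideApply=true` sync option because the large CNPG CRD exceeds the size limit of the `kubectl.kubernetes.io/last-applied-configuration` annotation that client-side apply writes.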
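
As a rough sketch, the sync option sits under `syncPolicy.syncOptions` of the Application. Only the relevant fragment is shown; the rest of `argocd/apps/cloudnative-pg.yaml` is assumed unchanged.

```yaml
# Fragment of argocd/apps/cloudnative-pg.yaml (sketch; other fields omitted)
spec:
  syncPolicy:
    syncOptions:
      # Apply server-side so the large CRD is not stored in an annotation
      - ServerSideApply=true
```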
|
|
||||||
|
|
||||||
### App-of-Apps: Named `apps` Instead of `root`
|
|
||||||
|
|
||||||
**Plan**: `argocd/apps/root.yaml`
|
|
||||||
**Actual**: `argocd/apps/apps.yaml` with Application named `apps`
|
|
||||||
|
|
||||||
**Why**: Clearer naming; `apps` manages apps, `argocd` manages itself.
|
|
||||||
|
|
||||||
### ArgoCD Self-Management Added
|
|
||||||
|
|
||||||
**Plan**: Not explicitly planned
|
|
||||||
**Actual**: `argocd/apps/argocd.yaml` Application for ArgoCD self-management
|
|
||||||
|
|
||||||
**Why**: Standard GitOps pattern - ArgoCD manages its own deployment after bootstrap.
|
|
||||||
|
|
||||||
### CRI-O Registry Mirror for Zot
|
|
||||||
|
|
||||||
**Plan**: Not in original plan
|
|
||||||
**Actual**: Configured CRI-O to use zot as pull-through cache for docker.io, ghcr.io, quay.io
|
|
||||||
|
|
||||||
**Why**: Reduces external bandwidth, speeds up pulls, avoids rate limits.
|
|
||||||
|
|
||||||
**Implementation**: Ansible `minikube` role applies `/etc/containers/registries.conf.d/zot-mirror.conf` inside minikube VM using stable hostname `host.containers.internal:5050`.
|
|
||||||
|
|
||||||
### ProxyClass for CRI-O Image Compatibility
|
|
||||||
|
|
||||||
**Plan**: Not mentioned
|
|
||||||
**Actual**: Required `ProxyClass` with fully-qualified image paths (`docker.io/tailscale/...`)
|
|
||||||
|
|
||||||
**Why**: CRI-O requires fully-qualified image references; default Tailscale operator uses short names.
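
A sketch of what such a `ProxyClass` might look like. The name `crio-compat` and the image tag `stable` are assumptions for illustration; the real `proxyclass.yaml` may pin a different tag.

```yaml
# Sketch of argocd/manifests/tailscale-operator/proxyclass.yaml
apiVersion: tailscale.com/v1alpha1
kind: ProxyClass
metadata:
  name: crio-compat            # assumed name
spec:
  statefulSet:
    pod:
      tailscaleContainer:
        # Fully-qualified reference so CRI-O does not need short-name resolution
        image: docker.io/tailscale/tailscale:stable   # tag is an assumption
      tailscaleInitContainer:
        image: docker.io/tailscale/tailscale:stable   # tag is an assumption
```

Proxies opt in to the class by referencing its name on the Service or Ingress that the operator exposes.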
|
|
||||||
|
|
||||||
### Actual File Structure
|
|
||||||
|
|
||||||
```
argocd/
  apps/
    apps.yaml                     # App-of-apps (auto-sync)
    argocd.yaml                   # ArgoCD self-management (manual sync)
    tailscale-operator.yaml       # Tailscale operator (manual sync)
    cloudnative-pg.yaml           # CNPG operator via Helm (manual sync)
  manifests/
    tailscale-operator/
      kustomization.yaml
      operator.yaml
      proxyclass.yaml             # CRI-O compatibility
      dnsconfig.yaml              # Cluster-to-tailnet DNS
      egress-forge.yaml           # Egress proxy for forge
      secret.yaml.tpl             # OAuth secret template (manual)
      README.md
    argocd/
      kustomization.yaml          # Uses remote base from upstream
      service-tailscale.yaml      # Ingress (not LoadBalancer)
      argocd-cmd-params-cm.yaml   # Disable HTTPS redirect
      repo-forge-secret.yaml.tpl  # SSH key template (manual)
      README.md
    cloudnative-pg/
      values.yaml                 # Helm values (currently minimal)
      README.md
```
|
|
||||||
|
|
||||||
### Bootstrap Commands (Actual)
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# 1. Create namespaces
|
|
||||||
kubectl create namespace tailscale
|
|
||||||
kubectl create namespace argocd
|
|
||||||
|
|
||||||
# 2. Apply secrets (manual, uses 1Password)
|
|
||||||
op inject -i argocd/manifests/tailscale-operator/secret.yaml.tpl | kubectl apply -f -
|
|
||||||
|
|
||||||
PRIV_KEY=$(op read "op://vg6xf6vvfmoh5hqjjhlhbeoaie/csjncynh6htjvnh2l2da65y32q/private key?ssh-format=openssh")$'\n' && \
|
|
||||||
kubectl create secret generic repo-forge -n argocd \
|
|
||||||
--from-literal=type=git \
|
|
||||||
--from-literal=url='ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git' \
|
|
||||||
--from-literal=insecure=true \
|
|
||||||
--from-literal=sshPrivateKey="$PRIV_KEY" && \
|
|
||||||
kubectl label secret repo-forge -n argocd argocd.argoproj.io/secret-type=repository
|
|
||||||
|
|
||||||
# 3. Bootstrap tailscale-operator
|
|
||||||
kubectl apply -k argocd/manifests/tailscale-operator/
|
|
||||||
|
|
||||||
# 4. Bootstrap ArgoCD
|
|
||||||
kubectl apply -k argocd/manifests/argocd/
|
|
||||||
|
|
||||||
# 5. Login and change password
|
|
||||||
argocd login argocd.tail8d86e.ts.net --username admin --grpc-web
|
|
||||||
argocd account update-password
|
|
||||||
|
|
||||||
# 6. Apply ArgoCD Applications
|
|
||||||
kubectl apply -f argocd/apps/argocd.yaml
|
|
||||||
kubectl apply -f argocd/apps/apps.yaml
|
|
||||||
|
|
||||||
# 7. Sync workloads
|
|
||||||
argocd app sync tailscale-operator
|
|
||||||
argocd app sync cloudnative-pg
|
|
||||||
```
|
|
||||||
|
|
@ -1,396 +0,0 @@
|
||||||
# Phase 2: Grafana Migration (Pilot)
|
|
||||||
|
|
||||||
**Goal**: Migrate Grafana as lowest-risk pilot service
|
|
||||||
|
|
||||||
**Status**: Complete (2026-01-19)
|
|
||||||
|
|
||||||
**Prerequisites**: [Phase 1](P1_k8s_infrastructure.complete.md) complete
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
This phase migrates Grafana from Homebrew/Ansible on indri to Kubernetes, establishing the pattern for future service migrations. Additionally, we establish the pattern of mirroring Helm chart repositories to forge for resilience and GitOps consistency.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Key Decisions
|
|
||||||
|
|
||||||
### Helm Chart Mirroring
|
|
||||||
|
|
||||||
**Problem**: Phase 1 pulls charts from external Helm repos, so deployments depend on those repos being reachable.
|
|
||||||
|
|
||||||
**Solution**: Mirror Helm chart Git repositories to forge, reference charts from git path.
|
|
||||||
|
|
||||||
ArgoCD auto-detects Helm charts when a directory contains `Chart.yaml`. No build step needed.
|
|
||||||
|
|
||||||
| Chart | Upstream Git Repo | Forge Mirror | Chart Path |
|
|
||||||
|-------|-------------------|--------------|------------|
|
|
||||||
| cloudnative-pg | `github.com/cloudnative-pg/charts` | `forge/eblume/cloudnative-pg-charts` | `charts/cloudnative-pg/` |
|
|
||||||
| grafana | `github.com/grafana/helm-charts` | `forge/eblume/grafana-helm-charts` | `charts/grafana/` |
|
|
||||||
|
|
||||||
### Database Storage
|
|
||||||
|
|
||||||
Use SQLite with 1Gi PVC (not k8s PostgreSQL). Grafana stores minimal persistent data and dashboards are git-provisioned.
|
|
||||||
|
|
||||||
### Datasource URLs
|
|
||||||
|
|
||||||
From k8s pods, use `host.containers.internal` to reach indri services:
|
|
||||||
- Prometheus: `http://host.containers.internal:9090`
|
|
||||||
- Loki: `http://host.containers.internal:3100` (requires ansible change to bind 0.0.0.0)
|
|
||||||
|
|
||||||
### Ingress
|
|
||||||
|
|
||||||
Tailscale Ingress with Let's Encrypt TLS (following ArgoCD pattern), with `crio-compat` proxy class.
|
|
||||||
|
|
||||||
### Secrets Management
|
|
||||||
|
|
||||||
Admin password stored in 1Password, injected manually via `op inject`. Future: migrate to External Secrets Operator or similar.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Prerequisites
|
|
||||||
|
|
||||||
### 0.1 Mirror Helm Chart Repos to Forge
|
|
||||||
|
|
||||||
**User action**: Create mirrors in forge:
|
|
||||||
|
|
||||||
1. **CloudNativePG charts** (fix existing P1 app):
|
|
||||||
- Mirror: `https://github.com/cloudnative-pg/charts`
|
|
||||||
- To: `forge.tail8d86e.ts.net/eblume/cloudnative-pg-charts`
|
|
||||||
|
|
||||||
2. **Grafana helm-charts** (new):
|
|
||||||
- Mirror: `https://github.com/grafana/helm-charts`
|
|
||||||
- To: `forge.tail8d86e.ts.net/eblume/grafana-helm-charts`
|
|
||||||
|
|
||||||
### 0.2 Update Loki to Bind 0.0.0.0
|
|
||||||
|
|
||||||
**File**: `ansible/roles/loki/templates/loki-config.yaml.j2`
|
|
||||||
|
|
||||||
Add under `server:`:
|
|
||||||
```yaml
|
|
||||||
http_listen_address: 0.0.0.0
|
|
||||||
```
|
|
||||||
|
|
||||||
Deploy: `mise run provision-indri -- --tags loki`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Steps
|
|
||||||
|
|
||||||
### 1. Fix CloudNativePG to Use Forge Mirror
|
|
||||||
|
|
||||||
Update `argocd/apps/cloudnative-pg.yaml` to use forge-mirrored chart:
|
|
||||||
|
|
||||||
```yaml
sources:
  - repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/cloudnative-pg-charts.git
    targetRevision: cloudnative-pg-v0.23.0  # git tag
    path: charts/cloudnative-pg
    helm:
      releaseName: cloudnative-pg
      valueFiles:
        - $values/argocd/manifests/cloudnative-pg/values.yaml
  - repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
    targetRevision: main
    ref: values
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 2. Create Grafana Helm Values
|
|
||||||
|
|
||||||
**File**: `argocd/manifests/grafana/values.yaml`
|
|
||||||
|
|
||||||
```yaml
admin:
  existingSecret: grafana-admin
  userKey: admin-user
  passwordKey: admin-password

persistence:
  enabled: true
  type: pvc
  size: 1Gi

grafana.ini:
  server:
    root_url: https://grafana.tail8d86e.ts.net
  analytics:
    check_for_updates: false
    reporting_enabled: false

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        uid: prometheus
        url: http://host.containers.internal:9090
        isDefault: true
        editable: false
      - name: Loki
        type: loki
        access: proxy
        uid: loki
        url: http://host.containers.internal:3100
        editable: false

sidecar:
  dashboards:
    enabled: true
    label: grafana_dashboard
    labelValue: "1"

service:
  type: ClusterIP
  port: 80

resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 3. Create Grafana ArgoCD Application
|
|
||||||
|
|
||||||
**File**: `argocd/apps/grafana.yaml`
|
|
||||||
|
|
||||||
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana
  namespace: argocd
spec:
  project: default
  sources:
    - repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/grafana-helm-charts.git
      targetRevision: grafana-8.8.2
      path: charts/grafana
      helm:
        releaseName: grafana
        valueFiles:
          - $values/argocd/manifests/grafana/values.yaml
    - repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
      targetRevision: main
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 4. Create Grafana Config Application
|
|
||||||
|
|
||||||
**File**: `argocd/apps/grafana-config.yaml`
|
|
||||||
|
|
||||||
Deploys Tailscale Ingress and Dashboard ConfigMaps from `argocd/manifests/grafana-config/`.
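
A minimal sketch of what this Application could look like, assuming it follows the same conventions as `grafana.yaml` (single source, `main` branch, `monitoring` namespace). The field values are inferred from the surrounding steps rather than copied from the real file.

```yaml
# Sketch of argocd/apps/grafana-config.yaml (values inferred, not authoritative)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana-config
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/blumeops.git
    targetRevision: main
    # Directory contains kustomization.yaml, so ArgoCD renders it with Kustomize
    path: argocd/manifests/grafana-config
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
```

No `automated` block is set, which keeps the app on manual sync like the other workload apps.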
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 5. Create Grafana Config Manifests
|
|
||||||
|
|
||||||
**Directory**: `argocd/manifests/grafana-config/`
|
|
||||||
|
|
||||||
Contents:
|
|
||||||
- `kustomization.yaml`
|
|
||||||
- `ingress-tailscale.yaml` - Tailscale Ingress for `grafana.tail8d86e.ts.net`
|
|
||||||
- `secret-admin.yaml.tpl` - Admin password template (1Password-backed)
|
|
||||||
- `README.md` - Notes on secrets management
|
|
||||||
- `dashboards/configmap-*.yaml` - 9 dashboard ConfigMaps
|
|
||||||
|
|
||||||
**Ingress**:
|
|
||||||
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-tailscale
  namespace: monitoring
  annotations:
    tailscale.com/proxy-class: "crio-compat"
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: grafana
      port:
        number: 80
  tls:
    - hosts:
        - grafana
```
|
|
||||||
|
|
||||||
**Secret template** (`secret-admin.yaml.tpl`):
|
|
||||||
```yaml
# Apply: op inject -i secret-admin.yaml.tpl | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin
  namespace: monitoring
type: Opaque
stringData:
  admin-user: admin
  admin-password: {{ op://vg6xf6vvfmoh5hqjjhlhbeoaie/oxkcr3xtxnewy7noep2izvyr6y/password }}
```
|
|
||||||
|
|
||||||
**Dashboard ConfigMaps**: Convert each JSON from `ansible/roles/grafana/files/dashboards/` to ConfigMap with label `grafana_dashboard: "1"`.
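
One converted dashboard might look roughly like the sketch below. The ConfigMap name, the data key, and the dashboard JSON are hypothetical placeholders; only the label and namespace come from the values and manifests described in this plan.

```yaml
# Sketch of dashboards/configmap-example.yaml (name and JSON are hypothetical)
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-example
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # matches sidecar.dashboards.label / labelValue
data:
  # Key becomes the dashboard file name picked up by the sidecar
  example.json: |
    {
      "title": "Example",
      "panels": []
    }
```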
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 6. Deploy to Kubernetes
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Create namespace and secret
|
|
||||||
ki create namespace monitoring
|
|
||||||
op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | ki apply -f -
|
|
||||||
|
|
||||||
# Push changes and sync
|
|
||||||
argocd app sync grafana
|
|
||||||
argocd app sync grafana-config
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 7. Tailscale Service Cutover
|
|
||||||
|
|
||||||
Remove `svc:grafana` from `ansible/roles/tailscale_serve/defaults/main.yml`, then:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
mise run provision-indri -- --tags tailscale-serve
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 8. Stop Brew Grafana
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh indri 'brew services stop grafana'
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 9. Retire Ansible Grafana Role
|
|
||||||
|
|
||||||
Once k8s Grafana is verified working:
|
|
||||||
|
|
||||||
1. **Remove role from playbook** - Delete grafana role entry from `ansible/playbooks/indri.yml`
|
|
||||||
|
|
||||||
2. **Delete the role directory** - `rm -rf ansible/roles/grafana/`
|
|
||||||
|
|
||||||
3. **Update zk documentation** - Note in `~/code/personal/zk/1767747119-YCPO.md` that Grafana is now k8s-hosted
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## New Files
|
|
||||||
|
|
||||||
| Path | Purpose |
|
|
||||||
|------|---------|
|
|
||||||
| `argocd/apps/grafana.yaml` | Grafana Helm chart Application |
|
|
||||||
| `argocd/apps/grafana-config.yaml` | Grafana config Application |
|
|
||||||
| `argocd/manifests/grafana/values.yaml` | Helm values |
|
|
||||||
| `argocd/manifests/grafana-config/kustomization.yaml` | Kustomize config |
|
|
||||||
| `argocd/manifests/grafana-config/ingress-tailscale.yaml` | Tailscale Ingress |
|
|
||||||
| `argocd/manifests/grafana-config/secret-admin.yaml.tpl` | Admin password template |
|
|
||||||
| `argocd/manifests/grafana-config/README.md` | Secrets management notes |
|
|
||||||
| `argocd/manifests/grafana-config/dashboards/configmap-*.yaml` | 9 dashboard ConfigMaps |
|
|
||||||
|
|
||||||
## Modified Files
|
|
||||||
|
|
||||||
| Path | Change |
|
|
||||||
|------|--------|
|
|
||||||
| `argocd/apps/cloudnative-pg.yaml` | Switch to forge-mirrored chart |
|
|
||||||
| `ansible/roles/loki/templates/loki-config.yaml.j2` | Add `http_listen_address: 0.0.0.0` |
|
|
||||||
| `ansible/roles/tailscale_serve/defaults/main.yml` | Remove `svc:grafana` |
|
|
||||||
| `ansible/playbooks/indri.yml` | Remove grafana role |
|
|
||||||
|
|
||||||
## Deleted Files
|
|
||||||
|
|
||||||
| Path | Reason |
|
|
||||||
|------|--------|
|
|
||||||
| `ansible/roles/grafana/` | Replaced by k8s deployment |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
- [x] Loki accessible from k8s pods
|
|
||||||
- [x] Prometheus accessible from k8s pods
|
|
||||||
- [x] Grafana pod running in `monitoring` namespace
|
|
||||||
- [x] Grafana Ingress active
|
|
||||||
- [x] https://grafana.tail8d86e.ts.net loads
|
|
||||||
- [x] All 9 dashboards visible
|
|
||||||
- [x] Prometheus datasource queries work
|
|
||||||
- [x] Loki datasource queries work
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Rollback
|
|
||||||
|
|
||||||
1. Re-add `svc:grafana` to ansible tailscale_serve
|
|
||||||
2. `mise run provision-indri -- --tags tailscale-serve,grafana`
|
|
||||||
3. `argocd app delete grafana grafana-config --cascade`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Implementation Notes
|
|
||||||
|
|
||||||
*Added during implementation for retrospective review*
|
|
||||||
|
|
||||||
### SSH Credential Management
|
|
||||||
|
|
||||||
**Issue**: Initial plan used HTTPS URLs for forge-mirrored Helm chart repos, but ArgoCD in cluster couldn't resolve `forge.tail8d86e.ts.net` (MagicDNS not available inside cluster).
|
|
||||||
|
|
||||||
**Solution**: Use SSH URLs for all forge repos. Created a **credential template** (`repo-creds-forge`) that matches all repos under `ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/` using URL prefix matching. This allows a single SSH key (added to Forgejo user, not as deploy key) to work for all repos.
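
A sketch of such a credential template, assuming the standard ArgoCD `repo-creds` secret type. The private key is a placeholder; in this setup it would be injected from 1Password rather than committed.

```yaml
# Sketch of the repo-creds-forge credential template (key is a placeholder)
apiVersion: v1
kind: Secret
metadata:
  name: repo-creds-forge
  namespace: argocd
  labels:
    # repo-creds (not repository) makes ArgoCD match by URL prefix
    argocd.argoproj.io/secret-type: repo-creds
stringData:
  type: git
  url: ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/
  sshPrivateKey: |
    <OpenSSH-format private key, injected from 1Password>
```

Any repository whose URL starts with the `url` prefix reuses this key, so new mirrors under `eblume/` need no additional secret.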
|
|
||||||
|
|
||||||
### SSH Host Key for ArgoCD
|
|
||||||
|
|
||||||
**Issue**: ArgoCD's known_hosts didn't include indri's SSH host key, causing `knownhosts: key is unknown` errors.
|
|
||||||
|
|
||||||
**Solution**: Added `argocd-ssh-known-hosts-cm.yaml` as a kustomize patch to include indri's host key alongside the upstream defaults.
|
|
||||||
|
|
||||||
**Gotcha**: Kustomize patches must **not specify namespace** - the namespace transformation happens *after* patch matching. Our patch had `namespace: argocd` which caused "no matches for Id" errors until removed.
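
A sketch of the patch shape that avoids this error. The key type and key material are placeholders, and the upstream entries are elided; the point is the absence of `metadata.namespace`.

```yaml
# Sketch of argocd-ssh-known-hosts-cm.yaml (kustomize patch)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-ssh-known-hosts-cm
  # no namespace here: the patch is matched before the namespace transformer runs
data:
  ssh_known_hosts: |
    # ... upstream default entries must be repeated here, since this key is replaced whole ...
    [indri.tail8d86e.ts.net]:2200 ssh-ed25519 <indri-host-public-key>
```

The bracketed `[host]:port` form is the usual known_hosts notation for a non-default SSH port such as 2200.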
|
|
||||||
|
|
||||||
### Tailscale Hostname Cutover
|
|
||||||
|
|
||||||
**Issue**: After removing `svc:grafana` from ansible's tailscale_serve config, the k8s Ingress still got a numbered hostname (`grafana-1.tail8d86e.ts.net`).
|
|
||||||
|
|
||||||
**Solution**: The old `svc:grafana` service remained registered in Tailscale admin console even after clearing its serve config. **Manual deletion in Tailscale admin console** was required to free the `grafana` hostname for the k8s Ingress to claim. After deletion, recreating the Ingress picked up the correct hostname.
|
|
||||||
|
|
||||||
### ArgoCD Workflow Decision
|
|
||||||
|
|
||||||
During implementation, we established the pattern for GitOps workflow:
|
|
||||||
|
|
||||||
- **All apps target `main` branch** (not feature branches)
|
|
||||||
- Workload apps use a manual sync policy, so merging to `main` does not auto-deploy
|
|
||||||
- Workflow: feature branch → PR → merge to main → `argocd app sync <name>`
|
|
||||||
- For testing: temporarily set one app to feature branch via `argocd app set --revision`
|
|
||||||
|
|
||||||
This avoids the friction of switching `targetRevision` in manifests during development.
|
|
||||||
|
|
||||||
### Bootstrap Dependencies
|
|
||||||
|
|
||||||
Some resources must be applied manually before ArgoCD can manage itself:
|
|
||||||
|
|
||||||
1. **SSH known_hosts** - chicken-and-egg: ArgoCD can't sync the config that adds the host key
|
|
||||||
2. **Credential secrets** - `repo-creds-forge` must exist before ArgoCD can pull from forge
|
|
||||||
|
|
||||||
These are documented in `argocd/manifests/argocd/README.md` as bootstrap steps.
|
|
||||||
|
|
||||||
### Actual Versions Used
|
|
||||||
|
|
||||||
- Grafana Helm chart: `grafana-8.8.2` (tag in grafana-helm-charts repo)
|
|
||||||
- CloudNativePG Helm chart: `cloudnative-pg-v0.23.0` (tag in cloudnative-pg-charts repo)
|
|
||||||
- Grafana version: 11.4.0
|
|
||||||
|
|
@ -1,359 +0,0 @@
|
||||||
# Phase 3: PostgreSQL Disaster Recovery & Backup
|
|
||||||
|
|
||||||
**Goal**: Test disaster recovery and configure borgmatic backups for k8s-pg
|
|
||||||
|
|
||||||
**Status**: Complete (2026-01-19)
|
|
||||||
|
|
||||||
**Prerequisites**: [Phase 2](P2_grafana.complete.md) complete
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
Phase 3 establishes disaster recovery capabilities for the k8s PostgreSQL cluster:
|
|
||||||
1. **Fix borgmatic backup issues** - Resolve `borg: command not found` error
|
|
||||||
2. **Test disaster recovery** - Restore miniflux data from borgmatic backup to k8s-pg
|
|
||||||
3. **Create borgmatic user** - Read-only backup user in k8s-pg via CloudNativePG
|
|
||||||
4. **Configure dual database backup** - Back up both brew PostgreSQL and k8s-pg during migration
|
|
||||||
|
|
||||||
This phase prepares for Phase 4 (miniflux migration) by verifying we can restore data to k8s-pg.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Key Decisions
|
|
||||||
|
|
||||||
### Backup Both Databases During Transition
|
|
||||||
|
|
||||||
**Decision**: Configure borgmatic to back up both `localhost:5432/miniflux` (brew) and `k8s-pg.tail8d86e.ts.net:5432/miniflux` (k8s) until the migration is complete.
|
|
||||||
|
|
||||||
**Why**: Provides redundancy during migration. After Phase 4, remove localhost entry.
|
|
||||||
|
|
||||||
### Reuse Existing borgmatic Password
|
|
||||||
|
|
||||||
**Decision**: Use same borgmatic password from 1Password for k8s-pg user.
|
|
||||||
|
|
||||||
**Why**: Simpler credential management; the existing password is already stored in 1Password and considered secure.
|
|
||||||
|
|
||||||
### CloudNativePG Managed Roles
|
|
||||||
|
|
||||||
**Decision**: Declare borgmatic user via CloudNativePG `managed.roles` instead of SQL commands.
|
|
||||||
|
|
||||||
**Why**: Declarative, version-controlled, matches eblume user pattern.
|
|
||||||
|
|
||||||
### Disable selfHeal on apps App
|
|
||||||
|
|
||||||
**Decision**: Remove `selfHeal: true` from `argocd/apps/apps.yaml`.
|
|
||||||
|
|
||||||
**Why**: Allows temporarily pointing child apps to feature branches during development without ArgoCD reverting the change.
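
In manifest terms the change might look like the fragment below. Whether `prune` is set in the real `apps.yaml` is an assumption; only the removal of `selfHeal` is stated in this plan.

```yaml
# Fragment of argocd/apps/apps.yaml (sketch)
spec:
  syncPolicy:
    automated:
      prune: true          # assumption: not confirmed by this plan
      # selfHeal: true     # removed, so live edits to child apps are not reverted
```

With `selfHeal` absent, auto-sync still reacts to new commits, but it no longer undoes a temporary `argocd app set --revision` on a child app.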
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Steps
|
|
||||||
|
|
||||||
### 1. Fix borgmatic borg path issue
|
|
||||||
|
|
||||||
**Problem**: borgmatic failing with `borg: command not found`
|
|
||||||
|
|
||||||
**Cause**: The LaunchAgent's PATH does not include Homebrew, so the `borg` binary is not found.
|
|
||||||
|
|
||||||
**Solution**: Add `local_path` to borgmatic config template.
|
|
||||||
|
|
||||||
**File**: `ansible/roles/borgmatic/templates/config.yaml.j2`
|
|
||||||
```yaml
|
|
||||||
# Path to borg binary (LaunchAgent doesn't have homebrew in PATH)
|
|
||||||
local_path: {{ borgmatic_local_path }}
|
|
||||||
```
|
|
||||||
|
|
||||||
**File**: `ansible/roles/borgmatic/defaults/main.yml`
|
|
||||||
```yaml
|
|
||||||
borgmatic_local_path: /opt/homebrew/bin/borg
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 2. Run manual backup to verify fix
|
|
||||||
|
|
||||||
```bash
|
|
||||||
mise run provision-indri -- --tags borgmatic
|
|
||||||
ssh indri '/opt/homebrew/bin/borgmatic --verbosity 1'
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 3. Extract miniflux dump from borgmatic
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh indri 'borgmatic list --archive latest'
|
|
||||||
ssh indri 'borgmatic restore --archive latest --destination /tmp/restore'
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 4. Add ACL grant for homelab → k8s
|
|
||||||
|
|
||||||
**Problem**: Connection from indri to k8s-pg blocked - Tailscale proxy logs showed "no rules matched"
|
|
||||||
|
|
||||||
**Solution**: Add ACL grant in Pulumi.
|
|
||||||
|
|
||||||
**File**: `pulumi/policy.hujson`
|
|
||||||
```hujson
// Homelab can reach k8s PostgreSQL for borgmatic backups
{
  "src": ["tag:homelab"],
  "dst": ["tag:k8s"],
  "ip": ["tcp:5432"],
},
```
|
|
||||||
|
|
||||||
Deploy: `mise run tailnet-up`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 5. Restore data to k8s-pg
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Using eblume superuser credentials from 1Password
|
|
||||||
ssh indri "psql 'postgres://eblume@k8s-pg.tail8d86e.ts.net:5432/miniflux' -f /tmp/restore/localhost/miniflux/miniflux"
|
|
||||||
```
|
|
||||||
|
|
||||||
**Verification**:
|
|
||||||
```bash
|
|
||||||
psql 'postgres://eblume@k8s-pg.tail8d86e.ts.net:5432/miniflux' -c 'SELECT COUNT(*) FROM users; SELECT COUNT(*) FROM feeds; SELECT COUNT(*) FROM entries;'
|
|
||||||
# Result: 2 users, 2 feeds, 44 entries
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 6. Create borgmatic user in k8s-pg via CloudNativePG
|
|
||||||
|
|
||||||
**File**: `argocd/manifests/databases/secret-borgmatic.yaml.tpl`
|
|
||||||
```yaml
# Template for borgmatic backup user password
# Apply with: op inject -i secret-borgmatic.yaml.tpl | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: blumeops-pg-borgmatic
  namespace: databases
type: kubernetes.io/basic-auth
stringData:
  username: borgmatic
  password: {{ op://vg6xf6vvfmoh5hqjjhlhbeoaie/mw2bv5we7woicjza7hc6s44yvy/db-password }}
```
|
|
||||||
|
|
||||||
**File**: `argocd/manifests/databases/blumeops-pg.yaml` (add to managed roles)
|
|
||||||
```yaml
managed:
  roles:
    # ... existing eblume role ...
    # borgmatic read-only user for backups
    - name: borgmatic
      login: true
      connectionLimit: -1
      ensure: present
      inherit: true
      inRoles:
        - pg_read_all_data
      passwordSecret:
        name: blumeops-pg-borgmatic
```
|
|
||||||
|
|
||||||
**Deploy**:
|
|
||||||
```bash
|
|
||||||
op inject -i argocd/manifests/databases/secret-borgmatic.yaml.tpl | kubectl apply -f -
|
|
||||||
argocd app set blumeops-pg --revision feature/p3-postgresql-borgmatic
|
|
||||||
argocd app sync blumeops-pg
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 7. Configure borgmatic for dual database backup
|
|
||||||
|
|
||||||
**File**: `ansible/roles/borgmatic/defaults/main.yml`
|
|
||||||
```yaml
borgmatic_postgresql_databases:
  # Brew PostgreSQL on indri (current production)
  - name: miniflux
    hostname: localhost
    port: 5432
    username: borgmatic
  # k8s PostgreSQL (CloudNativePG) - backup both during migration
  - name: miniflux
    hostname: k8s-pg.tail8d86e.ts.net
    port: 5432
    username: borgmatic
```
|
|
||||||
|
|
||||||
**File**: `ansible/roles/postgresql/tasks/main.yml` (update .pgpass)
|
|
||||||
```yaml
- name: Write .pgpass file for borgmatic backups
  ansible.builtin.copy:
    content: |
      # Managed by ansible - only read-only roles
      localhost:{{ postgresql_port }}:*:borgmatic:{{ postgresql_user_passwords['borgmatic'] }}
      k8s-pg.tail8d86e.ts.net:5432:*:borgmatic:{{ postgresql_user_passwords['borgmatic'] }}
    dest: ~/.pgpass
    mode: '0600'
  no_log: true
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 8. Verify complete backup pipeline
|
|
||||||
|
|
||||||
```bash
|
|
||||||
mise run provision-indri -- --tags borgmatic,postgresql
|
|
||||||
ssh indri '/opt/homebrew/bin/borgmatic --verbosity 1'
|
|
||||||
ssh indri 'borgmatic list --archive latest'
|
|
||||||
```
|
|
||||||
|
|
||||||
**Expected output**: Archive contains both dumps:
|
|
||||||
- `localhost/miniflux/miniflux`
|
|
||||||
- `k8s-pg.tail8d86e.ts.net/miniflux/miniflux`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 9. Fix ArgoCD drift from CNPG defaults
|
|
||||||
|
|
||||||
**Problem**: ArgoCD showed blumeops-pg as OutOfSync due to CNPG operator adding default values.
|
|
||||||
|
|
||||||
**Solution**: Add CNPG defaults explicitly to managed roles.
|
|
||||||
|
|
||||||
**File**: `argocd/manifests/databases/blumeops-pg.yaml`
|
|
||||||
```yaml
managed:
  roles:
    - name: eblume
      # ... existing fields ...
      connectionLimit: -1
      ensure: present
      inherit: true
    - name: borgmatic
      # ... existing fields ...
      connectionLimit: -1
      ensure: present
      inherit: true
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 10. Update zk documentation
|
|
||||||
|
|
||||||
Updated:
|
|
||||||
- `~/code/personal/zk/borgmatic.md` - k8s-pg backup documentation and log entry
|
|
||||||
- `~/code/personal/zk/postgresql.md` - k8s PostgreSQL section and log entry
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## New Files
|
|
||||||
|
|
||||||
| Path | Purpose |
|
|
||||||
|------|---------|
|
|
||||||
| `argocd/manifests/databases/secret-borgmatic.yaml.tpl` | borgmatic user password template |
|
|
||||||
|
|
||||||
## Modified Files
|
|
||||||
|
|
||||||
| Path | Change |
|
|
||||||
|------|--------|
|
|
||||||
| `ansible/roles/borgmatic/defaults/main.yml` | Added `borgmatic_local_path`, k8s-pg database entry |
|
|
||||||
| `ansible/roles/borgmatic/templates/config.yaml.j2` | Added `local_path` option |
|
|
||||||
| `ansible/roles/postgresql/tasks/main.yml` | Added k8s-pg to .pgpass |
|
|
||||||
| `argocd/apps/apps.yaml` | Disabled selfHeal |
|
|
||||||
| `argocd/manifests/databases/blumeops-pg.yaml` | Added borgmatic managed role, CNPG defaults |
|
|
||||||
| `pulumi/policy.hujson` | Added ACL grant homelab → k8s on tcp:5432 |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
- [x] borgmatic backup runs successfully
|
|
||||||
- [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries)
|
|
||||||
- [x] borgmatic user created in k8s-pg with pg_read_all_data role
|
|
||||||
- [x] Both localhost and k8s-pg databases in backup archive
|
|
||||||
- [x] ArgoCD shows blumeops-pg as Synced
|
|
||||||
- [x] zk documentation updated
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Rollback
|
|
||||||
|
|
||||||
Keep brew PostgreSQL running until Phase 4 is verified. To revert:
|
|
||||||
|
|
||||||
1. Remove k8s-pg entry from borgmatic databases
|
|
||||||
2. Remove k8s-pg from .pgpass
|
|
||||||
3. `mise run provision-indri -- --tags borgmatic,postgresql`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Implementation Notes
|
|
||||||
|
|
||||||
*Added during implementation for retrospective review*
|
|
||||||
|
|
||||||
### borgmatic LaunchAgent PATH Issue
|
|
||||||
|
|
||||||
**Problem**: borgmatic LaunchAgent failed with `borg: command not found`
|
|
||||||
|
|
||||||
**Root cause**: LaunchAgents run with minimal PATH that doesn't include `/opt/homebrew/bin`
|
|
||||||
|
|
||||||
**Solution**: Added `local_path: /opt/homebrew/bin/borg` to borgmatic config. This was already done for `pg_dump_command` but not for borg itself.
|
|
||||||
|
|
||||||
**Lesson**: Any tool invoked by borgmatic needs an absolute path when running from a LaunchAgent.
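
The failure mode can be reproduced without launchd. The sketch below is only an illustration: `fake-borg` is a hypothetical stand-in for `borg`, and the minimal PATH value is an assumption about what a LaunchAgent typically sees, not something taken from indri.

```shell
#!/bin/sh
# Hypothetical stand-in for borg, placed outside the minimal PATH.
tooldir="$(mktemp -d)"
printf '#!/bin/sh\necho "fake borg ok"\n' > "$tooldir/fake-borg"
chmod +x "$tooldir/fake-borg"

# Assumed launchd-style minimal PATH; it does not include /opt/homebrew/bin.
minimal_path="/usr/bin:/bin:/usr/sbin:/sbin"

# A bare-name lookup fails under the minimal PATH...
if env -i PATH="$minimal_path" sh -c 'command -v fake-borg' >/dev/null 2>&1; then
  bare_lookup="found"
else
  bare_lookup="not found"
fi

# ...while an absolute path works regardless of PATH.
absolute_result="$(env -i PATH="$minimal_path" "$tooldir/fake-borg")"

echo "bare name: $bare_lookup"
echo "absolute path: $absolute_result"
rm -rf "$tooldir"
```

This mirrors why `local_path` had to point at `/opt/homebrew/bin/borg` rather than relying on a bare `borg`.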
|
|
||||||
|
|
||||||
### 1Password Field Name Mismatch
|
|
||||||
|
|
||||||
**Issue**: Initial secret template used `password` field but 1Password item had `db-password`.
|
|
||||||
|
|
||||||
**Discovery**: Error message from `op inject` indicated field not found.
|
|
||||||
|
|
||||||
**Fix**: Updated template to use correct field name `db-password`.
|
|
||||||
|
|
||||||
### ACL Grant Discovery
|
|
||||||
|
|
||||||
**Problem**: Connection from indri (tag:homelab) to k8s-pg (tag:k8s) failed.
|
|
||||||
|
|
||||||
**Diagnosis**: Checked Tailscale operator proxy logs which showed "no rules matched" - clear indication of missing ACL.
|
|
||||||
|
|
||||||
**Solution**: Added explicit grant in `pulumi/policy.hujson` for `tag:homelab` → `tag:k8s` on `tcp:5432`.
|
|
||||||
|
|
||||||
### ArgoCD selfHeal and Feature Branch Development
|
|
||||||
|
|
||||||
**Problem**: When testing changes, temporarily pointed blumeops-pg app to feature branch via `argocd app set --revision`. ArgoCD's selfHeal kept reverting it back to main.
|
|
||||||
|
|
||||||
**Discussion**: Two options considered:
|
|
||||||
- Option A: Disable selfHeal on apps app (manual sync required for new apps)
|
|
||||||
- Option B: Keep selfHeal, use different workflow
|
|
||||||
|
|
||||||
**Decision**: Option A chosen. The apps app now only has `prune: true`, not selfHeal. This allows:
|
|
||||||
1. Temporarily testing feature branches
|
|
||||||
2. Manual control over when app manifest changes are applied
|
|
||||||
|
|
||||||
**Trade-off**: Must manually sync apps app when adding/removing Application manifests.
|
|
||||||
|
|
||||||
### CloudNativePG Managed Role Reconciliation
|
|
||||||
|
|
||||||
**Issue**: After creating borgmatic secret with correct password, CNPG didn't immediately update the user.
|
|
||||||
|
|
||||||
**Solution**: Annotated the Cluster to trigger reconciliation:
|
|
||||||
```bash
kubectl annotate cluster blumeops-pg -n databases cnpg.io/reconcile=$(date +%s) --overwrite
```
|
|
||||||
|
|
||||||
### ArgoCD Drift from CNPG Defaults
|
|
||||||
|
|
||||||
**Problem**: blumeops-pg showed OutOfSync despite successful syncs.
|
|
||||||
|
|
||||||
**Cause**: CNPG operator adds default values (`connectionLimit: -1`, `ensure: present`, `inherit: true`) to managed roles that weren't in our spec.
|
|
||||||
|
|
||||||
**Solution**: Added these defaults explicitly to our spec to match what CNPG generates.
|
|
||||||
|
|
||||||
**Comment added**: Documented in blumeops-pg.yaml that these are "CNPG defaults added to prevent ArgoCD drift".
|
|
||||||
|
|
||||||
### Git Workflow for Phase 3
|
|
||||||
|
|
||||||
1. Created feature branch: `feature/p3-postgresql-borgmatic`
|
|
||||||
2. Made commits throughout implementation
|
|
||||||
3. Pointed blumeops-pg app to feature branch for testing
|
|
||||||
4. Created PR #32 for review
|
|
||||||
5. After merge, reset app to main: `argocd app set blumeops-pg --revision main`
|
|
||||||
|
|
||||||
This workflow was enabled by disabling selfHeal (see above).
|
|
||||||
|
|
@ -1,162 +0,0 @@
|
||||||
# Phase 4: Miniflux Migration to Kubernetes
|
|
||||||
|
|
||||||
**Goal**: Migrate Miniflux entirely off indri and onto k8s, retire brew PostgreSQL, rename k8s-pg to pg
|
|
||||||
|
|
||||||
**Status**: Complete (2026-01-20)
|
|
||||||
|
|
||||||
**Prerequisites**: [Phase 3](P3_postgresql.complete.md) complete
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
This phase completed the miniflux migration and retired brew PostgreSQL:
|
|
||||||
1. Deployed miniflux container in k8s via ArgoCD
|
|
||||||
2. Exposed via Tailscale Ingress at `feed.tail8d86e.ts.net`
|
|
||||||
3. Removed all miniflux infrastructure from indri (ansible role, brew service, Tailscale serve)
|
|
||||||
4. Retired brew PostgreSQL (no longer needed)
|
|
||||||
5. Renamed k8s-pg to pg (canonical Tailscale hostname)
|
|
||||||
6. Updated borgmatic to back up only `pg.tail8d86e.ts.net`
|
|
||||||
7. Updated all zk documentation
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## New Files
|
|
||||||
|
|
||||||
| Path | Purpose |
|------|---------|
| `argocd/apps/miniflux.yaml` | ArgoCD Application definition |
| `argocd/manifests/miniflux/deployment.yaml` | Miniflux Deployment |
| `argocd/manifests/miniflux/service.yaml` | ClusterIP Service |
| `argocd/manifests/miniflux/ingress-tailscale.yaml` | Tailscale Ingress for `feed.tail8d86e.ts.net` |
| `argocd/manifests/miniflux/secret-db.yaml.tpl` | Database URL secret documentation |
| `argocd/manifests/miniflux/kustomization.yaml` | Kustomize configuration |
| `argocd/manifests/miniflux/README.md` | Setup instructions |
|
|
||||||
|
|
||||||
## Modified Files
|
|
||||||
|
|
||||||
| Path | Change |
|------|--------|
| `ansible/playbooks/indri.yml` | Removed miniflux and postgresql roles, simplified pre_tasks |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Removed `svc:feed` and `svc:pg` entries |
| `ansible/roles/alloy/defaults/main.yml` | Removed miniflux and postgresql logs, disabled postgres metrics |
| `ansible/roles/borgmatic/defaults/main.yml` | Updated to backup only `pg.tail8d86e.ts.net` |
| `ansible/roles/borgmatic/tasks/main.yml` | Added .pgpass file management |
| `argocd/manifests/databases/service-tailscale.yaml` | Renamed hostname from k8s-pg to pg |
|
|
||||||
|
|
||||||
## Deleted Files
|
|
||||||
|
|
||||||
| Path | Reason |
|------|--------|
| `ansible/roles/miniflux/` | Entire role no longer needed |
| `ansible/roles/postgresql/` | Brew PostgreSQL no longer needed |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification
|
|
||||||
|
|
||||||
- [x] Miniflux pod healthy in k8s
|
|
||||||
- [x] https://feed.tail8d86e.ts.net accessible
|
|
||||||
- [x] User `eblume` can log in
|
|
||||||
- [x] Feeds visible and entries readable
|
|
||||||
- [x] `pg.tail8d86e.ts.net` resolves to k8s PostgreSQL
|
|
||||||
- [x] Old `k8s-pg` and `feed` devices removed from Tailscale
|
|
||||||
- [x] brew miniflux and postgresql services stopped
|
|
||||||
- [x] Tailscale serve entries cleared from indri
|
|
||||||
- [x] zk documentation updated
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Implementation Notes
|
|
||||||
|
|
||||||
*Lessons learned and issues encountered*
|
|
||||||
|
|
||||||
### CNPG-Generated Password vs 1Password
|
|
||||||
|
|
||||||
**Problem**: Initial secret template used 1Password for miniflux database password, but CNPG auto-generates the bootstrap owner password.
|
|
||||||
|
|
||||||
**Solution**: Reference the CNPG-generated password from `blumeops-pg-app` secret:
|
|
||||||
```bash
kubectl create secret generic miniflux-db -n miniflux \
  --from-literal=url="$(kubectl -n databases get secret blumeops-pg-app -o jsonpath='{.data.uri}' | base64 -d)"
```
|
|
||||||
|
|
||||||
### Table Ownership Issue After P3 Restore
|
|
||||||
|
|
||||||
**Problem**: Miniflux pod crashed with "permission denied for table schema_version".
|
|
||||||
|
|
||||||
**Root cause**: P3 restore was run as the `eblume` superuser, so all tables were created owned by `eblume`, not `miniflux`.
|
|
||||||
|
|
||||||
**Solution**: Transfer ownership of all tables to miniflux:
|
|
||||||
```sql
DO $$
DECLARE r RECORD;
BEGIN
  FOR r IN (SELECT tablename FROM pg_tables WHERE schemaname = 'public') LOOP
    EXECUTE 'ALTER TABLE public.' || quote_ident(r.tablename) || ' OWNER TO miniflux';
  END LOOP;
END$$;
```
|
|
||||||
|
|
||||||
### Tailscale Ingress Hostname Suffix
|
|
||||||
|
|
||||||
**Behavior**: When requesting a Tailscale hostname that's already taken, the operator adds a suffix (e.g., `feed-1`).
|
|
||||||
|
|
||||||
**Workflow**:
|
|
||||||
1. Deploy initially - gets `feed-1.tail8d86e.ts.net`
|
|
||||||
2. Clear old `svc:feed` from indri
|
|
||||||
3. Delete old `feed` device from Tailscale admin
|
|
||||||
4. Delete and recreate the Ingress - now claims `feed`
|
|
||||||
|
|
||||||
### Renaming Tailscale Service Hostname
|
|
||||||
|
|
||||||
**Problem**: Changing the `tailscale.com/hostname` annotation doesn't automatically update the Tailscale device.
|
|
||||||
|
|
||||||
**Solution**: Delete the service and let ArgoCD recreate it:
|
|
||||||
```bash
kubectl -n databases delete service blumeops-pg-tailscale
argocd app sync blumeops-pg
```
|
|
||||||
|
|
||||||
### .pgpass Management Migration
|
|
||||||
|
|
||||||
**Issue**: The postgresql role managed `~/.pgpass` for borgmatic. With postgresql role deleted, borgmatic couldn't authenticate.
|
|
||||||
|
|
||||||
**Solution**: Moved .pgpass management to the borgmatic role. Password is still fetched in playbook pre_tasks as `borgmatic_db_password`.
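
For reference, a `.pgpass` entry has the form `hostname:port:database:username:password`, and libpq ignores the file unless it is readable only by its owner. The following is a minimal sketch of what the role has to produce; the database name, the password, and the scratch-file target are illustrative assumptions, and the real role templates `~/.pgpass` from `borgmatic_db_password`.

```shell
#!/bin/sh
# Format one .pgpass entry: hostname:port:database:username:password
# Assumes no field contains ':' or '\' (those would need backslash-escaping).
pgpass_line() {
  printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Write to a scratch file here; the real target would be ~/.pgpass.
pgpass_file="$(mktemp)"
pgpass_line "pg.tail8d86e.ts.net" 5432 "miniflux" "borgmatic" "example-password" > "$pgpass_file"

# libpq skips the file if group or others can access it.
chmod 600 "$pgpass_file"

cat "$pgpass_file"
```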
|
|
||||||
|
|
||||||
### Ansible Check Mode and Registered Variables
|
|
||||||
|
|
||||||
**Problem**: Running `provision-indri --check --diff` failed in the podman role with "Conditional result (True) was derived from value of type 'str'" errors.
|
|
||||||
|
|
||||||
**Root cause**: Command tasks are skipped in check mode, leaving registered variables undefined or with unexpected types when used in conditionals.
|
|
||||||
|
|
||||||
**Solution**: Added `check_mode: false` to read-only command tasks that gather information:
|
|
||||||
```yaml
- name: Check if podman machine exists
  ansible.builtin.command:
    cmd: podman machine list --format json
  register: podman_machine_list
  changed_when: false
  check_mode: false  # Safe to run in check mode - read-only
```
|
|
||||||
|
|
||||||
**Lesson**: Any task that registers a variable used in conditionals should have `check_mode: false` if the command is read-only/safe.
|
|
||||||
|
|
||||||
### 1Password CLI on Headless Hosts
|
|
||||||
|
|
||||||
**Issue**: Attempted to run `op` commands on indri, but 1Password CLI requires interactive authentication (biometrics/password).
|
|
||||||
|
|
||||||
**Solution**: All `op` commands must be in `pre_tasks` of the playbook with `delegate_to: localhost` so they run on gilbert (the workstation with GUI auth).
|
|
||||||
|
|
||||||
### Git Workflow for Phase 4
|
|
||||||
|
|
||||||
1. Created feature branch: `feature/p4-miniflux`
|
|
||||||
2. Made incremental commits throughout implementation
|
|
||||||
3. Pointed `miniflux` and `blumeops-pg` apps to feature branch for testing
|
|
||||||
4. Created PR #33 for review
|
|
||||||
5. After merge, reset apps to main:
|
|
||||||
   ```bash
   argocd app set miniflux --revision main
   argocd app set blumeops-pg --revision main
   argocd app sync apps
   ```
|
|
||||||
|
|
@ -1,208 +0,0 @@
|
||||||
# Phase 5.1: Migrate Minikube from QEMU2 to Docker Driver
|
|
||||||
|
|
||||||
**Goal**: Replace the qemu2 driver with docker to fix remote API access and simplify volume mounts
|
|
||||||
|
|
||||||
**Status**: Complete (2026-01-21) - Cluster running, ArgoCD deployed, apps synced
|
|
||||||
|
|
||||||
**Prerequisites**: [Phase 5](P5_devpi.complete.md) complete
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Background
|
|
||||||
|
|
||||||
### Original Problem (Podman → QEMU2)
|
|
||||||
|
|
||||||
During Phase 6 (Kiwix/Transmission migration), we discovered that the **podman driver has fundamental limitations** that prevent mounting external volumes:
|
|
||||||
|
|
||||||
1. **SMB CSI driver fails** with "Operation not permitted" - the rootless container lacks kernel-level mount capabilities
|
|
||||||
2. **`minikube mount` fails** - 9p mount gets "permission denied" inside the podman VM
|
|
||||||
3. **hostPath volumes** only work for paths inside the minikube container, not the macOS host
|
|
||||||
|
|
||||||
We migrated to QEMU2 to get a full VM with kernel capabilities.
|
|
||||||
|
|
||||||
### New Problem (QEMU2 → Docker)
|
|
||||||
|
|
||||||
The QEMU2 driver introduced a **new problem**: the Kubernetes API server is inside the VM at `192.168.105.2:6443`, and Tailscale's TCP proxy cannot forward to it properly:
|
|
||||||
|
|
||||||
- TCP connections succeed (nc -zv works)
|
|
||||||
- TLS handshake times out
|
|
||||||
- Root cause unknown, but likely related to Tailscale serve's handling of non-localhost upstreams
|
|
||||||
|
|
||||||
Additionally, the volume mount solution with QEMU2 was complex:
|
|
||||||
- Required NFS mount from sifaka → indri
|
|
||||||
- Then `minikube mount` to pass through to VM
|
|
||||||
- Two LaunchAgents/LaunchDaemons for persistence
|
|
||||||
- macOS GUI approval required for network access
|
|
||||||
|
|
||||||
### Why Docker?
|
|
||||||
|
|
||||||
The **docker driver** solves both problems:
|
|
||||||
|
|
||||||
1. **API Server on localhost**: Docker Desktop handles port forwarding from container to localhost automatically, so `tailscale serve --tcp=443 tcp://localhost:PORT` works
|
|
||||||
|
|
||||||
2. **Simpler volume mounts**: Docker Desktop has built-in macOS file sharing. Paths shared with Docker are accessible inside containers.
|
|
||||||
|
|
||||||
3. **Official Tailscale recommendation**: Tailscale's own [Kubernetes guide](https://tailscale.com/learn/managing-access-to-kubernetes-with-tailscale) uses minikube with the docker driver.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Implementation Summary
|
|
||||||
|
|
||||||
### Infrastructure Changes
|
|
||||||
|
|
||||||
1. **Docker Desktop installed** (manual via `brew install --cask docker`)
|
|
||||||
- Configured with 12GB memory in Docker Desktop settings
|
|
||||||
- Kubernetes option disabled (using minikube instead)
|
|
||||||
|
|
||||||
2. **Docker minikube cluster created**:
|
|
||||||
   ```bash
   minikube start \
     --driver=docker \
     --container-runtime=docker \
     --cpus=6 \
     --memory=11264 \
     --disk-size=200g \
     --apiserver-names=k8s.tail8d86e.ts.net,indri \
     --apiserver-port=6443 \
     --listen-address=0.0.0.0
   ```
|
|
||||||
|
|
||||||
3. **Tailscale serve configured** for k8s API:
|
|
||||||
- API server on localhost (port is dynamic with docker driver)
|
|
||||||
- `tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:<PORT>`
|
|
||||||
|
|
||||||
4. **Remote kubectl access working** from gilbert:
|
|
||||||
- Created `mise-tasks/ensure-minikube-indri-kubectl-config` script
|
|
||||||
- Fetches certs from indri and sets up `~/.kube/minikube-indri/config.yml`
|
|
||||||
|
|
||||||
### Ansible Roles Updated
|
|
||||||
|
|
||||||
- `ansible/roles/minikube/` - docker driver, removed qemu2/NFS/socket_vmnet
|
|
||||||
- `ansible/roles/tailscale_serve/` - removed svc:k8s (minikube role handles dynamic port)
|
|
||||||
- Containerd registry mirrors configured for zot pull-through cache
|
|
||||||
|
|
||||||
### ArgoCD Bootstrap
|
|
||||||
|
|
||||||
All apps deployed and synced from `feature/p5.1-qemu2-migration` branch:
|
|
||||||
|
|
||||||
| App | Status | Notes |
|-----|--------|-------|
| tailscale-operator | Healthy | Manages Tailscale ingresses |
| argocd | Healthy | Self-managed |
| cloudnative-pg | Healthy | PostgreSQL operator |
| blumeops-pg | Progressing | PostgreSQL cluster starting |
| grafana | Progressing | Needs grafana-admin secret |
| grafana-config | Healthy | Dashboards and ingress |
| miniflux | Progressing | Needs miniflux-config secret |
| devpi | Progressing | Starting up |
|
|
||||||
|
|
||||||
### Secrets Still Needed
|
|
||||||
|
|
||||||
After PR merge, apply these secrets manually:
|
|
||||||
|
|
||||||
```bash
# Grafana admin password
op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | kubectl --context=minikube-indri apply -f -

# Miniflux config
op inject -i argocd/manifests/miniflux/secret.yaml.tpl | kubectl --context=minikube-indri apply -f -
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Technical Notes
|
|
||||||
|
|
||||||
### API Server Port
|
|
||||||
|
|
||||||
With the docker driver, the API server port is **dynamic**: Docker maps a random host port to 6443 inside the container.
|
|
||||||
|
|
||||||
The minikube ansible role queries the port after cluster start and configures tailscale serve accordingly.
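
A rough sketch of that lookup follows, under the assumption that the minikube container is named `minikube` (the default profile name) and that `docker port` reports the mapping as `ADDRESS:PORT`. The helper name `port_from_mapping` is hypothetical, and the actual ansible role may gather the port differently.

```shell
#!/bin/sh
# Extract the host port from a mapping such as "0.0.0.0:32771" or "[::]:32771".
port_from_mapping() {
  printf '%s\n' "${1##*:}"
}

# On indri the mapping would come from Docker, for example:
#   mapping="$(docker port minikube 6443 | head -n 1)"
# A sample value is used here so the sketch runs without a cluster.
mapping="0.0.0.0:32771"
port="$(port_from_mapping "$mapping")"

# Print (rather than run) the serve command described in these notes.
serve_cmd="tailscale serve --service=svc:k8s --tcp=443 tcp://localhost:${port}"
echo "$serve_cmd"
```

Because the host port can change whenever the container is recreated, the lookup has to be repeated after each cluster start rather than hard-coded.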
|
|
||||||
|
|
||||||
### Registry Mirror Configuration
|
|
||||||
|
|
||||||
Containerd uses `/etc/containerd/certs.d/<registry>/hosts.toml` files. The ansible role configures mirrors for:
|
|
||||||
- `registry.tail8d86e.ts.net` (private images)
|
|
||||||
- `docker.io`
|
|
||||||
- `ghcr.io`
|
|
||||||
- `quay.io`
|
|
||||||
|
|
||||||
### ProxyClass Renamed
|
|
||||||
|
|
||||||
Changed from `crio-compat` to `default` - the old name was misleading since we're no longer using CRI-O.
|
|
||||||
|
|
||||||
### Volume Mounts for P6 (Kiwix/Transmission)
|
|
||||||
|
|
||||||
**Solution: Direct NFS from pods to sifaka** ✅ TESTED AND WORKING
|
|
||||||
|
|
||||||
Docker NATs outbound traffic through indri's LAN IP (192.168.1.50), so sifaka's NFS exports need to allow `192.168.1.0/24`.
|
|
||||||
|
|
||||||
Sifaka NFS exports configured:
|
|
||||||
- `192.168.1.0/24` - Docker containers via indri NAT
|
|
||||||
- `100.64.0.0/10` - Tailscale clients
|
|
||||||
|
|
||||||
Pods can mount NFS directly:
|
|
||||||
```yaml
volumes:
  - name: torrents
    nfs:
      server: sifaka
      path: /volume1/torrents
```
|
|
||||||
|
|
||||||
No LaunchAgents, no `minikube mount`, no SMB CSI driver needed.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification Checklist
|
|
||||||
|
|
||||||
- [x] Docker Desktop installed and running on indri
|
|
||||||
- [x] QEMU2 minikube deleted
|
|
||||||
- [x] Docker minikube running (6 CPUs, 11GB RAM)
|
|
||||||
- [x] API server accessible on localhost
|
|
||||||
- [x] Tailscale serve configured for svc:k8s
|
|
||||||
- [x] Remote kubectl access working from gilbert
|
|
||||||
- [x] Ansible roles updated for docker driver
|
|
||||||
- [x] socket_vmnet stopped
|
|
||||||
- [x] ArgoCD deployed and synced
|
|
||||||
- [x] All apps synced to feature branch
|
|
||||||
- [x] Apply app secrets (grafana-admin, miniflux-db, devpi-root, eblume, borgmatic)
|
|
||||||
- [x] Verify all apps healthy after secrets applied
|
|
||||||
- [x] Miniflux database restored from borgmatic backup
|
|
||||||
- [ ] Merge PR and reset apps to main branch
|
|
||||||
- [ ] `mise run indri-services-check` passes
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Post-Merge Steps
|
|
||||||
|
|
||||||
After PR is merged:
|
|
||||||
|
|
||||||
```bash
# Reset all blumeops apps to main branch
argocd app set apps --revision main
argocd app set argocd --revision main
argocd app set blumeops-pg --revision main
argocd app set devpi --revision main
argocd app set grafana-config --revision main
argocd app set miniflux --revision main
argocd app set tailscale-operator --revision main

# Sync all apps
argocd app sync apps
argocd app sync argocd
argocd app sync tailscale-operator
argocd app sync blumeops-pg
argocd app sync grafana-config
argocd app sync miniflux
argocd app sync devpi
```
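
The per-app commands could also be collapsed into a loop. This is a sketch only: the `run` and `reset_apps_to_main` helpers and the `DRY_RUN` switch are hypothetical additions, and by default the script prints the commands instead of executing them. The app names and both orderings are copied from the block above.

```shell
#!/bin/sh
# App lists in the same order as the explicit commands in the plan.
set_order="apps argocd blumeops-pg devpi grafana-config miniflux tailscale-operator"
sync_order="apps argocd tailscale-operator blumeops-pg grafana-config miniflux devpi"

# DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reset_apps_to_main() {
  # Point every app back at main first, then sync in the listed order.
  for app in $set_order; do
    run argocd app set "$app" --revision main
  done
  for app in $sync_order; do
    run argocd app sync "$app"
  done
}

reset_apps_to_main
```

Running it with `DRY_RUN=0` would execute the commands for real, so it is worth reviewing the printed output first.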
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Rollback Plan
|
|
||||||
|
|
||||||
If the Docker driver doesn't work:
|
|
||||||
|
|
||||||
1. Delete Docker minikube: `minikube delete`
|
|
||||||
2. Recreate QEMU2 cluster (restore old ansible config from git)
|
|
||||||
3. Accept the Tailscale TCP forwarding limitation and use SSH tunnel for remote kubectl
|
|
||||||
|
|
@ -1,102 +0,0 @@
|
||||||
# Phase 5: devpi Migration to Kubernetes
|
|
||||||
|
|
||||||
**Goal**: Migrate devpi PyPI caching proxy from indri to k8s
|
|
||||||
|
|
||||||
**Status**: Complete (2026-01-20)
|
|
||||||
|
|
||||||
**Prerequisites**: [Phase 4](P4_miniflux.complete.md) complete
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Summary
|
|
||||||
|
|
||||||
Successfully migrated devpi from mcquack LaunchAgent on indri to Kubernetes:
|
|
||||||
- Custom container image with devpi-server + devpi-web + auto-init startup script
|
|
||||||
- StatefulSet with 50Gi PVC for data persistence
|
|
||||||
- Tailscale Ingress at `pypi.tail8d86e.ts.net`
|
|
||||||
- Root password from 1Password secret, auto-initialized on first run
|
|
||||||
- Verified pip caching proxy and mcquack package upload
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Key Learnings
|
|
||||||
|
|
||||||
### Registry Mirror Configuration
|
|
||||||
- Minikube's CRI-O can't resolve Tailscale hostnames directly
|
|
||||||
- Added registry mirror config to redirect `registry.tail8d86e.ts.net` → `host.containers.internal:5050`
|
|
||||||
- Also added direct insecure registry entry for `host.containers.internal:5050`
|
|
||||||
- Config in `ansible/roles/minikube/files/zot-mirror.conf`
|
|
||||||
|
|
||||||
### Memory Requirements
|
|
||||||
- devpi-web's Whoosh search indexer needs significant memory during PyPI index build
|
|
||||||
- Initial 512Mi limit caused OOMKills
|
|
||||||
- Solution: High limit (2Gi) with low request (256Mi) - memory reclaimed after indexing
|
|
||||||
|
|
||||||
### Environment Variable Conflicts
|
|
||||||
- Kubernetes auto-sets `DEVPI_PORT` for service discovery
|
|
||||||
- Conflicted with our port config - renamed to `DEVPI_LISTEN_PORT`
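
The collision comes from the Docker-links-style variables Kubernetes injects for each Service: a Service named `devpi` yields `DEVPI_PORT` with a URL value such as `tcp://<cluster-ip>:<port>`, not a bare port number. A minimal sketch of the renamed lookup is below; the simulated IP, the `resolve_listen_port` helper, and the 3141 fallback (devpi-server's default port) are assumptions about how `start.sh` might read it.

```shell
#!/bin/sh
# Simulate what Kubernetes injects for a Service named "devpi".
DEVPI_PORT="tcp://10.96.0.10:3141"

# Read the listen port from a variable Kubernetes does not set,
# falling back to devpi-server's default port.
resolve_listen_port() {
  printf '%s\n' "${DEVPI_LISTEN_PORT:-3141}"
}

echo "injected DEVPI_PORT=$DEVPI_PORT"
echo "listening on port $(resolve_listen_port)"
```

A script that passed `DEVPI_PORT` straight to `--port` would receive the URL string, which is why the dedicated name is the safer choice.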
|
|
||||||
|
|
||||||
### Tailscale Serve Cleanup
|
|
||||||
- Use `tailscale serve status --json` to see entries (non-JSON output can be empty)
|
|
||||||
- Use `tailscale serve clear svc:<name>` to remove entries
|
|
||||||
|
|
||||||
### ArgoCD Workflow
|
|
||||||
- Changed `apps` to manual sync (was auto-sync with prune)
|
|
||||||
- Workflow: sync apps → set revision to feature branch → sync service → test → reset to main after merge
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Verification Checklist
|
|
||||||
|
|
||||||
- [x] devpi pod healthy in k8s
|
|
||||||
- [x] https://pypi.tail8d86e.ts.net accessible
|
|
||||||
- [x] Web interface shows root/pypi index
|
|
||||||
- [x] `pip install <package>` works through proxy
|
|
||||||
- [x] mcquack v1.0.0 uploaded to eblume/dev
|
|
||||||
- [x] `pip install --index-url https://pypi.tail8d86e.ts.net/eblume/dev/+simple/ mcquack` works
|
|
||||||
- [x] Old devpi service removed from indri
|
|
||||||
- [x] zk documentation updated
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Files Changed
|
|
||||||
|
|
||||||
### New Files
|
|
||||||
| Path | Purpose |
|------|---------|
| `argocd/apps/devpi.yaml` | ArgoCD Application definition |
| `argocd/manifests/devpi/Dockerfile` | Container image with startup script |
| `argocd/manifests/devpi/start.sh` | Auto-init startup script |
| `argocd/manifests/devpi/statefulset.yaml` | StatefulSet with PVC |
| `argocd/manifests/devpi/service.yaml` | ClusterIP Service |
| `argocd/manifests/devpi/ingress-tailscale.yaml` | Tailscale Ingress |
| `argocd/manifests/devpi/kustomization.yaml` | Kustomize configuration |
| `argocd/manifests/devpi/secret-root.yaml.tpl` | 1Password secret template |
| `argocd/manifests/devpi/README.md` | Setup documentation |
|
|
||||||
|
|
||||||
### Modified Files
|
|
||||||
| Path | Change |
|------|--------|
| `CLAUDE.md` | Added k8s/ArgoCD workflow documentation |
| `ansible/playbooks/indri.yml` | Removed devpi and devpi_metrics roles |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Removed svc:pypi |
| `ansible/roles/alloy/defaults/main.yml` | Removed devpi log collection |
| `ansible/roles/borgmatic/defaults/main.yml` | Removed devpi backup paths |
| `ansible/roles/minikube/files/zot-mirror.conf` | Added registry mirror for Tailscale hostname |
| `argocd/apps/apps.yaml` | Changed to manual sync policy |
|
|
||||||
|
|
||||||
### Roles Kept (not deleted)
|
|
||||||
- `ansible/roles/devpi/` - Kept for reference
|
|
||||||
- `ansible/roles/devpi_metrics/` - Kept for reference
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Post-Merge Cleanup
|
|
||||||
|
|
||||||
After PR merge, reset ArgoCD apps to main:
|
|
||||||
```fish
argocd app set apps --revision main
argocd app sync apps
argocd app set devpi --revision main
argocd app sync devpi
```
|
|
||||||
File diff suppressed because it is too large
|
|
@ -1,394 +0,0 @@
|
||||||
# Phase 7: Forgejo Migration to Kubernetes
|
|
||||||
|
|
||||||
**Goal**: Migrate Forgejo from indri (macOS Homebrew) to Kubernetes via ArgoCD
|
|
||||||
|
|
||||||
**Status**: Planning (2026-01-21)
|
|
||||||
|
|
||||||
**Prerequisites**: [Phase 6](P6_kiwix.complete.md) complete
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Critical Risks & Mitigations
|
|
||||||
|
|
||||||
### 1. Circular Dependency (Highest Risk)
|
|
||||||
|
|
||||||
ArgoCD pulls manifests from Forgejo. If k8s Forgejo fails, we cannot redeploy it.
|
|
||||||
|
|
||||||
**Mitigation**: blumeops is mirrored to `github.com/eblume/blumeops`. DR procedure documented to switch ArgoCD to GitHub temporarily (see Disaster Recovery section).
|
|
||||||
|
|
||||||
### 2. Split Hostnames Required
|
|
||||||
|
|
||||||
The Tailscale k8s operator [cannot expose both HTTPS and TCP/SSH on the same hostname](https://github.com/tailscale/tailscale/issues/15539). See also [user comment](https://github.com/tailscale/tailscale/issues/15539#issuecomment-3782368432).
|
|
||||||
|
|
||||||
**Solution**:
|
|
||||||
- **HTTPS (web UI)**: `forge.tail8d86e.ts.net` via Tailscale Ingress
|
|
||||||
- **SSH (git operations)**: `git.tail8d86e.ts.net` via Tailscale LoadBalancer
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Current State
|
|
||||||
|
|
||||||
### Forgejo on indri
|
|
||||||
|
|
||||||
| Component | Location/Details |
|-----------|------------------|
| Data directory | `/opt/homebrew/var/forgejo/` (~426MB) |
| SQLite database | `/opt/homebrew/var/forgejo/data/forgejo.db` (4.1MB) |
| Git repositories | `/opt/homebrew/var/forgejo/data/forgejo-repositories/` (~418MB) |
| Configuration | `/opt/homebrew/var/forgejo/custom/conf/app.ini` (contains secrets) |
| HTTP port | 3001 (localhost) |
| SSH port | 2200 (localhost) |
| Tailscale | `svc:forge` with tcp:22→2200 and https:443→3001 |
| Backup | borgmatic backs up to sifaka |
|
|
||||||
|
|
||||||
### Hosted Repositories (8 total)
|
|
||||||
|
|
||||||
- blumeops (mirrored to GitHub)
|
|
||||||
- cloudnative-pg-charts
|
|
||||||
- csi-driver-smb
|
|
||||||
- devpi
|
|
||||||
- dotfiles
|
|
||||||
- grafana-helm-charts
|
|
||||||
- mcquack
|
|
||||||
- zot
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Architecture Decision: Helm Chart via ArgoCD
|
|
||||||
|
|
||||||
Following established pattern from cloudnative-pg and grafana:
|
|
||||||
1. Mirror `https://code.forgejo.org/forgejo-helm/forgejo-helm` to forge
|
|
||||||
2. ArgoCD Application with multi-source (chart + values)
|
|
||||||
3. Values file in `argocd/manifests/forgejo/values.yaml`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## All `forge` References Requiring Update
|
|
||||||
|
|
||||||
### SSH URLs (change to `git.tail8d86e.ts.net:22`)
|
|
||||||
|
|
||||||
| File | Current | After |
|------|---------|-------|
| `argocd/apps/apps.yaml` | `ssh://forgejo@indri.tail8d86e.ts.net:2200/...` | `ssh://forgejo@git.tail8d86e.ts.net/...` |
| `argocd/apps/argocd.yaml` | same | same |
| `argocd/apps/blumeops-pg.yaml` | same | same |
| `argocd/apps/cloudnative-pg.yaml` | same | same |
| `argocd/apps/devpi.yaml` | same | same |
| `argocd/apps/grafana.yaml` | same | same |
| `argocd/apps/grafana-config.yaml` | same | same |
| `argocd/apps/kiwix.yaml` | same | same |
| `argocd/apps/miniflux.yaml` | same | same |
| `argocd/apps/tailscale-operator.yaml` | same | same |
| `argocd/apps/torrent.yaml` | same | same |
| `argocd/manifests/argocd/repo-forge-secret.yaml.tpl` | `ssh://forgejo@indri.tail8d86e.ts.net:2200/eblume/` | `ssh://forgejo@git.tail8d86e.ts.net/eblume/` |
| `ansible/group_vars/all.yml` | `ssh://forgejo@forge.tail8d86e.ts.net/...` | `ssh://forgejo@git.tail8d86e.ts.net/...` |
|
|
||||||
|
|
||||||
### SSH Known Hosts (add `git.tail8d86e.ts.net`)
|
|
||||||
|
|
||||||
| File | Change |
|------|--------|
| `argocd/manifests/argocd/argocd-ssh-known-hosts-cm.yaml` | Add `git.tail8d86e.ts.net ssh-ed25519 AAAA...` |
|
|
||||||
|
|
||||||
### HTTPS URLs (stay as `forge.tail8d86e.ts.net`)
|
|
||||||
|
|
||||||
These remain unchanged:
|
|
||||||
- `CLAUDE.md:135` - Mirror location
|
|
||||||
- `mise-tasks/pr-comments:23` - Forge API base
|
|
||||||
- `mise-tasks/indri-services-check:65` - HTTP health check (update to check k8s)
|
|
||||||
|
|
||||||
### Ansible/Indri Cleanup (remove after migration)
|
|
||||||
|
|
||||||
| File | Action |
|------|--------|
| `ansible/playbooks/indri.yml:36-37` | Remove forgejo role |
| `ansible/roles/tailscale_serve/defaults/main.yml:6` | Remove `svc:forge` entry |
| `ansible/roles/alloy/defaults/main.yml:31-32` | Remove forgejo log collection |
| `ansible/roles/borgmatic/defaults/main.yml:17` | Update backup path |
|
|
||||||
|
|
||||||
### Tailscale/Pulumi (update after hostname cutover)
|
|
||||||
|
|
||||||
| File | Change |
|------|--------|
| `argocd/manifests/tailscale-operator/egress-forge.yaml` | Delete (no longer needed) |
| `pulumi/policy.hujson` | Update `tag:forge` ACLs for k8s source |

---

## Pre-Migration Checklist

- [ ] GitHub mirror verified current
- [ ] Full borgmatic backup completed and verified
- [ ] Manual backup of `/opt/homebrew/var/forgejo` on indri
- [ ] Document all SSH deploy keys and webhooks
- [ ] **User action**: Mirror forgejo-helm chart to forge
- [ ] Extract secrets from app.ini to 1Password:
  - `INTERNAL_TOKEN`
  - `SECRET_KEY`
  - `JWT_SECRET`
  - Any OAuth/webhook secrets
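
The secret extraction in the last checklist item could be scripted roughly as below. This is a minimal sketch, not part of the plan: `ini_value` is a hypothetical helper, `APP_INI` is a placeholder for wherever app.ini lives on indri, and it assumes each key sits on one line as `KEY = value`. It prints secrets to the terminal, so treat the output accordingly.

```bash
# ini_value FILE KEY: print the value of KEY from a Forgejo-style app.ini.
# Assumption: "KEY = value" on a single line; section headers are ignored.
ini_value() {
  awk -F'=' -v key="$2" '
    {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)        # trim whitespace around the key
      if (k == key) {
        v = substr($0, index($0, "=") + 1)  # keep any "=" inside the value
        gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim whitespace around the value
        print v
        exit
      }
    }
  ' "$1"
}

# Usage (APP_INI is a placeholder path):
# for key in INTERNAL_TOKEN SECRET_KEY JWT_SECRET; do
#   printf '%s: %s\n' "$key" "$(ini_value "$APP_INI" "$key")"
# done
```

Only the first match for a key is returned, so a key that is repeated across sections would need a section-aware variant.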

---

## Steps

### Phase A: Create k8s Manifests

**New Files:**
```
argocd/apps/forgejo.yaml                        # ArgoCD Application (multi-source Helm)
argocd/manifests/forgejo/values.yaml            # Helm chart values
argocd/manifests/forgejo/kustomization.yaml     # Kustomize config
argocd/manifests/forgejo/pvc.yaml               # 10Gi PersistentVolumeClaim
argocd/manifests/forgejo/secret-app.yaml.tpl    # Secrets from 1Password
```

**Key values.yaml settings:**
```yaml
service:
  ssh:
    type: LoadBalancer
    loadBalancerClass: tailscale
    port: 22
    annotations:
      tailscale.com/hostname: "git-1"  # Test hostname first

ingress:
  enabled: true
  className: tailscale
  hosts:
    - host: forge-1  # Test hostname first

gitea:
  config:
    server:
      DOMAIN: forge-1.tail8d86e.ts.net
      ROOT_URL: https://forge-1.tail8d86e.ts.net/
      SSH_DOMAIN: git-1.tail8d86e.ts.net
      SSH_PORT: 22
    database:
      DB_TYPE: sqlite3
      PATH: /data/forgejo.db
```

---

### Phase B: Deploy to Test Hostnames

1. Create feature branch, push to forge
2. Sync ArgoCD apps: `argocd app sync apps`
3. Point forgejo app to feature branch: `argocd app set forgejo --revision feature/p7-forgejo`
4. Sync forgejo app: `argocd app sync forgejo`
5. Verify pods running (empty data initially)

---

### Phase C: Data Migration (~10 min downtime)

1. **Stop indri Forgejo**
   ```bash
   ssh indri 'brew services stop forgejo'
   ```

2. **Copy data** (option A: rsync via NFS staging)
   ```bash
   ssh indri 'rsync -avP /opt/homebrew/var/forgejo/ sifaka:/volume1/forgejo-migration/'
   ```

3. **Copy to PVC and fix permissions**
   ```bash
   kubectl exec -n forgejo deployment/forgejo -- rsync -avP /staging/ /data/
   kubectl exec -n forgejo deployment/forgejo -- chown -R 1000:1000 /data
   ```

4. **Restart Forgejo**
   ```bash
   kubectl rollout restart deployment/forgejo -n forgejo
   ```
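
A cheap sanity check on the copy in step 3 is to compare file counts between the staging tree and the PVC before restarting. The helper below is a sketch under assumptions: `same_file_count` is a hypothetical name, it has to run somewhere both directories are visible (for example inside the pod, where the plan's rsync command implies `/staging` and `/data` are both mounted), and it assumes `find`, `wc`, and `tr` exist in that image. Matching counts do not prove matching contents.

```bash
# same_file_count A B: succeed if directories A and B hold the same
# number of regular files; prints both counts either way.
same_file_count() {
  a=$(find "$1" -type f | wc -l | tr -d ' ')
  b=$(find "$2" -type f | wc -l | tr -d ' ')
  printf '%s: %s files, %s: %s files\n' "$1" "$a" "$2" "$b"
  [ "$a" -eq "$b" ]
}

# Usage, from a shell where both trees are mounted:
# same_file_count /staging /data || echo "counts differ, do not restart yet"
```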

---

### Phase D: Validation (Critical)

- [ ] Web UI accessible at `forge-1.tail8d86e.ts.net`
- [ ] SSH works: `ssh -T forgejo@git-1.tail8d86e.ts.net`
- [ ] All 8 repos visible and accessible
- [ ] Git clone works
- [ ] Git push works (test on non-critical repo)
- [ ] eblume user preserved with correct permissions
- [ ] PR history intact
- [ ] Webhooks functioning
- [ ] GitHub mirror push still works
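
The mechanical items in this checklist can be run through a small pass/fail wrapper. This is a sketch only: `check` is a hypothetical helper, and the usage lines assume `curl` and `git` are installed locally and that `git ls-remote` is an acceptable stand-in for the SSH test. Items such as PR history and webhooks still need a manual look.

```bash
# check LABEL CMD...: run CMD quietly and report PASS or FAIL for LABEL.
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS: %s\n' "$label"
  else
    printf 'FAIL: %s\n' "$label"
    return 1
  fi
}

# Usage against the test hostnames from this phase:
# check "web UI"   curl -fsS https://forge-1.tail8d86e.ts.net/api/v1/version
# check "git SSH"  git ls-remote ssh://forgejo@git-1.tail8d86e.ts.net/eblume/blumeops.git
```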

---

### Phase E: Hostname Cutover

1. **Clear indri Tailscale serve**
   ```bash
   ssh indri 'tailscale serve clear svc:forge'
   ```

2. **User action**: Delete `svc:forge` and `forge-1` devices from Tailscale admin

3. **Update manifests**: Change `forge-1` → `forge`, `git-1` → `git`

4. **Sync ArgoCD**

5. **Verify hostnames claimed**
   ```bash
   curl https://forge.tail8d86e.ts.net/api/v1/version
   ssh -T forgejo@git.tail8d86e.ts.net
   ```
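
The rename in step 3 is a plain text substitution, so it can be sketched with `sed`. `rename_test_hosts` is a hypothetical helper, and the sketch assumes the strings `forge-1` and `git-1` appear in the manifests only as the test hostnames; review the resulting diff before committing.

```bash
# rename_test_hosts FILE: print FILE with the test hostnames replaced
# by the final ones (forge-1 -> forge, git-1 -> git).
rename_test_hosts() {
  sed -e 's/forge-1/forge/g' -e 's/git-1/git/g' "$1"
}

# Usage (writes via a temp file, since in-place sed flags differ across platforms):
# f=argocd/manifests/forgejo/values.yaml
# rename_test_hosts "$f" > "$f.new" && mv "$f.new" "$f" && git diff -- "$f"
```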

---

### Phase F: Update ArgoCD to Use New Forgejo

1. **Get SSH host key from k8s Forgejo**
   ```bash
   kubectl exec -n forgejo deployment/forgejo -- cat /data/ssh/ssh_host_ed25519_key.pub
   ```

2. **Update known_hosts ConfigMap** with `git.tail8d86e.ts.net` key

3. **Update repo-creds-forge secret** (manual kubectl commands)

4. **Update all ArgoCD Application manifests** with new repoURL

5. **Delete egress-forge.yaml** (no longer needed)

6. **Sync ArgoCD** and verify all apps sync successfully
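
Steps 1 and 2 amount to turning a public key file (`type base64 [comment]`) into a known_hosts entry (`host type base64`). A minimal sketch, with `known_hosts_line` as a hypothetical helper; because SSH is served on port 22, a bare hostname is assumed to be enough and no `[host]:port` form is used.

```bash
# known_hosts_line HOST: read an OpenSSH public key on stdin and print
# the matching known_hosts entry, dropping the trailing comment field.
known_hosts_line() {
  awk -v host="$1" 'NF >= 2 { print host, $1, $2; exit }'
}

# Usage:
# kubectl exec -n forgejo deployment/forgejo -- cat /data/ssh/ssh_host_ed25519_key.pub \
#   | known_hosts_line git.tail8d86e.ts.net
```

The output line is what would be added to `argocd-ssh-known-hosts-cm.yaml`.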

---

### Phase G: Update Local Git Remotes

```bash
cd ~/code/personal/blumeops
git remote set-url origin ssh://forgejo@git.tail8d86e.ts.net/eblume/blumeops.git
# Repeat for all 8 repos
```
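
The "repeat for all 8 repos" step could be looped rather than done by hand. The sketch below is an assumption-laden convenience, not part of the plan: `new_remote_url` and `update_remotes` are hypothetical helpers, only the two old SSH forms listed in this plan are rewritten, and the usage line assumes every checkout sits next to blumeops under `~/code/personal`.

```bash
# new_remote_url URL: map an old Forgejo SSH remote onto git.tail8d86e.ts.net.
# URLs that match neither old form pass through unchanged.
new_remote_url() {
  printf '%s\n' "$1" | sed \
    -e 's#^ssh://forgejo@indri\.tail8d86e\.ts\.net:2200/#ssh://forgejo@git.tail8d86e.ts.net/#' \
    -e 's#^ssh://forgejo@forge\.tail8d86e\.ts\.net/#ssh://forgejo@git.tail8d86e.ts.net/#'
}

# update_remotes DIR...: rewrite "origin" in each checkout that still uses an old URL.
update_remotes() {
  for repo in "$@"; do
    old=$(git -C "$repo" remote get-url origin) || continue
    new=$(new_remote_url "$old")
    [ "$old" = "$new" ] || git -C "$repo" remote set-url origin "$new"
  done
}

# Usage (assumes sibling checkouts):
# update_remotes ~/code/personal/*/
```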

---

### Phase H: Cleanup

1. Remove forgejo role from `ansible/playbooks/indri.yml`
2. Remove `svc:forge` from `ansible/roles/tailscale_serve/defaults/main.yml`
3. Remove forgejo log collection from `ansible/roles/alloy/defaults/main.yml`
4. Delete `argocd/manifests/tailscale-operator/egress-forge.yaml`
5. Update `mise-tasks/indri-services-check`
6. Run ansible to clean up indri: `mise run provision-indri -- --tags tailscale-serve,alloy`
7. Update zk documentation (forgejo, argocd, blumeops cards)
8. Merge PR
9. Reset ArgoCD to main

---

## Disaster Recovery Procedure

**Add to [[forgejo]] zk card:**

### When Forgejo is Unavailable

1. **Add GitHub repository to ArgoCD**
   ```bash
   argocd repo add https://github.com/eblume/blumeops.git \
     --username eblume \
     --password "$(op read "op://<vault>/<item>/github-pat")"
   ```

2. **Point critical apps to GitHub**
   ```bash
   argocd app set apps --repo https://github.com/eblume/blumeops.git
   argocd app set forgejo --repo https://github.com/eblume/blumeops.git
   argocd app sync forgejo
   ```

3. **Fix Forgejo** (restore from backup, fix config, etc.)

4. **Verify Forgejo is healthy**
   ```bash
   curl https://forge.tail8d86e.ts.net/api/v1/version
   ssh -T forgejo@git.tail8d86e.ts.net
   ```

5. **Switch back to Forgejo**
   ```bash
   argocd app set apps --repo ssh://forgejo@git.tail8d86e.ts.net/eblume/blumeops.git
   argocd app set forgejo --repo ssh://forgejo@git.tail8d86e.ts.net/eblume/blumeops.git
   argocd app sync apps
   argocd repo rm https://github.com/eblume/blumeops.git
   ```
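
Steps 2 and 5 are the same operation with different URLs, so they could share one helper. This is a sketch, not a tested runbook: `switch_repo` and the `DRY_RUN` variable are hypothetical, and it assumes a plain `argocd app set APP --repo URL` is sufficient. For a multi-source Application the command may need extra source-selection flags, so check `argocd app set --help` first.

```bash
# switch_repo URL APP...: point each ArgoCD app at URL.
# With DRY_RUN=1 the commands are printed instead of executed.
switch_repo() {
  url=$1
  shift
  for app in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'argocd app set %s --repo %s\n' "$app" "$url"
    else
      argocd app set "$app" --repo "$url"
    fi
  done
}

# Usage:
# DRY_RUN=1 switch_repo https://github.com/eblume/blumeops.git apps forgejo
# switch_repo ssh://forgejo@git.tail8d86e.ts.net/eblume/blumeops.git apps forgejo
```

Running the dry-run form first gives a chance to confirm the app list before anything is changed during an outage.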

---

## Files Summary

### New Files

| Path | Purpose |
|------|---------|
| `argocd/apps/forgejo.yaml` | ArgoCD Application (multi-source Helm) |
| `argocd/manifests/forgejo/values.yaml` | Helm chart values |
| `argocd/manifests/forgejo/kustomization.yaml` | Kustomize config |
| `argocd/manifests/forgejo/pvc.yaml` | 10Gi PersistentVolumeClaim |
| `argocd/manifests/forgejo/secret-app.yaml.tpl` | Secrets template |

### Modified Files

| Path | Change |
|------|--------|
| All `argocd/apps/*.yaml` | Update repoURL to `git.tail8d86e.ts.net` |
| `argocd/manifests/argocd/argocd-ssh-known-hosts-cm.yaml` | Add `git.tail8d86e.ts.net` |
| `argocd/manifests/argocd/repo-forge-secret.yaml.tpl` | Update URL |
| `ansible/playbooks/indri.yml` | Remove forgejo role |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Remove `svc:forge` |
| `ansible/roles/alloy/defaults/main.yml` | Remove forgejo logs |

### Files to Delete

| Path | Reason |
|------|--------|
| `argocd/manifests/tailscale-operator/egress-forge.yaml` | No longer needed |

---

## Rollback

If migration fails at any point:

1. **Delete k8s resources**
   ```bash
   argocd app delete forgejo --cascade
   kubectl delete namespace forgejo
   ```

2. **Restart indri Forgejo**
   ```bash
   ssh indri 'brew services start forgejo'
   ```

3. **Re-enable Tailscale serve**
   ```bash
   mise run provision-indri -- --tags tailscale-serve
   ```

4. **Revert ArgoCD apps to indri URLs** (if changed)

---

## Verification Checklist

- [ ] GitHub mirror verified current
- [ ] Helm chart mirrored to forge
- [ ] Secrets extracted to 1Password
- [ ] k8s Forgejo pod running
- [ ] All 8 repos accessible
- [ ] SSH clone/push works via `git.tail8d86e.ts.net`
- [ ] HTTPS works via `forge.tail8d86e.ts.net`
- [ ] ArgoCD syncs from new URL
- [ ] All local remotes updated
- [ ] Indri cleanup complete
- [ ] zk docs updated
- [ ] DR procedure documented in [[forgejo]] card

@ -1,32 +0,0 @@
# Phase 8: CI/CD (Woodpecker)

**Goal**: Deploy Woodpecker CI integrated with Forgejo

**Status**: Pending

**Prerequisites**: [Phase 7](P7_forgejo.md) complete

---

## Steps

### 1. Create Forgejo OAuth application

- Callback: https://ci.tail8d86e.ts.net/authorize
- Store in 1Password

---

### 2. Deploy Woodpecker Server + Agent

---

### 3. Configure Tailscale LoadBalancer

Tag: `svc:ci`

---

### 4. Test pipeline

Create `.woodpecker.yaml` in test repo

@ -1,52 +0,0 @@
# Phase 9: Cleanup

**Goal**: Remove deprecated services, harden system

**Status**: Pending

**Prerequisites**: [Phase 8](P8_woodpecker.md) complete

---

## Steps

### 1. Stop/remove unused brew services

- postgresql@18
- grafana
- miniflux
- forgejo

---

### 2. Update ansible playbook

- Remove migrated service roles
- Add k8s deployment references

---

### 3. Configure Velero backups (optional)

- Install with MinIO on sifaka
- Schedule daily cluster backups

---

### 4. Update zk documentation

- New architecture
- Runbooks
- DR procedures

---

## Plan Completion

When all phases are complete and verified:

```bash
# Rename this folder to indicate completion
git mv plans/k8s-migration plans/k8s-migration.complete
git commit -m "Complete k8s migration plan"
```