Split k8s migration plan into phases folder
Reorganized the monolithic migration plan into separate files:

- 00_overview.md: Architecture, technical decisions, shared info
- P0_foundation.complete.md: Phase 0 (complete)
- P1_k8s_infrastructure.md: Phase 1 (in progress)
- P2-P9: Remaining phases (pending)

This makes the plan easier to navigate and track progress.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Parent: 61dced048b
Commit: 0db4abe64d
11 changed files with 730 additions and 547 deletions
149
plans/k8s-migration/00_overview.md
Normal file
@ -0,0 +1,149 @@
# Blumeops Minikube Migration Plan

This plan details a phased migration of blumeops services from direct hosting on indri (Mac Mini M1) to a minikube cluster, while maintaining critical infrastructure services outside of Kubernetes.

## Phases

| Phase | Name | Status | Description |
|-------|------|--------|-------------|
| 0 | [Foundation](P0_foundation.complete.md) | Complete | Container registry + minikube cluster |
| 1 | [K8s Infrastructure](P1_k8s_infrastructure.md) | In Progress | Tailscale operator + CloudNativePG |
| 2 | [Grafana](P2_grafana.md) | Pending | Migrate Grafana (pilot) |
| 3 | [PostgreSQL](P3_postgresql.md) | Pending | Migrate to CloudNativePG |
| 4 | [Miniflux](P4_miniflux.md) | Pending | Migrate Miniflux |
| 5 | [devpi](P5_devpi.md) | Pending | Migrate devpi |
| 6 | [Kiwix](P6_kiwix.md) | Pending | Migrate Kiwix |
| 7 | [Forgejo](P7_forgejo.md) | Pending | Migrate Forgejo (highest risk) |
| 8 | [Woodpecker](P8_woodpecker.md) | Pending | Deploy CI/CD |
| 9 | [Cleanup](P9_cleanup.md) | Pending | Remove deprecated services |

## Architecture Overview

### Services Staying on Indri (Outside K8s)

| Service | Reason |
|---------|--------|
| **Zot Registry** (NEW) | Avoid circular dependency - k8s needs images to start |
| **Prometheus** | Observability backbone must survive k8s failures |
| **Loki** | Log aggregation backbone |
| **Borgmatic** | Backup system |
| **Grafana-alloy** | Metrics/logs collector on host |
| **Plex** | Until Jellyfin replacement |
| **Transmission** | Downloads for kiwix ZIM files |

### Services Moving to K8s

| Service | Complexity | Dependencies |
|---------|------------|--------------|
| Grafana | LOW | Phase 1 |
| Kiwix | LOW | Phase 1 |
| Miniflux | MEDIUM | PostgreSQL |
| devpi | MEDIUM | Registry |
| PostgreSQL | HIGH | Phase 1 |
| Forgejo | HIGH | PostgreSQL |
| Woodpecker CI | MEDIUM | Forgejo |

## Technical Decisions

### Container Registry: Zot

- OCI-native, lightweight
- Native support for proxying multiple registries (Docker Hub, GHCR, Quay)
- Built from source at `~/code/3rd/zot` (not in homebrew)
- Binary: `~/code/3rd/zot/bin/zot-darwin-arm64`
- Config: `~/.config/zot/config.json`
- Data: `~/zot/`

### Minikube Driver: Podman

- Rootless containers for better security
- Lighter than full VM (QEMU)
- Uses existing container ecosystem
- `minikube start --driver=podman --container-runtime=cri-o`

### PostgreSQL: CloudNativePG Operator

- Production-grade operator
- Built-in backup/restore
- Prometheus metrics
- PITR support

### K8s Service Exposure: Tailscale Operator

- `loadBalancerClass: tailscale` on Services
- Automatic TLS and MagicDNS names
- ACL-controlled access

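As a rough illustration of this exposure pattern, the helper below prints a Service manifest that asks the Tailscale operator for a tailnet address. The function name `render_tailscale_service`, the `app` selector label, and the example name, namespace, and port are assumptions made for this sketch and are not taken from the plan. The `tailscale.com/hostname` annotation is the operator's way of letting a Service pick its MagicDNS name.

```bash
#!/usr/bin/env bash
# Sketch only: render a Service that the Tailscale operator will expose.
# Usage: render_tailscale_service <name> <namespace> <port>
# The selector label "app: <name>" is an assumed convention.
render_tailscale_service() {
  name=$1 namespace=$2 port=$3
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  namespace: ${namespace}
  annotations:
    tailscale.com/hostname: ${name}
spec:
  type: LoadBalancer
  loadBalancerClass: tailscale
  selector:
    app: ${name}
  ports:
    - port: ${port}
      targetPort: ${port}
EOF
}
```

Piping the output into `kubectl apply -f -`, for example `render_tailscale_service grafana monitoring 3000 | kubectl apply -f -`, would create the Service; the namespace and port in that command are placeholders.
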
### LaunchAgent Requirements (Critical)

LaunchAgents do NOT get homebrew on PATH. All commands must use **absolute paths**:

- `/Users/erichblume/code/3rd/zot/bin/zot-darwin-arm64` for zot (built from source)
- `/opt/homebrew/opt/mise/bin/mise x --` for mise-managed tools
- `/opt/homebrew/opt/postgresql@18/bin/pg_dump` for postgres tools

This applies to all mcquack LaunchAgents (zot, devpi, kiwix, borgmatic, metrics collectors).
`brew services` handles this automatically, but those services aren't tracked in ansible.

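A small guard can catch a relative command before it ever reaches launchd. The sketch below is one possible check; the function name `require_abs_executable` is hypothetical and nothing like it exists in the repository as far as this plan shows.

```bash
#!/usr/bin/env bash
# Sketch only: fail fast when a LaunchAgent command is not an absolute,
# executable path, since LaunchAgents do not get homebrew on PATH.
require_abs_executable() {
  case "$1" in
    /*) ;;                                    # absolute path: keep checking
    *)  echo "not absolute: $1" >&2; return 1 ;;
  esac
  if [ ! -x "$1" ]; then
    echo "not executable: $1" >&2
    return 1
  fi
}
```

A wrapper script could call `require_abs_executable /opt/homebrew/opt/postgresql@18/bin/pg_dump` before running the tool, so a missing binary shows up as a clear error in the agent's log rather than a silent failure.
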
### Backup Strategy

Borgmatic remains on indri (outside k8s), writing to sifaka NAS via SMB at `/Volumes/backups`. This ensures backups continue even if k8s is down.

| Service | Backup Approach |
|---------|-----------------|
| **Zot Registry** | No backup needed - pull-through cache is re-fetchable, private images rebuilt from source control |
| **Minikube** | No backup of cluster state - declarative manifests in git, can recreate |
| **PostgreSQL (k8s)** | CloudNativePG scheduled backups to sifaka (Phase 1) |
| **Grafana (k8s)** | Dashboards in ansible source control, no runtime backup needed |
| **Miniflux (k8s)** | Database backed up via CloudNativePG |
| **Forgejo (k8s)** | Git repos are distributed, config in ansible; data dir backed up via borgmatic before migration |
| **devpi (k8s)** | Private packages backed up, PyPI cache re-fetchable |
| **Kiwix (k8s)** | ZIM files re-downloadable via torrent, no backup needed |

**Borgmatic config changes:** None required for Phase 0. Future phases may add k8s PV paths if needed.

---

## Critical Files

| File | Purpose |
|------|---------|
| `ansible/playbooks/indri.yml` | Main playbook - add k8s roles, remove migrated services |
| `ansible/roles/tailscale_serve/defaults/main.yml` | Transition services to Tailscale operator |
| `pulumi/policy.hujson` | Add tags: k8s, registry, ci |
| `ansible/roles/borgmatic/defaults/main.yml` | Update PostgreSQL endpoint |
| `mise-tasks/indri-services-check` | Add k8s health checks |

## New Directory Structure

```
ansible/
  k8s/
    operators/
      tailscale-operator.yaml
      cloudnative-pg.yaml
    databases/
      blumeops-pg.yaml
    apps/
      grafana/
      miniflux/
      forgejo/
      devpi/
      kiwix/
      woodpecker/
  roles/
    zot/        # NEW
    podman/     # NEW
    minikube/   # NEW
```

## Risk Mitigation

- **Circular dependency prevention**: Zot registry runs outside k8s
- **Observability**: Prometheus/Loki stay on indri
- **Data loss prevention**: borgmatic + manual backups before each phase
- **Recovery**: Can manually push images, restore from backups

## Container Images (All ARM64)

| Service | Image |
|---------|-------|
| Miniflux | `ghcr.io/miniflux/miniflux:latest` |
| Forgejo | `codeberg.org/forgejo/forgejo:10` |
| Grafana | `grafana/grafana:latest` |
| Kiwix | `ghcr.io/kiwix/kiwix-serve:3.8.1` |
| Woodpecker | `woodpeckerci/woodpecker-server` |

Note: Zot runs as a native binary on indri (built from source at `~/code/3rd/zot`), not as a container.
File diff suppressed because it is too large
113
plans/k8s-migration/P1_k8s_infrastructure.md
Normal file
@ -0,0 +1,113 @@
# Phase 1: Kubernetes Infrastructure

**Goal**: Tailscale operator + CloudNativePG operator

**Status**: In Progress

**Prerequisites**: [Phase 0](P0_foundation.complete.md) complete

---

## Steps

### 1. Update Pulumi ACLs for k8s workloads

Add `tag:k8s` to `pulumi/policy.hujson` - this tag is for k8s workloads that need to access other services (e.g., Woodpecker CI pushing to registry).

**Changes to tagOwners:**
```hujson
"tag:k8s": ["autogroup:admin", "tag:blumeops"],
```

**Add grant for k8s→registry access:**
```hujson
// k8s workloads (e.g., Woodpecker CI) can push/pull from registry
{
  "src": ["tag:k8s"],
  "dst": ["tag:registry"],
  "ip": ["tcp:443"],
},
```

**Add test case:**
```hujson
{
  "src": "tag:k8s",
  "accept": ["tag:registry:443"],
},
```

```bash
mise run tailnet-preview && mise run tailnet-up
```

---

### 2. Create Tailscale OAuth client

- Scopes: Devices Core, Auth Keys, Services write
- Tag: `tag:k8s-operator`
- Store in 1Password

---

### 3. Deploy Tailscale Kubernetes Operator

```bash
helm repo add tailscale https://pkgs.tailscale.com/helmcharts
# Refresh the local chart index so the newly added repo is usable
helm repo update
# CLIENT_ID and CLIENT_SECRET come from the OAuth client created in step 2
helm install tailscale-operator tailscale/tailscale-operator \
  --namespace tailscale-system --create-namespace \
  --set oauth.clientId="$CLIENT_ID" \
  --set oauth.clientSecret="$CLIENT_SECRET"
```

---

### 4. Deploy CloudNativePG operator

```bash
kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml
```

---

### 5. Create PostgreSQL cluster

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: blumeops-pg
  namespace: databases
spec:
  instances: 1
  storage:
    size: 10Gi
    storageClass: standard
  monitoring:
    enablePodMonitor: true
```

---

### 6. Update Alloy config

- Add Kubernetes service discovery for k8s metrics (`kubernetes_sd_configs` in Prometheus terms; the Alloy component is `discovery.kubernetes`)
- Scrape operator metrics

---

## New Files

- `ansible/k8s/operators/` - Operator manifests
- `ansible/k8s/databases/` - PostgreSQL cluster

---

## Verification

```bash
kubectl get pods -n tailscale-system
kubectl get pods -n cnpg-system
kubectl get cluster -n databases
```
52
plans/k8s-migration/P2_grafana.md
Normal file
@ -0,0 +1,52 @@
# Phase 2: Grafana Migration (Pilot)

**Goal**: Migrate Grafana as lowest-risk pilot service

**Status**: Pending

**Prerequisites**: [Phase 1](P1_k8s_infrastructure.md) complete

---

## Steps

### 1. Deploy Grafana via Helm

- Copy datasource config from existing role
- Copy dashboards from `ansible/roles/grafana/files/dashboards/`
- Point to indri Prometheus/Loki (http://indri:9090, http://indri:3100)

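To make the datasource wiring concrete, here is a hedged sketch that prints Helm values provisioning the two indri endpoints named in the bullets. It assumes the community `grafana/grafana` chart and its `datasources` values key; the function name `render_grafana_values` and the datasource display names are illustrative. Whether the bare hostname `indri` resolves from inside the cluster is not something this plan confirms.

```bash
#!/usr/bin/env bash
# Sketch only: print Helm values that provision the indri datasources.
# The two URLs come from the plan; names and layout are assumptions
# based on the community grafana/grafana chart.
render_grafana_values() {
  cat <<'EOF'
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://indri:9090
        isDefault: true
      - name: Loki
        type: loki
        access: proxy
        url: http://indri:3100
EOF
}
```

The output can be written to a file with `render_grafana_values > grafana-values.yaml` and passed to Helm with `-f grafana-values.yaml`.
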
---

### 2. Configure Tailscale LoadBalancer

```yaml
service:
  type: LoadBalancer
  loadBalancerClass: tailscale
```

---

### 3. Verify all dashboards work

---

### 4. Update tailscale_serve

Remove grafana entry from `ansible/roles/tailscale_serve/defaults/main.yml`

---

### 5. Stop brew grafana

```bash
brew services stop grafana
```

---

## Verification

- https://grafana.tail8d86e.ts.net loads
- All dashboards functional
55
plans/k8s-migration/P3_postgresql.md
Normal file
@ -0,0 +1,55 @@
# Phase 3: PostgreSQL Migration

**Goal**: Migrate miniflux database to CloudNativePG

**Status**: Pending

**Prerequisites**: [Phase 2](P2_grafana.md) complete

---

## Steps

### 1. Create databases and users in k8s PostgreSQL

- miniflux database/user
- borgmatic read-only user

---

### 2. Export from brew PostgreSQL

```bash
pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql
```

---

### 3. Expose k8s PostgreSQL via Tailscale

- Service with `loadBalancerClass: tailscale`
- Tag: `svc:pg-k8s`

---

### 4. Import data

```bash
psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql
```

---

### 5. Update borgmatic config

- Change hostname to k8s PostgreSQL

---

### 6. Verify data integrity

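One way to approach this check is to compare per-table row counts on both servers. The sketch below is only that: the helper names `table_counts` and `counts_match` are hypothetical, and the table names passed to them (for example `users`, `feeds`, `entries`) are an assumption about the Miniflux schema that should be confirmed against the real database. Matching counts are a sanity check, not proof that every row is identical.

```bash
#!/usr/bin/env bash
# Sketch only: compare per-table row counts between old and new servers.

# Print "table count" lines for the given tables (needs psql available).
# Usage: table_counts <host> <user> <database> <table>...
table_counts() {
  host=$1; user=$2; db=$3; shift 3
  for t in "$@"; do
    n=$(psql -h "$host" -U "$user" -d "$db" -At \
          -c "SELECT count(*) FROM \"$t\"") || return 1
    printf '%s %s\n' "$t" "$n"
  done
}

# Succeed only when two count listings are identical, ignoring line order.
counts_match() {
  [ "$(sort "$1")" = "$(sort "$2")" ]
}
```

A possible run: `table_counts localhost miniflux miniflux users feeds entries > old.txt`, the same command against `pg-k8s.tail8d86e.ts.net` into `new.txt`, then `counts_match old.txt new.txt && echo "row counts match"`.
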
---

## Rollback

Keep brew PostgreSQL running until Phase 4 is verified.
48
plans/k8s-migration/P4_miniflux.md
Normal file
@ -0,0 +1,48 @@
# Phase 4: Miniflux Migration

**Goal**: Migrate Miniflux to k8s

**Status**: Pending

**Prerequisites**: [Phase 3](P3_postgresql.md) complete

---

## Steps

### 1. Deploy Miniflux

```yaml
image: ghcr.io/miniflux/miniflux:latest
env:
  DATABASE_URL: from secret
  RUN_MIGRATIONS: "1"
```

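The fragment above is shorthand, so a fuller manifest may help show how `DATABASE_URL` could be read from a Secret. The following is a minimal sketch under stated assumptions: the function name `render_miniflux_deployment`, the `miniflux` namespace, the `app: miniflux` label, and container port 8080 are all illustrative choices, and the Secret name and key are supplied by the caller because the plan does not name them.

```bash
#!/usr/bin/env bash
# Sketch only: render a Deployment that reads DATABASE_URL from a Secret.
# Usage: render_miniflux_deployment <secret-name> <secret-key>
# Namespace, labels, and port are assumptions, not values from the plan.
render_miniflux_deployment() {
  secret_name=$1 secret_key=$2
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: miniflux
  namespace: miniflux
spec:
  replicas: 1
  selector:
    matchLabels:
      app: miniflux
  template:
    metadata:
      labels:
        app: miniflux
    spec:
      containers:
        - name: miniflux
          image: ghcr.io/miniflux/miniflux:latest
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: ${secret_name}
                  key: ${secret_key}
            - name: RUN_MIGRATIONS
              value: "1"
EOF
}
```

For example, `render_miniflux_deployment miniflux-db url | kubectl apply -f -` would apply it, where `miniflux-db` and `url` stand in for whatever Secret actually holds the connection string.
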
---

### 2. Configure Tailscale LoadBalancer

Tag: `svc:feed`

---

### 3. Update Alloy log collection

Add k8s namespace

---

### 4. Verify

- Login works
- Feeds refresh
- API works

---

### 5. Stop brew miniflux

```bash
brew services stop miniflux
```
37
plans/k8s-migration/P5_devpi.md
Normal file
@ -0,0 +1,37 @@
# Phase 5: devpi Migration

**Goal**: Migrate devpi to k8s

**Status**: Pending

**Prerequisites**: [Phase 4](P4_miniflux.md) complete

---

## Steps

### 1. Build devpi container

- Dockerfile with devpi-server + devpi-web
- Push to local Zot registry

---

### 2. Deploy as StatefulSet

- PVC for data (50Gi)
- Migrate existing data (excluding PyPI cache)

---

### 3. Configure Tailscale LoadBalancer

Tag: `svc:pypi`

---

### 4. Update pip.conf on gilbert

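For this step, a small helper can generate the new file so the index URL lives in one place. This is a sketch: the function name `render_pip_conf` is hypothetical, and the index URL is passed in by the caller because the plan does not state the final hostname or index path.

```bash
#!/usr/bin/env bash
# Sketch only: print a pip.conf that points pip at a devpi index.
# Usage: render_pip_conf <index-url>
render_pip_conf() {
  cat <<EOF
[global]
index-url = $1
EOF
}
```

A call might look like `render_pip_conf https://pypi.tail8d86e.ts.net/root/pypi/+simple/`, where the hostname is a guess derived from the `svc:pypi` tag and `root/pypi` is devpi's default mirror index; both should be checked before the output is written to pip's user config file.
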
---

### 5. Stop mcquack devpi
35
plans/k8s-migration/P6_kiwix.md
Normal file
@ -0,0 +1,35 @@
# Phase 6: Kiwix Migration

**Goal**: Migrate kiwix-serve to k8s

**Status**: Pending

**Prerequisites**: [Phase 5](P5_devpi.md) complete

---

## Steps

### 1. Create NFS/hostPath PV for ZIM files

- Point to transmission download directory
- ReadOnlyMany access

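The hostPath variant of this volume could look roughly like the manifest printed below. Treat it as a sketch: the function name `render_zim_pv`, the volume name `kiwix-zim`, and the `Retain` reclaim policy are assumptions, and the path and size are caller-supplied because the plan does not give them. With the podman driver a hostPath refers to the minikube node's filesystem, so the transmission directory would probably need to be made visible there first (for example with `minikube mount`).

```bash
#!/usr/bin/env bash
# Sketch only: render a read-only hostPath PersistentVolume for ZIM files.
# Usage: render_zim_pv <path-on-node> <size>
# The volume name and reclaim policy are illustrative choices.
render_zim_pv() {
  host_path=$1 size=$2
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kiwix-zim
spec:
  capacity:
    storage: ${size}
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  hostPath:
    path: ${host_path}
EOF
}
```

Something like `render_zim_pv /data/zim 200Gi | kubectl apply -f -` would create it; both arguments there are placeholders.
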
---

### 2. Deploy Kiwix

```yaml
image: ghcr.io/kiwix/kiwix-serve:3.8.1
args: ["/data/*.zim"]
```

---

### 3. Configure Tailscale LoadBalancer

Tag: `svc:kiwix`

---

### 4. Stop mcquack kiwix-serve
51
plans/k8s-migration/P7_forgejo.md
Normal file
@ -0,0 +1,51 @@
# Phase 7: Forgejo Migration (Highest Risk)

**Goal**: Migrate Forgejo to k8s

**Status**: Pending

**Prerequisites**: [Phase 6](P6_kiwix.md) complete

---

## Pre-Migration Checklist

- [ ] Full borgmatic backup verified
- [ ] Manual backup of `/opt/homebrew/var/forgejo`
- [ ] Document SSH keys and webhooks

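The manual backup item in the checklist could be handled with a helper along these lines. It is a sketch, not the project's actual tooling: the function name `backup_dir` is hypothetical, and the destination directory is whatever location the operator chooses.

```bash
#!/usr/bin/env bash
# Sketch only: archive a data directory into a timestamped tarball and
# print the archive path. Usage: backup_dir <source-dir> <dest-dir>
backup_dir() {
  src=$1 dest=$2
  archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  # -C keeps paths in the archive relative to the parent of the source dir
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Listing the archive confirms it is readable before anything is stopped
  tar -tzf "$archive" >/dev/null || return 1
  echo "$archive"
}
```

Running `backup_dir /opt/homebrew/var/forgejo /Volumes/backups/manual` would produce one archive per invocation; the destination path in that command is only an example.
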
---

## Steps

### 1. Deploy Forgejo via Helm

```bash
helm install forgejo forgejo/forgejo \
  --namespace forgejo --create-namespace
```

---

### 2. Migrate data

- Stop brew forgejo
- Copy data to PVC
- Start k8s forgejo

---

### 3. Configure Tailscale services

- HTTPS 443 via LoadBalancer
- SSH port 22 (TCP proxy)

---

### 4. Verify all repositories accessible

---

## Rollback

Restore brew forgejo and tailscale serve config
32
plans/k8s-migration/P8_woodpecker.md
Normal file
@ -0,0 +1,32 @@
# Phase 8: CI/CD (Woodpecker)

**Goal**: Deploy Woodpecker CI integrated with Forgejo

**Status**: Pending

**Prerequisites**: [Phase 7](P7_forgejo.md) complete

---

## Steps

### 1. Create Forgejo OAuth application

- Callback: https://ci.tail8d86e.ts.net/authorize
- Store in 1Password

---

### 2. Deploy Woodpecker Server + Agent

---

### 3. Configure Tailscale LoadBalancer

Tag: `svc:ci`

---

### 4. Test pipeline

Create `.woodpecker.yaml` in test repo

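A smoke-test pipeline for this step can be very small. The helper below prints one possible `.woodpecker.yaml`; it assumes a recent Woodpecker release that accepts the list form of `steps`, and the function name `render_test_pipeline`, the step name, and the `alpine:3` image are illustrative choices.

```bash
#!/usr/bin/env bash
# Sketch only: print a minimal single-step pipeline for a smoke test.
# Step name and image are assumptions, not values from the plan.
render_test_pipeline() {
  cat <<'EOF'
when:
  - event: push

steps:
  - name: smoke-test
    image: alpine:3
    commands:
      - echo "hello from woodpecker"
EOF
}
```

Writing it with `render_test_pipeline > .woodpecker.yaml` and pushing a commit to the test repo should be enough to see whether the server and agent pick up work.
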
52
plans/k8s-migration/P9_cleanup.md
Normal file
@ -0,0 +1,52 @@
# Phase 9: Cleanup

**Goal**: Remove deprecated services, harden system

**Status**: Pending

**Prerequisites**: [Phase 8](P8_woodpecker.md) complete

---

## Steps

### 1. Stop/remove unused brew services

- postgresql@18
- grafana
- miniflux
- forgejo

---

### 2. Update ansible playbook

- Remove migrated service roles
- Add k8s deployment references

---

### 3. Configure Velero backups (optional)

- Install with MinIO on sifaka
- Schedule daily cluster backups

---

### 4. Update zk documentation

- New architecture
- Runbooks
- DR procedures

---

## Plan Completion

When all phases are complete and verified:

```bash
# Rename this folder to indicate completion
git mv plans/k8s-migration plans/k8s-migration.complete
git commit -m "Complete k8s migration plan"
```