P3 done
Updated P3_postgresql.complete.md with full implementation notes including: - borgmatic borg path fix - Disaster recovery testing - CloudNativePG managed roles for borgmatic user - Dual database backup configuration - ACL grant for homelab → k8s - ArgoCD selfHeal disabled for feature branch workflow - CNPG default values to prevent drift Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
parent
e69a3df2d4
commit
463f476374
1 changed files with 322 additions and 18 deletions
|
|
@ -1,55 +1,359 @@
|
||||||
# Phase 3: PostgreSQL Migration
|
# Phase 3: PostgreSQL Disaster Recovery & Backup
|
||||||
|
|
||||||
**Goal**: Migrate miniflux database to CloudNativePG
|
**Goal**: Test disaster recovery and configure borgmatic backups for k8s-pg
|
||||||
|
|
||||||
**Status**: Pending
|
**Status**: Complete (2026-01-19)
|
||||||
|
|
||||||
**Prerequisites**: [Phase 2](P2_grafana.md) complete
|
**Prerequisites**: [Phase 2](P2_grafana.complete.md) complete
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Phase 3 establishes disaster recovery capabilities for the k8s PostgreSQL cluster:
|
||||||
|
1. **Fix borgmatic backup issues** - Resolve `borg: command not found` error
|
||||||
|
2. **Test disaster recovery** - Restore miniflux data from borgmatic backup to k8s-pg
|
||||||
|
3. **Create borgmatic user** - Read-only backup user in k8s-pg via CloudNativePG
|
||||||
|
4. **Configure dual database backup** - Backup both brew PostgreSQL and k8s-pg during migration
|
||||||
|
|
||||||
|
This phase prepares for Phase 4 (miniflux migration) by verifying we can restore data to k8s-pg.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Decisions
|
||||||
|
|
||||||
|
### Backup Both Databases During Transition
|
||||||
|
|
||||||
|
**Decision**: Configure borgmatic to backup both `localhost:5432/miniflux` (brew) and `k8s-pg.tail8d86e.ts.net:5432/miniflux` (k8s) until migration complete.
|
||||||
|
|
||||||
|
**Why**: Provides redundancy during migration. After Phase 4, remove localhost entry.
|
||||||
|
|
||||||
|
### Reuse Existing borgmatic Password
|
||||||
|
|
||||||
|
**Decision**: Use same borgmatic password from 1Password for k8s-pg user.
|
||||||
|
|
||||||
|
**Why**: Simpler credential management, password already proven secure.
|
||||||
|
|
||||||
|
### CloudNativePG Managed Roles
|
||||||
|
|
||||||
|
**Decision**: Declare borgmatic user via CloudNativePG `managed.roles` instead of SQL commands.
|
||||||
|
|
||||||
|
**Why**: Declarative, version-controlled, matches eblume user pattern.
|
||||||
|
|
||||||
|
### Disable selfHeal on apps App
|
||||||
|
|
||||||
|
**Decision**: Remove `selfHeal: true` from `argocd/apps/apps.yaml`.
|
||||||
|
|
||||||
|
**Why**: Allows temporarily pointing child apps to feature branches during development without ArgoCD reverting the change.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Steps
|
## Steps
|
||||||
|
|
||||||
### 1. Create databases and users in k8s PostgreSQL
|
### 1. Fix borgmatic borg path issue
|
||||||
|
|
||||||
- miniflux database/user
|
**Problem**: borgmatic failing with `borg: command not found`
|
||||||
- borgmatic read-only user
|
|
||||||
|
|
||||||
---
|
**Cause**: LaunchAgent doesn't have homebrew in PATH, so `borg` binary not found.
|
||||||
|
|
||||||
### 2. Export from brew PostgreSQL
|
**Solution**: Add `local_path` to borgmatic config template.
|
||||||
|
|
||||||
```bash
|
**File**: `ansible/roles/borgmatic/templates/config.yaml.j2`
|
||||||
pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql
|
```yaml
|
||||||
|
# Path to borg binary (LaunchAgent doesn't have homebrew in PATH)
|
||||||
|
local_path: {{ borgmatic_local_path }}
|
||||||
|
```
|
||||||
|
|
||||||
|
**File**: `ansible/roles/borgmatic/defaults/main.yml`
|
||||||
|
```yaml
|
||||||
|
borgmatic_local_path: /opt/homebrew/bin/borg
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### 3. Expose k8s PostgreSQL via Tailscale
|
### 2. Run manual backup to verify fix
|
||||||
|
|
||||||
- Service with `loadBalancerClass: tailscale`
|
|
||||||
- Tag: `svc:pg-k8s`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 4. Import data
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql
|
mise run provision-indri -- --tags borgmatic
|
||||||
|
ssh indri '/opt/homebrew/bin/borgmatic --verbosity 1'
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### 5. Update borgmatic config
|
### 3. Extract miniflux dump from borgmatic
|
||||||
|
|
||||||
- Change hostname to k8s PostgreSQL
|
```bash
|
||||||
|
ssh indri 'borgmatic list --archive latest'
|
||||||
|
ssh indri 'borgmatic restore --archive latest --destination /tmp/restore'
|
||||||
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### 6. Verify data integrity
|
### 4. Add ACL grant for homelab → k8s
|
||||||
|
|
||||||
|
**Problem**: Connection from indri to k8s-pg blocked - Tailscale proxy logs showed "no rules matched"
|
||||||
|
|
||||||
|
**Solution**: Add ACL grant in Pulumi.
|
||||||
|
|
||||||
|
**File**: `pulumi/policy.hujson`
|
||||||
|
```hujson
|
||||||
|
// Homelab can reach k8s PostgreSQL for borgmatic backups
|
||||||
|
{
|
||||||
|
"src": ["tag:homelab"],
|
||||||
|
"dst": ["tag:k8s"],
|
||||||
|
"ip": ["tcp:5432"],
|
||||||
|
},
|
||||||
|
```
|
||||||
|
|
||||||
|
Deploy: `mise run tailnet-up`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 5. Restore data to k8s-pg
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Using eblume superuser credentials from 1Password
|
||||||
|
ssh indri "psql 'postgres://eblume@k8s-pg.tail8d86e.ts.net:5432/miniflux' -f /tmp/restore/localhost/miniflux/miniflux"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Verification**:
|
||||||
|
```bash
|
||||||
|
psql 'postgres://eblume@k8s-pg.tail8d86e.ts.net:5432/miniflux' -c 'SELECT COUNT(*) FROM users; SELECT COUNT(*) FROM feeds; SELECT COUNT(*) FROM entries;'
|
||||||
|
# Result: 2 users, 2 feeds, 44 entries
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 6. Create borgmatic user in k8s-pg via CloudNativePG
|
||||||
|
|
||||||
|
**File**: `argocd/manifests/databases/secret-borgmatic.yaml.tpl`
|
||||||
|
```yaml
|
||||||
|
# Template for borgmatic backup user password
|
||||||
|
# Apply with: op inject -i secret-borgmatic.yaml.tpl | kubectl apply -f -
|
||||||
|
apiVersion: v1
|
||||||
|
kind: Secret
|
||||||
|
metadata:
|
||||||
|
name: blumeops-pg-borgmatic
|
||||||
|
namespace: databases
|
||||||
|
type: kubernetes.io/basic-auth
|
||||||
|
stringData:
|
||||||
|
username: borgmatic
|
||||||
|
password: {{ op://vg6xf6vvfmoh5hqjjhlhbeoaie/mw2bv5we7woicjza7hc6s44yvy/db-password }}
|
||||||
|
```
|
||||||
|
|
||||||
|
**File**: `argocd/manifests/databases/blumeops-pg.yaml` (add to managed roles)
|
||||||
|
```yaml
|
||||||
|
managed:
|
||||||
|
roles:
|
||||||
|
# ... existing eblume role ...
|
||||||
|
# borgmatic read-only user for backups
|
||||||
|
- name: borgmatic
|
||||||
|
login: true
|
||||||
|
connectionLimit: -1
|
||||||
|
ensure: present
|
||||||
|
inherit: true
|
||||||
|
inRoles:
|
||||||
|
- pg_read_all_data
|
||||||
|
passwordSecret:
|
||||||
|
name: blumeops-pg-borgmatic
|
||||||
|
```
|
||||||
|
|
||||||
|
**Deploy**:
|
||||||
|
```bash
|
||||||
|
op inject -i argocd/manifests/databases/secret-borgmatic.yaml.tpl | kubectl apply -f -
|
||||||
|
argocd app set blumeops-pg --revision feature/p3-postgresql-borgmatic
|
||||||
|
argocd app sync blumeops-pg
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 7. Configure borgmatic for dual database backup
|
||||||
|
|
||||||
|
**File**: `ansible/roles/borgmatic/defaults/main.yml`
|
||||||
|
```yaml
|
||||||
|
borgmatic_postgresql_databases:
|
||||||
|
# Brew PostgreSQL on indri (current production)
|
||||||
|
- name: miniflux
|
||||||
|
hostname: localhost
|
||||||
|
port: 5432
|
||||||
|
username: borgmatic
|
||||||
|
# k8s PostgreSQL (CloudNativePG) - backup both during migration
|
||||||
|
- name: miniflux
|
||||||
|
hostname: k8s-pg.tail8d86e.ts.net
|
||||||
|
port: 5432
|
||||||
|
username: borgmatic
|
||||||
|
```
|
||||||
|
|
||||||
|
**File**: `ansible/roles/postgresql/tasks/main.yml` (update .pgpass)
|
||||||
|
```yaml
|
||||||
|
- name: Write .pgpass file for borgmatic backups
|
||||||
|
ansible.builtin.copy:
|
||||||
|
content: |
|
||||||
|
# Managed by ansible - only read-only roles
|
||||||
|
localhost:{{ postgresql_port }}:*:borgmatic:{{ postgresql_user_passwords['borgmatic'] }}
|
||||||
|
k8s-pg.tail8d86e.ts.net:5432:*:borgmatic:{{ postgresql_user_passwords['borgmatic'] }}
|
||||||
|
dest: ~/.pgpass
|
||||||
|
mode: '0600'
|
||||||
|
no_log: true
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 8. Verify complete backup pipeline
|
||||||
|
|
||||||
|
```bash
|
||||||
|
mise run provision-indri -- --tags borgmatic,postgresql
|
||||||
|
ssh indri '/opt/homebrew/bin/borgmatic --verbosity 1'
|
||||||
|
ssh indri 'borgmatic list --archive latest'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected output**: Archive contains both dumps:
|
||||||
|
- `localhost/miniflux/miniflux`
|
||||||
|
- `k8s-pg.tail8d86e.ts.net/miniflux/miniflux`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 9. Fix ArgoCD drift from CNPG defaults
|
||||||
|
|
||||||
|
**Problem**: ArgoCD showed blumeops-pg as OutOfSync due to CNPG operator adding default values.
|
||||||
|
|
||||||
|
**Solution**: Add CNPG defaults explicitly to managed roles.
|
||||||
|
|
||||||
|
**File**: `argocd/manifests/databases/blumeops-pg.yaml`
|
||||||
|
```yaml
|
||||||
|
managed:
|
||||||
|
roles:
|
||||||
|
- name: eblume
|
||||||
|
# ... existing fields ...
|
||||||
|
connectionLimit: -1
|
||||||
|
ensure: present
|
||||||
|
inherit: true
|
||||||
|
- name: borgmatic
|
||||||
|
# ... existing fields ...
|
||||||
|
connectionLimit: -1
|
||||||
|
ensure: present
|
||||||
|
inherit: true
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 10. Update zk documentation
|
||||||
|
|
||||||
|
Updated:
|
||||||
|
- `~/code/personal/zk/borgmatic.md` - k8s-pg backup documentation and log entry
|
||||||
|
- `~/code/personal/zk/postgresql.md` - k8s PostgreSQL section and log entry
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## New Files
|
||||||
|
|
||||||
|
| Path | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `argocd/manifests/databases/secret-borgmatic.yaml.tpl` | borgmatic user password template |
|
||||||
|
|
||||||
|
## Modified Files
|
||||||
|
|
||||||
|
| Path | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `ansible/roles/borgmatic/defaults/main.yml` | Added `borgmatic_local_path`, k8s-pg database entry |
|
||||||
|
| `ansible/roles/borgmatic/templates/config.yaml.j2` | Added `local_path` option |
|
||||||
|
| `ansible/roles/postgresql/tasks/main.yml` | Added k8s-pg to .pgpass |
|
||||||
|
| `argocd/apps/apps.yaml` | Disabled selfHeal |
|
||||||
|
| `argocd/manifests/databases/blumeops-pg.yaml` | Added borgmatic managed role, CNPG defaults |
|
||||||
|
| `pulumi/policy.hujson` | Added ACL grant homelab → k8s on tcp:5432 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
- [x] borgmatic backup runs successfully
|
||||||
|
- [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries)
|
||||||
|
- [x] borgmatic user created in k8s-pg with pg_read_all_data role
|
||||||
|
- [x] Both localhost and k8s-pg databases in backup archive
|
||||||
|
- [x] ArgoCD shows blumeops-pg as Synced
|
||||||
|
- [x] zk documentation updated
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Rollback
|
## Rollback
|
||||||
|
|
||||||
Keep brew PostgreSQL running until Phase 4 verified
|
Keep brew PostgreSQL running until Phase 4 verified. To revert:
|
||||||
|
|
||||||
|
1. Remove k8s-pg entry from borgmatic databases
|
||||||
|
2. Remove k8s-pg from .pgpass
|
||||||
|
3. `mise run provision-indri -- --tags borgmatic,postgresql`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Notes
|
||||||
|
|
||||||
|
*Added during implementation for retrospective review*
|
||||||
|
|
||||||
|
### borgmatic LaunchAgent PATH Issue
|
||||||
|
|
||||||
|
**Problem**: borgmatic LaunchAgent failed with `borg: command not found`
|
||||||
|
|
||||||
|
**Root cause**: LaunchAgents run with minimal PATH that doesn't include `/opt/homebrew/bin`
|
||||||
|
|
||||||
|
**Solution**: Added `local_path: /opt/homebrew/bin/borg` to borgmatic config. This was already done for `pg_dump_command` but not for borg itself.
|
||||||
|
|
||||||
|
**Lesson**: Any tool invoked by borgmatic needs absolute path when running from LaunchAgent.
|
||||||
|
|
||||||
|
### 1Password Field Name Mismatch
|
||||||
|
|
||||||
|
**Issue**: Initial secret template used `password` field but 1Password item had `db-password`.
|
||||||
|
|
||||||
|
**Discovery**: Error message from `op inject` indicated field not found.
|
||||||
|
|
||||||
|
**Fix**: Updated template to use correct field name `db-password`.
|
||||||
|
|
||||||
|
### ACL Grant Discovery
|
||||||
|
|
||||||
|
**Problem**: Connection from indri (tag:homelab) to k8s-pg (tag:k8s) failed.
|
||||||
|
|
||||||
|
**Diagnosis**: Checked Tailscale operator proxy logs which showed "no rules matched" - clear indication of missing ACL.
|
||||||
|
|
||||||
|
**Solution**: Added explicit grant in `pulumi/policy.hujson` for `tag:homelab` → `tag:k8s` on `tcp:5432`.
|
||||||
|
|
||||||
|
### ArgoCD selfHeal and Feature Branch Development
|
||||||
|
|
||||||
|
**Problem**: When testing changes, temporarily pointed blumeops-pg app to feature branch via `argocd app set --revision`. ArgoCD's selfHeal kept reverting it back to main.
|
||||||
|
|
||||||
|
**Discussion**: Two options considered:
|
||||||
|
- Option A: Disable selfHeal on apps app (manual sync required for new apps)
|
||||||
|
- Option B: Keep selfHeal, use different workflow
|
||||||
|
|
||||||
|
**Decision**: Option A chosen. The apps app now only has `prune: true`, not selfHeal. This allows:
|
||||||
|
1. Temporarily testing feature branches
|
||||||
|
2. Manual control over when app manifest changes are applied
|
||||||
|
|
||||||
|
**Trade-off**: Must manually sync apps app when adding/removing Application manifests.
|
||||||
|
|
||||||
|
### CloudNativePG Managed Role Reconciliation
|
||||||
|
|
||||||
|
**Issue**: After creating borgmatic secret with correct password, CNPG didn't immediately update the user.
|
||||||
|
|
||||||
|
**Solution**: Annotated the Cluster to trigger reconciliation:
|
||||||
|
```bash
|
||||||
|
kubectl annotate cluster blumeops-pg -n databases cnpg.io/reconcile=$(date +%s) --overwrite
|
||||||
|
```
|
||||||
|
|
||||||
|
### ArgoCD Drift from CNPG Defaults
|
||||||
|
|
||||||
|
**Problem**: blumeops-pg showed OutOfSync despite successful syncs.
|
||||||
|
|
||||||
|
**Cause**: CNPG operator adds default values (`connectionLimit: -1`, `ensure: present`, `inherit: true`) to managed roles that weren't in our spec.
|
||||||
|
|
||||||
|
**Solution**: Added these defaults explicitly to our spec to match what CNPG generates.
|
||||||
|
|
||||||
|
**Comment added**: Documented in blumeops-pg.yaml that these are "CNPG defaults added to prevent ArgoCD drift".
|
||||||
|
|
||||||
|
### Git Workflow for Phase 3
|
||||||
|
|
||||||
|
1. Created feature branch: `feature/p3-postgresql-borgmatic`
|
||||||
|
2. Made commits throughout implementation
|
||||||
|
3. Pointed blumeops-pg app to feature branch for testing
|
||||||
|
4. Created PR #32 for review
|
||||||
|
5. After merge, reset app to main: `argocd app set blumeops-pg --revision main`
|
||||||
|
|
||||||
|
This workflow was enabled by disabling selfHeal (see above).
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue