Updated P3_postgresql.complete.md with full implementation notes including:
- borgmatic borg path fix
- Disaster recovery testing
- CloudNativePG managed roles for borgmatic user
- Dual database backup configuration
- ACL grant for homelab → k8s
- ArgoCD selfHeal disabled for feature branch workflow
- CNPG default values to prevent drift

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Erich Blume 2026-01-19 18:19:33 -08:00
commit 463f476374


@@ -1,55 +1,359 @@
# Phase 3: PostgreSQL Migration
# Phase 3: PostgreSQL Disaster Recovery & Backup
**Goal**: Migrate miniflux database to CloudNativePG
**Goal**: Test disaster recovery and configure borgmatic backups for k8s-pg
**Status**: Pending
**Status**: Complete (2026-01-19)
**Prerequisites**: [Phase 2](P2_grafana.md) complete
**Prerequisites**: [Phase 2](P2_grafana.complete.md) complete
---
## Overview
Phase 3 establishes disaster recovery capabilities for the k8s PostgreSQL cluster:
1. **Fix borgmatic backup issues** - Resolve `borg: command not found` error
2. **Test disaster recovery** - Restore miniflux data from borgmatic backup to k8s-pg
3. **Create borgmatic user** - Read-only backup user in k8s-pg via CloudNativePG
4. **Configure dual database backup** - Back up both brew PostgreSQL and k8s-pg during migration
This phase prepares for Phase 4 (miniflux migration) by verifying we can restore data to k8s-pg.
---
## Key Decisions
### Backup Both Databases During Transition
**Decision**: Configure borgmatic to back up both `localhost:5432/miniflux` (brew) and `k8s-pg.tail8d86e.ts.net:5432/miniflux` (k8s) until the migration is complete.
**Why**: Provides redundancy during migration. After Phase 4, remove the localhost entry.
### Reuse Existing borgmatic Password
**Decision**: Use same borgmatic password from 1Password for k8s-pg user.
**Why**: Simpler credential management, password already proven secure.
### CloudNativePG Managed Roles
**Decision**: Declare borgmatic user via CloudNativePG `managed.roles` instead of SQL commands.
**Why**: Declarative, version-controlled, matches eblume user pattern.
### Disable selfHeal on apps App
**Decision**: Remove `selfHeal: true` from `argocd/apps/apps.yaml`.
**Why**: Allows temporarily pointing child apps to feature branches during development without ArgoCD reverting the change.
---
## Steps
### 1. Create databases and users in k8s PostgreSQL
### 1. Fix borgmatic borg path issue
- miniflux database/user
- borgmatic read-only user
**Problem**: borgmatic failing with `borg: command not found`
---
**Cause**: The LaunchAgent doesn't have Homebrew in its PATH, so the `borg` binary is not found.
### 2. Export from brew PostgreSQL
**Solution**: Add `local_path` to borgmatic config template.
```bash
pg_dump -h localhost -U miniflux miniflux > miniflux_backup.sql
**File**: `ansible/roles/borgmatic/templates/config.yaml.j2`
```yaml
# Path to borg binary (LaunchAgent doesn't have homebrew in PATH)
local_path: {{ borgmatic_local_path }}
```
**File**: `ansible/roles/borgmatic/defaults/main.yml`
```yaml
borgmatic_local_path: /opt/homebrew/bin/borg
```
---
### 3. Expose k8s PostgreSQL via Tailscale
- Service with `loadBalancerClass: tailscale`
- Tag: `svc:pg-k8s`
---
### 4. Import data
### 2. Run manual backup to verify fix
```bash
psql -h pg-k8s.tail8d86e.ts.net -U miniflux miniflux < miniflux_backup.sql
mise run provision-indri -- --tags borgmatic
ssh indri '/opt/homebrew/bin/borgmatic --verbosity 1'
```
---
### 5. Update borgmatic config
### 3. Extract miniflux dump from borgmatic
- Change hostname to k8s PostgreSQL
```bash
ssh indri 'borgmatic list --archive latest'
ssh indri 'borgmatic restore --archive latest --destination /tmp/restore'
```
---
### 6. Verify data integrity
### 4. Add ACL grant for homelab → k8s
**Problem**: Connection from indri to k8s-pg blocked - Tailscale proxy logs showed "no rules matched"
**Solution**: Add ACL grant in Pulumi.
**File**: `pulumi/policy.hujson`
```hujson
// Homelab can reach k8s PostgreSQL for borgmatic backups
{
"src": ["tag:homelab"],
"dst": ["tag:k8s"],
"ip": ["tcp:5432"],
},
```
Deploy: `mise run tailnet-up`
---
### 5. Restore data to k8s-pg
```bash
# Using eblume superuser credentials from 1Password
ssh indri "psql 'postgres://eblume@k8s-pg.tail8d86e.ts.net:5432/miniflux' -f /tmp/restore/localhost/miniflux/miniflux"
```
**Verification**:
```bash
psql 'postgres://eblume@k8s-pg.tail8d86e.ts.net:5432/miniflux' -c 'SELECT COUNT(*) FROM users; SELECT COUNT(*) FROM feeds; SELECT COUNT(*) FROM entries;'
# Result: 2 users, 2 feeds, 44 entries
```
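The check above only counts rows on k8s-pg. A minimal sketch of running the same counts against both servers, so the two outputs can be compared, might look like the following. The `table_counts` helper is hypothetical (not part of the repo), and the localhost URI assumes the read-only borgmatic user can query the brew database.

```shell
# Hypothetical helper (not part of the repo): print "<table> <row count>"
# for each table name given after the connection URI.
table_counts() {
  uri=$1
  shift
  for table in "$@"; do
    # -A: unaligned output, -t: tuples only, so psql prints just the number
    printf '%s %s\n' "$table" "$(psql "$uri" -At -c "SELECT COUNT(*) FROM $table")"
  done
}

# Usage (assumed URIs; run on a host that can reach both servers):
#   src=$(table_counts 'postgres://borgmatic@localhost:5432/miniflux' users feeds entries)
#   dst=$(table_counts 'postgres://eblume@k8s-pg.tail8d86e.ts.net:5432/miniflux' users feeds entries)
#   [ "$src" = "$dst" ] && echo "row counts match"
```

Comparing against the source database is stricter than eyeballing absolute numbers, but it only holds if no new entries were written to the source between the backup and the check.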
---
### 6. Create borgmatic user in k8s-pg via CloudNativePG
**File**: `argocd/manifests/databases/secret-borgmatic.yaml.tpl`
```yaml
# Template for borgmatic backup user password
# Apply with: op inject -i secret-borgmatic.yaml.tpl | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
name: blumeops-pg-borgmatic
namespace: databases
type: kubernetes.io/basic-auth
stringData:
username: borgmatic
password: {{ op://vg6xf6vvfmoh5hqjjhlhbeoaie/mw2bv5we7woicjza7hc6s44yvy/db-password }}
```
**File**: `argocd/manifests/databases/blumeops-pg.yaml` (add to managed roles)
```yaml
managed:
roles:
# ... existing eblume role ...
# borgmatic read-only user for backups
- name: borgmatic
login: true
connectionLimit: -1
ensure: present
inherit: true
inRoles:
- pg_read_all_data
passwordSecret:
name: blumeops-pg-borgmatic
```
**Deploy**:
```bash
op inject -i argocd/manifests/databases/secret-borgmatic.yaml.tpl | kubectl apply -f -
argocd app set blumeops-pg --revision feature/p3-postgresql-borgmatic
argocd app sync blumeops-pg
```
---
### 7. Configure borgmatic for dual database backup
**File**: `ansible/roles/borgmatic/defaults/main.yml`
```yaml
borgmatic_postgresql_databases:
# Brew PostgreSQL on indri (current production)
- name: miniflux
hostname: localhost
port: 5432
username: borgmatic
# k8s PostgreSQL (CloudNativePG) - backup both during migration
- name: miniflux
hostname: k8s-pg.tail8d86e.ts.net
port: 5432
username: borgmatic
```
**File**: `ansible/roles/postgresql/tasks/main.yml` (update .pgpass)
```yaml
- name: Write .pgpass file for borgmatic backups
ansible.builtin.copy:
content: |
# Managed by ansible - only read-only roles
localhost:{{ postgresql_port }}:*:borgmatic:{{ postgresql_user_passwords['borgmatic'] }}
k8s-pg.tail8d86e.ts.net:5432:*:borgmatic:{{ postgresql_user_passwords['borgmatic'] }}
dest: ~/.pgpass
mode: '0600'
no_log: true
```
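Each `.pgpass` entry follows the `hostname:port:database:username:password` format, and a literal `:` or `\` inside a field has to be escaped with a backslash. The template above writes the password as-is, which works as long as the password contains neither character. A sketch of the escaping, using hypothetical `pgpass_escape` and `pgpass_line` helpers that are not part of the ansible role:

```shell
# Hypothetical helper: escape one .pgpass field.
# Backslashes are doubled first, then colons are prefixed with a backslash.
pgpass_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/:/\\:/g'
}

# Hypothetical helper: print one .pgpass line.
# Arguments: hostname port database username password
pgpass_line() {
  printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$(pgpass_escape "$5")"
}

# Usage (the password value here is a placeholder):
#   pgpass_line k8s-pg.tail8d86e.ts.net 5432 '*' borgmatic 'example:password'
```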
---
### 8. Verify complete backup pipeline
```bash
mise run provision-indri -- --tags borgmatic,postgresql
ssh indri '/opt/homebrew/bin/borgmatic --verbosity 1'
ssh indri 'borgmatic list --archive latest'
```
**Expected output**: Archive contains both dumps:
- `localhost/miniflux/miniflux`
- `k8s-pg.tail8d86e.ts.net/miniflux/miniflux`
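The expected-output check can be scripted rather than read by eye. A minimal sketch, assuming the archive listing contains the dump paths as plain substrings; `missing_dumps` is a hypothetical helper, not something borgmatic provides:

```shell
# Hypothetical helper: read an archive listing on stdin and print every
# expected dump path (given as arguments) that does not appear in it.
missing_dumps() {
  listing=$(cat)
  for path in "$@"; do
    # -F: match the path as a fixed string, not a regex
    printf '%s\n' "$listing" | grep -qF -- "$path" || printf '%s\n' "$path"
  done
}

# Usage:
#   ssh indri 'borgmatic list --archive latest' \
#     | missing_dumps localhost/miniflux/miniflux k8s-pg.tail8d86e.ts.net/miniflux/miniflux
# Empty output means both dumps are present in the archive.
```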
---
### 9. Fix ArgoCD drift from CNPG defaults
**Problem**: ArgoCD showed blumeops-pg as OutOfSync due to CNPG operator adding default values.
**Solution**: Add CNPG defaults explicitly to managed roles.
**File**: `argocd/manifests/databases/blumeops-pg.yaml`
```yaml
managed:
roles:
- name: eblume
# ... existing fields ...
connectionLimit: -1
ensure: present
inherit: true
- name: borgmatic
# ... existing fields ...
connectionLimit: -1
ensure: present
inherit: true
```
---
### 10. Update zk documentation
Updated:
- `~/code/personal/zk/borgmatic.md` - k8s-pg backup documentation and log entry
- `~/code/personal/zk/postgresql.md` - k8s PostgreSQL section and log entry
---
## New Files
| Path | Purpose |
|------|---------|
| `argocd/manifests/databases/secret-borgmatic.yaml.tpl` | borgmatic user password template |
## Modified Files
| Path | Change |
|------|--------|
| `ansible/roles/borgmatic/defaults/main.yml` | Added `borgmatic_local_path`, k8s-pg database entry |
| `ansible/roles/borgmatic/templates/config.yaml.j2` | Added `local_path` option |
| `ansible/roles/postgresql/tasks/main.yml` | Added k8s-pg to .pgpass |
| `argocd/apps/apps.yaml` | Disabled selfHeal |
| `argocd/manifests/databases/blumeops-pg.yaml` | Added borgmatic managed role, CNPG defaults |
| `pulumi/policy.hujson` | Added ACL grant homelab → k8s on tcp:5432 |
---
## Verification
- [x] borgmatic backup runs successfully
- [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries)
- [x] borgmatic user created in k8s-pg with pg_read_all_data role
- [x] Both localhost and k8s-pg databases in backup archive
- [x] ArgoCD shows blumeops-pg as Synced
- [x] zk documentation updated
---
## Rollback
Keep brew PostgreSQL running until Phase 4 verified
Keep brew PostgreSQL running until Phase 4 verified. To revert:
1. Remove k8s-pg entry from borgmatic databases
2. Remove k8s-pg from .pgpass
3. `mise run provision-indri -- --tags borgmatic,postgresql`
---
## Implementation Notes
*Added during implementation for retrospective review*
### borgmatic LaunchAgent PATH Issue
**Problem**: borgmatic LaunchAgent failed with `borg: command not found`
**Root cause**: LaunchAgents run with minimal PATH that doesn't include `/opt/homebrew/bin`
**Solution**: Added `local_path: /opt/homebrew/bin/borg` to borgmatic config. This was already done for `pg_dump_command` but not for borg itself.
**Lesson**: Any tool invoked by borgmatic needs an absolute path when borgmatic runs from a LaunchAgent.
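One way to catch this class of failure before deploying is to test whether a command resolves under a restricted PATH. A minimal sketch follows; `resolves_under_path` is a hypothetical helper, and the PATH in the usage line is an assumption about what the LaunchAgent sees, not a value taken from this setup.

```shell
# Hypothetical helper: succeed only if command $1 resolves when PATH is
# exactly $2. The subshell keeps the caller's PATH unchanged.
resolves_under_path() {
  ( PATH="$2"; command -v "$1" >/dev/null 2>&1 )
}

# Usage (assumed minimal PATH):
#   resolves_under_path borg /usr/bin:/bin:/usr/sbin:/sbin \
#     || echo "borg needs an absolute path in the borgmatic config"
```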
### 1Password Field Name Mismatch
**Issue**: Initial secret template used `password` field but 1Password item had `db-password`.
**Discovery**: Error message from `op inject` indicated field not found.
**Fix**: Updated template to use correct field name `db-password`.
### ACL Grant Discovery
**Problem**: Connection from indri (tag:homelab) to k8s-pg (tag:k8s) failed.
**Diagnosis**: Checked Tailscale operator proxy logs which showed "no rules matched" - clear indication of missing ACL.
**Solution**: Added explicit grant in `pulumi/policy.hujson` for `tag:homelab` → `tag:k8s` on `tcp:5432`.
### ArgoCD selfHeal and Feature Branch Development
**Problem**: When testing changes, temporarily pointed blumeops-pg app to feature branch via `argocd app set --revision`. ArgoCD's selfHeal kept reverting it back to main.
**Discussion**: Two options considered:
- Option A: Disable selfHeal on apps app (manual sync required for new apps)
- Option B: Keep selfHeal, use different workflow
**Decision**: Option A chosen. The apps app now only has `prune: true`, not selfHeal. This allows:
1. Temporarily testing feature branches
2. Manual control over when app manifest changes are applied
**Trade-off**: Must manually sync apps app when adding/removing Application manifests.
### CloudNativePG Managed Role Reconciliation
**Issue**: After creating borgmatic secret with correct password, CNPG didn't immediately update the user.
**Solution**: Annotated the Cluster to trigger reconciliation:
```bash
kubectl annotate cluster blumeops-pg -n databases cnpg.io/reconcile=$(date +%s) --overwrite
```
### ArgoCD Drift from CNPG Defaults
**Problem**: blumeops-pg showed OutOfSync despite successful syncs.
**Cause**: CNPG operator adds default values (`connectionLimit: -1`, `ensure: present`, `inherit: true`) to managed roles that weren't in our spec.
**Solution**: Added these defaults explicitly to our spec to match what CNPG generates.
**Comment added**: Documented in blumeops-pg.yaml that these are "CNPG defaults added to prevent ArgoCD drift".
### Git Workflow for Phase 3
1. Created feature branch: `feature/p3-postgresql-borgmatic`
2. Made commits throughout implementation
3. Pointed blumeops-pg app to feature branch for testing
4. Created PR #32 for review
5. After merge, reset app to main: `argocd app set blumeops-pg --revision main`
This workflow was enabled by disabling selfHeal (see above).