blumeops

Author	SHA1	Message	Date
Erich Blume	947e4310c3	C2: migrate immich from minikube to ringtail (mikado chain) (#356 ) ## Summary C2 Mikado chain to move the entire Immich stack (server, ML, valkey, postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the largest single tenant on minikube (~1.5 GiB resident) and minikube is currently memory-saturated (97% RAM, swapping). This is the first concrete chain in the broader indri-k8s decommission effort. This PR contains the planning layer only — 7 cards (1 goal + 6 prerequisites). Implementation cycles follow per the Mikado Branch Invariant. ## Goal end-state - Immich `server`, `machine-learning`, `valkey` on ringtail. - ML pod uses ringtail's RTX 4080 (performance win — currently CPU-only). - CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail. - Library still on sifaka NFS — ringtail mounts the same path. - `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress. - Minikube `immich` and `immich-pg` are removed. ## Cards \| Card \| Depends on \| \|---\|---\| \| `migrate-immich-to-ringtail` (goal) \| all six below \| \| `cnpg-on-ringtail` \| — \| \| `immich-pg-on-ringtail` \| cnpg-on-ringtail \| \| `immich-pg-data-migration` \| immich-pg-on-ringtail \| \| `sifaka-nfs-from-ringtail` \| — \| \| `immich-app-on-ringtail` \| immich-pg-on-ringtail, sifaka-nfs-from-ringtail \| \| `immich-cutover-and-decommission` \| immich-pg-data-migration, immich-app-on-ringtail \| ## Key constraints - No data loss. Downtime is acceptable; data loss is not. Two surfaces matter: postgres (ML embeddings, face data — slow to re-derive) and the library files (don't move, but NFS access from ringtail must be verified). - Migration method: Option A is a CNPG `externalCluster` basebackup → promote. Option B is `pg_dump`/`pg_restore` as a documented fallback. Either way, dry-run against a scratch cluster first. - Why pg moves too (not cross-cluster): keeping pg on minikube would block the whole decommission, and Immich is chatty with pg so tailnet round-trips would hurt. ## Test plan - [ ] Plan review — does the dependency graph make sense? - [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the chain correctly. - [ ] Per-card implementation cycles land separately (commit convention enforced by hook). Reviewed-on: #356	2026-05-13 16:46:17 -07:00
Erich Blume	07f52e9488	Deploy Paperless-ngx document management (#328 ) All checks were successful Build Container / detect (push) Successful in 2s Details Build Container / build-dockerfile (paperless) (push) Successful in 9s Details ## Summary - Add paperless-ngx (v2.20.13) as a new ArgoCD-managed service on indri - Dockerfile built from forge mirror (`mirrors/paperless-ngx`), multi-stage with s6-overlay - PostgreSQL database via `blumeops-pg` CNPG cluster, Redis sidecar for Celery - NFS document storage on sifaka (`/volume1/paperless`) - Authentik OIDC SSO via baked JSON blob from 1Password - Caddy route at `paperless.ops.eblu.me` - 1Password item "Paperless (blumeops)" created with all secrets ## Files - `containers/paperless/Dockerfile` — multi-stage build - `argocd/manifests/paperless/` — full k8s manifest set - `argocd/apps/paperless.yaml` — ArgoCD application - `argocd/manifests/databases/` — CNPG role + ExternalSecret - `ansible/roles/caddy/defaults/main.yml` — Caddy route - `service-versions.yaml` — version tracking entry - `docs/reference/services/paperless.md` — reference card ## Remaining deploy steps 1. Build container: `mise run container-build-and-release paperless` 2. Update kustomization.yaml `newTag` with actual image tag 3. Create Authentik application/provider for paperless 4. Create `paperless` database on blumeops-pg 5. Sync ArgoCD apps, then sync paperless from branch 6. Provision Caddy: `mise run provision-indri -- --tags caddy` 7. Verify at https://paperless.ops.eblu.me 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #328	2026-04-08 17:54:12 -07:00
Erich Blume	efae404d1e	Remove superuser from teslamate PG role, transfer extension ownership teslamate had superuser on the shared blumeops-pg cluster (which also hosts miniflux and authentik). Downgraded to plain database owner with extension ownership (cube, earthdistance) transferred manually so it can still ALTER EXTENSION UPDATE. earthdistance is untrusted in PG so DROP+CREATE would need temporary superuser escalation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 15:36:39 -07:00
Erich Blume	ca0c9354ee	Add borgmatic backups for authentik and immich databases (#314 ) ## Summary - Add `authentik` database (blumeops-pg cluster) to borgmatic pg_dump backups - Add `immich` database (immich-pg cluster) to borgmatic pg_dump backups - For immich-pg: new borgmatic managed role with `pg_read_all_data`, ExternalSecret, Tailscale LoadBalancer service, and Caddy L4 TCP proxy on port 5433 - Update backup docs to reflect all four CNPG databases + mealie SQLite ## Deploy plan Deploy order matters — k8s resources must exist before ansible can route to them: 1. ArgoCD (databases app): sync to pick up immich-pg borgmatic role, ExternalSecret, and Tailscale service ``` argocd app set blumeops-pg --revision feature/borgmatic-all-pg-backups argocd app sync blumeops-pg ``` 2. Wait for `immich-pg-tailscale` service to get a Tailscale IP and `immich-pg.tail8d86e.ts.net` to resolve 3. Ansible (caddy): deploy Caddy L4 route for port 5433 ``` mise run provision-indri -- --tags caddy ``` 4. Ansible (borgmatic): deploy updated config and .pgpass ``` mise run provision-indri -- --tags borgmatic ``` 5. Verify: trigger a manual borgmatic run and check all four pg_dump streams succeed ``` borgmatic --verbosity 1 2>&1 \| grep -E '(Dumping\|ERROR)' ``` ## Test plan - [x] `kubectl kustomize` builds cleanly - [x] `ansible --check --diff` for borgmatic and caddy show expected changes - [ ] ArgoCD sync succeeds for databases app - [ ] `immich-pg.tail8d86e.ts.net` resolves - [ ] `pg.ops.eblu.me:5433` accepts connections - [ ] `borgmatic --verbosity 1` dumps all four databases without errors Reviewed-on: #314	2026-03-27 16:59:58 -07:00
Erich Blume	02eb169403	Pin blumeops-pg to PostgreSQL 18.3 Replace floating :18 tag with pinned :18.3 (upstream out-of-cycle release fixing 18.2 regressions). Stamps service as reviewed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 16:25:32 -08:00
Erich Blume	71cb256527	Deploy Authentik identity provider (C2 Mikado) (#227 ) ## Summary C2 Mikado chain for deploying Authentik as the SSO identity provider, replacing Dex. This PR will evolve over multiple sessions. Each iteration adds documentation (prerequisite cards) and eventually code as leaf nodes are resolved. ## Current Mikado State - Goal: `deploy-authentik` (active) - Leaf prerequisites: - `build-authentik-container` — Build Nix container image - `provision-authentik-database` — Create PostgreSQL database on CNPG cluster - `create-authentik-secrets` — Create 1Password item with credentials ## Process refinements - Updated agent-change-process with lessons from first attempt: reset code before committing cards, open PRs early ## Test plan - [ ] `mise run docs-mikado` shows correct dependency chain - [ ] Leaf nodes can be worked independently - [ ] Container builds on ringtail - [ ] Authentik starts and reaches healthy state - [ ] Forgejo OAuth2 connector works Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/227	2026-02-20 12:55:59 -08:00
Erich Blume	22f418d0dc	Doc review: connect-to-postgres, create-release-artifact-workflow, deploy-k8s-service (#191 ) ## Summary Review session covering 3 docs, plus a codebase-wide cleanup: ### Docs reviewed - connect-to-postgres — verified end-to-end (psql connection tested), stamped - create-release-artifact-workflow — clarified that `build-blumeops.yaml` is only a version bump example (not a packages API example) - deploy-k8s-service — fixed stale repoURL (`indri:2200` → `forge.ops.eblu.me:2222`), wrong Caddy config keys (`upstream` → `backend`, added missing `host`), updated Homepage group to "Services", added Tailscale tag documentation ### Codebase cleanup - Migrated all remaining `op item get --fields` calls to `op read` URI syntax across 7 files (docs, READMEs, YAML comments) - Simplified the `op read` vs `op item get` guidance in CLAUDE.md ## Side findings (not addressed) - New `immich-pg` CNPG cluster not yet documented in the postgresql reference card ## Test plan - [x] `psql` connection to `pg.ops.eblu.me` verified - [x] All pre-commit hooks pass - [x] `docs-check-links`, `docs-check-index`, `docs-check-frontmatter` pass Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/191	2026-02-15 07:42:01 -08:00
Erich Blume	9114aac8f6	Switch all ExternalSecrets to creationPolicy: Owner ESO now has full ownership of these secrets. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 20:27:16 -08:00
Erich Blume	dd6cf20d51	Remove obsolete secret templates - Delete 13 .yaml.tpl files replaced by ExternalSecrets - Update immich/README.md with direct CNPG secret copy instructions - Update miniflux/README.md with context flag and ESO note Only 1password-connect/secret-credentials.yaml.tpl remains (bootstrap). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 20:26:37 -08:00
Erich Blume	351528474c	Add ExternalSecrets for remaining k8s secrets Migrate 10 secret templates to ESO ExternalSecrets with 1Password Connect: - databases: eblume, borgmatic, teslamate passwords - tailscale-operator: OAuth client credentials - grafana-config: admin password, teslamate datasource - teslamate: db password, encryption key - forgejo-runner: runner registration token - argocd: forge SSH credentials All use creationPolicy: Merge for safe migration from existing secrets. Skipped: - miniflux/secret-db: Uses CNPG secret, not 1Password directly - immich/secret-db: Requires 1Password item creation first - 1password-connect: Bootstrap secret, must stay as template Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 19:50:38 -08:00
Erich Blume	8621996343	Add Immich photo management + migrate forge URLs (#62 ) ## Summary - Migrate all ArgoCD app repo URLs from `indri.tail8d86e.ts.net:2200` to `forge.ops.eblu.me:2222` - Add Immich self-hosted photo management service with: - Helm chart deployment via ArgoCD - PostgreSQL cluster with pgvecto.rs for AI vector search (immich-pg) - NFS storage on sifaka for photo library (2Ti) - Tailscale Ingress + Caddy proxy for `photos.ops.eblu.me` - Machine learning service for face/object recognition ## Deployment and Testing - [x] Update ArgoCD repo-creds-forge secret with new URL (one-time manual step) - [ ] Sync `apps` to pick up new applications - [ ] Sync all existing apps to verify new forge URL works - [ ] Sync `blumeops-pg` to deploy immich-pg cluster - [ ] Wait for immich-pg to be healthy - [ ] Create immich-db secret from auto-generated password - [ ] Sync `immich-storage` (PV, PVC, Ingress) - [ ] Sync `immich` (Helm chart) - [ ] Run `mise run provision-indri -- --tags caddy` to add photos.ops.eblu.me - [ ] Verify Immich UI is accessible 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/62	2026-01-26 11:20:11 -08:00
Erich Blume	272ddb213b	Add TeslaMate deployment for Tesla Model Y data logging (#47 ) ## Summary - Add TeslaMate k8s deployment with Tailscale ingress at tesla.tail8d86e.ts.net - Add teslamate user to CloudNativePG blumeops-pg cluster - Add TeslaMate PostgreSQL datasource to Grafana - Import 18 TeslaMate Grafana dashboards for charging, drives, efficiency, etc. - Add teslamate database to borgmatic backup configuration ## Deployment and Testing - [ ] Create 1Password items: "TeslaMate DB Password" and "TeslaMate Encryption Key" - [ ] Apply database user secret: `op inject -i argocd/manifests/databases/secret-teslamate.yaml.tpl \| kubectl apply -f -` - [ ] Sync blumeops-pg: `argocd app sync blumeops-pg` - [ ] Create teslamate database - [ ] Apply teslamate secrets (encryption key, db connection) - [ ] Apply Grafana datasource secret: `op inject -i argocd/manifests/grafana-config/secret-teslamate-datasource.yaml.tpl \| kubectl apply -f -` - [ ] Sync apps and teslamate: `argocd app sync apps teslamate grafana grafana-config` - [ ] Complete Tesla API OAuth flow at https://tesla.tail8d86e.ts.net - [ ] Verify data collection starts - [ ] Verify Grafana dashboards show data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/47	2026-01-22 21:25:44 -08:00
Erich Blume	17023085cb	Migrate observability stack to Kubernetes (#42 ) Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack. Summary - Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal) - Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses - Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics - Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net) - Add ACL rule for port 9187 (CNPG metrics) - Delete obsolete ansible roles for prometheus and loki Changes - argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications - argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS - argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint - argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics - ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints - pulumi/policy.hujson - ACL for port 9187 - Deleted ansible/roles/prometheus/ and ansible/roles/loki/ Deployment and Testing - Stop prometheus and loki on indri - Sync ArgoCD apps (apps, prometheus, loki, grafana) - Run mise run provision-indri -- --tags alloy - Verify Grafana dashboards show data 🤖 Generated with https://claude.ai/claude-code Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42	2026-01-22 12:06:02 -08:00
Erich Blume	21848a7919	P5.1: Migrate minikube from podman to QEMU2 driver (#38 ) ## Summary - Migrate minikube from podman driver to qemu2 driver for proper NFS/SMB volume mount support - Update ansible minikube role with qemu installation and containerd runtime - Remove podman role dependency from indri.yml - Add synology user creation steps and post-migration zot reconfiguration notes ## Why Phase 6 (Kiwix/Transmission migration) was blocked because the podman driver lacks kernel capabilities for filesystem mounts. QEMU2 creates an actual VM with full mount support. ## Deployment and Testing - [ ] Create k8s-storage user on Synology DSM - [ ] Store credentials in 1Password (synology-k8s-storage) - [ ] Export current k8s state - [ ] Stop and delete podman-based minikube cluster - [ ] Run ansible to create QEMU2 cluster - [ ] Test NFS volume mount with test pod - [ ] Redeploy ArgoCD and all apps - [ ] Verify all services healthy - [ ] Reconfigure zot registry mirrors for containerd (post-migration) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/38	2026-01-21 16:03:37 -08:00
Erich Blume	735b643429	P4: Miniflux migration + PostgreSQL consolidation (#33 ) ## Summary - Deploy miniflux in k8s via ArgoCD - Expose via Tailscale Ingress at feed.tail8d86e.ts.net - Retire brew PostgreSQL (no longer needed) - Rename k8s-pg to pg (canonical hostname) - Remove ansible miniflux and postgresql roles - Update borgmatic to backup pg.tail8d86e.ts.net - Update all zk documentation ## Deployment and Testing - [x] Miniflux pod running in k8s - [x] User login works at https://feed.tail8d86e.ts.net - [x] Feeds and entries visible - [x] brew miniflux and postgresql stopped - [x] Tailscale services migrated (feed, pg) - [x] zk documentation updated - [x] Run ansible to apply role removals - [ ] Verify borgmatic backup with new pg hostname 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/33	2026-01-20 09:04:47 -08:00
Erich Blume	0c6f0a13c3	Add CNPG default values to prevent ArgoCD drift CloudNativePG operator fills in connectionLimit, ensure, and inherit defaults on managed roles. Adding these explicitly keeps ArgoCD in sync. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 18:02:42 -08:00
Erich Blume	eb952aae01	P3: PostgreSQL disaster recovery test and borgmatic k8s-pg backup (#32 ) ## Summary - Fixed borgmatic `borg: command not found` by adding `local_path` config option - Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg - Added borgmatic user to k8s-pg via CloudNativePG managed roles - Configured borgmatic to backup both localhost and k8s-pg PostgreSQL databases - Added Tailscale ACL grant for `tag:homelab` → `tag:k8s` on port 5432 - Disabled selfHeal on apps app to allow manual revision changes during development ## Changes - `ansible/roles/borgmatic/` - Added `local_path` and k8s-pg database entry - `ansible/roles/postgresql/tasks/main.yml` - Added k8s-pg to `.pgpass` - `argocd/apps/apps.yaml` - Disabled selfHeal - `argocd/manifests/databases/blumeops-pg.yaml` - Added borgmatic managed role - `argocd/manifests/databases/secret-borgmatic.yaml.tpl` - New secret template - `pulumi/policy.hujson` - Added ACL grant for backup access ## Deployment and Testing - [x] Borgmatic backup runs successfully - [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries verified) - [x] borgmatic user created in k8s-pg with pg_read_all_data role - [x] Both localhost and k8s-pg databases in backup archive - [x] zk documentation updated (borgmatic.md, postgresql.md) - [ ] After merge: set blumeops-pg app back to main revision 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/32	2026-01-19 18:00:32 -08:00
Erich Blume	a8f4d00294	K8s Migration Phase 1: Infrastructure Setup (#29 ) ## Summary - Split k8s migration plan into phases folder for easier navigation - Added `tag:k8s` to Pulumi ACLs for Kubernetes workloads - Phase 1 work in progress ## Phase 1 Goals - Tailscale Kubernetes Operator - CloudNativePG Operator - PostgreSQL cluster for future app migrations ## Deployment and Testing - [ ] Review Phase 1 plan - [ ] `mise run tailnet-preview` to verify ACL changes - [ ] `mise run tailnet-up` to apply ACL changes - [ ] Create Tailscale OAuth client (manual) - [ ] Deploy operators and PostgreSQL cluster 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/29	2026-01-19 09:49:52 -08:00

18 commits