Wave 1 indri→ringtail migration: paperless, teslamate, mealie (#363)

Migrate paperless, teslamate, and mealie off the OOM-saturated minikube-indri node onto ringtail k3s, shedding ~1.1 GiB of resident load. Second chain in the indri-k8s decommission after immich.

**Containers ported to Nix (default.nix), build-verified on ringtail:**
- paperless → wraps nixpkgs paperless-ngx 2.20.15 (pinned unstable); runs as web/worker/beat/consumer
- mealie → wraps nixpkgs mealie 3.16.0 (forward 4-minor bump, breaking-change reviewed); single gunicorn, SQLite
- teslamate → from-scratch beamPackages mixRelease (not in nixpkgs); erlang_27+elixir_1_18, npm assets, ex_cldr locales pre-fetched

**Data:** cold downtime-tolerant cutover. paperless+teslamate postgres dump/restore from quiesced source into a new ringtail blumeops-pg CNPG cluster; mealie SQLite PVC copied. Source DBs untouched until verified (rollback = repoint).

**Also:** ringtail blumeops-pg cluster + ExternalSecrets scaffold; fixes pre-existing shower version-check drift.

Runbook: docs/how-to/ringtail/migrate-wave1-ringtail.md. Deploy-from-branch + cutover happens before merge; container images rebuilt from main after merge.

Reviewed-on: #363
2026-06-03 10:34:00 -07:00
commit fcac8e5a72
parent 40bd929820
45 changed files with 1422 additions and 445 deletions
--- a/docs/changelog.d/migrate-wave1-ringtail.infra.md
+++ b/docs/changelog.d/migrate-wave1-ringtail.infra.md
@@ -0,0 +1,13 @@
+Move paperless, teslamate, and mealie off `minikube-indri` onto
+`k3s-ringtail`, shedding ~1.1 GiB of resident load from the
+OOM-thrashing 8 GiB minikube node (the kernel OOM killer had been
+killing `kube-apiserver`/`dockerd`/argocd, flapping every
+minikube-hosted service at once). paperless + teslamate databases
+move into a fresh CNPG `blumeops-pg` cluster on ringtail via a cold
+`pg_dump`/`pg_restore` from the quiesced source — row counts verified
+equal before any routing flip; source DBs dropped only after the
+ringtail side serves traffic. mealie's SQLite PVC is copied as-is.
+paperless media stays on sifaka NFS. Downtime-tolerant cold cutover
+(no streaming replication); rollback is repoint-and-scale-up with the
+source untouched. Second chain in the indri-k8s decommission after
+[[migrate-immich-to-ringtail]].
--- a/docs/how-to/immich/migrate-immich-to-ringtail.md
+++ b/docs/how-to/immich/migrate-immich-to-ringtail.md
@@ -122,6 +122,8 @@ file).

 ## Related

+- [[migrate-wave1-ringtail]] — the next chain in the indri-k8s
+  decommission: paperless, teslamate, and mealie
 - [[shower-on-ringtail]] — a previous migration to ringtail (simpler:
  no upstream cluster, SQLite, no GPU)
 - [[connect-to-postgres]] — getting a psql session against CNPG
--- a/docs/how-to/ringtail/migrate-wave1-ringtail.md
+++ b/docs/how-to/ringtail/migrate-wave1-ringtail.md
@@ -0,0 +1,176 @@
+---
+title: Migrate Wave 1 (paperless, teslamate, mealie) to Ringtail
+modified: 2026-06-03
+last-reviewed: 2026-06-03
+tags:
+  - how-to
+  - operations
+  - ringtail
+  - migration
+---
+
+# Migrate Wave 1 to Ringtail
+
+Move paperless, teslamate, and mealie off `minikube-indri` and onto
+`k3s-ringtail`. This is the load-shedding response to minikube going
+OOM: the kernel OOM killer was thrashing the 8 GiB node — killing
+`kube-apiserver`, `dockerd`, and the argocd application-controller —
+which made every minikube-hosted service probe-flap at once. These
+three app pods are ~1.1 GiB resident combined and are the heaviest
+non-observability tenants left on minikube. This is the second chain
+in the indri-k8s decommission, following
+[[migrate-immich-to-ringtail]].
+
+## End state
+
+- `paperless`, `teslamate`, and `mealie` run on ringtail k3s in their
+  own namespaces, off minikube entirely.
+- A CNPG `blumeops-pg` Cluster runs in a `databases` namespace on
+  ringtail (PostgreSQL, owned by ringtail's `cnpg-system` operator),
+  holding the `paperless` and `teslamate` databases. Apps reach it
+  in-cluster via `blumeops-pg-rw.databases.svc.cluster.local`.
+- mealie keeps its SQLite database; its 2 GiB `mealie-data` PVC is
+  copied to a ringtail PVC.
+- paperless media still lives on [[sifaka]] via NFS (RWX, 500 GiB),
+  mounted from ringtail pods. teslamate has no file state.
+- Routing: `paperless.ops.eblu.me`, `teslamate.ops.eblu.me`, and
+  `mealie.ops.eblu.me` (Caddy on indri) proxy to Tailscale
+  ProxyGroup ingresses on ringtail. Service names are unchanged.
+- The minikube manifests and the `paperless`/`teslamate`/`mealie`
+  databases inside indri's `blumeops-pg` are removed only after
+  cutover is verified.
+
+## Non-goals
+
+- Migrating the rest of `blumeops-pg` (e.g. miniflux) — that is a
+  later wave. This chain moves only the paperless + teslamate
+  databases out; the source cluster on indri stays up for the others.
+- Version bumps or config changes. Lift-and-shift only.
+- Public (Fly) exposure changes. These stay tailnet-only.
+- The observability stack (prometheus/loki/tempo/grafana) — deferred;
+  it carries 50 GiB of local TSDB and is the riskiest move.
+
+## Critical constraint: no data loss
+
+**Downtime is acceptable — data loss is not.** We can take each
+service fully offline for its cutover, which removes the entire
+class of streaming-replication and double-writer hazards. The cold
+dump is taken from a *quiesced* source, so it is internally
+consistent.
+
+Data surfaces:
+
+1. **paperless postgres** — document metadata, tags, correspondents,
+   the search index state. The document *files* are on NFS and never
+   move, but losing the DB means files-without-index. This is the
+   surface to protect most carefully.
+2. **teslamate postgres** — drive/charge history. Re-derivable only
+   from Tesla's API for a limited window; treat as unrecoverable.
+3. **mealie SQLite** — recipes, meal plans. On the `mealie-data` PVC.
+
+The source databases on indri are **never dropped until the ringtail
+side is verified and serving**. Rollback is "repoint and scale back
+up," not "restore from backup." [[borgmatic]] remains the backstop.
+
+## Why a fresh CNPG cluster (not cross-cluster pg)
+
+indri's `blumeops-pg` is already exposed tailnet-wide at
+`pg.ops.eblu.me` (Caddy L4), so we *could* leave the DBs on indri and
+just move the app pods. We are not, because:
+
+- The goal is to retire minikube. Keeping pg there blocks that and
+  leaves a cross-host runtime dependency: indri's pg, reached over
+  the tailnet, becomes a single point of failure for ringtail apps.
+- CNPG is the same operator on both clusters; a Cluster CR on ringtail
+  is mechanically equivalent to the one on minikube.
+- Naming the ringtail cluster `blumeops-pg` in `databases` lets apps
+  use the same in-cluster DNS they would on indri.
+
+## Cold-cutover procedure (per service)
+
+Do these one service at a time. paperless first (heaviest, highest
+data-sensitivity), then teslamate, then mealie.
+
+### 0. Prerequisites (once, before any service)
+
+- Confirm ringtail's `cnpg-system` operator and `databases` namespace
+  are healthy (immich-pg already runs there).
+- Confirm ringtail pods can reach indri's `pg.ops.eblu.me:5432` (used
+  only to pull the dump) and the sifaka NFS export for paperless
+  media. See [[sifaka-nfs-from-ringtail]].
+- Define the ringtail `blumeops-pg` CNPG Cluster manifest (model on
+  `databases-ringtail/immich-pg.yaml`) and its ExternalSecrets for
+  the per-app roles. Sync it; let it come up empty and healthy.
+
+### 1. Quiesce the source
+
+```fish
+kubectl --context=minikube-indri -n <ns> scale deploy/<app> --replicas=0
+# confirm 0 running, DB now has no writers
+```
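+
+A minimal sketch of that confirmation, assuming the app connects to a
+database named `<appdb>` and that `psql` inside the CNPG pod can run
+with its default credentials (the same assumption the `pg_dump` step
+makes). The deployment should report `0/0` ready, and the count of
+sessions still open on the app database should be `0`:
+
+```fish
+# deployment should show READY 0/0
+kubectl --context=minikube-indri -n <ns> get deploy/<app>
+
+# count sessions still connected to the app database (expect 0)
+kubectl --context=minikube-indri -n databases exec blumeops-pg-1 -- \
+  psql -tA -c "SELECT count(*) FROM pg_stat_activity WHERE datname = '<appdb>'"
+```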
+
+### 2. Dump from indri, restore to ringtail (postgres apps)
+
+```fish
+# dump the single app DB from the quiesced source
+kubectl --context=minikube-indri -n databases exec blumeops-pg-1 -- \
+  pg_dump -Fc -d <appdb> > /tmp/<appdb>.dump
+
+# restore into the ringtail cluster
+kubectl --context=k3s-ringtail -n databases exec -i blumeops-pg-1 -- \
+  pg_restore --no-owner --role=<approle> -d <appdb> < /tmp/<appdb>.dump
+```
+
+For **mealie** (SQLite) instead: copy the `mealie-data` PVC contents
+to the ringtail PVC (e.g. a one-shot rsync pod mounting both, or
+`kubectl cp` via a helper pod). Verify that the `.db` file size
+matches the source and that mealie boots read-only against it.
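+
+One way to do the copy is a tar stream between helper pods, sketched
+here under several assumptions that are not part of this runbook: a
+helper pod named `pvc-helper` exists in each cluster, each mounts its
+cluster's `mealie-data` PVC at `/data`, the namespace is `mealie`, the
+helper image ships `tar`, `find`, and `sha256sum`, and mealie is not
+running on either side during the copy.
+
+```fish
+# stream the PVC contents from indri to ringtail (hypothetical pvc-helper pods)
+kubectl --context=minikube-indri -n mealie exec pvc-helper -- tar -C /data -cf - . \
+  | kubectl --context=k3s-ringtail -n mealie exec -i pvc-helper -- tar -C /data -xf -
+
+# checksum every file on both sides and compare the listings
+kubectl --context=minikube-indri -n mealie exec pvc-helper -- \
+  sh -c 'cd /data && find . -type f -exec sha256sum {} + | sort -k2' > /tmp/mealie-src.sha256
+kubectl --context=k3s-ringtail -n mealie exec pvc-helper -- \
+  sh -c 'cd /data && find . -type f -exec sha256sum {} + | sort -k2' > /tmp/mealie-dst.sha256
+diff /tmp/mealie-src.sha256 /tmp/mealie-dst.sha256; and echo "PVC contents match"
+```
+
+A checksum comparison is stricter than the file-size check and costs
+little on a 2 GiB volume.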
+
+### 3. Verify the restore (before any routing flips)
+
+- Row counts match source for the key tables, scripted:
+  - paperless: `documents_document`, `documents_tag`,
+    `documents_correspondent`, `auth_user`.
+  - teslamate: `cars`, `drives`, `charging_processes`, `positions`.
+- `pg_dump --schema-only --no-owner` diff between source and dest is
+  empty modulo CNPG-managed roles.
+- Boot the app against the ringtail DB on its tailnet name *before*
+  Caddy is flipped, and smoke-test (paperless: documents list +
+  search; teslamate: dashboard loads recent drives; mealie: recipes
+  list).
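+
+The row-count check can be scripted along these lines. This is a
+sketch, not the script used for the cutover: the `count_rows` helper
+is hypothetical, the table list is the paperless one from above, and
+it assumes the pod `blumeops-pg-1` accepts `psql` on both clusters, as
+the dump and restore commands do.
+
+```fish
+# hypothetical helper: print the row count of one table in one cluster
+function count_rows --argument-names ctx db table
+    kubectl --context=$ctx -n databases exec blumeops-pg-1 -- \
+        psql -tA -d $db -c "SELECT count(*) FROM $table"
+end
+
+set appdb paperless
+set tables documents_document documents_tag documents_correspondent auth_user
+
+# compare source (indri) and destination (ringtail) table by table
+for t in $tables
+    set src (count_rows minikube-indri $appdb $t)
+    set dst (count_rows k3s-ringtail $appdb $t)
+    if test "$src" = "$dst"
+        echo "OK       $t: $src"
+    else
+        echo "MISMATCH $t: source=$src dest=$dst"
+    end
+end
+```
+
+For teslamate, set `appdb` to `teslamate` and `tables` to the four
+tables listed above. Any `MISMATCH` line means the routing flip should
+not proceed.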
+
+### 4. Release the service name
+
+```fish
+# delete the minikube tailscale ingress so ringtail can claim the name
+kubectl --context=minikube-indri -n <ns> delete ingress <app>-tailscale
+```
+
+### 5. Bring up on ringtail
+
+- Apply the ringtail manifests (new ArgoCD app `<app>-ringtail`,
+  `destination.server` = `https://ringtail.tail8d86e.ts.net:6443`).
+  App points at `blumeops-pg-rw.databases.svc.cluster.local`.
+- Sync, then wait for the app to report healthy and for the
+  ProxyGroup ingress to get its name.
+
+### 6. Flip routing
+
+- Repoint the Caddy `<app>.ops.eblu.me` upstream at the ringtail
+  ProxyGroup ingress (provision-indri, caddy role).
+- `mise run services-check` — confirm the service flips from FIRING
+  to OK and no neighbours regressed.
+
+### 7. Decommission the source (only after verification)
+
+- Remove the minikube manifests for the app.
+- Drop the app DB from indri's `blumeops-pg` (paperless/teslamate)
+  **last**, once the ringtail side has served real traffic.
+
+## Rollback
+
+If a cutover fails verification at any step before §7:
+
+- Re-create the minikube tailscale ingress (if §4 ran).
+- Scale the minikube app back up to 1 replica (`--replicas=1`).
+- Repoint Caddy back to the minikube ingress.
+- The source DB was never modified or dropped. Document the failure.