From 46237336950ee50ff9927d1af9f57209356bdba2 Mon Sep 17 00:00:00 2001
From: Erich Blume
Date: Wed, 13 May 2026 11:05:40 -0700
Subject: [PATCH] C2(migrate-immich-to-ringtail): plan introduce mikado chain

Goal: move immich (server, ML, valkey, postgres) off minikube-indri
onto k3s-ringtail. Immich is the largest single tenant on minikube
(~1.5 GiB resident) and minikube is memory-saturated.

Prerequisite cards:
- cnpg-on-ringtail
- immich-pg-on-ringtail (requires cnpg-on-ringtail)
- immich-pg-data-migration (requires immich-pg-on-ringtail)
- sifaka-nfs-from-ringtail
- immich-app-on-ringtail (requires immich-pg-on-ringtail,
  sifaka-nfs-from-ringtail)
- immich-cutover-and-decommission (requires immich-pg-data-migration,
  immich-app-on-ringtail)

Data loss is a critical failure; downtime is acceptable. The cutover
plan favors a CNPG externalCluster basebackup (Option A) with pg_dump
as the documented fallback (Option B).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/how-to/immich/cnpg-on-ringtail.md        | 53 +++++++++++
 docs/how-to/immich/immich-app-on-ringtail.md  | 71 ++++++++++++++
 .../immich/immich-cutover-and-decommission.md | 93 ++++++++++++++++++
 .../how-to/immich/immich-pg-data-migration.md | 82 ++++++++++++++++
 docs/how-to/immich/immich-pg-on-ringtail.md   | 61 ++++++++++++
 .../immich/migrate-immich-to-ringtail.md      | 95 +++++++++++++++++++
 .../how-to/immich/sifaka-nfs-from-ringtail.md | 68 +++++++++++++
 7 files changed, 523 insertions(+)
 create mode 100644 docs/how-to/immich/cnpg-on-ringtail.md
 create mode 100644 docs/how-to/immich/immich-app-on-ringtail.md
 create mode 100644 docs/how-to/immich/immich-cutover-and-decommission.md
 create mode 100644 docs/how-to/immich/immich-pg-data-migration.md
 create mode 100644 docs/how-to/immich/immich-pg-on-ringtail.md
 create mode 100644 docs/how-to/immich/migrate-immich-to-ringtail.md
 create mode 100644 docs/how-to/immich/sifaka-nfs-from-ringtail.md

diff --git a/docs/how-to/immich/cnpg-on-ringtail.md
b/docs/how-to/immich/cnpg-on-ringtail.md
new file mode 100644
index 0000000..4a7aff0
--- /dev/null
+++ b/docs/how-to/immich/cnpg-on-ringtail.md
@@ -0,0 +1,53 @@
+---
+title: CNPG Operator on Ringtail
+modified: 2026-05-13
+last-reviewed: 2026-05-13
+status: active
+tags:
+  - how-to
+  - operations
+  - postgres
+  - ringtail
+---
+
+# CNPG Operator on Ringtail
+
+Bring up the `cloudnative-pg` operator on `k3s-ringtail`. Today the
+operator only exists on `minikube-indri` (see
+`argocd/apps/cloudnative-pg.yaml`, destination `kubernetes.default.svc`).
+
+Prerequisite of [[migrate-immich-to-ringtail]]; consumed by
+[[immich-pg-on-ringtail]].
+
+## What to do
+
+- Add a sibling `argocd/apps/cloudnative-pg-ringtail.yaml` pointing
+  at the same mirror (`mirrors/cloudnative-pg`, tag `v1.27.1`),
+  destination `https://ringtail.tail8d86e.ts.net:6443`,
+  namespace `cnpg-system`.
+- Mirror the `ServerSideApply=true` and `CreateNamespace=true` sync
+  options (the CRDs exceed the annotation size limit).
+- Sync `apps` then `cloudnative-pg-ringtail`. Verify the operator
+  pod is running on ringtail.
+
+## Verification
+
+```fish
+kubectl --context=k3s-ringtail -n cnpg-system get pods
+kubectl --context=k3s-ringtail get crd clusters.postgresql.cnpg.io
+```
+
+## Why a separate app
+
+Each ArgoCD app targets a single cluster via `destination.server`.
+We could parameterize with ApplicationSets, but blumeops' convention
+is to duplicate the manifest with a `-ringtail` suffix (see
+`alloy-ringtail`, `external-secrets-ringtail`, etc.). Keep the
+convention.
+
+## Out of scope
+
+- Postgres clusters themselves (`immich-pg`, etc.) — those come from
+  [[immich-pg-on-ringtail]].
+- Removing the minikube cnpg operator. That happens at the very end
+  of the indri-k8s decommission, not in this chain.
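+
+## Sketch: the sibling Application
+
+As a rough illustration of the "What to do" steps, the sibling app
+might look something like the manifest below. This is a minimal
+sketch, not the committed file: `repoURL`, `project`, `path`, and the
+`argocd` metadata namespace are assumptions (none of them is stated
+in this card), so copy the real values from
+`argocd/apps/cloudnative-pg.yaml`.
+
+```yaml
+# argocd/apps/cloudnative-pg-ringtail.yaml (illustrative sketch)
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: cloudnative-pg-ringtail
+  namespace: argocd              # assumption: where ArgoCD apps live
+spec:
+  project: default               # assumption
+  source:
+    # Placeholder URL: use the real mirrors/cloudnative-pg remote.
+    repoURL: https://git.example.invalid/mirrors/cloudnative-pg.git
+    targetRevision: v1.27.1
+    path: .                      # assumption about the mirror layout
+  destination:
+    server: https://ringtail.tail8d86e.ts.net:6443
+    namespace: cnpg-system
+  syncPolicy:
+    syncOptions:
+      - ServerSideApply=true     # CRDs exceed the annotation size limit
+      - CreateNamespace=true
+```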
diff --git a/docs/how-to/immich/immich-app-on-ringtail.md b/docs/how-to/immich/immich-app-on-ringtail.md
new file mode 100644
index 0000000..80c8e5c
--- /dev/null
+++ b/docs/how-to/immich/immich-app-on-ringtail.md
@@ -0,0 +1,71 @@
+---
+title: Immich App on Ringtail
+modified: 2026-05-13
+last-reviewed: 2026-05-13
+status: active
+requires:
+  - immich-pg-on-ringtail
+  - sifaka-nfs-from-ringtail
+tags:
+  - how-to
+  - operations
+  - immich
+---
+
+# Immich App on Ringtail
+
+Bring up `immich-server`, `immich-machine-learning`, and
+`immich-valkey` on ringtail. This card stands the stack up against
+the *new* pg cluster — it does not move user traffic. Cutover lives
+in [[immich-cutover-and-decommission]].
+
+## What to do
+
+- New manifest dir `argocd/manifests/immich-ringtail/` (the suffix
+  matches the `-ringtail` convention used by other apps). Port from
+  `argocd/manifests/immich/`:
+  - `deployment-server.yaml` — point `DB_HOSTNAME` at the ringtail
+    pg service.
+  - `deployment-ml.yaml` — add a node selector / toleration so it
+    schedules where the GPU is, and a `resources.limits` for
+    `nvidia.com/gpu: 1`. Verify the immich-ml image actually wants
+    CUDA (it has CPU and CUDA variants — check the upstream chart).
+    See `argocd/manifests/frigate/` for the existing GPU pod pattern.
+  - `deployment-valkey.yaml` — straight port.
+  - `service*.yaml` — straight port.
+  - `pvc-ml-cache.yaml` — straight port (empty `local-path` PVC).
+  - `pv-nfs.yaml` + `pvc.yaml` — already covered by
+    [[sifaka-nfs-from-ringtail]] (may live in this dir or theirs).
+  - `ingress-tailscale.yaml` — ProxyGroup ingress, **must not** set
+    an explicit `host:` (or use `host: *`) per the lesson on
+    ProxyGroup VIP routing.
+  - `kustomization.yaml` — same `images:` block (server, ML, valkey).
+- New ArgoCD app `argocd/apps/immich-ringtail.yaml` targeting
+  ringtail, namespace `immich`. **Manual sync only** until the
+  cutover.
+- Existing `argocd/apps/immich.yaml` (minikube) stays untouched
+  during this card — both apps exist briefly.
+
+## Bring it up against a copy of the DB
+
+Use the throwaway/test path from [[immich-pg-data-migration#Dry run
+before real cutover]]: point the ringtail immich at the *test* pg
+cluster first, verify the pod boots, the web UI loads (via
+`kubectl port-forward`), assets list, ML embeddings query. Then
+tear it down.
+
+## Verification
+
+- All three pods Ready.
+- ML pod has a GPU attached: `nvidia-smi` inside the container shows
+  the 4080.
+- `immich-server` connects to pg and valkey (no `ECONNREFUSED` in
+  logs).
+- A `kubectl port-forward` to the server service shows the Immich
+  web UI.
+
+## Out of scope
+
+- Public/tailnet routing flip. Caddy still points at the minikube
+  Tailscale ingress until [[immich-cutover-and-decommission]].
+- Removing the minikube immich. Same.
diff --git a/docs/how-to/immich/immich-cutover-and-decommission.md b/docs/how-to/immich/immich-cutover-and-decommission.md
new file mode 100644
index 0000000..813ace3
--- /dev/null
+++ b/docs/how-to/immich/immich-cutover-and-decommission.md
@@ -0,0 +1,93 @@
+---
+title: Immich Cutover and Decommission
+modified: 2026-05-13
+last-reviewed: 2026-05-13
+status: active
+requires:
+  - immich-pg-data-migration
+  - immich-app-on-ringtail
+tags:
+  - how-to
+  - operations
+  - immich
+  - migration
+---
+
+# Immich Cutover and Decommission
+
+The user-visible flip. By the time this card opens, the ringtail
+stack has been proven against a copy of the data. This card does the
+real cutover.
+
+## Pre-cutover checklist
+
+- [[immich-pg-data-migration]] dry-run succeeded; method is chosen.
+- Ringtail immich stack has been brought up against the test pg,
+  pods healthy, UI loaded ([[immich-app-on-ringtail#Verification]]).
+- Borgmatic just ran successfully (a fresh nightly archive is a
+  belt-and-suspenders fallback, on top of the live source pg).
+- User has been told to stop uploading from the iOS app for the
+  cutover window.
+
+## Cutover sequence
+
+1. **Quiesce source.** `kubectl --context=minikube-indri -n immich
+   scale deploy/immich-server --replicas=0` and same for ML. Leave
+   valkey + pg running. Confirm no client traffic on the source pg
+   via `pg_stat_activity`.
+2. **Final sync.** Per chosen method in
+   [[immich-pg-data-migration]]:
+   - Option A: promote the ringtail replica.
+   - Option B: take final `pg_dump`, restore to ringtail
+     `immich-pg`.
+3. **Verify.** Run the row-count and schema-diff checks from
+   [[immich-pg-data-migration#Verification on the real run]].
+4. **Bring up ringtail immich** against the now-promoted pg
+   (`argocd app sync immich-ringtail`). Wait for Ready.
+5. **Flip routing.** Update Caddy on indri
+   (`ansible/roles/caddy/defaults/main.yml`): `photos.ops.eblu.me`
+   upstream changes to the ringtail Tailscale ingress hostname.
+   `mise run provision-indri -- --tags caddy`.
+6. **Smoke test.** Open `photos.ops.eblu.me` in a browser. Sign in.
+   Scroll the timeline. Open an album. Trigger an ML search.
+7. **Update borgmatic.** If the Tailscale hostname for pg changed,
+   update `borgmatic.cfg` on indri to point at the ringtail
+   `immich-pg-tailscale` service. Run a manual backup to verify.
+
+## After cutover
+
+- `argocd app set immich --revision ` is no longer relevant;
+  the minikube `immich` app gets deleted entirely.
+- Delete `argocd/apps/immich.yaml`, `argocd/manifests/immich/`, and
+  the minikube `argocd/manifests/databases/immich-pg.yaml` +
+  `external-secret-immich-borgmatic.yaml` +
+  `service-immich-pg-tailscale.yaml`.
+- Rename `immich-ringtail` back to `immich` (the `-ringtail` suffix
+  was scaffolding for the dual-cluster window; once minikube is
+  empty of immich, the unsuffixed name is clean).
+- Confirm the minikube `immich-pg` PVC is no longer used, then
+  delete it (the PV with `Retain` policy will persist — clean that
+  up too).
+
+## Verification (definition of done)
+
+- `photos.ops.eblu.me` works for a real session, including ML search.
+- Source minikube has no `immich` pods, no `immich-pg`, no PVCs.
+- Memory pressure on minikube has dropped (≥1.5 GiB reclaimed). Check
+  `docker stats minikube` on indri.
+- Nightly borgmatic run after the cutover completes successfully,
+  with the immich-pg archive showing the new source.
+
+## Rollback (within the cutover window)
+
+If smoke test fails: flip Caddy back, scale ringtail immich to 0,
+scale source immich back up. Source pg was never destroyed. File a
+plan reset on the relevant prerequisite card and try again next
+session.
+
+## Out of scope
+
+- Decommissioning all of minikube. This chain just removes immich.
+  Other tenants migrate in their own chains as part of the broader
+  indri-k8s decommission. See [[migrate-immich-to-ringtail]] for
+  context.
diff --git a/docs/how-to/immich/immich-pg-data-migration.md b/docs/how-to/immich/immich-pg-data-migration.md
new file mode 100644
index 0000000..1482fd9
--- /dev/null
+++ b/docs/how-to/immich/immich-pg-data-migration.md
@@ -0,0 +1,82 @@
+---
+title: Immich Postgres Data Migration
+modified: 2026-05-13
+last-reviewed: 2026-05-13
+status: active
+requires:
+  - immich-pg-on-ringtail
+tags:
+  - how-to
+  - operations
+  - postgres
+  - immich
+  - critical
+---
+
+# Immich Postgres Data Migration
+
+**This is the data-loss surface of the migration.** Pick a method,
+prove it on a throwaway copy first, then run the real cutover.
+
+## Decision: pick one
+
+### Option A — CNPG `externalCluster` bootstrap (preferred)
+
+Stand the ringtail cluster up as a streaming replica of the minikube
+cluster via `bootstrap.pg_basebackup.source`. Replica catches up
+online; when ready, promote it and point Immich at it. This is
+CNPG's documented PG-to-PG migration path and gives near-zero data
+loss (the WAL position at promote == the position at app stop).
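+
+A hedged sketch of how Option A could be expressed in the ringtail
+Cluster CR. The stanza names (`bootstrap.pg_basebackup`,
+`externalClusters`, `replica`) come from the CloudNativePG Cluster
+API; the external cluster name `immich-pg-minikube`, the host, the
+user, and the Secret name are hypothetical placeholders. As far as
+the CNPG docs describe it, `pg_basebackup` on its own takes a
+one-time copy, and the `replica` stanza is what keeps the new cluster
+streaming from the source until promotion, so confirm that behaviour
+against the docs for the operator version in use.
+
+```yaml
+apiVersion: postgresql.cnpg.io/v1
+kind: Cluster
+metadata:
+  name: immich-pg
+  namespace: databases
+spec:
+  instances: 1                       # assumption
+  # Must match the source cluster's image and major version.
+  imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
+  storage:
+    size: 10Gi
+    storageClass: local-path
+  bootstrap:
+    pg_basebackup:
+      source: immich-pg-minikube     # refers to externalClusters below
+  replica:
+    enabled: true                    # flip to false to promote at cutover
+    source: immich-pg-minikube
+  externalClusters:
+    - name: immich-pg-minikube       # hypothetical name
+      connectionParameters:
+        host: immich-pg-tailscale    # hypothetical tailnet hostname
+        user: postgres               # hypothetical; needs replication rights
+        dbname: postgres
+        sslmode: require
+      password:
+        name: immich-pg-source-superuser   # hypothetical Secret
+        key: password
+```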
+
+Requires: network path from ringtail to minikube's pg over the
+tailnet (the existing `immich-pg-tailscale` Service works), and a
+superuser secret minikube-side exposed to ringtail's basebackup.
+
+Pitfall to plan around: the ringtail Cluster CR will need its
+`bootstrap` block rewritten *after* promotion (CNPG doesn't
+gracefully drop the externalCluster reference). Account for this in
+[[immich-pg-on-ringtail]] — it may force a reset of that card.
+
+### Option B — pg_dump / pg_restore
+
+Stop immich, `pg_dump -Fc` from minikube, scp to ringtail, restore.
+Simpler but full downtime for the whole dump+restore window
+(measure on a copy first — VectorChord indexes are slow to rebuild).
+Smaller blast radius; no streaming-replication moving parts.
+
+Use this if Option A hits any blocker. Data loss should still be
+zero if the source is stopped first.
+
+### Option C — leave pg on minikube
+
+Rejected. See goal card [[migrate-immich-to-ringtail#Why postgres on
+ringtail (not cross-cluster)]].
+
+## Dry run before real cutover
+
+Whichever option wins:
+
+1. Snapshot the minikube `immich-pg` PVC or take a fresh `pg_dump`
+   into a scratch location.
+2. Restore into a *separate* ringtail CNPG cluster (different name,
+   e.g. `immich-pg-test`) and point a scratch immich-server pod at
+   it.
+3. Verify: pod boots, can list assets, ML embeddings query without
+   error, face thumbnails render. VectorChord-backed queries should
+   not error.
+4. Tear the scratch cluster down before doing the real one.
+
+## Verification on the real run
+
+- Row counts match for `assets`, `albums`, `users`, `face`,
+  `asset_face`, `smart_search` (the embedding table) — script this.
+- `pg_dump --schema-only --no-owner` diff between source and dest
+  should be empty modulo CNPG-managed roles.
+- Immich `/api/server-info/version` and `/api/server-info/statistics`
+  return sane numbers.
+
+## Rollback
+
+If the cutover fails verification: stop the ringtail immich, repoint
+ArgoCD `immich.destination` back to minikube, re-sync. Source pg was
+never deleted. Document what failed and reset the chain.
diff --git a/docs/how-to/immich/immich-pg-on-ringtail.md b/docs/how-to/immich/immich-pg-on-ringtail.md
new file mode 100644
index 0000000..933c5b3
--- /dev/null
+++ b/docs/how-to/immich/immich-pg-on-ringtail.md
@@ -0,0 +1,61 @@
+---
+title: Immich Postgres Cluster on Ringtail
+modified: 2026-05-13
+last-reviewed: 2026-05-13
+status: active
+requires:
+  - cnpg-on-ringtail
+tags:
+  - how-to
+  - operations
+  - postgres
+  - immich
+---
+
+# Immich Postgres Cluster on Ringtail
+
+Stand up a fresh `immich-pg` CNPG Cluster on ringtail, ready to receive
+data. **No data import yet** — that's [[immich-pg-data-migration]].
+
+## What to do
+
+- Create `argocd/manifests/databases-ringtail/` (or pick another
+  namespace name — verify what other ringtail pg clusters will use;
+  if none yet, `databases` is fine).
+- Port these from the minikube side:
+  - `immich-pg.yaml` — CNPG Cluster CR. Same image
+    (`ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0`), same
+    extensions, same managed `borgmatic` role. Bump `storage.size` if
+    the minikube 10 GiB looks tight (check actual usage first).
+    `storageClass: local-path` on ringtail (default).
+  - `external-secret-immich-borgmatic.yaml` — same 1Password item,
+    same field, but referencing the ringtail `ClusterSecretStore`
+    (`onepassword-blumeops` already exists per the
+    `external-secrets-ringtail` app).
+  - Service for in-cluster access (the operator creates `immich-pg-rw`
+    etc. automatically; verify the app deployment uses those names).
+  - A Tailscale Service if we want backups to keep working via the
+    same hostname during the transition — see "Borgmatic" below.
+- New ArgoCD app `argocd/apps/databases-ringtail.yaml` pointing at
+  the new path, destination ringtail.
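+
+For orientation, the ported `immich-pg.yaml` might have roughly the
+shape below. This is a sketch under assumptions, not the minikube
+manifest: `instances`, the `shared_preload_libraries` entry, and the
+Secret name `immich-borgmatic` are guesses, so port the real values
+from `argocd/manifests/databases/immich-pg.yaml` rather than from
+here.
+
+```yaml
+apiVersion: postgresql.cnpg.io/v1
+kind: Cluster
+metadata:
+  name: immich-pg
+  namespace: databases
+spec:
+  instances: 1                      # assumption
+  imageName: ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0
+  storage:
+    size: 10Gi                      # current minikube size; bump if tight
+    storageClass: local-path
+  postgresql:
+    shared_preload_libraries:
+      - vchord.so                   # assumption: copy from the source CR
+  managed:
+    roles:
+      - name: borgmatic
+        ensure: present
+        login: true
+        inRoles:
+          - pg_read_all_data        # read-only membership for backups
+        passwordSecret:
+          name: immich-borgmatic    # hypothetical Secret name
+```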
+
+## Verification
+
+- Cluster reaches `Ready`.
+- `psql` can connect via the app role and CREATE EXTENSION shows
+  `vchord`, `vector`, `cube`, `earthdistance` already installed.
+- `borgmatic` role exists with `pg_read_all_data` membership.
+
+## Borgmatic implications
+
+`borgmatic.cfg` on indri targets `immich-pg-tailscale` over the
+tailnet. During migration both clusters will exist briefly. Decide
+upfront: back up the *source* pg until cutover, then flip borgmatic
+to the ringtail Tailscale service. Document the flip in
+[[immich-cutover-and-decommission]].
+
+## Out of scope
+
+- Importing data. That is [[immich-pg-data-migration]], which may
+  drive a reset on this card if the migration approach (e.g. CNPG
+  `externalCluster` bootstrap) requires changes to this Cluster CR.
diff --git a/docs/how-to/immich/migrate-immich-to-ringtail.md b/docs/how-to/immich/migrate-immich-to-ringtail.md
new file mode 100644
index 0000000..2d9bcb2
--- /dev/null
+++ b/docs/how-to/immich/migrate-immich-to-ringtail.md
@@ -0,0 +1,95 @@
+---
+title: Migrate Immich to Ringtail
+modified: 2026-05-13
+last-reviewed: 2026-05-13
+status: active
+branch: mikado/migrate-immich-to-ringtail
+requires:
+  - cnpg-on-ringtail
+  - immich-pg-on-ringtail
+  - immich-pg-data-migration
+  - sifaka-nfs-from-ringtail
+  - immich-app-on-ringtail
+  - immich-cutover-and-decommission
+tags:
+  - how-to
+  - operations
+  - immich
+  - migration
+---
+
+# Migrate Immich to Ringtail
+
+Move the entire Immich stack (server, ML, valkey, postgres) off
+`minikube-indri` and onto `k3s-ringtail`. This is the first concrete
+chain in the broader indri-k8s decommission: minikube is
+memory-saturated (97% RAM, swapping), and Immich is the single
+largest tenant (~1.5 GiB resident).
+
+## End state
+
+- Immich `server`, `machine-learning`, and `valkey` Deployments run on
+  ringtail k3s in the `immich` namespace.
+- The `immich-machine-learning` pod uses ringtail's RTX 4080 via the
+  `nvidia-device-plugin` (performance win — currently CPU-only on
+  minikube).
+- A CNPG `immich-pg` Cluster (PostgreSQL 17 + VectorChord) runs in a
+  `databases` namespace on ringtail, owned by the `cnpg-system`
+  operator on ringtail.
+- The photo library still lives on [[sifaka]] at `/volume1/photos`,
+  mounted via NFS from ringtail pods (RWX).
+- Routing: `photos.ops.eblu.me` (Caddy on indri) proxies to a
+  Tailscale ProxyGroup ingress on ringtail. No public surface today.
+- The ArgoCD `immich` app's `destination.server` points at
+  `https://ringtail.tail8d86e.ts.net:6443`. The old minikube
+  manifests are removed.
+
+## Non-goals
+
+- Public exposure via Fly. Immich stays tailnet-only.
+- Changing the immich version or runtime configuration. This is a
+  lift-and-shift; bumps come later.
+- Backing up to a different target. [[borgmatic]] keeps running on
+  indri (it pulls via Tailscale and uses sifaka SMB for the library).
+
+## Critical constraint: no data loss
+
+Downtime is acceptable (Immich is a single-user system; we can take
+it offline for the cutover). **Data loss is not.** Two surfaces matter:
+
+1. **Postgres** — face data, ML embeddings (vectors), album state,
+   sharing, etc. Re-derivable in theory; weeks of recompute in
+   practice. See [[immich-pg-data-migration]].
+2. **Library files** — `/volume1/photos`. Not moving, but the NFS
+   path must be verified accessible from ringtail before cutover.
+   See [[sifaka-nfs-from-ringtail]].
+
+[[borgmatic]] backs both up to sifaka + BorgBase nightly; restore is
+possible but slow. Treat it as a fallback, not a plan.
+
+## Why postgres on ringtail (not cross-cluster)
+
+`immich-pg` already has a Tailscale Service we could point ringtail
+at, leaving the DB on minikube. We're not doing that because:
+
+- The whole goal is to retire minikube — keeping pg there blocks it.
+- Immich is chatty against pg; tailnet round-trips would hurt.
+- CNPG is the same operator on both sides — a Cluster CR on ringtail
+  is mechanically equivalent.
+
+## Approach
+
+This is a C2 Mikado chain. The prerequisite cards each represent a
+distinct surface that has to work before cutover. See
+[[agent-change-process#C2 — Mikado Chain]] for the discipline.
+
+## Related
+
+- [[shower-on-ringtail]] — a previous migration to ringtail (simpler:
+  no upstream cluster, SQLite, no GPU)
+- [[connect-to-postgres]] — getting a psql session against CNPG
+- [[ringtail]] — the target cluster
+- [[cnpg-on-ringtail]], [[immich-pg-on-ringtail]],
+  [[immich-pg-data-migration]], [[sifaka-nfs-from-ringtail]],
+  [[immich-app-on-ringtail]], [[immich-cutover-and-decommission]] —
+  the prerequisite cards
diff --git a/docs/how-to/immich/sifaka-nfs-from-ringtail.md b/docs/how-to/immich/sifaka-nfs-from-ringtail.md
new file mode 100644
index 0000000..8732d4b
--- /dev/null
+++ b/docs/how-to/immich/sifaka-nfs-from-ringtail.md
@@ -0,0 +1,68 @@
+---
+title: Sifaka NFS Photos from Ringtail
+modified: 2026-05-13
+last-reviewed: 2026-05-13
+status: active
+tags:
+  - how-to
+  - operations
+  - storage
+  - nfs
+  - sifaka
+---
+
+# Sifaka NFS Photos from Ringtail
+
+The Immich library lives at `sifaka:/volume1/photos` and is mounted
+into the pod via an NFS PV (see `argocd/manifests/immich/pv-nfs.yaml`).
+That PV is currently scoped to indri. We need ringtail to mount the
+same path with the same RWX semantics, without breaking the existing
+indri mount during the transition.
+
+## What to verify / do
+
+- Check `sifaka` DSM NFS rules for the `photos` share. Per
+  [[shower-on-ringtail#NFS + SMB share on sifaka]] convention, rules
+  use `192.168.1.0/24` + `100.64.0.0/10` with
+  `all_squash`/`Map all users to admin`. The existing rule may
+  already cover ringtail (it's on `192.168.1.21` per the recent
+  static-IP pin). If so this card is a verification card.
+- If the rule is locked to indri's IP: add an entry for ringtail
+  (192.168.1.21) or widen to the subnet pattern above.
+- Test mount from a ringtail debug pod (busybox or alpine with
+  nfs-utils) against the `photos` share. Read a file. Write a temp
+  file. Delete it.
+- Watch for the known sifaka NFS-over-Tailscale gotcha: sifaka's
+  Tailscale must be in TUN mode (not userspace) for NFS to work
+  reliably over the tailnet. The NFS path here goes over the LAN
+  (not tailnet), so this shouldn't bite, but worth confirming the
+  NFS traffic is on `192.168.1.x` not `100.x`.
+
+## PV + PVC on ringtail
+
+- New `pv-nfs.yaml` mirroring the minikube one (name can be shared
+  if the PV is cluster-scoped — but PVs are per-cluster, so just
+  duplicate). Same `server: sifaka`, same path, same
+  `accessModes: [ReadWriteMany]`, `persistentVolumeReclaimPolicy:
+  Retain`.
+- New `pvc.yaml` in the ringtail `immich` namespace bound to it.
+- The minikube PVC stays bound and active until cutover — both
+  clusters can have the share NFS-mounted simultaneously (NFS RWX
+  permits this). Immich itself must not be running on both sides
+  at once.
+
+## Verification
+
+- A pod on ringtail can `ls /mnt/photos/` and see the same files
+  as the indri pod.
+- File written from ringtail pod is visible from indri pod and
+  vice versa (proves there's no caching surprise).
+
+## Out of scope
+
+- Migrating photo files. Nothing moves; this is just adding a second
+  NFS client.
+- The `pvc-ml-cache.yaml` PVC (a separate ML model cache). That's
+  not on NFS — it's a regular PVC. Recreated empty on ringtail in
+  [[immich-app-on-ringtail]]; the first ML pod boot will repopulate
+  it.
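+
+## Sketch: PV and PVC manifests
+
+The "PV + PVC on ringtail" section could translate into manifests
+roughly like these. Treat this as a sketch: the object names and the
+`1Ti` capacity are placeholders (NFS does not enforce the capacity
+figure), and the real values should be copied from
+`argocd/manifests/immich/pv-nfs.yaml` and its PVC. The empty
+`storageClassName` is there so the default `local-path` provisioner
+does not try to satisfy the claim dynamically.
+
+```yaml
+apiVersion: v1
+kind: PersistentVolume
+metadata:
+  name: immich-photos-nfs          # hypothetical name
+spec:
+  capacity:
+    storage: 1Ti                   # placeholder
+  accessModes:
+    - ReadWriteMany
+  persistentVolumeReclaimPolicy: Retain
+  storageClassName: ""
+  nfs:
+    server: sifaka
+    path: /volume1/photos
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: immich-photos              # hypothetical name
+  namespace: immich
+spec:
+  accessModes:
+    - ReadWriteMany
+  storageClassName: ""
+  volumeName: immich-photos-nfs    # bind explicitly to the PV above
+  resources:
+    requests:
+      storage: 1Ti                 # placeholder; must not exceed the PV
+```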