blumeops/docs/how-to/immich/immich-cutover-and-decommission.md
Erich Blume 947e4310c3 C2: migrate immich from minikube to ringtail (mikado chain) (#356)
## Summary

C2 Mikado chain to move the entire Immich stack (server, ML, valkey,
postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the
largest single tenant on minikube (~1.5 GiB resident) and minikube is
currently memory-saturated (97% RAM, swapping). This is the first
concrete chain in the broader indri-k8s decommission effort.

This PR contains the planning layer only — 7 cards (1 goal + 6
prerequisites). Implementation cycles follow per the Mikado Branch
Invariant.

## Goal end-state

- Immich `server`, `machine-learning`, `valkey` on ringtail.
- ML pod uses ringtail's RTX 4080 (performance win — currently
  CPU-only).
- CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail.
- Library still on sifaka NFS — ringtail mounts the same path.
- `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress.
- Minikube `immich` and `immich-pg` are removed.

## Cards

| Card | Depends on |
|---|---|
| `migrate-immich-to-ringtail` (goal) | all six below |
| `cnpg-on-ringtail` | — |
| `immich-pg-on-ringtail` | cnpg-on-ringtail |
| `immich-pg-data-migration` | immich-pg-on-ringtail |
| `sifaka-nfs-from-ringtail` | — |
| `immich-app-on-ringtail` | immich-pg-on-ringtail, sifaka-nfs-from-ringtail |
| `immich-cutover-and-decommission` | immich-pg-data-migration, immich-app-on-ringtail |
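One way to sanity-check the dependency table is to feed it to POSIX `tsort`. It prints the cards in an order where every card comes after its prerequisites, and it reports a loop if the graph has a cycle. This is a minimal standalone sketch, assuming the table above is the complete edge list. It is not what `mise run docs-mikado` does internally.

```shell
#!/bin/sh
# Print one valid implementation order for the Mikado cards.
# Each line below is an edge "prerequisite dependent", transcribed by
# hand from the "Depends on" table. tsort emits every card after all
# of its prerequisites and fails if the edges form a loop.
card_order() {
  tsort <<'EOF'
cnpg-on-ringtail immich-pg-on-ringtail
immich-pg-on-ringtail immich-pg-data-migration
immich-pg-on-ringtail immich-app-on-ringtail
sifaka-nfs-from-ringtail immich-app-on-ringtail
immich-pg-data-migration immich-cutover-and-decommission
immich-app-on-ringtail immich-cutover-and-decommission
cnpg-on-ringtail migrate-immich-to-ringtail
immich-pg-on-ringtail migrate-immich-to-ringtail
immich-pg-data-migration migrate-immich-to-ringtail
sifaka-nfs-from-ringtail migrate-immich-to-ringtail
immich-app-on-ringtail migrate-immich-to-ringtail
immich-cutover-and-decommission migrate-immich-to-ringtail
EOF
}
```

Running `card_order` lists all seven cards, and the goal card is always last. Several orders are valid (for example, `sifaka-nfs-from-ringtail` can appear anywhere before `immich-app-on-ringtail`), and which one `tsort` picks depends on the implementation.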

## Key constraints

- **No data loss.** Downtime is acceptable; data loss is not. Two
  surfaces matter: postgres (ML embeddings, face data — slow to
  re-derive) and the library files (don't move, but NFS access from
  ringtail must be verified).
- **Migration method:** Option A is a CNPG `externalCluster`
  basebackup → promote. Option B is `pg_dump`/`pg_restore` as a
  documented fallback. Either way, dry-run against a scratch
  cluster first.
- **Why pg moves too** (instead of being reached cross-cluster): keeping
  pg on minikube would block the whole decommission, and Immich is
  chatty with pg, so per-query tailnet round-trips would hurt.
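For orientation, Option B could look roughly like the sketch below: dump the source database to a local custom-format file, then restore that file into the ringtail cluster. It is a minimal sketch, and the authoritative procedure belongs to `immich-pg-data-migration`. It assumes both clusters use the same hypothetical names: namespace `databases`, primary pod `immich-pg-1`, database `immich`. It also assumes the VectorChord extension is already available in the ringtail image.

```shell
#!/bin/sh
# Option B sketch: dump the source database to a local custom-format file,
# then restore that file into the ringtail cluster. Keeping the file means
# each step's exit status is checked separately and a copy survives.
# HYPOTHETICAL placeholders; override them via the environment.
SRC_CTX=${SRC_CTX:-minikube-indri}
DST_CTX=${DST_CTX:-k3s-ringtail}
PG_NS=${PG_NS:-databases}        # HYPOTHETICAL namespace of the CNPG cluster
PG_POD=${PG_POD:-immich-pg-1}    # HYPOTHETICAL primary pod name
PG_DB=${PG_DB:-immich}           # HYPOTHETICAL database name

# Write a custom-format (-Fc) dump of the source database to file $1.
dump_source() {
  kubectl --context="$SRC_CTX" -n "$PG_NS" exec "$PG_POD" -c postgres -- \
    pg_dump -Fc -d "$PG_DB" > "$1"
}

# Stream file $1 into pg_restore on the target. --clean --if-exists drops
# existing objects first; --no-owner skips ownership-setting commands.
restore_target() {
  kubectl --context="$DST_CTX" -n "$PG_NS" exec -i "$PG_POD" -c postgres -- \
    pg_restore --clean --if-exists --no-owner -d "$PG_DB" < "$1"
}
```

Typical use is `dump_source /tmp/immich.dump && restore_target /tmp/immich.dump`, run only after the source has been quiesced so that the dump is final.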

## Test plan

- [ ] Plan review — does the dependency graph make sense?
- [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the
      chain correctly.
- [ ] Per-card implementation cycles land separately (commit
      convention enforced by hook).

Reviewed-on: #356
2026-05-13 16:46:17 -07:00


| title | modified | last-reviewed | tags |
|---|---|---|---|
| Immich Cutover and Decommission | 2026-05-13 | 2026-05-13 | how-to, operations, immich, migration |

# Immich Cutover and Decommission

The user-visible flip. By the time this card opens, the ringtail stack has been proven against a copy of the data; this card performs the real cutover.

## Pre-cutover checklist

- The `immich-pg-data-migration` dry-run succeeded, and the migration method (Option A or Option B) has been chosen.
- The ringtail Immich stack has been brought up against the test pg, with pods healthy and the UI loading (see `immich-app-on-ringtail#Verification`).
- Borgmatic has just completed a successful run. A fresh nightly archive is a belt-and-suspenders fallback on top of the live source pg.
- The user has been told to stop uploading from the iOS app for the duration of the cutover window.

## Cutover sequence

1. **Quiesce source.** Run `kubectl --context=minikube-indri -n immich scale deploy/immich-server --replicas=0`, then do the same for the machine-learning deployment. Leave valkey and pg running. Confirm via `pg_stat_activity` that no clients remain connected to the source pg.
2. **Tear down the minikube Tailscale ingress.** The `photos` Tailscale device name must be freed before ringtail's ingress can claim it, because Tailscale enforces unique device names across the tailnet. Run `kubectl --context=minikube-indri -n immich delete ingress immich-tailscale` and wait for the corresponding tailscale-LB StatefulSet pod to terminate. Then, from any tailnet host, verify that the `photos` device is gone with `tailscale status | grep -i photos`. This pattern also matches the staging `photos-ringtail` device, so check specifically that no device named exactly `photos` remains.
3. **Final sync**, per the method chosen in `immich-pg-data-migration`:
   - Option A: promote the ringtail replica.
   - Option B: take a final `pg_dump` and restore it into the ringtail `immich-pg`.
4. **Verify.** Run the row-count and schema-diff checks from `immich-pg-data-migration#Verification` against the real run.
5. **Flip the ringtail ingress to `photos`.** In `argocd/manifests/immich-ringtail/ingress-tailscale.yaml`, set `tls.hosts: [photos]`. It was `[photos-ringtail]` during staging, per `immich-app-on-ringtail`. Commit, then run `argocd app sync immich-ringtail`. Wait for the `photos` device to register on the tailnet again.
6. **Bring up ringtail Immich** against the now-promoted pg (`argocd app sync immich-ringtail`). Wait for the pods to become Ready.
7. **Flip routing.** Update Caddy on indri (`ansible/roles/caddy/defaults/main.yml`) so that the `photos.ops.eblu.me` upstream is the ringtail Tailscale ingress hostname. That hostname is `photos`: the same MagicDNS name, which now points to the ringtail proxy. Apply with `mise run provision-indri -- --tags caddy`.
8. **Smoke test.** Open `photos.ops.eblu.me` in a browser, sign in, scroll the timeline, open an album, and trigger an ML search.
9. **Update borgmatic.** If the Tailscale hostname for pg changed, update `borgmatic.cfg` on indri to point at the ringtail `immich-pg-tailscale` service, then run a manual backup to verify.
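Steps 1, 2 and 6 reduce to a handful of commands, sketched below as shell functions under stated assumptions. The ML deployment name `immich-machine-learning` and the ringtail namespace `immich` are hypothetical, so check them against the manifests. The `photos` check assumes the device name is the second column of `tailscale status` output and compares that column exactly.

```shell
#!/bin/sh
# Helpers for cutover steps 1, 2 and 6. Defaults marked HYPOTHETICAL are
# placeholders; override them via the environment to match the manifests.
SRC_CTX=${SRC_CTX:-minikube-indri}
DST_CTX=${DST_CTX:-k3s-ringtail}
SRC_NS=${SRC_NS:-immich}
DST_NS=${DST_NS:-immich}                         # HYPOTHETICAL
ML_DEPLOY=${ML_DEPLOY:-immich-machine-learning}  # HYPOTHETICAL

# Step 1: scale server and ML to zero; valkey and pg keep running.
quiesce_source() {
  kubectl --context="$SRC_CTX" -n "$SRC_NS" scale deploy/immich-server --replicas=0 &&
    kubectl --context="$SRC_CTX" -n "$SRC_NS" scale "deploy/$ML_DEPLOY" --replicas=0
}

# Step 1: count sessions on the app database other than our own; expect 0.
# Assumes the database is named "immich" (HYPOTHETICAL).
CLIENTS_SQL="SELECT count(*) FROM pg_stat_activity WHERE datname = 'immich' AND pid <> pg_backend_pid();"

# Step 2: read `tailscale status` on stdin and succeed only if a device is
# named exactly "photos" (column 2), so photos-ringtail does not match.
photos_device_present() {
  awk '$2 == "photos" { found = 1 } END { exit !found }'
}

# Step 6: block until every pod in the ringtail namespace reports Ready.
wait_ringtail_ready() {
  kubectl --context="$DST_CTX" -n "$DST_NS" wait --for=condition=Ready pod --all --timeout=300s
}
```

A typical run is `quiesce_source`, then `while tailscale status | photos_device_present; do sleep 10; done` after deleting the ingress. For the client check, pass `$CLIENTS_SQL` to `psql -Atc` inside the source primary pod and expect `0`; the pod and namespace names depend on the CNPG cluster.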

## After cutover

- `argocd app set immich --revision <branch>` is no longer relevant; the minikube `immich` app is deleted entirely.
- Delete `argocd/apps/immich.yaml`, `argocd/manifests/immich/`, and the minikube-side `argocd/manifests/databases/immich-pg.yaml`, `external-secret-immich-borgmatic.yaml`, and `service-immich-pg-tailscale.yaml`.
- Rename `immich-ringtail` back to `immich`. The `-ringtail` suffix was scaffolding for the dual-cluster window; once minikube no longer runs Immich, the unsuffixed name is free.
- Confirm that the minikube `immich-pg` PVC is no longer in use, then delete it. Its PV has a `Retain` reclaim policy and will persist after the PVC is gone, so delete the PV as well.
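With a `Retain` reclaim policy, deleting a PVC leaves its PV in the `Released` phase until someone deletes it by hand. Below is a minimal sketch for finding those leftovers. It assumes every Immich-related claim has `immich` in its namespace or claim name. That holds for the `immich` namespace and for a CNPG cluster called `immich-pg`, whose PVCs are typically named after its pods. Verify the list before deleting anything.

```shell
#!/bin/sh
# Keep only Released PVs whose old claim (namespace or name) mentions
# "immich". Input: NAME NS CLAIM PHASE columns on stdin.
filter_released_immich() {
  awk '$4 == "Released" && ($2 ~ /immich/ || $3 ~ /immich/) { print $1 }'
}

# List candidate PVs on the source cluster; review before deleting.
released_immich_pvs() {
  kubectl --context="${SRC_CTX:-minikube-indri}" get pv --no-headers \
    -o custom-columns=NAME:.metadata.name,NS:.spec.claimRef.namespace,CLAIM:.spec.claimRef.name,PHASE:.status.phase |
    filter_released_immich
}
```

Once the printed names have been reviewed, each one can be removed with `kubectl --context=minikube-indri delete pv <name>`. The pure `awk` filter is kept separate so the selection logic can be checked without a cluster.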

## Verification (definition of done)

- `photos.ops.eblu.me` works for a real session, including ML search.
- The source minikube has no Immich pods, no `immich-pg`, and no Immich PVCs.
- Memory pressure on minikube has dropped (≥1.5 GiB reclaimed). Check with `docker stats minikube` on indri.
- The first nightly borgmatic run after the cutover completes successfully, and the `immich-pg` archive shows the new source.
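The ≥1.5 GiB target can be checked by snapshotting the minikube container's memory before step 1 and again after decommissioning. The minimal sketch below uses `docker stats --no-stream --format '{{.MemUsage}}'`. Its unit parsing assumes Docker's binary suffixes (KiB, MiB, GiB). Memory use fluctuates, so a single pair of snapshots is indicative rather than exact.

```shell
#!/bin/sh
# Snapshot the minikube container's memory use, e.g. "1.5GiB / 7.6GiB".
mem_snapshot() {
  docker stats --no-stream --format '{{.MemUsage}}' minikube
}

# Convert the "used" half of a MemUsage string on stdin to whole MiB.
# Assumes Docker's binary suffixes (B, KiB, MiB, GiB, TiB).
usage_to_mib() {
  awk '{
    n = $1 + 0                          # numeric prefix, e.g. 1.5 from "1.5GiB"
    if ($1 ~ /TiB$/)       n = n * 1048576
    else if ($1 ~ /GiB$/)  n = n * 1024
    else if ($1 ~ /KiB$/)  n = n / 1024
    else if ($1 !~ /MiB$/) n = n / 1048576   # plain bytes
    printf "%d\n", n
  }'
}

# Print MiB reclaimed between two snapshots: $1 = before, $2 = after.
reclaimed_mib() {
  mib_before=$(printf '%s\n' "$1" | usage_to_mib)
  mib_after=$(printf '%s\n' "$2" | usage_to_mib)
  echo $((mib_before - mib_after))
}
```

Typical use is `before=$(mem_snapshot)` ahead of step 1 and `after=$(mem_snapshot)` once minikube is empty of Immich, then `reclaimed_mib "$before" "$after"`. The target is at least 1536.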

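## Rollback (within the cutover window)

If the smoke test fails, flip Caddy back, scale ringtail Immich to 0, and scale source Immich back up. The `photos` Tailscale name also has to move back. Revert the ringtail ingress to `[photos-ringtail]`, then restore the minikube `immich-tailscale` ingress that step 2 deleted. The source pg was never destroyed. File a plan reset on the relevant prerequisite card and try again next session.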
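The scaling half of the rollback can be sketched as follows. The ringtail namespace, the ML deployment name, and the source replica count of 1 are all hypothetical defaults. If Argo CD auto-sync with self-heal is enabled for either app, it may undo manual scaling, so check the sync policy first.

```shell
#!/bin/sh
# Sketch of the rollback scaling steps. HYPOTHETICAL defaults below;
# override them via the environment to match the real manifests.
SRC_CTX=${SRC_CTX:-minikube-indri}
DST_CTX=${DST_CTX:-k3s-ringtail}
SRC_NS=${SRC_NS:-immich}
DST_NS=${DST_NS:-immich}                         # HYPOTHETICAL
ML_DEPLOY=${ML_DEPLOY:-immich-machine-learning}  # HYPOTHETICAL
SRC_REPLICAS=${SRC_REPLICAS:-1}                  # HYPOTHETICAL pre-cutover count

rollback_scale() {
  # Stop ringtail first so the two servers never touch the shared
  # sifaka NFS library at the same time.
  kubectl --context="$DST_CTX" -n "$DST_NS" scale \
      deploy/immich-server "deploy/$ML_DEPLOY" --replicas=0 &&
    kubectl --context="$SRC_CTX" -n "$SRC_NS" scale \
      deploy/immich-server "deploy/$ML_DEPLOY" --replicas="$SRC_REPLICAS"
}
```

The ordering matters more than the commands: both stacks mount the same library path, so the ringtail side is stopped before the source side starts.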

## Out of scope

- Decommissioning all of minikube. This chain only removes Immich; other tenants migrate in their own chains as part of the broader indri-k8s decommission. See `migrate-immich-to-ringtail` for context.