C2(migrate-immich-to-ringtail): plan introduce mikado chain

Goal: move immich (server, ML, valkey, postgres) off minikube-indri
onto k3s-ringtail. Immich is the largest single tenant on minikube
(~1.5 GiB resident) and minikube is memory-saturated.

Prerequisite cards:
- cnpg-on-ringtail
- immich-pg-on-ringtail (requires cnpg-on-ringtail)
- immich-pg-data-migration (requires immich-pg-on-ringtail)
- sifaka-nfs-from-ringtail
- immich-app-on-ringtail (requires immich-pg-on-ringtail, sifaka-nfs-from-ringtail)
- immich-cutover-and-decommission (requires immich-pg-data-migration, immich-app-on-ringtail)

Data loss is a critical failure; downtime is acceptable. The cutover
plan favors a CNPG externalCluster basebackup (Option A) with pg_dump
as the documented fallback (Option B).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Erich Blume 2026-05-13 11:05:40 -07:00
commit 4623733695
7 changed files with 523 additions and 0 deletions

@ -0,0 +1,53 @@
---
title: CNPG Operator on Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
status: active
tags:
- how-to
- operations
- postgres
- ringtail
---
# CNPG Operator on Ringtail
Bring up the `cloudnative-pg` operator on `k3s-ringtail`. Today the
operator only exists on `minikube-indri` (see
`argocd/apps/cloudnative-pg.yaml`, destination `kubernetes.default.svc`).
Prerequisite of [[migrate-immich-to-ringtail]]; consumed by
[[immich-pg-on-ringtail]].
## What to do
- Add a sibling `argocd/apps/cloudnative-pg-ringtail.yaml` pointing
at the same mirror (`mirrors/cloudnative-pg`, tag `v1.27.1`),
destination `https://ringtail.tail8d86e.ts.net:6443`,
namespace `cnpg-system`.
- Mirror the `ServerSideApply=true` and `CreateNamespace=true` sync
options (the CRDs exceed the annotation size limit).
- Sync `apps` then `cloudnative-pg-ringtail`. Verify the operator
pod is running on ringtail.
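
A minimal POSIX `sh` sketch of that sync order. The function name and the 300-second timeout are illustrative assumptions, not existing blumeops tooling.

```sh
# Sync the app-of-apps first so the new Application exists,
# then sync the operator and wait for it to report healthy.
sync_cnpg_ringtail() {
  argocd app sync apps &&
    argocd app sync cloudnative-pg-ringtail &&
    argocd app wait cloudnative-pg-ringtail --health --timeout 300
}
```

The `&&` chain stops at the first failure, so a broken `apps` sync never reaches the operator sync.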
## Verification
```fish
kubectl --context=k3s-ringtail -n cnpg-system get pods
kubectl --context=k3s-ringtail get crd clusters.postgresql.cnpg.io
```
## Why a separate app
Each ArgoCD app targets a single cluster via `destination.server`.
We could parameterize with ApplicationSets, but blumeops' convention
is to duplicate the manifest with a `-ringtail` suffix (see
`alloy-ringtail`, `external-secrets-ringtail`, etc.). Keep the
convention.
## Out of scope
- Postgres clusters themselves (`immich-pg`, etc.) — those come from
[[immich-pg-on-ringtail]].
- Removing the minikube cnpg operator. That happens at the very end
of the indri-k8s decommission, not in this chain.

@ -0,0 +1,71 @@
---
title: Immich App on Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
status: active
requires:
- immich-pg-on-ringtail
- sifaka-nfs-from-ringtail
tags:
- how-to
- operations
- immich
---
# Immich App on Ringtail
Bring up `immich-server`, `immich-machine-learning`, and
`immich-valkey` on ringtail. This card stands the stack up against
the *new* pg cluster — it does not move user traffic. Cutover lives
in [[immich-cutover-and-decommission]].
## What to do
- New manifest dir `argocd/manifests/immich-ringtail/` (the suffix
matches the `-ringtail` convention used by other apps). Port from
`argocd/manifests/immich/`:
- `deployment-server.yaml` — point `DB_HOSTNAME` at the ringtail
pg service.
- `deployment-ml.yaml` — add a node selector / toleration so it
schedules where the GPU is, and a `resources.limits` for
`nvidia.com/gpu: 1`. Verify the immich-ml image actually wants
CUDA (it has CPU and CUDA variants — check the upstream chart).
See `argocd/manifests/frigate/` for the existing GPU pod pattern.
- `deployment-valkey.yaml` — straight port.
- `service*.yaml` — straight port.
- `pvc-ml-cache.yaml` — straight port (empty `local-path` PVC).
- `pv-nfs.yaml` + `pvc.yaml` — already covered by
[[sifaka-nfs-from-ringtail]] (may live in this dir or theirs).
- `ingress-tailscale.yaml` — ProxyGroup ingress, **must not** set
an explicit `host:` (omit it, or use `host: *`), per the lesson on
ProxyGroup VIP routing.
- `kustomization.yaml` — same `images:` block (server, ML, valkey).
- New ArgoCD app `argocd/apps/immich-ringtail.yaml` targeting
ringtail, namespace `immich`. **Manual sync only** until the
cutover.
- Existing `argocd/apps/immich.yaml` (minikube) stays untouched
during this card — both apps exist briefly.
## Bring it up against a copy of the DB
Use the throwaway/test path from
[[immich-pg-data-migration#Dry run before real cutover]]: point the
ringtail immich at the *test* pg cluster first, then verify that the
pod boots, the web UI loads (via `kubectl port-forward`), assets
list, and ML embedding queries succeed. Then tear it down.
## Verification
- All three pods Ready.
- ML pod has a GPU attached: `nvidia-smi` inside the container shows
the 4080.
- `immich-server` connects to pg and valkey (no `ECONNREFUSED` in
logs).
- A `kubectl port-forward` to the server service shows the Immich
web UI.
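
The first three checks could be scripted along these lines. This is a POSIX `sh` sketch that assumes the Deployments are named `immich-server` and `immich-machine-learning` (matching the component names above) and that 500 log lines is enough history; the function name is made up.

```sh
# Checks for the ringtail Immich stack. Deployment names are assumptions.
verify_immich_ringtail() {
  k="kubectl --context=k3s-ringtail -n immich"
  # All three pods should show Ready in this listing.
  $k get pods || return 1
  # The ML container should be able to list the GPU.
  $k exec deploy/immich-machine-learning -- nvidia-smi -L || return 1
  # ECONNREFUSED in recent server logs means pg or valkey is unreachable.
  if $k logs deploy/immich-server --tail=500 | grep -q ECONNREFUSED; then
    echo "immich-server logged ECONNREFUSED" >&2
    return 1
  fi
}
```

The port-forward check stays manual, since it needs a browser.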
## Out of scope
- Public/tailnet routing flip. Caddy still points at the minikube
Tailscale ingress until [[immich-cutover-and-decommission]].
- Removing the minikube immich. Same.

@ -0,0 +1,93 @@
---
title: Immich Cutover and Decommission
modified: 2026-05-13
last-reviewed: 2026-05-13
status: active
requires:
- immich-pg-data-migration
- immich-app-on-ringtail
tags:
- how-to
- operations
- immich
- migration
---
# Immich Cutover and Decommission
The user-visible flip. By the time this card opens, the ringtail
stack has been proven against a copy of the data. This card does the
real cutover.
## Pre-cutover checklist
- [[immich-pg-data-migration]] dry-run succeeded; method is chosen.
- Ringtail immich stack has been brought up against the test pg,
pods healthy, UI loaded ([[immich-app-on-ringtail#Verification]]).
- Borgmatic just ran successfully (a fresh nightly archive is a
belt-and-suspenders fallback, on top of the live source pg).
- User has been told to stop uploading from the iOS app for the
cutover window.
## Cutover sequence
1. **Quiesce source.** `kubectl --context=minikube-indri -n immich
scale deploy/immich-server --replicas=0`, and the same for the ML
deployment. Leave valkey + pg running. Confirm no client traffic on
the source pg via `pg_stat_activity`.
2. **Final sync.** Per chosen method in
[[immich-pg-data-migration]]:
- Option A: promote the ringtail replica.
- Option B: take final `pg_dump`, restore to ringtail
`immich-pg`.
3. **Verify.** Run the row-count and schema-diff checks from
[[immich-pg-data-migration#Verification on the real run]].
4. **Bring up ringtail immich** against the now-promoted pg
(`argocd app sync immich-ringtail`). Wait for Ready.
5. **Flip routing.** Update Caddy on indri
(`ansible/roles/caddy/defaults/main.yml`): `photos.ops.eblu.me`
upstream changes to the ringtail Tailscale ingress hostname.
`mise run provision-indri -- --tags caddy`.
6. **Smoke test.** Open `photos.ops.eblu.me` in a browser. Sign in.
Scroll the timeline. Open an album. Trigger an ML search.
7. **Update borgmatic.** If the Tailscale hostname for pg changed,
update `borgmatic.cfg` on indri to point at the ringtail
`immich-pg-tailscale` service. Run a manual backup to verify.
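
Steps 1 and 2 (Option A) might be scripted roughly as follows. This is a POSIX `sh` sketch under several assumptions: the ML Deployment is named `immich-machine-learning`, the database is named `immich`, the ringtail cluster lives in the `databases` namespace, and it was created in CNPG replica mode so that turning `replica.enabled` off promotes it. The function names and connection strings are hypothetical.

```sh
# Step 1: stop the writers on minikube; valkey and pg stay up.
quiesce_source() {
  k="kubectl --context=minikube-indri -n immich"
  $k scale deploy/immich-server --replicas=0 &&
    $k scale deploy/immich-machine-learning --replicas=0
}

# Step 1, continued: count client sessions still attached to the
# immich database. $1 is a libpq connection string for the source pg.
source_client_count() {
  psql "$1" -Atc "SELECT count(*) FROM pg_stat_activity
    WHERE datname = 'immich' AND backend_type = 'client backend'
      AND pid <> pg_backend_pid()"
}

# Step 2, Option A only: take the ringtail cluster out of replica mode.
promote_ringtail_replica() {
  kubectl --context=k3s-ringtail -n databases \
    patch clusters.postgresql.cnpg.io immich-pg \
    --type merge -p '{"spec":{"replica":{"enabled":false}}}'
}
```

Borgmatic or a metrics exporter may still hold a session, so a nonzero count is a prompt to read `pg_stat_activity`, not automatically a blocker.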
## After cutover
- `argocd app set immich --revision <branch>` is no longer relevant;
the minikube `immich` app gets deleted entirely.
- Delete `argocd/apps/immich.yaml`, `argocd/manifests/immich/`, and
the minikube `argocd/manifests/databases/immich-pg.yaml` +
`external-secret-immich-borgmatic.yaml` +
`service-immich-pg-tailscale.yaml`.
- Rename `immich-ringtail` back to `immich` (the `-ringtail` suffix
was scaffolding for the dual-cluster window; once minikube is
empty of immich, the unsuffixed name is clean).
- Confirm the minikube `immich-pg` PVC is no longer used, then
delete it (the PV with `Retain` policy will persist — clean that
up too).
## Verification (definition of done)
- `photos.ops.eblu.me` works for a real session, including ML search.
- Source minikube has no `immich` pods, no `immich-pg`, no PVCs.
- Memory pressure on minikube has dropped by roughly Immich's
resident footprint (~1.5 GiB). Check `docker stats minikube` on indri.
- Nightly borgmatic run after the cutover completes successfully,
with the immich-pg archive showing the new source.
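
The "nothing left on minikube" item can be checked mechanically. A POSIX `sh` sketch, assuming the minikube `immich-pg` lived in a `databases` namespace (inferred from the manifest path, not confirmed) and using a made-up function name:

```sh
# Print the name of every Immich-related object still on minikube.
# Empty output means the source side is clean.
minikube_immich_leftovers() {
  k="kubectl --context=minikube-indri"
  $k -n immich get pods,pvc -o name
  $k -n databases get clusters.postgresql.cnpg.io,pvc -o name | grep immich-pg
  return 0
}
```

With `-o name`, `kubectl` writes nothing to stdout when no resources match, so `[ -z "$(minikube_immich_leftovers)" ]` works as a pass/fail test.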
## Rollback (within the cutover window)
If smoke test fails: flip Caddy back, scale ringtail immich to 0,
scale source immich back up. Source pg was never destroyed. File a
plan reset on the relevant prerequisite card and try again next
session.
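
The scaling half of the rollback, as a POSIX `sh` sketch. Deployment names and the replica count of 1 are assumptions; the Caddy flip is the reverse of cutover step 5 and is not scripted here.

```sh
# Stop the ringtail side first so only one Immich ever runs against
# the shared library, then bring the minikube side back.
rollback_cutover() {
  r="kubectl --context=k3s-ringtail -n immich"
  m="kubectl --context=minikube-indri -n immich"
  $r scale deploy/immich-server --replicas=0 || return 1
  $r scale deploy/immich-machine-learning --replicas=0 || return 1
  $m scale deploy/immich-server --replicas=1 || return 1
  $m scale deploy/immich-machine-learning --replicas=1
}
```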
## Out of scope
- Decommissioning all of minikube. This chain just removes immich.
Other tenants migrate in their own chains as part of the broader
indri-k8s decommission. See [[migrate-immich-to-ringtail]] for
context.

@ -0,0 +1,82 @@
---
title: Immich Postgres Data Migration
modified: 2026-05-13
last-reviewed: 2026-05-13
status: active
requires:
- immich-pg-on-ringtail
tags:
- how-to
- operations
- postgres
- immich
- critical
---
# Immich Postgres Data Migration
**This is the data-loss surface of the migration.** Pick a method,
prove it on a throwaway copy first, then run the real cutover.
## Decision: pick one
### Option A — CNPG `externalCluster` bootstrap (preferred)
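Stand the ringtail cluster up as a streaming replica of the minikube
cluster via `bootstrap.pg_basebackup.source` plus
`replica.enabled: true` (with `replica.source` naming the same
external cluster). Without replica mode, a `pg_basebackup` bootstrap
is a one-time clone that starts its own timeline as soon as the copy
finishes. The replica catches up online; when ready, promote it and
point Immich at it. This is CNPG's documented PG-to-PG migration path
and gives near-zero data loss (the WAL position at promote == the
position at app stop).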
Requires: network path from ringtail to minikube's pg over the
tailnet (the existing `immich-pg-tailscale` Service works), and a
superuser secret minikube-side exposed to ringtail's basebackup.
Pitfall to plan around: the ringtail Cluster CR will need its
`bootstrap` block rewritten *after* promotion (CNPG doesn't
gracefully drop the externalCluster reference). Account for this in
[[immich-pg-on-ringtail]] — it may force a reset of that card.
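
Before promoting, it can help to confirm the replica has replayed everything the source wrote. A POSIX `sh` sketch, assuming `psql` access to both sides through hypothetical connection strings; `pg_current_wal_lsn()`, `pg_last_wal_replay_lsn()` and `pg_wal_lsn_diff()` are standard PostgreSQL functions, and the function name is made up.

```sh
# $1: source (primary) connection string.
# $2: ringtail replica connection string.
# Succeeds when the replica has replayed up to the source's current LSN.
replica_caught_up() {
  src_lsn=$(psql "$1" -Atc "SELECT pg_current_wal_lsn()") || return 1
  lag=$(psql "$2" -Atc \
    "SELECT pg_wal_lsn_diff('$src_lsn', pg_last_wal_replay_lsn())") || return 1
  echo "replay lag: $lag bytes"
  [ "$lag" -le 0 ]
}
```

The source keeps writing small amounts of WAL even with Immich stopped (checkpoints, autovacuum), so a brief nonzero lag is expected; rerun until it settles.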
### Option B — pg_dump / pg_restore
Stop immich, `pg_dump -Fc` from minikube, scp to ringtail, restore.
Simpler but full downtime for the whole dump+restore window
(measure on a copy first — VectorChord indexes are slow to rebuild).
Smaller blast radius; no streaming-replication moving parts.
Use this if Option A hits any blocker. Data loss should still be
zero if the source is stopped first.
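
A POSIX `sh` sketch of the dump and restore pair. It assumes one host can reach both databases (for example over the tailnet), which replaces the scp hop described above; the connection strings, dump path and function name are hypothetical.

```sh
# $1: source connection string (minikube pg)
# $2: destination connection string (ringtail pg)
# $3: path for the custom-format dump file
# Immich must already be stopped on the source side.
dump_and_restore() {
  pg_dump --format=custom --dbname="$1" --file="$3" || return 1
  pg_restore --dbname="$2" --exit-on-error "$3"
}
```

`--exit-on-error` makes a partial restore fail loudly instead of continuing past the first error, which matters when data loss is the thing being guarded against.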
### Option C — leave pg on minikube
Rejected. See goal card
[[migrate-immich-to-ringtail#Why postgres on ringtail (not cross-cluster)]].
## Dry run before real cutover
Whichever option wins:
1. Snapshot the minikube `immich-pg` PVC or take a fresh `pg_dump`
into a scratch location.
2. Restore into a *separate* ringtail CNPG cluster (different name,
e.g. `immich-pg-test`) and point a scratch immich-server pod at
it.
3. Verify: pod boots, can list assets, ML embeddings query without
error, face thumbnails render. VectorChord-backed queries should
not error.
4. Tear the scratch cluster down before doing the real one.
## Verification on the real run
- Row counts match for `assets`, `albums`, `users`, `face`,
`asset_face`, `smart_search` (the embedding table) — script this.
- `pg_dump --schema-only --no-owner` diff between source and dest
should be empty modulo CNPG-managed roles.
- Immich `/api/server-info/version` and `/api/server-info/statistics`
return sane numbers.
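
The row-count and schema checks could be scripted like this. It is a POSIX `sh` sketch: the table list is copied from the bullet above and should be checked against `\dt` on the real database, and the connection strings and function names are hypothetical.

```sh
TABLES="assets albums users face asset_face smart_search"

# Print "table count" for each table. $1 is a libpq connection string.
row_counts() {
  for t in $TABLES; do
    n=$(psql "$1" -Atc "SELECT count(*) FROM \"$t\"") || return 1
    printf '%s %s\n' "$t" "$n"
  done
}

# Succeed only if every table has the same count on both sides.
compare_row_counts() {
  src=$(row_counts "$1") || return 1
  dst=$(row_counts "$2") || return 1
  if [ "$src" = "$dst" ]; then
    echo "row counts match"
  else
    printf 'source:\n%s\ndestination:\n%s\n' "$src" "$dst" >&2
    return 1
  fi
}

# Diff schema-only dumps; any output needs a human look.
schema_diff() {
  tmp=$(mktemp -d) || return 1
  pg_dump --schema-only --no-owner --dbname="$1" > "$tmp/source.sql" &&
    pg_dump --schema-only --no-owner --dbname="$2" > "$tmp/dest.sql" &&
    diff "$tmp/source.sql" "$tmp/dest.sql"
  rc=$?
  rm -rf "$tmp"
  return $rc
}
```

A failed `psql` call aborts `row_counts` instead of producing an empty count, so two unreachable databases cannot compare as equal.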
## Rollback
If the cutover fails verification: stop the ringtail immich and
re-sync the minikube `immich` app, which stays in place until
[[immich-cutover-and-decommission]] deletes it. Source pg was
never deleted. Document what failed and reset the chain.

@ -0,0 +1,61 @@
---
title: Immich Postgres Cluster on Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
status: active
requires:
- cnpg-on-ringtail
tags:
- how-to
- operations
- postgres
- immich
---
# Immich Postgres Cluster on Ringtail
Stand up a fresh `immich-pg` CNPG Cluster on ringtail, ready to receive
data. **No data import yet** — that's [[immich-pg-data-migration]].
## What to do
- Create `argocd/manifests/databases-ringtail/`. The namespace name is
a separate choice: verify what other ringtail pg clusters will use;
if none yet, `databases` is fine.
- Port these from the minikube side:
- `immich-pg.yaml` — CNPG Cluster CR. Same image
(`ghcr.io/tensorchord/cloudnative-vectorchord:17-0.5.0`), same
extensions, same managed `borgmatic` role. Bump `storage.size` if
the minikube 10 GiB looks tight (check actual usage first).
`storageClass: local-path` on ringtail (default).
- `external-secret-immich-borgmatic.yaml` — same 1Password item,
same field, but referencing the ringtail `ClusterSecretStore`
(`onepassword-blumeops` already exists per the
`external-secrets-ringtail` app).
- Service for in-cluster access (the operator creates `immich-pg-rw`
etc. automatically; verify the app deployment uses those names).
- A Tailscale Service if we want backups to keep working via the
same hostname during the transition — see "Borgmatic" below.
- New ArgoCD app `argocd/apps/databases-ringtail.yaml` pointing at
the new path, destination ringtail.
## Verification
- Cluster reaches `Ready`.
- `psql` can connect via the app role, and `\dx` shows `vchord`,
`vector`, `cube`, `earthdistance` already installed.
- `borgmatic` role exists with `pg_read_all_data` membership.
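
The extension and role checks might be scripted as below. This is a POSIX `sh` sketch with a hypothetical connection string and function name; `pg_extension` and `pg_has_role()` are standard PostgreSQL.

```sh
# $1 is a libpq connection string for the ringtail immich-pg.
verify_immich_pg() {
  # Every extension listed above must already be installed.
  installed=$(psql "$1" -Atc "SELECT extname FROM pg_extension") || return 1
  for ext in vchord vector cube earthdistance; do
    printf '%s\n' "$installed" | grep -qx "$ext" || {
      echo "missing extension: $ext" >&2
      return 1
    }
  done
  # borgmatic must be a member of pg_read_all_data.
  member=$(psql "$1" -Atc \
    "SELECT pg_has_role('borgmatic', 'pg_read_all_data', 'MEMBER')") || return 1
  [ "$member" = "t" ]
}
```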
## Borgmatic implications
`borgmatic.cfg` on indri targets `immich-pg-tailscale` over the
tailnet. During migration both clusters will exist briefly. Decide
upfront: backup the *source* pg until cutover, then flip borgmatic
to the ringtail Tailscale service. Document the flip in
[[immich-cutover-and-decommission]].
## Out of scope
- Importing data. That is [[immich-pg-data-migration]], which may
drive a reset on this card if the migration approach (e.g. CNPG
`externalCluster` bootstrap) requires changes to this Cluster CR.

@ -0,0 +1,95 @@
---
title: Migrate Immich to Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
status: active
branch: mikado/migrate-immich-to-ringtail
requires:
- cnpg-on-ringtail
- immich-pg-on-ringtail
- immich-pg-data-migration
- sifaka-nfs-from-ringtail
- immich-app-on-ringtail
- immich-cutover-and-decommission
tags:
- how-to
- operations
- immich
- migration
---
# Migrate Immich to Ringtail
Move the entire Immich stack (server, ML, valkey, postgres) off
`minikube-indri` and onto `k3s-ringtail`. This is the first concrete
chain in the broader indri-k8s decommission: minikube is
memory-saturated (97% RAM, swapping), and Immich is the single
largest tenant (~1.5 GiB resident).
## End state
- Immich `server`, `machine-learning`, and `valkey` Deployments run on
ringtail k3s in the `immich` namespace.
- The `immich-machine-learning` pod uses ringtail's RTX 4080 via the
`nvidia-device-plugin` (performance win — currently CPU-only on
minikube).
- A CNPG `immich-pg` Cluster (PostgreSQL 17 + VectorChord) runs in a
`databases` namespace on ringtail, owned by the `cnpg-system`
operator on ringtail.
- The photo library still lives on [[sifaka]] at `/volume1/photos`,
mounted via NFS from ringtail pods (RWX).
- Routing: `photos.ops.eblu.me` (Caddy on indri) proxies to a
Tailscale ProxyGroup ingress on ringtail. No public surface today.
- The ArgoCD `immich` app's `destination.server` points at
`https://ringtail.tail8d86e.ts.net:6443`. The old minikube
manifests are removed.
## Non-goals
- Public exposure via Fly. Immich stays tailnet-only.
- Changing the immich version or runtime configuration. This is a
lift-and-shift; bumps come later.
- Backing up to a different target. [[borgmatic]] keeps running on
indri (it pulls via Tailscale and uses sifaka SMB for the library).
## Critical constraint: no data loss
Downtime is acceptable (Immich is a single-user system; we can take
it offline for the cutover). **Data loss is not.** Two surfaces matter:
1. **Postgres** — face data, ML embeddings (vectors), album state,
sharing, etc. Re-derivable in theory; weeks of recompute in
practice. See [[immich-pg-data-migration]].
2. **Library files** at `/volume1/photos`. Not moving, but the NFS
path must be verified accessible from ringtail before cutover.
See [[sifaka-nfs-from-ringtail]].
[[borgmatic]] backs both up to sifaka + BorgBase nightly; restore is
possible but slow. Treat it as a fallback, not a plan.
## Why postgres on ringtail (not cross-cluster)
`immich-pg` already has a Tailscale Service we could point ringtail
at, leaving the DB on minikube. We're not doing that because:
- The whole goal is to retire minikube — keeping pg there blocks it.
- Immich is chatty against pg; tailnet round-trips would hurt.
- CNPG is the same operator on both sides — a Cluster CR on ringtail
is mechanically equivalent.
## Approach
This is a C2 Mikado chain. The prerequisite cards each represent a
distinct surface that has to work before cutover. See
[[agent-change-process#C2 — Mikado Chain]] for the discipline.
## Related
- [[shower-on-ringtail]] — a previous migration to ringtail (simpler:
no upstream cluster, SQLite, no GPU)
- [[connect-to-postgres]] — getting a psql session against CNPG
- [[ringtail]] — the target cluster
- [[cnpg-on-ringtail]], [[immich-pg-on-ringtail]],
[[immich-pg-data-migration]], [[sifaka-nfs-from-ringtail]],
[[immich-app-on-ringtail]], [[immich-cutover-and-decommission]] —
the prerequisite cards

@ -0,0 +1,68 @@
---
title: Sifaka NFS Photos from Ringtail
modified: 2026-05-13
last-reviewed: 2026-05-13
status: active
tags:
- how-to
- operations
- storage
- nfs
- sifaka
---
# Sifaka NFS Photos from Ringtail
The Immich library lives at `sifaka:/volume1/photos` and is mounted
into the pod via an NFS PV (see `argocd/manifests/immich/pv-nfs.yaml`).
That PV is currently scoped to indri. We need ringtail to mount the
same path with the same RWX semantics, without breaking the existing
indri mount during the transition.
## What to verify / do
- Check `sifaka` DSM NFS rules for the `photos` share. Per
[[shower-on-ringtail#NFS + SMB share on sifaka]] convention, rules
use `192.168.1.0/24` + `100.64.0.0/10` with
`all_squash`/`Map all users to admin`. The existing rule may
already cover ringtail (it's on `192.168.1.21` per the recent
static-IP pin). If so this card is a verification card.
- If the rule is locked to indri's IP: add an entry for ringtail
(192.168.1.21) or widen to the subnet pattern above.
- Test mount from a ringtail debug pod (busybox or alpine with
nfs-utils) against the `photos` share. Read a file. Write a temp
file. Delete it.
- Watch for the known sifaka NFS-over-Tailscale gotcha: sifaka's
Tailscale must be in TUN mode (not userspace) for NFS to work
reliably over the tailnet. The NFS path here goes over the LAN
(not tailnet), so this shouldn't bite, but worth confirming the
NFS traffic is on `192.168.1.x` not `100.x`.
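
The read, write and delete test can be run inside the debug pod once the share is mounted. A POSIX `sh` sketch that also works under busybox; the function name and probe file name are made up, and the mount path is whatever the debug pod uses.

```sh
# $1: directory where the photos share is mounted inside the pod.
nfs_rw_check() {
  dir=$1
  probe="$dir/.ringtail-nfs-probe.$$"
  ls "$dir" >/dev/null || return 1              # read the directory
  echo "probe" > "$probe" || return 1           # write a temp file
  [ "$(cat "$probe")" = "probe" ] || return 1   # read it back
  rm "$probe"                                   # delete it
}
```

For example, `nfs_rw_check /mnt/photos`. To confirm the LAN path, `grep nfs /proc/mounts` in the same pod typically lists an `addr=` mount option with the server address actually in use.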
## PV + PVC on ringtail
- New `pv-nfs.yaml` mirroring the minikube one. PVs are cluster-scoped
and each cluster keeps its own set, so the name can be reused as-is;
just duplicate the manifest. Same `server: sifaka`, same path, same
`accessModes: [ReadWriteMany]`, `persistentVolumeReclaimPolicy:
Retain`.
- New `pvc.yaml` in the ringtail `immich` namespace bound to it.
- The minikube PVC stays bound and active until cutover — both
clusters can have the share NFS-mounted simultaneously (NFS RWX
permits this). Immich itself must not be running on both sides
at once.
## Verification
- A pod on ringtail can `ls /mnt/photos/` and see the same files
as the indri pod.
- File written from ringtail pod is visible from indri pod and
vice versa (proves there's no caching surprise).
## Out of scope
- Migrating photo files. Nothing moves; this is just adding a second
NFS client.
- The `pvc-ml-cache.yaml` PVC (a separate ML model cache). That's
not on NFS — it's a regular PVC. Recreated empty on ringtail in
[[immich-app-on-ringtail]]; the first ML pod boot will repopulate
it.