## Summary
C2 Mikado chain to move the entire Immich stack (server, ML, valkey,
postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the
largest single tenant on minikube (~1.5 GiB resident) and minikube is
currently memory-saturated (97% RAM, swapping). This is the first
concrete chain in the broader indri-k8s decommission effort.
This PR contains the planning layer only — 7 cards (1 goal + 6
prerequisites). Implementation cycles follow per the Mikado Branch
Invariant.
## Goal end-state
- Immich `server`, `machine-learning`, `valkey` on ringtail.
- ML pod uses ringtail's RTX 4080 (performance win — currently
CPU-only).
- CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail.
- Library still on sifaka NFS — ringtail mounts the same path.
- `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress.
- Minikube `immich` and `immich-pg` are removed.
## Cards
| Card | Depends on |
|---|---|
| `migrate-immich-to-ringtail` (goal) | all six below |
| `cnpg-on-ringtail` | — |
| `immich-pg-on-ringtail` | cnpg-on-ringtail |
| `immich-pg-data-migration` | immich-pg-on-ringtail |
| `sifaka-nfs-from-ringtail` | — |
| `immich-app-on-ringtail` | immich-pg-on-ringtail, sifaka-nfs-from-ringtail |
| `immich-cutover-and-decommission` | immich-pg-data-migration, immich-app-on-ringtail |
## Key constraints
- **No data loss.** Downtime is acceptable; data loss is not. Two
surfaces matter: postgres (ML embeddings, face data — slow to
re-derive) and the library files (don't move, but NFS access from
ringtail must be verified).
- **Migration method:** Option A is a CNPG `pg_basebackup` bootstrap
from an `externalClusters` entry, followed by promotion. Option B is
`pg_dump`/`pg_restore` as a documented fallback. Either way, dry-run
against a scratch cluster first.
- **Why pg moves too** (not cross-cluster): keeping pg on minikube
would block the whole decommission, and Immich is chatty with pg
so tailnet round-trips would hurt.
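
As a rough illustration of Option A, the sketch below shows what a CNPG replica cluster seeded by `pg_basebackup` could look like. It is not the manifest from this chain: the source-cluster label, host, Secret name, namespace, storage size, and the choice of password authentication are all placeholders assumed for the example, and the PG17 + VectorChord image reference is deliberately left out.

```yaml
# Sketch only: replica cluster on ringtail, seeded from the minikube cluster.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-pg
  namespace: immich                 # assumed namespace
spec:
  instances: 1
  # imageName is omitted here; it has to match the source cluster's
  # PG17 + VectorChord image for a physical base backup to be usable.
  storage:
    size: 20Gi                      # hypothetical size
  bootstrap:
    pg_basebackup:
      source: immich-pg-minikube    # must match an externalClusters entry
  replica:
    enabled: true                   # set to false to promote at cutover
    source: immich-pg-minikube
  externalClusters:
    - name: immich-pg-minikube      # hypothetical label for the source
      connectionParameters:
        host: immich-pg-minikube.example.internal   # hypothetical host
        user: streaming_replica
        dbname: postgres
        sslmode: require
      password:                     # assumed auth method; TLS client certs are the alternative
        name: immich-pg-minikube-replication        # hypothetical Secret
        key: password
```

With this shape, the promote step is a one-field change (`replica.enabled: false`), which also makes the dry run against a scratch cluster cheap to repeat.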
## Test plan
- [ ] Plan review — does the dependency graph make sense?
- [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the
chain correctly.
- [ ] Per-card implementation cycles land separately (commit
convention enforced by hook).
Reviewed-on: #356
| title | modified | last-reviewed | tags |
|---|---|---|---|
| Immich App on Ringtail | 2026-05-13 | 2026-05-13 | |
# Immich App on Ringtail
Bring up `immich-server`, `immich-machine-learning`, and
`immich-valkey` on ringtail. This card stands the stack up against
the new pg cluster; it does not move user traffic. Cutover lives
in [[immich-cutover-and-decommission]].
## What to do
- New manifest dir `argocd/manifests/immich-ringtail/` (the suffix matches the `-ringtail` convention used by other apps). Port from `argocd/manifests/immich/`:
  - `deployment-server.yaml`: point `DB_HOSTNAME` at the ringtail pg service.
  - `deployment-ml.yaml`: use `runtimeClassName: nvidia` plus a `resources.limits` entry for `nvidia.com/gpu: 1`. Use the `-cuda` tag of the immich-ml image (set in kustomization). Ringtail is single-node, so no node selector is needed. See `argocd/manifests/frigate/` for the existing GPU pod pattern.

    **GPU contention discovery:** ringtail's `nvidia-device-plugin` is configured with `timeSlicing.replicas: 2`. Frigate and Ollama already consume both virtual slices. Adding immich-ml requires bumping the count to >= 3. Edit `argocd/manifests/nvidia-device-plugin/configmap.yaml` (or wherever the device-plugin config lives) and re-sync the `nvidia-device-plugin` ArgoCD app. The plugin pod restarts and the new advertised count appears as the node's `nvidia.com/gpu` allocatable.
  - `deployment-valkey.yaml`: straight port, BUT use the upstream multi-arch `docker.io/valkey/valkey:<version>` image. Do NOT use the `registry.ops.eblu.me/blumeops/valkey` rewrite in the kustomization. That mirror was built on indri (arm64) and is single-arch; pulling it on ringtail (amd64) gets `exec format error` in CrashLoopBackOff. The mirror should eventually carry a multi-arch tag, at which point the rewrite can return.
  - `service*.yaml`: straight port.
  - `pvc-ml-cache.yaml`: straight port (empty `local-path` PVC).
  - `pv-nfs.yaml` + `pvc.yaml`: already covered by [[sifaka-nfs-from-ringtail]] (may live in this dir or theirs).
  - `ingress-tailscale.yaml`: ProxyGroup ingress; must not set an explicit `host:` (or use `host: *`) per the lesson on ProxyGroup VIP routing.

    **Hostname collision warning:** the minikube ingress claims the Tailscale device name `photos` (`tls.hosts: [photos]`). Two devices on the tailnet cannot share that name. While the ringtail deployment is being staged it must use a different `tls.hosts` value (e.g. `photos-ringtail`) so it can coexist with the running minikube one. The flip to `photos` happens at cutover time, after the minikube ingress has been removed. See [[immich-cutover-and-decommission#Cutover sequence]].
  - `kustomization.yaml`: same `images:` block (server, ML, valkey).
- New ArgoCD app `argocd/apps/immich-ringtail.yaml` targeting ringtail, namespace `immich`. Manual sync only until the cutover.
- Existing `argocd/apps/immich.yaml` (minikube) stays untouched during this card; both apps exist briefly.
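
The GPU, ingress, and ArgoCD items in the list above can be sketched as the following multi-document YAML. These are illustrative fragments, not the ported manifests. The label and container names, the Service name and port (2283), the ProxyGroup name, the ConfigMap name, namespace and key, the ArgoCD project, target revision and destination cluster name, and the `<repo-url>` placeholder are all assumptions made for the example.

```yaml
# 1. deployment-ml.yaml: the GPU-relevant fields only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: immich-machine-learning
  namespace: immich
spec:
  selector:
    matchLabels:
      app: immich-machine-learning          # assumed label
  template:
    metadata:
      labels:
        app: immich-machine-learning
    spec:
      runtimeClassName: nvidia
      containers:
        - name: machine-learning            # assumed container name
          # The -cuda tag is applied via the kustomization images: block.
          image: ghcr.io/immich-app/immich-machine-learning
          resources:
            limits:
              nvidia.com/gpu: 1             # consumes one time-sliced replica
---
# 2. Device-plugin time-slicing config, bumped from 2 to 3 replicas.
#    The real file may be laid out differently.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config         # hypothetical name
  namespace: nvidia-device-plugin           # hypothetical namespace
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 3
---
# 3. ingress-tailscale.yaml while staging: no host in rules, and a
#    tls.hosts value that does not collide with the minikube device.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: immich-tailscale
  namespace: immich
  annotations:
    tailscale.com/proxy-group: ingress      # hypothetical ProxyGroup name
spec:
  ingressClassName: tailscale
  rules:
    - http:                                 # deliberately no host: field
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: immich-server         # assumed Service name
                port:
                  number: 2283              # assumed server port
  tls:
    - hosts:
        - photos-ringtail                   # becomes photos at cutover
---
# 4. argocd/apps/immich-ringtail.yaml: no syncPolicy.automated, so the
#    app only syncs when triggered manually.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: immich-ringtail
  namespace: argocd
spec:
  project: default                          # assumed project
  source:
    repoURL: <repo-url>                     # placeholder, not filled in
    targetRevision: main                    # assumed branch
    path: argocd/manifests/immich-ringtail
  destination:
    name: ringtail                          # assumed registered cluster name
    namespace: immich
```

Leaving `syncPolicy` out entirely is one simple way to get manual-only behaviour; automated sync can be added after the cutover if desired.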
## Bring it up against a copy of the DB
Use the throwaway/test path from
[[immich-pg-data-migration#Dry run before real cutover]]: point the
ringtail immich at the test pg cluster first, then verify that the
pod boots, the web UI loads (via `kubectl port-forward`), assets
list, and ML embedding queries work. Then tear it down.
## Verification
- All three pods Ready.
- ML pod has a GPU attached: `nvidia-smi` inside the container shows the 4080.
- `immich-server` connects to pg and valkey (no `ECONNREFUSED` in logs).
- A `kubectl port-forward` to the server service shows the Immich web UI.
## Out of scope
- **Public/tailnet routing flip.** Caddy still points at the minikube Tailscale ingress until [[immich-cutover-and-decommission]].
- **Removing the minikube immich.** This also waits for [[immich-cutover-and-decommission]].