## Summary
C2 Mikado chain to move the entire Immich stack (server, ML, valkey,
postgres) off `minikube-indri` and onto `k3s-ringtail`. Immich is the
largest single tenant on minikube (~1.5 GiB resident) and minikube is
currently memory-saturated (97% RAM, swapping). This is the first
concrete chain in the broader indri-k8s decommission effort.

This PR contains the planning layer only: 7 cards (1 goal + 6
prerequisites). Implementation cycles follow per the Mikado Branch
Invariant.
## Goal end-state
- Immich `server`, `machine-learning`, `valkey` on ringtail.
- ML pod uses ringtail's RTX 4080 (performance win — currently
CPU-only).
- CNPG `immich-pg` (PG17 + VectorChord) runs on ringtail.
- Library still on sifaka NFS — ringtail mounts the same path.
- `photos.ops.eblu.me` reroutes through Caddy → ringtail ingress.
- Minikube `immich` and `immich-pg` are removed.
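
One way the "library stays on sifaka NFS" item could look on ringtail is a statically bound PersistentVolume and claim. This is only a sketch: the object names, the capacity figure, and the server and path placeholders are assumptions, not values taken from this PR.

```yaml
# Sketch only: server, path, size, and object names are hypothetical placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: immich-library-nfs
spec:
  capacity:
    storage: 1Ti                          # informational; NFS does not enforce it
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain   # keep library data if the claim is deleted
  storageClassName: ""                    # static binding, no dynamic provisioner
  nfs:
    server: "<sifaka-address>"
    path: "<same-export-path-as-minikube>"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: immich-library
  namespace: immich
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: immich-library-nfs          # bind to the PV above explicitly
  resources:
    requests:
      storage: 1Ti
```

Pointing at the same export path the minikube deployment uses means the library files never move; only the mount source changes.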
## Cards
| Card | Depends on |
|---|---|
| `migrate-immich-to-ringtail` (goal) | all six below |
| `cnpg-on-ringtail` | — |
| `immich-pg-on-ringtail` | cnpg-on-ringtail |
| `immich-pg-data-migration` | immich-pg-on-ringtail |
| `sifaka-nfs-from-ringtail` | — |
| `immich-app-on-ringtail` | immich-pg-on-ringtail, sifaka-nfs-from-ringtail |
| `immich-cutover-and-decommission` | immich-pg-data-migration, immich-app-on-ringtail |
## Key constraints
- **No data loss.** Downtime is acceptable; data loss is not. Two
surfaces matter: postgres (ML embeddings and face data, which are slow
to re-derive) and the library files (they stay on sifaka, but NFS
access from ringtail must be verified).
- **Migration method:** Option A is a CNPG `pg_basebackup` bootstrap
from an `externalClusters` source, then promote. Option B is
`pg_dump`/`pg_restore` as a documented fallback. Either way, dry-run
against a scratch cluster first.
- **Why pg moves too** (not cross-cluster): keeping pg on minikube
would block the whole decommission, and Immich is chatty with pg
so tailnet round-trips would hurt.
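
For reference, Option A might be sketched as a CNPG `Cluster` on ringtail that bootstraps with `pg_basebackup` from the minikube cluster and keeps replicating until cutover. The field names follow the CloudNativePG `Cluster` spec, but every value here (host, secret names, image, storage size) is a hypothetical placeholder, and the TLS settings assume the source cluster's replication certificates have been copied to ringtail.

```yaml
# Sketch only: hosts, secrets, image, and sizes are assumptions, not this PR's manifest.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-pg
  namespace: immich
spec:
  instances: 1
  # Must match the source's PostgreSQL major version (PG17) and include VectorChord.
  imageName: "<pg17-vectorchord-image>"
  storage:
    size: 20Gi
  bootstrap:
    pg_basebackup:
      source: immich-pg-minikube          # refers to the externalClusters entry below
  replica:
    enabled: true                         # keep following the source until cutover
    source: immich-pg-minikube
  externalClusters:
    - name: immich-pg-minikube
      connectionParameters:
        host: "<minikube-immich-pg-rw-host>"
        user: streaming_replica
        sslmode: verify-full
      sslKey:
        name: immich-pg-replication       # hypothetical Secret copied from the source
        key: tls.key
      sslCert:
        name: immich-pg-replication
        key: tls.crt
      sslRootCert:
        name: immich-pg-ca                # hypothetical Secret holding the source CA
        key: ca.crt
```

Under this sketch, promotion would amount to setting `replica.enabled` to `false` once the application on minikube has been stopped. Running the same manifest against a scratch cluster first gives the dry-run mentioned above.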
## Test plan
- [ ] Plan review — does the dependency graph make sense?
- [ ] `mise run docs-mikado migrate-immich-to-ringtail` shows the
chain correctly.
- [ ] Per-card implementation cycles land separately (commit
convention enforced by hook).
Reviewed-on: #356
69 lines
1.9 KiB
YAML
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: immich-machine-learning
  namespace: immich
spec:
  replicas: 1
  selector:
    matchLabels:
      app: immich
      component: machine-learning
  template:
    metadata:
      labels:
        app: immich
        component: machine-learning
    spec:
      runtimeClassName: nvidia
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: machine-learning
          # ringtail uses the -cuda tag (set in kustomization.yaml)
          # to take advantage of the RTX 4080 via the nvidia
          # device plugin. Time-slicing is configured for 4 replicas
          # so frigate + ollama + this pod can share.
          image: ghcr.io/immich-app/immich-machine-learning:kustomized
          ports:
            - name: http
              containerPort: 3003
          env:
            - name: TZ
              value: "America/Los_Angeles"
            - name: TRANSFORMERS_CACHE
              value: /cache
            - name: HF_XET_CACHE
              value: /cache/huggingface-xet
            - name: MPLCONFIGDIR
              value: /cache/matplotlib-config
          volumeMounts:
            - name: cache
              mountPath: /cache
          livenessProbe:
            httpGet:
              path: /ping
              port: 3003
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /ping
              port: 3003
            initialDelaySeconds: 15
            periodSeconds: 10
            timeoutSeconds: 5
          resources:
            requests:
              memory: "512Mi"
              cpu: "100m"
            limits:
              memory: "4Gi"
              nvidia.com/gpu: "1"
      volumes:
        - name: cache
          persistentVolumeClaim:
            claimName: immich-ml-cache
```
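
The `:kustomized` tag in the Deployment is a placeholder that the comment says is resolved in `kustomization.yaml`. A minimal sketch of how that override might look with Kustomize's `images` transformer is below; the resource file name and the tag value are assumptions, not the repository's actual contents.

```yaml
# kustomization.yaml (sketch; file name and tag are hypothetical placeholders)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: immich
resources:
  - deployment-machine-learning.yaml
images:
  # Matches on image name regardless of the tag written in the manifest,
  # then swaps in the CUDA-enabled tag for ringtail.
  - name: ghcr.io/immich-app/immich-machine-learning
    newTag: "<immich-version>-cuda"
```

Keeping the tag in the kustomization lets the same Deployment manifest serve a CPU-only cluster and a GPU cluster with only the overlay differing.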