diff --git a/docs/how-to/immich/immich-app-on-ringtail.md b/docs/how-to/immich/immich-app-on-ringtail.md
index 41266ca..2d23c1d 100644
--- a/docs/how-to/immich/immich-app-on-ringtail.md
+++ b/docs/how-to/immich/immich-app-on-ringtail.md
@@ -26,12 +26,28 @@ in [[immich-cutover-and-decommission]].
    `argocd/manifests/immich/`:
    - `deployment-server.yaml` — point `DB_HOSTNAME` at the ringtail pg
      service.
-   - `deployment-ml.yaml` — add a node selector / toleration so it
-     schedules where the GPU is, and a `resources.limits` for
-     `nvidia.com/gpu: 1`. Verify the immich-ml image actually wants
-     CUDA (it has CPU and CUDA variants — check the upstream chart).
-     See `argocd/manifests/frigate/` for the existing GPU pod pattern.
-   - `deployment-valkey.yaml` — straight port.
+   - `deployment-ml.yaml` — use `runtimeClassName: nvidia` + a
+     `resources.limits` for `nvidia.com/gpu: 1`. Use the `-cuda` tag
+     of the immich-ml image (set in kustomization). Ringtail is
+     single-node, so no node selector needed. See
+     `argocd/manifests/frigate/` for the existing GPU pod pattern.
+
+     **GPU contention discovery:** ringtail's `nvidia-device-plugin`
+     is configured with `timeSlicing.replicas: 2`. Frigate + Ollama
+     already consume both virtual slices. Adding immich-ml requires
+     bumping the count to >= 3. Edit
+     `argocd/manifests/nvidia-device-plugin/configmap.yaml` (or
+     wherever the device-plugin config lives) and re-sync the
+     `nvidia-device-plugin` ArgoCD app. The plugin pod restarts and
+     the new advertised count appears as the node's
+     `nvidia.com/gpu` allocatable.
+   - `deployment-valkey.yaml` — straight port, BUT use the upstream
+     multi-arch `docker.io/valkey/valkey:` image — do NOT
+     use the `registry.ops.eblu.me/blumeops/valkey` rewrite in the
+     kustomization. That mirror was built on indri (arm64) and is
+     single-arch; pulling it on ringtail (amd64) gets `exec format
+     error` in CrashLoopBackOff. The mirror should eventually carry
+     a multi-arch tag, at which point the rewrite can return.
    - `service*.yaml` — straight port.
    - `pvc-ml-cache.yaml` — straight port (empty `local-path` PVC).
    - `pv-nfs.yaml` + `pvc.yaml` — already covered by