Limit ollama to one loaded model and one parallel request

Prevents OOM when switching between models: only one 14B model fits
in 16GB of VRAM at a time once the KV cache for context is included.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Erich Blume 2026-03-02 21:23:12 -08:00
commit a32c99a252
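
The memory reasoning in the commit message can be sketched as rough arithmetic. This is a back-of-the-envelope estimate, not a measurement: the quantization (about 4.5 bits per weight), the layer count, KV head count, head dimension, and context length are all hypothetical placeholders, not values taken from this repository or from any specific model. The `parallel` multiplier assumes the KV cache allocation grows with the number of parallel request slots.

```python
GIB = 1024 ** 3


def weights_bytes(params, bits_per_weight):
    """Approximate size of the model weights in bytes."""
    return params * bits_per_weight / 8


def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens,
                   parallel=1, bytes_per_value=2):
    """Approximate KV cache size: one K and one V tensor per layer."""
    return (2 * layers * kv_heads * head_dim
            * context_tokens * parallel * bytes_per_value)


def fits(vram_bytes, models_loaded, per_model_bytes):
    """True if the given number of identical models fits in VRAM."""
    return models_loaded * per_model_bytes <= vram_bytes


# Hypothetical 14B model; every number below is an assumption.
weights = weights_bytes(params=14e9, bits_per_weight=4.5)
kv = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128, context_tokens=8192)
per_model = weights + kv

print(f"weights:   {weights / GIB:.2f} GiB")
print(f"kv cache:  {kv / GIB:.2f} GiB")
print(f"one model fits in 16 GiB:  {fits(16 * GIB, 1, per_model)}")
print(f"two models fit in 16 GiB:  {fits(16 * GIB, 2, per_model)}")
```

Under these assumed numbers a single model leaves headroom, while two resident models exceed the card, which is the situation the two settings are meant to avoid. Real usage also depends on runtime overhead that this estimate ignores.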


@@ -28,6 +28,10 @@ spec:
   value: /models
 - name: OLLAMA_HOST
   value: "0.0.0.0:11434"
+- name: OLLAMA_MAX_LOADED_MODELS
+  value: "1"
+- name: OLLAMA_NUM_PARALLEL
+  value: "1"
 volumeMounts:
 - name: models
   mountPath: /models
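
One way to check the effect of the new limits after deploying is to ask the server which models are currently resident. The sketch below assumes the Ollama HTTP API is reachable from where the script runs (for example through a port-forward to port 11434, which is an assumption about the environment) and uses the `/api/ps` endpoint that lists running models. The helper names are hypothetical.

```python
import json
import urllib.request


def loaded_model_names(ps_payload):
    """Extract model names from a decoded /api/ps response."""
    return [m.get("name", "<unnamed>") for m in ps_payload.get("models", [])]


def fetch_ps(base_url):
    """GET {base_url}/api/ps and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)


def check(base_url="http://localhost:11434", max_loaded=1):
    """Return (ok, names); ok is False if more models are loaded than expected."""
    names = loaded_model_names(fetch_ps(base_url))
    return len(names) <= max_loaded, names
```

Calling `check()` after requesting two different models in sequence should report at most one name if `OLLAMA_MAX_LOADED_MODELS` is being honored. The base URL default is a placeholder and needs to match however the service is exposed.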