blumeops/argocd
Erich Blume a32c99a252 Limit ollama to one loaded model and one parallel request
Prevents OOM when switching between models — only one 14B model
fits in 16GB VRAM at a time with KV cache for context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 21:23:12 -08:00
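
For illustration, a minimal sketch of how such a limit might be expressed in a Kubernetes container spec. The actual change in manifests is not shown here, so the container name and image below are assumptions. OLLAMA_MAX_LOADED_MODELS and OLLAMA_NUM_PARALLEL are Ollama's server environment variables for these two limits.

```yaml
# Hypothetical excerpt of a Deployment's pod spec; not copied from this repository.
containers:
  - name: ollama            # assumed container name
    image: ollama/ollama    # assumed image
    env:
      # Keep at most one model loaded at a time, so switching models
      # unloads the previous one instead of stacking both in VRAM.
      - name: OLLAMA_MAX_LOADED_MODELS
        value: "1"
      # Handle one request at a time per loaded model, which avoids
      # allocating KV cache for several parallel contexts.
      - name: OLLAMA_NUM_PARALLEL
        value: "1"
```

Both values are quoted because Kubernetes requires env values to be strings.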
apps Deploy Ollama LLM server on ringtail (#277) 2026-03-02 20:39:51 -08:00
manifests Limit ollama to one loaded model and one parallel request 2026-03-02 21:23:12 -08:00