Add qwen3.5:27b to Ollama and bump memory limit to 22Gi

The 27B Q4_K_M model is ~17 GB, exceeding the 16 GB VRAM on the RTX 4080 by ~1 GB. Ollama will offload a few layers to CPU RAM, so the pod memory limit needs headroom beyond the previous 16Gi.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:55:51 -07:00
commit 6d4929a66c
parent 40f1568088
2 changed files with 2 additions and 1 deletion
--- a/argocd/manifests/ollama/deployment.yaml
+++ b/argocd/manifests/ollama/deployment.yaml
@@ -40,7 +40,7 @@ spec:
              memory: "512Mi"
              cpu: "500m"
            limits:
-              memory: "16Gi"
+              memory: "22Gi"
              cpu: "4000m"
              nvidia.com/gpu: "1"
          livenessProbe:
--- a/argocd/manifests/ollama/models.txt
+++ b/argocd/manifests/ollama/models.txt
@@ -5,3 +5,4 @@ deepseek-r1:14b
 phi4:14b
 gemma3:12b
 qwen3.5:9b
+qwen3.5:27b
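
The sizing argument in the commit message can be checked with a few lines of arithmetic. This is only a rough sketch: the model and VRAM sizes are the approximate figures quoted in the message (treated here as decimal GB), the function name `spill_to_cpu_bytes` is hypothetical, and host-side memory other than the offloaded layers is not quantified in the commit, so it is not modelled.

```python
GB = 1000 ** 3   # decimal gigabyte, as used for the model and VRAM figures
GIB = 1024 ** 3  # binary gibibyte, the unit behind Kubernetes "Gi" quantities


def spill_to_cpu_bytes(model_bytes: int, vram_bytes: int) -> int:
    """Bytes of model weights that cannot fit in VRAM (never negative)."""
    return max(0, model_bytes - vram_bytes)


# Approximate figures taken from the commit message (assumptions, not measurements).
model_bytes = 17 * GB   # qwen3.5:27b at Q4_K_M, ~17 GB
vram_bytes = 16 * GB    # RTX 4080, 16 GB VRAM

old_limit_bytes = 16 * GIB  # previous pod memory limit ("16Gi")
new_limit_bytes = 22 * GIB  # new pod memory limit ("22Gi")

spill = spill_to_cpu_bytes(model_bytes, vram_bytes)
added_headroom = new_limit_bytes - old_limit_bytes

print(f"weights offloaded to CPU RAM: ~{spill / GB:.1f} GB")
print(f"extra pod memory granted:      {added_headroom / GIB:.0f} GiB")
print(f"new limit left after spill:    ~{(new_limit_bytes - spill) / GIB:.1f} GiB")
```

The 6 GiB increase is well above the ~1 GB of spilled weights, which matches the message's intent of leaving headroom rather than sizing the limit exactly.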