Add qwen3.5:27b to Ollama and bump memory limit to 22Gi
The 27B Q4_K_M model is ~17 GB, exceeding the 16 GB VRAM on the RTX 4080 by ~1 GB. Ollama will offload a few layers to CPU RAM, so the pod memory limit needs headroom beyond the previous 16Gi.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 40f1568088
commit 6d4929a66c
2 changed files with 2 additions and 1 deletions
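The sizing argument in the commit message can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the commit: the model and VRAM sizes are the approximate figures quoted above, and the helper name `gib_to_gb` is hypothetical. It only converts units and computes the overflow; it does not model Ollama's actual host-memory usage.

```python
# Rough sizing check for the figures quoted in the commit message.
# All inputs are approximations taken from that message.
GIB = 2**30  # bytes in one gibibyte (Kubernetes "Gi")
GB = 10**9   # bytes in one decimal gigabyte

def gib_to_gb(gib: float) -> float:
    """Convert a Kubernetes-style Gi quantity to decimal GB (hypothetical helper)."""
    return gib * GIB / GB

model_gb = 17.0  # ~size of the 27B Q4_K_M model
vram_gb = 16.0   # VRAM on the RTX 4080

# Portion of the model that does not fit in VRAM and spills to CPU RAM.
overflow_gb = max(0.0, model_gb - vram_gb)

old_limit_gb = gib_to_gb(16)  # previous pod memory limit
new_limit_gb = gib_to_gb(22)  # new pod memory limit

print(f"overflow to CPU RAM: ~{overflow_gb:.1f} GB")
print(f"old limit: {old_limit_gb:.2f} GB, new limit: {new_limit_gb:.2f} GB")
print(f"extra headroom from the bump: {new_limit_gb - old_limit_gb:.2f} GB")
```

Note that the pod limit is expressed in Gi (binary units) while the model and VRAM figures are in GB, so the bump from 16Gi to 22Gi adds somewhat more than 6 GB of host memory.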
@@ -40,7 +40,7 @@ spec:
             memory: "512Mi"
             cpu: "500m"
           limits:
-            memory: "16Gi"
+            memory: "22Gi"
             cpu: "4000m"
             nvidia.com/gpu: "1"
         livenessProbe:
@@ -5,3 +5,4 @@ deepseek-r1:14b
 phi4:14b
 gemma3:12b
 qwen3.5:9b
+qwen3.5:27b