The 27B Q4_K_M model is ~17 GB, exceeding the 16 GB VRAM on the RTX 4080 by ~1 GB. Ollama will offload a few layers to CPU RAM, so the pod memory limit needs headroom beyond the previous 16Gi.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Models to pull from Ollama registry
# One model per line. Comments with #.
qwen2.5:14b
deepseek-r1:14b
phi4:14b
gemma3:12b
qwen3.5:9b
qwen3.5:27b
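
For reference, a minimal sketch of how a list in this format could be consumed. The function name `parse_model_list` is hypothetical and not part of this repo. Treating a `#` that appears mid-line as the start of a comment is an assumption; the file itself only shows whole-line comments.

```python
def parse_model_list(text: str) -> list[str]:
    """Return model tags from a one-model-per-line list.

    Assumption: everything after a '#' is a comment, including
    trailing comments on a model line. Blank lines are skipped.
    """
    models = []
    for raw in text.splitlines():
        # Drop the comment part (if any), then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line:
            models.append(line)
    return models


if __name__ == "__main__":
    sample = """\
# Models to pull from Ollama registry
# One model per line. Comments with #.
qwen2.5:14b
qwen3.5:27b
"""
    # Print the pull command for each model rather than running it.
    for model in parse_model_list(sample):
        print(f"ollama pull {model}")
```

The sketch only prints the `ollama pull` commands. How the pod actually invokes them is not shown in this commit.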