The 27B Q4_K_M model is ~17 GB, exceeding the 16 GB VRAM on the RTX 4080 by ~1 GB. Ollama will offload a few layers to CPU RAM, so the pod memory limit needs headroom beyond the previous 16Gi.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
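
As a rough illustration of the change described in the commit message, the fragment below sketches where such a limit would sit in `deployment.yaml`. The container name `ollama` and the `24Gi` value are assumptions made for this sketch; the commit message only says the limit needs headroom beyond the previous 16Gi and does not state the new value.

```yaml
# Hypothetical excerpt of deployment.yaml.
# The container name and the 24Gi figure are illustrative assumptions,
# not values taken from the commit.
spec:
  template:
    spec:
      containers:
        - name: ollama          # assumed container name
          resources:
            limits:
              # Previously 16Gi. Layers that do not fit in the 16 GB of
              # VRAM are held in CPU RAM, which counts against this limit.
              memory: 24Gi      # assumed value, chosen only to show headroom
```

The right figure depends on how many layers Ollama actually offloads and on whatever else the container keeps in memory, so it is best confirmed by observing the pod's memory usage with the model loaded.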

| File |
|---|
| deployment.yaml |
| ingress-tailscale.yaml |
| kustomization.yaml |
| models.txt |
| pv-hostpath.yaml |
| pvc.yaml |
| service.yaml |
| sync-models.sh |