The 27B Q4_K_M model needs ~7.3 GiB system RAM for CPU-offloaded layers
but only 6.8 GiB was available within the 22Gi cgroup. Bumping to 24Gi
and enabling flash attention (reduces KV cache memory) should provide
enough headroom.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 27B Q4_K_M model is ~17 GB, exceeding the 16 GB VRAM on the RTX 4080
by ~1 GB. Ollama will offload a few layers to CPU RAM, so the pod memory
limit needs headroom beyond the previous 16Gi.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bare image references in manifests were ambiguous — unclear whether the
tag was intentionally omitted or managed by kustomize. Add :kustomized
sentinel to all 37 image refs overridden by kustomize images transformer.
Add sync notes for tailscale-operator proxyclass (CRD fields not processed
by kustomize). Mark devpi reviewed (6.19.1 is current).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents OOM when switching between models — only one 14B model
fits in 16GB VRAM at a time with KV cache for context.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>