blumeops/docs/reference/services/ollama.md
Erich Blume 4aa0872949 C0: review ollama doc — refresh image, models, last-reviewed
Bumped documented image tag to 0.20.4 (matches kustomization newTag),
added the two qwen3.5 models from models.txt, and stamped the card.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:42:33 -07:00


| title | modified | last-reviewed | tags |
|---|---|---|---|
| Ollama | 2026-05-01 | 2026-05-01 | service, ai |

Ollama

LLM inference server with GPU acceleration. Runs on ringtail with declarative model management via a sidecar.

Quick Reference

| Property | Value |
|---|---|
| URL | https://ollama.ops.eblu.me |
| Tailscale URL | https://ollama.tail8d86e.ts.net |
| Namespace | `ollama` |
| Cluster | ringtail k3s |
| Image | `ollama/ollama:0.20.4` |
| Upstream | https://github.com/ollama/ollama |
| Manifests | `argocd/manifests/ollama/` |
| API Port | 11434 |

Architecture

```
models.txt (ConfigMap, declarative)
    │
    ▼
model-sync sidecar ──ollama pull──► Ollama server (GPU)
    │                                    │
    │ reads /config/models.txt           │ serves /api/*
    │ polls every 30 min                 │ NVIDIA runtime (RTX 4080, time-sliced)
    │                                    │
    └────────────────────────────────────┘
                     │
                /models (200 Gi hostPath PV)
                /mnt/storage1/ollama on ringtail
```

Models

Declared in argocd/manifests/ollama/models.txt. The model-sync sidecar pulls missing models on startup and every 30 minutes.

| Model | Parameters |
|---|---|
| `qwen2.5:14b` | 14B |
| `deepseek-r1:14b` | 14B |
| `phi4:14b` | 14B |
| `gemma3:12b` | 12B |
| `qwen3.5:9b` | 9B |
| `qwen3.5:27b` | 27B |

To add or remove models, edit models.txt and sync via ArgoCD.
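The sync behaviour described above can be sketched in Python. This is a hypothetical reconstruction and not the sidecar's actual script. It uses Ollama's REST API (`GET /api/tags` to list local models, `POST /api/pull` to download one) in place of shelling out to `ollama pull`. It also makes three assumptions: `models.txt` holds one model tag per line with optional `#` comments, the server is reachable at `localhost:11434` from the sidecar, and the helper names are illustrative.

```python
# Hypothetical sketch of a model-sync loop; NOT the deployed sidecar's code.
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: server reachable on the pod's localhost
MODELS_FILE = "/config/models.txt"     # path the sidecar reads, per the diagram
POLL_SECONDS = 30 * 60                 # "polls every 30 min"


def parse_models(text: str) -> list[str]:
    """One model tag per line; blank lines and '#' comments are skipped (assumed format)."""
    models = []
    for line in text.splitlines():
        tag = line.split("#", 1)[0].strip()
        if tag:
            models.append(tag)
    return models


def missing_models(declared: list[str], installed: set[str]) -> list[str]:
    """Declared models not present locally, in declaration order (exact tag match)."""
    return [m for m in declared if m not in installed]


def installed_models() -> set[str]:
    # GET /api/tags lists the models already stored under /models.
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
        return {m["name"] for m in json.load(resp)["models"]}


def pull(model: str) -> None:
    # POST /api/pull; with stream=False the server replies once, after the pull finishes.
    body = json.dumps({"model": model, "stream": False}).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def sync_once() -> list[str]:
    """Pull every declared-but-missing model once and return the tags pulled."""
    with open(MODELS_FILE) as f:
        declared = parse_models(f.read())
    pulled = []
    for model in missing_models(declared, installed_models()):
        pull(model)
        pulled.append(model)
    return pulled


def sync_forever() -> None:
    # Sync immediately on startup, then every POLL_SECONDS.
    while True:
        try:
            print("pulled:", sync_once())
        except OSError as exc:  # covers URLError/HTTPError and a missing models file
            print("sync failed, retrying next cycle:", exc)
        time.sleep(POLL_SECONDS)
```

The sketch only pulls. A model dropped from `models.txt` is simply no longer requested and stays on disk. Any deletion behaviour in the real sidecar is not shown here.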

GPU

Shares ringtail's RTX 4080 with frigate via NVIDIA device plugin time-slicing (2 virtual slots). Time-slicing shares GPU compute but does not partition VRAM, so Ollama is constrained to one loaded model and one parallel request to avoid VRAM contention.

| Setting | Value |
|---|---|
| `OLLAMA_MAX_LOADED_MODELS` | 1 |
| `OLLAMA_NUM_PARALLEL` | 1 |
| GPU limit | `nvidia.com/gpu: "1"` (time-sliced) |
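A rough, weights-only estimate helps show why one loaded model is the practical limit. The helper below is hypothetical and not part of the deployment. It assumes about 4.5 bits per weight, which is roughly Q4_K_M and the usual quantization of default Ollama library tags. The real tags may differ. The estimate ignores KV cache and runtime overhead, so actual headroom is smaller.

```python
# Back-of-envelope VRAM estimate (weights only). Hypothetical helper for reasoning
# about OLLAMA_MAX_LOADED_MODELS=1; not part of the deployment.

BYTES_PER_GIB = 2**30
RTX_4080_VRAM_GIB = 16.0  # the RTX 4080 has 16 GiB of VRAM
BITS_PER_WEIGHT = 4.5     # assumption: ~Q4_K_M quantization

# Parameter counts (billions) from the Models table.
MODELS_B = {
    "qwen2.5:14b": 14,
    "deepseek-r1:14b": 14,
    "phi4:14b": 14,
    "gemma3:12b": 12,
    "qwen3.5:9b": 9,
    "qwen3.5:27b": 27,
}


def weight_gib(params_billion: float, bits: float = BITS_PER_WEIGHT) -> float:
    """Approximate weight size in GiB; excludes KV cache and runtime overhead."""
    return params_billion * 1e9 * bits / 8 / BYTES_PER_GIB


def headroom_gib(loaded: list[str], vram_gib: float = RTX_4080_VRAM_GIB) -> float:
    """VRAM left after loading the given models' weights; negative means they do not fit."""
    return vram_gib - sum(weight_gib(MODELS_B[m]) for m in loaded)
```

Under these assumptions, `headroom_gib(["phi4:14b"])` leaves about 8.7 GiB. Two 14B models together leave only about 1.3 GiB. `qwen3.5:27b` alone leaves about 1.9 GiB. Each figure is the space left for KV cache, CUDA overhead, and frigate's own allocation.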

Storage

| Mount | Backend | Size |
|---|---|---|
| `/models` | hostPath PV (`/mnt/storage1/ollama`) | 200 Gi |

The PV reclaim policy is `Retain`, so the model files under `/mnt/storage1/ollama` are kept even if the PersistentVolumeClaim or PV is deleted.

Networking

| Endpoint | Reachable from |
|---|---|
| https://ollama.ops.eblu.me | Public internet (Fly.io → Caddy) |
| https://ollama.tail8d86e.ts.net | Tailnet clients |
| http://ollama.ollama.svc.cluster.local:11434 | In-cluster (ringtail) |
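A minimal client sketch follows, using only the standard library. It calls Ollama's `POST /api/generate` endpoint through any of the URLs in the table. The function names are hypothetical. The sketch does not handle any authentication the public route may enforce.

```python
# Illustrative client for Ollama's /api/generate endpoint; function names are hypothetical.
import json
import urllib.request

# Endpoints from the Networking table; use the one reachable from where the code runs.
ENDPOINTS = {
    "public": "https://ollama.ops.eblu.me",
    "tailnet": "https://ollama.tail8d86e.ts.net",
    "in-cluster": "http://ollama.ollama.svc.cluster.local:11434",
}


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    # stream=False asks Ollama for a single JSON object instead of newline-delimited chunks.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 300) -> str:
    # Generous timeout: with OLLAMA_MAX_LOADED_MODELS=1, asking for a model other than
    # the resident one means unloading it and loading the new one first.
    req = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

For example, `generate(ENDPOINTS["tailnet"], "gemma3:12b", "Say hello")` returns the completion text from a tailnet client. Because `OLLAMA_NUM_PARALLEL` is 1, concurrent callers are served one request at a time.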

The Tailscale Ingress is served through a ProxyGroup, so it has no explicit `host:` field (see tailscale-operator).