| title | modified | last-reviewed | tags |
|---|---|---|---|
| Ollama | 2026-05-01 | 2026-05-01 | |
Ollama
LLM inference server with GPU acceleration. Runs on ringtail with declarative model management via a sidecar.
Quick Reference
| Property | Value |
|---|---|
| URL | https://ollama.ops.eblu.me |
| Tailscale URL | https://ollama.tail8d86e.ts.net |
| Namespace | ollama |
| Cluster | ringtail k3s |
| Image | ollama/ollama:0.20.4 |
| Upstream | https://github.com/ollama/ollama |
| Manifests | argocd/manifests/ollama/ |
| API Port | 11434 |
Architecture
models.txt (ConfigMap, declarative)
  │
  ▼
model-sync sidecar ──ollama pull──► Ollama server (GPU)
  │                                    │
  │ reads /config/models.txt           │ serves /api/*
  │ polls every 30 min                 │ NVIDIA runtime (RTX 4080, time-sliced)
  │                                    │
  └────────────────────────────────────┘
                    │
          /models (200 Gi hostPath PV)
          /mnt/storage1/ollama on ringtail
Models
Declared in argocd/manifests/ollama/models.txt. The model-sync sidecar pulls missing models on startup and every 30 minutes.
| Model | Parameters |
|---|---|
| qwen2.5:14b | 14B |
| deepseek-r1:14b | 14B |
| phi4:14b | 14B |
| gemma3:12b | 12B |
| qwen3.5:9b | 9B |
| qwen3.5:27b | 27B |
To add or remove models, edit models.txt and sync via ArgoCD.
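The pull-missing-models step can be sketched roughly as follows. This is an illustration, not the sidecar's actual script: it assumes models.txt holds one model tag per line with blank lines and `#` comments ignored, and the function names `parse_models` and `missing_models` are hypothetical.

```python
def parse_models(text: str) -> list[str]:
    """Return model tags from models.txt content.

    Assumed format: one tag per line; blank lines and '#' comments are skipped.
    Duplicate tags are kept only once, in first-seen order.
    """
    models: list[str] = []
    for line in text.splitlines():
        tag = line.split("#", 1)[0].strip()
        if tag and tag not in models:
            models.append(tag)
    return models


def missing_models(declared: list[str], installed: set[str]) -> list[str]:
    """Return declared tags that are not installed yet, in declared order."""
    return [tag for tag in declared if tag not in installed]


if __name__ == "__main__":
    declared = parse_models("qwen2.5:14b\n# reasoning\ndeepseek-r1:14b\n\nqwen3.5:9b\n")
    for tag in missing_models(declared, installed={"qwen2.5:14b"}):
        # A real sidecar would run `ollama pull <tag>` here instead of printing.
        print(f"ollama pull {tag}")
```

The sketch covers only the pull side. Whether the sidecar also deletes models that were removed from models.txt is not stated here, so that is left out.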
GPU
Shares ringtail's RTX 4080 with frigate via NVIDIA device plugin time-slicing (2 virtual slots). Constrained to one loaded model and one parallel request to avoid VRAM contention.
| Setting | Value |
|---|---|
| OLLAMA_MAX_LOADED_MODELS | 1 |
| OLLAMA_NUM_PARALLEL | 1 |
| GPU limit | nvidia.com/gpu: "1" (time-sliced) |
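One way to confirm the single-model constraint at runtime is to ask Ollama which models are loaded, using its `GET /api/ps` endpoint. The sketch below is illustrative only: the helper names are hypothetical, the base URL is the in-cluster address from the Networking table, and the response is assumed to follow the shape `{"models": [{"name": ...}, ...]}`.

```python
import json
import urllib.request

# In-cluster endpoint; swap in another base URL as needed (assumption).
BASE_URL = "http://ollama.ollama.svc.cluster.local:11434"


def fetch_ps(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET /api/ps and return the decoded JSON payload."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
        return json.load(resp)


def loaded_models(ps_payload: dict) -> list[str]:
    """Extract the names of currently loaded models from an /api/ps payload."""
    return [m["name"] for m in ps_payload.get("models", [])]


def within_limit(ps_payload: dict, max_loaded: int = 1) -> bool:
    """True if no more models are loaded than OLLAMA_MAX_LOADED_MODELS allows."""
    return len(loaded_models(ps_payload)) <= max_loaded
```

From a pod on ringtail, `within_limit(fetch_ps())` should be true if the setting is in effect.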
Storage
| Mount | Backend | Size |
|---|---|---|
| /models | hostPath PV (/mnt/storage1/ollama) | 200 Gi |
PV reclaim policy is Retain — models survive PV deletion.
Networking
| Endpoint | Reachable from |
|---|---|
| https://ollama.ops.eblu.me | Public internet (Fly.io → Caddy) |
| https://ollama.tail8d86e.ts.net | Tailnet clients |
| http://ollama.ollama.svc.cluster.local:11434 | In-cluster (ringtail) |
Tailscale ingress uses ProxyGroup ingress — no explicit host: field (see tailscale-operator).
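As a rough client-side illustration, the snippet below picks a base URL by client location and lists installed models through Ollama's `GET /api/tags` endpoint. The location keys (`public`, `tailnet`, `cluster`) and function names are hypothetical. Whether the public endpoint requires authentication is not stated here, and the sketch sends none.

```python
import json
import urllib.request

# Endpoints from the Networking table, keyed by where the client runs.
# The keys are illustrative labels, not names used by the deployment.
ENDPOINTS = {
    "public": "https://ollama.ops.eblu.me",
    "tailnet": "https://ollama.tail8d86e.ts.net",
    "cluster": "http://ollama.ollama.svc.cluster.local:11434",
}


def tags_url(location: str) -> str:
    """Build the /api/tags URL for a given client location."""
    return f"{ENDPOINTS[location]}/api/tags"


def installed_models(tags_payload: dict) -> list[str]:
    """Return sorted model names from an /api/tags payload."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def list_models(location: str, timeout: float = 10.0) -> list[str]:
    """Fetch and return the installed model names via the chosen endpoint."""
    with urllib.request.urlopen(tags_url(location), timeout=timeout) as resp:
        return installed_models(json.load(resp))
```

For example, `list_models("tailnet")` from a tailnet client should return the tags declared in models.txt once the sidecar has finished pulling them.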
Related
- frigate — Shares GPU via time-slicing
- ringtail — Host node
- apps — ArgoCD application registry
- tailscale-operator — Tailscale ingress