Erich Blume 4aa0872949 C0: review ollama doc — refresh image, models, last-reviewed

Bumped documented image tag to 0.20.4 (matches kustomization newTag),
added the two qwen3.5 models from models.txt, and stamped the card.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 10:42:33 -07:00

Ollama

LLM inference server with GPU acceleration. Runs on ringtail with declarative model management via a sidecar.

Quick Reference

Property	Value
URL	https://ollama.ops.eblu.me
Tailscale URL	https://ollama.tail8d86e.ts.net
Namespace	`ollama`
Cluster	ringtail k3s
Image	`ollama/ollama:0.20.4`
Upstream	https://github.com/ollama/ollama
Manifests	`argocd/manifests/ollama/`
API Port	11434

Architecture

models.txt (ConfigMap, declarative)
    │
    ▼
model-sync sidecar ──ollama pull──► Ollama server (GPU)
    │                                    │
    │ reads /config/models.txt           │ serves /api/*
    │ polls every 30 min                 │ NVIDIA runtime (RTX 4080, time-sliced)
    │                                    │
    └────────────────────────────────────┘
                     │
                /models (200 Gi hostPath PV)
                /mnt/storage1/ollama on ringtail

Models

Declared in `argocd/manifests/ollama/models.txt`. The model-sync sidecar pulls missing models on startup and every 30 minutes.

Model	Parameters
`qwen2.5:14b`	14B
`deepseek-r1:14b`	14B
`phi4:14b`	14B
`gemma3:12b`	12B
`qwen3.5:9b`	9B
`qwen3.5:27b`	27B

To add or remove models, edit `models.txt` and sync via ArgoCD.
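The reconcile step the sidecar performs can be sketched roughly as follows. This is an illustration, not the sidecar's actual script: the real sidecar runs `ollama pull`, while this sketch swaps in Ollama's HTTP API (`GET /api/tags`, `POST /api/pull`). The function names, the `localhost` base URL, and the assumed `models.txt` format (one tag per line, `#` comments ignored) are all hypothetical. It only pulls missing models; whether the real sidecar also deletes models removed from the file is not covered here.

```python
import json
import urllib.request

# Assumption: the sidecar shares the pod network with the Ollama server.
OLLAMA_URL = "http://localhost:11434"


def parse_models(text: str) -> list[str]:
    """Return model tags from a models.txt body, one per line.

    Assumed format: blank lines and '#' comments are ignored,
    and duplicate entries are collapsed.
    """
    models: list[str] = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name and name not in models:
            models.append(name)
    return models


def missing_models(declared: list[str], installed: list[str]) -> list[str]:
    """Declared models the server does not have yet, in declared order."""
    have = set(installed)
    return [m for m in declared if m not in have]


def installed_models(base_url: str = OLLAMA_URL) -> list[str]:
    """List locally available models via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=30) as resp:
        return [m["name"] for m in json.load(resp).get("models", [])]


def pull(model: str, base_url: str = OLLAMA_URL) -> None:
    """Pull one model via POST /api/pull and wait for it to finish."""
    body = json.dumps({"model": model, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def sync_once(models_path: str = "/config/models.txt") -> list[str]:
    """One reconcile pass: pull whatever is declared but not installed."""
    with open(models_path, encoding="utf-8") as f:
        declared = parse_models(f.read())
    todo = missing_models(declared, installed_models())
    for model in todo:
        pull(model)
    return todo
```

Calling `sync_once()` at startup and then again on a 30-minute timer would mirror the polling behaviour described above.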

GPU

Shares ringtail's RTX 4080 with frigate via NVIDIA device plugin time-slicing (2 virtual slots). Constrained to one loaded model and one parallel request to avoid VRAM contention.

Setting	Value
`OLLAMA_MAX_LOADED_MODELS`	1
`OLLAMA_NUM_PARALLEL`	1
GPU limit	`nvidia.com/gpu: "1"` (time-sliced)
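As a rough illustration of why only one model is kept loaded, the weight footprint of a model can be estimated from its parameter count and quantization width. The helper below is a back-of-the-envelope sketch: the function name is hypothetical, the 4 bits per weight in the usage note is an assumption (the actual quantization of each tag is not stated here), and the estimate ignores the KV cache, runtime overhead, and whatever VRAM frigate is using.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    params_billion:  parameter count in billions (e.g. 14 for a 14B model)
    bits_per_weight: assumed quantization width (caller-supplied assumption)
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB
```

Under the 4-bit assumption, `weight_gib(14, 4)` comes to about 6.5 GiB, so two 14B models loaded together would need roughly 13 GiB for weights alone on a card with 16 GB of VRAM.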

Storage

Mount	Backend	Size
`/models`	hostPath PV (`/mnt/storage1/ollama`)	200 Gi

The PV reclaim policy is `Retain`, so downloaded models survive PV deletion.

Networking

Endpoint	Reachable from
`https://ollama.ops.eblu.me`	Public internet (Fly.io → Caddy)
`https://ollama.tail8d86e.ts.net`	Tailnet clients
`http://ollama.ollama.svc.cluster.local:11434`	In-cluster (ringtail)

The Tailscale ingress is served through a ProxyGroup, so it has no explicit `host:` field (see tailscale-operator).
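A minimal sketch of calling the API from inside the cluster, using only the Python standard library. The helper names and the default model are illustrative choices, not part of the deployment. The same request shape should work against the other two endpoints, though whether those require extra authentication is not covered here.

```python
import json
import urllib.request

# In-cluster service URL from the table above.
IN_CLUSTER_URL = "http://ollama.ollama.svc.cluster.local:11434"


def build_generate_request(
    prompt: str,
    model: str = "qwen2.5:14b",
    base_url: str = IN_CLUSTER_URL,
) -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text."""
    req = build_generate_request(prompt, **kwargs)
    # Generous timeout: the server may need to load the model first.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]
```

Because the server handles one request at a time and keeps one model loaded, concurrent callers or callers that switch models should expect to wait while requests queue or a model is swapped in.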

Related

frigate: shares the GPU via time-slicing
ringtail: host node
apps: ArgoCD application registry
tailscale-operator: Tailscale ingress
