From 6ca3c6770577a1759df32da93647bf188d054efb Mon Sep 17 00:00:00 2001
From: Erich Blume
Date: Wed, 4 Mar 2026 19:43:14 -0800
Subject: [PATCH] Add Ollama reference card and update indexes

Co-Authored-By: Claude Opus 4.6
---
 .../changelog.d/+ollama-reference-card.doc.md |  1 +
 docs/reference/kubernetes/apps.md             |  3 +-
 docs/reference/reference.md                   |  3 +-
 docs/reference/services/ollama.md             | 89 +++++++++++++++++++
 4 files changed, 94 insertions(+), 2 deletions(-)
 create mode 100644 docs/changelog.d/+ollama-reference-card.doc.md
 create mode 100644 docs/reference/services/ollama.md

diff --git a/docs/changelog.d/+ollama-reference-card.doc.md b/docs/changelog.d/+ollama-reference-card.doc.md
new file mode 100644
index 0000000..2d88484
--- /dev/null
+++ b/docs/changelog.d/+ollama-reference-card.doc.md
@@ -0,0 +1 @@
+Add reference card for the Ollama LLM inference service.
diff --git a/docs/reference/kubernetes/apps.md b/docs/reference/kubernetes/apps.md
index dfe8084..919907e 100644
--- a/docs/reference/kubernetes/apps.md
+++ b/docs/reference/kubernetes/apps.md
@@ -1,6 +1,6 @@
 ---
 title: Apps
-modified: 2026-02-25
+modified: 2026-03-04
 tags:
   - kubernetes
   - argocd
@@ -36,6 +36,7 @@ Registry of all applications deployed via [[argocd]].
 | `teslamate` | teslamate | `argocd/manifests/teslamate/` | [[teslamate]] |
 | `cv` | cv | `argocd/manifests/cv/` | [[cv]] |
 | `forgejo-runner` | forgejo-runner | `argocd/manifests/forgejo-runner/` | [[forgejo]] CI |
+| `ollama` | ollama | `argocd/manifests/ollama/` | [[ollama]] |
 
 ## Sync Policies
 
diff --git a/docs/reference/reference.md b/docs/reference/reference.md
index b4d3ee3..9faa8e2 100644
--- a/docs/reference/reference.md
+++ b/docs/reference/reference.md
@@ -1,6 +1,6 @@
 ---
 title: Reference
-modified: 2026-02-24
+modified: 2026-03-04
 tags:
   - reference
 ---
@@ -40,6 +40,7 @@ Individual service reference cards with URLs and configuration details.
 | [[authentik]] | OIDC identity provider | k8s (ringtail) |
 | [[docs]] | Documentation site (Quartz) | k8s |
 | [[flyio-proxy]] | Public reverse proxy (Fly.io + Tailscale) | Fly.io |
+| [[ollama]] | LLM inference server | k8s (ringtail) |
 | [[automounter]] | SMB share automounter | indri |
 
 ## Infrastructure
diff --git a/docs/reference/services/ollama.md b/docs/reference/services/ollama.md
new file mode 100644
index 0000000..75480cb
--- /dev/null
+++ b/docs/reference/services/ollama.md
@@ -0,0 +1,89 @@
+---
+title: Ollama
+modified: 2026-03-04
+tags:
+  - service
+  - ai
+---
+
+# Ollama
+
+LLM inference server with GPU acceleration. Runs on [[ringtail]] with declarative model management via a sidecar.
+
+## Quick Reference
+
+| Property | Value |
+|----------|-------|
+| **URL** | https://ollama.ops.eblu.me |
+| **Tailscale URL** | https://ollama.tail8d86e.ts.net |
+| **Namespace** | `ollama` |
+| **Cluster** | ringtail k3s |
+| **Image** | `ollama/ollama:0.17.5` |
+| **Upstream** | https://github.com/ollama/ollama |
+| **Manifests** | `argocd/manifests/ollama/` |
+| **API Port** | 11434 |
+
+## Architecture
+
+```
+models.txt (ConfigMap, declarative)
+       │
+       ▼
+model-sync sidecar ──ollama pull──► Ollama server (GPU)
+       │                                    │
+       │  reads /config/models.txt          │  serves /api/*
+       │  polls every 30 min                │  NVIDIA runtime (RTX 4080, time-sliced)
+       │                                    │
+       └────────────────────────────────────┘
+                          │
+            /models (200 Gi hostPath PV)
+            /mnt/storage1/ollama on ringtail
+```
+
+## Models
+
+Declared in `argocd/manifests/ollama/models.txt`. The model-sync sidecar pulls missing models on startup and every 30 minutes.
+
+| Model | Parameters |
+|-------|------------|
+| `qwen2.5:14b` | 14B |
+| `deepseek-r1:14b` | 14B |
+| `phi4:14b` | 14B |
+| `gemma3:12b` | 12B |
+
+To add or remove models, edit `models.txt` and sync via ArgoCD.
+
+## GPU
+
+Shares [[ringtail]]'s RTX 4080 with [[frigate]] via NVIDIA device plugin time-slicing (2 virtual slots). Constrained to one loaded model and one parallel request to avoid VRAM contention.
+
+| Setting | Value |
+|---------|-------|
+| `OLLAMA_MAX_LOADED_MODELS` | 1 |
+| `OLLAMA_NUM_PARALLEL` | 1 |
+| GPU limit | `nvidia.com/gpu: "1"` (time-sliced) |
+
+## Storage
+
+| Mount | Backend | Size |
+|-------|---------|------|
+| `/models` | hostPath PV (`/mnt/storage1/ollama`) | 200 Gi |
+
+PV reclaim policy is `Retain` — models survive PV deletion.
+
+## Networking
+
+| Endpoint | Reachable from |
+|----------|----------------|
+| `https://ollama.ops.eblu.me` | Public internet (Fly.io → Caddy) |
+| `https://ollama.tail8d86e.ts.net` | Tailnet clients |
+| `http://ollama.ollama.svc.cluster.local:11434` | In-cluster (ringtail) |
+
+Tailscale ingress uses ProxyGroup `ingress` — no explicit `host:` field (see [[tailscale-operator]]).
+
+## Related
+
+- [[frigate]] — Shares GPU via time-slicing
+- [[ringtail]] — Host node
+- [[apps]] — ArgoCD application registry
+- [[tailscale-operator]] — Tailscale ingress
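
Review note on the model-sync behaviour documented in this patch: the card says the sidecar reads `models.txt` and pulls whatever is declared but missing. That reconcile step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the sidecar's actual code: the one-model-per-line format with `#` comments is assumed, and `parse_models`, `missing_models`, and `pull_commands` are hypothetical helper names. Only the `ollama pull` command comes from the patch itself.

```python
from typing import Iterable, List


def parse_models(text: str) -> List[str]:
    """Return declared model names in order, without duplicates.

    Assumed format (not confirmed by the patch): one model per line,
    blank lines ignored, '#' starts a comment.
    """
    models: List[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line and line not in models:
            models.append(line)
    return models


def missing_models(declared: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return declared models that the server does not have yet."""
    have = set(installed)
    return [name for name in declared if name not in have]


def pull_commands(missing: Iterable[str]) -> List[List[str]]:
    """Build one `ollama pull <model>` argv list per missing model."""
    return [["ollama", "pull", name] for name in missing]
```

A sync loop along these lines would run the three steps once at startup and then again on each 30-minute poll. Keeping the comparison in pure functions makes it testable without a running Ollama server.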