Deploy Ollama LLM server on ringtail (#277)
## Summary

- Deploy Ollama as a new ArgoCD-managed service on ringtail's k3s cluster with GPU acceleration
- Declarative model management via `models.txt` + sidecar sync script (mirrors kiwix torrent pattern)
- Initial models: `qwen2.5:14b`, `deepseek-r1:14b`, `phi4:14b`, `gemma3:12b`
- hostPath PV on `/mnt/storage1/ollama` for fast local model storage (200Gi)
- Tailscale ingress at `ollama.ops.eblu.me` for API access from tailnet
- Enable GPU time-slicing (`replicas: 2`) on nvidia-device-plugin so Frigate and Ollama share the RTX 4080

## Deployment and Testing

- [ ] Deploy nvidia-device-plugin changes first: `argocd app sync nvidia-device-plugin`
- [ ] Verify GPU time-slicing: `kubectl describe node ringtail --context=k3s-ringtail` shows `nvidia.com/gpu: 2`
- [ ] Sync `apps` app with `--revision feature/ollama-ringtail`
- [ ] Set ollama app to branch: `argocd app set ollama --revision feature/ollama-ringtail && argocd app sync ollama`
- [ ] Verify model-sync sidecar pulls models: `kubectl logs -n ollama deploy/ollama -c model-sync --context=k3s-ringtail`
- [ ] Test API: `curl https://ollama.ops.eblu.me/api/tags`
- [ ] Test inference: `curl https://ollama.ops.eblu.me/api/generate -d '{"model":"qwen2.5:14b","prompt":"Hello"}'`
- [ ] Verify Frigate still works after GPU sharing change
- [ ] After merge: `argocd app set ollama --revision main && argocd app sync ollama`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/277
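The verification items in the checklist could be scripted roughly as follows. This is a sketch and not part of the commit: the function names, the `--run` flag, and the `CONTEXT`, `NODE`, and `API` variables are hypothetical, and it assumes `kubectl` and `curl` are on the PATH and the tailnet is reachable.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test for the checklist; names and flags are illustrative.
set -eu

CONTEXT="${CONTEXT:-k3s-ringtail}"
NODE="${NODE:-ringtail}"
API="${API:-https://ollama.ops.eblu.me}"

# Allocatable GPU count advertised for the node (2 is expected with time-slicing).
gpu_count() {
  kubectl get node "$NODE" --context="$CONTEXT" \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
}

# Succeeds if the quoted model name appears anywhere in the /api/tags response.
model_present() {
  curl -fsS "$API/api/tags" | grep -qF "\"$1\""
}

# Report the GPU count check, then the presence of each initial model.
check() {
  if [ "$(gpu_count)" = "2" ]; then
    echo "gpu time-slicing: ok"
  else
    echo "gpu time-slicing: unexpected count"
    return 1
  fi
  for m in qwen2.5:14b deepseek-r1:14b phi4:14b gemma3:12b; do
    if model_present "$m"; then
      echo "model $m: present"
    else
      echo "model $m: missing"
    fi
  done
}

# Only run when asked, so the file can also be sourced for its functions.
if [ "${1:-}" = "--run" ]; then
  check
fi
```

Saved under a name of your choosing and invoked with `--run`, it prints one line per check. A missing model is reported but does not fail the run, since the sidecar may still be pulling.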
parent 0f79c61c42
commit 31d925814f
15 changed files with 292 additions and 0 deletions
84
argocd/manifests/ollama/deployment.yaml
Normal file
@@ -0,0 +1,84 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      runtimeClassName: nvidia
      containers:
        - name: ollama
          image: ollama/ollama
          ports:
            - containerPort: 11434
              name: http
          env:
            - name: OLLAMA_MODELS
              value: /models
            - name: OLLAMA_HOST
              value: "0.0.0.0:11434"
          volumeMounts:
            - name: models
              mountPath: /models
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "16Gi"
              cpu: "4000m"
              nvidia.com/gpu: "1"
          livenessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 10
            periodSeconds: 10
        - name: model-sync
          image: ollama/ollama
          command: ["/bin/bash", "/scripts/sync-models.sh"]
          env:
            - name: MODEL_LIST
              value: /config/models.txt
            - name: OLLAMA_HOST
              value: "http://localhost:11434"
          volumeMounts:
            - name: models-config
              mountPath: /config
            - name: sync-script
              mountPath: /scripts
          resources:
            requests:
              memory: "64Mi"
              cpu: "50m"
            limits:
              memory: "256Mi"
              cpu: "200m"
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models
        - name: models-config
          configMap:
            name: ollama-models
        - name: sync-script
          configMap:
            name: ollama-sync-script
            defaultMode: 0755  # yamllint disable-line rule:octal-values
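The `model-sync` container runs `/scripts/sync-models.sh`, which is not shown in this file. As a rough illustration of what such a sidecar might do, here is a minimal sketch. It is not the script from this commit: the function names, the `--loop` flag, the `SYNC_INTERVAL` variable, and the support for `#` comments in `models.txt` are all assumptions. It uses only `ollama pull` and `ollama list`, which reach the server named by `OLLAMA_HOST`, and it only pulls models; it never removes any.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a model-sync loop; NOT the sync-models.sh from this commit.
set -eu

MODEL_LIST="${MODEL_LIST:-/config/models.txt}"
SYNC_INTERVAL="${SYNC_INTERVAL:-300}"  # assumed knob: seconds between sync passes

# Print one model name per line, dropping '#' comments, surrounding
# whitespace, and blank lines (comment support is an assumption).
wanted_models() {
  sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" \
    | grep -v '^$' || true
}

# Pull every listed model; a failed pull is logged and the loop continues.
sync_once() {
  wanted_models "$MODEL_LIST" | while IFS= read -r model; do
    echo "pulling $model"
    ollama pull "$model" </dev/null || echo "failed to pull $model" >&2
  done
}

# Block until the Ollama server in the sibling container answers.
wait_for_server() {
  until ollama list >/dev/null 2>&1; do
    sleep 5
  done
}

# The --loop flag is a convenience of this sketch so the functions can be
# sourced and exercised without starting the endless loop.
if [ "${1:-}" = "--loop" ]; then
  wait_for_server
  while true; do
    sync_once
    sleep "$SYNC_INTERVAL"
  done
fi
```

Re-reading the list on each pass means an updated ConfigMap is picked up once the mounted file refreshes, without restarting the pod. Whether the real script polls like this or syncs another way is not visible from this section.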