Commit graph

3 commits

Author SHA1 Message Date
31d925814f Deploy Ollama LLM server on ringtail (#277)
## Summary
- Deploy Ollama as a new ArgoCD-managed service on ringtail's k3s cluster with GPU acceleration
- Declarative model management via `models.txt` + sidecar sync script (mirrors kiwix torrent pattern)
- Initial models: `qwen2.5:14b`, `deepseek-r1:14b`, `phi4:14b`, `gemma3:12b`
- hostPath PV on `/mnt/storage1/ollama` for fast local model storage (200Gi)
- Tailscale ingress at `ollama.ops.eblu.me` for API access from tailnet
- Enable GPU time-slicing (`replicas: 2`) on nvidia-device-plugin so Frigate and Ollama share the RTX 4080

## Deployment and Testing
- [ ] Deploy nvidia-device-plugin changes first: `argocd app sync nvidia-device-plugin`
- [ ] Verify GPU time-slicing: `kubectl describe node ringtail --context=k3s-ringtail` shows `nvidia.com/gpu: 2`
- [ ] Sync `apps` app with `--revision feature/ollama-ringtail`
- [ ] Set ollama app to branch: `argocd app set ollama --revision feature/ollama-ringtail && argocd app sync ollama`
- [ ] Verify model-sync sidecar pulls models: `kubectl logs -n ollama deploy/ollama -c model-sync --context=k3s-ringtail`
- [ ] Test API: `curl https://ollama.ops.eblu.me/api/tags`
- [ ] Test inference: `curl https://ollama.ops.eblu.me/api/generate -d '{"model":"qwen2.5:14b","prompt":"Hello"}'`
- [ ] Verify Frigate still works after GPU sharing change
- [ ] After merge: `argocd app set ollama --revision main && argocd app sync ollama`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/277
2026-03-02 20:39:51 -08:00
9b44a8ec51 Add kustomize images: and configMapGenerator: across services (#264)
## Summary

- Move hardcoded image tags to kustomization.yaml `images:` transformer across **22 services** — image names in manifests become version-agnostic templates, with tags centralized in one place per service
- Replace hand-written ConfigMap manifests with `configMapGenerator:` in **12 services** — config data extracted to standalone files, generated ConfigMaps include content hashes that trigger automatic pod rollouts on changes
- Create new `kustomization.yaml` for **forgejo-runner** and **nvidia-device-plugin** (switches ArgoCD from directory mode to kustomize mode, rendered output identical)

### Services modified

**Images only (8):** cv, devpi, docs, kube-state-metrics, miniflux, navidrome, teslamate, torrent

**Images + configMapGenerator (10):** alloy-k8s, forgejo-runner, frigate, grafana, homepage, kiwix, loki, mosquitto, ntfy, prometheus

**Images only, no configMapGenerator (4):** authentik (skip blueprints — special YAML tags), tailscale-operator-base (Deployment only, CRD image fields left as-is)

**Skipped entirely (6):** argocd (remote upstream), databases (no image fields), external-secrets, grafana-config (cross-kustomization dashboards), immich (Helm-managed), 1password-connect/cloudnative-pg (no kustomization.yaml)

### What changes at deploy time

- **images:** — no functional diff, `kustomize build` produces identical output with tags
- **configMapGenerator:** — ConfigMap names gain hash suffixes (e.g., `prometheus-config` → `prometheus-config-6f42fhctcb`) and all Deployment/StatefulSet/DaemonSet references are updated automatically. Pods will restart once per service on first sync due to the name change

## Test plan

- [x] `kubectl kustomize` builds all 30 service directories successfully
- [x] Image tags verified in rendered output for all modified services
- [x] ConfigMap hash suffixes verified in rendered output
- [x] ConfigMap references in Deployments/StatefulSets confirmed to use hashed names
- [x] All pre-commit hooks pass (yamllint, shellcheck, prettier, etc.)
- [ ] `argocd app diff` each service to confirm only expected ConfigMap name changes
- [ ] Deploy from branch starting with a low-risk service (e.g., mosquitto)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/264
2026-02-24 14:25:19 -08:00
d5d32fe91f Port Frigate NVR to ringtail k3s with GPU acceleration (#217)
## Summary

- Enable NVIDIA container toolkit on ringtail NixOS and configure k3s containerd with nvidia runtime
- Add NVIDIA device plugin ArgoCD app (RuntimeClass + DaemonSet) to expose `nvidia.com/gpu` resources
- Re-target Frigate from indri minikube (arm64, ZMQ detector) to ringtail k3s (x86_64, TensorRT/ONNX)
- Switch Frigate image to `-tensorrt` variant with GPU resource limits and increased shared memory

## Manual Prerequisites

1. **NFS access**: Verify ringtail can mount `sifaka:/volume1/frigate`
   ```fish
   ssh ringtail 'sudo mount -t nfs sifaka:/volume1/frigate /mnt/storage1 && ls /mnt/storage1 && sudo umount /mnt/storage1'
   ```
2. **YOLO model**: Verify `/volume1/frigate/models/yolov9m.onnx` exists on sifaka

## Deployment Steps

1. Provision ringtail: `mise run provision-ringtail`
2. Sync ArgoCD apps: `argocd app sync apps --prune`
3. Deploy NVIDIA device plugin: `argocd app sync nvidia-device-plugin`
4. Verify GPU: `kubectl --context=k3s-ringtail get nodes -o json | jq '.items[].status.capacity'`
5. Deploy Frigate: `argocd app sync frigate`

## Verification

- [ ] `nvidia.com/gpu: 1` visible in node capacity
- [ ] Frigate pod running with GPU allocated
- [ ] Frigate UI loads at `https://nvr.ops.eblu.me`
- [ ] Detector shows ONNX/TensorRT on System page
- [ ] Camera feed with bounding boxes in live view
- [ ] TensorRT engine build completes (watch logs on first start)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/217
2026-02-19 14:27:04 -08:00