Port Frigate NVR to ringtail k3s with GPU acceleration (#217)

## Summary

- Enable NVIDIA container toolkit on ringtail NixOS and configure k3s containerd with nvidia runtime
- Add NVIDIA device plugin ArgoCD app (RuntimeClass + DaemonSet) to expose `nvidia.com/gpu` resources
- Re-target Frigate from indri minikube (arm64, ZMQ detector) to ringtail k3s (x86_64, TensorRT/ONNX)
- Switch Frigate image to `-tensorrt` variant with GPU resource limits and increased shared memory

## Manual Prerequisites

1. **NFS access**: Verify ringtail can mount `sifaka:/volume1/frigate`
   ```fish
   ssh ringtail 'sudo mount -t nfs sifaka:/volume1/frigate /mnt/storage1 && ls /mnt/storage1 && sudo umount /mnt/storage1'
   ```
2. **YOLO model**: Verify `/volume1/frigate/models/yolov9m.onnx` exists on sifaka
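
The model check in step 2 can be scripted in the same spirit as the NFS check. This is a minimal sketch in POSIX sh, assuming you have SSH access to sifaka (the PR does not state this); `check_model` is a hypothetical helper name, not something in the repository.

```sh
# Hypothetical helper: succeeds only if the model file exists on sifaka.
# Assumes SSH access to sifaka, which the PR does not confirm.
check_model() {
    ssh sifaka 'test -f /volume1/frigate/models/yolov9m.onnx'
}
```

After sourcing it in sh or bash, `check_model && echo "model present" || echo "model missing"` reports the result.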

## Deployment Steps

1. Provision ringtail: `mise run provision-ringtail`
2. Sync ArgoCD apps: `argocd app sync apps --prune`
3. Deploy NVIDIA device plugin: `argocd app sync nvidia-device-plugin`
4. Verify GPU: `kubectl --context=k3s-ringtail get nodes -o json | jq '.items[].status.capacity'`
5. Deploy Frigate: `argocd app sync frigate`
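
Step 4 prints the full capacity map for every node. As a narrower alternative, the sketch below prints one row per node with only the GPU count, using kubectl's `custom-columns` output. `gpu_capacity` is a hypothetical helper name. A node without the device plugin should show `<none>` in the GPU column.

```sh
# Hypothetical helper: list each node with its nvidia.com/gpu capacity.
# The backslash escapes the dot inside the resource name so kubectl
# treats "nvidia.com/gpu" as a single key.
gpu_capacity() {
    kubectl --context=k3s-ringtail get nodes \
        -o custom-columns='NODE:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu'
}
```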

## Verification

- [ ] `nvidia.com/gpu: 1` visible in node capacity
- [ ] Frigate pod running with GPU allocated
- [ ] Frigate UI loads at `https://nvr.ops.eblu.me`
- [ ] Detector shows ONNX/TensorRT on System page
- [ ] Camera feed with bounding boxes in live view
- [ ] TensorRT engine build completes (watch logs on first start)
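
Two of the checks above (GPU allocated, TensorRT engine build) can be done from the command line. The sketch below is only a starting point: the namespace `frigate` and the workload reference `deploy/frigate` are assumptions, since the PR names neither, so override `FRIGATE_NS` and `FRIGATE_WORKLOAD` to match the actual manifests.

```sh
# Assumed defaults: the PR does not name the namespace or workload kind.
FRIGATE_NS="${FRIGATE_NS:-frigate}"
FRIGATE_WORKLOAD="${FRIGATE_WORKLOAD:-deploy/frigate}"

# Print the resource limits of the workload's containers; a GPU-enabled
# pod spec should include an nvidia.com/gpu entry.
frigate_gpu_limits() {
    kubectl --context=k3s-ringtail -n "$FRIGATE_NS" get "$FRIGATE_WORKLOAD" \
        -o jsonpath='{.spec.template.spec.containers[*].resources.limits}'
}

# Follow the logs to watch the TensorRT engine build on first start.
frigate_logs() {
    kubectl --context=k3s-ringtail -n "$FRIGATE_NS" logs "$FRIGATE_WORKLOAD" --follow
}
```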

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/217
Erich Blume 2026-02-19 14:27:04 -08:00
commit d5d32fe91f
14 changed files with 157 additions and 12 deletions

@@ -35,6 +35,28 @@ in
    package = config.boot.kernelPackages.nvidiaPackages.stable;
  };

  # NVIDIA container toolkit (CDI specs + runtime for containerd/k3s GPU pods)
  hardware.nvidia-container-toolkit.enable = true;

  # Stable path to NVIDIA driver libraries for k3s device plugin pod mounts.
  # Avoids mounting all of /nix/store — only the driver derivation is needed.
  environment.etc."nvidia-driver/lib".source = "${config.hardware.nvidia.package}/lib";

  # Stable-path wrapper for nvidia-container-runtime.cdi (the CDI-based OCI
  # runtime that injects GPU devices/libs from NixOS-generated CDI specs).
  # The wrapper adds runc to PATH since k3s doesn't ship a standalone runc binary.
  environment.etc."nvidia-container-runtime/nvidia-runtime-cdi-wrapper" = {
    mode = "0755";
    text = ''
      #!/bin/sh
      export PATH="${pkgs.runc}/bin:$PATH"
      exec ${pkgs.nvidia-container-toolkit.tools}/bin/nvidia-container-runtime.cdi "$@"
    '';
  };

  # NFS client support (required for k3s to mount NFS PersistentVolumes)
  boot.supportedFilesystems = [ "nfs" ];

  # Wayland / Sway
  programs.sway = {
    enable = true;
@@ -109,6 +131,19 @@ in
      "--write-kubeconfig-mode=644"
      "--tls-san=ringtail.tail8d86e.ts.net"
    ];
    containerdConfigTemplate = ''
      {{ template "base" . }}

      [plugins.'io.containerd.cri.v1.runtime']
      enable_cdi = true
      cdi_spec_dirs = ["/var/run/cdi", "/etc/cdi"]

      [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia]
      privileged_without_host_devices = false
      runtime_type = "io.containerd.runc.v2"
      [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia.options]
      BinaryName = "/etc/nvidia-container-runtime/nvidia-runtime-cdi-wrapper"
    '';
  };

  # K3s containerd registry mirrors (pull through Zot on indri)