Fix services-check and update docs for Frigate migration to ringtail (#218)

## Summary
- Move mosquitto, ntfy, frigate, frigate-notify pod checks from `minikube-indri` to `k3s-ringtail` context in `services-check`
- Add `nvidia-device-plugin` pod check for ringtail k3s
- Rename "Kubernetes pods" section to "Indri minikube pods" for clarity
- Update 8 documentation files to reflect the migration completed in PRs #216/#217

## Files Changed
| File | Change |
|------|--------|
| `mise-tasks/services-check` | Move 4 pod checks to k3s-ringtail, add nvidia-device-plugin |
| `docs/reference/services/frigate.md` | Image→tensorrt, detector→ONNX/CUDA, shm→512Mi |
| `docs/reference/infrastructure/ringtail.md` | List actual k3s workloads |
| `docs/reference/infrastructure/indri.md` | Note frigate migration |
| `docs/explanation/architecture.md` | Add ringtail to diagram + compute layer |
| `docs/reference/kubernetes/cluster.md` | Note two clusters, add k3s section |
| `docs/reference/reference.md` | Update frigate/ntfy location |
| `docs/how-to/plans/completed/operationalize-reolink-camera.md` | Add post-completion migration note |
| `CLAUDE.md` | Add k3s-ringtail context guidance |

## Test plan
- [ ] `mise run services-check` — all checks pass
- [ ] Review each doc for accuracy against deployed state
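
For spot-checking a single pod outside the full `mise run services-check` run, the pattern used by the pod checks can be sketched as below. This is a minimal sketch under stated assumptions: the real `check_service` helper lives in `mise-tasks/services-check` and is not shown in this PR, so the version here is a hypothetical stand-in that evaluates a command string and reports pass or fail. `pod_running_cmd` is likewise a hypothetical helper that only builds the probe string seen in the diff.

```shell
# Hypothetical stand-in for check_service (the real helper is not shown in this PR).
# $1 = display name, $2 = command string to evaluate.
check_service() {
    if eval "$2" >/dev/null 2>&1; then
        printf 'ok   %s\n' "$1"
    else
        printf 'FAIL %s\n' "$1"
        return 1
    fi
}

# Hypothetical helper: build the pod-phase probe used by the checks in this PR.
# $1 = kubectl context, $2 = namespace, $3 = label selector.
pod_running_cmd() {
    printf "kubectl --context=%s -n %s get pods -l %s -o jsonpath='{.items[0].status.phase}' | grep -q Running" "$1" "$2" "$3"
}
```

With both functions loaded, a single check might be run as `check_service "frigate" "$(pod_running_cmd k3s-ringtail frigate app=frigate)"`. Note that the probe only inspects the first pod matching the label, so it will report a failure if no pod matches.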

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/218
Erich Blume 2026-02-19 14:38:21 -08:00
commit 291fff345c
10 changed files with 62 additions and 29 deletions


@ -14,7 +14,7 @@ blumeops is Erich Blume's GitOps repository for personal infrastructure, orchest
1. **Always run `mise run zk-docs -- --style=header --color=never --decorations=always` at session start**
This will refresh your context with important information you will be assumed to know and follow.
2. **Always use `--context=minikube-indri` with kubectl** - work contexts must never be touched
2. **Always use `--context=minikube-indri` with kubectl** (or `--context=k3s-ringtail` for ringtail services) - work contexts must never be touched
3. **Feature branches only** - checkout main, pull, create branch, commit often
4. **Create PRs via `tea pr create`** - user reviews before deploy, merges after
5. **Check PR comments with `mise run pr-comments <pr_number>`** before proceeding
@ -52,7 +52,7 @@ encounter wiki-links (`[[like-this]]`) it is referring to docs/ cards.
### Kubernetes (ArgoCD)
Most services run in minikube on indri via ArgoCD (app-of-apps, manual sync).
Most services run in minikube on indri via ArgoCD (app-of-apps, manual sync). GPU workloads (Frigate, Mosquitto, ntfy) run on ringtail's k3s cluster, also managed by ArgoCD.
**PR workflow:**
1. Create branch, modify `argocd/manifests/<service>/`


@ -0,0 +1 @@
Update services-check and documentation to reflect Frigate, Mosquitto, and ntfy migration from indri minikube to ringtail k3s (PRs #216, #217).


@ -1,6 +1,6 @@
---
title: Architecture
modified: 2026-02-09
modified: 2026-02-19
last-reviewed: 2026-02-09
tags:
- explanation
@ -15,7 +15,7 @@ How all the BlumeOps pieces fit together.
## Physical Layer
Two always-on devices form the infrastructure backbone:
Three always-on devices form the infrastructure backbone:
```
┌─────────────────┐ ┌─────────────────┐
@ -23,8 +23,13 @@ Two always-on devices form the infrastructure backbone:
│ Mac Mini M1 │────▶│ Synology NAS │
│ (compute) │ │ (storage) │
└─────────────────┘ └─────────────────┘
│ Tailscale
│ ▲
│ Tailscale │ NFS
│ ┌──────┴──────────┐
│ │ Ringtail │
│ │ NixOS PC │
│ │ (GPU compute) │
│ └─────────────────┘
┌─────────────────┐
│ Gilbert │
@ -33,7 +38,8 @@ Two always-on devices form the infrastructure backbone:
└─────────────────┘
```
- **[[indri]]** runs all services (native and containerized)
- **[[indri]]** runs most services (native and containerized)
- **[[ringtail]]** runs GPU workloads (Frigate NVR) and related services (MQTT, ntfy)
- **[[sifaka]]** provides bulk storage and backup targets
- **[[gilbert]]** is the development workstation
@ -61,11 +67,13 @@ See [[routing]] for the full service URL table and port map.
## Compute Layer
Services run in two places on [[indri]]:
Services run across three compute targets:
**Native (Ansible)** — services that need host-level access run directly on macOS, managed via Ansible roles in `ansible/roles/`. See [[indri]] for the full list.
**Native on indri (Ansible)** — services that need host-level access run directly on macOS, managed via Ansible roles in `ansible/roles/`. See [[indri]] for the full list.
**Kubernetes (ArgoCD)** — most services run in minikube, managed via ArgoCD from `argocd/manifests/`. See [[apps]] for the application registry.
**Minikube on indri (ArgoCD)** — most services run in minikube, managed via ArgoCD from `argocd/manifests/`. See [[apps]] for the application registry.
**K3s on ringtail (ArgoCD)** — GPU workloads and related services run on [[ringtail]]'s single-node k3s cluster. Frigate NVR uses the RTX 4080 for object detection; Mosquitto and ntfy support its alerting pipeline.
## Data Flow


@ -1,6 +1,6 @@
---
title: "Plan: Operationalize ReoLink Camera"
modified: 2026-02-11
modified: 2026-02-19
tags:
- how-to
- plans
@ -277,6 +277,10 @@ Camera settings to apply: enable RTSP and ONVIF, set "fluency first" encoding mo
| `argocd/manifests/prometheus/configmap.yaml` | Prometheus scrape target config |
| `docs/reference/storage/sifaka.md` | NFS export documentation |
## Post-Completion Update
Frigate, Mosquitto, and ntfy were migrated from indri's minikube to [[ringtail]]'s k3s cluster with RTX 4080 GPU acceleration (PRs #216, #217). The ZMQ Apple Silicon Detector has been retired in favour of ONNX with CUDA execution provider. Object detection now runs on the GPU rather than CPU.
## Related
- [[add-unifi-pulumi-stack]] — network segmentation (IoT VLAN for camera)


@ -1,6 +1,6 @@
---
title: Indri
modified: 2026-02-09
modified: 2026-02-19
tags:
- infrastructure
- host
@ -32,7 +32,7 @@ Primary BlumeOps server. Mac Mini M1 (2020).
- [[caddy]] - Reverse proxy for `*.ops.eblu.me`
**Kubernetes (via minikube):**
- [[apps|All k8s applications]]
- [[apps|Most k8s applications]] (Frigate, Mosquitto, ntfy migrated to [[ringtail]] k3s)
**GUI Applications (manual start required):**
- Docker Desktop - Container runtime for minikube


@ -63,7 +63,13 @@ Sync order: `1password-connect-ringtail` -> `external-secrets-crds-ringtail` ->
### Workloads
No k8s workloads currently deployed. K3s is available for future workloads (e.g. Frigate, running nix-built containers).
| Workload | Namespace | Notes |
|----------|-----------|-------|
| [[frigate]] | `frigate` | NVR with GPU-accelerated detection (RTX 4080) |
| [[frigate]]-notify | `frigate` | MQTT-to-ntfy alert bridge |
| Mosquitto | `mqtt` | MQTT broker for Frigate events |
| [[ntfy]] | `ntfy` | Push notification server |
| nvidia-device-plugin | `nvidia-device-plugin` | Exposes GPU to pods via CDI + nvidia RuntimeClass |
### Manual Cluster Registration


@ -1,13 +1,13 @@
---
title: Cluster
modified: 2026-02-07
modified: 2026-02-19
tags:
- kubernetes
---
# Kubernetes Cluster
Single-node Minikube cluster running on [[indri]].
BlumeOps runs two Kubernetes clusters: a Minikube cluster on [[indri]] (most services) and a k3s cluster on [[ringtail]] (GPU workloads, MQTT, notifications). Both are managed by [[argocd]] on indri.
## Cluster Specifications
@ -33,6 +33,16 @@ Containerd uses [[zot]] as a pull-through cache at `host.minikube.internal:5050`
Mirrors configured: `registry.ops.eblu.me`, `docker.io`, `ghcr.io`, `quay.io`
## K3s on Ringtail
Single-node k3s cluster for workloads requiring amd64 or GPU access. See [[ringtail]] for cluster specs, workload list, and secrets management.
| Property | Value |
|----------|-------|
| **Context** | `k3s-ringtail` |
| **API Server** | `https://ringtail.tail8d86e.ts.net:6443` |
| **Workloads** | Frigate (GPU), Mosquitto, ntfy, frigate-notify, nvidia-device-plugin |
## Related
- [[apps|Apps]] - ArgoCD applications


@ -1,6 +1,6 @@
---
title: Reference
modified: 2026-02-17
modified: 2026-02-19
tags:
- reference
---
@ -21,7 +21,7 @@ Individual service reference cards with URLs and configuration details.
| [[caddy]] | Reverse proxy & TLS termination | indri |
| [[1password]] | Secrets management | cloud + k8s |
| [[forgejo]] | Git forge & CI/CD | indri |
| [[frigate]] | Network video recorder | k8s |
| [[frigate]] | Network video recorder | k8s (ringtail) |
| [[grafana]] | Dashboards & visualization | k8s |
| [[immich]] | Photo management | k8s |
| [[jellyfin]] | Media server | indri |
@ -29,7 +29,7 @@ Individual service reference cards with URLs and configuration details.
| [[loki]] | Log aggregation | k8s |
| [[miniflux]] | RSS feed reader | k8s |
| [[navidrome]] | Music streaming | k8s |
| [[ntfy]] | Push notifications | k8s |
| [[ntfy]] | Push notifications | k8s (ringtail) |
| [[postgresql]] | Database cluster | k8s |
| [[prometheus]] | Metrics collection | k8s |
| [[teslamate]] | Tesla data logger | k8s |


@ -1,6 +1,6 @@
---
title: Frigate
modified: 2026-02-17
modified: 2026-02-19
tags:
- service
- surveillance
@ -17,7 +17,7 @@ Open-source network video recorder (NVR) with object detection. Runs cloud-free
| **URL** | https://nvr.ops.eblu.me |
| **Tailscale URL** | https://nvr.tail8d86e.ts.net |
| **Namespace** | `frigate` |
| **Image** | `ghcr.io/blakeblackshear/frigate:0.17.0-rc2-standard-arm64` |
| **Image** | `ghcr.io/blakeblackshear/frigate:0.17.0-rc2-tensorrt` |
| **Upstream** | https://github.com/blakeblackshear/frigate |
| **Manifests** | `argocd/manifests/frigate/` |
@ -30,7 +30,7 @@ ReoLink Camera (GableCam)
Frigate pod (ringtail k3s)
├── go2rtc — RTSP restream proxy
├── FFmpeg — stream decoding
├── detector — GPU-accelerated (RTX 4080, pending migration)
├── detector — ONNX with CUDA (RTX 4080)
├── /media/frigate — NFS recordings (sifaka)
└── /db — SQLite (local PVC)
@ -47,7 +47,7 @@ Camera credentials are stored in 1Password and synced via [[external-secrets]] t
## Detection
Object detection will use GPU-accelerated inference on [[ringtail]]'s RTX 4080 (migration pending). The previous Apple Silicon Detector on [[indri]] has been retired.
Object detection runs on [[ringtail]]'s RTX 4080 via the ONNX detector with CUDA execution provider. The model is YOLO-NAS-S (`yolo_nas_s.onnx`). The previous Apple Silicon Detector on [[indri]] has been retired.
Two zones are configured: `driveway_entrance` (triggers review alerts for person/car) and `driveway` (triggers review detections).
@ -66,7 +66,7 @@ Two zones are configured: `driveway_entrance` (triggers review alerts for person
|-------|---------|------|
| `/media/frigate` | NFS PV on [[sifaka]] (`/volume1/frigate`) | 2 Ti |
| `/db` | Local PVC (`frigate-database`) | SQLite |
| `/dev/shm` | Memory-backed `emptyDir` | 256 Mi |
| `/dev/shm` | Memory-backed `emptyDir` | 512 Mi |
## Alerting (frigate-notify)


@ -91,6 +91,14 @@ check_service "k3s" "ssh ringtail 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml k3s kube
check_service "k3s-apiserver (remote)" "kubectl --context=k3s-ringtail get --raw /healthz"
check_service "forgejo-runner" "ssh ringtail 'systemctl is-active gitea-runner-nix_container_builder.service'"
echo ""
echo "Ringtail k3s pods:"
check_service "mosquitto" "kubectl --context=k3s-ringtail -n mqtt get pods -l app=mosquitto -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "ntfy" "kubectl --context=k3s-ringtail -n ntfy get pods -l app=ntfy -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "frigate" "kubectl --context=k3s-ringtail -n frigate get pods -l app=frigate -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "frigate-notify" "kubectl --context=k3s-ringtail -n frigate get pods -l app=frigate-notify -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "nvidia-device-plugin" "kubectl --context=k3s-ringtail -n nvidia-device-plugin get pods -l app=nvidia-device-plugin -o jsonpath='{.items[0].status.phase}' | grep -q Running"
echo ""
echo "Public services (via Fly.io):"
check_http "Docs (public)" "https://docs.eblu.me/"
@ -102,17 +110,13 @@ echo "Database:"
check_service "PostgreSQL (k8s)" "pg_isready -h pg.ops.eblu.me -p 5432"
echo ""
echo "Kubernetes pods:"
echo "Indri minikube pods:"
check_service "prometheus-0" "kubectl --context=minikube-indri -n monitoring get pod prometheus-0 -o jsonpath='{.status.phase}' | grep -q Running"
check_service "loki-0" "kubectl --context=minikube-indri -n monitoring get pod loki-0 -o jsonpath='{.status.phase}' | grep -q Running"
check_service "grafana" "kubectl --context=minikube-indri -n monitoring get pods -l app.kubernetes.io/name=grafana -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "miniflux" "kubectl --context=minikube-indri -n miniflux get pods -l app=miniflux -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "teslamate" "kubectl --context=minikube-indri -n teslamate get pods -l app=teslamate -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "blumeops-pg" "kubectl --context=minikube-indri -n databases get pods -l cnpg.io/cluster=blumeops-pg -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "mosquitto" "kubectl --context=minikube-indri -n mqtt get pods -l app=mosquitto -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "ntfy" "kubectl --context=minikube-indri -n ntfy get pods -l app=ntfy -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "frigate" "kubectl --context=minikube-indri -n frigate get pods -l app=frigate -o jsonpath='{.items[0].status.phase}' | grep -q Running"
check_service "frigate-notify" "kubectl --context=minikube-indri -n frigate get pods -l app=frigate-notify -o jsonpath='{.items[0].status.phase}' | grep -q Running"
echo ""
echo "ArgoCD app sync status:"