diff --git a/CLAUDE.md b/CLAUDE.md index 08c356e..c325c74 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -14,7 +14,7 @@ blumeops is Erich Blume's GitOps repository for personal infrastructure, orchest 1. **Always run `mise run zk-docs -- --style=header --color=never --decorations=always` at session start** This will refresh your context with important information you will be assumed to know and follow. -2. **Always use `--context=minikube-indri` with kubectl** - work contexts must never be touched +2. **Always use `--context=minikube-indri` with kubectl** (or `--context=k3s-ringtail` for ringtail services) - work contexts must never be touched 3. **Feature branches only** - checkout main, pull, create branch, commit often 4. **Create PRs via `tea pr create`** - user reviews before deploy, merges after 5. **Check PR comments with `mise run pr-comments `** before proceeding @@ -52,7 +52,7 @@ encounter wiki-links (`[[like-this]]`) it is referring to docs/ cards. ### Kubernetes (ArgoCD) -Most services run in minikube on indri via ArgoCD (app-of-apps, manual sync). +Most services run in minikube on indri via ArgoCD (app-of-apps, manual sync). Frigate (GPU-accelerated) and its supporting services (Mosquitto, ntfy) run on ringtail's k3s cluster, also managed by ArgoCD. **PR workflow:** 1. Create branch, modify `argocd/manifests//` diff --git a/docs/changelog.d/fix-services-check-ringtail-docs.doc.md b/docs/changelog.d/fix-services-check-ringtail-docs.doc.md new file mode 100644 index 0000000..74d2067 --- /dev/null +++ b/docs/changelog.d/fix-services-check-ringtail-docs.doc.md @@ -0,0 +1 @@ +Update services-check and documentation to reflect the migration of Frigate, Mosquitto, and ntfy from indri minikube to ringtail k3s (PRs #216, #217).
diff --git a/docs/explanation/architecture.md b/docs/explanation/architecture.md index 6c0d00b..f4872cc 100644 --- a/docs/explanation/architecture.md +++ b/docs/explanation/architecture.md @@ -1,6 +1,6 @@ --- title: Architecture -modified: 2026-02-09 +modified: 2026-02-19 last-reviewed: 2026-02-09 tags: - explanation @@ -15,7 +15,7 @@ How all the BlumeOps pieces fit together. ## Physical Layer -Two always-on devices form the infrastructure backbone: +Three always-on devices form the infrastructure backbone: ``` ┌─────────────────┐ ┌─────────────────┐ @@ -23,8 +23,13 @@ Two always-on devices form the infrastructure backbone: │ Mac Mini M1 │────▶│ Synology NAS │ │ (compute) │ │ (storage) │ └─────────────────┘ └─────────────────┘ - │ - │ Tailscale + │ ▲ + │ Tailscale │ NFS + │ ┌──────┴──────────┐ + │ │ Ringtail │ + │ │ NixOS PC │ + │ │ (GPU compute) │ + │ └─────────────────┘ ▼ ┌─────────────────┐ │ Gilbert │ @@ -33,7 +38,8 @@ Two always-on devices form the infrastructure backbone: └─────────────────┘ ``` -- **[[indri]]** runs all services (native and containerized) +- **[[indri]]** runs most services (native and containerized) +- **[[ringtail]]** runs GPU workloads (Frigate NVR) and related services (MQTT, ntfy) - **[[sifaka]]** provides bulk storage and backup targets - **[[gilbert]]** is the development workstation @@ -61,11 +67,13 @@ See [[routing]] for the full service URL table and port map. ## Compute Layer -Services run in two places on [[indri]]: +Services run across three compute targets: -**Native (Ansible)** — services that need host-level access run directly on macOS, managed via Ansible roles in `ansible/roles/`. See [[indri]] for the full list. +**Native on indri (Ansible)** — services that need host-level access run directly on macOS, managed via Ansible roles in `ansible/roles/`. See [[indri]] for the full list. -**Kubernetes (ArgoCD)** — most services run in minikube, managed via ArgoCD from `argocd/manifests/`. 
See [[apps]] for the application registry. +**Minikube on indri (ArgoCD)** — most services run in minikube, managed via ArgoCD from `argocd/manifests/`. See [[apps]] for the application registry. + +**K3s on ringtail (ArgoCD)** — GPU workloads and related services run on [[ringtail]]'s single-node k3s cluster. Frigate NVR uses the RTX 4080 for object detection; Mosquitto and ntfy support its alerting pipeline. ## Data Flow diff --git a/docs/how-to/plans/completed/operationalize-reolink-camera.md b/docs/how-to/plans/completed/operationalize-reolink-camera.md index f82ead3..c621fa8 100644 --- a/docs/how-to/plans/completed/operationalize-reolink-camera.md +++ b/docs/how-to/plans/completed/operationalize-reolink-camera.md @@ -1,6 +1,6 @@ --- title: "Plan: Operationalize ReoLink Camera" -modified: 2026-02-11 +modified: 2026-02-19 tags: - how-to - plans @@ -277,6 +277,10 @@ Camera settings to apply: enable RTSP and ONVIF, set "fluency first" encoding mo | `argocd/manifests/prometheus/configmap.yaml` | Prometheus scrape target config | | `docs/reference/storage/sifaka.md` | NFS export documentation | +## Post-Completion Update + +Frigate, Mosquitto, and ntfy were migrated from indri's minikube to [[ringtail]]'s k3s cluster with RTX 4080 GPU acceleration (PRs #216, #217). The ZMQ Apple Silicon Detector has been retired in favour of ONNX with CUDA execution provider. Object detection now runs on the GPU rather than CPU. + ## Related - [[add-unifi-pulumi-stack]] — network segmentation (IoT VLAN for camera) diff --git a/docs/reference/infrastructure/indri.md b/docs/reference/infrastructure/indri.md index 4d59da9..54465de 100644 --- a/docs/reference/infrastructure/indri.md +++ b/docs/reference/infrastructure/indri.md @@ -1,6 +1,6 @@ --- title: Indri -modified: 2026-02-09 +modified: 2026-02-19 tags: - infrastructure - host @@ -32,7 +32,7 @@ Primary BlumeOps server. Mac Mini M1 (2020). 
- [[caddy]] - Reverse proxy for `*.ops.eblu.me` **Kubernetes (via minikube):** -- [[apps|All k8s applications]] +- [[apps|Most k8s applications]] (Frigate, Mosquitto, ntfy migrated to [[ringtail]] k3s) **GUI Applications (manual start required):** - Docker Desktop - Container runtime for minikube diff --git a/docs/reference/infrastructure/ringtail.md b/docs/reference/infrastructure/ringtail.md index f6e0cc3..772566c 100644 --- a/docs/reference/infrastructure/ringtail.md +++ b/docs/reference/infrastructure/ringtail.md @@ -63,7 +63,13 @@ Sync order: `1password-connect-ringtail` -> `external-secrets-crds-ringtail` -> ### Workloads -No k8s workloads currently deployed. K3s is available for future workloads (e.g. Frigate, running nix-built containers). +| Workload | Namespace | Notes | +|----------|-----------|-------| +| [[frigate]] | `frigate` | NVR with GPU-accelerated detection (RTX 4080) | +| [[frigate]]-notify | `frigate` | MQTT-to-ntfy alert bridge | +| Mosquitto | `mqtt` | MQTT broker for Frigate events | +| [[ntfy]] | `ntfy` | Push notification server | +| nvidia-device-plugin | `nvidia-device-plugin` | Exposes GPU to pods via CDI + nvidia RuntimeClass | ### Manual Cluster Registration diff --git a/docs/reference/kubernetes/cluster.md b/docs/reference/kubernetes/cluster.md index ccab89d..e7e49bf 100644 --- a/docs/reference/kubernetes/cluster.md +++ b/docs/reference/kubernetes/cluster.md @@ -1,13 +1,13 @@ --- title: Cluster -modified: 2026-02-07 +modified: 2026-02-19 tags: - kubernetes --- # Kubernetes Cluster -Single-node Minikube cluster running on [[indri]]. +BlumeOps runs two Kubernetes clusters: a Minikube cluster on [[indri]] (most services) and a k3s cluster on [[ringtail]] (GPU workloads, MQTT, notifications). Both are managed by [[argocd]] on indri. 
## Cluster Specifications @@ -33,6 +33,16 @@ Containerd uses [[zot]] as a pull-through cache at `host.minikube.internal:5050` Mirrors configured: `registry.ops.eblu.me`, `docker.io`, `ghcr.io`, `quay.io` +## K3s on Ringtail + +Single-node k3s cluster for workloads requiring amd64 or GPU access. See [[ringtail]] for cluster specs, workload list, and secrets management. + +| Property | Value | +|----------|-------| +| **Context** | `k3s-ringtail` | +| **API Server** | `https://ringtail.tail8d86e.ts.net:6443` | +| **Workloads** | Frigate (GPU), Mosquitto, ntfy, frigate-notify, nvidia-device-plugin | + ## Related - [[apps|Apps]] - ArgoCD applications diff --git a/docs/reference/reference.md b/docs/reference/reference.md index e1cacc0..97ada9f 100644 --- a/docs/reference/reference.md +++ b/docs/reference/reference.md @@ -1,6 +1,6 @@ --- title: Reference -modified: 2026-02-17 +modified: 2026-02-19 tags: - reference --- @@ -21,7 +21,7 @@ Individual service reference cards with URLs and configuration details. | [[caddy]] | Reverse proxy & TLS termination | indri | | [[1password]] | Secrets management | cloud + k8s | | [[forgejo]] | Git forge & CI/CD | indri | -| [[frigate]] | Network video recorder | k8s | +| [[frigate]] | Network video recorder | k8s (ringtail) | | [[grafana]] | Dashboards & visualization | k8s | | [[immich]] | Photo management | k8s | | [[jellyfin]] | Media server | indri | @@ -29,7 +29,7 @@ Individual service reference cards with URLs and configuration details. 
| [[loki]] | Log aggregation | k8s | | [[miniflux]] | RSS feed reader | k8s | | [[navidrome]] | Music streaming | k8s | -| [[ntfy]] | Push notifications | k8s | +| [[ntfy]] | Push notifications | k8s (ringtail) | | [[postgresql]] | Database cluster | k8s | | [[prometheus]] | Metrics collection | k8s | | [[teslamate]] | Tesla data logger | k8s | diff --git a/docs/reference/services/frigate.md b/docs/reference/services/frigate.md index 0e661fe..b5b597b 100644 --- a/docs/reference/services/frigate.md +++ b/docs/reference/services/frigate.md @@ -1,6 +1,6 @@ --- title: Frigate -modified: 2026-02-17 +modified: 2026-02-19 tags: - service - surveillance @@ -17,7 +17,7 @@ Open-source network video recorder (NVR) with object detection. Runs cloud-free | **URL** | https://nvr.ops.eblu.me | | **Tailscale URL** | https://nvr.tail8d86e.ts.net | | **Namespace** | `frigate` | -| **Image** | `ghcr.io/blakeblackshear/frigate:0.17.0-rc2-standard-arm64` | +| **Image** | `ghcr.io/blakeblackshear/frigate:0.17.0-rc2-tensorrt` | | **Upstream** | https://github.com/blakeblackshear/frigate | | **Manifests** | `argocd/manifests/frigate/` | @@ -30,7 +30,7 @@ ReoLink Camera (GableCam) Frigate pod (ringtail k3s) ├── go2rtc — RTSP restream proxy ├── FFmpeg — stream decoding - ├── detector — GPU-accelerated (RTX 4080, pending migration) + ├── detector — ONNX with CUDA (RTX 4080) ├── /media/frigate — NFS recordings (sifaka) └── /db — SQLite (local PVC) │ @@ -47,7 +47,7 @@ Camera credentials are stored in 1Password and synced via [[external-secrets]] t ## Detection -Object detection will use GPU-accelerated inference on [[ringtail]]'s RTX 4080 (migration pending). The previous Apple Silicon Detector on [[indri]] has been retired. +Object detection runs on [[ringtail]]'s RTX 4080 via the ONNX detector with CUDA execution provider. The model is YOLO-NAS-S (`yolo_nas_s.onnx`). The previous Apple Silicon Detector on [[indri]] has been retired. 
Two zones are configured: `driveway_entrance` (triggers review alerts for person/car) and `driveway` (triggers review detections). @@ -66,7 +66,7 @@ Two zones are configured: `driveway_entrance` (triggers review alerts for person |-------|---------|------| | `/media/frigate` | NFS PV on [[sifaka]] (`/volume1/frigate`) | 2 Ti | | `/db` | Local PVC (`frigate-database`) | SQLite | -| `/dev/shm` | Memory-backed `emptyDir` | 256 Mi | +| `/dev/shm` | Memory-backed `emptyDir` | 512 Mi | ## Alerting (frigate-notify) diff --git a/mise-tasks/services-check b/mise-tasks/services-check index 500851f..8ef7559 100755 --- a/mise-tasks/services-check +++ b/mise-tasks/services-check @@ -91,6 +91,14 @@ check_service "k3s" "ssh ringtail 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml k3s kube check_service "k3s-apiserver (remote)" "kubectl --context=k3s-ringtail get --raw /healthz" check_service "forgejo-runner" "ssh ringtail 'systemctl is-active gitea-runner-nix_container_builder.service'" +echo "" +echo "Ringtail k3s pods:" +check_service "mosquitto" "kubectl --context=k3s-ringtail -n mqtt get pods -l app=mosquitto -o jsonpath='{.items[0].status.phase}' | grep -q Running" +check_service "ntfy" "kubectl --context=k3s-ringtail -n ntfy get pods -l app=ntfy -o jsonpath='{.items[0].status.phase}' | grep -q Running" +check_service "frigate" "kubectl --context=k3s-ringtail -n frigate get pods -l app=frigate -o jsonpath='{.items[0].status.phase}' | grep -q Running" +check_service "frigate-notify" "kubectl --context=k3s-ringtail -n frigate get pods -l app=frigate-notify -o jsonpath='{.items[0].status.phase}' | grep -q Running" +check_service "nvidia-device-plugin" "kubectl --context=k3s-ringtail -n nvidia-device-plugin get pods -l app=nvidia-device-plugin -o jsonpath='{.items[0].status.phase}' | grep -q Running" + echo "" echo "Public services (via Fly.io):" check_http "Docs (public)" "https://docs.eblu.me/" @@ -102,17 +110,13 @@ echo "Database:" check_service "PostgreSQL (k8s)" "pg_isready -h 
pg.ops.eblu.me -p 5432" echo "" -echo "Kubernetes pods:" +echo "Indri minikube pods:" check_service "prometheus-0" "kubectl --context=minikube-indri -n monitoring get pod prometheus-0 -o jsonpath='{.status.phase}' | grep -q Running" check_service "loki-0" "kubectl --context=minikube-indri -n monitoring get pod loki-0 -o jsonpath='{.status.phase}' | grep -q Running" check_service "grafana" "kubectl --context=minikube-indri -n monitoring get pods -l app.kubernetes.io/name=grafana -o jsonpath='{.items[0].status.phase}' | grep -q Running" check_service "miniflux" "kubectl --context=minikube-indri -n miniflux get pods -l app=miniflux -o jsonpath='{.items[0].status.phase}' | grep -q Running" check_service "teslamate" "kubectl --context=minikube-indri -n teslamate get pods -l app=teslamate -o jsonpath='{.items[0].status.phase}' | grep -q Running" check_service "blumeops-pg" "kubectl --context=minikube-indri -n databases get pods -l cnpg.io/cluster=blumeops-pg -o jsonpath='{.items[0].status.phase}' | grep -q Running" -check_service "mosquitto" "kubectl --context=minikube-indri -n mqtt get pods -l app=mosquitto -o jsonpath='{.items[0].status.phase}' | grep -q Running" -check_service "ntfy" "kubectl --context=minikube-indri -n ntfy get pods -l app=ntfy -o jsonpath='{.items[0].status.phase}' | grep -q Running" -check_service "frigate" "kubectl --context=minikube-indri -n frigate get pods -l app=frigate -o jsonpath='{.items[0].status.phase}' | grep -q Running" -check_service "frigate-notify" "kubectl --context=minikube-indri -n frigate get pods -l app=frigate-notify -o jsonpath='{.items[0].status.phase}' | grep -q Running" echo "" echo "ArgoCD app sync status:"