Recurring review sweep: 4 doc cards + nvidia-device-plugin v0.19.2 (#366)
Knocks out the two daily recurring review tasks (doc review + service review) in one PR.

## Doc review (4 never-reviewed reference cards, `last-reviewed: 2026-06-04`)

- **cluster.md**: Kubernetes version v1.34.0 → **v1.35.0**; refreshed the stale ringtail workload list and noted the in-progress minikube→k3s migration (points to `[[ringtail]]` as the canonical list).
- **ntfy.md / tempo.md / alloy.md**: corrected image references. These are now **locally-built `registry.ops.eblu.me/blumeops/*` nix containers** (ntfy v2.19.2, tempo v2.10.3, alloy-k8s v1.16.0), not upstream Docker Hub. Fly.io alloy binary bumped to v1.16.1.

## Service review

- **nvidia-device-plugin** (ringtail GPU): v0.19.0 → **v0.19.2**. Upstream patch releases: CDI/Tegra fixes + dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup (the service-account change in the notes is helm-only).

## Not in this PR (need container rebuilds, deferred)

The other stale services are locally-built nix images, so upgrading them is a forge-runner rebuild rather than a clean tag bump. They are left untouched (not date-bumped, so they resurface): **prometheus** (v3.10.0→v3.12.0), **loki** (3.6.7→3.7.2), **kube-state-metrics**, **homepage**. Happy to do these as a follow-up rebuild PR.

## Deploy / verify

Not yet deployed: `nvidia-device-plugin` still points at `main`. After review:

```
argocd app set nvidia-device-plugin --revision reviews-jun4 && argocd app sync nvidia-device-plugin
# after merge:
argocd app set nvidia-device-plugin --revision main && argocd app sync nvidia-device-plugin
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #366
parent 02ea1cc72a
commit bb55fa9566
8 changed files with 21 additions and 13 deletions
@@ -10,4 +10,4 @@ resources:

 images:
   - name: nvcr.io/nvidia/k8s-device-plugin
-    newTag: v0.19.0
+    newTag: v0.19.2

docs/changelog.d/reviews-jun4.doc.md (new file)
@@ -0,0 +1 @@
+Reviewed four never-reviewed reference cards (`cluster`, `ntfy`, `tempo`, `alloy`) and corrected drift: minikube is now Kubernetes v1.35.0; ntfy, tempo, and alloy-k8s images are now locally-built `registry.ops.eblu.me/blumeops/*` nix containers (v2.19.2, v2.10.3, v1.16.0) rather than upstream Docker Hub; the Fly.io alloy binary is v1.16.1; and the ringtail workload list reflects the in-progress minikube→k3s migration.

docs/changelog.d/reviews-jun4.infra.md (new file)
@@ -0,0 +1 @@
+Upgraded the nvidia-device-plugin on ringtail from v0.19.0 to v0.19.2 (upstream patch release: CDI/Tegra fixes and dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup).

@@ -1,6 +1,7 @@
 ---
 title: Cluster
-modified: 2026-02-19
+modified: 2026-06-04
+last-reviewed: 2026-06-04
 tags:
   - kubernetes
 ---
@@ -15,7 +16,7 @@ BlumeOps runs two Kubernetes clusters: a Minikube cluster on [[indri]] (most ser
 |----------|-------|
 | **Driver** | docker |
 | **Container Runtime** | docker |
-| **Kubernetes Version** | v1.34.0 |
+| **Kubernetes Version** | v1.35.0 |
 | **CPUs** | 6 |
 | **Memory** | 11GB |
 | **Disk** | 200GB |
@@ -41,7 +42,9 @@ Single-node k3s cluster for workloads requiring amd64 or GPU access. See [[ringt
 |----------|-------|
 | **Context** | `k3s-ringtail` |
 | **API Server** | `https://ringtail.tail8d86e.ts.net:6443` |
-| **Workloads** | Frigate (GPU), ntfy, frigate-notify, nvidia-device-plugin |
+| **Workloads** | GPU workloads (Frigate, Ollama), notifications (ntfy, frigate-notify), [[authentik]], and services migrated off indri minikube (Immich, Mealie, Paperless, TeslaMate). See [[ringtail]] for the authoritative list. |
+
+Services are being progressively migrated from indri's minikube to ringtail's k3s; the split above reflects an in-progress state, not a fixed boundary.

 ## Related

@@ -1,6 +1,7 @@
 ---
 title: Alloy
-modified: 2026-03-13
+modified: 2026-06-04
+last-reviewed: 2026-06-04
 tags:
   - service
   - observability
@@ -20,10 +21,10 @@ Unified observability collector for metrics and logs with three deployments:
 | **Indri Binary** | `~/.local/bin/alloy` |
 | **Indri Config** | `~/.config/grafana-alloy/config.alloy` |
 | **K8s Namespace** | `alloy` |
-| **K8s Image** | `grafana/alloy:v1.14.0` |
+| **K8s Image** | `registry.ops.eblu.me/blumeops/alloy:v1.16.0-9564435` (locally built) |
 | **ArgoCD App** | `alloy-k8s` |
 | **Fly.io Config** | `fly/alloy.river` |
-| **Fly.io Image** | `grafana/alloy:v1.5.1` (binary copied into nginx container) |
+| **Fly.io Image** | `grafana/alloy:v1.16.1` (binary copied into nginx container, sha-pinned) |

 ## Metrics Collected

@@ -1,6 +1,7 @@
 ---
 title: Ntfy
-modified: 2026-02-17
+modified: 2026-06-04
+last-reviewed: 2026-06-04
 tags:
   - service
   - notifications
@@ -17,7 +18,7 @@ Self-hosted push notification service. Ntfy receives HTTP POST messages and deli
 | **URL** | https://ntfy.ops.eblu.me |
 | **Tailscale URL** | https://ntfy.tail8d86e.ts.net |
 | **Namespace** | `ntfy` |
-| **Image** | `binwiederhier/ntfy:v2.17.0` |
+| **Image** | `registry.ops.eblu.me/blumeops/ntfy:v2.19.2-fd0bebb-nix` (locally built) |
 | **Upstream** | https://github.com/binwiederhier/ntfy |
 | **Manifests** | `argocd/manifests/ntfy/` |

@@ -1,6 +1,7 @@
 ---
 title: Tempo
-modified: 2026-03-05
+modified: 2026-06-04
+last-reviewed: 2026-06-04
 tags:
   - service
   - observability
@@ -18,7 +19,7 @@ Distributed tracing backend for BlumeOps infrastructure. Receives traces via OTL
 | **Tailscale URL** | https://tempo.tail8d86e.ts.net |
 | **OTLP Endpoint** | https://tempo-otlp.tail8d86e.ts.net |
 | **Namespace** | `monitoring` |
-| **Image** | `grafana/tempo:2.10.1` |
+| **Image** | `registry.ops.eblu.me/blumeops/tempo:v2.10.3-75f9ba4` (locally built) |
 | **Storage** | 10Gi PVC (local filesystem) |
 | **Retention** | 7 days |
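
The three card updates above swap upstream image references for locally-built ones whose tags carry extra suffixes. As a rough illustration of how such references could be checked mechanically during a doc review, here is a minimal Python sketch. The tag layout `<version>[-<short-sha>][-nix]` is an assumption inferred only from the three examples in this diff, and `parse_image` and `ImageRef` are hypothetical names, not part of the repository's tooling.

```python
import re
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    repository: str       # everything before the final ":"
    version: str          # e.g. "v2.19.2" or "2.10.1"
    build: Optional[str]  # short commit hash, if the tag carries one


# Assumed tag layout (inferred from this diff, not a documented convention):
#   <version>[-<short-sha>][-nix]
_TAG = re.compile(
    r"^(?P<version>v?\d+(?:\.\d+)*)"
    r"(?:-(?P<build>[0-9a-f]{7,})(?:-nix)?)?$"
)


def parse_image(ref: str) -> ImageRef:
    """Split an image reference into repository, version, and build hash."""
    repository, sep, tag = ref.rpartition(":")
    # A "/" after the last ":" means the colon belonged to a registry port.
    if not sep or "/" in tag:
        raise ValueError(f"no tag in image reference: {ref!r}")
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognised tag layout: {tag!r}")
    return ImageRef(repository, match["version"], match["build"])


if __name__ == "__main__":
    for ref in (
        "registry.ops.eblu.me/blumeops/ntfy:v2.19.2-fd0bebb-nix",
        "registry.ops.eblu.me/blumeops/tempo:v2.10.3-75f9ba4",
        "grafana/tempo:2.10.1",
    ):
        print(parse_image(ref))
```

A reference whose repository starts with `registry.ops.eblu.me/` would then count as locally built, while one with no build hash, such as the old `grafana/tempo:2.10.1`, is most likely an upstream image.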

@@ -56,8 +56,8 @@ services:

   - name: nvidia-device-plugin
     type: argocd
-    last-reviewed: 2026-03-27
-    current-version: "v0.19.0"
+    last-reviewed: 2026-06-04
+    current-version: "v0.19.2"
     upstream-source: https://github.com/NVIDIA/k8s-device-plugin/releases
     notes: DaemonSet + RuntimeClass on ringtail for GPU workloads
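
The description says the deferred services were not date-bumped "so they resurface". The review tooling itself is not shown in this PR, so the following Python sketch is only a guess at that behaviour: order entries by `last-reviewed`, with never-reviewed entries first. The function name `review_order` and every date other than the nvidia-device-plugin one are hypothetical.

```python
from datetime import date
from typing import Optional


def review_order(services: list[dict]) -> list[str]:
    """Return service names, never-reviewed first, then oldest review first."""

    def key(svc: dict) -> tuple[bool, date]:
        reviewed: Optional[date] = svc.get("last-reviewed")
        # False sorts before True, so entries without a date come first.
        return (reviewed is not None, reviewed or date.min)

    return [svc["name"] for svc in sorted(services, key=key)]


if __name__ == "__main__":
    services = [
        # Date taken from this PR's diff.
        {"name": "nvidia-device-plugin", "last-reviewed": date(2026, 6, 4)},
        # Hypothetical dates, for illustration only.
        {"name": "prometheus", "last-reviewed": date(2026, 3, 1)},
        {"name": "loki", "last-reviewed": None},
    ]
    print(review_order(services))
```

Under this model, bumping `last-reviewed` to 2026-06-04 moves nvidia-device-plugin to the back of the queue, while an entry whose date is left alone keeps its place near the front.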