blumeops/service-versions.yaml

# Service / Tooling / Application Version Tracking
#
# Tracks when each BlumeOps service was last reviewed for version freshness.
# Used by `mise run service-review` to surface stale services.
#
# Fields:
#   name            - kebab-case service identifier
#   type            - argocd | ansible | nixos | fly | mise
#   parent          - optional; name of the service this entry is attached to
#   last-reviewed   - date (YYYY-MM-DD) or null
#   current-version - deployed version string or null
#   upstream-source - URL to upstream releases/changelog, or null
#   notes           - optional context
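#
# Example entry (a sketch only: `example-service`, its version, and its URL
# are hypothetical placeholders, not a real BlumeOps service):
#   - name: example-service
#     type: argocd
#     last-reviewed: 2026-01-01
#     current-version: "v1.2.3"
#     upstream-source: https://example.com/example-service/releases
#     notes: Optional free-form context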
services:
- name: prometheus
  type: argocd
  last-reviewed: 2026-03-18
  current-version: "v3.10.0"
  upstream-source: https://github.com/prometheus/prometheus/releases
- name: loki
  type: argocd
  last-reviewed: 2026-03-20
  current-version: "3.6.7"
  upstream-source: https://github.com/grafana/loki/releases
- name: kube-state-metrics
  type: argocd
  last-reviewed: 2026-03-22
  current-version: "v2.18.0"
  upstream-source: https://github.com/kubernetes/kube-state-metrics/releases
- name: ntfy
  type: argocd
  last-reviewed: 2026-03-23
  current-version: "v2.19.2"
  upstream-source: https://github.com/binwiederhier/ntfy/releases
- name: homepage
  type: argocd
  last-reviewed: 2026-03-26
  current-version: "v1.11.0"
  upstream-source: https://github.com/gethomepage/homepage/releases
  notes: Custom container, kustomize manifests
- name: nvidia-device-plugin
  type: argocd
  last-reviewed: 2026-03-27
  current-version: "v0.19.0"
  upstream-source: https://github.com/NVIDIA/k8s-device-plugin/releases
  notes: DaemonSet + RuntimeClass on ringtail for GPU workloads
- name: frigate
  type: argocd
  last-reviewed: 2026-03-24
  current-version: "0.17.1"
  upstream-source: https://github.com/blakeblackshear/frigate/releases
- name: frigate-notify
  type: argocd
  last-reviewed: 2026-03-28
  current-version: "v0.5.4"
  upstream-source: https://github.com/0x2142/frigate-notify/releases
- name: tempo
  type: argocd
  last-reviewed: 2026-04-02
  current-version: "2.10.3"
  upstream-source: https://github.com/grafana/tempo/releases
  notes: Home-built container from forge mirror
- name: alloy-tracing-ringtail
  type: argocd
  last-reviewed: 2026-03-13
  current-version: "v1.14.0"
  upstream-source: https://github.com/grafana/alloy/releases
  notes: Privileged DaemonSet with Beyla eBPF for HTTP tracing on ringtail
- name: alloy-ringtail
  type: argocd
  last-reviewed: 2026-03-13
  current-version: "v1.14.0"
  upstream-source: https://github.com/grafana/alloy/releases
  notes: DaemonSet on ringtail for host metrics and pod logs
- name: alloy-k8s
  type: argocd
  last-reviewed: 2026-03-13
  current-version: "v1.14.0"
  upstream-source: https://github.com/grafana/alloy/releases
- name: tailscale-operator
  type: argocd
  last-reviewed: 2026-03-22
  current-version: "v1.94.2"
  upstream-source: https://github.com/tailscale/tailscale/releases
- name: grafana
  type: argocd
  last-reviewed: 2026-04-02
  current-version: "12.4.2"
  upstream-source: https://github.com/grafana/grafana/releases
  notes: Home-built container from Alpine; upgraded from Helm to Kustomize
- name: grafana-sidecar
  type: argocd
  parent: grafana
  last-reviewed: "2026-04-13"
  current-version: "2.6.0"
  upstream-source: https://github.com/kiwigrid/k8s-sidecar/releases
  notes: Dashboard ConfigMap watcher sidecar in grafana deployment
- name: cloudnative-pg
  type: argocd
  last-reviewed: 2026-03-28
  current-version: "v1.28.1"
  upstream-source: https://github.com/cloudnative-pg/cloudnative-pg/releases
  notes: Deployed via Helm chart (chart v0.27.1 from forge mirror)
- name: immich
  type: argocd
  last-reviewed: 2026-04-04
  current-version: "v2.6.3"
  upstream-source: https://github.com/immich-app/immich/releases
  notes: Kustomize manifests with upstream images
- name: external-secrets
  type: argocd
  last-reviewed: 2026-03-25
  current-version: "v2.2.0"
  upstream-source: https://github.com/external-secrets/external-secrets/releases
  notes: Static kustomize manifests rendered from upstream Helm chart
- name: 1password-connect
  type: argocd
  last-reviewed: 2026-04-06
  current-version: "1.8.2"
  upstream-source: https://hub.docker.com/r/1password/connect-api/tags
  notes: Kustomize manifests rendered from connect-helm-charts v2.4.1
- name: argocd
  type: argocd
  last-reviewed: 2026-04-07
  current-version: "v3.3.6"
  upstream-source: https://github.com/argoproj/argo-cd/releases
  notes: Kustomize-based install with ServerSideApply
- name: blumeops-pg
  type: argocd
  last-reviewed: 2026-03-28
  current-version: "18.3"
  upstream-source: https://github.com/cloudnative-pg/cloudnative-pg/releases
  notes: CloudNativePG Cluster resource; pinned to PG minor version
- name: authentik
  type: argocd
  last-reviewed: "2026-04-08"
  current-version: "2026.2.2"
  upstream-source: https://github.com/goauthentik/authentik/releases
- name: authentik-redis
  type: argocd
  parent: authentik
  last-reviewed: "2026-03-24"
  current-version: "8.2.3"
  upstream-source: https://github.com/redis/redis/releases
  notes: >-
    Attached service: Redis cache/broker for Authentik (sessions, Celery task
    queue, caching). Nix-built container from nixpkgs with version assertion.
- name: ollama
  type: argocd
  last-reviewed: "2026-04-09"
  current-version: "0.20.4"
  upstream-source: https://github.com/ollama/ollama/releases
  notes: LLM inference server on ringtail (GPU); upstream container image
- name: navidrome
  type: argocd
  last-reviewed: 2026-04-11
  current-version: "v0.61.1"
  upstream-source: https://github.com/navidrome/navidrome/releases
- name: miniflux
  type: argocd
  last-reviewed: 2026-04-12
  current-version: "2.2.19"
  upstream-source: https://github.com/miniflux/v2/releases
- name: teslamate
  type: argocd
  last-reviewed: 2026-04-14
  current-version: "v3.0.0"
  upstream-source: https://github.com/teslamate-org/teslamate/releases
- name: transmission
  type: argocd
  last-reviewed: 2026-04-15
  current-version: "4.1.1-r1"
  upstream-source: https://github.com/transmission/transmission/releases
- name: transmission-exporter
  type: argocd
  last-reviewed: 2026-04-15
  current-version: "1.0.1"
  upstream-source: null
  notes: Homegrown Python exporter, no upstream
- name: kiwix
  type: argocd
  last-reviewed: 2026-04-17
  current-version: "3.8.2"
  upstream-source: https://github.com/kiwix/kiwix-tools/releases
- name: devpi
Migrate devpi from minikube to indri (launchd) (#341) ## Summary Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). It moves to indri as a native LaunchAgent. ## What changed - **New ansible role** `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run via 1Password root password, runs as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. Bootstraps from upstream PyPI (so devpi can install itself on a fresh box). - **Caddy**: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`. - **Playbook**: `indri.yml` gains pre_tasks for the root password and the new role. - **service-versions.yaml**: devpi flipped from `type: argocd` to `type: ansible`. - **ArgoCD**: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted. - **Docs**: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list. ## Already deployed (live on indri) - Service running: `launchctl list mcquack.eblume.devpi` → PID 53888 - `curl https://pypi.ops.eblu.me/+api` returns 200 ✅ - `mise run docs-mikado` works again ✅ - 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/` - Minikube namespace and PVC fully reclaimed ## Test plan - [ ] `mise run services-check` (after merge) - [ ] CI workflows that use devpi succeed - [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines) ## Context This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent — it was breaking the toolchain — and (b) it shakes out the migration recipe before we commit to a multi-leaf chain. 
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/341
2026-04-29 13:38:36 -07:00
type: ansible
last-reviewed: 2026-04-29
current-version: "6.19.3"
upstream-source: https://github.com/devpi/devpi/releases
notes: Installed via uv into a venv on indri; version pinned in ansible/roles/devpi/defaults/main.yml
- name: cv
type: argocd
last-reviewed: 2026-04-27
current-version: "1.0.3"
upstream-source: https://forge.eblu.me/eblume/cv
notes: Personal static site; review build deps (WeasyPrint, Jinja2) in source repo
- name: docs
type: argocd
last-reviewed: 2026-03-07
current-version: "1.28.2"
upstream-source: https://github.com/jackyzha0/quartz/releases
notes: Quartz static site generator; container version tracks nginx base
- name: forgejo-runner
type: argocd
last-reviewed: 2026-04-20
current-version: "12.8.2"
upstream-source: https://code.forgejo.org/forgejo/runner/releases
notes: >-
Runner daemon version (code.forgejo.org/forgejo/runner). Job execution
image is tracked separately as runner-job-image.
- name: runner-job-image
type: argocd
last-reviewed: 2026-04-21
current-version: "0.20.6"
upstream-source: https://github.com/dagger/dagger/releases
notes: >-
Forgejo Actions job execution image. CONTAINER_APP_VERSION tracks the
Dagger CLI version, the primary build tool in the image.
- name: nix-container-builder
type: nixos
last-reviewed: 2026-04-01
current-version: "12.7.2"
upstream-source: https://code.forgejo.org/forgejo/runner/releases
notes: >-
Forgejo runner on ringtail; pinned via nixpkgs-services overlay in flake.nix.
Update nixpkgs-services rev during service reviews, not via nix flake update.
- name: snowflake-proxy
type: nixos
last-reviewed: 2026-04-01
current-version: "2.11.0"
upstream-source: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/releases
notes: >-
Tor Snowflake proxy on ringtail; pinned via nixpkgs-services overlay in flake.nix.
Anti-censorship bridge, not an exit node.
- name: k3s
type: nixos
last-reviewed: 2026-04-01
current-version: "1.34.5+k3s1"
upstream-source: https://github.com/k3s-io/k3s/releases
notes: >-
Single-node k3s cluster on ringtail; pinned via nixpkgs-services overlay in flake.nix.
Update nixpkgs-services rev during service reviews.
- name: minikube
type: ansible
last-reviewed: 2026-04-01
current-version: "1.38.0"
upstream-source: https://github.com/kubernetes/minikube/releases
notes: >-
Single-node minikube on indri; installed via homebrew (not version-pinned).
Homebrew may silently upgrade on brew update/upgrade.
- name: mealie
type: argocd
last-reviewed: 2026-03-16
current-version: "v3.12.0"
upstream-source: https://github.com/mealie-recipes/mealie/releases
notes: Recipe manager; built from source via forge mirror
- name: paperless
type: argocd
last-reviewed: 2026-04-08
current-version: "v2.20.13"
upstream-source: https://github.com/paperless-ngx/paperless-ngx/releases
notes: Document management; built from source via forge mirror
- name: unpoller
type: argocd
last-reviewed: 2026-03-16
current-version: "v2.34.0"
upstream-source: https://github.com/unpoller/unpoller/releases
notes: UniFi metrics exporter for Prometheus
- name: prowler
type: argocd
last-reviewed: 2026-04-14
current-version: "5.23.0"
upstream-source: https://github.com/prowler-cloud/prowler/releases
notes: CIS Kubernetes Benchmark scanner; weekly CronJob on minikube-indri
- name: kingfisher
type: argocd
last-reviewed: 2026-03-29
current-version: "165768b"
upstream-source: https://github.com/mongodb/kingfisher/releases
notes: Secret scanner; sporked from upstream with --clone-url-base patch. Version is upstream main SHA.
- name: forgejo
type: ansible
last-reviewed: 2026-03-28
current-version: "14.0.3"
upstream-source: https://codeberg.org/forgejo/forgejo/releases
notes: Built from source on indri (~/code/3rd/forgejo)
- name: alloy
type: ansible
last-reviewed: 2026-03-13
current-version: "v1.14.0"
upstream-source: https://github.com/grafana/alloy/releases
notes: Built from source on indri
- name: zot
type: ansible
last-reviewed: 2026-03-14
current-version: "v2.1.15"
upstream-source: https://github.com/project-zot/zot/releases
notes: Built from source on indri
- name: caddy
type: ansible
last-reviewed: 2026-03-15
current-version: "v2.11.2"
upstream-source: https://github.com/caddyserver/caddy/releases
notes: Built from source with Gandi DNS and Layer 4 plugins
- name: borgmatic
type: ansible
last-reviewed: 2026-04-15
current-version: "2.1.4"
upstream-source: https://github.com/borgmatic-collective/borgmatic/releases
notes: Installed via mise (pipx); version pinned in ansible/roles/borgmatic/defaults/main.yml and mise.toml
- name: jellyfin
type: ansible
last-reviewed: 2026-03-17
current-version: "10.11.6"
upstream-source: https://github.com/jellyfin/jellyfin/releases
- name: automounter
type: ansible
last-reviewed: 2026-03-17
current-version: "1.11.0"
upstream-source: https://www.pixeleyes.co.nz/automounter/
notes: Mac App Store app, no Ansible role. Updates via App Store.
- name: flyio-tailscale
type: fly
last-reviewed: 2026-04-10
current-version: "v1.94.1"
upstream-source: https://github.com/tailscale/tailscale/releases
notes: >-
Pinned after v1.96.5 broke MagicDNS in containers. Test DNS resolution
inside Fly container before upgrading. COPY --from in fly/Dockerfile.
- name: flyio-nginx
type: fly
last-reviewed: 2026-04-10
current-version: "1.29.6-alpine"
upstream-source: https://hub.docker.com/_/nginx
notes: Base image for Fly proxy (fly/Dockerfile)
- name: flyio-alloy
type: fly
parent: flyio-nginx
last-reviewed: 2026-04-10
current-version: "v1.14.1"
upstream-source: https://github.com/grafana/alloy/releases
notes: COPY --from in fly/Dockerfile for log shipping and metrics
- name: dagger
type: mise
last-reviewed: 2026-04-21
current-version: "0.20.6"
upstream-source: https://github.com/dagger/dagger/releases
notes: Dagger CI/CD engine; pinned in mise.toml
- name: ansible-core
type: mise
last-reviewed: 2026-04-12
current-version: "2.20.1"
upstream-source: https://github.com/ansible/ansible/releases
notes: Installed via pipx/uvx with botocore and boto3
- name: prek
type: mise
last-reviewed: 2026-04-12
current-version: "0.3.4"
upstream-source: https://github.com/j178/prek/releases
notes: Pre-commit hook runner (Rust reimplementation)
- name: pulumi-cli
type: mise
last-reviewed: 2026-04-12
current-version: "3.215.0"
upstream-source: https://github.com/pulumi/pulumi/releases
notes: IaC CLI for tailscale and gandi stacks
- name: ty
type: mise
last-reviewed: 2026-04-12
current-version: "0.0.29"
upstream-source: https://github.com/astral-sh/ty/releases
notes: Astral Python typechecker (beta); prek hook