The public forge.eblu.me now black-holes /mirrors/ at the Fly edge
(AI-scraper mitigation), so the in-cluster ArgoCD repo-server got a 403
fetching the upstream operator manifest — leaving tailscale-operator and
tailscale-operator-ringtail in Unknown sync. Use forge.ops.eblu.me.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary
- Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools
- Reuses TLS connections through the Tailscale tunnel instead of handshaking per request
- Add `mise run fly-reload` for nginx config reload without full redeploy (re-resolves upstream DNS)
## Trade-off
DNS is resolved at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule), `mise run fly-reload` is needed. A Grafana alert will be added to detect this.
## Still TODO on this branch
- [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder)
- [ ] Docs pass
- [ ] Deploy from branch and verify latency improvement
- [ ] Changelog fragment
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #337
v1.96.3 exists as a GitHub release but Docker Hub images for both
tailscale/tailscale and tailscale/k8s-operator haven't been published
yet (v1.94.2 is still latest). Revert the image tags; the fly/start.sh
`tailscale wait` improvement and review date stamps are retained.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Bump Tailscale operator, proxy containers, and init containers from v1.94.2 to v1.96.3 across both clusters (indri + ringtail via shared base kustomization)
- Replace hand-rolled `until tailscale status` polling loop in `fly/start.sh` with `tailscale wait --timeout 60s` (new in v1.96.2)
- Stamp kube-state-metrics review date (already current at v2.18.0)
## Notable upstream changes (v1.94.2 → v1.96.3)
- Go upgraded from 1.25 to 1.26
- `tailscale wait` command — blocks until daemon is running + interface has IP
- AuthKey policy now applies only when users are not logged in (behavioral change)
- Peer Relay improvements (metrics, EC2 IMDS, UDP socket scaling)
- UPnP stability fixes
## Deploy plan
1. Merge PR
2. Sync tailscale-operator on indri: `argocd app sync tailscale-operator`
3. Sync tailscale-operator on ringtail: `argocd app sync tailscale-operator-ringtail --server ringtail...`
4. Verify proxy pods roll with new image: `kubectl --context=minikube-indri -n tailscale get pods`
5. Verify ingress connectivity (spot-check a few `*.tail8d86e.ts.net` services)
6. Rebuild + deploy Fly proxy container (separate step, picks up `tailscale wait` change)
## Test plan
- [ ] ArgoCD diff looks clean for both apps before sync
- [ ] Proxy pods on indri come up healthy with v1.96.3 images
- [ ] Proxy pods on ringtail come up healthy with v1.96.3 images
- [ ] Tailscale ingress services remain reachable (e.g., grafana, prometheus)
- [ ] Fly proxy rebuild deploys successfully with `tailscale wait`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #304
## Summary
- Mirrors `tailscale/tailscale` on forge (`mirrors/tailscale`)
- Replaces vendored `operator.yaml` (495 KB / 5,386 lines) with ArgoCD apps sourcing the upstream static manifest, pinned via `targetRevision: v1.94.2`
- Adds `tailscale-operator-base` app for indri and `tailscale-operator-base-ringtail` for ringtail
- Local kustomization retains only ProxyClass and DNSConfig custom resources
- Updates `[[tailscale-operator]]` doc to reflect new sourcing
## Deployment and Testing
- [ ] Register `mirrors/tailscale` repo in ArgoCD (it needs to know about the new repo)
- [ ] Sync `apps` app to pick up the new `tailscale-operator-base` app definitions
- [ ] Sync `tailscale-operator-base` — verify CRDs, RBAC, operator Deployment come up
- [ ] Sync `tailscale-operator` — verify ProxyClass, DNSConfig still apply cleanly
- [ ] Verify existing Tailscale Ingresses still work (ProxyGroup pods healthy)
- [ ] Repeat for ringtail cluster
- [ ] After merge: apps already point at tags, no revision reset needed
Reviewed-on: #295
Bare image references in manifests were ambiguous — unclear whether the
tag was intentionally omitted or managed by kustomize. Add :kustomized
sentinel to all 37 image refs overridden by kustomize images transformer.
Add sync notes for tailscale-operator proxyclass (CRD fields not processed
by kustomize). Mark devpi reviewed (6.19.1 is current).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Move hardcoded image tags to kustomization.yaml `images:` transformer across **22 services** — image names in manifests become version-agnostic templates, with tags centralized in one place per service
- Replace hand-written ConfigMap manifests with `configMapGenerator:` in **12 services** — config data extracted to standalone files, generated ConfigMaps include content hashes that trigger automatic pod rollouts on changes
- Create new `kustomization.yaml` for **forgejo-runner** and **nvidia-device-plugin** (switches ArgoCD from directory mode to kustomize mode, rendered output identical)
### Services modified
**Images only (8):** cv, devpi, docs, kube-state-metrics, miniflux, navidrome, teslamate, torrent
**Images + configMapGenerator (10):** alloy-k8s, forgejo-runner, frigate, grafana, homepage, kiwix, loki, mosquitto, ntfy, prometheus
**Images only, no configMapGenerator (4):** authentik (skip blueprints — special YAML tags), tailscale-operator-base (Deployment only, CRD image fields left as-is)
**Skipped entirely (6):** argocd (remote upstream), databases (no image fields), external-secrets, grafana-config (cross-kustomization dashboards), immich (Helm-managed), 1password-connect/cloudnative-pg (no kustomization.yaml)
### What changes at deploy time
- **images:** — no functional diff, `kustomize build` produces identical output with tags
- **configMapGenerator:** — ConfigMap names gain hash suffixes (e.g., `prometheus-config` → `prometheus-config-6f42fhctcb`) and all Deployment/StatefulSet/DaemonSet references are updated automatically. Pods will restart once per service on first sync due to the name change
## Test plan
- [x] `kubectl kustomize` builds all 30 service directories successfully
- [x] Image tags verified in rendered output for all modified services
- [x] ConfigMap hash suffixes verified in rendered output
- [x] ConfigMap references in Deployments/StatefulSets confirmed to use hashed names
- [x] All pre-commit hooks pass (yamllint, shellcheck, prettier, etc.)
- [ ] `argocd app diff` each service to confirm only expected ConfigMap name changes
- [ ] Deploy from branch starting with a low-risk service (e.g., mosquitto)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/264