Switch Fly proxy to upstream keepalive pools #337

Merged
eblume merged 6 commits from fly-proxy-keepalive into main 2026-04-17 16:39:52 -07:00

5aa4cb403a Bump ProxyGroup ingress pod resource requests
Increase from 1m CPU / 1Mi memory to 100m CPU / 128Mi memory. The
ingress pods handle TLS termination for all 19 Tailscale Ingress
services. With the previous near-zero requests they may have been
starved of CPU under contention (CPU shares are weighted by requests)
or been among the first pods evicted under node memory pressure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 16:30:49 -07:00
f5ba7f03aa Add Grafana alert for Fly proxy upstream unreachable (502 rate)
Fires when more than 50% of requests return 502 for at least 3
minutes, which usually indicates stale upstream DNS after a Tailscale
Ingress pod restart. The alert message includes the fix:
`mise run fly-reload`.
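
Grafana evaluates a rule like this as a threshold on a 502-ratio query combined with a 3-minute pending period. The Python below is only a sketch of that evaluation logic, not the actual rule: it assumes per-minute samples of (total requests, 502 count), and the function name and sample shape are hypothetical.

```python
def should_fire(
    samples: list[tuple[int, int]],
    threshold: float = 0.5,
    pending_minutes: int = 3,
) -> bool:
    """Return True if every one of the last `pending_minutes` samples breaches.

    Each sample is (total_requests, count_502) for one minute, oldest first.
    Hypothetical shape; the real rule is a Grafana query plus pending period.
    """
    if len(samples) < pending_minutes:
        return False  # not enough history to satisfy the pending period yet
    recent = samples[-pending_minutes:]
    # Strictly greater than the threshold; a minute with no traffic cannot breach.
    return all(total > 0 and bad / total > threshold for total, bad in recent)


print(should_fire([(100, 10), (100, 60), (100, 70), (100, 90)]))  # True
print(should_fire([(100, 60), (100, 40), (100, 90)]))  # False
```

Requiring consecutive breaching minutes, rather than a single bad sample, keeps a brief burst of 502s during a normal deploy from paging.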

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 16:08:12 -07:00
a700befd5b Docs: update proxy architecture for upstream keepalive
Update flyio-proxy, forgejo, routing, manage-flyio-proxy,
expose-service-publicly, and mise-tasks docs to reflect:

- Upstream keepalive pools replacing variable-based proxy_pass
- proxy_ssl_name requirement for upstream blocks
- MagicDNS readiness check in start.sh
- fly-reload task for DNS re-resolution
- Crawler mitigation (robots.txt, archive redirect, release caching)
- Forgejo /metrics endpoint and archive cleanup cron
- cv.eblu.me in routing and exposed services tables
- upstream_response_time histogram metric
- Changelog fragment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 16:04:54 -07:00
903db4079d Fix upstream keepalive: set proxy_ssl_name for correct SNI
With upstream blocks, nginx sends the upstream block name as SNI
(proxy_ssl_name defaults to $proxy_host) instead of the backend's real
hostname. The Tailscale Ingress proxy routes TLS connections by SNI, so
it needs the real name. Add an explicit proxy_ssl_name for each
upstream, and set the Host header (which also defaults to $proxy_host)
for the docs/cv backends.
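
SNI is simply whatever name the client hands to its TLS layer, and it travels in plaintext in the ClientHello. The Python sketch below is not nginx; it only illustrates the failure mode by capturing a ClientHello in memory (no network) for two names. Both names are hypothetical: "forgejo" stands in for an upstream block name and "forge.example.ts.net" for a real Tailscale Ingress hostname.

```python
import ssl


def client_hello_for(sni_name: str) -> bytes:
    """Return the raw TLS ClientHello a client emits when it sends `sni_name` as SNI."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    # server_hostname is what ends up in the ClientHello's server_name extension.
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=sni_name)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: no server answers, but the ClientHello is already written
    return outgoing.read()


upstream_block_name = "forgejo"            # hypothetical upstream block name
real_hostname = "forge.example.ts.net"     # hypothetical Ingress hostname

hello_wrong = client_hello_for(upstream_block_name)
hello_right = client_hello_for(real_hostname)

print(upstream_block_name.encode() in hello_wrong)  # True
print(real_hostname.encode() in hello_wrong)  # False
```

A proxy that routes on SNI only ever sees the name in that ClientHello, which is why the block name alone cannot reach the right backend.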

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:51:51 -07:00
1236d381eb Wait for MagicDNS readiness before starting nginx
Upstream blocks resolve DNS once, at config load. If MagicDNS isn't
ready yet (Tailscale has only just connected), resolution comes back
empty and nginx returns 502s. Poll nslookup until resolution succeeds
before launching nginx.
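
start.sh does this with nslookup; the following is a Python sketch of the same poll-until-resolves pattern, with the resolver injectable so it can be exercised without MagicDNS. The function name, timeout, and interval are assumptions, not values taken from start.sh.

```python
import socket
import time
from typing import Callable


def wait_for_dns(
    hostname: str,
    timeout: float = 60.0,
    interval: float = 1.0,
    resolve: Callable[[str], object] = lambda h: socket.getaddrinfo(h, 443),
) -> bool:
    """Poll until `hostname` resolves or `timeout` seconds elapse.

    Returns True once resolution yields a non-empty result, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if resolve(hostname):
                return True
        except socket.gaierror:
            pass  # resolver not answering for this name yet; retry
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Returning False instead of looping forever is a design choice of this sketch: the caller can then decide whether to exit or start nginx anyway.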

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:47:21 -07:00
6a1d9cc0bf Switch Fly proxy to upstream keepalive pools
Replace per-request DNS resolution (variable-based proxy_pass) with
static upstream blocks and keepalive connection pools. This reuses
TLS connections through the Tailscale tunnel instead of doing a fresh
TCP and TLS handshake for every request, which should significantly
reduce latency at rates above 1 req/s.
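
A sketch of the resulting nginx shape, rendered from Python so it can be checked. The upstream name, hostname, and pool size of 16 are hypothetical; the directives themselves (keepalive, proxy_http_version 1.1 with a cleared Connection header, proxy_ssl_server_name, proxy_ssl_name) are standard nginx, and the proxy_ssl_name and Host lines reflect the SNI fix from a later commit in this PR.

```python
def render_proxy_config(name: str, host: str, keepalive: int = 16, set_host: bool = False) -> str:
    """Render an nginx upstream block with a keepalive pool, plus a location using it.

    `name` is the upstream block name; `host` is the real backend hostname.
    """
    lines = [
        f"upstream {name} {{",
        # Resolved once at config load; this is the trade-off described below.
        f"    server {host}:443;",
        # Max idle connections each worker process keeps open to this upstream.
        f"    keepalive {keepalive};",
        "}",
        "",
        "location / {",
        f"    proxy_pass https://{name};",
        # Upstream keepalive requires HTTP/1.1 and a cleared Connection header.
        "    proxy_http_version 1.1;",
        '    proxy_set_header Connection "";',
        # Send SNI at all, and make it the real hostname rather than the block name.
        "    proxy_ssl_server_name on;",
        f"    proxy_ssl_name {host};",
    ]
    if set_host:
        # Host also defaults to $proxy_host (the block name); override it when needed.
        lines.append(f"    proxy_set_header Host {host};")
    lines.append("}")
    return "\n".join(lines) + "\n"


config = render_proxy_config("forgejo", "forge.example.ts.net")
print(config)
```

Without the `Connection ""` override, nginx would forward `Connection: close` to the upstream and every request would still open a new connection, defeating the pool.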

Trade-off: DNS is resolved at config load, not per request. If
Tailscale Ingress pods get new IPs, run `mise run fly-reload` to
re-resolve.

Also adds mise-tasks/fly-reload, which reloads the nginx config
without a full redeploy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:42:57 -07:00