Three changes to eliminate 502s during proxy deploys:

1. Start nginx after Tailscale connects (not before) so MagicDNS is
   always available when the first request arrives. This is the
   community-recommended pattern for Tailscale sidecars on Fly.io.

2. Switch deploy strategy to bluegreen: the old machine keeps serving
   traffic until the new one passes health checks, then Fly.io cuts
   over. Rolling deploys with a single machine always cause downtime.

3. Replace top-level [checks] with [[http_service.checks]]. Top-level
   checks only monitor; they don't gate traffic routing. Service-level
   checks tell the Fly Proxy to hold traffic until the app is ready.

The sentinel file (/tmp/tailscale-ready) and nginx if-check are removed
since nginx no longer starts before Tailscale.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
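Changes 2 and 3 live in the Fly.io config rather than in the script, so they are easy to regress. A minimal sketch of a pre-deploy guard follows, assuming the config is a fly.toml with `strategy` set under `[deploy]`. The function name `check_fly_config` and its messages are hypothetical and not part of this commit.

```shell
#!/bin/sh
# Hypothetical pre-deploy guard: checks that a fly.toml follows the
# settings described in the commit message. Names are illustrative.
check_fly_config() {
    config="$1"
    status=0

    # Change 2: the deploy strategy should be bluegreen.
    if ! grep -Eq '^[[:space:]]*strategy[[:space:]]*=[[:space:]]*"bluegreen"' "$config"; then
        echo "missing: strategy = \"bluegreen\"" >&2
        status=1
    fi

    # Change 3: checks should be declared under http_service.
    if ! grep -Fq '[[http_service.checks]]' "$config"; then
        echo "missing: [[http_service.checks]]" >&2
        status=1
    fi

    # Change 3: a top-level [checks] table only monitors, so flag it.
    if grep -Eq '^[[:space:]]*\[checks(\.[^]]*)?\]' "$config"; then
        echo "found top-level [checks]" >&2
        status=1
    fi

    return "$status"
}
```

One possible use is `check_fly_config fly.toml && fly deploy`, so a config that has drifted back to monitor-only checks stops the deploy before any machine is replaced.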
27 lines
915 B
Bash
#!/bin/sh
set -e

# Connect to tailnet first — nginx needs MagicDNS for upstream resolution.
# With bluegreen deploys, the old machine serves traffic until this one is
# fully ready. Fly.io runs Firecracker microVMs that support TUN devices
# natively — no need for --tun=userspace-networking.
tailscaled --statedir=/var/lib/tailscale &
sleep 2

tailscale up --authkey="${TS_AUTHKEY}" --hostname=flyio-proxy
until tailscale status > /dev/null 2>&1; do sleep 1; done
echo "Tailscale connected"

# Start nginx — MagicDNS is available, health check passes immediately.
nginx -g "daemon off;" &
NGINX_PID=$!
echo "Nginx started"

# Start Alloy for observability (logs → Loki, metrics → Prometheus)
alloy run /etc/alloy/config.alloy \
    --server.http.listen-addr=127.0.0.1:12345 \
    --storage.path=/tmp/alloy-data &
echo "Alloy started"

# Block on nginx — container exits if nginx stops
wait $NGINX_PID
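
The script waits for Tailscale with a fixed `sleep 2` and an unbounded `until` loop. If the tailnet never comes up, the machine hangs instead of failing its health checks quickly. A bounded retry helper is one way to make that failure explicit. The sketch below is an illustration only: `wait_until` is a hypothetical name, and the retry count in the usage note is an assumed value, not one taken from this commit.

```shell
#!/bin/sh
# wait_until MAX_TRIES CMD [ARGS...]
# Runs CMD about once per second until it succeeds or MAX_TRIES attempts
# have been used. Returns 0 on success and 1 on timeout.
# Hypothetical helper, not part of the original script.
wait_until() {
    tries="$1"
    shift
    while [ "$tries" -gt 0 ]; do
        # Running CMD as an if-condition keeps `set -e` from aborting
        # the script when an attempt fails.
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # Skip the sleep once the attempt budget is exhausted.
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

In the entrypoint this could replace the `until` loop with something like `wait_until 30 tailscale status || exit 1`, so a machine that cannot join the tailnet exits and the bluegreen deploy leaves the old machine serving traffic.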