blumeops/fly/start.sh
Erich Blume fe0e913963
Switch Fly proxy to upstream keepalive pools (#337)
## Summary

- Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools
- Reuse TLS connections through the Tailscale tunnel instead of handshaking on every request
- Add `mise run fly-reload` to reload the nginx config without a full redeploy (this re-resolves upstream DNS)

## Trade-off

DNS is resolved at config load, not per request. If Tailscale Ingress pods get new IPs (restart, reschedule), nginx keeps using the old addresses until `mise run fly-reload` is run. A Grafana alert will be added to detect this.
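
The reload decision described above comes down to comparing the addresses nginx resolved at config load with what DNS returns now. A minimal sketch of that comparison follows; the `addrs_changed` helper and the sample addresses are hypothetical and not part of this repository, and how the two address lists are collected is left open.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): succeed if the set of
# resolved addresses differs between config load and now.
# $1 = addresses captured at config load
# $2 = addresses resolved now
# Both are whitespace-separated lists; order and duplicates are ignored.
addrs_changed() {
    old=$(printf '%s\n' $1 | sort -u)
    new=$(printf '%s\n' $2 | sort -u)
    [ "$old" != "$new" ]
}

# Made-up sample addresses, for illustration only.
if addrs_changed "100.64.0.10 100.64.0.11" "100.64.0.11 100.64.0.12"; then
    echo "upstream addresses changed: run 'mise run fly-reload'"
fi
```

Comparing sorted, de-duplicated sets avoids false positives when DNS returns the same addresses in a different order.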

## Still TODO on this branch

- [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder)
- [ ] Docs pass
- [ ] Deploy from branch and verify latency improvement
- [ ] Changelog fragment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #337
2026-04-17 16:39:52 -07:00


#!/bin/sh
set -e
# Connect to tailnet first — nginx needs MagicDNS for upstream resolution.
# With bluegreen deploys, the old machine serves traffic until this one is
# fully ready. Fly.io runs Firecracker microVMs that support TUN devices
# natively — no need for --tun=userspace-networking.
tailscaled --statedir=/var/lib/tailscale &
sleep 2
tailscale up --authkey="${TS_AUTHKEY}" --hostname=flyio-proxy
until tailscale status > /dev/null 2>&1; do sleep 1; done
echo "Tailscale connected"
# Wait for MagicDNS to be ready — upstream blocks resolve DNS at config
# load, so nginx will fail to start if MagicDNS can't resolve yet.
echo "Waiting for MagicDNS..."
until nslookup forge.tail8d86e.ts.net 100.100.100.100 > /dev/null 2>&1; do
    sleep 1
done
echo "MagicDNS ready"
# Ensure fail2ban deny file exists before nginx starts
touch /etc/nginx/forge-deny.conf
# Start nginx — MagicDNS is available, upstreams resolved.
nginx -g "daemon off;" &
NGINX_PID=$!
echo "Nginx started"
# Start fail2ban for login brute-force protection.
# Non-fatal — nginx rate limiting is the primary defense; fail2ban is additive.
if fail2ban-server -b; then
    echo "fail2ban started"
else
    echo "WARNING: fail2ban failed to start (nginx rate limiting still active)"
fi
# Start Alloy for observability (logs → Loki, metrics → Prometheus)
alloy run /etc/alloy/config.alloy \
    --server.http.listen-addr=127.0.0.1:12345 \
    --storage.path=/tmp/alloy-data &
echo "Alloy started"
# Block on nginx — container exits if nginx stops
wait "$NGINX_PID"
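
The `until ... do sleep 1; done` loops above retry forever, so a machine that never reaches the tailnet or MagicDNS hangs instead of failing the deploy. One possible alternative is a bounded retry helper, sketched below. The `wait_for` function and the attempt counts are assumptions for illustration, not something this script currently uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): run a command once per
# second until it succeeds, giving up after $1 failed attempts.
# Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
    attempts=$1
    shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Stand-in command so the sketch runs anywhere; in start.sh the command
# would be something like `tailscale status` or the nslookup probe.
if wait_for 3 true; then
    echo "ready"
else
    echo "gave up waiting" >&2
fi
```

With `set -e` in effect, calling `wait_for 30 tailscale status` as a bare command would abort the script on timeout, which lets the bluegreen deploy keep the old machine serving traffic.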