blumeops/fly/start.sh
Erich Blume 959b6842bc
Zero-downtime Fly.io deploys (#132)
## Summary
- Start nginx after Tailscale connects (community best practice for Tailscale sidecars)
- Switch to `bluegreen` deploy strategy — old machine serves until new one is healthy
- Replace top-level `[checks]` with `[[http_service.checks]]` — only service-level checks gate traffic routing ([confirmed by Fly.io staff](https://community.fly.io/t/clarifying-the-types-of-health-checks/20379))
- Remove sentinel file and nginx if-check (no longer needed)

Supersedes the approach in #131 — that helped (502 window dropped from ~30s to ~3s) but couldn't fully eliminate it because top-level checks don't gate routing and Fly.io's proxy sends traffic as soon as the port is reachable.

## Deployment and Testing
- [ ] Merge and run `fly deploy` from the `fly/` directory
- [ ] Verify deploy completes with zero 502s (watch `fly logs` and Grafana docs-apm)
- [ ] Confirm `fly checks list` shows the new service-level check passing
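
One way to make the "zero 502s" check measurable is to probe the public endpoint once per second while the deploy runs and count the 502 responses. The sketch below is only an illustration: the helper names (`probe`, `count_502s`, `watch_502s`) and the URL in the usage note are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helpers for watching a deploy; not part of blumeops.

# probe URL: print the HTTP status code of one GET request.
# curl reports 000 when the connection itself fails.
probe() {
    curl -s -o /dev/null -m 5 -w '%{http_code}\n' "$1" || true
}

# count_502s: read one status code per line on stdin, print how many were 502.
# grep -c exits non-zero when the count is 0, so the failure is swallowed.
count_502s() {
    grep -c '^502$' || true
}

# watch_502s URL SECONDS: probe URL roughly once per second for SECONDS
# iterations, then print the number of 502 responses seen.
watch_502s() {
    url=$1
    duration=$2
    i=0
    while [ "$i" -lt "$duration" ]; do
        probe "$url"
        sleep 1
        i=$((i + 1))
    done | count_502s
}
```

Started in a second terminal just before `fly deploy`, something like `watch_502s https://docs.example.com/ 120` (placeholder URL) should print `0` if the deploy really is zero-downtime.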

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/132
2026-02-09 11:34:19 -08:00

#!/bin/sh
set -e
# Connect to tailnet first — nginx needs MagicDNS for upstream resolution.
# With bluegreen deploys, the old machine serves traffic until this one is
# fully ready. Fly.io runs Firecracker microVMs that support TUN devices
# natively — no need for --tun=userspace-networking.
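# Run the daemon in the background; the short sleep gives it time to create
# its control socket before `tailscale up` talks to it.
tailscaled --statedir=/var/lib/tailscale &
sleep 2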
tailscale up --authkey="${TS_AUTHKEY}" --hostname=flyio-proxy
until tailscale status > /dev/null 2>&1; do sleep 1; done
echo "Tailscale connected"
# Start nginx — MagicDNS is available, health check passes immediately.
nginx -g "daemon off;" &
NGINX_PID=$!
echo "Nginx started"
# Start Alloy for observability (logs → Loki, metrics → Prometheus)
alloy run /etc/alloy/config.alloy \
    --server.http.listen-addr=127.0.0.1:12345 \
    --storage.path=/tmp/alloy-data &
echo "Alloy started"
# Block on nginx; the container exits when nginx stops.
wait "$NGINX_PID"
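
The script above waits with a fixed `sleep 2` and an unbounded `until` loop. If a bounded wait is preferred, so that a machine that never joins the tailnet fails its deploy instead of hanging, a small retry helper could look like the following. This is a sketch only: `wait_for` is a hypothetical name and the 30-second timeout in the usage note is an assumed value, not something taken from this repository.

```shell
#!/bin/sh
# Hypothetical helper; not part of start.sh.
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Retries COMMAND about once per second until it succeeds. Returns 1 once
# TIMEOUT_SECONDS retries have been used up without success.
wait_for() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@" > /dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            echo "timed out after ${timeout_s}s waiting for: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

With `set -e` in effect, a line such as `wait_for 30 tailscale status` in place of the `until` loop would abort the script on timeout, so nginx would never start on a machine without tailnet connectivity.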