Fix 502 errors during Fly.io proxy deploys #131

Merged
eblume merged 1 commit from fix/deploy-healthcheck-race into main 2026-02-09 11:07:37 -08:00
Owner

Summary

  • Health check (/healthz) now returns 503 until Tailscale is connected
  • start.sh creates /tmp/tailscale-ready sentinel after tailscale up succeeds
  • Fly.io keeps the old machine serving traffic during the ~7s startup window

Previously, nginx passed the health check as soon as it started, so Fly.io routed traffic to the new machine before MagicDNS was available. Every request then hit an upstream DNS timeout and returned a 502 until Tailscale connected.
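
For readers who want the shape of the readiness gate, here is a minimal sketch. It is not the actual `start.sh` from this PR. The function names `mark_ready_after` and `healthz_status` are hypothetical, and `healthz_status` only stands in for whatever check nginx performs against the sentinel file.

```shell
#!/bin/sh
# Hypothetical sketch of the sentinel-file readiness gate.
# Only the sentinel path comes from the PR; the function names are assumptions.

SENTINEL="${SENTINEL:-/tmp/tailscale-ready}"

# Run the given connect command and create the sentinel only if it succeeds.
mark_ready_after() {
    rm -f "$SENTINEL"   # always start in the not-ready state
    if "$@"; then
        touch "$SENTINEL"
        return 0
    fi
    echo "connect command failed; health check stays at 503" >&2
    return 1
}

# Stand-in for the health check decision: 200 once the sentinel exists, else 503.
healthz_status() {
    if [ -f "$SENTINEL" ]; then
        echo 200
    else
        echo 503
    fi
}
```

In a startup script this would be called as something like `mark_ready_after tailscale up` (plus whatever flags the deployment needs) before handing off to nginx. Removing the sentinel first means a restarted machine cannot report ready from a stale file.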

Deployment and Testing

  • Merge and fly deploy from fly/ directory
  • Verify deploy completes with zero 502s (check Grafana docs-apm dashboard)
  • Confirm health check transitions from 503 → 200 in fly logs
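
As a complement to reading `fly logs`, the 503 → 200 transition could also be observed by polling the endpoint. The sketch below is an assumption, not part of this PR: `poll_until_ready` and `probe_healthz` are hypothetical helpers, and the URL passed to the probe is a placeholder. Requests through Fly.io's public proxy will likely land on the old machine during a deploy, so the probe would need to reach the new machine directly to see the 503 phase.

```shell
#!/bin/sh
# Hypothetical polling helper; names and arguments are illustrative only.

# poll_until_ready MAX_TRIES DELAY_SECONDS PROBE_COMMAND [ARGS...]
# Runs the probe until it prints 200 or the attempts run out.
poll_until_ready() {
    max_tries=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        code=$("$@")
        echo "attempt $((i + 1)): $code"
        if [ "$code" = "200" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Print only the HTTP status code for the given URL.
probe_healthz() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

A call might look like `poll_until_ready 30 1 probe_healthz "$HEALTHZ_URL"`, where `HEALTHZ_URL` is a placeholder for an address that reaches the new machine's `/healthz`.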
The health check returned 200 immediately on nginx start, before
Tailscale connected. Fly.io routed traffic to the new machine with
a cold proxy cache and no MagicDNS, causing upstream DNS timeouts.

Defer the health check by returning 503 until a sentinel file
(/tmp/tailscale-ready) is created after Tailscale connects. This
keeps the old machine serving traffic during the startup window.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eblume merged commit bd61da4f85 into main 2026-02-09 11:07:37 -08:00
eblume referenced this pull request from a commit 2026-02-09 11:34:21 -08:00
Reference
eblume/blumeops!131