Switch Fly proxy to upstream keepalive pools (#337)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m37s
## Summary

- Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools
- Reuses TLS connections through the Tailscale tunnel instead of handshaking per request
- Add `mise run fly-reload` for nginx config reload without full redeploy (re-resolves upstream DNS)

## Trade-off

DNS is resolved at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule), `mise run fly-reload` is needed. A Grafana alert will be added to detect this.

## Still TODO on this branch

- [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder)
- [ ] Docs pass
- [ ] Deploy from branch and verify latency improvement
- [ ] Changelog fragment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #337
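The trade-off described above comes down to one comparison: the address nginx resolved at config load versus the address DNS returns now. A minimal sketch of that check follows. The function names, the example hostname, and the idea of recording the address at reload time are assumptions for illustration and are not part of this change.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: has the upstream address changed since nginx
# last loaded its config? All names and paths here are illustrative only.

# Resolve a hostname to its first address via getent.
resolve_first_ip() {
  getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# needs_reload RECORDED CURRENT
# Exit 0 when a reload would help: DNS currently returns an address and it
# differs from the one recorded at the last config load (or none was recorded).
# Exit 1 when the addresses match or when DNS returned nothing.
needs_reload() {
  local recorded="$1" current="$2"
  [ -n "$current" ] || return 1
  [ "$recorded" != "$current" ]
}

# Example wiring (commented out so sourcing this file has no side effects):
# current="$(resolve_first_ip my-service.example.ts.net)"
# if needs_reload "$(cat /tmp/upstream.ip 2>/dev/null)" "$current"; then
#   echo "upstream moved, run: mise run fly-reload"
# fi
```

A check like this could back the planned alert, but the alert design itself is not specified in this commit.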
parent 54b1cee950
commit fe0e913963
12 changed files with 229 additions and 102 deletions
@@ -1,7 +1,7 @@
 ---
 title: Manage Fly.io Proxy
-modified: 2026-02-08
-last-reviewed: 2026-03-07
+modified: 2026-04-17
+last-reviewed: 2026-04-17
 tags:
 - how-to
 - fly-io
@@ -23,6 +23,16 @@ mise run fly-deploy
 
 Pushes to `fly/` on main also trigger automatic deployment via the Forgejo CI workflow.
 
+## Reload Nginx (Re-resolve Upstream DNS)
+
+Nginx uses `upstream` blocks with keepalive connection pools. DNS is resolved at config load. If Tailscale Ingress pods get new IPs (restart, reschedule, minikube restart), reload nginx to re-resolve without a full redeploy:
+
+```bash
+mise run fly-reload
+```
+
+A Grafana alert fires when upstreams are unreachable, prompting this action. A full `fly-deploy` also re-resolves DNS (it replaces the container).
+
 ## Add a New Public Service
 
 See [[expose-service-publicly#Per-service setup]] for the full walkthrough. In short:
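The hunk above documents `mise run fly-reload` but does not show how the task is implemented. One plausible shape, assuming the task simply signals nginx inside the running machine over `fly ssh console`, is sketched below. The function names and the `DRY_RUN` switch are hypothetical; the app name is the one used in the troubleshooting commands.

```shell
#!/usr/bin/env bash
# Hypothetical reload helper; NOT the actual fly-reload task definition.

# App name as it appears in the troubleshooting commands; overridable.
APP="${APP:-blumeops-proxy}"

# reload_nginx APP
# Ask the nginx master process to reload its config. On reload nginx parses
# the config again, so hostnames in `upstream` blocks are resolved afresh.
# With DRY_RUN=1 the command is printed instead of executed.
reload_nginx() {
  local app="$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'fly ssh console -a %s -C "nginx -s reload"\n' "$app"
  else
    fly ssh console -a "$app" -C "nginx -s reload"
  fi
}

# Example (commented out so sourcing this file has no side effects):
# reload_nginx "$APP"
```

Running it with `DRY_RUN=1` prints the command without contacting Fly.io, which is a cheap way to review what would be executed.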
@@ -78,12 +88,16 @@ The auth key expires every 90 days. To rotate:
 
 ## Troubleshooting
 
-**502 Bad Gateway**: Check `fly logs` for nginx upstream errors. Verify the backend Tailscale service is running (`tailscale status` from inside the container via `fly ssh console`).
+**502 Bad Gateway after Tailscale Ingress restart**: Upstream DNS is stale. Run `mise run fly-reload` to re-resolve. This is the most common cause of 502s.
+
+**502 Bad Gateway on fresh deploy**: MagicDNS may not be ready when nginx starts. The `start.sh` script polls `nslookup` before launching nginx, but if it still fails, check that `tailscale status` is healthy inside the container.
 
 **Health check failing**: `fly ssh console -a blumeops-proxy` then `curl localhost:8080/healthz` to test locally.
 
 **TLS errors on custom domain**: Check cert status with `fly certs show <domain> -a blumeops-proxy`. Certs auto-provision via Let's Encrypt and may take a few minutes.
 
+**High latency (>1s p50)**: Likely lost keepalive — redeploy with `mise run fly-deploy`. Before the keepalive change (April 2026), per-request TLS handshakes through the WireGuard tunnel caused 35s+ p50 at >1 req/s.
+
 ## Related
 
 - [[flyio-proxy]] - Service reference card
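The fresh-deploy troubleshooting entry above says that `start.sh` polls `nslookup` before launching nginx. The script itself is not part of this excerpt, so the following is only a sketch of that kind of wait loop. `wait_for_dns`, the `RESOLVER` override, the example hostname, and the retry defaults are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical wait loop in the spirit of start.sh; not the real script.

# Command used to test resolution. Overridable so the loop can be exercised
# without network access.
RESOLVER="${RESOLVER:-nslookup}"

# wait_for_dns HOST [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as HOST resolves, 1 if every attempt fails.
wait_for_dns() {
  local host="$1" attempts="${2:-30}" delay="${3:-1}" i
  for ((i = 1; i <= attempts; i++)); do
    if "$RESOLVER" "$host" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example (commented out so sourcing this file has no side effects):
# wait_for_dns my-service.example.ts.net 30 1 && exec nginx -g 'daemon off;'
```

Bounding the number of attempts means a machine whose MagicDNS never comes up fails visibly instead of hanging before nginx starts.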