Switch Fly proxy to upstream keepalive pools (#337)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m37s
## Summary

- Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools
- Reuses TLS connections through the Tailscale tunnel instead of handshaking per request
- Add `mise run fly-reload` for nginx config reload without full redeploy (re-resolves upstream DNS)

## Trade-off

DNS is resolved at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule), `mise run fly-reload` is needed. A Grafana alert will be added to detect this.

## Still TODO on this branch

- [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder)
- [ ] Docs pass
- [ ] Deploy from branch and verify latency improvement
- [ ] Changelog fragment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #337
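The trade-off described above can be modelled in a few lines of Python. This is only a toy sketch, not nginx's actual behaviour: `PerRequestProxy`, `StaticUpstream`, the `lookup` function, the hostname, and the IP addresses are all hypothetical stand-ins, and the "resolve on every request" model simply follows this PR's description of variable-based `proxy_pass`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Resolver = Callable[[str], str]


@dataclass
class PerRequestProxy:
    """Toy model of variable-based proxy_pass: resolve on every request."""
    host: str
    resolve: Resolver

    def target(self) -> str:
        return self.resolve(self.host)


@dataclass
class StaticUpstream:
    """Toy model of a static upstream block: resolve once at config load."""
    host: str
    resolve: Resolver
    address: str = field(init=False)

    def __post_init__(self) -> None:
        self.reload()

    def reload(self) -> None:
        # Stands in for `mise run fly-reload`: re-resolve the upstream host.
        self.address = self.resolve(self.host)

    def target(self) -> str:
        return self.address


# Hypothetical DNS table and hostname, for illustration only.
dns: Dict[str, str] = {"ingress.example.ts.net": "100.64.0.10"}


def lookup(host: str) -> str:
    return dns[host]


per_request = PerRequestProxy("ingress.example.ts.net", lookup)
static = StaticUpstream("ingress.example.ts.net", lookup)

# The Ingress pod is rescheduled and gets a new IP.
dns["ingress.example.ts.net"] = "100.64.0.11"

fresh = per_request.target()  # follows the new IP immediately
stale = static.target()       # still the address captured at config load
static.reload()
reloaded = static.target()    # picks up the new IP only after the reload
```

In this model the static upstream keeps sending traffic to the old address until `reload()` runs, which is the situation the planned Grafana alert is meant to surface.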
parent 54b1cee950
commit fe0e913963
12 changed files with 229 additions and 102 deletions
@@ -373,6 +373,66 @@ groups:
                    type: and
              refId: C

  - orgId: 1
    name: flyio-proxy-health
    folder: Infrastructure Alerts
    interval: 30s
    rules:
      - uid: flyio-upstream-unreachable
        title: FlyioUpstreamUnreachable
        condition: C
        for: 3m
        noDataState: OK
        execErrState: Alerting
        annotations:
          summary: >-
            Fly.io proxy returning elevated 502s — upstream DNS may be stale. Run: mise run fly-reload
          runbook_url: https://docs.eblu.me/how-to/operations/manage-flyio-proxy
        labels:
          severity: warning
          service: flyio-proxy
        data:
          - refId: A
            datasourceUid: prometheus
            relativeTimeRange:
              from: 300
              to: 0
            model:
              expr: >-
                sum(rate(flyio_nginx_http_requests_total{instance="flyio-proxy",status="502"}[5m]))
                / sum(rate(flyio_nginx_http_requests_total{instance="flyio-proxy"}[5m]))
                > 0.5
              interval: ""
              refId: A
          - refId: B
            datasourceUid: "__expr__"
            relativeTimeRange:
              from: 0
              to: 0
            model:
              type: reduce
              expression: A
              reducer: last
              settings:
                mode: dropNN
              refId: B
          - refId: C
            datasourceUid: "__expr__"
            relativeTimeRange:
              from: 0
              to: 0
            model:
              type: threshold
              expression: B
              conditions:
                - evaluator:
                    type: gt
                    params:
                      - 0
                  operator:
                    type: and
              refId: C

templates:
  - orgId: 1
    name: ntfy-infra
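As a rough reading aid for the rule above, the three-step evaluation (query A, reduce B, threshold C) can be approximated in Python. This is a sketch under stated assumptions, not Grafana's or Prometheus's implementation: the function names `query_a` and `condition_met` are hypothetical, and the `for: 3m` pending period is deliberately left out.

```python
from typing import Optional

THRESHOLD = 0.5  # the `> 0.5` filter at the end of query A


def query_a(rate_502: float, rate_total: float) -> Optional[float]:
    """Approximate query A: the 502 share of requests, kept only if above 0.5.

    Without the `bool` modifier a PromQL comparison acts as a filter, so a
    ratio at or below the threshold yields no series (modelled here as None).
    """
    if rate_total == 0:
        return None  # 0/0 is NaN in PromQL, which never passes the filter
    ratio = rate_502 / rate_total
    return ratio if ratio > THRESHOLD else None


def condition_met(a: Optional[float]) -> bool:
    """Approximate B (reduce: last) and C (threshold: gt 0).

    No data maps to OK because the rule sets `noDataState: OK`.
    """
    if a is None:
        return False
    return a > 0


mostly_502 = condition_met(query_a(6.0, 10.0))    # 60% of requests are 502s
exactly_half = condition_met(query_a(5.0, 10.0))  # 0.5 is not above 0.5
no_traffic = condition_met(query_a(0.0, 0.0))     # nothing to evaluate
```

Under this reading the condition holds only while more than half of the proxy's requests return 502, and the rule would then have to stay in that state for the `for: 3m` window before alerting.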
@@ -21,5 +21,9 @@ spec:
    pod:
      tailscaleContainer:
        image: docker.io/tailscale/tailscale:v1.94.2
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
      tailscaleInitContainer:
        image: docker.io/tailscale/tailscale:v1.94.2