Erich Blume 0c52404ec5 C1: docs — add rotate-fly-deploy-token how-to

New rotation card documenting the 75-day cadence for the Fly.io API
token. Recommends `fly tokens create org` (single-org scope) over
`deploy` (single-app scope): both have effectively the same blast
radius for a single-app personal org, and `org` silences the
"Metrics token unavailable: ... context canceled" warning that
`fly status` emits when called with an app-scoped token.

Linked from manage-flyio-proxy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 16:49:22 -07:00

Manage Fly.io Proxy

Operational tasks for the flyio-proxy public reverse proxy.

Deploy Changes

After modifying files in fly/:

mise run fly-deploy

Pushes to fly/ on main also trigger automatic deployment via the Forgejo CI workflow.

Add a New Public Service

See the "Per-service setup" section of expose-service-publicly for the full walkthrough. In short:

1. Add a server block to fly/nginx.conf
2. Add a Fly.io certificate: fly certs add <domain> -a blumeops-proxy
3. Deploy: mise run fly-deploy
4. Verify against blumeops-proxy.fly.dev with a Host header
5. Add DNS CNAME via Pulumi: mise run dns-preview then mise run dns-up
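
The verification step can be sketched as a small shell function. This is a minimal sketch, assuming the new service answers over HTTPS on the shared proxy hostname; the function name verify_via_host_header is hypothetical and not part of the repo.

```shell
# Hypothetical helper: request the proxy's fly.dev hostname while
# presenting the new service's domain in the Host header, so nginx
# routes to the new server block before DNS points at the proxy.
verify_via_host_header() {
  domain="$1"
  # -s: quiet, -S: still print errors, -f: non-zero exit on HTTP >= 400,
  # -o /dev/null: discard the body, -w: print only the status code.
  curl -sSf -o /dev/null -w '%{http_code}\n' \
    -H "Host: ${domain}" \
    https://blumeops-proxy.fly.dev/
}
```

Call it with the domain from the certificate step, for example verify_via_host_header <domain>. It prints the HTTP status code and exits non-zero if the request fails.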

Emergency Shutoff

If the proxy is causing issues (DDoS, unexpected traffic, bandwidth consumption on the home network):

Level 1 — Stop the container (seconds, reversible):

mise run fly-shutoff
# or: fly scale count 0 -a blumeops-proxy --yes

All public services go offline immediately. Tailscale tunnel drops. Zero traffic reaches indri. Restore with fly scale count 1 -a blumeops-proxy.

Level 2 — Revoke Tailscale access (seconds): Remove the flyio-proxy node in the Tailscale admin console. Even if the container is running, it cannot reach the tailnet. Use this if the container itself may be compromised.

Level 3 — Remove DNS (minutes to hours): Delete the CNAME records at Gandi. DNS propagation takes time, but this is the permanent shutoff.

Level 1 is the primary response. It is a single command, takes effect in seconds, and is trivially reversible. Keep mise run fly-shutoff somewhere easily accessible (e.g., pinned in a notes app) so it can be run quickly under stress.
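
Recovery from a Level 1 shutoff can be sketched the same way: scale back to one machine, then poll the health endpoint until it responds. This is a sketch only; the function name proxy_restore, the default of 30 attempts, and the PROXY_RESTORE_DELAY variable are assumptions, not part of the repo.

```shell
# Hypothetical helper: bring the proxy back after a Level 1 shutoff and
# wait for the public health endpoint to respond.
proxy_restore() {
  tries="${1:-30}"   # number of health-check attempts (assumed default)
  fly scale count 1 -a blumeops-proxy --yes || return 1
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: quiet, -f: non-zero exit on HTTP errors
    if curl -sf https://blumeops-proxy.fly.dev/healthz >/dev/null; then
      echo "proxy healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "${PROXY_RESTORE_DELAY:-2}"   # seconds between attempts (assumed)
  done
  echo "proxy still unhealthy after ${tries} attempts" >&2
  return 1
}
```

Run proxy_restore with no arguments for the defaults, or pass a larger attempt count if the machine is slow to start.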

Check Status

# App and machine status
fly status -a blumeops-proxy

# Live logs
fly logs -a blumeops-proxy

# Health check
curl -sf https://blumeops-proxy.fly.dev/healthz

# Certificate status
fly certs list -a blumeops-proxy
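
The non-streaming checks above can be combined into one pass that keeps going after a failure and reports how many checks failed. This is a minimal sketch; the function name proxy_check_all is hypothetical, and fly logs is left out because it streams until interrupted.

```shell
# Hypothetical helper: run the one-shot status checks in sequence and
# count failures instead of stopping at the first one.
proxy_check_all() {
  failed=0
  fly status -a blumeops-proxy || failed=$((failed + 1))
  # curl -sf prints nothing on failure, so rely on its exit status.
  curl -sf https://blumeops-proxy.fly.dev/healthz >/dev/null ||
    failed=$((failed + 1))
  fly certs list -a blumeops-proxy || failed=$((failed + 1))
  echo "checks failed: ${failed}"
  [ "$failed" -eq 0 ]
}
```

The function exits non-zero when any check fails, so it can gate a later step, as in proxy_check_all && echo "all clear".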

Rotate Tailscale Auth Key

The auth key expires every 90 days. To rotate:

1. Re-apply Pulumi to generate a new key: mise run tailnet-up
2. Re-run setup to stage the new secret: mise run fly-setup
3. Deploy to pick up the new secret: mise run fly-deploy
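
Because each step depends on the previous one, the rotation can be chained so that a failure stops the sequence. This is a minimal sketch; the function name rotate_tailscale_key is hypothetical, while the three mise tasks are the ones listed above.

```shell
# Hypothetical helper: run the three rotation steps in order and stop
# at the first failure, so a stale secret is never deployed.
rotate_tailscale_key() {
  mise run tailnet-up &&
    mise run fly-setup &&
    mise run fly-deploy
}
```

If tailnet-up or fly-setup fails, the later tasks are skipped and the function returns that task's exit status.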

Rotate Fly.io API Token

See rotate-fly-deploy-token for the full rotation procedure (75-day cadence, org-scoped).

Troubleshooting

502 Bad Gateway on fresh deploy: MagicDNS may not be ready when nginx starts. The start.sh script polls nslookup before launching nginx, but if the 502 persists, run tailscale status inside the container and check that the node is connected.
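
For orientation, a DNS wait loop of the kind described could look like the sketch below. This is not the contents of start.sh; the function name wait_for_dns, the default attempt count, and the DNS_WAIT_DELAY variable are all assumptions for illustration.

```shell
# Illustrative only: poll nslookup until a hostname resolves or the
# attempt budget runs out. Not the actual start.sh implementation.
wait_for_dns() {
  host="$1"
  tries="${2:-30}"   # assumed default number of attempts
  i=0
  while [ "$i" -lt "$tries" ]; do
    if nslookup "$host" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "${DNS_WAIT_DELAY:-1}"   # seconds between attempts (assumed)
  done
  return 1
}
```

Running wait_for_dns indri 10 inside the container gives a quick answer on whether MagicDNS resolves the upstream host at all.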

Health check failing: Open a shell with fly ssh console -a blumeops-proxy, then run curl localhost:8080/healthz to test from inside the container.

TLS errors on custom domain: Check cert status with fly certs show <domain> -a blumeops-proxy. Certs auto-provision via Let's Encrypt and may take a few minutes.

High latency (>1s p50): Check whether direct WireGuard peering is established: fly ssh console -a blumeops-proxy -C "tailscale ping indri". If the output shows "via DERP", the tunnel is relayed and latency will be 10-30s. See the "Direct Peering vs DERP Relay" section of tailscale for diagnosis.
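
The DERP check can be scripted by classifying the last reply in the tailscale ping output. This is a minimal sketch, assuming each reply line contains "pong from" and that relayed replies contain "via DERP"; the function name classify_peering is hypothetical.

```shell
# Hypothetical helper: read `tailscale ping` output on stdin and report
# whether the final reply was relayed or direct. Only the last reply is
# used because early replies may be relayed before a direct path is found.
classify_peering() {
  last="$(awk '/pong from/ { last = $0 } END { print last }')"
  case "$last" in
    "") echo "unknown" ;;            # no reply seen at all
    *"via DERP"*) echo "relayed" ;;
    *) echo "direct" ;;
  esac
}
```

Pipe the ping into it: fly ssh console -a blumeops-proxy -C "tailscale ping indri" | classify_peering.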

flyio-proxy - Service reference card
expose-service-publicly - Full setup guide and architecture
