---
title: Manage Fly.io Proxy
modified: 2026-04-18
last-reviewed: 2026-04-18
tags:
- how-to
- fly-io
- networking
- operations
---
# Manage Fly.io Proxy
Operational tasks for the [[flyio-proxy]] public reverse proxy.
## Deploy Changes
After modifying files in `fly/`:
```bash
mise run fly-deploy
```
Pushes to `fly/` on main also trigger automatic deployment via the Forgejo CI workflow.
## Add a New Public Service
See [[expose-service-publicly#Per-service setup]] for the full walkthrough. In short:
1. Add a `server` block to `fly/nginx.conf`
2. Add a Fly.io certificate: `fly certs add <domain> -a blumeops-proxy`
3. Deploy: `mise run fly-deploy`
4. Verify against `blumeops-proxy.fly.dev` with a `Host` header
5. Add DNS CNAME via Pulumi: `mise run dns-preview` then `mise run dns-up`
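Step 4 can be sketched as a small shell helper. This is a minimal sketch, assuming the new service answers a plain `GET /`. `verify_host_routing` and `is_ok_status` are hypothetical names, not existing mise tasks. Overriding the `Host` header makes nginx select the new `server` block while the request still goes to the `fly.dev` hostname, so routing can be checked before the CNAME exists.

```bash
# Hypothetical helpers for step 4; not part of the repo's mise tasks.

# Succeeds for 2xx and 3xx status codes.
is_ok_status() {
  case "$1" in
    2[0-9][0-9] | 3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Request <domain> through the proxy's fly.dev hostname and report the status.
verify_host_routing() {
  local domain="$1" path="${2:-/}" code
  # -H replaces the Host header so nginx matches the new server block;
  # the connection itself still goes to blumeops-proxy.fly.dev.
  # On a connection failure curl prints 000, which is_ok_status rejects.
  code=$(curl -sS -o /dev/null -w '%{http_code}' \
    -H "Host: ${domain}" "https://blumeops-proxy.fly.dev${path}")
  echo "${domain}${path}: HTTP ${code}"
  is_ok_status "$code"
}
```

Usage: `verify_host_routing <domain>` prints the status line and returns non-zero for anything other than 2xx/3xx.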
## Emergency Shutoff
If the proxy is causing issues (DDoS, unexpected traffic, bandwidth consumption on the home network):
**Level 1 — Stop the container (seconds, reversible):**
```bash
mise run fly-shutoff
# or: fly scale count 0 -a blumeops-proxy --yes
```
All public services go offline immediately. Tailscale tunnel drops. Zero traffic reaches indri. Restore with `fly scale count 1 -a blumeops-proxy`.
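Restoring from Level 1 can be paired with a health poll so you know when traffic is flowing again. A minimal sketch, assuming the `/healthz` endpoint from the Check Status section is a reasonable readiness signal. `wait_for` and `restore_proxy` are hypothetical helpers, and the budget of 24 attempts 5 seconds apart is an arbitrary choice.

```bash
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds between failures. Returns 0 on the first success.
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical wrapper around the documented restore command.
restore_proxy() {
  fly scale count 1 -a blumeops-proxy
  # Poll /healthz: up to 24 attempts, 5 seconds apart (about 2 minutes).
  wait_for 24 5 curl -sf -o /dev/null https://blumeops-proxy.fly.dev/healthz
}
```

Keeping the retry loop generic means the same helper can poll any command, not just the health endpoint.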
**Level 2 — Revoke Tailscale access (seconds):**
Remove the `flyio-proxy` node in the Tailscale admin console. Even if the container is running, it cannot reach the tailnet. Use this if the container itself may be compromised.
**Level 3 — Remove DNS (minutes to hours):**
Delete the CNAME records at Gandi. Resolvers keep serving cached records until their TTL expires, so this takes effect slowly, but it is the permanent shutoff.
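To see whether a Level 3 removal has taken effect, query the CNAME and look for the Fly app hostname. A sketch, assuming the records point at `blumeops-proxy.fly.dev` (the hostname used for verification in the setup steps) and that `dig` is installed. `points_at_proxy` and `check_dns_removed` are hypothetical helpers. An answer only reflects the resolver you asked, which may still be serving a cached record.

```bash
# Hypothetical helper: succeeds if dig output on stdin names the Fly app.
points_at_proxy() {
  grep -qx 'blumeops-proxy\.fly\.dev\.'
}

# Hypothetical helper: check <domain>, optionally against a specific resolver.
check_dns_removed() {
  local domain="$1" resolver="${2:-}" answer
  if [ -n "$resolver" ]; then
    answer=$(dig "@${resolver}" +short CNAME "$domain")
  else
    answer=$(dig +short CNAME "$domain")
  fi
  if printf '%s\n' "$answer" | points_at_proxy; then
    echo "${domain}: still a CNAME to blumeops-proxy.fly.dev"
    return 1
  fi
  echo "${domain}: no CNAME to the proxy in this answer"
}
```

Usage: `check_dns_removed <domain> 1.1.1.1` asks a public resolver instead of the local one.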
**Level 1 is the primary response.** It is a single command, takes effect in seconds, and is trivially reversible. Keep `mise run fly-shutoff` somewhere easily accessible (e.g., pinned in a notes app) so it can be run quickly under stress.
## Check Status
```bash
# App and machine status
fly status -a blumeops-proxy
# Live logs
fly logs -a blumeops-proxy
# Health check
curl -sf https://blumeops-proxy.fly.dev/healthz
# Certificate status
fly certs list -a blumeops-proxy
```
## Rotate Tailscale Auth Key
The auth key expires every 90 days. To rotate:
1. Re-apply Pulumi to generate a new key: `mise run tailnet-up`
2. Re-run setup to stage the new secret: `mise run fly-setup`
3. Deploy to pick up the new secret: `mise run fly-deploy`
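Because each step depends on the previous one, the rotation can be wrapped so it stops at the first failure instead of deploying without the new secret. A minimal sketch. `rotate_tailscale_key` is a hypothetical name, and the closing `/healthz` check is an added assumption about how to confirm success, not part of the documented procedure. Depending on what `/healthz` actually checks, `tailscale status` inside the container may be the stronger signal.

```bash
# Hypothetical wrapper; the subshell body ( ... ) keeps `set -eu` from
# leaking into an interactive shell that sources this file.
rotate_tailscale_key() (
  set -eu
  mise run tailnet-up   # 1. Pulumi generates a new auth key
  mise run fly-setup    # 2. stage the new key as a Fly secret
  mise run fly-deploy   # 3. deploy so the machine starts with it
  # Confirm the proxy answers after the deploy.
  curl -sf -o /dev/null https://blumeops-proxy.fly.dev/healthz
  echo "rotation finished; /healthz OK"
)
```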
## Rotate Fly.io API Token
See [[rotate-fly-deploy-token]] for the full rotation procedure (75-day cadence, `org`-scoped).
## Troubleshooting
**502 Bad Gateway on fresh deploy**: MagicDNS may not be ready when nginx starts. The `start.sh` script polls `nslookup` before launching nginx, but if 502s persist, check that `tailscale status` reports a healthy connection inside the container.
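The readiness wait described above can be sketched like this. It is an illustration under assumptions, not the contents of the real `start.sh`: the target name `indri`, the 30-attempt limit, and the one-second interval are all hypothetical. Failing loudly when resolution never succeeds surfaces the problem in `fly logs` rather than as a silent 502.

```bash
# Hypothetical sketch of a MagicDNS wait; NOT the real start.sh.
wait_for_dns() {
  local host="$1" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if nslookup "$host" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "gave up resolving ${host} after ${attempts} attempts" >&2
  return 1
}

# Example (hypothetical): wait_for_dns indri 30 && exec nginx -g 'daemon off;'
```

From a workstation, the tailnet side can be checked with `fly ssh console -a blumeops-proxy -C "tailscale status"`.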
**Health check failing**: `fly ssh console -a blumeops-proxy` then `curl localhost:8080/healthz` to test locally.
**TLS errors on custom domain**: Check cert status with `fly certs show <domain> -a blumeops-proxy`. Certs auto-provision via Let's Encrypt and may take a few minutes.
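If `fly certs show` looks healthy but clients still report TLS errors, it can help to inspect the certificate the edge actually serves. A sketch assuming `openssl` is installed locally; `show_served_cert` is a hypothetical helper.

```bash
# Hypothetical helper: print subject, issuer, and validity dates of the
# certificate served for <domain>, for comparison with `fly certs show`.
show_served_cert() {
  local domain="$1"
  # -servername sets SNI so the edge selects the certificate for this domain.
  openssl s_client -connect "${domain}:443" -servername "${domain}" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}
```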
**High latency (>1s p50)**: Check if direct WireGuard peering is established: `fly ssh console -a blumeops-proxy -C "tailscale ping indri"`. If it shows `via DERP`, the tunnel is relayed and latency will be 10-30s. See [[tailscale#Direct Peering vs DERP Relay]] for diagnosis.
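`tailscale ping` can print several pongs as it upgrades from a relayed path to a direct one, so the last pong is the one that matters. A minimal sketch that classifies that line. `classify_ping` and `check_peering` are hypothetical helpers, and the output shapes they match (`pong from <host> (<ip>) via DERP(<region>) in <n>ms` versus `via <ip>:<port>`) are an assumption based on typical Tailscale output.

```bash
# Hypothetical helper: read `tailscale ping` output on stdin and print
# "derp", "direct", or "no-pong" based on the final pong line.
classify_ping() {
  local last
  last=$(grep '^pong' | tail -n 1)
  case "$last" in
    "") echo "no-pong" ;;
    *"via DERP"*) echo "derp" ;;
    *) echo "direct" ;;
  esac
}

# Run the ping inside the proxy container and classify the result.
check_peering() {
  fly ssh console -a blumeops-proxy -C "tailscale ping indri" | classify_ping
}
```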
## Related
- [[flyio-proxy]] - Service reference card
- [[expose-service-publicly]] - Full setup guide and architecture