Compare commits

...
Sign in to create a new pull request.

6 commits

Author SHA1 Message Date
5aa4cb403a Bump ProxyGroup ingress pod resource requests
Increase from 1m CPU / 1Mi memory to 100m CPU / 128Mi memory. The
ingress pods handle TLS termination for all 19 Tailscale Ingress
services — the previous minimal requests may have caused the scheduler
to deprioritize them under load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 16:30:49 -07:00
f5ba7f03aa Add Grafana alert for Fly proxy upstream unreachable (502 rate)
Fires when >50% of requests return 502 for 3+ minutes, indicating
stale upstream DNS after Tailscale Ingress pod restart. Alert message
includes the fix: mise run fly-reload.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 16:08:12 -07:00
a700befd5b Docs: update proxy architecture for upstream keepalive
Update flyio-proxy, forgejo, routing, manage-flyio-proxy,
expose-service-publicly, and mise-tasks docs to reflect:

- Upstream keepalive pools replacing variable-based proxy_pass
- proxy_ssl_name requirement for upstream blocks
- MagicDNS readiness check in start.sh
- fly-reload task for DNS re-resolution
- Crawler mitigation (robots.txt, archive redirect, release caching)
- Forgejo /metrics endpoint and archive cleanup cron
- cv.eblu.me in routing and exposed services tables
- upstream_response_time histogram metric
- Changelog fragment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 16:04:54 -07:00
903db4079d Fix upstream keepalive: set proxy_ssl_name for correct SNI
With upstream blocks, nginx sends the block name as SNI instead of
the actual hostname. The Tailscale Ingress proxy needs the correct
SNI to route TLS connections. Add explicit proxy_ssl_name for each
upstream, and set Host header for docs/cv backends.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:51:51 -07:00
1236d381eb Wait for MagicDNS readiness before starting nginx
Upstream blocks resolve DNS at config load. If MagicDNS isn't ready yet
(Tailscale just connected), nginx gets empty resolution and returns 502.
Poll nslookup until resolution works before launching nginx.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:47:21 -07:00
6a1d9cc0bf Switch Fly proxy to upstream keepalive pools
Replace per-request DNS resolution (variable-based proxy_pass) with
static upstream blocks and keepalive connection pools. This reuses
TLS connections through the Tailscale tunnel instead of handshaking
per request, which should significantly reduce latency at >1 req/s.

Trade-off: DNS is resolved at config load, not per-request. If
Tailscale Ingress pods get new IPs, run `mise run fly-reload` to
re-resolve.

Also adds mise-tasks/fly-reload for nginx config reload without
full redeploy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:42:57 -07:00
12 changed files with 229 additions and 102 deletions

View file

@ -373,6 +373,66 @@ groups:
type: and type: and
refId: C refId: C
- orgId: 1
name: flyio-proxy-health
folder: Infrastructure Alerts
interval: 30s
rules:
- uid: flyio-upstream-unreachable
title: FlyioUpstreamUnreachable
condition: C
for: 3m
noDataState: OK
execErrState: Alerting
annotations:
summary: >-
Fly.io proxy returning elevated 502s — upstream DNS may be stale. Run: mise run fly-reload
runbook_url: https://docs.eblu.me/how-to/operations/manage-flyio-proxy
labels:
severity: warning
service: flyio-proxy
data:
- refId: A
datasourceUid: prometheus
relativeTimeRange:
from: 300
to: 0
model:
expr: >-
sum(rate(flyio_nginx_http_requests_total{instance="flyio-proxy",status="502"}[5m]))
/ sum(rate(flyio_nginx_http_requests_total{instance="flyio-proxy"}[5m]))
> 0.5
interval: ""
refId: A
- refId: B
datasourceUid: "__expr__"
relativeTimeRange:
from: 0
to: 0
model:
type: reduce
expression: A
reducer: last
settings:
mode: dropNN
refId: B
- refId: C
datasourceUid: "__expr__"
relativeTimeRange:
from: 0
to: 0
model:
type: threshold
expression: B
conditions:
- evaluator:
type: gt
params:
- 0
operator:
type: and
refId: C
templates: templates:
- orgId: 1 - orgId: 1
name: ntfy-infra name: ntfy-infra

View file

@ -21,5 +21,9 @@ spec:
pod: pod:
tailscaleContainer: tailscaleContainer:
image: docker.io/tailscale/tailscale:v1.94.2 image: docker.io/tailscale/tailscale:v1.94.2
resources:
requests:
cpu: 100m
memory: 128Mi
tailscaleInitContainer: tailscaleInitContainer:
image: docker.io/tailscale/tailscale:v1.94.2 image: docker.io/tailscale/tailscale:v1.94.2

View file

@ -0,0 +1 @@
Switched Fly proxy to upstream keepalive pools, reducing forge.eblu.me latency from 35s+ p50 to sub-second. Added `mise run fly-reload` for DNS re-resolution without redeploy.

View file

@ -1,7 +1,7 @@
--- ---
title: Manage Fly.io Proxy title: Manage Fly.io Proxy
modified: 2026-02-08 modified: 2026-04-17
last-reviewed: 2026-03-07 last-reviewed: 2026-04-17
tags: tags:
- how-to - how-to
- fly-io - fly-io
@ -23,6 +23,16 @@ mise run fly-deploy
Pushes to `fly/` on main also trigger automatic deployment via the Forgejo CI workflow. Pushes to `fly/` on main also trigger automatic deployment via the Forgejo CI workflow.
## Reload Nginx (Re-resolve Upstream DNS)
Nginx uses `upstream` blocks with keepalive connection pools. DNS is resolved at config load. If Tailscale Ingress pods get new IPs (restart, reschedule, minikube restart), reload nginx to re-resolve without a full redeploy:
```bash
mise run fly-reload
```
A Grafana alert fires when upstreams are unreachable, prompting this action. A full `fly-deploy` also re-resolves DNS (it replaces the container).
## Add a New Public Service ## Add a New Public Service
See [[expose-service-publicly#Per-service setup]] for the full walkthrough. In short: See [[expose-service-publicly#Per-service setup]] for the full walkthrough. In short:
@ -78,12 +88,16 @@ The auth key expires every 90 days. To rotate:
## Troubleshooting ## Troubleshooting
**502 Bad Gateway**: Check `fly logs` for nginx upstream errors. Verify the backend Tailscale service is running (`tailscale status` from inside the container via `fly ssh console`). **502 Bad Gateway after Tailscale Ingress restart**: Upstream DNS is stale. Run `mise run fly-reload` to re-resolve. This is the most common cause of 502s.
**502 Bad Gateway on fresh deploy**: MagicDNS may not be ready when nginx starts. The `start.sh` script polls `nslookup` before launching nginx, but if it still fails, check that `tailscale status` is healthy inside the container.
**Health check failing**: `fly ssh console -a blumeops-proxy` then `curl localhost:8080/healthz` to test locally. **Health check failing**: `fly ssh console -a blumeops-proxy` then `curl localhost:8080/healthz` to test locally.
**TLS errors on custom domain**: Check cert status with `fly certs show <domain> -a blumeops-proxy`. Certs auto-provision via Let's Encrypt and may take a few minutes. **TLS errors on custom domain**: Check cert status with `fly certs show <domain> -a blumeops-proxy`. Certs auto-provision via Let's Encrypt and may take a few minutes.
**High latency (>1s p50)**: Likely lost keepalive — redeploy with `mise run fly-deploy`. Before the keepalive change (April 2026), per-request TLS handshakes through the WireGuard tunnel caused 35s+ p50 at >1 req/s.
## Related ## Related
- [[flyio-proxy]] - Service reference card - [[flyio-proxy]] - Service reference card

View file

@ -1,6 +1,6 @@
--- ---
title: Routing title: Routing
modified: 2026-03-03 modified: 2026-04-17
tags: tags:
- infrastructure - infrastructure
- networking - networking
@ -51,6 +51,7 @@ DNS CNAMEs point to `blumeops-proxy.fly.dev`. TLS via Fly.io-managed Let's Encry
| Service | URL | Description | | Service | URL | Description |
|---------|-----|-------------| |---------|-----|-------------|
| [[docs]] | https://docs.eblu.me | Documentation site | | [[docs]] | https://docs.eblu.me | Documentation site |
| [[cv]] | https://cv.eblu.me | CV / resume |
| [[forgejo]] | https://forge.eblu.me | Git hosting (public) | | [[forgejo]] | https://forge.eblu.me | Git hosting (public) |
## Tailscale-Only Services ## Tailscale-Only Services

View file

@ -1,6 +1,6 @@
--- ---
title: Fly.io Proxy title: Fly.io Proxy
modified: 2026-02-08 modified: 2026-04-17
tags: tags:
- service - service
- networking - networking
@ -26,11 +26,21 @@ Public reverse proxy on [Fly.io](https://fly.io) that exposes selected BlumeOps
| Public domain | Backend | Service | | Public domain | Backend | Service |
|---------------|---------|---------| |---------------|---------|---------|
| `docs.eblu.me` | `docs.tail8d86e.ts.net` | [[docs]] | | `docs.eblu.me` | `docs.tail8d86e.ts.net` | [[docs]] |
| `cv.eblu.me` | `cv.tail8d86e.ts.net` | [[cv]] |
| `forge.eblu.me` | `forge.tail8d86e.ts.net` | [[forgejo]] |
## Architecture ## Architecture
Internet traffic hits Fly.io's Anycast edge, terminates TLS with a Let's Encrypt certificate, and is proxied by nginx to the backend service over a Tailscale WireGuard tunnel. See [[expose-service-publicly]] for the full architecture diagram. Internet traffic hits Fly.io's Anycast edge, terminates TLS with a Let's Encrypt certificate, and is proxied by nginx to the backend service over a Tailscale WireGuard tunnel. See [[expose-service-publicly]] for the full architecture diagram.
### Upstream Keepalive
Nginx uses `upstream` blocks with `keepalive` connection pools to reuse TLS connections through the WireGuard tunnel. This avoids a per-request TLS handshake, which was previously the dominant source of latency (35s+ p50 before keepalive, sub-second after).
**Trade-off:** DNS for upstream hostnames is resolved once at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule, minikube restart), run `mise run fly-reload` to re-resolve without a full redeploy. A Grafana alert fires when upstreams are unreachable.
Each upstream requires `proxy_ssl_name` set to the actual Tailscale hostname — nginx sends the upstream block name as SNI by default, which the Tailscale Ingress proxy won't recognize.
## Key Files ## Key Files
| File | Purpose | | File | Purpose |
@ -39,7 +49,7 @@ Internet traffic hits Fly.io's Anycast edge, terminates TLS with a Let's Encrypt
| `fly/Dockerfile` | nginx + Tailscale + Alloy container | | `fly/Dockerfile` | nginx + Tailscale + Alloy container |
| `fly/nginx.conf` | Reverse proxy, caching, rate limiting, JSON logging | | `fly/nginx.conf` | Reverse proxy, caching, rate limiting, JSON logging |
| `fly/alloy.river` | Alloy config: log tailing, metric extraction, remote_write | | `fly/alloy.river` | Alloy config: log tailing, metric extraction, remote_write |
| `fly/start.sh` | Entrypoint: start Tailscale, Alloy, then nginx | | `fly/start.sh` | Entrypoint: start Tailscale, wait for MagicDNS, then nginx + Alloy |
| `pulumi/tailscale/__main__.py` | Auth key (`tag:flyio-proxy`) | | `pulumi/tailscale/__main__.py` | Auth key (`tag:flyio-proxy`) |
| `pulumi/tailscale/policy.hujson` | ACL grants for proxy | | `pulumi/tailscale/policy.hujson` | ACL grants for proxy |
| `pulumi/gandi/__main__.py` | DNS CNAMEs | | `pulumi/gandi/__main__.py` | DNS CNAMEs |
@ -57,7 +67,8 @@ The Tailscale auth key is `preauthorized=True` to avoid device approval hangs on
- **Logs**: nginx JSON access logs tailed and pushed to [[loki|Loki]] (`{instance="flyio-proxy", job="flyio-nginx"}`) - **Logs**: nginx JSON access logs tailed and pushed to [[loki|Loki]] (`{instance="flyio-proxy", job="flyio-nginx"}`)
- **Metrics**: Derived from access logs, pushed to [[prometheus|Prometheus]] via `remote_write` - **Metrics**: Derived from access logs, pushed to [[prometheus|Prometheus]] via `remote_write`
- `flyio_nginx_http_requests_total` — request rate by status/method/host - `flyio_nginx_http_requests_total` — request rate by status/method/host
- `flyio_nginx_http_request_duration_seconds` — latency histogram - `flyio_nginx_http_request_duration_seconds` — total request latency histogram (includes proxy overhead)
- `flyio_nginx_upstream_response_time_seconds` — backend response time histogram (Forgejo processing only)
- `flyio_nginx_http_response_bytes_total` — response bandwidth - `flyio_nginx_http_response_bytes_total` — response bandwidth
- `flyio_nginx_cache_requests_total` — cache HIT/MISS/EXPIRED counts - `flyio_nginx_cache_requests_total` — cache HIT/MISS/EXPIRED counts
@ -74,7 +85,21 @@ Alloy listens on `127.0.0.1:12345` for self-scraping its `/metrics` endpoint. Al
The `tag:flyio-proxy` ACL grants access only to `tag:flyio-target:443`. Services must explicitly opt in by adding a `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to their Tailscale Ingress. This means the proxy can only reach endpoints that have been individually tagged — a compromised nginx config cannot route to arbitrary services on the tailnet. The `tag:flyio-proxy` ACL grants access only to `tag:flyio-target:443`. Services must explicitly opt in by adding a `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to their Tailscale Ingress. This means the proxy can only reach endpoints that have been individually tagged — a compromised nginx config cannot route to arbitrary services on the tailnet.
Currently tagged as `tag:flyio-target`: [[docs]], [[loki]], [[prometheus]]. Loki and Prometheus are tagged so that [[alloy|Alloy]] (running inside the container) can push logs and metrics directly via their Tailscale Ingress endpoints — the restricted ACL means Caddy on indri (`tag:homelab`) is not reachable from the proxy. Currently tagged as `tag:flyio-target`: [[docs]], [[cv]], [[forgejo]], [[loki]], [[prometheus]]. Loki and Prometheus are tagged so that [[alloy|Alloy]] (running inside the container) can push logs and metrics directly via their Tailscale Ingress endpoints — the restricted ACL means Caddy on indri (`tag:homelab`) is not reachable from the proxy.
### Crawler Mitigation
The proxy serves a `robots.txt` blocking crawlers from expensive endpoints:
- `/mirrors/` — large mirrored repos
- `/user/` — auth endpoints (crawlers follow redirect loops)
- `/users/` — user profile pages
- `/*/archive/` — git bundle generation (DoS vector, see below)
- `/*/releases/download/` — release artifacts
Archive requests (`/<owner>/<repo>/archive/*`) are 302-redirected to `forge.ops.eblu.me` (tailnet-only), preventing unauthenticated archive generation. This mitigates a known Forgejo DoS vector where crawlers requesting unique commit SHAs trigger unbounded git bundle generation.
Release downloads are cached at the proxy layer (7-day TTL, keyed by URI) to absorb repeated downloads of the same artifact.
To expose an additional service through the proxy, add the `tag:flyio-target` annotation to its Tailscale Ingress. See [[expose-service-publicly]] for the full workflow. To expose an additional service through the proxy, add the `tag:flyio-target` annotation to its Tailscale Ingress. See [[expose-service-publicly]] for the full workflow.

View file

@ -1,6 +1,6 @@
--- ---
title: Forgejo title: Forgejo
modified: 2026-03-28 modified: 2026-04-17
tags: tags:
- service - service
- git - git
@ -148,12 +148,24 @@ The UI shows `forge.eblu.me` for HTTPS clone URLs and `forge.ops.eblu.me` for SS
- **Rate limiting:** nginx rate limits login/signup/forgot-password endpoints (3r/s per client IP via `Fly-Client-IP` header) - **Rate limiting:** nginx rate limits login/signup/forgot-password endpoints (3r/s per client IP via `Fly-Client-IP` header)
- **fail2ban:** Runs in the Fly.io container; bans IPs after 5 failed logins in 10 minutes via nginx deny list (ephemeral across deploys) - **fail2ban:** Runs in the Fly.io container; bans IPs after 5 failed logins in 10 minutes via nginx deny list (ephemeral across deploys)
- **Swagger:** Blocked at the proxy (`/swagger` returns 403); use forge.ops.eblu.me for API access - **Swagger:** Blocked at the proxy (`/swagger` returns 403); use forge.ops.eblu.me for API access
- **Archive redirect:** Archive endpoints (`/*/archive/*`) are 302-redirected to `forge.ops.eblu.me` — prevents unauthenticated crawlers from triggering unbounded git bundle generation (known DoS vector, see [[flyio-proxy#Crawler Mitigation]])
- **robots.txt:** Blocks crawlers from `/mirrors/`, `/user/`, `/users/`, `/*/archive/`, `/*/releases/download/`
- **OAuth dead-end:** "Sign in with Authentik" redirects to the (tailnet-only) Authentik URL — SSO only works from the tailnet - **OAuth dead-end:** "Sign in with Authentik" redirects to the (tailnet-only) Authentik URL — SSO only works from the tailnet
### Break-glass ### Break-glass
`mise run fly-shutoff` stops all public traffic immediately. forge.ops.eblu.me continues to work from the tailnet. See [[expose-service-publicly#Break-glass shutoff]]. `mise run fly-shutoff` stops all public traffic immediately. forge.ops.eblu.me continues to work from the tailnet. See [[expose-service-publicly#Break-glass shutoff]].
## Monitoring
Forgejo exposes a Prometheus `/metrics` endpoint (enabled via `[metrics]` in `app.ini`). Alloy on indri scrapes it at `localhost:3001/metrics`. Metrics are mostly Go runtime stats and repo counters (no per-request latency histogram).
Request latency is measured at the Fly.io proxy layer via the `flyio_nginx_upstream_response_time_seconds` histogram, visible on the Forgejo Grafana dashboard under "Forgejo: Upstream Response Time".
### Archive Cleanup
The `[cron.archive_cleanup]` section is enabled with `OLDER_THAN = 2h` and `RUN_AT_START = true`. This prevents the `repo-archive/` directory from growing unboundedly when crawlers or users trigger archive downloads. Without this, the directory grew to 54GB in 2 days during a crawler incident in April 2026.
## Mirrors ## Mirrors
Forgejo hosts pull mirrors of external repositories (GitHub, etc.) for supply chain control. Mirrors live in the `mirrors/` org and sync on a configurable interval. See [[manage-forgejo-mirrors]] for operations. Forgejo hosts pull mirrors of external repositories (GitHub, etc.) for supply chain control. Mirrors live in the `mirrors/` org and sync on a configurable interval. See [[manage-forgejo-mirrors]] for operations.

View file

@ -33,7 +33,8 @@ Run `mise tasks --sort name` for the live list with descriptions.
| `provision-indri` | Run Ansible playbook for [[indri]] | | `provision-indri` | Run Ansible playbook for [[indri]] |
| `provision-ringtail` | Run Ansible playbook for [[ringtail]] (NixOS) | | `provision-ringtail` | Run Ansible playbook for [[ringtail]] (NixOS) |
| `provision-sifaka` | Run Ansible playbook for [[sifaka]] | | `provision-sifaka` | Run Ansible playbook for [[sifaka]] |
| `fly-deploy` | Deploy Fly.io public proxy | | `fly-deploy` | Deploy Fly.io public proxy (uses op for auth) |
| `fly-reload` | Reload nginx config, re-resolve upstream DNS (no redeploy) |
| `fly-setup` | One-time Fly.io secrets and certs setup | | `fly-setup` | One-time Fly.io secrets and certs setup |
| `fly-shutoff` | Emergency shutoff: stop all Fly.io proxy machines | | `fly-shutoff` | Emergency shutoff: stop all Fly.io proxy machines |
| `dns-preview` | Preview DNS changes with [[pulumi]] | | `dns-preview` | Preview DNS changes with [[pulumi]] |

View file

@ -1,7 +1,7 @@
--- ---
title: Expose a Service Publicly title: Expose a Service Publicly
modified: 2026-03-15 modified: 2026-04-17
last-reviewed: 2026-03-03 last-reviewed: 2026-04-17
tags: tags:
- tutorials - tutorials
- fly-io - fly-io
@ -116,8 +116,8 @@ See the actual files in `fly/` for current configuration. Key design points:
- **`fly.toml`** — uses bluegreen deploys so the old machine serves traffic until the new one passes health checks. `auto_stop_machines = "off"` keeps the proxy always-on. - **`fly.toml`** — uses bluegreen deploys so the old machine serves traffic until the new one passes health checks. `auto_stop_machines = "off"` keeps the proxy always-on.
- **`Dockerfile`** — multi-stage build pulling nginx, Tailscale, and [[alloy]] binaries. Alloy runs as a sidecar inside the container for observability (see below). - **`Dockerfile`** — multi-stage build pulling nginx, Tailscale, and [[alloy]] binaries. Alloy runs as a sidecar inside the container for observability (see below).
- **`start.sh`** — starts `tailscaled` first (MagicDNS must be available before nginx resolves upstreams), then nginx in the background, then Alloy, and blocks on the nginx process. - **`start.sh`** — starts `tailscaled` first, waits for MagicDNS readiness (polls `nslookup` against `100.100.100.100`), then starts nginx, fail2ban, and Alloy, and blocks on the nginx process. The MagicDNS check is required because `upstream` blocks resolve DNS at config load.
- **`nginx.conf`** — uses a `resolver 100.100.100.100` directive so upstream DNS resolution is deferred to request time (not config load time). Each service gets a `server` block with a `set $upstream` variable pattern. Includes a JSON access log format that Alloy tails for log collection and metric extraction. A catch-all server block serves `/healthz` and rejects unknown hosts. - **`nginx.conf`** — uses `upstream` blocks with `keepalive` connection pools for each backend service. DNS is resolved at config load via MagicDNS (`resolver 100.100.100.100`). Each upstream requires `proxy_ssl_name` set explicitly to the Tailscale hostname (nginx sends the block name as SNI by default). A `map` directive conditionally sets the `Connection` header — empty string for keepalive on normal requests, `upgrade` only for WebSocket requests. Includes a JSON access log format that Alloy tails for log collection and metric extraction. A catch-all server block serves `/healthz` and rejects unknown hosts.
- **`error.html`** — shown via `proxy_intercept_errors` when upstreams are unreachable (indri offline, tunnel down, etc.). Cached responses still take priority via `proxy_cache_use_stale`. - **`error.html`** — shown via `proxy_intercept_errors` when upstreams are unreachable (indri offline, tunnel down, etc.). Cached responses still take priority via `proxy_cache_use_stale`.
#### Observability sidecar #### Observability sidecar
@ -216,11 +216,18 @@ To expose an additional service (example: `wiki.eblu.me`):
### 1. Add nginx server block ### 1. Add nginx server block
Edit `fly/nginx.conf` — add a new `server` block. The configuration Edit `fly/nginx.conf` — two changes needed:
differs significantly between static and dynamic services. See the
existing `docs.eblu.me` and `cv.eblu.me` blocks in `fly/nginx.conf` 1. **Add an `upstream` block** (in the `http` context, alongside the existing ones):
for the current pattern (uses `set $upstream` variable for deferred
DNS resolution, `proxy_intercept_errors` for error pages, etc.). ```nginx
upstream wiki_backend {
server wiki.tail8d86e.ts.net:443;
keepalive 4;
}
```
2. **Add a `server` block.** The configuration differs significantly between static and dynamic services. See the existing blocks in `fly/nginx.conf` for the current pattern.
**Static site template** (simplified — adapt from existing blocks): **Static site template** (simplified — adapt from existing blocks):
@ -239,12 +246,16 @@ server {
} }
location / { location / {
set $upstream_wiki https://wiki.tail8d86e.ts.net; proxy_pass https://wiki_backend$request_uri;
proxy_pass $upstream_wiki$request_uri;
proxy_ssl_verify off; proxy_ssl_verify off;
proxy_ssl_server_name on; proxy_ssl_server_name on;
proxy_ssl_name wiki.tail8d86e.ts.net;
proxy_set_header Host wiki.tail8d86e.ts.net;
proxy_intercept_errors on; proxy_intercept_errors on;
proxy_http_version 1.1;
proxy_set_header Connection $connection_upgrade;
proxy_cache services; proxy_cache services;
proxy_cache_valid 200 1d; proxy_cache_valid 200 1d;
proxy_cache_valid 404 1m; proxy_cache_valid 404 1m;
@ -259,66 +270,12 @@ server {
} }
``` ```
**Dynamic service template** (e.g., Forgejo — see `fly/nginx.conf` for the live configuration): **Key points for all upstream blocks:**
- `proxy_ssl_name` must be set explicitly — nginx sends the upstream block name as SNI by default, which the Tailscale Ingress won't recognize
- `proxy_http_version 1.1` + `Connection $connection_upgrade` enables keepalive (empty string for normal requests, "upgrade" for WebSocket)
- `keepalive` pool size: 4 for low-traffic static sites, 8 for higher-traffic dynamic services
```nginx **Dynamic service template** — see `fly/nginx.conf` for the live Forgejo configuration, which includes rate-limited auth endpoints, cached static assets and release downloads, archive endpoint redirects, robots.txt, and WebSocket support.
# --- forge.eblu.me (dynamic, authenticated) ---
server {
listen 8080;
server_name forge.eblu.me;
# Higher rate limit — git operations, CI webhooks, and API calls
# can legitimately burst. Forgejo also has its own rate limiting,
# so this is a safety net, not the primary control.
limit_req zone=general burst=50 nodelay;
# Git LFS and repo uploads can be large
client_max_body_size 512m;
error_page 502 503 504 /error.html;
location = /error.html {
root /usr/share/nginx/html;
internal;
}
location / {
set $upstream_forge https://forge.tail8d86e.ts.net;
proxy_pass $upstream_forge$request_uri;
proxy_ssl_verify off;
proxy_ssl_server_name on;
proxy_intercept_errors on;
# NO proxy_cache — dynamic content with sessions.
# Caching would serve stale pages and break authentication.
# Pass through headers needed for proper proxying
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support (Forgejo uses it for live updates)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
# Selectively cache static assets only
location ~* \.(css|js|png|jpg|svg|woff2?)$ {
set $upstream_forge_static https://forge.tail8d86e.ts.net;
proxy_pass $upstream_forge_static$request_uri;
proxy_ssl_verify off;
proxy_ssl_server_name on;
proxy_cache services;
proxy_cache_valid 200 7d;
proxy_cache_key $host$uri;
add_header X-Cache-Status $upstream_cache_status;
add_header X-Clacks-Overhead "GNU Terry Pratchett" always;
}
}
```
Key differences for dynamic services: Key differences for dynamic services:
- **No blanket caching** — only static assets (CSS, JS, images) are cached - **No blanket caching** — only static assets (CSS, JS, images) are cached

View file

@ -46,18 +46,32 @@ http {
proxy_cache_path /tmp/cache levels=1:2 keys_zone=services:10m proxy_cache_path /tmp/cache levels=1:2 keys_zone=services:10m
max_size=200m inactive=24h; max_size=200m inactive=24h;
# MagicDNS resolver using a variable in proxy_pass defers upstream DNS # WebSocket-aware Connection header. Only send "upgrade" when the client
# resolution to request time (not config time). Results are cached for # actually requests a protocol switch; otherwise empty string to preserve
# 30s per worker to avoid per-request DNS lookups. # upstream keepalive connections.
map $http_upgrade $connection_upgrade {
default "";
websocket upgrade;
}
# --- Upstream pools with keepalive ---
# DNS is resolved once at config load via MagicDNS. If Tailscale Ingress
# pods get new IPs (restart, reschedule), run `mise run fly-reload` to
# re-resolve. A Grafana alert fires when upstreams are unreachable.
resolver 100.100.100.100 valid=30s; resolver 100.100.100.100 valid=30s;
resolver_timeout 5s; resolver_timeout 5s;
# WebSocket-aware Connection header. Only send "upgrade" when the client upstream forge_backend {
# actually requests a protocol switch; otherwise "close" (the HTTP/1.1 server forge.tail8d86e.ts.net:443;
# default when keepalive pooling is not available). keepalive 8;
map $http_upgrade $connection_upgrade { }
default close; upstream docs_backend {
websocket upgrade; server docs.tail8d86e.ts.net:443;
keepalive 4;
}
upstream cv_backend {
server cv.tail8d86e.ts.net:443;
keepalive 4;
} }
# --- docs.eblu.me (static site) --- # --- docs.eblu.me (static site) ---
@ -76,12 +90,16 @@ http {
internal; internal;
} }
location / { location / {
set $upstream_docs https://docs.tail8d86e.ts.net; proxy_pass https://docs_backend$request_uri;
proxy_pass $upstream_docs$request_uri;
proxy_ssl_verify off; proxy_ssl_verify off;
proxy_ssl_server_name on; proxy_ssl_server_name on;
proxy_ssl_name docs.tail8d86e.ts.net;
proxy_set_header Host docs.tail8d86e.ts.net;
proxy_intercept_errors on; proxy_intercept_errors on;
proxy_http_version 1.1;
proxy_set_header Connection $connection_upgrade;
# Cache aggressively static site only. # Cache aggressively static site only.
# Do NOT use these settings for dynamic services. # Do NOT use these settings for dynamic services.
proxy_cache services; proxy_cache services;
@ -116,12 +134,16 @@ http {
} }
location / { location / {
set $upstream_cv https://cv.tail8d86e.ts.net; proxy_pass https://cv_backend$request_uri;
proxy_pass $upstream_cv$request_uri;
proxy_ssl_verify off; proxy_ssl_verify off;
proxy_ssl_server_name on; proxy_ssl_server_name on;
proxy_ssl_name cv.tail8d86e.ts.net;
proxy_set_header Host cv.tail8d86e.ts.net;
proxy_intercept_errors on; proxy_intercept_errors on;
proxy_http_version 1.1;
proxy_set_header Connection $connection_upgrade;
proxy_cache services; proxy_cache services;
proxy_cache_valid 200 1d; proxy_cache_valid 200 1d;
proxy_cache_valid 404 1m; proxy_cache_valid 404 1m;
@ -187,10 +209,10 @@ http {
location ~ ^/user/(login|sign_up|forgot_password) { location ~ ^/user/(login|sign_up|forgot_password) {
limit_req zone=forge_auth burst=5 nodelay; limit_req zone=forge_auth burst=5 nodelay;
set $upstream_forge https://forge.tail8d86e.ts.net; proxy_pass https://forge_backend$request_uri;
proxy_pass $upstream_forge$request_uri;
proxy_ssl_verify off; proxy_ssl_verify off;
proxy_ssl_server_name on; proxy_ssl_server_name on;
proxy_ssl_name forge.tail8d86e.ts.net;
proxy_intercept_errors on; proxy_intercept_errors on;
proxy_set_header Host $host; proxy_set_header Host $host;
@ -206,10 +228,13 @@ http {
# Cache release artifact downloads immutable files keyed by tag+filename. # Cache release artifact downloads immutable files keyed by tag+filename.
# Avoids hammering Forgejo when crawlers or users re-download the same asset. # Avoids hammering Forgejo when crawlers or users re-download the same asset.
location ~ ^/[^/]+/[^/]+/releases/download/ { location ~ ^/[^/]+/[^/]+/releases/download/ {
set $upstream_forge_releases https://forge.tail8d86e.ts.net; proxy_pass https://forge_backend$request_uri;
proxy_pass $upstream_forge_releases$request_uri;
proxy_ssl_verify off; proxy_ssl_verify off;
proxy_ssl_server_name on; proxy_ssl_server_name on;
proxy_ssl_name forge.tail8d86e.ts.net;
proxy_http_version 1.1;
proxy_set_header Connection $connection_upgrade;
proxy_cache services; proxy_cache services;
proxy_cache_valid 200 7d; proxy_cache_valid 200 7d;
@ -226,10 +251,13 @@ http {
# Selectively cache static assets only # Selectively cache static assets only
location ~* \.(css|js|png|jpg|svg|woff2?)$ { location ~* \.(css|js|png|jpg|svg|woff2?)$ {
set $upstream_forge_static https://forge.tail8d86e.ts.net; proxy_pass https://forge_backend$request_uri;
proxy_pass $upstream_forge_static$request_uri;
proxy_ssl_verify off; proxy_ssl_verify off;
proxy_ssl_server_name on; proxy_ssl_server_name on;
proxy_ssl_name forge.tail8d86e.ts.net;
proxy_http_version 1.1;
proxy_set_header Connection $connection_upgrade;
proxy_cache services; proxy_cache services;
proxy_cache_valid 200 7d; proxy_cache_valid 200 7d;
@ -240,10 +268,10 @@ http {
} }
location / { location / {
set $upstream_forge https://forge.tail8d86e.ts.net; proxy_pass https://forge_backend$request_uri;
proxy_pass $upstream_forge$request_uri;
proxy_ssl_verify off; proxy_ssl_verify off;
proxy_ssl_server_name on; proxy_ssl_server_name on;
proxy_ssl_name forge.tail8d86e.ts.net;
proxy_intercept_errors on; proxy_intercept_errors on;
# NO proxy_cache dynamic content with sessions # NO proxy_cache dynamic content with sessions

View file

@ -11,10 +11,18 @@ tailscale up --authkey="${TS_AUTHKEY}" --hostname=flyio-proxy
until tailscale status > /dev/null 2>&1; do sleep 1; done until tailscale status > /dev/null 2>&1; do sleep 1; done
echo "Tailscale connected" echo "Tailscale connected"
# Wait for MagicDNS to be ready — upstream blocks resolve DNS at config
# load, so nginx will fail to start if MagicDNS can't resolve yet.
echo "Waiting for MagicDNS..."
until nslookup forge.tail8d86e.ts.net 100.100.100.100 > /dev/null 2>&1; do
sleep 1
done
echo "MagicDNS ready"
# Ensure fail2ban deny file exists before nginx starts # Ensure fail2ban deny file exists before nginx starts
touch /etc/nginx/forge-deny.conf touch /etc/nginx/forge-deny.conf
# Start nginx — MagicDNS is available, health check passes immediately. # Start nginx — MagicDNS is available, upstreams resolved.
nginx -g "daemon off;" & nginx -g "daemon off;" &
NGINX_PID=$! NGINX_PID=$!
echo "Nginx started" echo "Nginx started"

16
mise-tasks/fly-reload Executable file
View file

@ -0,0 +1,16 @@
#!/usr/bin/env bash
#MISE description="Reload Fly.io proxy nginx config (re-resolves upstream DNS)"
set -euo pipefail
export FLY_API_TOKEN
FLY_API_TOKEN="$(op read 'op://blumeops/fly.io admin/add more/deploy-token')"
# SSH into the Fly machine and send nginx a reload signal.
# This re-resolves upstream DNS without a full redeploy.
APP="blumeops-proxy"
MACHINE_ID=$(fly machines list -a "$APP" --json | python3 -c "import sys,json; print(json.load(sys.stdin)[0]['id'])")
echo "Reloading nginx on machine $MACHINE_ID..."
fly ssh console -a "$APP" -C "nginx -s reload"
echo "Done. Upstream DNS re-resolved."