---
title: Expose a Service Publicly
tags:
  - how-to
  - fly-io
  - tailscale
  - networking
aliases: []
id: expose-service-publicly
---

# Expose a Service Publicly via Fly.io + Tailscale

This guide describes how to expose a BlumeOps service to the public internet
using a reverse proxy container on [Fly.io](https://fly.io) that tunnels back
to [[indri]] over [[tailscale]]. The approach keeps the home IP hidden,
requires no changes to existing infrastructure (`*.ops.eblu.me`, [[caddy]],
DNS), and is reusable for multiple services.

## Architecture

```
Internet → <service>.eblu.me
        │
Fly.io edge (Anycast, TLS via Let's Encrypt)
        │
Fly.io VM (nginx reverse proxy + Tailscale)
        │ (WireGuard tunnel)
tailnet (tail8d86e.ts.net)
        │
<service>.tail8d86e.ts.net (Tailscale ingress)
        │
k8s Service → pod
```

(The approach works similarly for non-k8s services via `tailscale serve`
service definitions, e.g., [[forgejo]] and [[zot]].)

A single Fly.io container serves as the public-facing proxy for all exposed
services. Each service gets a `server` block in the nginx config and a DNS
CNAME. The container joins the tailnet via an ephemeral auth key and reaches
backend services through Tailscale ingress endpoints.

Existing `*.ops.eblu.me` services remain private behind Tailscale — this
approach does not touch [[caddy]], [[gandi]] DNS-01, or any other existing
infrastructure. They can continue to operate in parallel for private access.

## Key decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Proxy host | Fly.io (free tier) | Managed container, no server to maintain via Ansible. Shared IPv4 + IPv6 are free for HTTP/HTTPS; dedicated IPv4 is $2/mo if a service needs non-HTTP(S) protocols |
| Tunnel | Tailscale (existing) | Already in use, WireGuard encryption, ACL control |
| DNS | CNAME at [[gandi]] | No DNS migration needed, no Cloudflare dependency |
| TLS (public) | Fly.io auto-provisions Let's Encrypt | No cert management, `$0.10/mo` per hostname |
| TLS (origin) | Tailscale handles encryption | WireGuard tunnel encrypts all traffic |
| CDN/cache | nginx `proxy_cache` in container | Per-service: aggressive for static sites, selective or disabled for dynamic services |
| DDoS | Fly.io Anycast + nginx rate limiting | Not enterprise-grade; see [[#Break-glass shutoff]] |
| IaC | `fly/` directory in repo, Pulumi for DNS + TS key | No well-maintained Fly.io Pulumi provider; `fly.toml` is the app's IaC |

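
As a rough illustration of the cost figures in the table, the sketch below adds up the per-hostname certificate fee and the optional dedicated IPv4. The function name `monthly_cost_usd` is hypothetical, and the prices are simply the ones quoted above; check current Fly.io pricing before relying on them.

```python
from decimal import Decimal

# Prices as quoted in the table above (assumed current; verify before use).
CERT_PER_HOSTNAME = Decimal("0.10")  # USD per month per custom hostname
DEDICATED_IPV4 = Decimal("2.00")     # USD per month, only for non-HTTP(S) protocols


def monthly_cost_usd(hostnames: int, dedicated_ipv4: bool = False) -> Decimal:
    """Estimate the monthly cost of the proxy beyond the free tier."""
    if hostnames < 0:
        raise ValueError("hostnames must be non-negative")
    cost = CERT_PER_HOSTNAME * hostnames
    if dedicated_ipv4:
        cost += DEDICATED_IPV4
    return cost
```

Under these assumptions, exposing three HTTP(S) hostnames on the shared IPv4 comes to `monthly_cost_usd(3)`, i.e. thirty cents a month.
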
## TLS in this architecture

There are three independent encrypted segments (two TLS, one WireGuard), none of which involves Caddy:

1. **Browser → Fly.io edge**: Fly.io auto-provisions a Let's Encrypt
   certificate for each custom domain (e.g., `docs.eblu.me`). Validated via
   TLS-ALPN challenge — no DNS API needed.
2. **nginx → Tailscale ingress**: nginx proxies to
   `https://<service>.tail8d86e.ts.net`. The Tailscale ingress serves a
   Tailscale-issued cert. nginx uses `proxy_ssl_verify off` since the
   underlying tunnel is already encrypted.
3. **WireGuard tunnel**: All Tailscale traffic is encrypted at the network
   layer regardless of application-level TLS.

Caddy continues to serve `*.ops.eblu.me` with its existing Gandi DNS-01
certificates. The two TLS domains are completely independent.

## External references

- [Tailscale on Fly.io](https://tailscale.com/kb/1132/flydotio) — official guide for running Tailscale in a Fly.io container
- [Fly.io Custom Domains](https://fly.io/docs/networking/custom-domain/) — how Fly handles TLS for custom domains
- [Home Assistant + Fly.io + Tailscale](https://community.home-assistant.io/t/expose-ha-to-the-internet-via-a-cloud-reverse-proxy-fly-io-and-a-vpn-tailscale-for-free-for-now-without-opening-ports/352118) — community guide describing this exact pattern

---

## One-time setup (first service)

These steps establish the Fly.io proxy infrastructure. They only need to be done once.

### Step 1: Fly.io account and app

1. Create or recover a Fly.io account at https://fly.io (requires credit card for free tier)
2. Install `flyctl`: `brew install flyctl`
3. Authenticate: `fly auth login`
4. Create the app: `fly apps create blumeops-proxy`
5. Store the Fly.io deploy token in 1Password (blumeops vault):
   - Generate: `fly tokens create deploy -a blumeops-proxy`
   - Store as `fly-deploy-token` field

### Step 2: Repository structure

Create the `fly/` directory at the repository root. This is separate from `containers/` because the image is built and deployed directly to Fly.io by `fly deploy` — it never goes through `registry.ops.eblu.me`.

```
fly/
├── README.md      # Setup notes and context
├── fly.toml       # Fly.io app configuration
├── Dockerfile     # nginx + tailscale
├── nginx.conf     # Reverse proxy + cache config
└── start.sh       # Entrypoint: start tailscale, then nginx
```

**`fly/fly.toml`** — app configuration:

```toml
app = "blumeops-proxy"
primary_region = "sjc"

[build]

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 1

[checks]
  [checks.health]
    port = 8080
    type = "http"
    interval = "30s"
    timeout = "5s"
    path = "/healthz"
```

**`fly/Dockerfile`** — nginx + tailscale:

```dockerfile
FROM nginx:alpine

# Copy tailscale binaries from official image
COPY --from=docker.io/tailscale/tailscale:stable \
    /usr/local/bin/tailscaled /usr/local/bin/tailscaled
COPY --from=docker.io/tailscale/tailscale:stable \
    /usr/local/bin/tailscale /usr/local/bin/tailscale

RUN mkdir -p /var/run/tailscale /var/lib/tailscale \
    && apk add --no-cache iptables ip6tables

COPY nginx.conf /etc/nginx/nginx.conf
COPY start.sh /start.sh
RUN chmod +x /start.sh

EXPOSE 8080

CMD ["/start.sh"]
```

**`fly/start.sh`** — entrypoint:

```bash
#!/bin/sh
set -e

# Start tailscale daemon. Fly.io runs Firecracker microVMs which support
# TUN devices natively — no need for --tun=userspace-networking.
tailscaled --statedir=/var/lib/tailscale &
sleep 2

# Authenticate and join tailnet
tailscale up --authkey="${TS_AUTHKEY}" --hostname=flyio-proxy

# Wait for tailscale to be ready
until tailscale status > /dev/null 2>&1; do sleep 1; done
echo "Tailscale connected"

# Start nginx — MagicDNS resolves *.tail8d86e.ts.net hostnames
nginx -g "daemon off;"
```

**`fly/nginx.conf`** — reverse proxy with caching and rate limiting:

> The example below shows a **static site** configuration (docs.eblu.me).
> For dynamic services, see [[#Considerations for dynamic services]].

```nginx
worker_processes auto;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Rate limiting zones — define per-service zones as needed
    limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;

    # Proxy cache: 200MB, evict after 24h of no access
    proxy_cache_path /tmp/cache levels=1:2 keys_zone=services:10m
                     max_size=200m inactive=24h;

    # --- docs.eblu.me (static site) ---
    server {
        listen 8080;
        server_name docs.eblu.me;

        limit_req zone=general burst=20 nodelay;

        location / {
            proxy_pass https://docs.tail8d86e.ts.net;
            proxy_ssl_verify off;
            proxy_ssl_server_name on;

            # Cache aggressively — static site only.
            # Do NOT use these settings for dynamic services.
            proxy_cache services;
            proxy_cache_valid 200 1d;
            proxy_cache_valid 404 1m;
            proxy_cache_use_stale error timeout updating;
            proxy_cache_lock on;

            # Prevent cache-busting: ignore query strings and
            # client cache-control headers.
            # Safe for static sites; breaks dynamic services.
            proxy_cache_key $host$uri;
            proxy_ignore_headers Cache-Control Set-Cookie;

            add_header X-Cache-Status $upstream_cache_status;
        }
    }

    # Catch-all: reject unknown hosts, but serve health check
    server {
        listen 8080 default_server;

        location /healthz {
            return 200 "ok\n";
        }

        location / {
            return 444;
        }
    }
}
```

### Step 3: Tailscale auth key and ACLs (Pulumi)

Extend the existing `pulumi/tailscale/` project.

**Add to `pulumi/tailscale/__main__.py`:**

```python
# Auth key for Fly.io proxy container
flyio_key = tailscale.TailnetKey(
    "flyio-proxy-key",
    reusable=True,
    ephemeral=True,
    preauthorized=True,  # Skip device approval on the tailnet
    tags=["tag:flyio-proxy"],
    expiry=7776000,  # 90 days
)
pulumi.export("flyio_authkey", flyio_key.key)
```

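
The `expiry` value is given in seconds. The sketch below is a quick sanity check of the 90-day figure, plus a hypothetical helper (`rotation_deadline`, not part of the Pulumi project) for working out the date by which the key needs to be rotated. The creation date used in any call is whatever you supply; nothing here queries Tailscale.

```python
from datetime import datetime, timedelta, timezone

EXPIRY_SECONDS = 7_776_000  # value passed to TailnetKey(expiry=...)


def expiry_days(seconds: int) -> float:
    """Convert an expiry expressed in seconds to days."""
    return timedelta(seconds=seconds).total_seconds() / 86_400


def rotation_deadline(created: datetime, seconds: int = EXPIRY_SECONDS) -> datetime:
    """Return the moment a key created at `created` stops being valid."""
    return created + timedelta(seconds=seconds)
```
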
> **Note:** `preauthorized=True` is required if your tailnet has device
> approval enabled. Without it, each new container start (including
> health-check restarts) creates a node that needs manual approval,
> causing the container to hang before nginx starts.

**Add to `pulumi/tailscale/policy.hujson`:**

Tag owner:
```
"tag:flyio-proxy": ["autogroup:admin", "tag:blumeops"],
```

Access grant (Fly.io proxy → k8s services on HTTPS only):
```
{
    "src": ["tag:flyio-proxy"],
    "dst": ["tag:k8s"],
    "ip":  ["tcp:443"],
},
```

ACL test:
```
{
    "src": "tag:flyio-proxy",
    "accept": ["tag:k8s:443"],
    "deny": ["tag:homelab:22", "tag:nas:445", "tag:registry:443"],
},
```

Deploy: `mise run tailnet-preview` then `mise run tailnet-up`.

After deploying, extract the auth key and set it as a Fly.io secret:

```bash
# Get the key from Pulumi state
cd pulumi/tailscale && pulumi stack output flyio_authkey --show-secrets

# Set it in Fly.io
fly secrets set TS_AUTHKEY="tskey-auth-..." -a blumeops-proxy
```

Store the auth key in 1Password as well for the `fly-setup` mise task.

### Step 4: Mise tasks

**`mise-tasks/fly-deploy`:**

```bash
#!/usr/bin/env bash
#MISE description="Deploy the Fly.io public proxy"

set -euo pipefail

cd "$(dirname "$0")/../fly"
fly deploy "$@"
```

**`mise-tasks/fly-setup`:**

```bash
#!/usr/bin/env bash
#MISE description="One-time setup: configure Fly.io secrets and certs (idempotent)"

set -euo pipefail

APP="blumeops-proxy"

# Fetch Tailscale auth key from Pulumi state
echo "Fetching Tailscale auth key from Pulumi..."
TS_AUTHKEY=$(cd "$(dirname "$0")/../pulumi/tailscale" && pulumi stack select tail8d86e && pulumi stack output flyio_authkey --show-secrets)
fly secrets set TS_AUTHKEY="$TS_AUTHKEY" --stage -a "$APP"
echo "Tailscale auth key staged (will take effect on next deploy)"

# Allocate IPs (idempotent — fly errors if already allocated)
# Shared IPv4 is free and sufficient for HTTP/HTTPS services.
# Use 'fly ips allocate-v4' (no --shared) for dedicated IPv4 ($2/mo)
# if the service needs non-HTTP protocols.
fly ips allocate-v4 --shared -a "$APP" 2>/dev/null || true
fly ips allocate-v6 -a "$APP" 2>/dev/null || true
echo "IPs allocated"

# Add certs for all public domains (idempotent — fly ignores duplicates)
fly certs add docs.eblu.me -a "$APP" 2>/dev/null || true
# fly certs add wiki.eblu.me -a "$APP" 2>/dev/null || true  # future services
echo "Certificates configured"

echo "Done. Run 'mise run fly-deploy' to deploy."
```

**`mise-tasks/fly-shutoff`:**

```bash
#!/usr/bin/env bash
#MISE description="Emergency shutoff: stop all Fly.io proxy machines"

set -euo pipefail

APP="blumeops-proxy"

echo "EMERGENCY SHUTOFF: Stopping all machines for $APP"
fly scale count 0 -a "$APP" --yes
echo "All machines stopped. Public services are offline."
echo "To restore: fly scale count 1 -a $APP"
```

### Step 5: Forgejo CI workflow

**`.forgejo/workflows/deploy-fly.yaml`:**

```yaml
name: Deploy Fly.io Proxy

on:
  workflow_dispatch:
  push:
    branches: [main]
    paths:
      - 'fly/**'

jobs:
  deploy:
    runs-on: k8s
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install flyctl
        run: |
          curl -L https://fly.io/install.sh | sh
          echo "/root/.fly/bin" >> "$GITHUB_PATH"

      - name: Deploy to Fly.io
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }}
        run: |
          cd fly
          fly deploy

      - name: Verify health
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }}
        run: |
          fly status -a blumeops-proxy
          echo ""
          echo "Health check:"
          sleep 10
          curl -sf https://blumeops-proxy.fly.dev/healthz || echo "Warning: health check failed (may need DNS propagation)"
```

The `FLY_DEPLOY_TOKEN` Forgejo Actions secret must be set via the [[forgejo]] API or UI, following the pattern in the `forgejo_actions_secrets` Ansible role.

---

## Per-service setup

To expose an additional service (example: `wiki.eblu.me`):

### 1. Add nginx server block

Edit `fly/nginx.conf` — add a new `server` block. The configuration
differs significantly between static and dynamic services.

**Static site example** (same pattern as docs):

```nginx
# --- wiki.eblu.me (static) ---
server {
    listen 8080;
    server_name wiki.eblu.me;

    limit_req zone=general burst=20 nodelay;

    location / {
        proxy_pass https://wiki.tail8d86e.ts.net;
        proxy_ssl_verify off;
        proxy_ssl_server_name on;

        proxy_cache services;
        proxy_cache_valid 200 1d;
        proxy_cache_valid 404 1m;
        proxy_cache_use_stale error timeout updating;
        proxy_cache_lock on;
        proxy_cache_key $host$uri;
        proxy_ignore_headers Cache-Control Set-Cookie;

        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

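
Because every static service follows the same pattern, the block can be generated from the service name. The sketch below is only an illustration: `render_static_server` is a hypothetical helper that does not exist in the repository, and it assumes the `general` rate-limit zone and `services` cache zone defined in `fly/nginx.conf`.

```python
from string import Template

# Mirrors the static-site server block in fly/nginx.conf.
# "$$" is an escaped dollar sign, so nginx variables survive substitution.
STATIC_SERVER = Template("""\
# --- ${public_host} (static) ---
server {
    listen 8080;
    server_name ${public_host};

    limit_req zone=general burst=20 nodelay;

    location / {
        proxy_pass https://${tailnet_host};
        proxy_ssl_verify off;
        proxy_ssl_server_name on;

        proxy_cache services;
        proxy_cache_valid 200 1d;
        proxy_cache_valid 404 1m;
        proxy_cache_use_stale error timeout updating;
        proxy_cache_lock on;
        proxy_cache_key $$host$$uri;
        proxy_ignore_headers Cache-Control Set-Cookie;

        add_header X-Cache-Status $$upstream_cache_status;
    }
}
""")


def render_static_server(name: str, tailnet: str = "tail8d86e.ts.net",
                         public_zone: str = "eblu.me") -> str:
    """Render an nginx server block for a cached static site."""
    return STATIC_SERVER.substitute(
        public_host=f"{name}.{public_zone}",
        tailnet_host=f"{name}.{tailnet}",
    )
```

Calling `render_static_server("wiki")` yields a block for `wiki.eblu.me` that proxies to `wiki.tail8d86e.ts.net`. Keeping the template in one place is a design convenience only; hand-editing `fly/nginx.conf` works just as well for a handful of services.
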
**Dynamic service example** (e.g., Forgejo):

```nginx
# --- forge.eblu.me (dynamic, authenticated) ---
server {
    listen 8080;
    server_name forge.eblu.me;

    # Higher rate limit — git operations, CI webhooks, and API calls
    # can legitimately burst. Forgejo also has its own rate limiting,
    # so this is a safety net, not the primary control.
    limit_req zone=general burst=50 nodelay;

    # Git LFS and repo uploads can be large
    client_max_body_size 512m;

    location / {
        proxy_pass https://forge.tail8d86e.ts.net;
        proxy_ssl_verify off;
        proxy_ssl_server_name on;

        # NO proxy_cache — dynamic content with sessions.
        # Caching would serve stale pages and break authentication.

        # Pass through headers needed for proper proxying
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (Forgejo uses it for live updates)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    # Selectively cache static assets only
    location ~* \.(css|js|png|jpg|svg|woff2?)$ {
        proxy_pass https://forge.tail8d86e.ts.net;
        proxy_ssl_verify off;
        proxy_ssl_server_name on;

        proxy_cache services;
        proxy_cache_valid 200 7d;
        proxy_cache_key $host$uri;

        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

Key differences for dynamic services:

- **No blanket caching** — only static assets (CSS, JS, images) are cached
- **Respect `Set-Cookie`** — do not ignore session headers
- **Include query strings** in non-cached requests (default behavior when
  `proxy_cache_key` is not overridden)
- **Higher rate limits** — legitimate usage patterns are burstier
- **Proxy headers** — pass `X-Real-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`
  so the backend sees the real client IP (important for Forgejo's audit logs
  and its own rate limiting)
- **WebSocket support** — many modern web apps use WebSockets
- **Larger body size** — git pushes and file uploads need more than the default 1MB

### 2. Add Fly.io certificate

```bash
fly certs add wiki.eblu.me -a blumeops-proxy
```

Or add it to `mise-tasks/fly-setup` so it's captured for future runs.

### 3. Deploy

```bash
mise run fly-deploy
```

Or push the `fly/nginx.conf` change to main — the Forgejo workflow deploys automatically.

### 4. Verify against fly.dev

Test the proxy before touching DNS. Use the `Host` header to simulate
the real domain:

```bash
# Health check
curl -sf https://blumeops-proxy.fly.dev/healthz

# Simulate real domain request
curl -I -H "Host: wiki.eblu.me" https://blumeops-proxy.fly.dev/
# Should return 200 with X-Cache-Status header
```

If this fails, debug without any public DNS impact.

### 5. Add DNS CNAME (Pulumi)

Only after verifying the proxy works. Add to `pulumi/gandi/__main__.py`:

```python
wiki_public = gandi.livedns.Record(
    "wiki-public",
    zone=domain,
    name="wiki",
    type="CNAME",
    ttl=300,
    values=["blumeops-proxy.fly.dev."],
)
```

Deploy: `mise run dns-preview` then `mise run dns-up`.

### 6. Verify with real domain

```bash
curl -I https://wiki.eblu.me
# Should return 200 with X-Cache-Status header
```

### 7. Update Tailscale ACLs if needed

The one-time setup grants `tag:flyio-proxy` access to `tag:k8s` on port
443. If the new service needs a different grant, add it to
`policy.hujson`. Examples:

- **Another k8s service** (e.g., Kiwix): No ACL change needed — already
  covered by `tag:k8s:443`.
- **Forgejo on indri**: Needs a new grant for `tag:homelab` on the
  relevant ports (e.g., `tcp:3001` for HTTP, `tcp:2200` for SSH). Add
  this as a separate, narrow grant — do not widen the existing one.
- **Non-Tailscale-ingress service**: If the backend uses `tailscale
  serve` instead of the k8s Tailscale operator, the Tailscale node will
  have its own tag. Grant `tag:flyio-proxy` access to that specific tag.

---

## Security

### DDoS and rate limiting

This approach provides basic protection, not enterprise-grade:

- **Fly.io Anycast** absorbs volumetric L3/L4 attacks
- **nginx `limit_req`** caps per-IP request rates at the container level
- **nginx `proxy_cache`** serves most requests from cache — only cache
  misses traverse the Tailscale tunnel to indri

For **static sites**, the cache is the primary defense. Most requests
never reach the origin. Cache-busting is mitigated by ignoring query
strings (`proxy_cache_key $host$uri`) and client cache-control headers.

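
To see why dropping the query string blunts cache-busting, consider a simplified model of the key. nginx's `$uri` is the request path without arguments, so two requests that differ only in their query string map to the same cache entry. The functions below are illustrative stand-ins, not nginx's implementation: real `$uri` is also normalized (decoding, slash merging), which this sketch skips, and `key_with_query` only approximates a key built on `$request_uri`.

```python
from urllib.parse import urlsplit


def cache_key(host: str, request_target: str) -> str:
    """Approximate `proxy_cache_key $host$uri`: host plus path, no query string."""
    path = urlsplit(request_target).path or "/"
    return f"{host}{path}"


def key_with_query(host: str, request_target: str) -> str:
    """For contrast: a key that keeps the query string, as dynamic services need."""
    return f"{host}{request_target}"
```

With `cache_key`, `/guide?cb=1` and `/guide?cb=2` collapse to the same entry, so an attacker appending random parameters keeps hitting the cache. With `key_with_query`, each variant would be a separate cache miss that travels to the origin.
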
For **dynamic services**, the cache covers only static assets. Most
requests flow through the Tailscale tunnel to indri on every hit. This
makes dynamic services significantly more vulnerable to L7 DDoS — an
attacker sending high volumes of legitimate-looking requests (login
pages, API endpoints, search queries) bypasses the cache entirely.
Mitigations for dynamic services:

- nginx `limit_req` is the primary defense at the proxy layer — tune
  the rate and burst per service
- The backend service's own rate limiting (e.g., Forgejo's built-in
  rate limiter) provides a second layer
- fail2ban on indri (see below) can block IPs showing abuse patterns
- The break-glass shutoff remains the last resort

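
When tuning `rate` and `burst`, it can help to reason about which requests a given setting admits. The sketch below is a simplified model of `limit_req ... nodelay` for a single client IP, following the leaky-bucket behaviour described in the nginx documentation. The class name `LimitReqModel` is hypothetical, and the model ignores nginx's millisecond granularity and shared-memory zone eviction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LimitReqModel:
    """Simplified per-client model of `limit_req zone=... burst=N nodelay`."""
    rate: float                   # requests per second, e.g. 10 for rate=10r/s
    burst: int                    # e.g. 20 for burst=20
    excess: float = 0.0           # requests currently held in the bucket
    last: Optional[float] = None  # timestamp of the last admitted request

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at `now` (seconds) is admitted."""
        if self.last is None:
            self.last = now
            return True
        # The bucket drains at `rate`; the new request adds one unit.
        excess = max(0.0, self.excess - self.rate * (now - self.last)) + 1.0
        if excess > self.burst:
            return False  # nginx answers 503 by default
        self.excess = excess
        self.last = now
        return True
```

In this model, with `rate=10` and `burst=20`, an instantaneous spike of 30 requests from one IP admits 21 (the first request plus a burst of 20) and rejects the rest; one second later the bucket has drained enough for ten more.
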
If a publicly exposed dynamic service attracts targeted attacks or the
home network bandwidth is impacted, consider migrating to Cloudflare
Tunnel for enterprise-grade DDoS protection (requires DNS migration;
see plan history in git).

### fail2ban

fail2ban monitors log files for repeated failed authentication attempts
(SSH brute force, bad login passwords, API abuse) and bans IPs via
firewall rules.

**Static sites**: fail2ban does not apply. There is no login surface,
no sessions, no credentials to brute force.

**Dynamic services with authentication** (e.g., Forgejo): fail2ban is
relevant and should be configured on **indri**, not on Fly.io. The
nginx proxy is transparent — it forwards requests but does not see
authentication outcomes. fail2ban watches the service's own logs on
indri for patterns like repeated failed logins.

Setup considerations for Forgejo specifically:

- Forgejo logs failed auth attempts to its log file
- fail2ban needs a filter matching Forgejo's log format
- Bans are applied at indri's firewall, where proxied traffic arrives
  from the Tailscale address of the `flyio-proxy` node rather than the
  end user's IP
- **Important**: for fail2ban to see real client IPs, the nginx proxy
  must pass `X-Real-IP` / `X-Forwarded-For` headers (included in the
  dynamic service nginx config above), and Forgejo must be configured
  to trust the proxy and log the forwarded IP rather than the proxy's
  Tailscale IP
- Disable open user registration before exposing Forgejo publicly —
  require explicit invites

### Break-glass shutoff

If the proxy is causing issues (DDoS, unexpected traffic, bandwidth consumption on the home network):

**Level 1 — Stop the container (seconds, reversible):**
```bash
mise run fly-shutoff
# or: fly scale count 0 -a blumeops-proxy --yes
```
All public services go offline immediately. Tailscale tunnel drops. Zero traffic reaches indri. Restore with `fly scale count 1 -a blumeops-proxy`.

**Level 2 — Revoke Tailscale access (seconds):**
Remove the `flyio-proxy` node in the Tailscale admin console. Even if the container is running, it cannot reach the tailnet. Use this if the container itself may be compromised.

**Level 3 — Remove DNS (minutes to hours):**
Delete the CNAME records at Gandi. Takes time for DNS propagation but is the permanent shutoff.

**Level 1 is the primary response.** It is a single command, takes effect in seconds, and is trivially reversible. Document the `mise run fly-shutoff` command somewhere easily accessible (e.g., pinned in a notes app) so it can be run quickly under stress.

---

## Considerations for dynamic services

The architecture described in this guide works for both static and dynamic
services, but the nginx configuration and security posture differ
significantly. This section summarizes what changes when exposing a
dynamic, authenticated service like [[forgejo]].

| Concern | Static site | Dynamic service |
|---------|-------------|-----------------|
| Caching | Aggressive (cache everything, 1d TTL) | Static assets only, or disabled |
| Session cookies | Ignored (`proxy_ignore_headers Set-Cookie`) | Must be passed through |
| Query strings | Ignored in cache key | Included (default behavior) |
| Rate limiting | 10r/s is plenty | Higher burst needed; coordinate with backend rate limiter |
| Request body size | Default 1MB is fine | Increase for uploads (`client_max_body_size`) |
| WebSocket | Not needed | Often needed (`proxy_http_version 1.1`, `Upgrade` headers) |
| Proxy headers | Optional | Required (`X-Real-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`) |
| fail2ban | Not applicable | Configure on indri, watching service logs |
| DDoS exposure | Low — cache absorbs most traffic | Higher — most requests hit origin |
| Pre-exposure checklist | Deploy and go | Disable open registration, audit access controls, configure fail2ban |

### Checklist before exposing a dynamic service

- [ ] Disable open user registration (require invites or admin approval)
- [ ] Audit access controls and permissions
- [ ] Configure the service to log the forwarded client IP (not the proxy IP)
- [ ] Set up fail2ban on indri with a filter for the service's log format
- [ ] Add narrow Tailscale ACL grant for `tag:flyio-proxy` to the service
- [ ] Test the nginx config locally or in staging before deploying
- [ ] Rehearse the break-glass shutoff (`mise run fly-shutoff`)

---

## IaC summary

| Component | Managed by | Declarative? |
|-----------|------------|:---:|
| Tailscale auth key | Pulumi (`pulumi/tailscale/`) | yes |
| Tailscale ACLs | Pulumi (`pulumi/tailscale/policy.hujson`) | yes |
| DNS CNAMEs | Pulumi (`pulumi/gandi/`) | yes |
| Container + app config | `fly/Dockerfile` + `fly/fly.toml` in repo | yes |
| Deployment | Forgejo CI on push to `fly/`, or `mise run fly-deploy` | yes |
| Fly.io secrets + certs | `mise run fly-setup` (one-time, idempotent) | semi |

The "semi" for Fly.io secrets is a one-time operation backed by a repeatable mise task. Fly.io does not have a mature Pulumi or Terraform provider, so `fly.toml` + `flyctl` is the standard IaC model for Fly.io apps.

---

## Verification

### Pre-DNS (verify against fly.dev)

Test the proxy works before creating any public DNS records:

1. `curl -sf https://blumeops-proxy.fly.dev/healthz` — returns `ok`
2. `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` — returns 200 with `X-Cache-Status` header
3. `fly status -a blumeops-proxy` — shows healthy machine
4. All `*.ops.eblu.me` services still work from tailnet (unchanged)
5. `mise run services-check` passes

If anything fails here, debug without public DNS impact.

### Post-DNS (after CNAME is live)

After deploying DNS (`mise run dns-up`):

1. `curl -I https://docs.eblu.me` — returns 200 with `X-Cache-Status` header
2. `dig docs.eblu.me` — resolves to Fly.io IPs (not Tailscale IP)
3. `dig forge.ops.eblu.me` — still resolves to `100.98.163.89` (unchanged)
4. Second request to same URL shows `X-Cache-Status: HIT`

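
The cache check in item 4 can be scripted. The sketch below parses a raw header block, such as the text printed by `curl -sI`, and reports the cache status. `cache_status` is a hypothetical helper, not an existing task in the repository, and it assumes the header name `X-Cache-Status` set by `add_header` in the nginx config.

```python
from typing import Optional


def cache_status(raw_headers: str) -> Optional[str]:
    """Extract the X-Cache-Status value from a raw HTTP header block.

    Header names are matched case-insensitively, since HTTP/2 responses
    print them in lowercase. Returns None if the header is absent.
    """
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "x-cache-status":
            return value.strip()
    return None
```

One way to use it is to capture the headers of two consecutive requests to the same URL and compare the results: the first is expected to be a miss and the second `HIT`, provided the response was cacheable.
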