docs/expose-service-publicly pt2 - fly.io #119

Merged
eblume merged 4 commits from docs/expose-service-publicly into main 2026-02-08 00:38:28 -08:00
4 changed files with 647 additions and 135 deletions

@ -1 +1 @@
Add how-to guide for exposing services publicly via Cloudflare Tunnel.
Add how-to guide for exposing services publicly via Fly.io reverse proxy + Tailscale tunnel.

@ -2,197 +2,698 @@
title: Expose a Service Publicly
tags:
- how-to
- cloudflare
- fly-io
- tailscale
- networking
aliases: []
id: expose-service-publicly
---
# Expose a Service Publicly via Cloudflare Tunnel
# Expose a Service Publicly via Fly.io + Tailscale
> **Status:** Plan — not yet implemented. Execute phases in order when ready.
> **Status:** Plan — not yet implemented. First target: `docs.eblu.me`.
This guide describes how to expose a BlumeOps service to the public internet securely using Cloudflare as a CDN and DDoS shield, with a Cloudflare Tunnel creating an outbound-only connection that never exposes the home IP.
The first service to expose is `docs.eblu.me`. The pattern is reusable for future services.
This guide describes how to expose a BlumeOps service to the public internet
using a reverse proxy container on [Fly.io](https://fly.io) that tunnels back
to [[indri]] over [[tailscale]]. The approach keeps the home IP hidden,
requires no changes to existing infrastructure (`*.ops.eblu.me`, [[caddy]],
DNS), and is reusable for multiple services.
## Architecture
```
Internet → docs.eblu.me (Cloudflare proxied CNAME)
Internet → <service>.eblu.me
Cloudflare Edge (CDN, WAF, DDoS protection)
Fly.io edge (Anycast, TLS via Let's Encrypt)
Cloudflare Tunnel (outbound from k8s)
Fly.io VM (nginx reverse proxy + Tailscale)
│ (WireGuard tunnel)
tailnet (tail8d86e.ts.net)
cloudflared pod in minikube
<service>.tail8d86e.ts.net (Tailscale ingress)
docs k8s Service (ClusterIP, port 80)
docs pod (nginx + Quartz static site)
Tailnet → *.ops.eblu.me (unchanged, DNS-only to Tailscale IP)
k8s Service → pod
```
(The approach works similarly for non-k8s services via `tailscale serve`
service definitions, e.g., [[forgejo]] and [[zot]].)
All existing `*.ops.eblu.me` services remain private behind Tailscale. Only explicitly configured subdomains (like `docs.eblu.me`) are exposed publicly through Cloudflare.
A single Fly.io container serves as the public-facing proxy for all exposed
services. Each service gets a `server` block in the nginx config and a DNS
CNAME. The container joins the tailnet via an ephemeral auth key and reaches
backend services through Tailscale ingress endpoints.
## Key Decisions
Existing `*.ops.eblu.me` services remain private behind Tailscale — this
approach does not touch [[caddy]], [[gandi]] DNS-01, or any other existing
infrastructure. They can continue to operate in parallel for private access.
## Key decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| DNS hosting | Move from [[gandi]] to Cloudflare (free) | CNAME/partial setup needs Business plan @ $200/mo |
| Gandi role | Registrar only | Domain renewal, WHOIS. No more DNS hosting. |
| Tunnel host | Kubernetes | ArgoCD managed, direct ClusterIP access, no Tailscale hop |
| [[caddy]] TLS | Migrate to Cloudflare DNS-01 plugin | Gandi DNS-01 won't work after nameserver change |
| Cloudflare account | Recover existing, instrument with IaC | |
| Proxy host | Fly.io (free tier) | Managed container, no server to maintain via Ansible |
| Tunnel | Tailscale (existing) | Already in use, WireGuard encryption, ACL control |
| DNS | CNAME at [[gandi]] | No DNS migration needed, no Cloudflare dependency |
| TLS (public) | Fly.io auto-provisions Let's Encrypt | No cert management, `$0.10/mo` per hostname |
| TLS (origin) | Tailscale handles encryption | WireGuard tunnel encrypts all traffic |
| CDN/cache | nginx `proxy_cache` in container | Per-service: aggressive for static sites, selective or disabled for dynamic services |
| DDoS | Fly.io Anycast + nginx rate limiting | Not enterprise-grade; see [[#Break-glass shutoff]] |
| IaC | `fly/` directory in repo, Pulumi for DNS + TS key | No well-maintained Fly.io Pulumi provider; `fly.toml` is the app's IaC |
## Prerequisites
## TLS in this architecture
- Cloudflare account with `eblu.me` zone added (free plan)
- Cloudflare API token stored in 1Password with scopes: Zone:DNS:Edit, Zone:Zone:Read, Account:Cloudflare Tunnel:Edit, Account:Account Settings:Read
- Cloudflare account ID and zone ID noted
There are three independent TLS segments — none involve Caddy:
## Phase 0: Preparation (manual)
1. **Browser → Fly.io edge**: Fly.io auto-provisions a Let's Encrypt
certificate for each custom domain (e.g., `docs.eblu.me`). Validated via
TLS-ALPN challenge — no DNS API needed.
2. **nginx → Tailscale ingress**: nginx proxies to
`https://<service>.tail8d86e.ts.net`. The Tailscale ingress serves a
Tailscale-issued cert. nginx uses `proxy_ssl_verify off` since the
underlying tunnel is already encrypted.
3. **WireGuard tunnel**: All Tailscale traffic is encrypted at the network
layer regardless of application-level TLS.
1. Recover Cloudflare account access
2. Add `eblu.me` zone (free plan) — Cloudflare scans existing records from Gandi
3. **Do not change nameservers yet** — wait until Phase 3
4. Create API token with the scopes listed above
5. Store API token and account ID in 1Password (blumeops vault)
Caddy continues to serve `*.ops.eblu.me` with its existing Gandi DNS-01
certificates. The two TLS domains are completely independent.
## Phase 1: Caddy TLS migration
## External references
**Why first**: Blocking dependency for the nameserver change. Once nameservers move to Cloudflare, Gandi LiveDNS can't serve DNS-01 ACME challenges.
- [Tailscale on Fly.io](https://tailscale.com/kb/1132/flydotio) — official guide for running Tailscale in a Fly.io container
- [Fly.io Custom Domains](https://fly.io/docs/networking/custom-domain/) — how Fly handles TLS for custom domains
- [Home Assistant + Fly.io + Tailscale](https://community.home-assistant.io/t/expose-ha-to-the-internet-via-a-cloud-reverse-proxy-fly-io-and-a-vpn-tailscale-for-free-for-now-without-opening-ports/352118) — community guide describing this exact pattern
### Caddy binary rebuild
---
Rebuild Caddy with `github.com/caddy-dns/cloudflare` instead of `github.com/caddy-dns/gandi` using `xcaddy` in `~/code/3rd/caddy/`.
## One-time setup (first service)
### Files to modify
These steps establish the Fly.io proxy infrastructure. They only need to be done once.
- `ansible/roles/caddy/templates/Caddyfile.j2` — change `dns gandi {env.GANDI_BEARER_TOKEN}` to `dns cloudflare {env.CF_API_TOKEN}`
- `ansible/roles/caddy/templates/caddy-wrapper.sh.j2` — source Cloudflare API token instead of Gandi PAT
- `ansible/roles/caddy/defaults/main.yml` — update token variable name
- `ansible/playbooks/indri.yml` — add pre_task to fetch Cloudflare API token from 1Password, replace Gandi PAT fetch
### Step 1: Fly.io account and app
### Deployment sequence
1. Create or recover a Fly.io account at https://fly.io (requires credit card for free tier)
2. Install `flyctl`: `brew install flyctl`
3. Authenticate: `fly auth login`
4. Create the app: `fly apps create blumeops-proxy`
5. Store the Fly.io deploy token in 1Password (blumeops vault):
- Generate: `fly tokens create deploy -a blumeops-proxy`
- Store as `fly-deploy-token` field
1. Set up Cloudflare zone with all records (Phase 2)
2. Prepare Caddy migration on a branch (this phase)
3. Change nameservers at Gandi (Phase 3)
4. Immediately deploy Caddy update: `mise run provision-indri -- --tags caddy`
5. Caddy's next TLS renewal uses Cloudflare DNS-01
### Step 2: Repository structure
Existing certificates are valid for ~90 days, providing a grace window.
Create the `fly/` directory at the repository root. This is separate from `containers/` because the image is built and deployed directly to Fly.io by `fly deploy` — it never goes through `registry.ops.eblu.me`.
## Phase 2: Pulumi — Cloudflare IaC
```
fly/
├── README.md # Setup notes and context
├── fly.toml # Fly.io app configuration
├── Dockerfile # nginx + tailscale
├── nginx.conf # Reverse proxy + cache config
└── start.sh # Entrypoint: start tailscale, then nginx
```
Create a new Pulumi project at `pulumi/cloudflare/`.
**`fly/fly.toml`** — app configuration:
### Files to create
```toml
app = "blumeops-proxy"
primary_region = "sjc"
- `pulumi/cloudflare/Pulumi.yaml` — project definition (`blumeops-cloudflare`, python/uv)
- `pulumi/cloudflare/Pulumi.eblu-me.yaml` — stack config (domain, account-id)
- `pulumi/cloudflare/pyproject.toml` — deps: `pulumi>=3.0.0`, `pulumi-cloudflare>=5.0.0`
- `pulumi/cloudflare/__main__.py`
[build]
### Pulumi program manages
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = false
auto_start_machines = true
min_machines_running = 1
- Zone lookup for `eblu.me`
- DNS records:
- `*.ops.eblu.me` A record → Tailscale IP, **proxied=False** (grey cloud, private)
- `ops.eblu.me` A record → Tailscale IP, **proxied=False**
- `docs.eblu.me` CNAME → `<tunnel-id>.cfargotunnel.com`, **proxied=True** (orange cloud, CDN)
- Cloudflare Tunnel resource
- Tunnel config (ingress: `docs.eblu.me` → `http://docs.docs.svc.cluster.local:80`)
- Cache rules for static docs site (edge TTL: 1 day, browser TTL: 1 hour)
- Zone security settings (SSL: full, min TLS 1.2, always HTTPS)
[checks]
[checks.health]
port = 8080
type = "http"
interval = "30s"
timeout = "5s"
path = "/healthz"
```
### New mise tasks
**`fly/Dockerfile`** — nginx + tailscale:
Following the `dns-preview`/`dns-up` pattern:
```dockerfile
FROM nginx:alpine
- `mise-tasks/cloudflare-preview` → `pulumi preview` with 1Password token injection
- `mise-tasks/cloudflare-up` → `pulumi up` with 1Password token injection
# Copy tailscale binaries from official image
COPY --from=docker.io/tailscale/tailscale:stable \
/usr/local/bin/tailscaled /usr/local/bin/tailscaled
COPY --from=docker.io/tailscale/tailscale:stable \
/usr/local/bin/tailscale /usr/local/bin/tailscale
Keep `pulumi/gandi/` until migration is confirmed working. Then `pulumi destroy` the Gandi stack and archive the code.
RUN mkdir -p /var/run/tailscale /var/lib/tailscale
## Phase 3: DNS migration
COPY nginx.conf /etc/nginx/nginx.conf
COPY start.sh /start.sh
RUN chmod +x /start.sh
### Pre-migration checklist
EXPOSE 8080
- [ ] Cloudflare zone active with all records (Phase 2)
- [ ] Caddy migration branch ready (Phase 1)
- [ ] Cloudflare Tunnel created and configured (Phase 2)
- [ ] cloudflared running in k8s (Phase 4)
CMD ["/start.sh"]
```
### Steps
**`fly/start.sh`** — entrypoint:
1. At Gandi registrar dashboard: change nameservers to Cloudflare's assigned NS
2. Deploy Caddy update immediately: `mise run provision-indri -- --tags caddy`
3. Monitor propagation: `dig +trace docs.eblu.me`, `dig +trace forge.ops.eblu.me`
4. Verify tailnet services still work from tailnet clients
5. Verify `docs.eblu.me` resolves publicly
```bash
#!/bin/sh
set -e
### Rollback
# Start tailscale in userspace networking mode (no TUN device needed)
tailscaled --tun=userspace-networking --statedir=/var/lib/tailscale &
sleep 2
Change nameservers back to Gandi's at registrar. Everything reverts.
# Authenticate and join tailnet
tailscale up --authkey="${TS_AUTHKEY}" --hostname=flyio-proxy
## Phase 4: cloudflared in Kubernetes
# Wait for tailscale to be ready
until tailscale status > /dev/null 2>&1; do sleep 1; done
echo "Tailscale connected"
### Files to create
# Start nginx
nginx -g "daemon off;"
```
- `argocd/apps/cloudflare-tunnel.yaml` — ArgoCD Application
- `argocd/manifests/cloudflare-tunnel/deployment.yaml` — cloudflared Deployment
- Image: `cloudflare/cloudflared:latest` (or pinned version)
- Args: `tunnel --no-autoupdate run --token <tunnel-token>`
- Single replica, tunnel token injected from a Secret
- `argocd/manifests/cloudflare-tunnel/external-secret.yaml` — ExternalSecret to pull tunnel token from 1Password
- `argocd/manifests/cloudflare-tunnel/kustomization.yaml`
**`fly/nginx.conf`** — reverse proxy with caching and rate limiting:
### Tunnel routing (managed by Pulumi)
> The example below shows a **static site** configuration (docs.eblu.me).
> For dynamic services, see [[#Considerations for dynamic services]].
- `docs.eblu.me` → `http://docs.docs.svc.cluster.local:80` (direct k8s service access)
- Catch-all → `http_status:404`
```nginx
worker_processes auto;
Namespace: `cloudflare-tunnel` (dedicated, reusable for future public services)
events {
worker_connections 1024;
}
## Phase 5: Documentation and cleanup
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
### Files to create
# Rate limiting zones — define per-service zones as needed
limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;
- `docs/reference/infrastructure/cloudflare.md` — reference card
- `docs/changelog.d/<branch>.feature.md` — changelog fragment
# Proxy cache: 200MB, evict after 24h of no access
proxy_cache_path /tmp/cache levels=1:2 keys_zone=services:10m
max_size=200m inactive=24h;
### Files to modify
# --- docs.eblu.me (static site) ---
server {
listen 8080;
server_name docs.eblu.me;
- `docs/reference/infrastructure/routing.md` — add public services section
- `docs/reference/infrastructure/gandi.md` — update to registrar-only role
- `docs/reference/services/docs.md` — add public URL `https://docs.eblu.me`
- `docs/reference/reference.md` — add Cloudflare to infrastructure section
- `CLAUDE.md` — update routing table, add cloudflare tasks
limit_req zone=general burst=20 nodelay;
location / {
proxy_pass https://docs.tail8d86e.ts.net;
proxy_ssl_verify off;
# Cache aggressively — static site only.
# Do NOT use these settings for dynamic services.
proxy_cache services;
proxy_cache_valid 200 1d;
proxy_cache_valid 404 1m;
proxy_cache_use_stale error timeout updating;
proxy_cache_lock on;
# Prevent cache-busting: ignore query strings and
# client cache-control headers.
# Safe for static sites; breaks dynamic services.
proxy_cache_key $host$uri;
proxy_ignore_headers Cache-Control Set-Cookie;
add_header X-Cache-Status $upstream_cache_status;
}
location /healthz {
return 200 "ok\n";
}
}
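# Catch-all: reject unknown hosts, but still answer health checks.
# Requests for blumeops-proxy.fly.dev (and any health check that does not
# send a configured Host header) match no server_name and land here.
server {
listen 8080 default_server;
location = /healthz {
return 200 "ok\n";
}
location / {
return 444;
}
}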
}
```
### Step 3: Tailscale auth key and ACLs (Pulumi)
Extend the existing `pulumi/tailscale/` project.
**Add to `pulumi/tailscale/__main__.py`:**
```python
# Auth key for Fly.io proxy container
flyio_key = tailscale.TailscaleKey(
"flyio-proxy-key",
reusable=True,
ephemeral=True,
tags=["tag:flyio-proxy"],
expiry=7776000, # 90 days
)
pulumi.export("flyio_authkey", flyio_key.key)
```
**Add to `pulumi/tailscale/policy.hujson`:**
Tag owner:
```
"tag:flyio-proxy": ["autogroup:admin", "tag:blumeops"],
```
Access grant (Fly.io proxy → k8s services on HTTPS only):
```
{
"src": ["tag:flyio-proxy"],
"dst": ["tag:k8s"],
"ip": ["tcp:443"],
},
```
ACL test:
```
{
"src": "tag:flyio-proxy",
"accept": ["tag:k8s:443"],
"deny": ["tag:homelab:22", "tag:nas:445", "tag:registry:443"],
},
```
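The grant and test above can be read as a default-deny check: anything not matched by a grant is refused. The sketch below is a toy Python model of that reading, not Tailscale's policy engine. It assumes only tag-to-tag grants with `tcp:<port>` entries and ignores users, groups, and CIDR ranges; the helper names `is_allowed` and `run_policy_test` are hypothetical.

```python
# Toy model of default-deny grant evaluation (illustrative only).
# GRANTS mirrors the single grant added for the Fly.io proxy in this guide.
GRANTS = [
    {"src": ["tag:flyio-proxy"], "dst": ["tag:k8s"], "ip": ["tcp:443"]},
]

# Mirrors the ACL test entry from this guide.
ACL_TEST = {
    "src": "tag:flyio-proxy",
    "accept": ["tag:k8s:443"],
    "deny": ["tag:homelab:22", "tag:nas:445", "tag:registry:443"],
}


def is_allowed(src: str, dst: str, port: int, grants=GRANTS) -> bool:
    """Return True if any grant permits src -> dst on the given TCP port."""
    for grant in grants:
        if (
            src in grant["src"]
            and dst in grant["dst"]
            and f"tcp:{port}" in grant["ip"]
        ):
            return True
    # Nothing matched: traffic is denied by default.
    return False


def split_target(target: str) -> tuple[str, int]:
    """Split 'tag:k8s:443' into ('tag:k8s', 443)."""
    dst, port = target.rsplit(":", 1)
    return dst, int(port)


def run_policy_test(test: dict, grants=GRANTS) -> bool:
    """Every 'accept' target must be allowed and every 'deny' target refused."""
    accepted = all(
        is_allowed(test["src"], *split_target(t), grants=grants)
        for t in test.get("accept", [])
    )
    refused = not any(
        is_allowed(test["src"], *split_target(t), grants=grants)
        for t in test.get("deny", [])
    )
    return accepted and refused
```

In this model, widening the grant (for example adding `tag:homelab` to `dst` with `tcp:22`) makes `run_policy_test(ACL_TEST)` fail, which is the kind of regression the policy test is meant to catch.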
Deploy: `mise run tailnet-preview` then `mise run tailnet-up`.
After deploying, extract the auth key and set it as a Fly.io secret:
```bash
# Get the key from Pulumi state
cd pulumi/tailscale && pulumi stack output flyio_authkey --show-secrets
# Set it in Fly.io
fly secrets set TS_AUTHKEY="tskey-auth-..." -a blumeops-proxy
```
Store the auth key in 1Password as well for the `fly-setup` mise task.
### Step 4: Mise tasks
**`mise-tasks/fly-deploy`:**
```bash
#!/usr/bin/env bash
#MISE description="Deploy the Fly.io public proxy"
set -euo pipefail
cd "$(dirname "$0")/../fly"
fly deploy "$@"
```
**`mise-tasks/fly-setup`:**
```bash
#!/usr/bin/env bash
#MISE description="One-time setup: configure Fly.io secrets and certs (idempotent)"
set -euo pipefail
APP="blumeops-proxy"
# Fetch Tailscale auth key from 1Password
TS_AUTHKEY=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get <FLY_ITEM_ID> --fields ts-authkey --reveal)
fly secrets set TS_AUTHKEY="$TS_AUTHKEY" -a "$APP"
echo "Tailscale auth key set"
# Add certs for all public domains (idempotent — fly ignores duplicates)
fly certs add docs.eblu.me -a "$APP" 2>/dev/null || true
# fly certs add wiki.eblu.me -a "$APP" 2>/dev/null || true # future services
echo "Certificates configured"
echo "Done. Run 'mise run fly-deploy' to deploy."
```
**`mise-tasks/fly-shutoff`:**
```bash
#!/usr/bin/env bash
#MISE description="Emergency shutoff: stop all Fly.io proxy machines"
set -euo pipefail
APP="blumeops-proxy"
echo "EMERGENCY SHUTOFF: Stopping all machines for $APP"
fly scale count 0 -a "$APP" --yes
echo "All machines stopped. Public services are offline."
echo "To restore: fly scale count 1 -a $APP"
```
### Step 5: Forgejo CI workflow
**`.forgejo/workflows/deploy-fly.yaml`:**
```yaml
name: Deploy Fly.io Proxy
on:
  workflow_dispatch:
  push:
    branches: [main]
    paths:
      - 'fly/**'
jobs:
  deploy:
    runs-on: k8s
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install flyctl
        run: |
          curl -L https://fly.io/install.sh | sh
          echo "/root/.fly/bin" >> "$GITHUB_PATH"
      - name: Deploy to Fly.io
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }}
        run: |
          cd fly
          fly deploy
      - name: Verify health
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }}
        run: |
          fly status -a blumeops-proxy
          echo ""
          echo "Health check:"
          sleep 10
          curl -sf https://blumeops-proxy.fly.dev/healthz || echo "Warning: health check failed (may need DNS propagation)"
```
The `FLY_DEPLOY_TOKEN` Forgejo Actions secret must be set via the [[forgejo]] API or UI, following the pattern in the `forgejo_actions_secrets` Ansible role.
---
## Per-service setup
To expose an additional service (example: `wiki.eblu.me`):
### 1. Add nginx server block
Edit `fly/nginx.conf` — add a new `server` block. The configuration
differs significantly between static and dynamic services.
**Static site example** (same pattern as docs):
```nginx
# --- wiki.eblu.me (static) ---
server {
listen 8080;
server_name wiki.eblu.me;
limit_req zone=general burst=20 nodelay;
location / {
proxy_pass https://wiki.tail8d86e.ts.net;
proxy_ssl_verify off;
proxy_cache services;
proxy_cache_valid 200 1d;
proxy_cache_valid 404 1m;
proxy_cache_use_stale error timeout updating;
proxy_cache_lock on;
proxy_cache_key $host$uri;
proxy_ignore_headers Cache-Control Set-Cookie;
add_header X-Cache-Status $upstream_cache_status;
}
}
```
**Dynamic service example** (e.g., Forgejo):
```nginx
# --- forge.eblu.me (dynamic, authenticated) ---
server {
listen 8080;
server_name forge.eblu.me;
# Higher rate limit — git operations, CI webhooks, and API calls
# can legitimately burst. Forgejo also has its own rate limiting,
# so this is a safety net, not the primary control.
limit_req zone=general burst=50 nodelay;
# Git LFS and repo uploads can be large
client_max_body_size 512m;
location / {
proxy_pass https://forge.tail8d86e.ts.net;
proxy_ssl_verify off;
# NO proxy_cache — dynamic content with sessions.
# Caching would serve stale pages and break authentication.
# Pass through headers needed for proper proxying
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support (Forgejo uses it for live updates)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
# Selectively cache static assets only
location ~* \.(css|js|png|jpg|svg|woff2?)$ {
proxy_pass https://forge.tail8d86e.ts.net;
proxy_ssl_verify off;
proxy_cache services;
proxy_cache_valid 200 7d;
proxy_cache_key $host$uri;
add_header X-Cache-Status $upstream_cache_status;
}
}
```
Key differences for dynamic services:
- **No blanket caching** — only static assets (CSS, JS, images) are cached
- **Respect `Set-Cookie`** — do not ignore session headers
- **Include query strings** in non-cached requests (default behavior when
`proxy_cache_key` is not overridden)
- **Higher rate limits** — legitimate usage patterns are burstier
- **Proxy headers** — pass `X-Real-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`
so the backend sees the real client IP (important for Forgejo's audit logs
and its own rate limiting)
- **WebSocket support** — many modern web apps use WebSockets
- **Larger body size** — git pushes and file uploads need more than the default 1MB
### 2. Add DNS CNAME (Pulumi)
Add to `pulumi/gandi/__main__.py`:
```python
wiki_public = gandi.livedns.Record(
"wiki-public",
zone=domain,
name="wiki",
type="CNAME",
ttl=300,
values=["blumeops-proxy.fly.dev."],
)
```
Deploy: `mise run dns-preview` then `mise run dns-up`.
### 3. Add Fly.io certificate
```bash
fly certs add wiki.eblu.me -a blumeops-proxy
```
Or add it to `mise-tasks/fly-setup` so it's captured for future runs.
### 4. Deploy
```bash
mise run fly-deploy
```
Or push the `fly/nginx.conf` change to main — the Forgejo workflow deploys automatically.
### 5. Verify
```bash
curl -I https://wiki.eblu.me
# Should return 200 with X-Cache-Status header
```
### 6. Update Tailscale ACLs if needed
The one-time setup grants `tag:flyio-proxy` access to `tag:k8s` on port
443. If the new service needs a different grant, add it to
`policy.hujson`. Examples:
- **Another k8s service** (e.g., Kiwix): No ACL change needed — already
covered by `tag:k8s:443`.
- **Forgejo on indri**: Needs a new grant for `tag:homelab` on the
relevant ports (e.g., `tcp:3001` for HTTP, `tcp:2200` for SSH). Add
this as a separate, narrow grant — do not widen the existing one.
- **Non-Tailscale-ingress service**: If the backend uses `tailscale
serve` instead of the k8s Tailscale operator, the Tailscale node will
have its own tag. Grant `tag:flyio-proxy` access to that specific tag.
---
## Security
### DDoS and rate limiting
This approach provides basic protection, not enterprise-grade:
- **Fly.io Anycast** absorbs volumetric L3/L4 attacks
- **nginx `limit_req`** caps per-IP request rates at the container level
- **nginx `proxy_cache`** serves most requests from cache — only cache
misses traverse the Tailscale tunnel to indri
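To make the `limit_req` setting concrete, the sketch below models how `rate=10r/s` with `burst=20 nodelay` behaves for one client IP. It is a simplified Python model of leaky-bucket accounting, written for illustration: real nginx tracks the excess in millisecond units in shared memory, and the class name `LimitReq` is hypothetical.

```python
class LimitReq:
    """Simplified per-client model of nginx limit_req with nodelay."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate        # requests drained per second
        self.burst = burst      # excess requests tolerated above the rate
        self.excess = None      # None until the first request is seen
        self.last = 0.0         # time of the last accepted request

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is accepted."""
        if self.excess is None:
            # The first request from a client is always accepted.
            self.excess = 0.0
            self.last = now
            return True
        # Drain the bucket for the elapsed time, then add this request.
        drained = self.excess - self.rate * (now - self.last)
        excess = max(drained, 0.0) + 1.0
        if excess > self.burst:
            # Over the burst allowance: rejected, state unchanged.
            return False
        self.excess = excess
        self.last = now
        return True
```

In this model, 30 simultaneous requests from one IP yield 21 accepted (the first request plus a burst of 20) and 9 rejected. One second later the bucket has drained by 10, so 10 more requests are accepted before rejections resume.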
For **static sites**, the cache is the primary defense. Most requests
never reach the origin. Cache-busting is mitigated by ignoring query
strings (`proxy_cache_key $host$uri`) and client cache-control headers.
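The effect of `proxy_cache_key $host$uri` can be illustrated with a short Python sketch. It approximates the key as host plus path and compares it with a key that keeps the query string, as nginx's default `$scheme$proxy_host$request_uri` does. The function names are hypothetical, and nginx's `$uri` is additionally normalized and decoded, which is not modeled here.

```python
from urllib.parse import urlsplit


def static_cache_key(url: str) -> str:
    """Approximate `$host$uri`: host plus path, query string dropped."""
    parts = urlsplit(url)
    return f"{parts.hostname}{parts.path or '/'}"


def default_cache_key(url: str, proxy_host: str) -> str:
    """Approximate the default `$scheme$proxy_host$request_uri` key.

    `proxy_host` is the upstream name from proxy_pass; the upstream
    scheme is assumed to be https, as in this guide's configs.
    """
    parts = urlsplit(url)
    request_uri = parts.path or "/"
    if parts.query:
        request_uri += f"?{parts.query}"
    return f"https{proxy_host}{request_uri}"
```

With the static key, `https://docs.eblu.me/how-to/?x=1` and `https://docs.eblu.me/how-to/?x=2` map to the same cache entry, so random query strings cannot force repeated trips to the origin. With the default key they would be two separate entries.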
For **dynamic services**, the cache covers only static assets. Most
requests flow through the Tailscale tunnel to indri on every hit. This
makes dynamic services significantly more vulnerable to L7 DDoS — an
attacker sending high volumes of legitimate-looking requests (login
pages, API endpoints, search queries) bypasses the cache entirely.
Mitigations for dynamic services:
- nginx `limit_req` is the primary defense at the proxy layer — tune
the rate and burst per service
- The backend service's own rate limiting (e.g., Forgejo's built-in
rate limiter) provides a second layer
- fail2ban on indri (see below) can block IPs showing abuse patterns
- The break-glass shutoff remains the last resort
If a publicly exposed dynamic service attracts targeted attacks or the
home network bandwidth is impacted, consider migrating to Cloudflare
Tunnel for enterprise-grade DDoS protection (requires DNS migration;
see plan history in git).
### fail2ban
fail2ban monitors log files for repeated failed authentication attempts
(SSH brute force, bad login passwords, API abuse) and bans IPs via
firewall rules.
**Static sites**: fail2ban does not apply. There is no login surface,
no sessions, no credentials to brute force.
**Dynamic services with authentication** (e.g., Forgejo): fail2ban is
relevant and should be configured on **indri**, not on Fly.io. The
nginx proxy is transparent — it forwards requests but does not see
authentication outcomes. fail2ban watches the service's own logs on
indri for patterns like repeated failed logins.
Setup considerations for Forgejo specifically:
- Forgejo logs failed auth attempts to its log file
- fail2ban needs a filter matching Forgejo's log format
- Bans applied at indri's firewall act on the connecting address, which
is the Tailscale address of the `flyio-proxy` node, not the end user's
IP
- **Important**: for fail2ban to see real client IPs, the nginx proxy
must pass `X-Real-IP` / `X-Forwarded-For` headers (included in the
dynamic service nginx config above), and Forgejo must be configured
to trust the proxy and log the forwarded IP rather than the proxy's
Tailscale IP
- Disable open user registration before exposing Forgejo publicly —
require explicit invites
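The point about trusting the proxy can be sketched in Python. Because `$proxy_add_x_forwarded_for` appends to whatever `X-Forwarded-For` value the client sent, the leftmost entries are attacker-controlled. A backend or log filter should walk the list from the right and stop at the first address that is not a trusted proxy. This is a minimal sketch of that rule, not Forgejo's implementation; the function name and all addresses in the usage note are hypothetical, and malformed entries would raise `ValueError`.

```python
import ipaddress


def client_ip(remote_addr: str, x_forwarded_for: str,
              trusted_proxies: list[str]) -> str:
    """Return the address to log or ban for one request.

    remote_addr      -- address of the peer that connected to the backend
    x_forwarded_for  -- raw X-Forwarded-For header value (may be empty)
    trusted_proxies  -- CIDR ranges of proxies allowed to set the header
    """
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in trusted)

    # A direct connection from an untrusted peer: ignore the header entirely.
    if not is_trusted(remote_addr):
        return remote_addr

    # Walk right to left; entries appended by trusted proxies are skipped.
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop

    # Every hop was a trusted proxy, so fall back to the peer address.
    return remote_addr
```

For example, with `trusted_proxies=["100.64.0.0/10"]` (the range Tailscale assigns addresses from), a request arriving from `100.101.102.103` with the header `6.6.6.6, 203.0.113.7` resolves to `203.0.113.7`: the spoofed leftmost entry is ignored.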
### Break-glass shutoff
If the proxy is causing issues (DDoS, unexpected traffic, bandwidth consumption on the home network):
**Level 1 — Stop the container (seconds, reversible):**
```bash
mise run fly-shutoff
# or: fly scale count 0 -a blumeops-proxy --yes
```
All public services go offline immediately. Tailscale tunnel drops. Zero traffic reaches indri. Restore with `fly scale count 1 -a blumeops-proxy`.
**Level 2 — Revoke Tailscale access (seconds):**
Remove the `flyio-proxy` node in the Tailscale admin console. Even if the container is running, it cannot reach the tailnet. Use this if the container itself may be compromised.
**Level 3 — Remove DNS (minutes to hours):**
Delete the CNAME records at Gandi. Takes time for DNS propagation but is the permanent shutoff.
**Level 1 is the primary response.** It is a single command, takes effect in seconds, and is trivially reversible. Document the `mise run fly-shutoff` command somewhere easily accessible (e.g., pinned in a notes app) so it can be run quickly under stress.
---
## Considerations for dynamic services
The architecture described in this guide works for both static and dynamic
services, but the nginx configuration and security posture differ
significantly. This section summarizes what changes when exposing a
dynamic, authenticated service like [[forgejo]].
| Concern | Static site | Dynamic service |
|---------|-------------|-----------------|
| Caching | Aggressive (cache everything, 1d TTL) | Static assets only, or disabled |
| Session cookies | Ignored (`proxy_ignore_headers Set-Cookie`) | Must be passed through |
| Query strings | Ignored in cache key | Included (default behavior) |
| Rate limiting | 10r/s is plenty | Higher burst needed; coordinate with backend rate limiter |
| Request body size | Default 1MB is fine | Increase for uploads (`client_max_body_size`) |
| WebSocket | Not needed | Often needed (`proxy_http_version 1.1`, `Upgrade` headers) |
| Proxy headers | Optional | Required (`X-Real-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`) |
| fail2ban | Not applicable | Configure on indri, watching service logs |
| DDoS exposure | Low — cache absorbs most traffic | Higher — most requests hit origin |
| Pre-exposure checklist | Deploy and go | Disable open registration, audit access controls, configure fail2ban |
### Checklist before exposing a dynamic service
- [ ] Disable open user registration (require invites or admin approval)
- [ ] Audit access controls and permissions
- [ ] Configure the service to log the forwarded client IP (not the proxy IP)
- [ ] Set up fail2ban on indri with a filter for the service's log format
- [ ] Add narrow Tailscale ACL grant for `tag:flyio-proxy` to the service
- [ ] Test the nginx config locally or in staging before deploying
- [ ] Rehearse the break-glass shutoff (`mise run fly-shutoff`)
---
## IaC summary
| Component | Managed by | Declarative? |
|-----------|------------|:---:|
| Tailscale auth key | Pulumi (`pulumi/tailscale/`) | yes |
| Tailscale ACLs | Pulumi (`pulumi/tailscale/policy.hujson`) | yes |
| DNS CNAMEs | Pulumi (`pulumi/gandi/`) | yes |
| Container + app config | `fly/Dockerfile` + `fly/fly.toml` in repo | yes |
| Deployment | Forgejo CI on push to `fly/`, or `mise run fly-deploy` | yes |
| Fly.io secrets + certs | `mise run fly-setup` (one-time, idempotent) | semi |
The "semi" rating reflects that setting Fly.io secrets and certs is a one-time imperative operation, backed by a repeatable mise task. Fly.io does not have a mature Pulumi or Terraform provider, so `fly.toml` + `flyctl` is the standard IaC model for Fly.io apps.
---
## Verification
1. `curl -I https://docs.eblu.me` from public internet — returns 200 with `cf-ray` header
2. `dig docs.eblu.me` — shows Cloudflare IPs (not Tailscale IP)
3. `dig forge.ops.eblu.me` — still shows `100.98.163.89` (Tailscale IP)
4. All `*.ops.eblu.me` services accessible from tailnet
After initial deployment of a service (using `docs.eblu.me` as example):
1. `curl -I https://docs.eblu.me` — returns 200 with `X-Cache-Status` header
2. `dig docs.eblu.me` — resolves to Fly.io IPs (not Tailscale IP)
3. `dig forge.ops.eblu.me` — still resolves to `100.98.163.89` (unchanged)
4. All `*.ops.eblu.me` services work from tailnet
5. `mise run services-check` passes
6. Caddy TLS renewal works (force test with `caddy reload` if needed)
7. Cloudflare dashboard shows tunnel healthy and cache hits
## Risks
| Risk | Mitigation |
|------|------------|
| Caddy TLS renewal fails after NS change | Deploy Caddy update immediately; existing certs valid ~90 days |
| DNS propagation delay (24-48h) | Set low TTLs before migration; monitor with `dig +trace` |
| cloudflared crashes | K8s restarts it; Cloudflare serves cached content |
| Tunnel credentials leak | 1Password + ExternalSecret; tunnel only routes to docs |
## Adding more public services
To expose another service publicly (e.g., `wiki.eblu.me`):
1. Add DNS record + tunnel ingress rule in `pulumi/cloudflare/__main__.py`
2. Run `mise run cloudflare-up`
3. No changes to cloudflared deployment (remotely-managed tunnel config)
6. `fly status -a blumeops-proxy` shows healthy machine
7. Second request to same URL shows `X-Cache-Status: HIT`

@ -22,7 +22,7 @@ Task-oriented instructions for common BlumeOps operations. These guides assume y
| [[update-tailscale-acls]] | Update Tailscale access control policies |
| [[gandi-operations]] | Manage DNS records and cycle the Gandi API token |
| [[use-pypi-proxy]] | Configure pip and publish packages to devpi |
| [[expose-service-publicly]] | Expose a service to the public internet via Cloudflare Tunnel |
| [[expose-service-publicly]] | Expose a service to the public internet via Fly.io + Tailscale |
## Documentation

@ -125,17 +125,28 @@ def main() -> int:
if has_spaces:
# Links with spaces in target or around pipe are not allowed
spaced_links.append((rel_path, line_num, target))
elif "/" in target:
continue
# Handle anchor links: [[#Heading]] or [[file#Heading]]
# Strip the #fragment for validation; pure anchors (#Heading) skip file check
file_target = target
if "#" in target:
file_target = target.split("#", 1)[0]
if not file_target:
# Pure in-page anchor like [[#Break-glass shutoff]] — always valid
continue
if "/" in file_target:
# Path-based links are not allowed - use simple filenames only
path_links.append((rel_path, line_num, target))
elif target in ambiguous_filenames:
elif file_target in ambiguous_filenames:
# Link uses an ambiguous filename - needs to be renamed
ambiguous_links.append((rel_path, line_num, target, filename_counts[target]))
elif target not in valid_targets:
ambiguous_links.append((rel_path, line_num, target, filename_counts[file_target]))
elif file_target not in valid_targets:
broken_links.append((rel_path, line_num, target))
elif target != source_stem:
elif file_target != source_stem:
# Valid link to a different doc — record it for orphan detection
linked_stems.add(target)
linked_stems.add(file_target)
# Print results
console.print("[bold]Wiki-Link Validation[/bold]")
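The anchor handling introduced in this hunk can be summarized as a standalone function. The sketch below is a simplified restatement in Python, not the validator's actual code: the name `classify_link` and the returned labels are hypothetical, and the earlier `has_spaces` check and orphan tracking are left out.

```python
def classify_link(target: str, valid_targets: set[str],
                  ambiguous_filenames: set[str]) -> str:
    """Classify a wiki-link target the way the updated validator does.

    Returns one of: "anchor", "path", "ambiguous", "broken", "ok".
    """
    # Strip the #fragment so only the file part is validated.
    file_target = target.split("#", 1)[0] if "#" in target else target

    if not file_target:
        # Pure in-page anchor such as "#Heading": always valid.
        return "anchor"
    if "/" in file_target:
        # Path-based links are not allowed; use simple filenames only.
        return "path"
    if file_target in ambiguous_filenames:
        return "ambiguous"
    if file_target not in valid_targets:
        return "broken"
    return "ok"
```

Before this change a target like `forgejo#Setup` would have been looked up verbatim and reported as broken; with the fragment stripped, only `forgejo` is checked against the known filenames.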