Doc said "Store the auth key in 1Password as well for the `fly-setup`
mise task" right next to the description of fly-setup, which reads
the key from Pulumi state, not 1Password. No code path anywhere reads
this key from 1P — the instruction is vestigial from an earlier
design and confused us during the v1.0.1 rotation when the
flyio-proxy-key expired.
Rewrite the section to:
- point at `mise run fly-setup` as the canonical path
- state explicitly that Pulumi state is the only source of truth
- document the rotation recipe (tailnet-up --replace=<urn> +
fly-setup + fly-deploy) for the next time this 90-day key lapses
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| title | modified | last-reviewed | tags | aliases | id |
|---|---|---|---|---|---|
| Expose a Service Publicly | 2026-04-18 | 2026-04-18 | | | expose-service-publicly |
Expose a Service Publicly via Fly.io + Tailscale
This guide describes how to expose a BlumeOps service to the public internet
using a reverse proxy container on Fly.io that tunnels back
to indri over tailscale. The approach keeps the home IP hidden,
requires no changes to existing infrastructure (*.ops.eblu.me, caddy,
DNS), and is reusable for multiple services.
Architecture
```
Internet → <service>.eblu.me
   │
Fly.io edge (Anycast, TLS via Let's Encrypt)
   │
Fly.io VM (nginx reverse proxy + Tailscale)
   │  (direct WireGuard tunnel to indri)
Caddy on indri (*.ops.eblu.me routing)
   │
backend service (k8s, native, or remote)
```
A single Fly.io container serves as the public-facing proxy for all exposed
services. Nginx routes all traffic through caddy on indri via a
direct Tailscale WireGuard connection. Caddy already knows how to route
to every service (native, minikube, or ringtail k3s), so adding a new
public service only requires an nginx server block and a DNS CNAME.
The *.ops.eblu.me routes continue to work in parallel for private tailnet
access — the Fly proxy sends Host: <service>.ops.eblu.me headers that
match the same Caddy routes.
Key decisions
| Decision | Choice | Rationale |
|---|---|---|
| Proxy host | Fly.io (free tier) | Managed container, no server to maintain via Ansible. Shared IPv4 + IPv6 are free for HTTP/HTTPS; dedicated IPv4 is $2/mo if a service needs non-HTTP(S) protocols |
| Tunnel | Tailscale (existing) | Already in use, WireGuard encryption, ACL control |
| DNS | CNAME at gandi | No DNS migration needed, no Cloudflare dependency |
| TLS (public) | Fly.io auto-provisions Let's Encrypt | No cert management, $0.10/mo per hostname |
| TLS (origin) | Tailscale handles encryption | WireGuard tunnel encrypts all traffic |
| CDN/cache | nginx `proxy_cache` in container | Per-service: aggressive for static sites, selective or disabled for dynamic services |
| DDoS | Fly.io Anycast + nginx rate limiting | Not enterprise-grade; see #Break-glass shutoff |
| IaC | `fly/` directory in repo, Pulumi for DNS + TS key | No well-maintained Fly.io Pulumi provider; `fly.toml` is the app's IaC |
TLS in this architecture
There are three independent encryption layers:
- Browser → Fly.io edge: Fly.io auto-provisions a Let's Encrypt certificate for each custom domain (e.g., `docs.eblu.me`). It is validated via the TLS-ALPN challenge, so no DNS API access is needed.
- nginx → Caddy on indri: nginx proxies to `https://indri.tail8d86e.ts.net` with `Host: <service>.ops.eblu.me`. Caddy serves its `*.ops.eblu.me` Let's Encrypt wildcard cert. nginx uses `proxy_ssl_verify off` since the underlying WireGuard tunnel is already encrypted.
- WireGuard tunnel: all Tailscale traffic is encrypted at the network layer regardless of application-level TLS.
External references
- Tailscale on Fly.io — official guide for running Tailscale in a Fly.io container
- Fly.io Custom Domains — how Fly handles TLS for custom domains
- Home Assistant + Fly.io + Tailscale — community guide describing this exact pattern
One-time setup (first service)
These steps establish the Fly.io proxy infrastructure. They only need to be done once.
Step 1: Fly.io account and app
- Create or recover a Fly.io account at https://fly.io (a credit card is required, even for the free tier)
- Install `flyctl`: `brew install flyctl`
- Authenticate: `fly auth login`
- Create the app: `fly apps create blumeops-proxy`
- Store the Fly.io deploy token in 1Password (blumeops vault):
  - Generate: `fly tokens create deploy -a blumeops-proxy`
  - Store it as the `fly-deploy-token` field
Step 2: Repository structure
Create the fly/ directory at the repository root. This is separate from containers/ because the image is built and deployed directly to Fly.io by fly deploy — it never goes through registry.ops.eblu.me.
```
fly/
├── fly.toml          # Fly.io app configuration
├── Dockerfile        # nginx + tailscale + alloy
├── nginx.conf        # Reverse proxy + cache config
├── start.sh          # Entrypoint: start tailscale, nginx, fail2ban, alloy
├── alloy.river       # Observability: logs → Loki, metrics → Prometheus
├── fail2ban/         # fail2ban filter, jail, and action config
└── error.html        # Friendly 503 page for upstream failures
```
See the actual files in fly/ for current configuration. Key design points:
- `fly.toml`: uses bluegreen deploys so the old machine serves traffic until the new one passes health checks. `auto_stop_machines = "off"` keeps the proxy always on.
- `Dockerfile`: multi-stage build pulling nginx, Tailscale, and Alloy binaries. Alloy runs as a sidecar inside the container for observability (see below).
- `start.sh`: starts `tailscaled --port=41641` first (the pinned port enables direct WireGuard peering), waits for MagicDNS readiness (polls `nslookup` against `100.100.100.100`), then starts nginx, fail2ban, and Alloy, and blocks on the nginx process. The MagicDNS check is required because the `upstream` block resolves DNS when the config is loaded.
- `nginx.conf`: uses a single `upstream` block with `keepalive` pointing at Caddy on indri (`indri.tail8d86e.ts.net:443`). All services route through this upstream with `Host: <service>.ops.eblu.me` headers for Caddy routing. It includes a JSON access log format that Alloy tails for log collection and metric extraction. A catch-all server block serves `/healthz` and rejects unknown hosts.
- `error.html`: shown via `proxy_intercept_errors` when upstreams are unreachable (indri offline, tunnel down, etc.). Cached responses still take priority via `proxy_cache_use_stale`.
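The MagicDNS gate in `start.sh` is a poll-until-ready loop. The real script is shell and uses `nslookup`; the Python below is only an illustrative sketch of the same pattern. The names `wait_for_ready` and `tailnet_name_resolves`, the 60 s timeout, and the choice of `indri.tail8d86e.ts.net` as the probe name are assumptions, not taken from the repo.

```python
import socket
import time
from typing import Callable


def wait_for_ready(
    probe: Callable[[], bool],
    timeout: float = 60.0,   # hypothetical; pick what suits your boot time
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or `timeout` seconds pass.

    Returns True on success and False on timeout, so the caller decides
    whether to abort (nginx must not start before DNS works).
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def tailnet_name_resolves(name: str = "indri.tail8d86e.ts.net") -> bool:
    """Hypothetical probe: True once the system resolver can answer for `name`.

    Unlike start.sh's nslookup, this uses whatever resolver the OS is
    configured with instead of querying 100.100.100.100 directly.
    """
    try:
        socket.getaddrinfo(name, 443)
        return True
    except socket.gaierror:
        return False
```

Usage would be along the lines of `if not wait_for_ready(tailnet_name_resolves): raise SystemExit("MagicDNS not ready")`. Injecting `clock` and `sleep` keeps the loop testable without real waiting.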
Observability sidecar
The Fly.io container includes alloy baked in (fly/alloy.river). Alloy tails the nginx JSON access log and:
- Forwards log lines to loki via the Tailscale Ingress endpoint
- Derives Prometheus metrics (`flyio_nginx_http_requests_total`, `flyio_nginx_http_request_duration_seconds`, `flyio_nginx_cache_requests_total`, etc.) and remote-writes them to prometheus
Both Loki and Prometheus are reached directly via their *.tail8d86e.ts.net Tailscale Ingress endpoints (not via caddy), since the proxy's ACLs only allow tag:flyio-target.
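To make the log-to-metrics step concrete, here is a Python sketch of the kind of counting Alloy's pipeline performs on the JSON access log. It is not the `alloy.river` configuration, and the JSON field names (`host`, `status`, `upstream_cache_status`) are assumptions; check the `log_format` in `fly/nginx.conf` for the real ones.

```python
import json
from collections import Counter
from typing import Iterable


def derive_counters(lines: Iterable[str]) -> tuple[Counter, Counter]:
    """Count requests by (host, status) and cache results by (host, cache status).

    Roughly what flyio_nginx_http_requests_total and
    flyio_nginx_cache_requests_total track. Field names are assumed.
    """
    requests: Counter = Counter()
    cache: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines instead of failing
        host = entry.get("host", "unknown")
        requests[(host, str(entry.get("status", "")))] += 1
        # An empty value or "-" means no cache lookup happened for this request.
        cache_status = entry.get("upstream_cache_status") or "-"
        if cache_status != "-":
            cache[(host, cache_status)] += 1
    return requests, cache
```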
Step 3: Tailscale auth key and ACLs (Pulumi)
Extend the existing pulumi/tailscale/ project.
Add to pulumi/tailscale/__main__.py:
```python
# Imports (likely already present at the top of __main__.py)
import pulumi
import pulumi_tailscale as tailscale

# Auth key for Fly.io proxy container
flyio_key = tailscale.TailnetKey(
    "flyio-proxy-key",
    reusable=True,
    ephemeral=True,
    preauthorized=True,  # Skip device approval on the tailnet
    tags=["tag:flyio-proxy"],
    expiry=7776000,  # 90 days, in seconds
)

pulumi.export("flyio_authkey", flyio_key.key)
```
Note: `preauthorized=True` is required if your tailnet has device approval enabled. Without it, each new container start (including health-check restarts) creates a node that needs manual approval, causing the container to hang before nginx starts.
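The `expiry` value is in seconds: 90 × 86,400 = 7,776,000. A small hypothetical helper (not part of the repo) can turn that into a concrete deadline so the next rotation can be scheduled ahead of time; it assumes you know the UTC time the key was minted, for example from the Pulumi update that created it.

```python
from datetime import datetime, timedelta, timezone

KEY_EXPIRY_SECONDS = 7_776_000  # matches expiry= on the TailnetKey above


def key_expires_at(created: datetime, expiry_seconds: int = KEY_EXPIRY_SECONDS) -> datetime:
    """Return the UTC instant at which a key minted at `created` lapses."""
    if created.tzinfo is None:
        raise ValueError("created must be timezone-aware")
    return created.astimezone(timezone.utc) + timedelta(seconds=expiry_seconds)


def days_until_expiry(created: datetime, now: datetime) -> int:
    """Whole days left before the key lapses (floored; negative once expired)."""
    return (key_expires_at(created) - now).days
```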
Add to pulumi/tailscale/policy.hujson:
Tag owner (allows the k8s operator to assign this tag to Ingress proxy nodes):

```
"tag:flyio-target": ["autogroup:admin", "tag:blumeops", "tag:k8s-operator"],
```

Access grant (Fly.io proxy → explicitly tagged endpoints on HTTPS only):

```
{
  "src": ["tag:flyio-proxy"],
  "dst": ["tag:flyio-target"],
  "ip":  ["tcp:443"],
},
```

ACL test:

```
{
  "src":    "tag:flyio-proxy",
  "accept": ["tag:flyio-target:443"],
  "deny":   ["tag:k8s:443", "tag:homelab:22", "tag:nas:445", "tag:registry:443"],
},
```
Indri carries tag:flyio-target so the Fly proxy can reach Caddy. No per-service tagging is needed — Caddy handles routing to all services.
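The grant and its test can be read as an allow-list over (source tag, destination tag, port). This Python sketch models only that one grant shape to show why every `accept` entry passes and every `deny` entry fails; it is not how Tailscale evaluates policy, which also handles users, groups, autogroups, and IP sets. The `Grant` type and `allowed` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    src: frozenset[str]    # source tags
    dst: frozenset[str]    # destination tags
    ports: frozenset[int]  # allowed TCP ports


def allowed(grants: list[Grant], src_tag: str, dest: str) -> bool:
    """True if any grant lets `src_tag` reach `dest`, written as "tag:name:port"."""
    dst_tag, _, port = dest.rpartition(":")
    return any(
        src_tag in g.src and dst_tag in g.dst and int(port) in g.ports
        for g in grants
    )


# The single grant from policy.hujson: tag:flyio-proxy -> tag:flyio-target on tcp:443
GRANTS = [
    Grant(frozenset({"tag:flyio-proxy"}), frozenset({"tag:flyio-target"}), frozenset({443})),
]
```

Because there is no grant for any other destination tag, everything not explicitly tagged `tag:flyio-target` (or on a port other than 443) is unreachable from the proxy.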
Deploy: mise run tailnet-preview then mise run tailnet-up.
After deploying, push the auth key to Fly.io. The simplest path is
`mise run fly-setup`, which reads the current value from Pulumi state
and stages it as a Fly.io secret:

```bash
mise run fly-setup
```

Manual equivalent for reference:

```bash
cd pulumi/tailscale && pulumi stack output flyio_authkey --show-secrets
# then in fly/:
fly secrets set TS_AUTHKEY="tskey-auth-..." -a blumeops-proxy --stage
```
Pulumi state is the only source of truth for this key. No process
(mise tasks, Ansible, scripts) reads it from anywhere else; in
particular, the key is not stored in 1Password. To rotate (every 90
days, or after a compromise), force-replace the resource, re-run
fly-setup, and redeploy (`--stage` defers applying the secret until
the next deploy):

```bash
mise run tailnet-up -- \
  --replace='urn:pulumi:tail8d86e::blumeops-tailnet::tailscale:index/tailnetKey:TailnetKey::flyio-proxy-key'
mise run fly-setup
mise run fly-deploy
```

Pulumi mints a new 90-day key and deletes the old one in a single update. Fly machines that already authenticated with the old key are unaffected (they don't need it after the initial join); only new machine starts read the rotated value.
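The `--replace` argument is a Pulumi URN. For a top-level resource (one with no parent component) it has the form `urn:pulumi:<stack>::<project>::<type>::<name>`, which is how the string above is assembled. A hypothetical helper to build it, useful if the stack or resource name ever changes (confirm against `pulumi stack --show-urns`):

```python
def top_level_urn(stack: str, project: str, resource_type: str, name: str) -> str:
    """Build the URN of a Pulumi resource that has no parent component.

    Child resources prefix their parent's type with "$"; this sketch does
    not handle that case.
    """
    for part in (stack, project, resource_type, name):
        if "::" in part:
            raise ValueError(f"URN component may not contain '::': {part!r}")
    return f"urn:pulumi:{stack}::{project}::{resource_type}::{name}"
```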
Step 4: Mise tasks
Three mise tasks manage the proxy lifecycle. See the actual scripts in mise-tasks/ for current implementation:
- `mise run fly-deploy`: runs `fly deploy` from the `fly/` directory
- `mise run fly-setup`: one-time, idempotent setup that fetches the Tailscale auth key from Pulumi state, stages it as a Fly.io secret, allocates IPs, and adds TLS certs for all public domains (currently `docs.eblu.me` and `cv.eblu.me`)
- `mise run fly-shutoff`: emergency shutoff that scales machines to zero, immediately stopping all public traffic
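The `fly-setup` steps boil down to a handful of CLI calls. The Python sketch below builds the argument lists so they can be inspected or dry-run; the real task lives in `mise-tasks/` and may differ. In particular, allocating a shared IPv4 plus an IPv6 address is an assumption based on the Key decisions table, and the sketch omits the already-allocated and already-certified checks that make the real task idempotent. The function names are hypothetical.

```python
import subprocess

APP = "blumeops-proxy"
PUBLIC_DOMAINS = ["docs.eblu.me", "cv.eblu.me"]


def read_authkey(pulumi_dir: str = "pulumi/tailscale") -> str:
    """Fetch the Tailscale auth key from Pulumi state (the only source of truth)."""
    out = subprocess.run(
        ["pulumi", "stack", "output", "flyio_authkey", "--show-secrets"],
        cwd=pulumi_dir, check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def setup_commands(authkey: str, domains: list[str], app: str = APP) -> list[list[str]]:
    """Return the flyctl invocations, in order, without running them."""
    cmds = [
        # --stage stores the secret without redeploying; fly-deploy applies it
        ["fly", "secrets", "set", f"TS_AUTHKEY={authkey}", "-a", app, "--stage"],
        # Assumption: shared IPv4 + IPv6 (both free for HTTP/HTTPS)
        ["fly", "ips", "allocate-v4", "--shared", "-a", app],
        ["fly", "ips", "allocate-v6", "-a", app],
    ]
    cmds += [["fly", "certs", "add", d, "-a", app] for d in domains]
    return cmds


def run_setup(dry_run: bool = True) -> None:
    for cmd in setup_commands(read_authkey(), PUBLIC_DOMAINS):
        # Never echo the real key
        shown = ["TS_AUTHKEY=<redacted>" if c.startswith("TS_AUTHKEY=") else c for c in cmd]
        print(" ".join(shown))
        if not dry_run:
            subprocess.run(cmd, check=True)
```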
Step 5: Forgejo CI workflow
A Forgejo Actions workflow (.forgejo/workflows/deploy-fly.yaml) auto-deploys on pushes to main that touch fly/**. It installs flyctl, runs fly deploy, and verifies health. It can also be triggered manually via workflow_dispatch.
The FLY_DEPLOY_TOKEN Forgejo Actions secret must be set via the forgejo API or UI, following the pattern in the forgejo_actions_secrets Ansible role.
Per-service setup
To expose an additional service (example: wiki.eblu.me):
1. Ensure the service has a Caddy route
The service must be accessible via <service>.ops.eblu.me through caddy.
Most services already have this. If not, add it to ansible/roles/caddy/defaults/main.yml
and deploy with mise run provision-indri -- --tags caddy.
2. Add nginx server block
Edit fly/nginx.conf — add a server block. All services use the shared
indri_backend upstream (Caddy on indri). Set Host and proxy_ssl_name
to the service's *.ops.eblu.me hostname so Caddy routes correctly.
Static site template (simplified — adapt from existing blocks):
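```nginx
# --- wiki.eblu.me (static) ---
# Relies on definitions elsewhere in nginx.conf: the `general` limit_req zone,
# the `services` proxy_cache zone, the `indri_backend` upstream, and the
# `$connection_upgrade` variable (usually a `map` on `$http_upgrade`).
server {
    listen 8080;
    server_name wiki.eblu.me;

    limit_req zone=general burst=20 nodelay;

    # Friendly page when indri or the tunnel is unreachable
    error_page 502 503 504 /error.html;
    location = /error.html {
        root /usr/share/nginx/html;
        internal;
    }

    location / {
        # Route through Caddy on indri; Host and SNI select the *.ops.eblu.me site
        proxy_pass https://indri_backend$request_uri;
        proxy_ssl_verify off;            # WireGuard already encrypts this hop
        proxy_ssl_server_name on;
        proxy_ssl_name wiki.ops.eblu.me;
        proxy_set_header Host wiki.ops.eblu.me;
        proxy_intercept_errors on;
        proxy_http_version 1.1;
        proxy_set_header Connection $connection_upgrade;

        # Aggressive caching for a static site
        proxy_cache services;
        proxy_cache_valid 200 1d;
        proxy_cache_valid 404 1m;
        proxy_cache_use_stale error timeout updating;
        proxy_cache_lock on;
        proxy_cache_key $host$uri;       # query string excluded from the key
        proxy_ignore_headers Cache-Control Set-Cookie;

        add_header X-Cache-Status $upstream_cache_status;
        add_header X-Clacks-Overhead "GNU Terry Pratchett" always;
    }
}
```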
Dynamic service template — see fly/nginx.conf for the live Forgejo configuration, which includes rate-limited auth endpoints, cached static assets and release downloads, archive endpoint redirects, robots.txt, and WebSocket support.
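Because static sites differ only in their hostnames, the static template could also be generated. This Python sketch renders it from a service name with `string.Template`, switching the placeholder marker to `%` because nginx uses `$` for its own variables. The function is hypothetical (the repo keeps `nginx.conf` hand-written), and the output assumes the same `http`-block definitions as the template itself.

```python
import re
from string import Template


class NginxTemplate(Template):
    # nginx uses "$" for its own variables, so use "%" for our placeholders
    delimiter = "%"


STATIC_SITE = NginxTemplate("""\
# --- %public (static) ---
server {
    listen 8080;
    server_name %public;
    limit_req zone=general burst=20 nodelay;
    error_page 502 503 504 /error.html;
    location = /error.html {
        root /usr/share/nginx/html;
        internal;
    }
    location / {
        proxy_pass https://indri_backend$request_uri;
        proxy_ssl_verify off;
        proxy_ssl_server_name on;
        proxy_ssl_name %origin;
        proxy_set_header Host %origin;
        proxy_intercept_errors on;
        proxy_http_version 1.1;
        proxy_set_header Connection $connection_upgrade;
        proxy_cache services;
        proxy_cache_valid 200 1d;
        proxy_cache_valid 404 1m;
        proxy_cache_use_stale error timeout updating;
        proxy_cache_lock on;
        proxy_cache_key $host$uri;
        proxy_ignore_headers Cache-Control Set-Cookie;
        add_header X-Cache-Status $upstream_cache_status;
        add_header X-Clacks-Overhead "GNU Terry Pratchett" always;
    }
}
""")

# A single lowercase DNS label, e.g. "wiki"
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def static_server_block(service: str) -> str:
    """Render the block mapping <service>.eblu.me to Caddy's <service>.ops.eblu.me."""
    if not _LABEL.match(service):
        raise ValueError(f"not a valid DNS label: {service!r}")
    return STATIC_SITE.substitute(
        public=f"{service}.eblu.me",
        origin=f"{service}.ops.eblu.me",
    )
```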
3. Add Fly.io certificate

```bash
fly certs add wiki.eblu.me -a blumeops-proxy
```

Or add it to `mise-tasks/fly-setup` so it's captured for future runs.

4. Deploy

```bash
mise run fly-deploy
```

Or push the `fly/nginx.conf` change to main; the Forgejo workflow deploys automatically.
5. Verify against fly.dev
Test the proxy before touching DNS. Use the `Host` header to simulate
the real domain:

```bash
# Health check
curl -sf https://blumeops-proxy.fly.dev/healthz

# Simulate real domain request
curl -I -H "Host: wiki.eblu.me" https://blumeops-proxy.fly.dev/
# Should return 200 with X-Cache-Status header
```

If this fails, debug without any public DNS impact.
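The same pre-DNS check can be scripted with the standard library. This sketch sends a `HEAD` request (as `curl -I` does) to the fly.dev hostname with the real domain in the `Host` header, then checks for a 200 and an `X-Cache-Status` header. `check_headers` and `probe` are hypothetical names; only `check_headers` is pure, so it is the part that is easy to test offline.

```python
import urllib.error
import urllib.request
from typing import Mapping

PROXY_URL = "https://blumeops-proxy.fly.dev/"


def check_headers(status: int, headers: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the response looks healthy."""
    problems = []
    if status != 200:
        problems.append(f"expected 200, got {status}")
    # Header names are case-insensitive, so compare lowercased names.
    if "x-cache-status" not in {k.lower() for k in headers}:
        problems.append("missing X-Cache-Status header")
    return problems


def probe(domain: str, url: str = PROXY_URL, timeout: float = 10.0) -> list[str]:
    """HEAD `url` with Host set to `domain`, like `curl -I -H "Host: ..."`."""
    req = urllib.request.Request(url, method="HEAD", headers={"Host": domain})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return check_headers(resp.status, dict(resp.headers))
    except urllib.error.HTTPError as err:
        return check_headers(err.code, dict(err.headers))
```

For example, `print(probe("wiki.eblu.me") or "ok")` prints either the list of problems or `ok`.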
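6. Add DNS CNAME (Pulumi)
Do this only after verifying the proxy works. Add to `pulumi/gandi/__main__.py`:

```python
# Assumes `gandi` (pulumi_gandi) and `domain` are already defined earlier in __main__.py
wiki_public = gandi.livedns.Record(
    "wiki-public",
    zone=domain,
    name="wiki",
    type="CNAME",
    ttl=300,
    values=["blumeops-proxy.fly.dev."],  # trailing dot: fully qualified target
)
```

Deploy: `mise run dns-preview` then `mise run dns-up`.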
7. Verify with real domain

```bash
curl -I https://wiki.eblu.me
# Should return 200 with X-Cache-Status header
```

8. Verify routing
Since all traffic routes through Caddy on indri, no per-service Tailscale Ingress tagging is needed. As long as the service has a Caddy route (step 1), the Fly proxy can reach it.
Security
DDoS and rate limiting
This approach provides basic protection, not enterprise-grade:
- Fly.io Anycast absorbs volumetric L3/L4 attacks
- nginx `limit_req` caps per-IP request rates at the container level
- nginx `proxy_cache` serves most requests from cache; only cache misses traverse the Tailscale tunnel to indri
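nginx's `limit_req` follows a leaky-bucket model. The Python below is a simplified approximation (not nginx's code) of how `rate`, `burst`, and `nodelay` interact for a single client key, to help reason about tuning. The 10 r/s rate is an assumption taken from the static-site row of the comparison table later in this guide; `burst=20` matches the static template's `limit_req` line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeakyBucket:
    """Approximates nginx limit_req with nodelay for one key (e.g. one client IP).

    `excess` counts requests above the steady rate and drains at `rate`
    per second. A request is rejected if it would push `excess` past `burst`.
    """
    rate: float                 # requests per second, e.g. a zone with rate=10r/s
    burst: int                  # limit_req burst=
    excess: float = 0.0
    last: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last is None:
            # First request for this key: accepted, bucket starts empty.
            self.last = now
            return True
        candidate = max(self.excess - self.rate * (now - self.last) + 1, 0.0)
        if candidate > self.burst:
            return False  # rejected (nginx answers 503 by default); state unchanged
        self.excess, self.last = candidate, now
        return True
```

With `rate=10, burst=20`, a client firing 25 simultaneous requests gets 21 through (one plus the burst) and 4 rejected; one second later about 10 requests' worth of headroom has drained back.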
For static sites, the cache is the primary defense. Most requests
never reach the origin. Cache-busting is mitigated by ignoring query
strings (proxy_cache_key $host$uri) and client cache-control headers.
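The effect of `proxy_cache_key $host$uri` can be shown with a small Python model: URLs that differ only in their query string collapse to one key, so appending `?cachebust=123` does not force a trip to indri (the full `$request_uri` is still what gets proxied on a miss). nginx also normalizes `$uri` further (for example resolving `.` and `..` segments), which this sketch only partly imitates with `unquote`; `cache_key` is a hypothetical name.

```python
from urllib.parse import unquote, urlsplit


def cache_key(host: str, url: str) -> str:
    """Approximate nginx's `$host$uri`: lowercased host plus decoded path, no query."""
    path = unquote(urlsplit(url).path) or "/"
    return host.lower() + path
```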
For dynamic services, the cache covers only static assets. Most requests flow through the Tailscale tunnel to indri on every hit. This makes dynamic services significantly more vulnerable to L7 DDoS — an attacker sending high volumes of legitimate-looking requests (login pages, API endpoints, search queries) bypasses the cache entirely. Mitigations for dynamic services:
- nginx `limit_req` is the primary defense at the proxy layer; tune the rate and burst per service
- The backend service's own rate limiting (e.g., Forgejo's built-in rate limiter) provides a second layer
- fail2ban in the Fly.io container (see below) can block IPs showing abuse patterns
- The break-glass shutoff remains the last resort
If a publicly exposed dynamic service attracts targeted attacks or the home network bandwidth is impacted, consider migrating to Cloudflare Tunnel for enterprise-grade DDoS protection (requires DNS migration; see plan history in git).
fail2ban
fail2ban monitors log files for repeated failed authentication attempts and bans offending IPs.
Static sites: fail2ban does not apply. There is no login surface, no sessions, no credentials to brute force.
Dynamic services with authentication (e.g., Forgejo): fail2ban
runs in the Fly.io container, not on indri. Standard iptables
banning won't work in Fly.io because $remote_addr is Fly's internal
proxy IP, not the client. Instead, fail2ban uses a custom nginx-based
ban action:
- fail2ban watches the nginx JSON access log for repeated 401/403 responses to login endpoints, keyed on the `client_ip` field (populated from the `Fly-Client-IP` header)
- On ban, it appends the IP to `/etc/nginx/forge-deny.conf` and reloads nginx
- nginx uses a `geo` directive keyed on `$http_fly_client_ip` to check the deny list and return 403 for banned IPs
Ban lists are ephemeral across deploys — nginx rate limiting provides the persistent baseline; fail2ban adds escalating bans for active attacks.
See fly/fail2ban/ for the filter, jail, and action configuration.
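To make the ban flow concrete, here is a Python sketch of the decision the filter and jail make together: count failed logins per `client_ip` within a sliding window, and emit a deny-list entry once a threshold is crossed. The thresholds, the `path` and `status` field names, the `/user/login` prefix, and the `<ip> 1;` deny-file line format are all assumptions; the real values live in `fly/fail2ban/` and `fly/nginx.conf`.

```python
import json
from collections import defaultdict, deque
from typing import Optional

LOGIN_PREFIXES = ("/user/login",)  # hypothetical: Forgejo's sign-in path
MAX_FAILURES = 5                   # hypothetical jail maxretry
WINDOW_SECONDS = 600.0             # hypothetical jail findtime


class BanTracker:
    def __init__(self) -> None:
        self.failures: dict[str, deque] = defaultdict(deque)
        self.banned: set[str] = set()

    def observe(self, line: str, now: float) -> Optional[str]:
        """Feed one JSON access-log line; return a deny-file entry when an IP is banned."""
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            return None
        ip = entry.get("client_ip")  # from Fly-Client-IP, not $remote_addr
        if not ip or ip in self.banned:
            return None
        if int(entry.get("status", 0)) not in (401, 403):
            return None
        if not str(entry.get("path", "")).startswith(LOGIN_PREFIXES):
            return None
        window = self.failures[ip]
        window.append(now)
        # Drop failures that have aged out of the sliding window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_FAILURES:
            self.banned.add(ip)
            del self.failures[ip]
            return f"{ip} 1;"  # assumed geo entry: banned address maps to 1
        return None
```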
Break-glass shutoff
If the proxy is causing issues, stop it immediately:
```bash
mise run fly-shutoff
```
This stops all machines in seconds — zero traffic reaches indri. See manage-flyio-proxy#Emergency Shutoff for the full escalation ladder (container stop → Tailscale revoke → DNS removal).
Considerations for dynamic services
The architecture described in this guide works for both static and dynamic services, but the nginx configuration and security posture differ significantly. This section summarizes what changes when exposing a dynamic, authenticated service like forgejo.
| Concern | Static site | Dynamic service |
|---|---|---|
| Caching | Aggressive (cache everything, 1d TTL) | Static assets only, or disabled |
| Session cookies | Ignored (`proxy_ignore_headers Set-Cookie`) | Must be passed through |
| Query strings | Ignored in cache key | Included (default behavior) |
| Rate limiting | 10r/s is plenty | Higher burst needed; coordinate with backend rate limiter |
| Request body size | Default 1MB is fine | Increase for uploads (client_max_body_size) |
| WebSocket | Not needed | Often needed (proxy_http_version 1.1, Upgrade headers) |
| Proxy headers | Optional | Required (X-Real-IP, X-Forwarded-For, X-Forwarded-Proto) |
| fail2ban | Not applicable | Runs in the Fly.io container, watching the nginx access log |
| DDoS exposure | Low — cache absorbs most traffic | Higher — most requests hit origin |
| Pre-exposure checklist | Deploy and go | Disable open registration, audit access controls, configure fail2ban |
Checklist before exposing a dynamic service
- Disable open user registration (require invites or admin approval)
- Audit access controls and permissions
- Configure the service to log the forwarded client IP (not the proxy IP)
- Set up fail2ban in the Fly.io container with a filter for the service's login endpoints
- Confirm the service has a Caddy route on indri (no per-service Tailscale Ingress tagging is needed, since all service traffic routes through Caddy)
- Test the nginx config locally or in staging before deploying
- Rehearse the break-glass shutoff (`mise run fly-shutoff`)
IaC summary
| Component | Managed by | Declarative? |
|---|---|---|
| Tailscale auth key | Pulumi (`pulumi/tailscale/`) | yes |
| Tailscale ACLs | Pulumi (`pulumi/tailscale/policy.hujson`) | yes |
| DNS CNAMEs | Pulumi (`pulumi/gandi/`) | yes |
| Container + app config | `fly/Dockerfile` + `fly/fly.toml` in repo | yes |
| Observability | `fly/alloy.river` in repo | yes |
| Deployment | Forgejo CI on push to `fly/`, or `mise run fly-deploy` | yes |
| Fly.io secrets + certs | `mise run fly-setup` (one-time, idempotent) | semi |
The "semi" for Fly.io secrets is a one-time operation backed by a repeatable mise task. Fly.io does not have a mature Pulumi or Terraform provider, so fly.toml + flyctl is the standard IaC model for Fly.io apps.
Verification
Pre-DNS (verify against fly.dev)
Test the proxy works before creating any public DNS records:
- `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok`
- `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with an `X-Cache-Status` header
- `fly status -a blumeops-proxy` shows a healthy machine
- All `*.ops.eblu.me` services still work from the tailnet (unchanged)
- `mise run services-check` passes
If anything fails here, debug without public DNS impact.
Post-DNS (after CNAME is live)
After deploying DNS (mise run dns-up):
- `curl -I https://docs.eblu.me` returns 200 with an `X-Cache-Status` header
- `curl -I https://cv.eblu.me`: same for each public service
- `dig docs.eblu.me` resolves to Fly.io IPs (not a Tailscale IP)
- `dig forge.ops.eblu.me` still resolves to indri's Tailscale IP (unchanged)
- A second request to the same URL shows `X-Cache-Status: HIT`
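The two `dig` checks amount to one rule: public names must resolve outside Tailscale's address space, while `*.ops.eblu.me` names resolve inside it. Tailscale assigns node addresses from 100.64.0.0/10 (IPv4) and fd7a:115c:a1e0::/48 (IPv6), so a Python sketch can classify resolved addresses with the standard `ipaddress` module. The function names are hypothetical, and `resolve` uses the system's `getaddrinfo`, which may not query the same servers `dig` does.

```python
import ipaddress
import socket

TAILSCALE_NETS = (
    ipaddress.ip_network("100.64.0.0/10"),
    ipaddress.ip_network("fd7a:115c:a1e0::/48"),
)


def is_tailscale(addr: str) -> bool:
    """True if `addr` is inside Tailscale's IPv4 CGNAT or IPv6 ULA range."""
    ip = ipaddress.ip_address(addr)
    # Membership across IP versions is simply False, so no version check is needed.
    return any(ip in net for net in TAILSCALE_NETS)


def resolve(name: str) -> set[str]:
    """All addresses the system resolver returns for `name`."""
    return {info[4][0] for info in socket.getaddrinfo(name, None)}


def check_public(name: str) -> bool:
    """Public names (e.g. docs.eblu.me) should resolve only to non-Tailscale IPs."""
    addrs = resolve(name)
    return bool(addrs) and not any(is_tailscale(a) for a in addrs)
```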