Add Fly.io public reverse proxy for docs.eblu.me (#120)
Some checks failed
Deploy Fly.io Proxy / deploy (push) Failing after 9s

## Summary

- Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale
- First service exposed: `docs.eblu.me` — the Quartz static docs site
- Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME
- Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow

## Key details

- Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed
- Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts
- nginx caches aggressively for the static site; health check is on the default_server block
- ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only
- DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev`
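
The ACL restriction above can be pictured with a small model. This is only an illustrative sketch in Python, not Tailscale's policy syntax or evaluation logic; the `Grant` class, the `is_allowed` helper, and the `tag:other` tag are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical stand-in for a single ACL grant (not Tailscale's schema)."""
    src: str
    dst: str
    ports: frozenset


# Mirrors the rule described above: tag:flyio-proxy -> tag:k8s, port 443 only.
GRANTS = [Grant(src="tag:flyio-proxy", dst="tag:k8s", ports=frozenset({443}))]


def is_allowed(src, dst, port, grants=GRANTS):
    """Return True if any grant permits src to reach dst on the given port."""
    return any(
        g.src == src and g.dst == dst and port in g.ports
        for g in grants
    )
```

Under this model the proxy can reach `tag:k8s` on 443 and nothing else; any other destination tag or port is denied by default.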

## Test plan

- [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok`
- [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status`
- [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert
- [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected)
- [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy
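
The two `curl -I` checks above look for the same two signals: a 200 status and the presence of an `X-Cache-Status` header. A minimal sketch of that check in Python follows; the function name `check_proxy_response` is hypothetical, and it only inspects values you pass in rather than making any network request.

```python
def check_proxy_response(status, headers):
    """Return a list of problems found in a proxy response (empty if none).

    `status` is the HTTP status code and `headers` is a mapping of
    header names to values, e.g. parsed from `curl -I` output.
    """
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-cache-status" not in lowered:
        problems.append("missing X-Cache-Status header")
    return problems
```

An empty list means the response matches what the test plan expects; anything else names the failed check.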

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120
Erich Blume 2026-02-08 02:36:19 -08:00
commit 64a78422b1
35 changed files with 928 additions and 208 deletions

@ -0,0 +1 @@
Update docs for public proxy: canonical URL is now docs.eblu.me, add Fly.io proxy reference card and operations how-to

@ -0,0 +1 @@
Add Fly.io public reverse proxy infrastructure for exposing services to the internet (first target: docs.eblu.me)

@ -11,8 +11,6 @@ id: expose-service-publicly
# Expose a Service Publicly via Fly.io + Tailscale
> **Status:** Plan — not yet implemented. First target: `docs.eblu.me`.
This guide describes how to expose a BlumeOps service to the public internet
using a reverse proxy container on [Fly.io](https://fly.io) that tunnels back
to [[indri]] over [[tailscale]]. The approach keeps the home IP hidden,
@ -50,7 +48,7 @@ infrastructure. They can continue to operate in parallel for private access.
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Proxy host | Fly.io (free tier) | Managed container, no server to maintain via Ansible |
| Proxy host | Fly.io (free tier) | Managed container, no server to maintain via Ansible. Shared IPv4 + IPv6 are free for HTTP/HTTPS; dedicated IPv4 is $2/mo if a service needs non-HTTP(S) protocols |
| Tunnel | Tailscale (existing) | Already in use, WireGuard encryption, ACL control |
| DNS | CNAME at [[gandi]] | No DNS migration needed, no Cloudflare dependency |
| TLS (public) | Fly.io auto-provisions Let's Encrypt | No cert management, `$0.10/mo` per hostname |
@ -146,7 +144,8 @@ COPY --from=docker.io/tailscale/tailscale:stable \
COPY --from=docker.io/tailscale/tailscale:stable \
/usr/local/bin/tailscale /usr/local/bin/tailscale
RUN mkdir -p /var/run/tailscale /var/lib/tailscale
RUN mkdir -p /var/run/tailscale /var/lib/tailscale \
&& apk add --no-cache iptables ip6tables
COPY nginx.conf /etc/nginx/nginx.conf
COPY start.sh /start.sh
@ -163,8 +162,9 @@ CMD ["/start.sh"]
#!/bin/sh
set -e
# Start tailscale in userspace networking mode (no TUN device needed)
tailscaled --tun=userspace-networking --statedir=/var/lib/tailscale &
# Start tailscale daemon. Fly.io runs Firecracker microVMs which support
# TUN devices natively — no need for --tun=userspace-networking.
tailscaled --statedir=/var/lib/tailscale &
sleep 2
# Authenticate and join tailnet
@ -174,7 +174,7 @@ tailscale up --authkey="${TS_AUTHKEY}" --hostname=flyio-proxy
until tailscale status > /dev/null 2>&1; do sleep 1; done
echo "Tailscale connected"
# Start nginx
# Start nginx — MagicDNS resolves *.tail8d86e.ts.net hostnames
nginx -g "daemon off;"
```
@ -211,6 +211,7 @@ http {
location / {
proxy_pass https://docs.tail8d86e.ts.net;
proxy_ssl_verify off;
proxy_ssl_server_name on;
# Cache aggressively — static site only.
# Do NOT use these settings for dynamic services.
@ -228,16 +229,19 @@ http {
add_header X-Cache-Status $upstream_cache_status;
}
}
# Catch-all: reject unknown hosts, but serve health check
server {
listen 8080 default_server;
location /healthz {
return 200 "ok\n";
}
}
# Catch-all: reject unknown hosts
server {
listen 8080 default_server;
return 444;
location / {
return 444;
}
}
}
```
@ -250,16 +254,22 @@ Extend the existing `pulumi/tailscale/` project.
```python
# Auth key for Fly.io proxy container
flyio_key = tailscale.TailscaleKey(
flyio_key = tailscale.TailnetKey(
"flyio-proxy-key",
reusable=True,
ephemeral=True,
preauthorized=True, # Skip device approval on the tailnet
tags=["tag:flyio-proxy"],
expiry=7776000, # 90 days
)
pulumi.export("flyio_authkey", flyio_key.key)
```
> **Note:** `preauthorized=True` is required if your tailnet has device
> approval enabled. Without it, each new container start (including
> health-check restarts) creates a node that needs manual approval,
> causing the container to hang before nginx starts.
**Add to `pulumi/tailscale/policy.hujson`:**
Tag owner:
@ -323,10 +333,19 @@ set -euo pipefail
APP="blumeops-proxy"
# Fetch Tailscale auth key from 1Password
TS_AUTHKEY=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get <FLY_ITEM_ID> --fields ts-authkey --reveal)
fly secrets set TS_AUTHKEY="$TS_AUTHKEY" -a "$APP"
echo "Tailscale auth key set"
# Fetch Tailscale auth key from Pulumi state
echo "Fetching Tailscale auth key from Pulumi..."
TS_AUTHKEY=$(cd "$(dirname "$0")/../pulumi/tailscale" && pulumi stack select tail8d86e && pulumi stack output flyio_authkey --show-secrets)
fly secrets set TS_AUTHKEY="$TS_AUTHKEY" --stage -a "$APP"
echo "Tailscale auth key staged (will take effect on next deploy)"
# Allocate IPs (idempotent — fly errors if already allocated)
# Shared IPv4 is free and sufficient for HTTP/HTTPS services.
# Use 'fly ips allocate-v4' (no --shared) for dedicated IPv4 ($2/mo)
# if the service needs non-HTTP protocols.
fly ips allocate-v4 --shared -a "$APP" 2>/dev/null || true
fly ips allocate-v6 -a "$APP" 2>/dev/null || true
echo "IPs allocated"
# Add certs for all public domains (idempotent — fly ignores duplicates)
fly certs add docs.eblu.me -a "$APP" 2>/dev/null || true
@ -497,9 +516,41 @@ Key differences for dynamic services:
- **WebSocket support** — many modern web apps use WebSockets
- **Larger body size** — git pushes and file uploads need more than the default 1MB
### 2. Add DNS CNAME (Pulumi)
### 2. Add Fly.io certificate
Add to `pulumi/gandi/__main__.py`:
```bash
fly certs add wiki.eblu.me -a blumeops-proxy
```
Or add it to `mise-tasks/fly-setup` so it's captured for future runs.
### 3. Deploy
```bash
mise run fly-deploy
```
Or push the `fly/nginx.conf` change to main — the Forgejo workflow deploys automatically.
### 4. Verify against fly.dev
Test the proxy before touching DNS. Use the `Host` header to simulate
the real domain:
```bash
# Health check
curl -sf https://blumeops-proxy.fly.dev/healthz
# Simulate real domain request
curl -I -H "Host: wiki.eblu.me" https://blumeops-proxy.fly.dev/
# Should return 200 with X-Cache-Status header
```
If this fails, debug without any public DNS impact.
### 5. Add DNS CNAME (Pulumi)
Only after verifying the proxy works. Add to `pulumi/gandi/__main__.py`:
```python
wiki_public = gandi.livedns.Record(
@ -514,30 +565,14 @@ wiki_public = gandi.livedns.Record(
Deploy: `mise run dns-preview` then `mise run dns-up`.
### 3. Add Fly.io certificate
```bash
fly certs add wiki.eblu.me -a blumeops-proxy
```
Or add it to `mise-tasks/fly-setup` so it's captured for future runs.
### 4. Deploy
```bash
mise run fly-deploy
```
Or push the `fly/nginx.conf` change to main — the Forgejo workflow deploys automatically.
### 5. Verify
### 6. Verify with real domain
```bash
curl -I https://wiki.eblu.me
# Should return 200 with X-Cache-Status header
```
### 6. Update Tailscale ACLs if needed
### 7. Update Tailscale ACLs if needed
The one-time setup grants `tag:flyio-proxy` access to `tag:k8s` on port
443. If the new service needs a different grant, add it to
@ -620,22 +655,13 @@ Setup considerations for Forgejo specifically:
### Break-glass shutoff
If the proxy is causing issues (DDoS, unexpected traffic, bandwidth consumption on the home network):
If the proxy is causing issues, stop it immediately:
**Level 1 — Stop the container (seconds, reversible):**
```bash
mise run fly-shutoff
# or: fly scale count 0 -a blumeops-proxy --yes
```
All public services go offline immediately. Tailscale tunnel drops. Zero traffic reaches indri. Restore with `fly scale count 1 -a blumeops-proxy`.
**Level 2 — Revoke Tailscale access (seconds):**
Remove the `flyio-proxy` node in the Tailscale admin console. Even if the container is running, it cannot reach the tailnet. Use this if the container itself may be compromised.
**Level 3 — Remove DNS (minutes to hours):**
Delete the CNAME records at Gandi. Takes time for DNS propagation but is the permanent shutoff.
**Level 1 is the primary response.** It is a single command, takes effect in seconds, and is trivially reversible. Document the `mise run fly-shutoff` command somewhere easily accessible (e.g., pinned in a notes app) so it can be run quickly under stress.
This stops all machines in seconds — zero traffic reaches indri. See [[manage-flyio-proxy#Emergency Shutoff]] for the full escalation ladder (container stop → Tailscale revoke → DNS removal).
---
@ -688,12 +714,23 @@ The "semi" for Fly.io secrets is a one-time operation backed by a repeatable mis
## Verification
After initial deployment of a service (using `docs.eblu.me` as example):
### Pre-DNS (verify against fly.dev)
Test the proxy works before creating any public DNS records:
1. `curl -sf https://blumeops-proxy.fly.dev/healthz` — returns `ok`
2. `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` — returns 200 with `X-Cache-Status` header
3. `fly status -a blumeops-proxy` — shows healthy machine
4. All `*.ops.eblu.me` services still work from tailnet (unchanged)
5. `mise run services-check` passes
If anything fails here, debug without public DNS impact.
### Post-DNS (after CNAME is live)
After deploying DNS (`mise run dns-up`):
1. `curl -I https://docs.eblu.me` — returns 200 with `X-Cache-Status` header
2. `dig docs.eblu.me` — resolves to Fly.io IPs (not Tailscale IP)
3. `dig forge.ops.eblu.me` — still resolves to `100.98.163.89` (unchanged)
4. All `*.ops.eblu.me` services work from tailnet
5. `mise run services-check` passes
6. `fly status -a blumeops-proxy` shows healthy machine
7. Second request to same URL shows `X-Cache-Status: HIT`
4. Second request to same URL shows `X-Cache-Status: HIT`

@ -41,4 +41,5 @@ Task-oriented instructions for common BlumeOps operations. These guides assume y
| Guide | Description |
|-------|-------------|
| [[restart-indri]] | Safely shut down and restart indri |
| [[manage-flyio-proxy]] | Deploy, shutoff, and troubleshoot the public proxy |
| [[troubleshooting]] | Diagnose and fix common issues |

@ -0,0 +1,88 @@
---
title: Manage Fly.io Proxy
tags:
- how-to
- fly-io
- networking
- operations
---
# Manage Fly.io Proxy
Operational tasks for the [[flyio-proxy]] public reverse proxy.
## Deploy Changes
After modifying files in `fly/`:
```bash
mise run fly-deploy
```
Pushes to `fly/` on main also trigger automatic deployment via the Forgejo CI workflow.
## Add a New Public Service
See [[expose-service-publicly#Per-service setup]] for the full walkthrough. In short:
1. Add a `server` block to `fly/nginx.conf`
2. Add a Fly.io certificate: `fly certs add <domain> -a blumeops-proxy`
3. Deploy: `mise run fly-deploy`
4. Verify against `blumeops-proxy.fly.dev` with a `Host` header
5. Add DNS CNAME via Pulumi: `mise run dns-preview` then `mise run dns-up`
## Emergency Shutoff
If the proxy is causing issues (DDoS, unexpected traffic, bandwidth consumption on the home network):
**Level 1 — Stop the container (seconds, reversible):**
```bash
mise run fly-shutoff
# or: fly scale count 0 -a blumeops-proxy --yes
```
All public services go offline immediately. Tailscale tunnel drops. Zero traffic reaches indri. Restore with `fly scale count 1 -a blumeops-proxy`.
**Level 2 — Revoke Tailscale access (seconds):**
Remove the `flyio-proxy` node in the Tailscale admin console. Even if the container is running, it cannot reach the tailnet. Use this if the container itself may be compromised.
**Level 3 — Remove DNS (minutes to hours):**
Delete the CNAME records at Gandi. Takes time for DNS propagation but is the permanent shutoff.
**Level 1 is the primary response.** It is a single command, takes effect in seconds, and is trivially reversible. Keep `mise run fly-shutoff` somewhere easily accessible (e.g., pinned in a notes app) so it can be run quickly under stress.
## Check Status
```bash
# App and machine status
fly status -a blumeops-proxy
# Live logs
fly logs -a blumeops-proxy
# Health check
curl -sf https://blumeops-proxy.fly.dev/healthz
# Certificate status
fly certs list -a blumeops-proxy
```
## Rotate Tailscale Auth Key
The auth key expires every 90 days. To rotate:
1. Re-apply Pulumi to generate a new key: `mise run tailnet-up`
2. Re-run setup to stage the new secret: `mise run fly-setup`
3. Deploy to pick up the new secret: `mise run fly-deploy`
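
To plan a rotation ahead of time, the expiry date can be worked out from the key's creation time. The sketch below assumes the `expiry=7776000` seconds configured for the key in Pulumi; the helper names and the creation date used in the usage note are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Expiry from the Pulumi key definition: 7776000 s = 90 days.
KEY_EXPIRY_SECONDS = 7776000


def key_expires_at(created_at):
    """Return the moment a key created at `created_at` stops being valid."""
    return created_at + timedelta(seconds=KEY_EXPIRY_SECONDS)


def days_until_expiry(created_at, now):
    """Return whole days remaining before expiry (negative once expired)."""
    return (key_expires_at(created_at) - now).days
```

For example, a key created on 2026-02-08 (UTC) would expire on 2026-05-09, so the three rotation steps above should run before that date.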
## Troubleshooting
**502 Bad Gateway**: Check `fly logs` for nginx upstream errors. Verify the backend Tailscale service is running (`tailscale status` from inside the container via `fly ssh console`).
**Health check failing**: `fly ssh console -a blumeops-proxy` then `curl localhost:8080/healthz` to test locally.
**TLS errors on custom domain**: Check cert status with `fly certs show <domain> -a blumeops-proxy`. Certs auto-provision via Let's Encrypt and may take a few minutes.
## Related
- [[flyio-proxy]] - Service reference card
- [[expose-service-publicly]] - Full setup guide and architecture

@ -8,7 +8,7 @@ tags:
# Update Documentation
How to publish documentation changes to https://docs.ops.eblu.me.
How to publish documentation changes to https://docs.eblu.me.
## Quick Release

@ -22,8 +22,10 @@ editor of choice. (I recommend vim.)
These services run on my home [[hosts|infrastructure]], primarily an m1 mac
mini named [[indri]] and a Synology NAS called [[sifaka]]. The infrastructure
is networked via [[tailscale]], with the domain `eblu.me` hosted via [[gandi]]
with [[caddy]] providing a reverse proxy to resolve tailnet devices.
is networked via [[tailscale]], with the domain `eblu.me` hosted via [[gandi]],
[[caddy]] providing a private reverse proxy for tailnet devices, and
[[flyio-proxy|Fly.io]] serving public-facing services like
[this documentation site](https://docs.eblu.me).
The goal of BlumeOps is threefold:

@ -34,6 +34,7 @@ Individual service reference cards with URLs and configuration details.
| [[zot]] | Container registry | indri |
| [[devpi]] | PyPI caching proxy | k8s |
| [[docs]] | Documentation site (Quartz) | k8s |
| [[flyio-proxy]] | Public reverse proxy (Fly.io + Tailscale) | Fly.io |
| [[automounter]] | SMB share automounter | indri |
## Infrastructure

@ -47,7 +47,7 @@ K8s services are proxied via their Tailscale Ingress endpoints:
|-----------|---------|---------|
| `grafana.ops.eblu.me` | `grafana.tail8d86e.ts.net` | [[grafana]] |
| `argocd.ops.eblu.me` | `argocd.tail8d86e.ts.net` | [[argocd]] |
| `docs.ops.eblu.me` | `docs.tail8d86e.ts.net` | [[docs]] |
| `docs.ops.eblu.me` | `docs.tail8d86e.ts.net` | [[docs]] (now publicly available at `docs.eblu.me` via [[flyio-proxy]]) |
| `feed.ops.eblu.me` | `feed.tail8d86e.ts.net` | [[miniflux]] |
| ... | ... | (see defaults/main.yml for full list) |

@ -13,11 +13,13 @@ Documentation site built with [Quartz](https://quartz.jzhao.xyz/) and served via
| Property | Value |
|----------|-------|
| **URL** | https://docs.ops.eblu.me |
| **Public URL** | https://docs.eblu.me |
| **Private URL** | `docs.ops.eblu.me` (tailnet only, via [[caddy]]) |
| **Namespace** | `docs` |
| **Container** | `registry.ops.eblu.me/blumeops/quartz:v1.0.0` |
| **Source** | `docs/` directory in blumeops repo |
| **Build** | Forgejo workflow `build-blumeops.yaml` |
| **Public proxy** | [[flyio-proxy]] (Fly.io → Tailscale tunnel) |
## Architecture

@ -0,0 +1,64 @@
---
title: Fly.io Proxy
tags:
- service
- networking
- fly-io
---
# Fly.io Proxy
Public reverse proxy on [Fly.io](https://fly.io) that exposes selected BlumeOps services to the internet via a Tailscale tunnel back to the homelab.
## Quick Reference
| Property | Value |
|----------|-------|
| **App** | `blumeops-proxy` |
| **Region** | `sjc` (San Jose) |
| **Fly.io URL** | `blumeops-proxy.fly.dev` |
| **Config** | `fly/` directory in repo |
| **IaC** | `fly/fly.toml` (app), Pulumi (DNS + auth key) |
## Exposed Services
| Public domain | Backend | Service |
|---------------|---------|---------|
| `docs.eblu.me` | `docs.tail8d86e.ts.net` | [[docs]] |
## Architecture
Internet traffic hits Fly.io's Anycast edge, terminates TLS with a Let's Encrypt certificate, and is proxied by nginx to the backend service over a Tailscale WireGuard tunnel. See [[expose-service-publicly]] for the full architecture diagram.
## Key Files
| File | Purpose |
|------|---------|
| `fly/fly.toml` | App configuration |
| `fly/Dockerfile` | nginx + Tailscale container |
| `fly/nginx.conf` | Reverse proxy, caching, rate limiting |
| `fly/start.sh` | Entrypoint: start Tailscale, then nginx |
| `pulumi/tailscale/__main__.py` | Auth key (`tag:flyio-proxy`) |
| `pulumi/tailscale/policy.hujson` | ACL grants for proxy |
| `pulumi/gandi/__main__.py` | DNS CNAMEs |
## Networking
Fly.io runs Firecracker microVMs which support TUN devices natively. Tailscale runs with a real TUN interface (not userspace networking), so MagicDNS and direct Tailscale IP routing work normally.
The Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts.
## Secrets
| Secret | Source | Description |
|--------|--------|-------------|
| `TS_AUTHKEY` | Pulumi state → `fly secrets` | Tailscale auth key for joining tailnet |
| `FLY_DEPLOY_TOKEN` | Fly.io → 1Password | Deploy token for CI |
## Related
- [[expose-service-publicly]] - Setup guide for adding new public services
- [[manage-flyio-proxy]] - Operational tasks (deploy, shutoff, troubleshoot)
- [[caddy]] - Private reverse proxy for `*.ops.eblu.me` (separate system)
- [[tailscale]] - WireGuard mesh network
- [[gandi]] - DNS hosting

@ -67,7 +67,7 @@ Documentation uses `[[wiki-links]]` for cross-references:
- `[[service-name]]` links to a reference page
- `[[page|Display Text]]` customizes the link text
When reading on the web (docs.ops.eblu.me), these render as clickable links. The backlinks panel shows what references each page.
When reading on the web (docs.eblu.me), these render as clickable links. The backlinks panel shows what references each page.
Pre-commit hooks automatically validate that all wiki-links point to existing files and that link targets are unambiguous.