diff --git a/README.md b/README.md
index f9968fc..cd9fa73 100644
--- a/README.md
+++ b/README.md
@@ -80,6 +80,6 @@ This repo uses [Forgejo Actions](https://forgejo.org/docs/latest/user/actions/)
 
 ## Documentation
 
-Documentation lives in `docs/` and follows the [Diataxis](https://diataxis.fr/) framework. Published at https://docs.ops.eblu.me.
+Documentation lives in `docs/` and follows the [Diataxis](https://diataxis.fr/) framework. Published at https://docs.eblu.me.
 
 Docs use [Obsidian](https://obsidian.md) wiki-link syntax (`[[link]]`) for cross-references. Edit with any markdown editor, or use [obsidian.nvim](https://github.com/obsidian-nvim/obsidian.nvim) for enhanced navigation.
diff --git a/argocd/manifests/docs/ingress-tailscale.yaml b/argocd/manifests/docs/ingress-tailscale.yaml
index 4c6710f..b895cfb 100644
--- a/argocd/manifests/docs/ingress-tailscale.yaml
+++ b/argocd/manifests/docs/ingress-tailscale.yaml
@@ -11,7 +11,7 @@ metadata:
     gethomepage.dev/group: "Apps"
     gethomepage.dev/icon: "mdi-book-open-page-variant"
     gethomepage.dev/description: "BlumeOps Documentation"
-    gethomepage.dev/href: "https://docs.ops.eblu.me"
+    gethomepage.dev/href: "https://docs.eblu.me"
     gethomepage.dev/pod-selector: "app=docs"
 spec:
   ingressClassName: tailscale
diff --git a/docs/changelog.d/feature-flyio-proxy.doc.md b/docs/changelog.d/feature-flyio-proxy.doc.md
new file mode 100644
index 0000000..d6accbf
--- /dev/null
+++ b/docs/changelog.d/feature-flyio-proxy.doc.md
@@ -0,0 +1 @@
+Update docs for public proxy: canonical URL is now docs.eblu.me, add Fly.io proxy reference card and operations how-to
diff --git a/docs/how-to/expose-service-publicly.md b/docs/how-to/expose-service-publicly.md
index 77a0220..1f31302 100644
--- a/docs/how-to/expose-service-publicly.md
+++ b/docs/how-to/expose-service-publicly.md
@@ -655,22 +655,13 @@ Setup considerations for Forgejo specifically:
 
 ### Break-glass shutoff
 
-If the proxy is causing issues (DDoS, unexpected traffic, bandwidth consumption on the home network):
+If the proxy is causing issues, stop it immediately:
 
-**Level 1 — Stop the container (seconds, reversible):**
 ```bash
 mise run fly-shutoff
-# or: fly scale count 0 -a blumeops-proxy --yes
 ```
-All public services go offline immediately. Tailscale tunnel drops. Zero traffic reaches indri. Restore with `fly scale count 1 -a blumeops-proxy`.
 
-**Level 2 — Revoke Tailscale access (seconds):**
-Remove the `flyio-proxy` node in the Tailscale admin console. Even if the container is running, it cannot reach the tailnet. Use this if the container itself may be compromised.
-
-**Level 3 — Remove DNS (minutes to hours):**
-Delete the CNAME records at Gandi. Takes time for DNS propagation but is the permanent shutoff.
-
-**Level 1 is the primary response.** It is a single command, takes effect in seconds, and is trivially reversible. Document the `mise run fly-shutoff` command somewhere easily accessible (e.g., pinned in a notes app) so it can be run quickly under stress.
+This stops all machines in seconds — zero traffic reaches indri. See [[manage-flyio-proxy#Emergency Shutoff]] for the full escalation ladder (container stop → Tailscale revoke → DNS removal).
 
 ---
 
diff --git a/docs/how-to/how-to.md b/docs/how-to/how-to.md
index 3a8a587..0d12ec4 100644
--- a/docs/how-to/how-to.md
+++ b/docs/how-to/how-to.md
@@ -41,4 +41,5 @@ Task-oriented instructions for common BlumeOps operations. These guides assume y
 | Guide | Description |
 |-------|-------------|
 | [[restart-indri]] | Safely shut down and restart indri |
+| [[manage-flyio-proxy]] | Deploy, shutoff, and troubleshoot the public proxy |
 | [[troubleshooting]] | Diagnose and fix common issues |
diff --git a/docs/how-to/manage-flyio-proxy.md b/docs/how-to/manage-flyio-proxy.md
new file mode 100644
index 0000000..b8c04bb
--- /dev/null
+++ b/docs/how-to/manage-flyio-proxy.md
@@ -0,0 +1,88 @@
+---
+title: Manage Fly.io Proxy
+tags:
+  - how-to
+  - fly-io
+  - networking
+  - operations
+---
+
+# Manage Fly.io Proxy
+
+Operational tasks for the [[flyio-proxy]] public reverse proxy.
+
+## Deploy Changes
+
+After modifying files in `fly/`:
+
+```bash
+mise run fly-deploy
+```
+
+Pushes to `fly/` on main also trigger automatic deployment via the Forgejo CI workflow.
+
+## Add a New Public Service
+
+See [[expose-service-publicly#Per-service setup]] for the full walkthrough. In short:
+
+1. Add a `server` block to `fly/nginx.conf`
+2. Add a Fly.io certificate: `fly certs add -a blumeops-proxy`
+3. Deploy: `mise run fly-deploy`
+4. Verify against `blumeops-proxy.fly.dev` with a `Host` header
+5. Add DNS CNAME via Pulumi: `mise run dns-preview` then `mise run dns-up`
+
+## Emergency Shutoff
+
+If the proxy is causing issues (DDoS, unexpected traffic, bandwidth consumption on the home network):
+
+**Level 1 — Stop the container (seconds, reversible):**
+```bash
+mise run fly-shutoff
+# or: fly scale count 0 -a blumeops-proxy --yes
+```
+All public services go offline immediately. Tailscale tunnel drops. Zero traffic reaches indri. Restore with `fly scale count 1 -a blumeops-proxy`.
+
+**Level 2 — Revoke Tailscale access (seconds):**
+Remove the `flyio-proxy` node in the Tailscale admin console. Even if the container is running, it cannot reach the tailnet. Use this if the container itself may be compromised.
+
+**Level 3 — Remove DNS (minutes to hours):**
+Delete the CNAME records at Gandi. Takes time for DNS propagation but is the permanent shutoff.
+
+**Level 1 is the primary response.** It is a single command, takes effect in seconds, and is trivially reversible. Keep `mise run fly-shutoff` somewhere easily accessible (e.g., pinned in a notes app) so it can be run quickly under stress.
+
+## Check Status
+
+```bash
+# App and machine status
+fly status -a blumeops-proxy
+
+# Live logs
+fly logs -a blumeops-proxy
+
+# Health check
+curl -sf https://blumeops-proxy.fly.dev/healthz
+
+# Certificate status
+fly certs list -a blumeops-proxy
+```
+
+## Rotate Tailscale Auth Key
+
+The auth key expires every 90 days. To rotate:
+
+1. Re-apply Pulumi to generate a new key: `mise run tailnet-up`
+2. Re-run setup to stage the new secret: `mise run fly-setup`
+3. Deploy to pick up the new secret: `mise run fly-deploy`
+
+## Troubleshooting
+
+**502 Bad Gateway**: Check `fly logs` for nginx upstream errors. Verify the backend Tailscale service is running (`tailscale status` from inside the container via `fly ssh console`).
+
+**Health check failing**: `fly ssh console -a blumeops-proxy` then `curl localhost:8080/healthz` to test locally.
+
+**TLS errors on custom domain**: Check cert status with `fly certs show -a blumeops-proxy`. Certs auto-provision via Let's Encrypt and may take a few minutes.
+
+## Related
+
+- [[flyio-proxy]] - Service reference card
+- [[expose-service-publicly]] - Full setup guide and architecture
diff --git a/docs/how-to/update-documentation.md b/docs/how-to/update-documentation.md
index 72dd2b1..400cdaf 100644
--- a/docs/how-to/update-documentation.md
+++ b/docs/how-to/update-documentation.md
@@ -8,7 +8,7 @@ tags:
 
 # Update Documentation
 
-How to publish documentation changes to https://docs.ops.eblu.me.
+How to publish documentation changes to https://docs.eblu.me.
 
 ## Quick Release
 
diff --git a/docs/index.md b/docs/index.md
index a385da8..5a5c1c5 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -22,8 +22,10 @@ editor of choice. (I recommend vim.)
 
 These services run on my home [[hosts|infrastructure]], primarily an m1 mac mini
 named [[indri]] and a Synology NAS called [[sifaka]]. The infrastructure
-is networked via [[tailscale]], with the domain `eblu.me` hosted via [[gandi]]
-with [[caddy]] providing a reverse proxy to resolve tailnet devices.
+is networked via [[tailscale]], with the domain `eblu.me` hosted via [[gandi]],
+[[caddy]] providing a private reverse proxy for tailnet devices, and
+[[flyio-proxy|Fly.io]] serving public-facing services like
+[this documentation site](https://docs.eblu.me).
 
 The goal of BlumeOps is threefold:
 
diff --git a/docs/reference/reference.md b/docs/reference/reference.md
index 7c300ae..1041fb6 100644
--- a/docs/reference/reference.md
+++ b/docs/reference/reference.md
@@ -34,6 +34,7 @@ Individual service reference cards with URLs and configuration details.
 | [[zot]] | Container registry | indri |
 | [[devpi]] | PyPI caching proxy | k8s |
 | [[docs]] | Documentation site (Quartz) | k8s |
+| [[flyio-proxy]] | Public reverse proxy (Fly.io + Tailscale) | Fly.io |
 | [[automounter]] | SMB share automounter | indri |
 
 ## Infrastructure
diff --git a/docs/reference/services/caddy.md b/docs/reference/services/caddy.md
index 631e4de..49b60e7 100644
--- a/docs/reference/services/caddy.md
+++ b/docs/reference/services/caddy.md
@@ -47,7 +47,7 @@ K8s services are proxied via their Tailscale Ingress endpoints:
 |-----------|---------|---------|
 | `grafana.ops.eblu.me` | `grafana.tail8d86e.ts.net` | [[grafana]] |
 | `argocd.ops.eblu.me` | `argocd.tail8d86e.ts.net` | [[argocd]] |
-| `docs.ops.eblu.me` | `docs.tail8d86e.ts.net` | [[docs]] |
+| `docs.ops.eblu.me` | `docs.tail8d86e.ts.net` | [[docs]] (now publicly available at `docs.eblu.me` via [[flyio-proxy]]) |
 | `feed.ops.eblu.me` | `feed.tail8d86e.ts.net` | [[miniflux]] |
 | ... | ... | (see defaults/main.yml for full list) |
 
diff --git a/docs/reference/services/docs.md b/docs/reference/services/docs.md
index fe2c6dc..79fe5da 100644
--- a/docs/reference/services/docs.md
+++ b/docs/reference/services/docs.md
@@ -13,11 +13,13 @@ Documentation site built with [Quartz](https://quartz.jzhao.xyz/) and served via
 
 | Property | Value |
 |----------|-------|
-| **URL** | https://docs.ops.eblu.me |
+| **Public URL** | https://docs.eblu.me |
+| **Private URL** | `docs.ops.eblu.me` (tailnet only, via [[caddy]]) |
 | **Namespace** | `docs` |
 | **Container** | `registry.ops.eblu.me/blumeops/quartz:v1.0.0` |
 | **Source** | `docs/` directory in blumeops repo |
 | **Build** | Forgejo workflow `build-blumeops.yaml` |
+| **Public proxy** | [[flyio-proxy]] (Fly.io → Tailscale tunnel) |
 
 ## Architecture
 
diff --git a/docs/reference/services/flyio-proxy.md b/docs/reference/services/flyio-proxy.md
new file mode 100644
index 0000000..17162b2
--- /dev/null
+++ b/docs/reference/services/flyio-proxy.md
@@ -0,0 +1,64 @@
+---
+title: Fly.io Proxy
+tags:
+  - service
+  - networking
+  - fly-io
+---
+
+# Fly.io Proxy
+
+Public reverse proxy on [Fly.io](https://fly.io) that exposes selected BlumeOps services to the internet via a Tailscale tunnel back to the homelab.
+
+## Quick Reference
+
+| Property | Value |
+|----------|-------|
+| **App** | `blumeops-proxy` |
+| **Region** | `sjc` (San Jose) |
+| **Fly.io URL** | `blumeops-proxy.fly.dev` |
+| **Config** | `fly/` directory in repo |
+| **IaC** | `fly/fly.toml` (app), Pulumi (DNS + auth key) |
+
+## Exposed Services
+
+| Public domain | Backend | Service |
+|---------------|---------|---------|
+| `docs.eblu.me` | `docs.tail8d86e.ts.net` | [[docs]] |
+
+## Architecture
+
+Internet traffic hits Fly.io's Anycast edge, terminates TLS with a Let's Encrypt certificate, and is proxied by nginx to the backend service over a Tailscale WireGuard tunnel. See [[expose-service-publicly]] for the full architecture diagram.
+
+## Key Files
+
+| File | Purpose |
+|------|---------|
+| `fly/fly.toml` | App configuration |
+| `fly/Dockerfile` | nginx + Tailscale container |
+| `fly/nginx.conf` | Reverse proxy, caching, rate limiting |
+| `fly/start.sh` | Entrypoint: start Tailscale, then nginx |
+| `pulumi/tailscale/__main__.py` | Auth key (`tag:flyio-proxy`) |
+| `pulumi/tailscale/policy.hujson` | ACL grants for proxy |
+| `pulumi/gandi/__main__.py` | DNS CNAMEs |
+
+## Networking
+
+Fly.io runs Firecracker microVMs which support TUN devices natively. Tailscale runs with a real TUN interface (not userspace networking), so MagicDNS and direct Tailscale IP routing work normally.
+
+The Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts.
+
+## Secrets
+
+| Secret | Source | Description |
+|--------|--------|-------------|
+| `TS_AUTHKEY` | Pulumi state → `fly secrets` | Tailscale auth key for joining tailnet |
+| `FLY_DEPLOY_TOKEN` | Fly.io → 1Password | Deploy token for CI |
+
+## Related
+
+- [[expose-service-publicly]] - Setup guide for adding new public services
+- [[manage-flyio-proxy]] - Operational tasks (deploy, shutoff, troubleshoot)
+- [[caddy]] - Private reverse proxy for `*.ops.eblu.me` (separate system)
+- [[tailscale]] - WireGuard mesh network
+- [[gandi]] - DNS hosting
diff --git a/docs/tutorials/exploring-the-docs.md b/docs/tutorials/exploring-the-docs.md
index 475419d..db9c8a0 100644
--- a/docs/tutorials/exploring-the-docs.md
+++ b/docs/tutorials/exploring-the-docs.md
@@ -67,7 +67,7 @@ Documentation uses `[[wiki-links]]` for cross-references:
 - `[[service-name]]` links to a reference page
 - `[[page|Display Text]]` customizes the link text
 
-When reading on the web (docs.ops.eblu.me), these render as clickable links. The backlinks panel shows what references each page.
+When reading on the web (docs.eblu.me), these render as clickable links. The backlinks panel shows what references each page.
 
 Pre-commit hooks automatically validate that all wiki-links point to existing files and that link targets are unambiguous.
 