From d2ea6358d235556ce57ea3257c4dbea22b56f702 Mon Sep 17 00:00:00 2001 From: Erich Blume Date: Sun, 8 Feb 2026 00:08:23 -0800 Subject: [PATCH] Rewrite public exposure guide to Fly.io + Tailscale approach Replace the Cloudflare Tunnel plan with a Fly.io reverse proxy architecture that tunnels back to indri over Tailscale. Covers: - Full architecture with nginx proxy cache + rate limiting - One-time setup vs per-service steps - Fly.io container (Dockerfile, fly.toml, nginx.conf, start.sh) - Pulumi IaC for Tailscale auth key + DNS CNAMEs - Forgejo CI workflow for automated deploys - Security model, DDoS considerations, break-glass shutoff - Mise tasks: fly-deploy, fly-setup, fly-shutoff Also fix docs-check-links to handle in-page anchor links ([[#Heading]]) and cross-file anchors ([[file#Heading]]). Co-Authored-By: Claude Opus 4.6 --- .../docs-expose-service-publicly.doc.md | 2 +- docs/how-to/expose-service-publicly.md | 585 ++++++++++++++---- docs/how-to/how-to.md | 2 +- mise-tasks/docs-check-links | 23 +- 4 files changed, 474 insertions(+), 138 deletions(-) diff --git a/docs/changelog.d/docs-expose-service-publicly.doc.md b/docs/changelog.d/docs-expose-service-publicly.doc.md index 25f225f..e3019c2 100644 --- a/docs/changelog.d/docs-expose-service-publicly.doc.md +++ b/docs/changelog.d/docs-expose-service-publicly.doc.md @@ -1 +1 @@ -Add how-to guide for exposing services publicly via Cloudflare Tunnel. +Add how-to guide for exposing services publicly via Fly.io reverse proxy + Tailscale tunnel. 
diff --git a/docs/how-to/expose-service-publicly.md b/docs/how-to/expose-service-publicly.md index 1e9b10d..e64e61c 100644 --- a/docs/how-to/expose-service-publicly.md +++ b/docs/how-to/expose-service-publicly.md @@ -2,197 +2,522 @@ title: Expose a Service Publicly tags: - how-to - - cloudflare + - fly-io + - tailscale - networking --- -# Expose a Service Publicly via Cloudflare Tunnel +# Expose a Service Publicly via Fly.io + Tailscale -> **Status:** Plan — not yet implemented. Execute phases in order when ready. +> **Status:** Plan — not yet implemented. First target: `docs.eblu.me`. -This guide describes how to expose a BlumeOps service to the public internet securely using Cloudflare as a CDN and DDoS shield, with a Cloudflare Tunnel creating an outbound-only connection that never exposes the home IP. - -The first service to expose is `docs.eblu.me`. The pattern is reusable for future services. +This guide describes how to expose a BlumeOps service to the public internet using a reverse proxy container on [Fly.io](https://fly.io) that tunnels back to [[indri]] over [[tailscale]]. The approach keeps the home IP hidden, requires no changes to existing infrastructure (`*.ops.eblu.me`, [[caddy]], DNS), and is reusable for multiple services. ## Architecture ``` -Internet → docs.eblu.me (Cloudflare proxied CNAME) +Internet → .eblu.me │ - Cloudflare Edge (CDN, WAF, DDoS protection) + Fly.io edge (Anycast, TLS via Let's Encrypt) │ - Cloudflare Tunnel (outbound from k8s) + Fly.io VM (nginx reverse proxy + Tailscale) + │ (WireGuard tunnel) + tailnet (tail8d86e.ts.net) │ - cloudflared pod in minikube + .tail8d86e.ts.net (Tailscale ingress) │ - docs k8s Service (ClusterIP, port 80) - │ - docs pod (nginx + Quartz static site) - -Tailnet → *.ops.eblu.me (unchanged, DNS-only to Tailscale IP) + k8s Service → pod ``` -All existing `*.ops.eblu.me` services remain private behind Tailscale. 
Only explicitly configured subdomains (like `docs.eblu.me`) are exposed publicly through Cloudflare. +A single Fly.io container serves as the public-facing proxy for all exposed services. Each service gets a `server` block in the nginx config and a DNS CNAME. The container joins the tailnet via an ephemeral auth key and reaches backend services through Tailscale ingress endpoints. -## Key Decisions +Existing `*.ops.eblu.me` services remain private behind Tailscale — this approach does not touch [[caddy]], [[gandi]] DNS-01, or any other existing infrastructure. + +## Key decisions | Decision | Choice | Rationale | |----------|--------|-----------| -| DNS hosting | Move from [[gandi]] to Cloudflare (free) | CNAME/partial setup needs Business plan @ $200/mo | -| Gandi role | Registrar only | Domain renewal, WHOIS. No more DNS hosting. | -| Tunnel host | Kubernetes | ArgoCD managed, direct ClusterIP access, no Tailscale hop | -| [[caddy]] TLS | Migrate to Cloudflare DNS-01 plugin | Gandi DNS-01 won't work after nameserver change | -| Cloudflare account | Recover existing, instrument with IaC | | +| Proxy host | Fly.io (free tier) | Managed container, no server to maintain via Ansible | +| Tunnel | Tailscale (existing) | Already in use, WireGuard encryption, ACL control | +| DNS | CNAME at [[gandi]] | No DNS migration needed, no Cloudflare dependency | +| TLS (public) | Fly.io auto-provisions Let's Encrypt | No cert management, `$0.10/mo` per hostname | +| TLS (origin) | Tailscale handles encryption | WireGuard tunnel encrypts all traffic | +| CDN/cache | nginx `proxy_cache` in container | Aggressive caching for static content, sufficient for personal sites | +| DDoS | Fly.io Anycast + nginx rate limiting | Not enterprise-grade; see [[#Break-glass shutoff]] | +| IaC | `fly/` directory in repo, Pulumi for DNS + TS key | No well-maintained Fly.io Pulumi provider; `fly.toml` is the app's IaC | -## Prerequisites +## TLS in this architecture -- Cloudflare account with 
`eblu.me` zone added (free plan) -- Cloudflare API token stored in 1Password with scopes: Zone:DNS:Edit, Zone:Zone:Read, Account:Cloudflare Tunnel:Edit, Account:Account Settings:Read -- Cloudflare account ID and zone ID noted +There are three independent TLS segments — none involve Caddy: -## Phase 0: Preparation (manual) +1. **Browser → Fly.io edge**: Fly.io auto-provisions a Let's Encrypt certificate for each custom domain (e.g., `docs.eblu.me`). Validated via TLS-ALPN challenge — no DNS API needed. +2. **nginx → Tailscale ingress**: nginx proxies to `https://.tail8d86e.ts.net`. The Tailscale ingress serves a Tailscale-issued cert. nginx uses `proxy_ssl_verify off` since the underlying tunnel is already encrypted. +3. **WireGuard tunnel**: All Tailscale traffic is encrypted at the network layer regardless of application-level TLS. -1. Recover Cloudflare account access -2. Add `eblu.me` zone (free plan) — Cloudflare scans existing records from Gandi -3. **Do not change nameservers yet** — wait until Phase 3 -4. Create API token with the scopes listed above -5. Store API token and account ID in 1Password (blumeops vault) +Caddy continues to serve `*.ops.eblu.me` with its existing Gandi DNS-01 certificates. The two TLS domains are completely independent. -## Phase 1: Caddy TLS migration +## External references -**Why first**: Blocking dependency for the nameserver change. Once nameservers move to Cloudflare, Gandi LiveDNS can't serve DNS-01 ACME challenges. 
+- [Tailscale on Fly.io](https://tailscale.com/kb/1132/flydotio) — official guide for running Tailscale in a Fly.io container +- [Fly.io Custom Domains](https://fly.io/docs/networking/custom-domain/) — how Fly handles TLS for custom domains +- [Home Assistant + Fly.io + Tailscale](https://community.home-assistant.io/t/expose-ha-to-the-internet-via-a-cloud-reverse-proxy-fly-io-and-a-vpn-tailscale-for-free-for-now-without-opening-ports/352118) — community guide describing this exact pattern -### Caddy binary rebuild +--- -Rebuild Caddy with `github.com/caddy-dns/cloudflare` instead of `github.com/caddy-dns/gandi` using `xcaddy` in `~/code/3rd/caddy/`. +## One-time setup (first service) -### Files to modify +These steps establish the Fly.io proxy infrastructure. They only need to be done once. -- `ansible/roles/caddy/templates/Caddyfile.j2` — change `dns gandi {env.GANDI_BEARER_TOKEN}` to `dns cloudflare {env.CF_API_TOKEN}` -- `ansible/roles/caddy/templates/caddy-wrapper.sh.j2` — source Cloudflare API token instead of Gandi PAT -- `ansible/roles/caddy/defaults/main.yml` — update token variable name -- `ansible/playbooks/indri.yml` — add pre_task to fetch Cloudflare API token from 1Password, replace Gandi PAT fetch +### Step 1: Fly.io account and app -### Deployment sequence +1. Create or recover a Fly.io account at https://fly.io (requires credit card for free tier) +2. Install `flyctl`: `brew install flyctl` +3. Authenticate: `fly auth login` +4. Create the app: `fly apps create blumeops-proxy` +5. Store the Fly.io deploy token in 1Password (blumeops vault): + - Generate: `fly tokens create deploy -a blumeops-proxy` + - Store as `fly-deploy-token` field -1. Set up Cloudflare zone with all records (Phase 2) -2. Prepare Caddy migration on a branch (this phase) -3. Change nameservers at Gandi (Phase 3) -4. Immediately deploy Caddy update: `mise run provision-indri -- --tags caddy` -5. 
Caddy's next TLS renewal uses Cloudflare DNS-01 +### Step 2: Repository structure -Existing certificates are valid for ~90 days, providing a grace window. +Create the `fly/` directory at the repository root. This is separate from `containers/` because the image is built and deployed directly to Fly.io by `fly deploy` — it never goes through `registry.ops.eblu.me`. -## Phase 2: Pulumi — Cloudflare IaC +``` +fly/ +├── README.md # Setup notes and context +├── fly.toml # Fly.io app configuration +├── Dockerfile # nginx + tailscale +├── nginx.conf # Reverse proxy + cache config +└── start.sh # Entrypoint: start tailscale, then nginx +``` -Create a new Pulumi project at `pulumi/cloudflare/`. +**`fly/fly.toml`** — app configuration: -### Files to create +```toml +app = "blumeops-proxy" +primary_region = "sjc" -- `pulumi/cloudflare/Pulumi.yaml` — project definition (`blumeops-cloudflare`, python/uv) -- `pulumi/cloudflare/Pulumi.eblu-me.yaml` — stack config (domain, account-id) -- `pulumi/cloudflare/pyproject.toml` — deps: `pulumi>=3.0.0`, `pulumi-cloudflare>=5.0.0` -- `pulumi/cloudflare/__main__.py` +[build] -### Pulumi program manages +[http_service] + internal_port = 8080 + force_https = true + auto_stop_machines = false + auto_start_machines = true + min_machines_running = 1 -- Zone lookup for `eblu.me` -- DNS records: - - `*.ops.eblu.me` A record → Tailscale IP, **proxied=False** (grey cloud, private) - - `ops.eblu.me` A record → Tailscale IP, **proxied=False** - - `docs.eblu.me` CNAME → `.cfargotunnel.com`, **proxied=True** (orange cloud, CDN) -- Cloudflare Tunnel resource -- Tunnel config (ingress: `docs.eblu.me` → `http://docs.docs.svc.cluster.local:80`) -- Cache rules for static docs site (edge TTL: 1 day, browser TTL: 1 hour) -- Zone security settings (SSL: full, min TLS 1.2, always HTTPS) +[checks] + [checks.health] + port = 8080 + type = "http" + interval = "30s" + timeout = "5s" + path = "/healthz" +``` -### New mise tasks +**`fly/Dockerfile`** — nginx + 
tailscale: -Following the `dns-preview`/`dns-up` pattern: +```dockerfile +FROM nginx:alpine -- `mise-tasks/cloudflare-preview` — `pulumi preview` with 1Password token injection -- `mise-tasks/cloudflare-up` — `pulumi up` with 1Password token injection +# Copy tailscale binaries from official image +COPY --from=docker.io/tailscale/tailscale:stable \ + /usr/local/bin/tailscaled /usr/local/bin/tailscaled +COPY --from=docker.io/tailscale/tailscale:stable \ + /usr/local/bin/tailscale /usr/local/bin/tailscale -Keep `pulumi/gandi/` until migration is confirmed working. Then `pulumi destroy` the Gandi stack and archive the code. +RUN mkdir -p /var/run/tailscale /var/lib/tailscale -## Phase 3: DNS migration +COPY nginx.conf /etc/nginx/nginx.conf +COPY start.sh /start.sh +RUN chmod +x /start.sh -### Pre-migration checklist +EXPOSE 8080 -- [ ] Cloudflare zone active with all records (Phase 2) -- [ ] Caddy migration branch ready (Phase 1) -- [ ] Cloudflare Tunnel created and configured (Phase 2) -- [ ] cloudflared running in k8s (Phase 4) +CMD ["/start.sh"] +``` -### Steps +**`fly/start.sh`** — entrypoint: -1. At Gandi registrar dashboard: change nameservers to Cloudflare's assigned NS -2. Deploy Caddy update immediately: `mise run provision-indri -- --tags caddy` -3. Monitor propagation: `dig +trace docs.eblu.me`, `dig +trace forge.ops.eblu.me` -4. Verify tailnet services still work from tailnet clients -5. Verify `docs.eblu.me` resolves publicly +```bash +#!/bin/sh +set -e -### Rollback +# Start tailscale in userspace networking mode (no TUN device needed) +tailscaled --tun=userspace-networking --statedir=/var/lib/tailscale & +sleep 2 -Change nameservers back to Gandi's at registrar. Everything reverts. 
+# Authenticate and join tailnet +tailscale up --authkey="${TS_AUTHKEY}" --hostname=flyio-proxy -## Phase 4: cloudflared in Kubernetes +# Wait for tailscale to be ready +until tailscale status > /dev/null 2>&1; do sleep 1; done +echo "Tailscale connected" -### Files to create +# Start nginx +nginx -g "daemon off;" +``` -- `argocd/apps/cloudflare-tunnel.yaml` — ArgoCD Application -- `argocd/manifests/cloudflare-tunnel/deployment.yaml` — cloudflared Deployment - - Image: `cloudflare/cloudflared:latest` (or pinned version) - - Args: `tunnel --no-autoupdate run --token ` - - Single replica, tunnel token injected from a Secret -- `argocd/manifests/cloudflare-tunnel/external-secret.yaml` — ExternalSecret to pull tunnel token from 1Password -- `argocd/manifests/cloudflare-tunnel/kustomization.yaml` +**`fly/nginx.conf`** — reverse proxy with caching and rate limiting: -### Tunnel routing (managed by Pulumi) +```nginx +worker_processes auto; -- `docs.eblu.me` → `http://docs.docs.svc.cluster.local:80` (direct k8s service access) -- Catch-all → `http_status:404` +events { + worker_connections 1024; +} -Namespace: `cloudflare-tunnel` (dedicated, reusable for future public services) +http { + include /etc/nginx/mime.types; + default_type application/octet-stream; -## Phase 5: Documentation and cleanup + # Rate limiting: 10 requests/sec per IP, burst of 20 + limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s; -### Files to create + # Proxy cache: 200MB, evict after 24h of no access + proxy_cache_path /tmp/cache levels=1:2 keys_zone=services:10m + max_size=200m inactive=24h; -- `docs/reference/infrastructure/cloudflare.md` — reference card -- `docs/changelog.d/.feature.md` — changelog fragment + # --- docs.eblu.me --- + server { + listen 8080; + server_name docs.eblu.me; -### Files to modify + limit_req zone=general burst=20 nodelay; -- `docs/reference/infrastructure/routing.md` — add public services section -- `docs/reference/infrastructure/gandi.md` — update to 
registrar-only role -- `docs/reference/services/docs.md` — add public URL `https://docs.eblu.me` -- `docs/reference/reference.md` — add Cloudflare to infrastructure section -- `CLAUDE.md` — update routing table, add cloudflare tasks + location / { + proxy_pass https://docs.tail8d86e.ts.net; + proxy_ssl_verify off; + + # Cache aggressively — static site + proxy_cache services; + proxy_cache_valid 200 1d; + proxy_cache_valid 404 1m; + proxy_cache_use_stale error timeout updating; + proxy_cache_lock on; + + # Prevent cache-busting: ignore query strings and + # client cache-control headers + proxy_cache_key $host$uri; + proxy_ignore_headers Cache-Control Set-Cookie; + + add_header X-Cache-Status $upstream_cache_status; + } + } + + # Catch-all: answer health checks, reject unknown hosts + server { + listen 8080 default_server; + + location = /healthz { + return 200 "ok\n"; + } + location / { return 444; } + } +} +``` + +### Step 3: Tailscale auth key and ACLs (Pulumi) + +Extend the existing `pulumi/tailscale/` project. + +**Add to `pulumi/tailscale/__main__.py`:** + +```python +# Auth key for Fly.io proxy container +flyio_key = tailscale.TailnetKey( + "flyio-proxy-key", + reusable=True, + ephemeral=True, + tags=["tag:flyio-proxy"], + expiry=7776000, # 90 days +) +pulumi.export("flyio_authkey", flyio_key.key) +``` + +**Add to `pulumi/tailscale/policy.hujson`:** + +Tag owner: +``` +"tag:flyio-proxy": ["autogroup:admin", "tag:blumeops"], +``` + +Access grant (Fly.io proxy → k8s services on HTTPS only): +``` +{ + "src": ["tag:flyio-proxy"], + "dst": ["tag:k8s"], + "ip": ["tcp:443"], +}, +``` + +ACL test: +``` +{ + "src": "tag:flyio-proxy", + "accept": ["tag:k8s:443"], + "deny": ["tag:homelab:22", "tag:nas:445", "tag:registry:443"], +}, +``` + +Deploy: `mise run tailnet-preview` then `mise run tailnet-up`. 
+ +After deploying, extract the auth key and set it as a Fly.io secret: + +```bash +# Get the key from Pulumi state +cd pulumi/tailscale && pulumi stack output flyio_authkey --show-secrets + +# Set it in Fly.io +fly secrets set TS_AUTHKEY="tskey-auth-..." -a blumeops-proxy +``` + +Store the auth key in 1Password as well for the `fly-setup` mise task. + +### Step 4: Mise tasks + +**`mise-tasks/fly-deploy`:** + +```bash +#!/usr/bin/env bash +#MISE description="Deploy the Fly.io public proxy" + +set -euo pipefail + +cd "$(dirname "$0")/../fly" +fly deploy "$@" +``` + +**`mise-tasks/fly-setup`:** + +```bash +#!/usr/bin/env bash +#MISE description="One-time setup: configure Fly.io secrets and certs (idempotent)" + +set -euo pipefail + +APP="blumeops-proxy" + +# Fetch Tailscale auth key from 1Password +TS_AUTHKEY=$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get --fields ts-authkey --reveal) +fly secrets set TS_AUTHKEY="$TS_AUTHKEY" -a "$APP" +echo "Tailscale auth key set" + +# Add certs for all public domains (idempotent — fly ignores duplicates) +fly certs add docs.eblu.me -a "$APP" 2>/dev/null || true +# fly certs add wiki.eblu.me -a "$APP" 2>/dev/null || true # future services +echo "Certificates configured" + +echo "Done. Run 'mise run fly-deploy' to deploy." +``` + +**`mise-tasks/fly-shutoff`:** + +```bash +#!/usr/bin/env bash +#MISE description="Emergency shutoff: stop all Fly.io proxy machines" + +set -euo pipefail + +APP="blumeops-proxy" + +echo "EMERGENCY SHUTOFF: Stopping all machines for $APP" +fly scale count 0 -a "$APP" --yes +echo "All machines stopped. Public services are offline." 
+echo "To restore: fly scale count 1 -a $APP" +``` + +### Step 5: Forgejo CI workflow + +**`.forgejo/workflows/deploy-fly.yaml`:** + +```yaml +name: Deploy Fly.io Proxy + +on: + workflow_dispatch: + push: + branches: [main] + paths: + - 'fly/**' + +jobs: + deploy: + runs-on: k8s + steps: + - name: Checkout + uses: actions/checkout@v4 + + - name: Install flyctl + run: | + curl -L https://fly.io/install.sh | sh + echo "/root/.fly/bin" >> "$GITHUB_PATH" + + - name: Deploy to Fly.io + env: + FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }} + run: | + cd fly + fly deploy + + - name: Verify health + env: + FLY_API_TOKEN: ${{ secrets.FLY_DEPLOY_TOKEN }} + run: | + fly status -a blumeops-proxy + echo "" + echo "Health check:" + sleep 10 + curl -sf https://blumeops-proxy.fly.dev/healthz || echo "Warning: health check failed (may need DNS propagation)" +``` + +The `FLY_DEPLOY_TOKEN` Forgejo Actions secret must be set via the [[forgejo]] API or UI, following the pattern in the `forgejo_actions_secrets` Ansible role. + +--- + +## Per-service setup + +To expose an additional service (example: `wiki.eblu.me`): + +### 1. Add nginx server block + +Edit `fly/nginx.conf` — add a new `server` block: + +```nginx +# --- wiki.eblu.me --- +server { + listen 8080; + server_name wiki.eblu.me; + + limit_req zone=general burst=20 nodelay; + + location / { + proxy_pass https://wiki.tail8d86e.ts.net; + proxy_ssl_verify off; + + proxy_cache services; + proxy_cache_valid 200 1d; + proxy_cache_valid 404 1m; + proxy_cache_use_stale error timeout updating; + proxy_cache_lock on; + proxy_cache_key $host$uri; + proxy_ignore_headers Cache-Control Set-Cookie; + + add_header X-Cache-Status $upstream_cache_status; + } +} +``` + +Adjust `proxy_cache_valid` and `proxy_cache_key` based on the service. For dynamic services with user sessions, you'll want shorter cache TTLs and may need to include query strings or cookies in the cache key. + +### 2. 
Add DNS CNAME (Pulumi) + +Add to `pulumi/gandi/__main__.py`: + +```python +wiki_public = gandi.livedns.Record( + "wiki-public", + zone=domain, + name="wiki", + type="CNAME", + ttl=300, + values=["blumeops-proxy.fly.dev."], +) +``` + +Deploy: `mise run dns-preview` then `mise run dns-up`. + +### 3. Add Fly.io certificate + +```bash +fly certs add wiki.eblu.me -a blumeops-proxy +``` + +Or add it to `mise-tasks/fly-setup` so it's captured for future runs. + +### 4. Deploy + +```bash +mise run fly-deploy +``` + +Or push the `fly/nginx.conf` change to main — the Forgejo workflow deploys automatically. + +### 5. Verify + +```bash +curl -I https://wiki.eblu.me +# Should return 200 with X-Cache-Status header +``` + +### 6. Update Tailscale ACLs if needed + +If the new service uses a Tailscale tag not already in the `tag:flyio-proxy` grant, add it to `policy.hujson`. + +--- + +## Security + +### DDoS and rate limiting + +This approach provides basic protection, not enterprise-grade: + +- **Fly.io Anycast** absorbs volumetric L3/L4 attacks +- **nginx `limit_req`** caps per-IP request rates at the container level +- **nginx `proxy_cache`** serves most requests from cache — only cache misses traverse the Tailscale tunnel to indri +- **`proxy_cache_key $host$uri`** ignores query strings, preventing trivial cache-busting +- **`proxy_ignore_headers Cache-Control`** prevents clients from forcing cache misses + +This is sufficient for a personal documentation site. It is **not** sufficient for a service that might attract targeted attacks. For enterprise-grade DDoS protection, Cloudflare Tunnel is the better approach (requires DNS migration, see plan history in git). + +### What fail2ban is (and why it doesn't apply) + +fail2ban monitors logs for repeated failed authentication attempts (SSH brute force, bad login passwords) and bans IPs via firewall rules. A static site with no authentication has no login surface for fail2ban to monitor. 
It is a tool for services with user sessions, not for CDN/proxy protection. + +### Break-glass shutoff + +If the proxy is causing issues (DDoS, unexpected traffic, bandwidth consumption on the home network): + +**Level 1 — Stop the container (seconds, reversible):** +```bash +mise run fly-shutoff +# or: fly scale count 0 -a blumeops-proxy --yes +``` +All public services go offline immediately. Tailscale tunnel drops. Zero traffic reaches indri. Restore with `fly scale count 1 -a blumeops-proxy`. + +**Level 2 — Revoke Tailscale access (seconds):** +Remove the `flyio-proxy` node in the Tailscale admin console. Even if the container is running, it cannot reach the tailnet. Use this if the container itself may be compromised. + +**Level 3 — Remove DNS (minutes to hours):** +Delete the CNAME records at Gandi. Takes time for DNS propagation but is the permanent shutoff. + +**Level 1 is the primary response.** It is a single command, takes effect in seconds, and is trivially reversible. Document the `mise run fly-shutoff` command somewhere easily accessible (e.g., pinned in a notes app) so it can be run quickly under stress. + +--- + +## IaC summary + +| Component | Managed by | Declarative? | +|-----------|------------|:---:| +| Tailscale auth key | Pulumi (`pulumi/tailscale/`) | yes | +| Tailscale ACLs | Pulumi (`pulumi/tailscale/policy.hujson`) | yes | +| DNS CNAMEs | Pulumi (`pulumi/gandi/`) | yes | +| Container + app config | `fly/Dockerfile` + `fly/fly.toml` in repo | yes | +| Deployment | Forgejo CI on push to `fly/`, or `mise run fly-deploy` | yes | +| Fly.io secrets + certs | `mise run fly-setup` (one-time, idempotent) | semi | + +The "semi" for Fly.io secrets is a one-time operation backed by a repeatable mise task. Fly.io does not have a mature Pulumi or Terraform provider, so `fly.toml` + `flyctl` is the standard IaC model for Fly.io apps. + +--- ## Verification -1. `curl -I https://docs.eblu.me` from public internet — returns 200 with `cf-ray` header -2. 
`dig docs.eblu.me` — shows Cloudflare IPs (not Tailscale IP) -3. `dig forge.ops.eblu.me` — still shows `100.98.163.89` (Tailscale IP) -4. All `*.ops.eblu.me` services accessible from tailnet +After initial deployment of a service (using `docs.eblu.me` as example): + +1. `curl -I https://docs.eblu.me` — returns 200 with `X-Cache-Status` header +2. `dig docs.eblu.me` — resolves to Fly.io IPs (not Tailscale IP) +3. `dig forge.ops.eblu.me` — still resolves to `100.98.163.89` (unchanged) +4. All `*.ops.eblu.me` services work from tailnet 5. `mise run services-check` passes -6. Caddy TLS renewal works (force test with `caddy reload` if needed) -7. Cloudflare dashboard shows tunnel healthy and cache hits - -## Risks - -| Risk | Mitigation | -|------|------------| -| Caddy TLS renewal fails after NS change | Deploy Caddy update immediately; existing certs valid ~90 days | -| DNS propagation delay (24-48h) | Set low TTLs before migration; monitor with `dig +trace` | -| cloudflared crashes | K8s restarts it; Cloudflare serves cached content | -| Tunnel credentials leak | 1Password + ExternalSecret; tunnel only routes to docs | - -## Adding more public services - -To expose another service publicly (e.g., `wiki.eblu.me`): - -1. Add DNS record + tunnel ingress rule in `pulumi/cloudflare/__main__.py` -2. Run `mise run cloudflare-up` -3. No changes to cloudflared deployment (remotely-managed tunnel config) +6. `fly status -a blumeops-proxy` shows healthy machine +7. Second request to same URL shows `X-Cache-Status: HIT` diff --git a/docs/how-to/how-to.md b/docs/how-to/how-to.md index 21e9715..3a8a587 100644 --- a/docs/how-to/how-to.md +++ b/docs/how-to/how-to.md @@ -22,7 +22,7 @@ Task-oriented instructions for common BlumeOps operations. 
These guides assume y | [[update-tailscale-acls]] | Update Tailscale access control policies | | [[gandi-operations]] | Manage DNS records and cycle the Gandi API token | | [[use-pypi-proxy]] | Configure pip and publish packages to devpi | -| [[expose-service-publicly]] | Expose a service to the public internet via Cloudflare Tunnel | +| [[expose-service-publicly]] | Expose a service to the public internet via Fly.io + Tailscale | ## Documentation diff --git a/mise-tasks/docs-check-links b/mise-tasks/docs-check-links index 3d62971..46deab9 100755 --- a/mise-tasks/docs-check-links +++ b/mise-tasks/docs-check-links @@ -125,17 +125,28 @@ def main() -> int: if has_spaces: # Links with spaces in target or around pipe are not allowed spaced_links.append((rel_path, line_num, target)) - elif "/" in target: + continue + + # Handle anchor links: [[#Heading]] or [[file#Heading]] + # Strip the #fragment for validation; pure anchors (#Heading) skip file check + file_target = target + if "#" in target: + file_target = target.split("#", 1)[0] + if not file_target: + # Pure in-page anchor like [[#Break-glass shutoff]] — always valid + continue + + if "/" in file_target: # Path-based links are not allowed - use simple filenames only path_links.append((rel_path, line_num, target)) - elif target in ambiguous_filenames: + elif file_target in ambiguous_filenames: # Link uses an ambiguous filename - needs to be renamed - ambiguous_links.append((rel_path, line_num, target, filename_counts[target])) - elif target not in valid_targets: + ambiguous_links.append((rel_path, line_num, target, filename_counts[file_target])) + elif file_target not in valid_targets: broken_links.append((rel_path, line_num, target)) - elif target != source_stem: + elif file_target != source_stem: # Valid link to a different doc — record it for orphan detection - linked_stems.add(target) + linked_stems.add(file_target) # Print results console.print("[bold]Wiki-Link Validation[/bold]")
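
Review note on the `docs-check-links` hunk: the anchor handling can be exercised in isolation. The following is a minimal sketch, not the task's actual code. The helper names `split_target` and `classify` and the returned labels are hypothetical, and the sketch leaves out the spaces check and the orphan tracking that surround this logic in the real script. It only mirrors the order of the branches in the patch: strip the `#fragment`, treat an empty file part as an in-page anchor, then run the existing path, ambiguity, and existence checks on the file part.

```python
from typing import Optional, Set, Tuple


def split_target(target: str) -> Tuple[str, Optional[str]]:
    """Split a wiki-link target into (file part, heading fragment)."""
    if "#" in target:
        file_part, fragment = target.split("#", 1)
        return file_part, fragment
    return target, None


def classify(target: str, valid: Set[str], ambiguous: Set[str]) -> str:
    """Classify one wiki-link target using the same branch order as the patch.

    The labels returned here are illustrative; the real task appends to
    per-category lists instead of returning a string.
    """
    file_target, _fragment = split_target(target)
    if not file_target:
        # [[#Heading]]: pure in-page anchor, no file to validate
        return "in-page-anchor"
    if "/" in file_target:
        # Path-based links are rejected; only bare filenames are allowed
        return "path-link"
    if file_target in ambiguous:
        return "ambiguous"
    if file_target not in valid:
        return "broken"
    return "ok"


if __name__ == "__main__":
    known = {"caddy", "tailscale"}
    samples = ["#Break-glass shutoff", "caddy#TLS", "missing#Heading", "how-to/caddy"]
    for sample in samples:
        print(sample, "->", classify(sample, known, set()))
```

As in the patch, the heading fragment itself is never validated; only the file part is checked, so a link such as `[[caddy#No such heading]]` would still pass.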