From 54db8643a148e2a77a4fcfe05410c7b65db217ec Mon Sep 17 00:00:00 2001
From: Erich Blume
Date: Sun, 8 Feb 2026 21:53:07 -0800
Subject: [PATCH] Update docs to reflect public service routing via Fly.io

- security-model: Replace "no public access" with Fly.io proxy description
- routing: Add *.eblu.me as third DNS domain for public services
- architecture: Add Fly.io to network layer and service routing table
- CLAUDE.md: Add public routing domain to routing table
- gandi: Add public CNAME records section
- tailscale-operator: Document ProxyGroup, VIP routing, per-Ingress tags
- flyio-proxy: Clarify why Alloy uses direct Tailscale endpoints (ACL)
- Remove hardcoded Tailscale IP (100.98.163.89) from docs, use DNS names

Co-Authored-By: Claude Opus 4.6
---
 CLAUDE.md | 3 ++-
 docs/explanation/architecture.md | 8 +++++---
 docs/explanation/security-model.md | 10 +++++++---
 docs/how-to/expose-service-publicly.md | 2 +-
 docs/how-to/gandi-operations.md | 4 ++--
 docs/reference/infrastructure/gandi.md | 16 ++++++++++++++--
 docs/reference/infrastructure/indri.md | 2 +-
 docs/reference/infrastructure/routing.md | 17 ++++++++++++++---
 docs/reference/kubernetes/tailscale-operator.md | 13 +++++++++----
 docs/reference/services/flyio-proxy.md | 2 +-
 10 files changed, 56 insertions(+), 21 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 1353f2e..60c32b4 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -69,7 +69,8 @@ mise run provision-indri -- --check --diff # dry run
 
 | Domain | Mechanism | Reachable from |
 |--------|-----------|----------------|
-| `*.ops.eblu.me` | Caddy on indri (100.98.163.89) | everywhere incl. k8s pods |
+| `*.eblu.me` | Fly.io proxy (Tailscale tunnel) | public internet |
+| `*.ops.eblu.me` | Caddy on indri | k8s pods, containers, tailnet |
 | `*.tail8d86e.ts.net` | Tailscale MagicDNS | tailnet clients only |
 
 Check tailscale serve: `ssh indri 'tailscale serve status --json'`
diff --git a/docs/explanation/architecture.md b/docs/explanation/architecture.md
index 095c588..d9870c0 100644
--- a/docs/explanation/architecture.md
+++ b/docs/explanation/architecture.md
@@ -42,15 +42,17 @@ Two always-on devices form the infrastructure backbone:
 - All devices on tailnet `tail8d86e.ts.net`
 - ACLs control access between devices and services
 - MagicDNS provides `*.tail8d86e.ts.net` hostnames
-- No port forwarding or public IPs needed
+- No port forwarding or public IPs on homelab devices
+- Selected services exposed publicly via [[flyio-proxy]] (Fly.io → Tailscale tunnel)
 
 ## Service Routing
 
-Two DNS domains route to services:
+Three DNS domains route to services:
 
 | Domain | Mechanism | Reachable from |
 |--------|-----------|----------------|
-| `*.ops.eblu.me` | Caddy reverse proxy on indri | Everywhere (k8s pods, containers, tailnet) |
+| `*.eblu.me` | [[flyio-proxy]] (Fly.io → Tailscale tunnel) | Public internet |
+| `*.ops.eblu.me` | Caddy reverse proxy on indri | k8s pods, containers, tailnet clients |
 | `*.tail8d86e.ts.net` | Tailscale MagicDNS | Tailnet clients only |
 
 See [[routing]] for details on when to use which.
diff --git a/docs/explanation/security-model.md b/docs/explanation/security-model.md
index de5c22a..b7aea88 100644
--- a/docs/explanation/security-model.md
+++ b/docs/explanation/security-model.md
@@ -17,18 +17,22 @@ The foundational security decision is using [[tailscale]] as the network layer.
 
 ### Zero Trust Networking
 
-BlumeOps has no public IP addresses or port forwarding. All services are only accessible via Tailscale:
+BlumeOps infrastructure has no public IP addresses or port forwarding. Most services are only accessible via Tailscale:
 
-- **No attack surface** from the public internet
 - **Encrypted by default** - WireGuard encryption for all traffic
 - **Identity-based access** - ACLs based on user/device identity, not IP addresses
+- **Minimal public surface** - only selected services are exposed via [[flyio-proxy]]
+
+### Public Access via Fly.io
+
+A small number of services are exposed to the internet through a reverse proxy on Fly.io that tunnels back to the homelab over Tailscale. The proxy uses restricted ACLs (`tag:flyio-target`) so it can only reach explicitly tagged endpoints — a compromised proxy cannot route to arbitrary services on the tailnet. See [[flyio-proxy]] for details and [[expose-service-publicly]] for the security considerations.
 
 ### Defense in Depth
 
 Even within the tailnet, access is restricted:
 
 ```
-Internet ──X──▶ Services (no public access)
+Internet ──▶ Fly.io proxy ──▶ tag:flyio-target only (docs, observability)
 
 Tailnet:
 Admin ────────▶ All services
diff --git a/docs/how-to/expose-service-publicly.md b/docs/how-to/expose-service-publicly.md
index 970dd3c..7fbd79b 100644
--- a/docs/how-to/expose-service-publicly.md
+++ b/docs/how-to/expose-service-publicly.md
@@ -732,5 +732,5 @@ After deploying DNS (`mise run dns-up`):
 
 1. `curl -I https://docs.eblu.me` — returns 200 with `X-Cache-Status` header
 2. `dig docs.eblu.me` — resolves to Fly.io IPs (not Tailscale IP)
-3. `dig forge.ops.eblu.me` — still resolves to `100.98.163.89` (unchanged)
+3. `dig forge.ops.eblu.me` — still resolves to indri's Tailscale IP (unchanged)
 4. Second request to same URL shows `X-Cache-Status: HIT`
diff --git a/docs/how-to/gandi-operations.md b/docs/how-to/gandi-operations.md
index 6df294e..bebdd52 100644
--- a/docs/how-to/gandi-operations.md
+++ b/docs/how-to/gandi-operations.md
@@ -74,10 +74,10 @@ A successful preview confirms the new PAT is working.
 
 ## Break-Glass Override
 
-If MagicDNS is unavailable and Pulumi can't resolve indri's IP, set the target IP manually:
+If MagicDNS is unavailable and Pulumi can't resolve indri's IP, set the target IP manually. Find indri's current Tailscale IP via `tailscale status` or the admin console:
 
 ```bash
-export BLUMEOPS_REVERSE_PROXY_IP=100.98.163.89
+export BLUMEOPS_REVERSE_PROXY_IP=
 mise run dns-up
 ```
 
diff --git a/docs/reference/infrastructure/gandi.md b/docs/reference/infrastructure/gandi.md
index 37643e7..58a54e9 100644
--- a/docs/reference/infrastructure/gandi.md
+++ b/docs/reference/infrastructure/gandi.md
@@ -21,18 +21,30 @@ DNS hosting provider for the `eblu.me` domain, managed via Pulumi IaC.
 
 ## What It Does
 
-Gandi hosts the DNS records that make `*.ops.eblu.me` resolve to [[indri]]'s Tailscale IP (100.98.163.89). Since Tailscale IPs are not publicly routable, this gives services real DNS names while keeping them private to the tailnet.
+Gandi hosts the DNS records that make `*.ops.eblu.me` resolve to [[indri]]'s Tailscale IP (`indri.tail8d86e.ts.net`). Since Tailscale IPs are not publicly routable, this gives services real DNS names while keeping them private to the tailnet.
 
 The target IP is resolved dynamically from `indri.tail8d86e.ts.net` at deploy time, so if indri's Tailscale IP changes, re-running the deployment is sufficient.
 
 ## DNS Records
 
+### Private services (Caddy on indri)
+
 | Record | Type | Value | TTL |
 |--------|------|-------|-----|
 | `*.ops.eblu.me` | A | indri's Tailscale IP | 300s |
 | `ops.eblu.me` | A | indri's Tailscale IP | 300s |
 
-Both records point to [[indri]], which runs [[caddy]] as the reverse proxy for all services. See [[routing]] for the full service URL map.
+Both records point to [[indri]], which runs [[caddy]] as the reverse proxy for all private services.
+
+### Public services (Fly.io proxy)
+
+| Record | Type | Value | TTL |
+|--------|------|-------|-----|
+| `docs.eblu.me` | CNAME | `blumeops-proxy.fly.dev` | 300s |
+
+Public CNAMEs point to [[flyio-proxy]] on Fly.io. See [[expose-service-publicly]] for adding new public services.
+
+See [[routing]] for the full service URL map.
 
 ## Pulumi Configuration
 
diff --git a/docs/reference/infrastructure/indri.md b/docs/reference/infrastructure/indri.md
index cf8c60f..7f0b91d 100644
--- a/docs/reference/infrastructure/indri.md
+++ b/docs/reference/infrastructure/indri.md
@@ -16,7 +16,7 @@ Primary BlumeOps server. Mac Mini M1 (2020).
 | **Model** | Mac mini M1, 2020 (Macmini9,1) |
 | **Storage** | 2TB internal SSD |
 | **macOS** | 15.7.3 (Sequoia) |
-| **Tailscale IP** | 100.98.163.89 |
+| **Tailscale hostname** | `indri.tail8d86e.ts.net` |
 | **Tailscale Tag** | `tag:homelab` |
 | **UPS** | Anker SOLIX F2000 GaNPrime |
 
diff --git a/docs/reference/infrastructure/routing.md b/docs/reference/infrastructure/routing.md
index 12cde31..cf8e115 100644
--- a/docs/reference/infrastructure/routing.md
+++ b/docs/reference/infrastructure/routing.md
@@ -7,20 +7,21 @@ tags:
 
 # Service Routing
 
-Services are accessible via two DNS domains with different reachability.
+Services are accessible via three DNS domains with different reachability.
 
 ## DNS Domains
 
 | Domain | Proxy | Reachable From |
 |--------|-------|----------------|
+| `*.eblu.me` | [[flyio-proxy]] (Fly.io → Tailscale tunnel) | Public internet |
 | `*.ops.eblu.me` | Caddy on indri | k8s pods, docker containers, tailnet clients |
 | `*.tail8d86e.ts.net` | Tailscale MagicDNS | Tailnet clients only |
 
-**Use `*.ops.eblu.me`** for services that need pod-to-service communication.
+**Use `*.ops.eblu.me`** for services that need pod-to-service communication. Use `*.eblu.me` for services exposed publicly via Fly.io.
 
 ## Caddy Services (`*.ops.eblu.me`)
 
-DNS points to indri's Tailscale IP (100.98.163.89). TLS via Let's Encrypt (ACME DNS-01 with Gandi).
+DNS points to [[indri]]'s Tailscale IP. TLS via Let's Encrypt (ACME DNS-01 with Gandi).
 
 | Service | URL | Description |
 |---------|-----|-------------|
@@ -40,6 +41,14 @@ DNS points to indri's Tailscale IP (100.98.163.89). TLS via Let's Encrypt (ACME
 | [[postgresql]] | pg.ops.eblu.me:5432 | Database |
 | [[sifaka|Sifaka]] | https://nas.ops.eblu.me | NAS dashboard |
 
+## Public Services (`*.eblu.me`)
+
+DNS CNAMEs point to `blumeops-proxy.fly.dev`. TLS via Fly.io-managed Let's Encrypt. Traffic tunnels back to the homelab over Tailscale. Only services tagged `tag:flyio-target` are reachable by the proxy — see [[flyio-proxy]] for details.
+
+| Service | URL | Description |
+|---------|-----|-------------|
+| [[docs]] | https://docs.eblu.me | Documentation site |
+
 ## Tailscale-Only Services
 | Service | URL | Description |
 |---------|-----|-------------|
@@ -64,3 +73,5 @@ DNS points to indri's Tailscale IP (100.98.163.89). TLS via Let's Encrypt (ACME
 - [[gandi]] - DNS hosting for `eblu.me`
 - [[tailscale]] - ACL configuration
 - [[indri]] - Where services run
+- [[flyio-proxy]] - Public reverse proxy for `*.eblu.me`
+- [[expose-service-publicly]] - How to add a new public service
diff --git a/docs/reference/kubernetes/tailscale-operator.md b/docs/reference/kubernetes/tailscale-operator.md
index ed41ea8..aa7b1a8 100644
--- a/docs/reference/kubernetes/tailscale-operator.md
+++ b/docs/reference/kubernetes/tailscale-operator.md
@@ -19,11 +19,16 @@ The Tailscale operator enables Kubernetes services to be exposed directly on the
 
 ## How It Works
 
-When you create an Ingress with `ingressClassName: tailscale`:
+Ingresses use a shared ProxyGroup (`ingress`) rather than per-service Tailscale nodes. When you create an Ingress with `ingressClassName: tailscale`:
 
-1. Operator provisions a Tailscale node for the service
-2. Service becomes accessible at `.tail8d86e.ts.net`
-3. TLS is handled automatically via Tailscale
+1. Operator configures the shared ProxyGroup pods to serve the new Ingress
+2. Service gets a VIP (Virtual IP) address on the tailnet
+3. Service becomes accessible at `.tail8d86e.ts.net`
+4. TLS is handled automatically via Tailscale
+
+Tailnet clients must have `--accept-routes` enabled to route to VIP addresses.
+
+Services can be individually tagged (e.g., `tag:flyio-target`) via Ingress annotations to control which ACL grants apply. See [[expose-service-publicly]] for the tagging workflow.
 
 ## Limitations
 
diff --git a/docs/reference/services/flyio-proxy.md b/docs/reference/services/flyio-proxy.md
index 244a6a1..f03c667 100644
--- a/docs/reference/services/flyio-proxy.md
+++ b/docs/reference/services/flyio-proxy.md
@@ -73,7 +73,7 @@ Alloy listens on `127.0.0.1:12345` for self-scraping its `/metrics` endpoint. Al
 
 The `tag:flyio-proxy` ACL grants access only to `tag:flyio-target:443`. Services must explicitly opt in by adding a `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to their Tailscale Ingress. This means the proxy can only reach endpoints that have been individually tagged — a compromised nginx config cannot route to arbitrary services on the tailnet.
 
-Currently tagged as `tag:flyio-target`: [[docs]], [[loki]], [[prometheus]]. Loki and Prometheus are reachable so that [[alloy|Alloy]] (running inside the container) can push logs and metrics directly via their Tailscale Ingress endpoints, bypassing [[caddy]] entirely.
+Currently tagged as `tag:flyio-target`: [[docs]], [[loki]], [[prometheus]]. Loki and Prometheus are tagged so that [[alloy|Alloy]] (running inside the container) can push logs and metrics directly via their Tailscale Ingress endpoints — the restricted ACL means Caddy on indri (`tag:homelab`) is not reachable from the proxy.
 
 To expose an additional service through the proxy, add the `tag:flyio-target` annotation to its Tailscale Ingress. See [[expose-service-publicly]] for the full workflow.
 
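
Reviewer note: as a reading aid for the three-domain routing table this patch introduces, here is a minimal shell sketch that maps a hostname to its routing domain. The function name `classify_host` and its output strings are hypothetical and not part of the repository. Only the three domain patterns and their mechanisms are taken from the patch.

```bash
#!/bin/sh
# Hypothetical helper (not part of the repo): report which routing domain
# from the patch's tables a hostname falls under.
classify_host() {
  case "$1" in
    # The private domain must be tested before the broader public wildcard,
    # because *.eblu.me would otherwise also match *.ops.eblu.me names.
    ops.eblu.me|*.ops.eblu.me) echo "private (Caddy on indri)" ;;
    *.tail8d86e.ts.net)        echo "tailnet-only (Tailscale MagicDNS)" ;;
    *.eblu.me)                 echo "public (Fly.io proxy)" ;;
    *)                         echo "unknown" ;;
  esac
}

# Example hostnames taken from the patch.
classify_host docs.eblu.me
classify_host forge.ops.eblu.me
classify_host indri.tail8d86e.ts.net
```

The ordering of the `case` arms is the one design point worth noting: a hostname under `ops.eblu.me` also matches the `*.eblu.me` glob, so the more specific pattern has to come first.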