diff --git a/containers/quartz/default.conf b/containers/quartz/default.conf
index 4009101..2705f1e 100644
--- a/containers/quartz/default.conf
+++ b/containers/quartz/default.conf
@@ -14,6 +14,28 @@ server {
         add_header Cache-Control "public, immutable";
     }
 
+    # --- Spider-trap guards ------------------------------------------------
+    # Quartz emits relative links (../path). When a crawler resolves these
+    # from a phantom URL that was already served by the SPA fallback, the
+    # relative prefix compounds (e.g. /tags/ref/infra → /tags/ref/infra/ref/infra)
+    # creating an infinite tree of unique URIs — all served as 200 via the
+    # fallback to index.html. Two rules cut this off:
+    #
+    #   1. /tags/ is always flat (/tags/), so block anything deeper.
+    #   2. Real content never exceeds depth 4 (/how-to//).
+    #      A depth-5 cutoff gives headroom while stopping recursive paths.
+    #
+    # Together these caught ~95% of trap requests in the March 2026 incident.
+    # The proper fix is root-absolute links in Quartz (planned for fork).
+
+    location ~ "^/tags/[^/]+/" {
+        return 404;
+    }
+
+    location ~ "^(/[^/]+){5,}" {
+        return 404;
+    }
+
     # SPA fallback - serve index.html for client-side routing
     location / {
         try_files $uri $uri/ $uri.html /index.html;
diff --git a/docs/changelog.d/+spider-trap-guard.infra.md b/docs/changelog.d/+spider-trap-guard.infra.md
new file mode 100644
index 0000000..d152e75
--- /dev/null
+++ b/docs/changelog.d/+spider-trap-guard.infra.md
@@ -0,0 +1 @@
+Add nginx spider-trap guards to docs.eblu.me Quartz container — blocks recursive crawler paths at /tags/ depth >1 and global depth ≥5.
diff --git a/docs/reference/services/flyio-proxy.md b/docs/reference/services/flyio-proxy.md
index b026f46..3c66d4e 100644
--- a/docs/reference/services/flyio-proxy.md
+++ b/docs/reference/services/flyio-proxy.md
@@ -78,6 +78,17 @@ Currently tagged as `tag:flyio-target`: [[docs]], [[loki]], [[prometheus]].
 
 To expose an additional service through the proxy, add the `tag:flyio-target` annotation to its Tailscale Ingress. See [[expose-service-publicly]] for the full workflow.
 
+## Spider Trap Mitigation
+
+The SPA fallback (`try_files ... /index.html`) serves `index.html` with a 200 for *any* URI, including non-existent paths. Quartz's relative links (`../path`) compound when resolved from phantom URLs, creating an infinite tree of unique URIs that crawlers follow indefinitely. In March 2026, Meta's crawler (`meta-externalagent/1.1`) hit ~49,000 unique URIs over 7 hours this way.
+
+Two nginx `location` guards in `containers/quartz/default.conf` mitigate the trap:
+
+1. **`/tags/` depth limit** — `/tags/` is always flat; anything deeper returns 404.
+2. **Global depth-5 cutoff** — real content never exceeds depth 4; paths with 5+ segments return 404.
+
+These are applied in the Quartz container's nginx config, not the Fly.io proxy. The proper fix is switching Quartz to root-absolute links (planned for the fork).
+
 ## Secrets
 
 | Secret | Source | Description |
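
Review note: the two `location` patterns in the diff can be sanity-checked outside nginx. The following is a minimal sketch, assuming Python's `re` module treats these two simple patterns the same way nginx's PCRE engine does. The helper name `guard_status` and the sample paths in the usage note are hypothetical, and the sketch ignores every other `location` block as well as nginx's URI normalization.

```python
import re

# Patterns copied verbatim from the two new location blocks in the diff.
TAGS_GUARD = re.compile(r"^/tags/[^/]+/")
DEPTH_GUARD = re.compile(r"^(/[^/]+){5,}")


def guard_status(path: str) -> int:
    """Return 404 if either guard pattern matches the path, else 200.

    Hypothetical helper: 200 here only means "not blocked by the guards",
    i.e. the request would continue on to the other location blocks.
    """
    if TAGS_GUARD.search(path) or DEPTH_GUARD.search(path):
        return 404
    return 200
```

For example, `guard_status("/tags/infra")` falls through to the fallback, while `guard_status("/tags/ref/infra")` and `guard_status("/a/b/c/d/e")` are blocked. One thing the sketch surfaces: the first pattern also matches a flat tag URL with a trailing slash, such as `/tags/infra/`, so it may be worth confirming that Quartz never links to tag pages in that form.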