Add spider-trap guards to docs.eblu.me Quartz nginx config

Block recursive crawler paths caused by SPA fallback + relative links:
/tags/ depth >1 returns 404, global depth ≥5 returns 404.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Erich Blume 2026-03-06 09:43:41 -08:00
commit 6636576cdc
3 changed files with 34 additions and 0 deletions

@@ -0,0 +1 @@
Add nginx spider-trap guards to docs.eblu.me Quartz container — blocks recursive crawler paths at /tags/ depth >1 and global depth ≥5.

@@ -78,6 +78,17 @@ Currently tagged as `tag:flyio-target`: [[docs]], [[loki]], [[prometheus]]. Loki
To expose an additional service through the proxy, add the `tag:flyio-target` annotation to its Tailscale Ingress. See [[expose-service-publicly]] for the full workflow.
## Spider Trap Mitigation
The SPA fallback (`try_files ... /index.html`) serves `index.html` with a 200 for *any* URI, including paths that do not exist. Quartz emits relative links (`../path`), so resolving one of them against a phantom URL produces a new, deeper phantom URL, which also returns 200 with the same relative links. The result is an infinite tree of unique URIs that crawlers follow indefinitely. In March 2026, Meta's crawler (`meta-externalagent/1.1`) hit ~49,000 unique URIs over 7 hours this way.
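A minimal Python sketch of the compounding, assuming every fallback page contains the relative href `tags/example` (a hypothetical link, not taken from the real site). Each hop resolves that same href against the previous phantom URL, so every URI comes out one segment deeper than the last:

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical relative href served on every fallback page (assumption).
FALLBACK_HREF = "tags/example"


def path_depth(url: str) -> int:
    """Count non-empty path segments, e.g. /tags/example -> 2."""
    return len([seg for seg in urlsplit(url).path.split("/") if seg])


def follow_fallback(start_url: str, hops: int) -> list[str]:
    """Repeatedly resolve FALLBACK_HREF against the last URL visited,
    as a crawler does when every unknown path returns index.html."""
    urls = [start_url]
    for _ in range(hops):
        urls.append(urljoin(urls[-1], FALLBACK_HREF))
    return urls


for url in follow_fallback("https://docs.eblu.me/tags/missing", 4):
    print(path_depth(url), url)
```

Every hop yields a URI the crawler has never seen, so deduplicating by URL never ends the crawl.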
Two nginx `location` guards in `containers/quartz/default.conf` mitigate the trap:
1. **`/tags/` depth limit** — `/tags/<name>` is always flat; anything deeper returns 404.
2. **Global depth-5 cutoff** — real content never exceeds depth 4; paths with 5+ segments return 404.
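The two guards can be approximated as regex checks. This is a Python sketch only: the patterns and the `guarded_status` helper are assumptions that mirror the rules above, not a copy of the `location` blocks in `containers/quartz/default.conf`.

```python
import re

# Assumed equivalents of the two nginx location guards; the real
# patterns in containers/quartz/default.conf may be written differently.
TAGS_TOO_DEEP = re.compile(r"^/tags/[^/]+/.+")     # anything under /tags/<name>/
GLOBAL_TOO_DEEP = re.compile(r"^(?:/[^/]+){5,}")   # 5+ non-empty segments


def guarded_status(path: str) -> int:
    """Hypothetical helper: 404 if either guard matches, otherwise the
    200 that the SPA fallback would return."""
    if TAGS_TOO_DEEP.match(path) or GLOBAL_TOO_DEEP.match(path):
        return 404
    return 200


for p in ["/tags/nginx", "/tags/nginx/", "/tags/tags/example",
          "/a/b/c/d", "/a/b/c/d/e"]:
    print(guarded_status(p), p)
```

A trailing slash on a flat tag page (`/tags/nginx/`) still passes, because neither pattern counts an empty final segment.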
These are applied in the Quartz container's nginx config, not the Fly.io proxy. The proper fix is switching Quartz to root-absolute links (planned for the fork).
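Root-absolute links avoid the trap because an absolute-path reference replaces the base path entirely during URL resolution. A short Python check, using a hypothetical deep phantom URL:

```python
from urllib.parse import urljoin

# Hypothetical phantom URL a crawler might already have reached.
PHANTOM = "https://docs.eblu.me/tags/tags/tags/example"


def resolve(href: str) -> str:
    """Resolve a link the way a crawler would from the phantom page."""
    return urljoin(PHANTOM, href)


relative = resolve("tags/example")   # relative link: one segment deeper again
absolute = resolve("/tags/example")  # root-absolute link: same target every time
print(relative)
print(absolute)
```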
## Secrets
| Secret | Source | Description |