Add spider-trap guards to docs.eblu.me Quartz nginx config

Block recursive crawler paths caused by SPA fallback + relative links:
/tags/ depth >1 returns 404, global depth ≥5 returns 404.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Erich Blume 2026-03-06 09:43:41 -08:00
commit 6636576cdc
3 changed files with 34 additions and 0 deletions
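
The two guard patterns in the diff below can be sanity-checked outside nginx. This is a minimal sketch, not part of the commit: it copies the two regular expressions from the `location` blocks into Python's `re` module and reports which paths would receive a 404. The helper name `is_blocked` and the sample paths are hypothetical, and Python's regex engine only approximates nginx's PCRE matching, which runs against the normalized request URI without the query string.

```python
import re

# Patterns copied verbatim from the two regex location blocks in this commit.
TAGS_GUARD = re.compile(r"^/tags/[^/]+/")    # anything below /tags/<name>
DEPTH_GUARD = re.compile(r"^(/[^/]+){5,}")   # five or more non-empty path segments


def is_blocked(path: str) -> bool:
    """Return True if either guard pattern matches the path (hypothetical helper)."""
    return bool(TAGS_GUARD.search(path) or DEPTH_GUARD.search(path))


if __name__ == "__main__":
    # Illustrative paths only; they are not taken from real access logs.
    samples = [
        "/tags/infra",               # flat tag page
        "/tags/ref/infra",           # tag path one level too deep
        "/tags/ref/infra/ref/infra", # compounded trap path
        "/how-to/cat/page",          # depth 3 content
        "/a/b/c/d",                  # depth 4
        "/a/b/c/d/e",                # depth 5
    ]
    for path in samples:
        verdict = "404" if is_blocked(path) else "pass to fallback"
        print(f"{path:30} {verdict}")
```

Because nginx prefers a matching regex location over the plain prefix `location /`, a path matched by either pattern never reaches the `try_files` fallback. Under these patterns a trailing-slash form such as `/tags/infra/` also matches the first guard; whether that matters depends on how the site links to its tag pages.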


@@ -14,6 +14,28 @@ server {
    add_header Cache-Control "public, immutable";
}
# --- Spider-trap guards ------------------------------------------------
# Quartz emits relative links (../path). When a crawler resolves these
# from a phantom URL that was already served by the SPA fallback, the
# relative prefix compounds (e.g. /tags/ref/infra → /tags/ref/infra/ref/infra)
# creating an infinite tree of unique URIs — all served as 200 via the
# fallback to index.html. Two rules cut this off:
#
# 1. /tags/ is always flat (/tags/<name>), so block anything deeper.
# 2. Real content never exceeds depth 4 (e.g. /how-to/<cat>/<page> is
# depth 3). Rejecting depth 5 and deeper gives headroom while stopping
# recursive paths.
#
# Together these caught ~95% of trap requests in the March 2026 incident.
# The proper fix is root-absolute links in Quartz (planned for fork).
location ~ "^/tags/[^/]+/" {
    return 404;
}
location ~ "^(/[^/]+){5,}" {
    return 404;
}
# SPA fallback - serve index.html for client-side routing
location / {
    try_files $uri $uri/ $uri.html /index.html;