Add spider-trap guards to docs.eblu.me Quartz nginx config
Block recursive crawler paths caused by SPA fallback + relative links: /tags/ depth >1 returns 404, global depth ≥5 returns 404. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
6e8d11c6bb
commit
6636576cdc
3 changed files with 34 additions and 0 deletions
|
|
@ -14,6 +14,28 @@ server {
|
||||||
add_header Cache-Control "public, immutable";
|
add_header Cache-Control "public, immutable";
|
||||||
}
|
}
|
||||||
|
|
||||||
|
# --- Spider-trap guards ------------------------------------------------
|
||||||
|
# Quartz emits relative links (../path). When a crawler resolves these
|
||||||
|
# from a phantom URL that was already served by the SPA fallback, the
|
||||||
|
# relative prefix compounds (e.g. /tags/ref/infra → /tags/ref/infra/ref/infra)
|
||||||
|
# creating an infinite tree of unique URIs — all served as 200 via the
|
||||||
|
# fallback to index.html. Two rules cut this off:
|
||||||
|
#
|
||||||
|
# 1. /tags/ is always flat (/tags/<name>), so block anything deeper.
|
||||||
|
# 2. Real content never exceeds depth 4 (/how-to/<cat>/<page>).
|
||||||
|
# A depth-5 cutoff gives headroom while stopping recursive paths.
|
||||||
|
#
|
||||||
|
# Together these caught ~95% of trap requests in the March 2026 incident.
|
||||||
|
# The proper fix is root-absolute links in Quartz (planned for fork).
|
||||||
|
|
||||||
|
location ~ "^/tags/[^/]+/" {
|
||||||
|
return 404;
|
||||||
|
}
|
||||||
|
|
||||||
|
location ~ "^(/[^/]+){5,}" {
|
||||||
|
return 404;
|
||||||
|
}
|
||||||
|
|
||||||
# SPA fallback - serve index.html for client-side routing
|
# SPA fallback - serve index.html for client-side routing
|
||||||
location / {
|
location / {
|
||||||
try_files $uri $uri/ $uri.html /index.html;
|
try_files $uri $uri/ $uri.html /index.html;
|
||||||
|
|
|
||||||
1
docs/changelog.d/+spider-trap-guard.infra.md
Normal file
1
docs/changelog.d/+spider-trap-guard.infra.md
Normal file
|
|
@ -0,0 +1 @@
|
||||||
|
Add nginx spider-trap guards to docs.eblu.me Quartz container — blocks recursive crawler paths at /tags/ depth >1 and global depth ≥5.
|
||||||
|
|
@ -78,6 +78,17 @@ Currently tagged as `tag:flyio-target`: [[docs]], [[loki]], [[prometheus]]. Loki
|
||||||
|
|
||||||
To expose an additional service through the proxy, add the `tag:flyio-target` annotation to its Tailscale Ingress. See [[expose-service-publicly]] for the full workflow.
|
To expose an additional service through the proxy, add the `tag:flyio-target` annotation to its Tailscale Ingress. See [[expose-service-publicly]] for the full workflow.
|
||||||
|
|
||||||
|
## Spider Trap Mitigation
|
||||||
|
|
||||||
|
The SPA fallback (`try_files ... /index.html`) serves `index.html` with a 200 for *any* URI, including non-existent paths. Quartz's relative links (`../path`) compound when resolved from phantom URLs, creating an infinite tree of unique URIs that crawlers follow indefinitely. In March 2026, Meta's crawler (`meta-externalagent/1.1`) hit ~49,000 unique URIs over 7 hours this way.
|
||||||
|
|
||||||
|
Two nginx `location` guards in `containers/quartz/default.conf` mitigate the trap:
|
||||||
|
|
||||||
|
1. **`/tags/` depth limit** — `/tags/<name>` is always flat; anything deeper returns 404.
|
||||||
|
2. **Global depth-5 cutoff** — real content never exceeds depth 4; paths with 5+ segments return 404.
|
||||||
|
|
||||||
|
These are applied in the Quartz container's nginx config, not the Fly.io proxy. The proper fix is switching Quartz to root-absolute links (planned for the fork).
|
||||||
|
|
||||||
## Secrets
|
## Secrets
|
||||||
|
|
||||||
| Secret | Source | Description |
|
| Secret | Source | Description |
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue