Fix spider trap: disable SPA mode, remove index files, relax wiki-links (#290)

## Summary

Fixes the Facebook crawler spider trap that has been generating infinite recursive URLs like `/how-to/tutorials/tutorials/how-to/explanation/...` for several days.

**Root cause:** Quartz SPA mode plus the nginx `try_files` fallback to `index.html` meant that any fabricated URL returned the root HTML shell with HTTP 200. Crawlers followed relative links from those fake URLs, creating infinite recursion.

**Fix:**

- Disable Quartz SPA mode (`enableSPA: false`); all pages are now fully static HTML
- Replace the nginx SPA fallback with `=404` plus Quartz's static `404.html`
- Remove `robots.txt` exclusions (no longer needed)

**Docs cleanup (Obsidian.nvim compat no longer needed):**

- Delete hand-curated category index files (`tutorials.md`, `reference.md`, `how-to.md`, `explanation.md`); Quartz auto-generates folder pages
- Delete `postgresql-storage.md` (redirect stub) and `migrate-forgejo-from-brew.md` (stale history)
- Drop `docs-check-index` and `docs-check-filenames` prek hooks
- Rewrite `docs-check-links` to allow path-based wiki-links (`[[path/to/file]]`) and only error on true ambiguity
- Add `ai-docs` doc tree listing to replace index files for AI context
- Add natural cross-links from reference cards to fix orphan docs

## Deployment and Testing

- [ ] Merge and let the build pipeline run
- [ ] Verify docs.eblu.me serves pages correctly with full page loads
- [ ] Verify non-existent URLs return 404
- [ ] Monitor crawler traffic; it should drop to near zero for fabricated URLs

Reviewed-on: #290
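A minimal Python model of the root cause, for reviewers. Everything in it is hypothetical: the host `docs.example.test`, the shell's relative links, and the set of real paths are stand-ins, not taken from the site. It only illustrates how a 200 response for a fabricated URL lets relative links resolve into ever-longer paths, while a 404 ends the trail (assuming the crawler does not follow links out of a 404 response).

```python
from urllib.parse import urljoin, urlparse

# Hypothetical stand-ins, not taken from the real site:
# relative hrefs as they might appear in the root HTML shell...
SHELL_LINKS = ["how-to/", "tutorials/", "explanation/"]
# ...and the paths that exist as real static files.
REAL_PATHS = {"/", "/how-to/", "/tutorials/", "/explanation/"}


def crawl(start: str, max_steps: int, spa_fallback: bool) -> list[str]:
    """Return the URLs a naive crawler fetches, following one link per step.

    A real path ends the trail (its relative links point at other real
    pages). For a fabricated path, the old config returned the root shell
    with HTTP 200, so the shell's relative links resolved against the
    fabricated URL and made it longer. The new config returns 404, which
    this sketch assumes the crawler treats as a dead end.
    """
    url = start
    fetched = [url]
    for step in range(max_steps):
        path = urlparse(url).path
        if path in REAL_PATHS or not spa_fallback:
            break
        href = SHELL_LINKS[step % len(SHELL_LINKS)]  # rotate through the shell's links
        url = urljoin(url, href)
        fetched.append(url)
    return fetched


fake = "https://docs.example.test/how-to/tutorials/"  # hypothetical fabricated URL
print(crawl(fake, 4, spa_fallback=True)[-1])
# https://docs.example.test/how-to/tutorials/how-to/tutorials/explanation/how-to/
print(crawl(fake, 4, spa_fallback=False))
# ['https://docs.example.test/how-to/tutorials/']
```

The growth comes from the links being relative: `urljoin` appends them to whatever path the crawler is on, so each fake 200 response mints a new, longer fake URL.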
2026-03-09 11:59:43 -07:00 · 2026-03-09 11:59:43 -07:00 · 4f0476a851
commit 4f0476a851
parent 953640d2b7
24 changed files with 110 additions and 666 deletions
--- a/containers/quartz/default.conf
+++ b/containers/quartz/default.conf
@@ -14,18 +14,16 @@ server {
        add_header Cache-Control "public, immutable";
    }

-    # Serve robots.txt inline to prevent crawlers from entering /explore/ and /tags/,
-    # which is an SPA feature that generates infinite relative-link trees
-    # when crawled (the March 2026 spider-trap incident).
-    location = /robots.txt {
-        default_type text/plain;
-        return 200 "User-agent: *\nDisallow: /explore/\nDisallow: /tags/\n";
+    # Static file serving — no SPA fallback.
+    # Quartz generates complete HTML for every page, so all valid URLs
+    # map to real files. Non-existent paths get 404.html (generated by
+    # Quartz's NotFoundPage plugin), preventing the spider-trap issue
+    # where crawlers would get index.html for fabricated URLs.
+    location / {
+        try_files $uri $uri/ $uri.html =404;
    }

-    # SPA fallback - serve index.html for client-side routing
-    location / {
-        try_files $uri $uri/ $uri.html /index.html;
-    }
+    error_page 404 /404.html;

    # Health check endpoint
    location /healthz {
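A rough Python model of the new `location /` lookup order, useful for reasoning about which URLs should now return 404. The file tree is hypothetical, every directory is assumed to contain an `index.html`, and nginx details such as redirects for directory requests that lack a trailing slash are ignored. Passing `spa_fallback=True` reproduces the removed `/index.html` fallback for comparison.

```python
# Hypothetical Quartz output tree, relative to the web root.
FILES = {"index.html", "404.html", "how-to/index.html", "how-to/deploy.html"}
DIRS = {"", "how-to"}  # assumed to each contain an index.html


def resolve(uri: str, spa_fallback: bool = False) -> tuple[int, str]:
    """Approximate `try_files $uri $uri/ $uri.html =404` plus `error_page 404 /404.html`.

    With spa_fallback=True the last step mimics the removed `/index.html`
    fallback. Returns (status, file served).
    """
    path = uri.lstrip("/")
    # 1. $uri as an existing file
    if path in FILES:
        return 200, path
    # 2. $uri/ as an existing directory; the index directive then serves index.html
    directory = path.rstrip("/")
    if directory in DIRS:
        return 200, (f"{directory}/index.html" if directory else "index.html")
    # 3. $uri.html as an existing file (extensionless page URLs)
    if path + ".html" in FILES:
        return 200, path + ".html"
    # 4. Last resort, old config: every unknown path got the root shell with 200
    if spa_fallback:
        return 200, "index.html"
    # 4. Last resort, new config: =404, rendered with Quartz's static 404 page
    return 404, "404.html"


for uri in ["/", "/how-to/", "/how-to/deploy", "/how-to/tutorials/how-to/"]:
    print(uri, resolve(uri), resolve(uri, spa_fallback=True))
```

Only the last step differs between the two configs, which is why real pages are unaffected while fabricated paths stop returning 200.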