Fix spider trap: disable SPA mode, remove index files, relax wiki-links #290

Merged
eblume merged 1 commit from fix/disable-spa-relax-docs into main 2026-03-09 11:59:44 -07:00
Summary

Fixes the Facebook crawler spider trap that's been generating infinite recursive URLs like /how-to/tutorials/tutorials/how-to/explanation/... for several days.

Root cause: Quartz SPA mode + nginx try_files fallback to index.html meant any fabricated URL returned the root HTML shell with HTTP 200. Crawlers followed relative links from those fake URLs, creating infinite recursion.
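To illustrate the recursion described above, here is a minimal sketch of a crawler resolving relative links when every URL returns the same root shell. The host `docs.example` and the three relative hrefs are hypothetical stand-ins, not taken from the actual site.

```python
from urllib.parse import urljoin

# Hypothetical relative links assumed to appear in the root HTML shell.
ROOT_SHELL_LINKS = ["./tutorials/", "./how-to/", "./explanation/"]


def crawl(start_url, max_depth):
    """Walk links breadth-first, assuming every URL answers 200 with the root shell."""
    frontier = [start_url]
    seen = set(frontier)
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            for href in ROOT_SHELL_LINKS:
                # A relative href resolves against the current (possibly fake) URL,
                # so each hop appends one more path segment.
                target = urljoin(url, href)
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return seen


if __name__ == "__main__":
    for url in sorted(crawl("https://docs.example/", 2)):
        print(url)
```

Under these assumptions the set of distinct URLs grows geometrically with depth, which matches the unbounded crawl pattern reported above.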

Fix:

  • Disable Quartz SPA mode (enableSPA: false) — all pages are now fully static HTML
  • Replace nginx SPA fallback with =404 + Quartz's static 404.html
  • Remove robots.txt exclusions (no longer needed)
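The effect of swapping the fallback can be sketched with a simplified model of the lookup. This is not nginx's actual algorithm or the deployed config; `serve`, `site_files`, and the file names other than `index.html` and `404.html` are hypothetical.

```python
def serve(path, site_files, spa_fallback):
    """Return (status, served_file) for a request path in a simplified static-site model."""
    clean = path.strip("/")
    # Try the exact file first, then the directory's index page.
    candidates = [clean, f"{clean}/index.html" if clean else "index.html"]
    for name in candidates:
        if name in site_files:
            return 200, name
    if spa_fallback:
        # Old behavior: any unknown URL gets the root shell with HTTP 200.
        return 200, "index.html"
    # New behavior: unknown URLs get the static 404 page with HTTP 404.
    return 404, "404.html"


if __name__ == "__main__":
    files = {"index.html", "404.html", "how-to/index.html"}
    fake = "/how-to/tutorials/tutorials/"
    print(serve(fake, files, spa_fallback=True))
    print(serve(fake, files, spa_fallback=False))
```

With the fallback disabled, a fabricated path no longer yields a crawlable page, so the crawler has no new links to follow from it.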

Docs cleanup (Obsidian.nvim compat no longer needed):

  • Delete hand-curated category index files (tutorials.md, reference.md, how-to.md, explanation.md) — Quartz auto-generates folder pages
  • Delete postgresql-storage.md (redirect stub) and migrate-forgejo-from-brew.md (stale history)
  • Drop docs-check-index and docs-check-filenames prek hooks
  • Rewrite docs-check-links to allow path-based wiki-links ([[path/to/file]]) and only error on true ambiguity
  • Add ai-docs doc tree listing to replace index files for AI context
  • Add natural cross-links from reference cards to fix orphan docs
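One way the relaxed link check could work is sketched below. The real docs-check-links hook is not shown in this PR description, so the suffix-matching rule, the function names, and the sample paths are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


def resolve_wiki_link(target, doc_paths):
    """Return every doc whose extension-less path ends with the link target's segments."""
    want = PurePosixPath(target).parts
    matches = []
    for doc in doc_paths:
        parts = PurePosixPath(doc).with_suffix("").parts
        if want and parts[-len(want):] == want:
            matches.append(doc)
    return matches


def classify_link(target, doc_paths):
    """Label a link 'ok', 'broken', or 'ambiguous' (assumed policy: only the last two are errors)."""
    matches = resolve_wiki_link(target, doc_paths)
    if not matches:
        return "broken"
    return "ok" if len(matches) == 1 else "ambiguous"


if __name__ == "__main__":
    docs = ["how-to/setup.md", "tutorials/setup.md", "reference/nginx.md"]
    for link in ["nginx", "setup", "how-to/setup", "missing"]:
        print(f"[[{link}]] -> {classify_link(link, docs)}")
```

In this sketch a bare `[[setup]]` is flagged because two files share the name, while the path-based `[[how-to/setup]]` resolves to exactly one file and passes.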

Deployment and Testing

  • Merge and let the build pipeline run
  • Verify docs.eblu.me serves pages correctly with full page loads
  • Verify non-existent URLs return 404
  • Monitor crawler traffic — should drop to near zero for fabricated URLs
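The 404 verification step could be scripted roughly as follows. The probe paths are made-up examples, and `find_soft_404s` is a hypothetical helper; the status fetcher is injectable so the check can be exercised without network access.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for a GET request, including 4xx and 5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on error statuses; the code is still what we want.
        return err.code


def find_soft_404s(base_url, probe_paths, fetch_status=http_status):
    """Return the probe paths that did NOT answer with HTTP 404."""
    base = base_url.rstrip("/")
    return [p for p in probe_paths if fetch_status(base + p) != 404]


if __name__ == "__main__":
    # Fabricated paths that should not exist on the site (assumed examples).
    probes = ["/how-to/tutorials/tutorials/", "/definitely-not-a-page/"]
    offenders = find_soft_404s("https://docs.eblu.me", probes)
    print("soft 404s:", offenders or "none")
```

An empty result means every fabricated path returned 404, which is the expected state after this change is deployed.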
Fix the Facebook crawler spider trap by disabling Quartz SPA mode and
removing the nginx fallback to index.html. Non-existent URLs now return
404.html instead of the root SPA shell, preventing infinite recursive
crawling.

Remove hand-curated category index files (tutorials.md, reference.md,
how-to.md, explanation.md) — Quartz auto-generates folder pages. Drop
docs-check-index and docs-check-filenames hooks. Update docs-check-links
to allow path-based wiki-links and only error on true ambiguity. Remove
robots.txt exclusions since they're no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eblume merged commit 4f0476a851 into main 2026-03-09 11:59:44 -07:00
Reference
eblume/blumeops!290