blumeops/mise-tasks/ai-docs

#!/usr/bin/env bash
# Commit (2026-03-09 11:59:43 -07:00):
# Fix spider trap: disable SPA mode, remove index files, relax wiki-links (#290)
#
# Summary
# Fixes the Facebook crawler spider trap that's been generating infinite
# recursive URLs like `/how-to/tutorials/tutorials/how-to/explanation/...`
# for several days.
#
# Root cause: Quartz SPA mode + nginx `try_files` fallback to `index.html`
# meant any fabricated URL returned the root HTML shell with HTTP 200.
# Crawlers followed relative links from those fake URLs, creating infinite
# recursion.
#
# Fix:
# - Disable Quartz SPA mode (`enableSPA: false`); all pages are now fully static HTML
# - Replace nginx SPA fallback with `=404` + Quartz's static `404.html`
# - Remove `robots.txt` exclusions (no longer needed)
#
# Docs cleanup (Obsidian.nvim compat no longer needed):
# - Delete hand-curated category index files (`tutorials.md`, `reference.md`,
#   `how-to.md`, `explanation.md`); Quartz auto-generates folder pages
# - Delete `postgresql-storage.md` (redirect stub) and
#   `migrate-forgejo-from-brew.md` (stale history)
# - Drop `docs-check-index` and `docs-check-filenames` prek hooks
# - Rewrite `docs-check-links` to allow path-based wiki-links
#   (`[[path/to/file]]`) and only error on true ambiguity
# - Add `ai-docs` doc tree listing to replace index files for AI context
# - Add natural cross-links from reference cards to fix orphan docs
#
# Deployment and Testing
# - [ ] Merge and let the build pipeline run
# - [ ] Verify docs.eblu.me serves pages correctly with full page loads
# - [ ] Verify non-existent URLs return 404
# - [ ] Monitor crawler traffic; should drop to near zero for fabricated URLs
#
# Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/290
#MISE description="Prime AI context with key BlumeOps documentation"
set -euo pipefail
DOCS_DIR="$(cd "$(dirname "$0")/.." && pwd)/docs"
# Key files for AI context priming, in order of importance
FILES=(
"$DOCS_DIR/tutorials/ai-assistance-guide.md"
"$DOCS_DIR/how-to/agent-change-process.md"
"$DOCS_DIR/index.md"
"$DOCS_DIR/how-to/operations/troubleshooting.md"
"$DOCS_DIR/explanation/architecture.md"
"$DOCS_DIR/reference/tools/mise-tasks.md"
)
# Concatenate files with headers showing paths
bat --style=header --color=never --decorations=always "$@" "${FILES[@]}"
# Documentation tree — replaces the old hand-curated index files
echo ""
echo "=== Documentation Structure ==="
echo "All docs under $DOCS_DIR (excluding changelog.d/):"
echo ""
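# Run find from inside DOCS_DIR so paths are already relative; this avoids
# interpolating the directory path into a sed pattern, where regex
# metacharacters or a '|' in the path could cause a wrong or failed match.
(cd "$DOCS_DIR" && find . -name '*.md' -not -path '*/changelog.d/*' | sort | sed 's|^\./||')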