Fix spider trap: disable SPA mode, remove index files, relax wiki-links #290

Merged
eblume merged 1 commit from fix/disable-spa-relax-docs into main 2026-03-09 11:59:44 -07:00
24 changed files with 110 additions and 666 deletions

View file

@ -14,18 +14,16 @@ server {
add_header Cache-Control "public, immutable";
}
# Serve robots.txt inline to prevent crawlers from entering /explore/ and /tags/,
# which is an SPA feature that generates infinite relative-link trees
# when crawled (the March 2026 spider-trap incident).
location = /robots.txt {
default_type text/plain;
return 200 "User-agent: *\nDisallow: /explore/\nDisallow: /tags/\n";
# Static file serving — no SPA fallback.
# Quartz generates complete HTML for every page, so all valid URLs
# map to real files. Non-existent paths get 404.html (generated by
# Quartz's NotFoundPage plugin), preventing the spider-trap issue
# where crawlers would get index.html for fabricated URLs.
location / {
try_files $uri $uri/ $uri.html =404;
}
# SPA fallback - serve index.html for client-side routing
location / {
try_files $uri $uri/ $uri.html /index.html;
}
error_page 404 /404.html;
# Health check endpoint
location /healthz {

View file

@ -0,0 +1 @@
Relax wiki-link constraints: allow path-based links for disambiguation, drop global filename uniqueness requirement, remove docs-check-filenames and docs-check-index hooks.

View file

@ -0,0 +1 @@
Disable Quartz SPA mode and remove robots.txt crawler exclusions to fix the Facebook crawler spider trap. Remove hand-curated category index files in favor of Quartz auto-generated folder pages.

View file

@ -1,25 +0,0 @@
---
title: Explanation
modified: 2026-02-10
last-reviewed: 2026-02-10
tags:
- explanation
---
# Explanation
Understanding-oriented content explaining the "why" behind BlumeOps design decisions.
## Philosophy
| Article | Description |
|---------|-------------|
| [[why-gitops]] | Why infrastructure-as-code and GitOps for a homelab |
## Design
| Article | Description |
|---------|-------------|
| [[architecture]] | How all the pieces fit together |
| [[federated-login]] | How SSO works across BlumeOps (Authentik) |
| [[security-model]] | Network security, secrets, and access control |

View file

@ -1,47 +0,0 @@
---
title: Migrate Forgejo from Brew to Source Build
status: active
modified: 2026-03-04
last-reviewed: 2026-03-05
tags:
- how-to
- forgejo
---
# Migrate Forgejo from Brew to Source Build
Transition Forgejo on indri from Homebrew to a source-built binary with LaunchAgent, matching the pattern used by [[zot]], [[caddy]], and [[alloy]].
## Motivation
Forgejo was force-upgraded from v13 to v14 by `brew upgrade`, breaking version control. A source build pins versions and aligns with the established native service pattern.
## Architecture Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Source remote** | Codeberg upstream | Avoids circular dependency (Forgejo hosting its own source) |
| **Secondary remote** | `forge.eblu.me/mirrors/forgejo` | Convenience and backup |
| **Version tracking** | `indri-deployment` branch on tag | Rebase to upgrade; explicit version pinning |
| **Build deps** | Go 1.24+, Node 20+ via mise | Consistent with other mise-managed tooling |
| **Process manager** | LaunchAgent plist | Matches zot, caddy, alloy |
| **Data location** | `~/forgejo` | Migrated from `/opt/homebrew/var/forgejo` |
| **Run user** | `erichblume` | LaunchAgent session user (SSH git user stays `forgejo`) |
## Key Steps
1. Clone from Codeberg, add forge mirror remote
2. Check out target tag, create `indri-deployment` branch
3. Build with `TAGS="bindata timetzdata sqlite sqlite_unlock_notify" mise x -- make build`
4. Stop brew service, copy data to `~/forgejo`, fix ownership
5. Run Ansible (`--tags forgejo`) to deploy updated role with LaunchAgent
6. Verify (API version, SSH clone, push, Actions runners, services-check)
7. `brew uninstall forgejo`
## Reference Patterns
- `ansible/roles/zot/` — primary pattern for source-built binary roles (tasks, defaults, handlers, plist template)
## Related
- [[forgejo]] — Service reference

View file

@ -1,100 +0,0 @@
---
title: How-To
modified: 2026-03-06
last-reviewed: 2026-03-06
tags:
- how-to
---
# How-To Guides
## Deployment
- [[deploy-k8s-service]]
- [[add-ansible-role]]
- [[create-release-artifact-workflow]]
- [[build-container-image]]
## Configuration
- [[update-tailscale-acls]]
- [[gandi-operations]]
- [[use-pypi-proxy]]
- [[expose-service-publicly]]
- [[manage-forgejo-mirrors]]
- [[update-documentation]]
- [[update-tooling-dependencies]]
## Knowledge Base
- [[review-documentation]]
- [[review-services]]
- [[agent-change-process]]
## Operations
- [[connect-to-postgres]]
- [[restart-indri]]
- [[manage-flyio-proxy]]
- [[restore-1password-backup]]
- [[troubleshooting]]
## Forgejo
- [[migrate-forgejo-from-brew]]
## Ringtail
- [[manage-lockfile]]
## Zot
- [[harden-zot-registry]]
- [[register-zot-oidc-client]]
- [[wire-ci-registry-auth]]
- [[enforce-tag-immutability]]
- [[adopt-commit-based-container-tags]]
- [[add-container-version-sync-check]]
- [[install-dagger-on-nix-runner]]
- [[pin-container-versions]]
- [[add-dagger-nix-build]]
- [[fix-ntfy-nix-version]]
## Authentik
- [[deploy-authentik]]
- [[build-authentik-container]]
- [[provision-authentik-database]]
- [[create-authentik-secrets]]
- [[migrate-grafana-to-authentik]]
## Authentik Source Build
- [[build-authentik-from-source]]
- [[mirror-authentik-build-deps]]
- [[authentik-api-client-generation]]
- [[authentik-python-backend-derivation]]
- [[authentik-web-ui-derivation]]
- [[authentik-go-server-derivation]]
## Grafana
- [[upgrade-grafana]]
- [[kustomize-grafana-deployment]]
- [[build-grafana-container]]
- [[build-grafana-sidecar]]
## Dagger
- [[upgrade-dagger]]
## JobSync
- [[deploy-jobsync]]
- [[build-jobsync-container]]
## Forgejo Runner
- [[upgrade-k8s-runner]]
- [[validate-workflows-against-v12]]
- [[review-runner-config-v12]]

View file

@ -39,8 +39,8 @@ The goal of BlumeOps is threefold:
## Sections
- [[tutorials|Tutorials]] - Learning-oriented guides for getting started
- [[reference|Reference]] - Technical specifications and service details
- [[how-to|How-to]] - Task-oriented instructions for common operations
- [[explanation|Explanation]] - Understanding the "why" behind BlumeOps
- [Tutorials](/tutorials/) - Learning-oriented guides for getting started
- [Reference](/reference/) - Technical specifications and service details
- [How-to](/how-to/) - Task-oriented instructions for common operations
- [Explanation](/explanation/) - Understanding the "why" behind BlumeOps
- [[CHANGELOG]] - Release history and changes

View file

@ -9,7 +9,7 @@ const config: QuartzConfig = {
configuration: {
pageTitle: "BlumeOps Docs",
pageTitleSuffix: "",
enableSPA: true,
enableSPA: false,
enablePopovers: true,
analytics: null,
locale: "en-US",

View file

@ -1,95 +0,0 @@
---
title: Reference
modified: 2026-03-04
tags:
- reference
---
# Reference
Technical specifications, inventories, and configuration details for BlumeOps infrastructure.
## Services
Individual service reference cards with URLs and configuration details.
| Service | Description | Location |
|---------|-------------|----------|
| [[alloy|Alloy]] | Observability collector (metrics & logs) | indri + k8s |
| [[argocd]] | GitOps continuous delivery | k8s |
| [[borgmatic]] | Backup system | indri |
| [[caddy]] | Reverse proxy & TLS termination | indri |
| [[1password]] | Secrets management | cloud + k8s |
| [[forgejo]] | Git forge & CI/CD | indri |
| [[frigate]] | Network video recorder | k8s (ringtail) |
| [[grafana]] | Dashboards & visualization | k8s |
| [[immich]] | Photo management | k8s |
| [[jellyfin]] | Media server | indri |
| [[jobsync]] | Job application tracker | k8s (ringtail) |
| [[kiwix]] | Offline Wikipedia & ZIM archives | k8s |
| [[loki]] | Log aggregation | k8s |
| [[tempo]] | Distributed tracing | k8s |
| [[miniflux]] | RSS feed reader | k8s |
| [[navidrome]] | Music streaming | k8s |
| [[ntfy]] | Push notifications | k8s (ringtail) |
| [[postgresql]] | Database cluster | k8s |
| [[prometheus]] | Metrics collection | k8s |
| [[teslamate]] | Tesla data logger | k8s |
| [[transmission]] | BitTorrent daemon | k8s |
| [[zot]] | Container registry | indri |
| [[devpi]] | PyPI caching proxy | k8s |
| [[cv]] | Resume / CV site | k8s |
| [[authentik]] | OIDC identity provider | k8s (ringtail) |
| [[docs]] | Documentation site (Quartz) | k8s |
| [[flyio-proxy]] | Public reverse proxy (Fly.io + Tailscale) | Fly.io |
| [[ollama]] | LLM inference server | k8s (ringtail) |
| [[automounter]] | SMB share automounter | indri |
## Infrastructure
Host inventory and network configuration.
- [[hosts|Hosts]] - Device inventory
- [[indri]] - Primary server
- [[ringtail]] - Service host & gaming PC
- [[gilbert]] - Development workstation
- [[tailscale]] - ACLs, groups, tags
- [[gandi]] - DNS hosting for `eblu.me`
- [[unifi]] - Home WiFi router (UniFi Express 7)
- [[routing|Routing]] - DNS domains, port mappings
- [[power]] - Battery-backed power chain
## Tools
Build, deployment, and IaC tool reference.
- [[mise-tasks]] - Operational task runner (all `mise run` tasks)
- [[dagger]] - CI/CD build engine (Python SDK)
- [[argocd-cli]] - ArgoCD CLI workflows
- [[ansible]] - Configuration management for indri
- [[pulumi]] - Infrastructure-as-Code (DNS, Tailscale ACLs)
## Kubernetes
Cluster configuration and application registry.
- [[cluster|Cluster]] - Minikube specs, storage, networking
- [[apps|Apps]] - ArgoCD application registry
- [[tailscale-operator]] - Tailscale ingress for k8s services
- [[external-secrets]] - Secrets management
## Storage
Network storage and backup configuration.
- [[sifaka|Sifaka]] - Synology NAS configuration
- [[postgresql-storage]] - Database cluster
- [[backups|Backups]] - Backup policy and schedule
## Operations
Operational concerns and their components.
- [[observability]] - Metrics, logs, dashboards
- [[backup]] - Data protection
- [[disaster-recovery]] - Recovery procedures (TBD)

View file

@ -60,7 +60,7 @@ Future clients: [[argocd]], [[miniflux]], [[zot]]
## Secrets
Injected via [[external-secrets]] from the "Authentik (blumeops)" 1Password item.
Injected via [[external-secrets]] from the "Authentik (blumeops)" 1Password item (see [[create-authentik-secrets]] for setup).
| 1Password Field | Purpose |
|-----------------|---------|
@ -79,4 +79,7 @@ Nix-built via `dockerTools.buildLayeredImage`. The entrypoint wrapper symlinks b
- [[federated-login]] - How authentication works across BlumeOps
- [[grafana]] - First OIDC client
- [[deploy-authentik]] - Deployment how-to
- [[migrate-grafana-to-authentik]] - Grafana SSO migration from Dex
- [[build-authentik-from-source]] - Nix-based container build
- [[mirror-authentik-build-deps]] - Supply chain mirrors for the build
- [[external-secrets]] - Secrets injection from 1Password

View file

@ -120,6 +120,10 @@ The UI shows `forge.eblu.me` for HTTPS clone URLs and `forge.ops.eblu.me` for SS
`mise run fly-shutoff` stops all public traffic immediately. forge.ops.eblu.me continues to work from the tailnet. See [[expose-service-publicly#Break-glass shutoff]].
## Mirrors
Forgejo hosts pull mirrors of external repositories (GitHub, etc.) for supply chain control. Mirrors live in the `mirrors/` org and sync on a configurable interval. See [[manage-forgejo-mirrors]] for operations.
## Related
- [[argocd]] - Uses Forgejo as git source

View file

@ -63,6 +63,7 @@ Optional annotation: `grafana_folder: "FolderName"`
- [[build-grafana-sidecar]] - Home-built sidecar container
- [[kustomize-grafana-deployment]] - Kustomize manifest structure
- [[authentik]] - OIDC identity provider for SSO
- [[migrate-grafana-to-authentik]] - How SSO was migrated from Dex to Authentik
- [[prometheus]] - Metrics datasource
- [[loki]] - Logs datasource
- [[tempo]] - Traces datasource

View file

@ -65,3 +65,5 @@ The `zot-ci` API key expires every **90 days**. To rotate:
- [[forgejo]] - Container build CI
- [[cluster|Cluster]] - Registry consumer
- [[authentik]] - OIDC identity provider
- [[harden-zot-registry]] - Security hardening guide
- [[install-dagger-on-nix-runner]] - Why Dagger can't run on the Nix builder

View file

@ -1,11 +0,0 @@
---
title: PostgreSQL Storage
modified: 2026-02-07
tags:
- storage
- database
---
# PostgreSQL Storage
See [[postgresql]] in Services.

View file

@ -17,11 +17,9 @@ Run `mise tasks --sort name` for the live list with descriptions.
| Task | Description |
|------|-------------|
| `ai-docs` | Prime AI context with key documentation |
| `docs-check-filenames` | Detect duplicate filenames in documentation |
| `ai-docs` | Prime AI context with key documentation and doc tree |
| `docs-check-frontmatter` | Check required frontmatter fields |
| `docs-check-index` | Check every doc is referenced in its category index |
| `docs-check-links` | Validate wiki-links point to existing filenames |
| `docs-check-links` | Validate wiki-links resolve correctly (supports path-based links) |
| `docs-mikado` | View active Mikado dependency chains (C2 changes) |
| `docs-review` | Review the most stale doc by `last-reviewed` date |
| `docs-review-stale` | Report docs by last-modified date |

View file

@ -91,7 +91,7 @@ BlumeOps operations are driven by mise tasks. Run `mise tasks` to list all avail
| Task | When to Use |
|------|-------------|
| `ai-docs` | At session start - review infrastructure documentation |
| `ai-docs` | At session start - review infrastructure documentation (see [[mise-tasks]]) |
| `docs-mikado` | View active Mikado dependency chains for C2 changes |
| `docs-mikado --resume` | Resume a C2 chain: detect branch, show state and next steps |
| `provision-indri` | Deploy changes to [[indri]]-hosted services via Ansible |
@ -104,9 +104,7 @@ BlumeOps operations are driven by mise tasks. Run `mise tasks` to list all avail
| `dns-up` | Apply DNS changes via Pulumi |
| `tailnet-preview` | Preview Tailscale ACL changes |
| `tailnet-up` | Apply Tailscale ACL changes via Pulumi |
| `docs-check-links` | Validate wiki-links in documentation (includes orphan detection) |
| `docs-check-index` | Check every doc is referenced in its category index |
| `docs-check-filenames` | Check for duplicate doc filenames |
| `docs-check-links` | Validate wiki-links resolve correctly (supports path-based links, orphan detection) |
| `docs-review-stale` | Report docs by last-modified date, highlight stale ones |
| `docs-review-tags` | Print frontmatter tag inventory across all docs |
| `docs-review` | Review the most stale doc by last-reviewed date |
@ -120,7 +118,7 @@ For ArgoCD operations, use the `argocd` CLI directly:
For AI agents building context:
- [[reference|Reference]] - Entry point for technical details
- [Reference](/reference/) - Entry point for technical details
- [[hosts|Host Inventory]] - What hardware exists
- [[apps|ArgoCD Apps]] - What's deployed in Kubernetes
- [[routing|Routing]] - How services are exposed

View file

@ -18,18 +18,18 @@ The docs follow the [Diataxis](https://diataxis.fr/) framework:
| Section | Purpose | When to Use |
|---------|---------|-------------|
| **[[tutorials|Tutorials]]** | Learning-oriented | "I'm new and want to understand" |
| **[[reference|Reference]]** | Information-oriented | "I need specific technical details" |
| **[[how-to|How-to]]** | Task-oriented | "I need to do X" |
| **[[explanation|Explanation]]** | Understanding-oriented | "I want to understand why" |
| **[Tutorials](/tutorials/)** | Learning-oriented | "I'm new and want to understand" |
| **[Reference](/reference/)** | Information-oriented | "I need specific technical details" |
| **[How-to](/how-to/)** | Task-oriented | "I need to do X" |
| **[Explanation](/explanation/)** | Understanding-oriented | "I want to understand why" |
## Quick Paths by Audience
### For Erich (Owner)
You probably want quick access to operational details:
- [[how-to]] guides for common operations (deploy, troubleshoot, update ACLs)
- [[reference]] has service URLs, commands, and config locations
- [How-to](/how-to/) guides for common operations (deploy, troubleshoot, update ACLs)
- [Reference](/reference/) has service URLs, commands, and config locations
- [[ai-assistance-guide]] explains how to work effectively with Claude
- Run `mise run ai-docs` to prime AI context with key documentation
@ -37,40 +37,41 @@ You probably want quick access to operational details:
Context for effective assistance:
- Read [[ai-assistance-guide]] for operational conventions
- [[reference]] has the technical specifics you'll need
- [Reference](/reference/) has the technical specifics you'll need
- The repo's `CLAUDE.md` has critical rules (especially the kubectl context requirement)
### For External Readers
Understanding what this is:
- [[explanation]] covers the "why" behind design decisions
- [[reference]] shows what's actually running
- [Explanation](/explanation/) covers the "why" behind design decisions
- [Reference](/reference/) shows what's actually running
- Browse service pages to see specific implementations
### For Contributors
Getting started with changes:
- [[contributing]] walks through the workflow
- [[how-to]] guides for specific tasks (deploy services, add roles)
- [[reference]] tells you where things live
- [How-to](/how-to/) guides for specific tasks (deploy services, add roles)
- [Reference](/reference/) tells you where things live
### For Replicators
Replicators are people who want to build their own similar homelab GitOps setup, using BlumeOps as inspiration.
- [[replicating-blumeops]] provides the overview, with linked tutorials that go deep on individual components
- [[explanation]] covers architecture and design rationale
- [Explanation](/explanation/) covers architecture and design rationale
- Reference pages show specific configuration choices
## Using Wiki Links
Documentation uses `[[wiki-links]]` for cross-references:
- `[[service-name]]` links to a reference page
- `[[service-name]]` links by filename stem (must be unambiguous)
- `[[path/to/file]]` links by path from docs root (for disambiguation)
- `[[page|Display Text]]` customizes the link text
When reading on the web (docs.eblu.me), these render as clickable links. The backlinks panel shows what references each page.
Prek hooks automatically validate that all wiki-links point to existing files and that link targets are unambiguous.
Prek hooks validate that all wiki-links resolve to existing files and flag ambiguous bare-name links.
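
The resolution rules above can be sketched in a few lines. This is a minimal illustration, not the actual hook: `resolve_wikilink` and the `DOCS` paths are hypothetical names, and the real `docs-check-links` task also handles spacing errors and orphan detection.

```python
# Hypothetical doc paths (relative to the docs root, without the .md suffix).
DOCS = {
    "reference/services/zot",
    "reference/services/forgejo",
    "how-to/forgejo/forgejo",
}


def resolve_wikilink(target: str, doc_paths: set[str]) -> tuple[str, list[str]]:
    """Classify a wiki-link target as 'ok', 'broken', 'ambiguous', or 'anchor'."""
    # Drop the "|Display Text" part, then the "#Heading" fragment.
    file_target = target.split("|", 1)[0].split("#", 1)[0]
    if not file_target:
        # Pure in-page anchor such as [[#Heading]]: nothing to resolve.
        return "anchor", []
    if "/" in file_target:
        # Path-based link: must match a doc path exactly.
        if file_target in doc_paths:
            return "ok", [file_target]
        return "broken", []
    # Bare stem: collect every doc whose filename stem matches.
    matches = sorted(p for p in doc_paths if p.rsplit("/", 1)[-1] == file_target)
    if not matches:
        return "broken", []
    if len(matches) > 1:
        return "ambiguous", matches
    return "ok", matches
```

In this model `[[zot]]` resolves directly, `[[forgejo]]` is flagged as ambiguous because two files share the stem, and `[[reference/services/forgejo]]` disambiguates it.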
## AI Context Priming
@ -80,10 +81,9 @@ The `ai-docs` mise task concatenates key documentation files for AI context:
mise run ai-docs
```
This outputs the AI assistance guide, reference index, how-to index, architecture overview, and tutorials index in plain text with file headers - providing Claude with essential context for BlumeOps operations.
This outputs key documentation files and a full tree listing of all docs, providing Claude with essential context for BlumeOps operations.
## Related
- [[tutorials]] - Parent index of all tutorials
- [[update-documentation]] - How to publish doc changes
- [[review-documentation]] - Periodic doc review process

View file

@ -136,5 +136,5 @@ Begin with [[tailscale-setup]] - networking is the foundation everything else bu
## Related
- [[reference]] - See BlumeOps' specific configurations
- [Reference](/reference/) - See BlumeOps' specific configurations
- [[contributing]] - Help improve BlumeOps instead

View file

@ -1,49 +0,0 @@
---
title: Tutorials
modified: 2026-02-07
tags:
- tutorials
---
# Tutorials
Learning-oriented guides for understanding and working with BlumeOps.
## Audience Guide
Each tutorial indicates which audiences it serves:
| Icon | Audience | Description |
|------|----------|-------------|
| **Owner** | Erich | Quick recall and operational refreshers |
| **AI** | Claude/AI agents | Context for AI-assisted operations |
| **Reader** | External readers | Understanding what BlumeOps is |
| **Contributor** | Operators/contributors | Helping with BlumeOps development |
| **Replicator** | Replicators | Building your own similar setup |
## Getting Started
| Tutorial | Audiences | Description |
|----------|-----------|-------------|
| [[exploring-the-docs]] | All | How to navigate and use this documentation |
| [[ai-assistance-guide]] | AI, Owner | Context for effective AI-assisted operations |
## Contributing
| Tutorial | Audiences | Description |
|----------|-----------|-------------|
| [[contributing]] | Contributor | Your first contribution to BlumeOps |
| [[adding-a-service]] | Contributor, Replicator | Deploy a new service via ArgoCD |
## Replication
For those building their own homelab GitOps setup.
| Tutorial | Audiences | Description |
|----------|-----------|-------------|
| [[replicating-blumeops]] | Replicator | Overview: building a similar environment |
| [[tailscale-setup|Tailscale Setup]] | Replicator | Setting up Tailscale networking |
| [[core-services|Core Services]] | Replicator | Forgejo and container registry |
| [[kubernetes-bootstrap|Kubernetes Bootstrap]] | Replicator | Bootstrapping a Kubernetes cluster |
| [[argocd-config|ArgoCD Config]] | Replicator | Configuring GitOps with ArgoCD |
| [[observability-stack|Observability Stack]] | Replicator | Metrics, logs, and dashboards |

View file

@ -1,5 +1,5 @@
#!/usr/bin/env bash
#MISE description="Prime AI context with key BlumeOps documentation (formerly zk-docs)"
#MISE description="Prime AI context with key BlumeOps documentation"
set -euo pipefail
@ -10,15 +10,17 @@ FILES=(
"$DOCS_DIR/tutorials/ai-assistance-guide.md"
"$DOCS_DIR/how-to/agent-change-process.md"
"$DOCS_DIR/index.md"
"$DOCS_DIR/reference/reference.md"
"$DOCS_DIR/how-to/how-to.md"
"$DOCS_DIR/how-to/operations/troubleshooting.md"
"$DOCS_DIR/explanation/explanation.md"
"$DOCS_DIR/explanation/architecture.md"
"$DOCS_DIR/tutorials/tutorials.md"
"$DOCS_DIR/reference/tools/mise-tasks.md"
)
# Concatenate files with headers showing paths
# Defaults are tuned for AI consumption (plain text, file headers only)
bat --style=header --color=never --decorations=always "$@" "${FILES[@]}"
# Documentation tree — replaces the old hand-curated index files
echo ""
echo "=== Documentation Structure ==="
echo "All docs under $DOCS_DIR (excluding changelog.d/):"
echo ""
find "$DOCS_DIR" -name '*.md' -not -path '*/changelog.d/*' | sort | sed "s|$DOCS_DIR/||"
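
Reviewer note: for readers less familiar with the `find | sort | sed` pipeline, the following is an approximate Python equivalent of the tree listing. It is an illustrative sketch, not part of this PR; `list_docs` is a hypothetical helper, and its ordering is plain code-point order, which may differ from a locale-aware `sort`.

```python
from pathlib import Path


def list_docs(docs_dir: Path) -> list[str]:
    """List Markdown files under docs_dir, relative to it, skipping changelog.d/."""
    return sorted(
        p.relative_to(docs_dir).as_posix()
        for p in docs_dir.rglob("*.md")
        # Mirror `-not -path '*/changelog.d/*'`: skip anything inside changelog.d
        if "changelog.d" not in p.relative_to(docs_dir).parts
    )
```

Calling `list_docs(Path("docs"))` from the repository root would return paths such as `index.md`, matching the `sed` step that strips the `$DOCS_DIR/` prefix.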

View file

@ -1,85 +0,0 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = ["rich>=13.0.0"]
# ///
#MISE description="Detect duplicate filenames in documentation"
"""Detect duplicate filenames in documentation.
This script scans all markdown files in the docs/ directory (excluding
changelog.d/ and zk/) and reports any duplicate filenames that could
cause wiki-link resolution issues.
With Quartz, wiki-links like [[filename]] resolve by filename,
so filenames must be unique across the documentation.
Usage: mise run docs-check-filenames
"""
import sys
from collections import defaultdict
from pathlib import Path
from rich.console import Console
from rich.table import Table
DOCS_DIR = Path(__file__).parent.parent / "docs"
def main() -> int:
console = Console()
# Collect all filenames and their paths
# Key: filename (without .md), Value: list of file paths
filenames: dict[str, list[str]] = defaultdict(list)
# Scan all markdown files (excluding zk/ and changelog.d/)
for md_file in sorted(DOCS_DIR.rglob("*.md")):
if "changelog.d" in md_file.parts or "zk" in md_file.parts:
continue
rel_path = str(md_file.relative_to(DOCS_DIR))
filename = md_file.stem # filename without .md
filenames[filename].append(rel_path)
# Find duplicates
duplicates = {name: paths for name, paths in filenames.items() if len(paths) > 1}
# Print results
console.print("[bold]Doc Filename Inventory[/bold]")
console.print()
console.print("With Quartz, wiki-links like [[filename]] resolve by filename,")
console.print("so filenames must be unique across the documentation.")
console.print()
# Duplicates table (if any)
if duplicates:
console.print("[bold red]Duplicate Filenames Found[/bold red]")
dup_table = Table(show_header=True, header_style="bold")
dup_table.add_column("Filename")
dup_table.add_column("Paths")
for name in sorted(duplicates.keys()):
paths = duplicates[name]
dup_table.add_row(name, "\n".join(paths))
console.print(dup_table)
console.print()
# Summary
console.print(f"Total files: {sum(len(p) for p in filenames.values())}")
console.print(f"Unique filenames: {len(filenames)}")
console.print(f"Duplicate filenames: {len(duplicates)}")
if duplicates:
console.print()
console.print("[bold red]Action required:[/bold red] Rename files to ensure unique wiki-link resolution.")
return 1
console.print()
console.print("[bold green]All filenames are unique![/bold green]")
return 0
if __name__ == "__main__":
sys.exit(main())

View file

@ -1,117 +0,0 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = ["rich>=13.0.0"]
# ///
#MISE description="Check that every doc is referenced in its category index"
"""Check that every doc in a Diataxis category is referenced in its index.
Each Diataxis category (tutorials, reference, how-to, explanation) has an
index file that should wiki-link to every doc in that category directory.
A doc is considered referenced if its filename stem appears as a wiki-link
target (e.g., alloy.md is matched by [[alloy]]) in the category index.
Index files are excluded from the self-check.
Usage: mise run docs-check-index
"""
import re
import sys
from pathlib import Path
from rich.console import Console
from rich.markup import escape
from rich.table import Table
DOCS_DIR = Path(__file__).parent.parent / "docs"
# Category directories and their index files
CATEGORIES = {
"tutorials": "tutorials/tutorials.md",
"reference": "reference/reference.md",
"how-to": "how-to/how-to.md",
"explanation": "explanation/explanation.md",
}
# Regex to match wiki-links: [[Target]] or [[Target|Display]]
WIKILINK_PATTERN = re.compile(r"\[\[([^\]|]+)(\|[^\]]+)?\]\]")
# Regex to match inline code (backticks)
INLINE_CODE_PATTERN = re.compile(r"`[^`]+`")
def extract_link_targets(file_path: Path) -> set[str]:
"""Extract all wiki-link targets from a file (ignoring inline code)."""
content = file_path.read_text()
targets: set[str] = set()
for line in content.splitlines():
line_without_code = INLINE_CODE_PATTERN.sub("", line)
for match in WIKILINK_PATTERN.finditer(line_without_code):
targets.add(match.group(1).strip())
return targets
def main() -> int:
console = Console()
console.print("[bold]Category Index Validation[/bold]")
console.print()
has_errors = False
missing: list[tuple[str, str, str]] = [] # (category, stem, file)
for category, index_rel in CATEGORIES.items():
index_path = DOCS_DIR / index_rel
if not index_path.exists():
console.print(f"[yellow]Warning: index file not found: {index_rel}[/yellow]")
continue
category_dir = DOCS_DIR / category
if not category_dir.is_dir():
continue
# Get all wiki-link targets from the index
index_targets = extract_link_targets(index_path)
index_stem = index_path.stem
# Check each doc in the category directory
for md_file in sorted(category_dir.rglob("*.md")):
if "changelog.d" in md_file.parts:
continue
stem = md_file.stem
# Skip the index file itself
if stem == index_stem:
continue
if stem not in index_targets:
rel_path = str(md_file.relative_to(DOCS_DIR))
missing.append((category, stem, rel_path))
if missing:
has_errors = True
console.print("[bold red]Docs Missing From Category Index[/bold red]")
console.print("These docs are not wiki-linked from their category index file.")
console.print()
table = Table(show_header=True, header_style="bold")
table.add_column("Category")
table.add_column("File")
table.add_column("Add To")
for category, stem, rel_path in missing:
table.add_row(category, rel_path, CATEGORIES[category])
console.print(table)
console.print()
if has_errors:
return 1
console.print(f"Checked {len(CATEGORIES)} category indexes.")
console.print("[bold green]All docs are referenced in their category index![/bold green]")
return 0
if __name__ == "__main__":
sys.exit(main())

View file

@ -3,19 +3,21 @@
# requires-python = ">=3.12"
# dependencies = ["rich>=13.0.0"]
# ///
#MISE description="Validate all wiki-links point to existing doc filenames"
#MISE description="Validate all wiki-links point to existing doc files"
"""Validate that all wiki-links in documentation point to existing files.
This script scans all markdown files in the docs/ directory (excluding
changelog.d/), extracts wiki-links, and verifies each link target
exists as a unique filename in the documentation.
changelog.d/), extracts wiki-links, and verifies each link target resolves
to an existing file.
Wiki-link formats supported:
- [[filename]] - links to filename.md (must be unique across all docs)
- [[target|Display Text]] - filename with display text
- [[filename]] - resolves by stem (errors if ambiguous)
- [[path/to/file]] - resolves by relative path from docs root
- [[target|Display Text]] - either form with display text
- [[target#Heading]] - with anchor fragment (file part validated)
Path-based links (containing '/') are NOT supported to ensure all
filenames are unique and links work correctly in obsidian.nvim.
Resolution mirrors Quartz's "shortest" markdownLinkResolution:
bare names resolve when unique; use paths to disambiguate duplicates.
Usage: mise run docs-check-links
"""
@ -31,7 +33,6 @@ from rich.table import Table
DOCS_DIR = Path(__file__).parent.parent / "docs"
# Regex to match wiki-links: [[Target]] or [[Target|Display]]
# Captures: group(1) = target (may have spaces), group(2) = full "|Display" part if present
WIKILINK_PATTERN = re.compile(r"\[\[([^\]|]+)(\|[^\]]+)?\]\]")
# Regex to match inline code (backticks)
@ -68,51 +69,42 @@ def extract_wikilinks(file_path: Path) -> list[tuple[str, int, bool]]:
def main() -> int:
console = Console()
# Collect all valid targets (both filenames and paths)
valid_targets: set[str] = set()
# Track which filenames are ambiguous (appear multiple times)
filename_counts: dict[str, list[str]] = {}
# Build lookup structures:
# - path_targets: set of relative paths without extension (e.g., "reference/services/alloy")
# - stem_to_paths: map from filename stem to list of paths (for ambiguity detection)
path_targets: set[str] = set()
stem_to_paths: dict[str, list[str]] = {}
# Scan all markdown files (excluding changelog.d/)
for md_file in DOCS_DIR.rglob("*.md"):
if "changelog.d" in md_file.parts:
continue
# Track filename occurrences
filename = md_file.stem
stem = md_file.stem
rel_path_str = str(md_file.relative_to(DOCS_DIR).with_suffix(""))
if filename not in filename_counts:
filename_counts[filename] = []
filename_counts[filename].append(rel_path_str)
# Add relative path without extension (e.g., "reference/services/alloy")
valid_targets.add(rel_path_str)
path_targets.add(rel_path_str)
if stem not in stem_to_paths:
stem_to_paths[stem] = []
stem_to_paths[stem].append(rel_path_str)
# Only add filenames that are unique (not ambiguous)
ambiguous_filenames: set[str] = set()
for filename, paths in filename_counts.items():
if len(paths) == 1:
valid_targets.add(filename)
else:
ambiguous_filenames.add(filename)
# Special case: files at repo root that are copied into docs during build
# These are valid link targets even though they don't exist in docs/
# Special case: files at repo root copied into docs during build
REPO_ROOT = DOCS_DIR.parent
BUILD_TIME_DOCS = ["CHANGELOG.md"]
for filename in BUILD_TIME_DOCS:
if (REPO_ROOT / filename).exists():
valid_targets.add(Path(filename).stem)
stem = Path(filename).stem
if stem not in stem_to_paths:
stem_to_paths[stem] = []
stem_to_paths[stem].append(stem)
path_targets.add(stem)
# Collect all broken, ambiguous, path-based, and spaced links
# Collect errors
broken_links: list[tuple[str, int, str]] = []
ambiguous_links: list[tuple[str, int, str, list[str]]] = []
path_links: list[tuple[str, int, str]] = []
spaced_links: list[tuple[str, int, str]] = []
# Track which doc stems are linked-to from other docs (for orphan detection)
all_doc_stems: set[str] = set(filename_counts.keys())
# Track linked stems for orphan detection
all_doc_stems: set[str] = set(stem_to_paths.keys())
linked_stems: set[str] = set()
# Scan all markdown files for wiki-links (excluding changelog.d/)
for md_file in sorted(DOCS_DIR.rglob("*.md")):
if "changelog.d" in md_file.parts:
continue
@ -123,35 +115,41 @@ def main() -> int:
for target, line_num, has_spaces in links:
if has_spaces:
# Links with spaces in target or around pipe are not allowed
spaced_links.append((rel_path, line_num, target))
continue
# Handle anchor links: [[#Heading]] or [[file#Heading]]
# Strip the #fragment for validation; pure anchors (#Heading) skip file check
# Strip anchor fragment for file validation
file_target = target
if "#" in target:
file_target = target.split("#", 1)[0]
if not file_target:
# Pure in-page anchor like [[#Break-glass shutoff]] — always valid
# Pure in-page anchor like [[#Heading]] — always valid
continue
if "/" in file_target:
# Path-based links are not allowed - use simple filenames only
path_links.append((rel_path, line_num, target))
elif file_target in ambiguous_filenames:
# Link uses an ambiguous filename - needs to be renamed
ambiguous_links.append((rel_path, line_num, target, filename_counts[file_target]))
elif file_target not in valid_targets:
# Path-based link — resolve against path_targets
if file_target not in path_targets:
broken_links.append((rel_path, line_num, target))
else:
# Extract the stem for orphan tracking
linked_stem = file_target.rsplit("/", 1)[-1]
if linked_stem != source_stem:
linked_stems.add(linked_stem)
else:
# Bare stem link — check for existence and ambiguity
paths = stem_to_paths.get(file_target)
if paths is None:
broken_links.append((rel_path, line_num, target))
elif len(paths) > 1:
# Ambiguous: multiple files share this stem
ambiguous_links.append((rel_path, line_num, target, paths))
elif file_target != source_stem:
# Valid link to a different doc — record it for orphan detection
linked_stems.add(file_target)
# Print results
console.print("[bold]Wiki-Link Validation[/bold]")
console.print()
console.print(f"Found {len(valid_targets)} valid link targets in documentation.")
console.print(f"Found {len(path_targets)} valid link targets in documentation.")
console.print()
has_errors = False
@ -173,28 +171,11 @@ def main() -> int:
console.print(table)
console.print()
if path_links:
has_errors = True
console.print("[bold red]Path-Based Wiki-Links Found[/bold red]")
console.print("Wiki-links must use simple filenames only (no '/' paths).")
console.print("Rename files to be unique, then use [[filename]] format.")
console.print()
table = Table(show_header=True, header_style="bold")
table.add_column("File")
table.add_column("Line", justify="right")
table.add_column("Target")
for file_path, line_num, target in path_links:
table.add_row(file_path, str(line_num), escape(f"[[{target}]]"))
console.print(table)
console.print()
if ambiguous_links:
has_errors = True
console.print("[bold red]Ambiguous Wiki-Links Found[/bold red]")
console.print("These links use filenames that exist in multiple locations.")
console.print("Rename files to be unique across all documentation.")
console.print("These bare-name links match multiple files.")
console.print("Use a path-based link to disambiguate: [[path/to/file]]")
console.print()
table = Table(show_header=True, header_style="bold")
table.add_column("File")
@ -221,7 +202,7 @@ def main() -> int:
console.print(table)
console.print()
console.print("Each wiki-link target must match a filename or path in docs/.")
console.print("Each wiki-link target must match a filename stem or path in docs/.")
console.print()
# Orphan detection: docs not linked from any other doc
@ -237,7 +218,7 @@ def main() -> int:
table.add_column("Stem")
for stem in orphan_stems:
paths = filename_counts[stem]
paths = stem_to_paths[stem]
for path in paths:
table.add_row(f"{path}.md", stem)
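
Reviewer note: the hunks above interleave removed and added lines, so the new resolution rule is hard to read off the diff. The sketch below restates it as a standalone function, assuming the two lookup structures introduced in this change (`path_targets` and `stem_to_paths`). The `LinkIndex` class, the `classify` function, and the string result labels are hypothetical names used only for illustration; they do not appear in the script, and orphan tracking is left out.

```python
from dataclasses import dataclass, field


@dataclass
class LinkIndex:
    """Hypothetical container for the two lookup structures in the diff."""

    # Relative paths without extension, e.g. "reference/services/alloy"
    path_targets: set[str] = field(default_factory=set)
    # Filename stem -> every relative path that ends in that stem
    stem_to_paths: dict[str, list[str]] = field(default_factory=dict)

    def add(self, rel_path: str) -> None:
        """Register one document by its extension-less relative path."""
        stem = rel_path.rsplit("/", 1)[-1]
        self.path_targets.add(rel_path)
        self.stem_to_paths.setdefault(stem, []).append(rel_path)


def classify(target: str, index: LinkIndex) -> str:
    """Return "ok", "broken", or "ambiguous" for one wiki-link target."""
    # Strip any #fragment; only the file part is validated.
    file_target = target.split("#", 1)[0]
    if not file_target:
        # Pure in-page anchor such as [[#Heading]] is always valid.
        return "ok"
    if "/" in file_target:
        # Path-based link: must match a known relative path exactly.
        return "ok" if file_target in index.path_targets else "broken"
    # Bare stem link: must exist and must map to exactly one file.
    paths = index.stem_to_paths.get(file_target)
    if paths is None:
        return "broken"
    if len(paths) > 1:
        return "ambiguous"
    return "ok"
```

Under this rule a bare `[[alloy]]` is rejected only if two files share the stem `alloy`, and the author resolves it by writing the full path (for example `[[reference/services/alloy]]`) instead of renaming a file.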

@ -148,14 +148,6 @@ stages = ["commit-msg"]
[[repos]]
repo = "local"
[[repos.hooks]]
id = "docs-check-filenames"
name = "docs-check-filenames"
entry = "mise run docs-check-filenames"
language = "system"
files = '^docs/.*\.md$'
pass_filenames = false
[[repos.hooks]]
id = "docs-check-links"
name = "docs-check-links"
@ -164,14 +156,6 @@ language = "system"
files = '^docs/.*\.md$'
pass_filenames = false
[[repos.hooks]]
id = "docs-check-index"
name = "docs-check-index"
entry = "mise run docs-check-index"
language = "system"
files = '^docs/.*\.md$'
pass_filenames = false
[[repos.hooks]]
id = "docs-check-frontmatter"
name = "docs-check-frontmatter"