C0: black-hole /mirrors/* at Fly edge + name-and-shame scrapers
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 35s

A $29.60 Fly bill traced to ~1.25 TB/30d egress on forge.eblu.me (99.95% of
all proxy egress), ~71% of it AI scrapers (Meta meta-externalagent, OpenAI
GPTBot, Amazonbot, Bytespider) crawling the public mirror repos' infinite
git-history URL space and causing Forgejo timeouts. robots.txt already
disallowed /mirrors/, but those agents ignore it, so enforce the rule at the
edge: return 403 from a ^~ prefix location (so it takes precedence over the
regex asset locations), served as a roll-of-dishonour page with an
X-Naughty-Scrapers header. Mirrors stay reachable on the tailnet via
forge.ops.eblu.me. Tier 2 (UA denylist + Anubis) and why Cloudflare Tunnel
was rejected are documented in docs/explanation/ai-scraper-mitigation.md.
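
The ^~ detail matters because of how nginx picks a location block. It first remembers the longest matching prefix location, and then the first matching regex location overrides it, unless that longest prefix carries ^~. Without ^~, a request for an image under /mirrors/ would be handled by a static-asset regex location instead of the 403. The sketch below is a minimal Python model of nginx's documented selection order. It assumes the edge proxy is nginx, and the `Location` class, the patterns, and the sample URI are hypothetical, not the deployed config:

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Location:
    modifier: str  # nginx modifier: "" (prefix), "=", "^~", "~" or "~*"
    pattern: str   # prefix string or regex, depending on the modifier
    name: str      # hypothetical label for what the block does


def select_location(uri: str, locations: list[Location]) -> Location | None:
    """Sketch of nginx's documented location-selection order."""
    # 1. An exact match ("=") ends the search immediately.
    for loc in locations:
        if loc.modifier == "=" and uri == loc.pattern:
            return loc

    # 2. Remember the longest matching prefix location (plain or "^~").
    best = None
    for loc in locations:
        if loc.modifier in ("", "^~") and uri.startswith(loc.pattern):
            if best is None or len(loc.pattern) > len(best.pattern):
                best = loc

    # 3. If that longest prefix carries "^~", regexes are never checked.
    if best is not None and best.modifier == "^~":
        return best

    # 4. Otherwise the first matching regex, in config order, wins.
    for loc in locations:
        flags = re.IGNORECASE if loc.modifier == "~*" else 0
        if loc.modifier in ("~", "~*") and re.search(loc.pattern, uri, flags):
            return loc

    # 5. No regex matched: fall back to the remembered prefix.
    return best


# Hypothetical config: catch-all proxy, mirror black-hole, asset regex.
proxy = Location("", "/", "proxy to Forgejo")
asset = Location("~*", r"\.(css|js|png|svg)$", "static assets")
plain = [proxy, Location("", "/mirrors/", "403 black-hole"), asset]
caret = [proxy, Location("^~", "/mirrors/", "403 black-hole"), asset]

uri = "/mirrors/some-repo/raw/branch/main/logo.png"  # hypothetical request
print(select_location(uri, plain).name)  # static assets
print(select_location(uri, caret).name)  # 403 black-hole
```

The two lists differ only in the ^~ modifier on the /mirrors/ location. That single modifier decides whether asset-looking mirror URLs still leak through to the regex block.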

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Erich Blume 2026-06-01 20:52:20 -07:00
commit a36a18aaa6
7 changed files with 302 additions and 0 deletions


@@ -376,6 +376,13 @@ Mitigations for dynamic services:
- fail2ban on indri (see below) can block IPs showing abuse patterns
- The break-glass shutoff remains the last resort
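
fail2ban's core rule is a per-IP sliding window. Its `maxretry`, `findtime`, and `bantime` jail options set how many matching log lines within how many seconds trigger a ban, and how long the ban lasts. The following is a minimal Python sketch of that rule only. It is not fail2ban's code: the class and timestamps are hypothetical, real jails first match log lines with regex filters, and the exact window-boundary handling here is an assumption.

```python
from collections import defaultdict, deque


class SlidingWindowBanner:
    """fail2ban-style sketch: ban an IP after max_retry hits within find_time."""

    def __init__(self, max_retry: int, find_time: float, ban_time: float):
        self.max_retry = max_retry
        self.find_time = find_time
        self.ban_time = ban_time
        self.hits = defaultdict(deque)  # ip -> timestamps of abusive hits
        self.banned_until = {}          # ip -> time the ban expires

    def is_banned(self, ip: str, now: float) -> bool:
        return self.banned_until.get(ip, 0.0) > now

    def record(self, ip: str, now: float) -> bool:
        """Record one abusive hit; return True if this hit triggers a ban."""
        if self.is_banned(ip, now):
            return False
        window = self.hits[ip]
        window.append(now)
        # Drop hits that have aged out of the find_time window.
        while window and window[0] <= now - self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self.banned_until[ip] = now + self.ban_time
            window.clear()
            return True
        return False


banner = SlidingWindowBanner(max_retry=3, find_time=60, ban_time=600)
ip = "203.0.113.7"  # documentation-range address, hypothetical
results = [banner.record(ip, t) for t in (0, 10, 20)]
print(results)                    # [False, False, True]
print(banner.is_banned(ip, 100))  # True
print(banner.is_banned(ip, 700))  # False
```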

The most acute version of this in practice has been **AI scrapers**, which
ignore `robots.txt` and crawl dynamic services (notably [[forgejo|Forgejo]]'s
infinite git-history URL space) hard enough to cause both a surprise egress
bill and an effective L7 DoS. See [[ai-scraper-mitigation]] for the incident,
the tiered defense (mirror black-hole, user-agent denylist, Anubis
proof-of-work), and why a Cloudflare Tunnel is *not* the chosen answer here.

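
The usual rationale for proof-of-work is that it shifts cost onto the client. A visitor must spend CPU before being served, which is cheap for a single page view but adds up across a scraper's volume of requests, while the server verifies the result with a single hash. The sketch below is a minimal hashcash-style example in Python, assuming a "SHA-256 hex digest must start with N zeros" puzzle. It is a generic illustration, not Anubis's actual challenge format, and `solve`/`verify` are hypothetical names. With difficulty N, the client needs about 16**N hashes on average.

```python
import hashlib
import secrets


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce whose SHA-256 hex digest starts
    with `difficulty` zeros (about 16**difficulty attempts on average)."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash checks the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


challenge = secrets.token_hex(16)  # fresh random challenge per visitor
nonce = solve(challenge, difficulty=4)
print(verify(challenge, nonce, difficulty=4))  # True
```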
If a publicly exposed dynamic service attracts targeted attacks or the
home network bandwidth is impacted, consider migrating to Cloudflare
Tunnel for enterprise-grade DDoS protection (requires DNS migration;