blumeops/docs/changelog.d/+forge-mirrors-blackhole.infra.md
Erich Blume a36a18aaa6
C0: black-hole /mirrors/* at Fly edge + name-and-shame scrapers
A $29.60 Fly bill traced to ~1.25 TB/30d egress on forge.eblu.me (99.95% of
all proxy egress), ~71% of it AI scrapers (Meta meta-externalagent, OpenAI
GPTBot, Amazonbot, Bytespider) crawling the public mirror repos' infinite
git-history URL space and timing out Forgejo. robots.txt already disallowed
/mirrors/ but those agents ignore it, so enforce at the edge: return 403 (^~
to beat the regex asset locations), served as a roll-of-dishonour page with an
X-Naughty-Scrapers header. Mirrors stay reachable on the tailnet via
forge.ops.eblu.me. Tier 2 (UA denylist + Anubis) and the Cloudflare rejection
are documented in docs/explanation/ai-scraper-mitigation.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 20:52:20 -07:00


Black-hole the /mirrors/* repositories at the Fly proxy edge (return 403; mirrors stay reachable on the tailnet via forge.ops.eblu.me). A surprise $29.60 Fly bill traced to ~1.24 TB/30d of egress on forge.eblu.me, 99.95% of all proxy egress. About 71% of that was AI scrapers (Meta meta-externalagent, OpenAI GPTBot, Amazonbot) crawling the near-infinite git-history URL space of the public mirror repos and timing out Forgejo in the process. Mirrors exist for supply-chain control and are consumed over the tailnet, so their public web UI had no legitimate audience. robots.txt already disallowed /mirrors/, but the offending agents ignore it. Tier-2 mitigations (user-agent denylist, Anubis proof-of-work gateway) are documented in docs/explanation/ai-scraper-mitigation.md.
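
For readers unfamiliar with the edge rule, here is a minimal sketch of what such a black-hole location could look like, assuming the Fly proxy is nginx (the commit message's `^~` modifier suggests this). The listen port, the response body, and the header value are hypothetical placeholders, not the deployed configuration.

```nginx
# Hypothetical sketch only: the port, page body, and header value are assumptions.
server {
    listen 8080;                 # assumed internal port
    server_name forge.eblu.me;

    # "^~" is a prefix match that, once selected, stops nginx from
    # evaluating regex locations (for example a static-asset rule such as
    # `location ~* \.(css|js|png)$`), so every /mirrors/ URL lands here.
    location ^~ /mirrors/ {
        default_type text/html;

        # "always" is needed because add_header skips 4xx responses by default.
        add_header X-Naughty-Scrapers "meta-externalagent, GPTBot, Amazonbot, Bytespider" always;

        # Answer at the edge; the request never reaches Forgejo.
        return 403 "<h1>403</h1><p>Public mirror browsing is disabled.</p>";
    }
}
```

Returning the body inline keeps the header and the status in the same location block; serving a separate HTML file through `error_page` would also work, but the header would then have to be set in the location that serves that file.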