Expose Forgejo publicly at forge.eblu.me #278

Merged
eblume merged 14 commits from feature/forge-public into main 2026-03-03 08:40:42 -08:00

Summary

Expose Forgejo publicly at forge.eblu.me via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.

  • Forgejo hardening: Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
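  • Tailscale Ingress: Service + Ingress in tailscale-operator creates the forge.tail8d86e.ts.net endpoint (initially an ExternalName Service; replaced within this PR by a headless Service with manual Endpoints after the operator rejected the ExternalName backend)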
  • Fly.io proxy: nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
  • Authentik: OAuth callback updated to forge.eblu.me
  • DNS/TLS: CNAME record in Pulumi, cert in fly-setup
  • Rename: ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)

Deployment Order

  1. mise run provision-indri -- --tags forgejo (config changes)
  2. Verify forge.ops.eblu.me still works
  3. argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator
  4. Verify curl https://forge.tail8d86e.ts.net
  5. cd fly && fly deploy
  6. Verify pre-DNS: curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/
  7. fly certs add forge.eblu.me -a blumeops-proxy
  8. argocd app set authentik --revision feature/forge-public && argocd app sync authentik
  9. mise run dns-preview && mise run dns-up
  10. Full verification (see below)
  11. Rehearse mise run fly-shutoff
  12. After merge: reset ArgoCD revisions to main, re-sync

Verification Checklist

  • forge.eblu.me loads, shows public repos
  • forge.ops.eblu.me still works from tailnet
  • SSH clone via forge.ops.eblu.me:2222 works
  • HTTPS clone via forge.eblu.me works
  • UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
  • /swagger returns 403
  • Rapid login attempts trigger 429 rate limit
  • fail2ban bans after 5 failed logins in 10 minutes
  • ArgoCD can still sync (SSH unaffected)
  • mise run fly-shutoff stops all public traffic
  • mise run services-check passes
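
A few of the HTTP items in the checklist above could be scripted. The following is a minimal sketch, not part of the PR: the exact paths, the expected 200 on the front page, and the helper names `http_status` and `run_checks` are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# (url, expected status) pairs taken from the checklist; the 200 for the
# front page and the exact /swagger path are assumptions.
CHECKS = [
    ("https://forge.eblu.me/", 200),
    ("https://forge.eblu.me/swagger", 403),
]


def http_status(url, timeout=10.0):
    """Return the HTTP status for a GET on url, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is still what we want to compare.
        return err.code


def run_checks(checks=CHECKS, fetch=http_status):
    """Return a list of (url, expected, actual) for every check that failed."""
    failures = []
    for url, expected in checks:
        actual = fetch(url)
        if actual != expected:
            failures.append((url, expected, actual))
    return failures
```

Calling `run_checks()` with no arguments performs real requests; passing a different `fetch` callable allows a dry run without network access.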
The egress proxy (tailscale-forge device) has been unused since Caddy
took over forge routing. No k8s resources reference it as a backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update forgejo.md with public access details and security controls.
Add forge.eblu.me to public services table in routing.md.
Update fail2ban guidance in expose-service-publicly.md to reflect
Fly.io container approach. Add changelog fragment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Set forgejo_domain to forge.eblu.me (public URL in clone URLs)
- Set forgejo_ssh_domain to forge.ops.eblu.me (SSH stays tailnet-only)
- Add REVERSE_PROXY_LIMIT=2, REVERSE_PROXY_TRUSTED_PROXIES=* for
  correct client IP logging through Fly.io + Tailscale proxy chain
- Enable ALLOW_ONLY_EXTERNAL_REGISTRATION to block local signups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create forge.tail8d86e.ts.net endpoint that proxies to Forgejo on
indri:3001. Uses ExternalName Service since Forgejo runs natively
on indri (not in k8s). Tagged with flyio-target for Fly.io proxy
access via existing ACLs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nginx configuration:
- forge.eblu.me server block with WebSocket support, 512m body limit
- Rate limit login/signup/forgot-password at 3r/s per real client IP
  (keyed on Fly-Client-IP header, not Fly's internal remote_addr)
- Static asset caching (7d), no blanket caching for dynamic content
- Security headers (HSTS, X-Frame-Options, X-Content-Type-Options)
- Block /swagger (API docs only available via tailnet)
- X-Real-IP set to real client IP for Forgejo audit logs
- geo-based deny list for fail2ban integration
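
To illustrate what "3r/s per real client IP" means in practice, here is a simplified Python model of a leaky-bucket limiter in the style of nginx's limit_req. It is a sketch under assumptions: the burst value of 0 is not stated in this PR, and the class and function names are hypothetical. It only models the accounting and is not the proxy's actual configuration.

```python
from dataclasses import dataclass, field


def rate_limit_key(headers, remote_addr):
    """Pick the limiter key: the Fly-Client-IP header when present,
    otherwise the socket address (which on Fly.io is Fly's own proxy)."""
    return headers.get("Fly-Client-IP") or remote_addr


@dataclass
class LeakyBucketLimiter:
    rate_per_sec: float = 3.0  # from the PR: 3r/s
    burst: int = 0             # assumption: no burst allowance
    _state: dict = field(default_factory=dict)  # key -> (excess, last_ms)

    def allow(self, key, now_ms):
        """Return True if the request is allowed, False if it would be limited."""
        if key not in self._state:
            self._state[key] = (0.0, now_ms)
            return True
        excess, last_ms = self._state[key]
        elapsed_ms = max(0, now_ms - last_ms)
        # The bucket drains at rate_per_sec; each request adds one unit.
        excess = max(0.0, excess - self.rate_per_sec * elapsed_ms / 1000.0 + 1.0)
        if excess > self.burst:
            return False  # rejected requests do not update the stored state
        self._state[key] = (excess, now_ms)
        return True
```

With these assumptions a single client gets roughly one accepted request every 333 ms on the limited paths, and a second client is unaffected because the key is the per-client IP rather than the shared proxy address.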

fail2ban configuration:
- Custom filter matching 401/403 on login paths in nginx JSON log
- Ban after 5 failures in 10 minutes, ban duration 1 hour
- Custom nginx-deny action: writes IPs to deny file and reloads nginx
  (iptables won't work in Fly.io — remote_addr is Fly's proxy IP)
- Ban lists are ephemeral and reset on each deploy (nginx rate limiting persists across deploys)
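
The ban policy described above (5 failures in 10 minutes, 1 hour ban, IPs written to a deny file) can be modelled in a few lines of Python. This is an illustrative sketch, not fail2ban's implementation: the `BanTracker` class is hypothetical, and the `<ip> 1;` line format is an assumed shape for an nginx geo include file, since the PR does not show the real file.

```python
from collections import defaultdict, deque

MAX_RETRY = 5        # failures before a ban (from the PR)
FIND_TIME_S = 600    # 10-minute counting window (from the PR)
BAN_TIME_S = 3600    # 1-hour ban (from the PR)


class BanTracker:
    def __init__(self, max_retry=MAX_RETRY, find_time=FIND_TIME_S, ban_time=BAN_TIME_S):
        self.max_retry = max_retry
        self.find_time = find_time
        self.ban_time = ban_time
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self._banned_until = {}              # ip -> unban timestamp

    def is_banned(self, ip, now):
        until = self._banned_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self._banned_until[ip]  # ban expired
            return False
        return True

    def record_failure(self, ip, now):
        """Record a failed login; return True if it triggers a new ban."""
        if self.is_banned(ip, now):
            return False
        window = self._failures[ip]
        window.append(now)
        # Drop failures that fell out of the counting window.
        while window and now - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self._banned_until[ip] = now + self.ban_time
            window.clear()
            return True
        return False

    def render_deny_file(self, now):
        """Render active bans as '<ip> 1;' lines (assumed geo include format)."""
        active = sorted(ip for ip in list(self._banned_until) if self.is_banned(ip, now))
        return "".join(f"{ip} 1;\n" for ip in active)
```

In the approach the commit describes, the equivalent of `render_deny_file` would be followed by an nginx reload so the updated list takes effect; because the state lives only in the container, it starts empty after every deploy.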

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update redirect_uris and meta_launch_url to use the new public domain.
OAuth flow will dead-end naturally since Authentik is not publicly
accessible — SSO only works from the tailnet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add CNAME record pointing forge.eblu.me to blumeops-proxy.fly.dev
in Pulumi Gandi config. Add forge.eblu.me to fly-setup cert list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update all HTTPS references to use the new public domain. This
touches workflows, ArgoCD manifests, Ansible, mise-tasks, NixOS
config, and documentation (~29 files).

Deliberately kept as forge.ops.eblu.me:
- SSH repoURLs in argocd/apps/ (SSH stays tailnet-only)
- containers/*/Dockerfile and *.nix (internal CI efficiency)
- Caddy services table in routing.md
- Internal URL references in forgejo.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Tailscale ingress operator requires backends with a ClusterIP.
ExternalName services don't have one, causing "invalid ClusterIP"
errors. Replace with a headless Service + manual Endpoints pointing
to indri's Tailscale IP (100.98.163.89).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kustomize didn't pick up the Endpoints from the multi-document YAML
in svc-forge-external.yaml. Split into a separate endpoints-forge.yaml
and add to kustomization.yaml resources.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ArgoCD's resource.exclusions in argocd-cm skips all Endpoints objects
(they're normally auto-managed by the control plane). The manual
forge-external Endpoints must be applied directly with kubectl.

Removed endpoints-forge.yaml from kustomization resources and added
comments in both files explaining the situation and the apply command.
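
For readers unfamiliar with the selector-less Service plus manual Endpoints pattern that the last three commits converge on, the following Python sketch builds the two objects as plain dictionaries. The name forge-external, the IP 100.98.163.89 and port 3001 come from this PR; the port name `http`, the Service port number, and the function name are assumptions, and the namespace is deliberately left out because the PR does not state it.

```python
import json


def build_forge_manifests(name="forge-external", ip="100.98.163.89", port=3001, headless=True):
    """Return (service, endpoints) dicts for a backend that lives outside the cluster."""
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # No selector: Kubernetes will not manage Endpoints for this Service.
        "spec": {"ports": [{"name": "http", "port": port, "targetPort": port}]},
    }
    if headless:
        # The commit message calls the Service headless; pass headless=False
        # to let the cluster allocate a ClusterIP instead.
        service["spec"]["clusterIP"] = "None"
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # Must carry the same name as the Service to be associated with it.
        "metadata": {"name": name},
        "subsets": [
            {"addresses": [{"ip": ip}], "ports": [{"name": "http", "port": port}]}
        ],
    }
    return service, endpoints


if __name__ == "__main__":
    for manifest in build_forge_manifests():
        print(json.dumps(manifest, indent=2))
```

Since ArgoCD skips Endpoints objects in this setup, the Endpoints document is the part that would have to be applied by hand; kubectl accepts JSON as well as YAML for that.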

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Alpine's fail2ban ships with sshd jail enabled by default. Since there's
no SSH server in the Fly.io container, fail2ban exits with an error
looking for sshd logs — crashing the container via set -e.

Disable the sshd jail explicitly and make fail2ban startup non-fatal
since nginx rate limiting is the primary defense.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use [DEFAULT] enabled = false to disable all inherited jails globally.
The previous fix only disabled sshd, but sshd-ddos (and potentially
others) also fail looking for missing log files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Alpine ships alpine-ssh.conf with sshd and sshd-ddos jails enabled.
These fail on startup because there's no SSH server or /var/log/messages
in the container. Remove the file after install instead of trying to
override via [DEFAULT] (per-jail enabled=true beats DEFAULT).
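
The precedence rule behind this fix can be reproduced with Python's standard configparser, where [DEFAULT] only supplies fallback values and any key set inside a section wins. The jail contents below are an assumed stand-in for alpine-ssh.conf, not a copy of it, and `effective_enabled` is a hypothetical helper.

```python
import configparser

# Assumed shape of the distro-shipped jails (stand-in for alpine-ssh.conf).
DISTRO_JAILS = """
[sshd]
enabled = true

[sshd-ddos]
enabled = true
"""

# The attempted global override from the previous commit.
LOCAL_OVERRIDE = """
[DEFAULT]
enabled = false
"""


def effective_enabled(*sources):
    """Merge INI sources in order and report each jail's effective 'enabled'."""
    parser = configparser.ConfigParser()
    for text in sources:
        parser.read_string(text)
    # sections() excludes DEFAULT; DEFAULT values are used only as fallbacks.
    return {name: parser.getboolean(name, "enabled") for name in parser.sections()}
```

Under these assumptions `effective_enabled(DISTRO_JAILS, LOCAL_OVERRIDE)` still reports both SSH jails as enabled, which matches the behaviour the commit describes and explains why deleting the file is the simpler fix.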

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eblume merged commit a87c997ee1 into main 2026-03-03 08:40:42 -08:00
Reference
eblume/blumeops!278