2026-02-07 22:09:52 -08:00
---
title: Expose a Service Publicly
Expose Forgejo publicly at forge.eblu.me (#278)
## Summary
Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.
- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)
## Deployment Order
1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync
## Verification Checklist
- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/278
2026-03-03 08:40:41 -08:00
modified: 2026-03-03
last-reviewed: 2026-03-03
2026-02-07 22:09:52 -08:00
tags:
- how-to
2026-02-08 00:38:27 -08:00
- fly-io
- tailscale
2026-02-07 22:09:52 -08:00
- networking
2026-02-08 00:38:27 -08:00
aliases: []
id: expose-service-publicly
2026-02-07 22:09:52 -08:00
---
2026-02-08 00:38:27 -08:00
# Expose a Service Publicly via Fly.io + Tailscale
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
This guide describes how to expose a BlumeOps service to the public internet
using a reverse proxy container on [Fly.io ](https://fly.io ) that tunnels back
to [[indri]] over [[tailscale]]. The approach keeps the home IP hidden,
requires no changes to existing infrastructure (`*.ops.eblu.me` , [[caddy]],
DNS), and is reusable for multiple services.
2026-02-07 22:09:52 -08:00
## Architecture
```
2026-02-08 00:38:27 -08:00
Internet → <service>.eblu.me
2026-02-07 22:09:52 -08:00
│
2026-02-08 00:38:27 -08:00
Fly.io edge (Anycast, TLS via Let's Encrypt)
2026-02-07 22:09:52 -08:00
│
2026-02-08 00:38:27 -08:00
Fly.io VM (nginx reverse proxy + Tailscale)
│ (WireGuard tunnel)
tailnet (tail8d86e.ts.net)
2026-02-07 22:09:52 -08:00
│
2026-02-08 00:38:27 -08:00
<service>.tail8d86e.ts.net (Tailscale ingress)
2026-02-07 22:09:52 -08:00
│
2026-02-08 00:38:27 -08:00
k8s Service → pod
2026-02-07 22:09:52 -08:00
```
2026-02-08 00:38:27 -08:00
(The approach works similarly for non-k8s services via `tailscale serve`
service definitions, eg. [[forgejo]] and [[zot]])
A single Fly.io container serves as the public-facing proxy for all exposed
services. Each service gets a `server` block in the nginx config and a DNS
CNAME. The container joins the tailnet via an ephemeral auth key and reaches
backend services through Tailscale ingress endpoints.
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
Existing `*.ops.eblu.me` services remain private behind Tailscale — this
approach does not touch [[caddy]], [[gandi]] DNS-01, or any other existing
infrastructure. They can continue to operate in parallel for private access.
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
## Key decisions
2026-02-07 22:09:52 -08:00
| Decision | Choice | Rationale |
|----------|--------|-----------|
2026-02-08 02:36:19 -08:00
| Proxy host | Fly.io (free tier) | Managed container, no server to maintain via Ansible. Shared IPv4 + IPv6 are free for HTTP/HTTPS; dedicated IPv4 is $2/mo if a service needs non-HTTP(S) protocols |
2026-02-08 00:38:27 -08:00
| Tunnel | Tailscale (existing) | Already in use, WireGuard encryption, ACL control |
| DNS | CNAME at [[gandi]] | No DNS migration needed, no Cloudflare dependency |
| TLS (public) | Fly.io auto-provisions Let's Encrypt | No cert management, `$0.10/mo` per hostname |
| TLS (origin) | Tailscale handles encryption | WireGuard tunnel encrypts all traffic |
| CDN/cache | nginx `proxy_cache` in container | Per-service: aggressive for static sites, selective or disabled for dynamic services |
| DDoS | Fly.io Anycast + nginx rate limiting | Not enterprise-grade; see [[#Break -glass shutoff]] |
| IaC | `fly/` directory in repo, Pulumi for DNS + TS key | No well-maintained Fly.io Pulumi provider; `fly.toml` is the app's IaC |
## TLS in this architecture
There are three independent TLS segments — none involve Caddy:
1. **Browser → Fly.io edge ** : Fly.io auto-provisions a Let's Encrypt
certificate for each custom domain (e.g., `docs.eblu.me` ). Validated via
TLS-ALPN challenge — no DNS API needed.
2. **nginx → Tailscale ingress ** : nginx proxies to
`https://<service>.tail8d86e.ts.net` . The Tailscale ingress serves a
Tailscale-issued cert. nginx uses `proxy_ssl_verify off` since the
underlying tunnel is already encrypted.
3. **WireGuard tunnel ** : All Tailscale traffic is encrypted at the network
layer regardless of application-level TLS.
Caddy continues to serve `*.ops.eblu.me` with its existing Gandi DNS-01
certificates. The two TLS domains are completely independent.
## External references
- [Tailscale on Fly.io ](https://tailscale.com/kb/1132/flydotio ) — official guide for running Tailscale in a Fly.io container
- [Fly.io Custom Domains ](https://fly.io/docs/networking/custom-domain/ ) — how Fly handles TLS for custom domains
- [Home Assistant + Fly.io + Tailscale ](https://community.home-assistant.io/t/expose-ha-to-the-internet-via-a-cloud-reverse-proxy-fly-io-and-a-vpn-tailscale-for-free-for-now-without-opening-ports/352118 ) — community guide describing this exact pattern
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
---
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
## One-time setup (first service)
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
These steps establish the Fly.io proxy infrastructure. They only need to be done once.
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
### Step 1: Fly.io account and app
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
1. Create or recover a Fly.io account at https://fly.io (requires credit card for free tier)
2. Install `flyctl` : `brew install flyctl`
3. Authenticate: `fly auth login`
4. Create the app: `fly apps create blumeops-proxy`
5. Store the Fly.io deploy token in 1Password (blumeops vault):
- Generate: `fly tokens create deploy -a blumeops-proxy`
- Store as `fly-deploy-token` field
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
### Step 2: Repository structure
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
Create the `fly/` directory at the repository root. This is separate from `containers/` because the image is built and deployed directly to Fly.io by `fly deploy` — it never goes through `registry.ops.eblu.me` .
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
```
fly/
├── fly.toml # Fly.io app configuration
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
├── Dockerfile # nginx + tailscale + alloy
2026-02-08 00:38:27 -08:00
├── nginx.conf # Reverse proxy + cache config
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
├── start.sh # Entrypoint: start tailscale, nginx, alloy
├── alloy.river # Observability: logs → Loki, metrics → Prometheus
└── error.html # Friendly 503 page for upstream failures
2026-02-08 00:38:27 -08:00
```
2026-02-07 22:09:52 -08:00
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
See the actual files in `fly/` for current configuration. Key design points:
2026-02-07 22:09:52 -08:00
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
- **`fly.toml` ** — uses bluegreen deploys so the old machine serves traffic until the new one passes health checks. `auto_stop_machines = "off"` keeps the proxy always-on.
- **`Dockerfile` ** — multi-stage build pulling nginx, Tailscale, and [[alloy]] binaries. Alloy runs as a sidecar inside the container for observability (see below).
- **`start.sh` ** — starts `tailscaled` first (MagicDNS must be available before nginx resolves upstreams), then nginx in the background, then Alloy, and blocks on the nginx process.
- **`nginx.conf` ** — uses a `resolver 100.100.100.100` directive so upstream DNS resolution is deferred to request time (not config load time). Each service gets a `server` block with a `set $upstream` variable pattern. Includes a JSON access log format that Alloy tails for log collection and metric extraction. A catch-all server block serves `/healthz` and rejects unknown hosts.
- **`error.html` ** — shown via `proxy_intercept_errors` when upstreams are unreachable (indri offline, tunnel down, etc.). Cached responses still take priority via `proxy_cache_use_stale` .
2026-02-07 22:09:52 -08:00
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
#### Observability sidecar
2026-02-07 22:09:52 -08:00
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
The Fly.io container includes [[alloy]] baked in (`fly/alloy.river` ). Alloy tails the nginx JSON access log and:
2026-02-07 22:09:52 -08:00
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
- Forwards log lines to [[loki]] via the Tailscale Ingress endpoint
- Derives Prometheus metrics (`flyio_nginx_http_requests_total` , `flyio_nginx_http_request_duration_seconds` , `flyio_nginx_cache_requests_total` , etc.) and remote-writes them to [[prometheus]]
2026-02-07 22:09:52 -08:00
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
Both Loki and Prometheus are reached directly via their `*.tail8d86e.ts.net` Tailscale Ingress endpoints (not via [[caddy]]), since the proxy's ACLs only allow `tag:flyio-target` .
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
### Step 3: Tailscale auth key and ACLs (Pulumi)
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
Extend the existing `pulumi/tailscale/` project.
**Add to `pulumi/tailscale/__main__.py` :**
```python
# Auth key for Fly.io proxy container
2026-02-08 02:36:19 -08:00
flyio_key = tailscale.TailnetKey(
2026-02-08 00:38:27 -08:00
"flyio-proxy-key",
reusable=True,
ephemeral=True,
2026-02-08 02:36:19 -08:00
preauthorized=True, # Skip device approval on the tailnet
2026-02-08 00:38:27 -08:00
tags=["tag:flyio-proxy"],
expiry=7776000, # 90 days
)
pulumi.export("flyio_authkey", flyio_key.key)
```
2026-02-08 02:36:19 -08:00
> **Note:** `preauthorized=True` is required if your tailnet has device
> approval enabled. Without it, each new container start (including
> health-check restarts) creates a node that needs manual approval,
> causing the container to hang before nginx starts.
2026-02-08 00:38:27 -08:00
**Add to `pulumi/tailscale/policy.hujson` :**
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
Tag owner (allows the k8s operator to assign this tag to Ingress proxy nodes):
2026-02-08 00:38:27 -08:00
```
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
"tag:flyio-target": ["autogroup:admin", "tag:blumeops", "tag:k8s-operator"],
2026-02-08 00:38:27 -08:00
```
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
Access grant (Fly.io proxy → explicitly tagged endpoints on HTTPS only):
2026-02-08 00:38:27 -08:00
```
{
"src": ["tag:flyio-proxy"],
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
"dst": ["tag:flyio-target"],
2026-02-08 00:38:27 -08:00
"ip": ["tcp:443"],
},
```
ACL test:
```
{
"src": "tag:flyio-proxy",
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
"accept": ["tag:flyio-target:443"],
"deny": ["tag:k8s:443", "tag:homelab:443", "tag:homelab:22", "tag:nas:445", "tag:registry:443"],
2026-02-08 00:38:27 -08:00
},
```
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
Each service's Tailscale Ingress must be annotated with `tag:flyio-target` to be reachable by the proxy — see [[#7 . Tag the Tailscale Ingress with tag:flyio-target]].
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
2026-02-08 00:38:27 -08:00
Deploy: `mise run tailnet-preview` then `mise run tailnet-up` .
After deploying, extract the auth key and set it as a Fly.io secret:
```bash
# Get the key from Pulumi state
cd pulumi/tailscale && pulumi stack output flyio_authkey --show-secrets
# Set it in Fly.io
fly secrets set TS_AUTHKEY="tskey-auth-..." -a blumeops-proxy
```
Store the auth key in 1Password as well for the `fly-setup` mise task.
### Step 4: Mise tasks
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
Three mise tasks manage the proxy lifecycle. See the actual scripts in `mise-tasks/` for current implementation:
2026-02-08 00:38:27 -08:00
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
- **`mise run fly-deploy` ** — runs `fly deploy` from the `fly/` directory
- **`mise run fly-setup` ** — one-time, idempotent setup: fetches the Tailscale auth key from Pulumi state, stages it as a Fly.io secret, allocates IPs, and adds TLS certs for all public domains (currently `docs.eblu.me` and `cv.eblu.me` )
- **`mise run fly-shutoff` ** — emergency shutoff: scales machines to zero, immediately stopping all public traffic
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
### Step 5: Forgejo CI workflow
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
A Forgejo Actions workflow (`.forgejo/workflows/deploy-fly.yaml` ) auto-deploys on pushes to `main` that touch `fly/**` . It installs `flyctl` , runs `fly deploy` , and verifies health. It can also be triggered manually via `workflow_dispatch` .
2026-02-08 00:38:27 -08:00
The `FLY_DEPLOY_TOKEN` Forgejo Actions secret must be set via the [[forgejo]] API or UI, following the pattern in the `forgejo_actions_secrets` Ansible role.
---
## Per-service setup
To expose an additional service (example: `wiki.eblu.me` ):
### 1. Add nginx server block
Edit `fly/nginx.conf` — add a new `server` block. The configuration
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
differs significantly between static and dynamic services. See the
existing `docs.eblu.me` and `cv.eblu.me` blocks in `fly/nginx.conf`
for the current pattern (uses `set $upstream` variable for deferred
DNS resolution, `proxy_intercept_errors` for error pages, etc.).
2026-02-08 00:38:27 -08:00
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
**Static site template** (simplified — adapt from existing blocks):
2026-02-08 00:38:27 -08:00
```nginx
# --- wiki.eblu.me (static) ---
server {
listen 8080;
server_name wiki.eblu.me;
limit_req zone=general burst=20 nodelay;
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
error_page 502 503 504 /error.html;
location = /error.html {
root /usr/share/nginx/html;
internal;
}
2026-02-08 00:38:27 -08:00
location / {
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
set $upstream_wiki https://wiki.tail8d86e.ts.net;
proxy_pass $upstream_wiki$request_uri;
2026-02-08 00:38:27 -08:00
proxy_ssl_verify off;
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
proxy_ssl_server_name on;
proxy_intercept_errors on;
2026-02-08 00:38:27 -08:00
proxy_cache services;
proxy_cache_valid 200 1d;
proxy_cache_valid 404 1m;
proxy_cache_use_stale error timeout updating;
proxy_cache_lock on;
proxy_cache_key $host$uri;
proxy_ignore_headers Cache-Control Set-Cookie;
add_header X-Cache-Status $upstream_cache_status;
2026-02-12 17:08:22 -08:00
add_header X-Clacks-Overhead "GNU Terry Pratchett" always;
2026-02-08 00:38:27 -08:00
}
}
```
Expose Forgejo publicly at forge.eblu.me (#278)
## Summary
Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.
- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)
## Deployment Order
1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync
## Verification Checklist
- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/278
2026-03-03 08:40:41 -08:00
**Dynamic service template** (e.g., Forgejo — see `fly/nginx.conf` for the live configuration):
2026-02-08 00:38:27 -08:00
```nginx
# --- forge.eblu.me (dynamic, authenticated) ---
server {
listen 8080;
server_name forge.eblu.me;
# Higher rate limit — git operations, CI webhooks, and API calls
# can legitimately burst. Forgejo also has its own rate limiting,
# so this is a safety net, not the primary control.
limit_req zone=general burst=50 nodelay;
# Git LFS and repo uploads can be large
client_max_body_size 512m;
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
error_page 502 503 504 /error.html;
location = /error.html {
root /usr/share/nginx/html;
internal;
}
2026-02-08 00:38:27 -08:00
location / {
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
set $upstream_forge https://forge.tail8d86e.ts.net;
proxy_pass $upstream_forge$request_uri;
2026-02-08 00:38:27 -08:00
proxy_ssl_verify off;
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
proxy_ssl_server_name on;
proxy_intercept_errors on;
2026-02-08 00:38:27 -08:00
# NO proxy_cache — dynamic content with sessions.
# Caching would serve stale pages and break authentication.
# Pass through headers needed for proper proxying
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support (Forgejo uses it for live updates)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
# Selectively cache static assets only
location ~* \.(css|js|png|jpg|svg|woff2?)$ {
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
set $upstream_forge_static https://forge.tail8d86e.ts.net;
proxy_pass $upstream_forge_static$request_uri;
2026-02-08 00:38:27 -08:00
proxy_ssl_verify off;
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
proxy_ssl_server_name on;
2026-02-08 00:38:27 -08:00
proxy_cache services;
proxy_cache_valid 200 7d;
proxy_cache_key $host$uri;
add_header X-Cache-Status $upstream_cache_status;
2026-02-12 17:08:22 -08:00
add_header X-Clacks-Overhead "GNU Terry Pratchett" always;
2026-02-08 00:38:27 -08:00
}
}
```
Key differences for dynamic services:
- **No blanket caching** — only static assets (CSS, JS, images) are cached
- **Respect `Set-Cookie` ** — do not ignore session headers
- **Include query strings** in non-cached requests (default behavior when
`proxy_cache_key` is not overridden)
- **Higher rate limits** — legitimate usage patterns are burstier
- **Proxy headers** — pass `X-Real-IP` , `X-Forwarded-For` , `X-Forwarded-Proto`
so the backend sees the real client IP (important for Forgejo's audit logs
and its own rate limiting)
- **WebSocket support** — many modern web apps use WebSockets
- **Larger body size** — git pushes and file uploads need more than the default 1MB
2026-02-08 02:36:19 -08:00
### 2. Add Fly.io certificate
2026-02-08 00:38:27 -08:00
2026-02-08 02:36:19 -08:00
```bash
fly certs add wiki.eblu.me -a blumeops-proxy
```
Or add it to `mise-tasks/fly-setup` so it's captured for future runs.
### 3. Deploy
```bash
mise run fly-deploy
```
Or push the `fly/nginx.conf` change to main — the Forgejo workflow deploys automatically.
### 4. Verify against fly.dev
Test the proxy before touching DNS. Use the `Host` header to simulate
the real domain:
```bash
# Health check
curl -sf https://blumeops-proxy.fly.dev/healthz
# Simulate real domain request
curl -I -H "Host: wiki.eblu.me" https://blumeops-proxy.fly.dev/
# Should return 200 with X-Cache-Status header
```
If this fails, debug without any public DNS impact.
### 5. Add DNS CNAME (Pulumi)
Only after verifying the proxy works. Add to `pulumi/gandi/__main__.py` :
2026-02-08 00:38:27 -08:00
```python
wiki_public = gandi.livedns.Record(
"wiki-public",
zone=domain,
name="wiki",
type="CNAME",
ttl=300,
values=["blumeops-proxy.fly.dev."],
)
```
Deploy: `mise run dns-preview` then `mise run dns-up` .
2026-02-08 02:36:19 -08:00
### 6. Verify with real domain
2026-02-08 00:38:27 -08:00
```bash
curl -I https://wiki.eblu.me
# Should return 200 with X-Cache-Status header
```
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
### 7. Tag the Tailscale Ingress with `tag:flyio-target`
The fly.io proxy can only reach endpoints tagged with `tag:flyio-target` . Add the annotation to the service's Tailscale Ingress:
```yaml
annotations:
tailscale.com/tags: "tag:k8s,tag:flyio-target"
```
2026-02-08 00:38:27 -08:00
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
Include `tag:k8s` to preserve existing access rules for the Ingress proxy node. The `tag:flyio-target` tag opts this specific endpoint into being reachable by the fly.io proxy — no broad ACL changes needed.
2026-02-08 00:38:27 -08:00
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
For non-k8s services (e.g., Forgejo on indri), create a k8s ExternalName Service pointing to the host, then a Tailscale Ingress with the same annotation.
2026-02-08 00:38:27 -08:00
---
## Security
### DDoS and rate limiting
This approach provides basic protection, not enterprise-grade:
- **Fly.io Anycast** absorbs volumetric L3/L4 attacks
- **nginx `limit_req` ** caps per-IP request rates at the container level
- **nginx `proxy_cache` ** serves most requests from cache — only cache
misses traverse the Tailscale tunnel to indri
For **static sites ** , the cache is the primary defense. Most requests
never reach the origin. Cache-busting is mitigated by ignoring query
strings (`proxy_cache_key $host$uri` ) and client cache-control headers.
For **dynamic services ** , the cache covers only static assets. Most
requests flow through the Tailscale tunnel to indri on every hit. This
makes dynamic services significantly more vulnerable to L7 DDoS — an
attacker sending high volumes of legitimate-looking requests (login
pages, API endpoints, search queries) bypasses the cache entirely.
Mitigations for dynamic services:
- nginx `limit_req` is the primary defense at the proxy layer — tune
the rate and burst per service
- The backend service's own rate limiting (e.g., Forgejo's built-in
rate limiter) provides a second layer
- fail2ban on indri (see below) can block IPs showing abuse patterns
- The break-glass shutoff remains the last resort
If a publicly exposed dynamic service attracts targeted attacks or the
home network bandwidth is impacted, consider migrating to Cloudflare
Tunnel for enterprise-grade DDoS protection (requires DNS migration;
see plan history in git).
### fail2ban
fail2ban monitors log files for repeated failed authentication attempts
Expose Forgejo publicly at forge.eblu.me (#278)
## Summary
Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.
- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)
## Deployment Order
1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync
## Verification Checklist
- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/278
2026-03-03 08:40:41 -08:00
and bans offending IPs.
2026-02-08 00:38:27 -08:00
**Static sites**: fail2ban does not apply. There is no login surface,
no sessions, no credentials to brute force.
Expose Forgejo publicly at forge.eblu.me (#278)
## Summary
Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.
- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)
## Deployment Order
1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync
## Verification Checklist
- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/278
2026-03-03 08:40:41 -08:00
**Dynamic services with authentication** (e.g., Forgejo): fail2ban
runs in the **Fly.io container ** , not on indri. Standard iptables
banning won't work in Fly.io because `$remote_addr` is Fly's internal
proxy IP, not the client. Instead, fail2ban uses a custom nginx-based
ban action:
1. fail2ban watches the nginx JSON access log for repeated 401/403
responses to login endpoints, keyed on the `client_ip` field
(populated from the `Fly-Client-IP` header)
2. On ban, it appends the IP to `/etc/nginx/forge-deny.conf` and
reloads nginx
3. nginx uses a `geo` directive keyed on `$http_fly_client_ip` to
check the deny list and return 403 for banned IPs
Ban lists are **ephemeral across deploys ** — nginx rate limiting
provides the persistent baseline; fail2ban adds escalating bans for
active attacks.
See `fly/fail2ban/` for the filter, jail, and action configuration.
2026-02-08 00:38:27 -08:00
### Break-glass shutoff
2026-02-08 02:36:19 -08:00
If the proxy is causing issues, stop it immediately:
2026-02-08 00:38:27 -08:00
```bash
mise run fly-shutoff
```
2026-02-07 22:09:52 -08:00
2026-02-08 02:36:19 -08:00
This stops all machines in seconds — zero traffic reaches indri. See [[manage-flyio-proxy#Emergency Shutoff]] for the full escalation ladder (container stop → Tailscale revoke → DNS removal).
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
---
## Considerations for dynamic services
The architecture described in this guide works for both static and dynamic
services, but the nginx configuration and security posture differ
significantly. This section summarizes what changes when exposing a
dynamic, authenticated service like [[forgejo]].
| Concern | Static site | Dynamic service |
|---------|-------------|-----------------|
| Caching | Aggressive (cache everything, 1d TTL) | Static assets only, or disabled |
| Session cookies | Ignored (`proxy_ignore_headers Set-Cookie` ) | Must be passed through |
| Query strings | Ignored in cache key | Included (default behavior) |
| Rate limiting | 10r/s is plenty | Higher burst needed; coordinate with backend rate limiter |
| Request body size | Default 1MB is fine | Increase for uploads (`client_max_body_size` ) |
| WebSocket | Not needed | Often needed (`proxy_http_version 1.1` , `Upgrade` headers) |
| Proxy headers | Optional | Required (`X-Real-IP` , `X-Forwarded-For` , `X-Forwarded-Proto` ) |
| fail2ban | Not applicable | Configure on indri, watching service logs |
| DDoS exposure | Low — cache absorbs most traffic | Higher — most requests hit origin |
| Pre-exposure checklist | Deploy and go | Disable open registration, audit access controls, configure fail2ban |
### Checklist before exposing a dynamic service
- [ ] Disable open user registration (require invites or admin approval)
- [ ] Audit access controls and permissions
- [ ] Configure the service to log the forwarded client IP (not the proxy IP)
Expose Forgejo publicly at forge.eblu.me (#278)
## Summary
Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.
- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)
## Deployment Order
1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync
## Verification Checklist
- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/278
2026-03-03 08:40:41 -08:00
- [ ] Set up fail2ban in the Fly.io container with a filter for the service's login endpoints
Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.
## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards
## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
- [ ] Tag the service's Tailscale Ingress with `tag:flyio-target`
2026-02-08 00:38:27 -08:00
- [ ] Test the nginx config locally or in staging before deploying
- [ ] Rehearse the break-glass shutoff (`mise run fly-shutoff` )
---
## IaC summary
| Component | Managed by | Declarative? |
|-----------|------------|:---:|
| Tailscale auth key | Pulumi (`pulumi/tailscale/` ) | yes |
| Tailscale ACLs | Pulumi (`pulumi/tailscale/policy.hujson` ) | yes |
| DNS CNAMEs | Pulumi (`pulumi/gandi/` ) | yes |
| Container + app config | `fly/Dockerfile` + `fly/fly.toml` in repo | yes |
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
| Observability | `fly/alloy.river` in repo | yes |
2026-02-08 00:38:27 -08:00
| Deployment | Forgejo CI on push to `fly/` , or `mise run fly-deploy` | yes |
| Fly.io secrets + certs | `mise run fly-setup` (one-time, idempotent) | semi |
The "semi" for Fly.io secrets is a one-time operation backed by a repeatable mise task. Fly.io does not have a mature Pulumi or Terraform provider, so `fly.toml` + `flyctl` is the standard IaC model for Fly.io apps.
2026-02-07 22:09:52 -08:00
2026-02-08 00:38:27 -08:00
---
## Verification
2026-02-08 02:36:19 -08:00
### Pre-DNS (verify against fly.dev)
Test the proxy works before creating any public DNS records:
1. `curl -sf https://blumeops-proxy.fly.dev/healthz` — returns `ok`
2. `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` — returns 200 with `X-Cache-Status` header
3. `fly status -a blumeops-proxy` — shows healthy machine
4. All `*.ops.eblu.me` services still work from tailnet (unchanged)
5. `mise run services-check` passes
If anything fails here, debug without public DNS impact.
### Post-DNS (after CNAME is live)
After deploying DNS (`mise run dns-up` ):
2026-02-08 00:38:27 -08:00
1. `curl -I https://docs.eblu.me` — returns 200 with `X-Cache-Status` header
Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)
## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
2. `curl -I https://cv.eblu.me` — same for each public service
3. `dig docs.eblu.me` — resolves to Fly.io IPs (not Tailscale IP)
4. `dig forge.ops.eblu.me` — still resolves to indri's Tailscale IP (unchanged)
5. Second request to same URL shows `X-Cache-Status: HIT`