---
title: Caddy
modified: 2026-03-15
tags:
- service
- networking
- tls
---
# Caddy
Reverse proxy for `*.ops.eblu.me` services with automatic TLS via ACME DNS-01.
## Quick Reference
| Property | Value |
|----------|-------|
| **Domain** | `*.ops.eblu.me` |
| **HTTPS Port** | 443 |
| **Config** | `ansible/roles/caddy/templates/Caddyfile.j2` |
| **Binary** | Custom `xcaddy` build with Gandi DNS and Layer 4 plugins |
## Why Caddy?
Caddy provides a single TLS termination point for all BlumeOps services:
- **Wildcard certificate** for `*.ops.eblu.me` via Let's Encrypt
- **DNS-01 challenge** using Gandi API (no port 80 needed)
- **Unified access** from k8s pods, containers, and tailnet clients
See [[routing]] for when to use `*.ops.eblu.me` vs `*.tail8d86e.ts.net`.
## Proxied Services
### Indri-Local Services
| Subdomain | Backend | Service |
|-----------|---------|---------|
| `forge.ops.eblu.me` | `localhost:3001` | [[forgejo]] |
| `registry.ops.eblu.me` | `localhost:5050` | [[zot]] |
| `jellyfin.ops.eblu.me` | `localhost:8096` | [[jellyfin]] |
### Kubernetes Services
K8s services are proxied via their Tailscale Ingress endpoints:
| Subdomain | Backend | Service |
|-----------|---------|---------|
| `grafana.ops.eblu.me` | `grafana.tail8d86e.ts.net` | [[grafana]] |
| `argocd.ops.eblu.me` | `argocd.tail8d86e.ts.net` | [[argocd]] |
| `cv.ops.eblu.me` | `cv.tail8d86e.ts.net` | [[cv]] |
| `docs.ops.eblu.me` | `docs.tail8d86e.ts.net` | [[docs]] (now publicly available at `docs.eblu.me` via [[flyio-proxy]]) |
| `feed.ops.eblu.me` | `feed.tail8d86e.ts.net` | [[miniflux]] |
| ... | ... | (see `ansible/roles/caddy/defaults/main.yml` for the full list) |
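
The rows listed above share a naming pattern: the label in front of `.ops.eblu.me` is the same label used for the `.tail8d86e.ts.net` Ingress hostname. A minimal sketch of that mapping follows, assuming the pattern holds for a given service. The function name `tailnet_backend` is hypothetical, and the role's `defaults/main.yml` remains the source of truth for any service that deviates.

```bash
# Hypothetical helper: derive the Tailscale Ingress hostname for one of the
# Kubernetes subdomains listed above. Assumes the naming pattern in the table;
# Indri-local services (forge, registry, jellyfin) map to localhost instead.
tailnet_backend() {
  local host="$1"
  case "$host" in
    *.ops.eblu.me)
      # Strip the ops suffix and append the tailnet domain.
      printf '%s.tail8d86e.ts.net\n' "${host%.ops.eblu.me}"
      ;;
    *)
      echo "not an ops.eblu.me host: $host" >&2
      return 1
      ;;
  esac
}

tailnet_backend grafana.ops.eblu.me   # prints grafana.tail8d86e.ts.net
```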
### TCP Services (Layer 4)
| Port | Backend | Service |
|------|---------|---------|
| 2222 | `localhost:2200` | Forgejo SSH |
| 5432 | `pg.tail8d86e.ts.net:5432` | [[postgresql]] |
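
Because Forgejo SSH is reached on port 2222 rather than 22, clone URLs need an explicit port. The sketch below builds one; it assumes the SSH user is `git` and that `forge.ops.eblu.me` resolves to the Caddy host, neither of which this page confirms. The function name and the repository path are placeholders.

```bash
# Hypothetical helper: build an SSH clone URL that goes through Caddy's
# Layer 4 listener on port 2222 (forwarded to Forgejo SSH on localhost:2200).
# Assumptions: SSH user "git", and forge.ops.eblu.me resolving to the Caddy host.
forge_ssh_url() {
  local repo="$1" user="${2:-git}"
  printf 'ssh://%s@forge.ops.eblu.me:2222/%s.git\n' "$user" "$repo"
}

forge_ssh_url example-owner/example-repo
# prints ssh://git@forge.ops.eblu.me:2222/example-owner/example-repo.git
```

The result can be passed straight to `git clone "$(forge_ssh_url example-owner/example-repo)"`.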
## Configuration
Caddy is managed via the `caddy` Ansible role:
```bash
# Deploy caddy changes
mise run provision-indri -- --tags caddy
```
**Key files:**
- `ansible/roles/caddy/defaults/main.yml` - Service definitions
- `ansible/roles/caddy/templates/Caddyfile.j2` - Caddy config template
## Secrets
| Secret | Source | Description |
|--------|--------|-------------|
| `GANDI_BEARER_TOKEN` | 1Password | API token for DNS-01 challenges |
The token is written to `~/.config/caddy/gandi-token` (mode 0600) and sourced by the Caddy wrapper script.
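
The actual wrapper script is not reproduced here. As a rough sketch of what sourcing the token could look like, the function below checks the file mode, sources the file, and confirms the variable is set. It assumes the file contains a shell assignment such as `GANDI_BEARER_TOKEN=...`; the function name `load_gandi_token` and the `CADDYFILE` variable are hypothetical.

```bash
# Hypothetical wrapper logic: load the Gandi token before starting Caddy.
# Assumes the token file holds a shell assignment such as
#   GANDI_BEARER_TOKEN=...
load_gandi_token() {
  local file="${1:-$HOME/.config/caddy/gandi-token}"

  if [ ! -r "$file" ]; then
    echo "token file not readable: $file" >&2
    return 1
  fi

  # Refuse to continue unless the mode is exactly 0600.
  if [ -z "$(find "$file" -perm 600)" ]; then
    echo "token file must have mode 0600: $file" >&2
    return 1
  fi

  # set -a exports every variable assigned while sourcing the file.
  set -a
  . "$file"
  set +a

  if [ -z "${GANDI_BEARER_TOKEN:-}" ]; then
    echo "GANDI_BEARER_TOKEN not set by $file" >&2
    return 1
  fi
}

# In a wrapper this would be followed by something like:
#   load_gandi_token && exec caddy run --config "$CADDYFILE"
# where CADDYFILE is a placeholder for the rendered config path.
```

Exporting the variable before `exec` keeps the token out of the process arguments while still making it visible to Caddy's environment.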
## Security Considerations
Caddy has no authentication layer — it is a plain reverse proxy. Access control relies entirely on Tailscale ACLs restricting which devices can reach indri on port 443. Currently `tag:homelab` and `autogroup:admin` can reach Caddy. The [[flyio-proxy]] no longer routes through Caddy — it pushes logs and metrics directly to [[loki]] and [[prometheus]] via their Tailscale Ingress endpoints.
## Custom Build
Caddy is built from source using `xcaddy` with two plugins:
- `github.com/caddy-dns/gandi` — ACME DNS-01 challenges via Gandi API
- `github.com/mholt/caddy-l4` — Layer 4 (TCP/UDP) proxying
```bash
# Source checkout (mirrored on forge): ~/code/3rd/caddy
# Built binary: ~/code/3rd/caddy/bin/caddy

# Build via the mise task in the caddy clone
cd ~/code/3rd/caddy && mise run build
```
Forge mirrors: `mirrors/caddy`, `mirrors/caddy-gandi`, `mirrors/xcaddy`, `mirrors/caddy-l4`.
## Related
- [[gandi]] - DNS hosting and ACME DNS-01 provider
- [[routing]] - Service routing architecture
- [[forgejo]] - Git forge (proxied by Caddy)
- [[zot]] - Container registry (proxied by Caddy)
- [[tailscale-operator]] - K8s services are exposed via Tailscale Ingress, which Caddy proxies to