6 commits

Author SHA1 Message Date
176d38be68 Fix metric names: strip loki_process_custom_ prefix, drop internal labels
Alloy's stage.metrics prefixes all metric names with
loki_process_custom_. Add a relabel rule to strip the prefix so
dashboards can query clean names (flyio_nginx_http_requests_total,
etc.). Also drop the component_id, component_path, and filename labels.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 09:50:07 -08:00
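
The renaming described in 176d38be68 can be illustrated outside Alloy. The sketch below is not the Alloy configuration from the commit (which is not shown here); it only mimics, in Python, what a Prometheus-style relabel rule with a fully anchored regex would do to one series. The function name `clean_series` and the sample label values are hypothetical.

```python
import re

# Prometheus-style relabel regexes are anchored at both ends, so
# fullmatch() is the closest Python equivalent.
PREFIX = re.compile(r"loki_process_custom_(.+)")

# Labels the commit message says are dropped.
DROPPED_LABELS = {"component_id", "component_path", "filename"}


def clean_series(labels):
    """Return a copy of `labels` with the prefix stripped from __name__
    and the internal labels removed."""
    out = {k: v for k, v in labels.items() if k not in DROPPED_LABELS}
    match = PREFIX.fullmatch(out.get("__name__", ""))
    if match:
        out["__name__"] = match.group(1)
    return out
```

A series whose name lacks the prefix passes through with its name unchanged; only the internal labels are removed.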
80e7d11058 Remove nginx stub_status exporter — not available in Alloy
Alloy has no built-in prometheus.exporter.nginx component. Remove
the stub_status scraping and connection panels from the Fly.io
dashboard. Replace with error rate and cache hit ratio stats.
All key signals are still covered by log-derived metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 09:45:39 -08:00
6e4d7f5991 Fix Alloy binary on Alpine: add libc6-compat
The grafana/alloy image is Ubuntu-based (glibc), but our container
uses nginx:alpine (musl). The binary exists but fails with "not found"
because the glibc dynamic linker is missing. libc6-compat provides
the compatibility shim.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 09:40:31 -08:00
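
The "not found" failure in 6e4d7f5991 is easier to diagnose once you know which dynamic linker the binary asks for. As a rough sketch (assuming a 64-bit little-endian ELF file, and not taken from the repository), the following reads the `PT_INTERP` program header, which names the loader the kernel tries to start. The function name `elf_interpreter` is hypothetical.

```python
import struct
from typing import Optional

PT_INTERP = 3  # program header type that holds the loader path


def elf_interpreter(data: bytes) -> Optional[str]:
    """Return the dynamic linker path requested by an ELF image,
    or None for a statically linked binary."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles 64-bit little-endian ELF only")
    # ELF64 header: e_phoff at 0x20, e_phentsize and e_phnum at 0x36.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        # ELF64 program header: p_type, p_flags, p_offset, then p_filesz at +32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, off)
        (p_filesz,) = struct.unpack_from("<Q", data, off + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

For a glibc-linked x86-64 binary the result is typically `/lib64/ld-linux-x86-64.so.2`. If that path does not exist in the container, the exec fails with "not found" even though the binary itself is present.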
1e1e513b4a Route Alloy through Caddy for proper TLS, update ACLs
Switch Alloy endpoints from *.tail8d86e.ts.net (with insecure_skip_verify)
to *.ops.eblu.me via Caddy reverse proxy with valid TLS certificates.
Add tag:homelab:443 to flyio-proxy ACL grant so the proxy can reach Caddy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 08:45:57 -08:00
a0dc7ec511 Add observability to Fly.io proxy via embedded Alloy
Instrument the flyio-proxy container with Grafana Alloy to collect
nginx JSON access logs (→ Loki) and derive request/latency/cache
metrics (→ Prometheus). Adds stub_status for connection-level metrics.
Includes two Grafana dashboards: Docs APM (per-service) and Fly.io
Proxy Health (aggregate).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 08:04:29 -08:00
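
To make "log-derived metrics" in a0dc7ec511 concrete, here is a minimal sketch of how request, error, cache, and latency figures can be derived from JSON access log lines. It is an illustration in Python, not the Alloy pipeline from the commit. The field names `status`, `request_time`, and `cache_status` are assumptions about the log format, and `derive_metrics` is a hypothetical helper.

```python
import json
from collections import Counter


def derive_metrics(lines):
    """Aggregate JSON access log lines into a few summary metrics."""
    by_status = Counter()
    by_cache = Counter()
    latency_sum = 0.0
    for line in lines:
        entry = json.loads(line)
        by_status[str(entry["status"])] += 1
        # Assumed field; requests that bypass the cache count as "NONE".
        by_cache[entry.get("cache_status", "NONE")] += 1
        latency_sum += float(entry["request_time"])

    total = sum(by_status.values())
    errors = sum(n for status, n in by_status.items() if status.startswith("5"))
    return {
        "requests_total": total,
        "error_rate": errors / total if total else 0.0,
        "cache_hit_ratio": by_cache["HIT"] / total if total else 0.0,
        "latency_seconds_sum": latency_sum,
    }
```

The error rate and cache hit ratio computed here correspond to the kind of stats that 80e7d11058 later adds to the Fly.io dashboard in place of the stub_status panels.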
64a78422b1 Add Fly.io public reverse proxy for docs.eblu.me (#120)
Some checks failed: Deploy Fly.io Proxy / deploy (push), failing after 9s
## Summary

- Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale
- First service exposed: `docs.eblu.me` — the Quartz static docs site
- Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME
- Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow

## Key details

- Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed
- Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts
- nginx caches aggressively for the static site; health check is on the default_server block
- ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only
- DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev`

## Test plan

- [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok`
- [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status`
- [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert
- [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected)
- [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120
2026-02-08 02:36:19 -08:00