blumeops/ansible/roles/caddy/defaults/main.yml

120 lines
3.9 KiB
YAML
Raw Normal View History

---
# Caddy reverse proxy configuration
# Caddy is built from ~/code/3rd/caddy with Gandi DNS and Layer 4 plugins
caddy_repo_dir: /Users/erichblume/code/3rd/caddy
caddy_binary: "{{ caddy_repo_dir }}/bin/caddy"
caddy_config_dir: /Users/erichblume/.config/caddy
caddy_data_dir: /Users/erichblume/.local/share/caddy
caddy_log_dir: /Users/erichblume/Library/Logs
# Gandi API token file (written by ansible, chmod 0600)
# Caddy reads this file for ACME DNS-01 challenges
caddy_gandi_token_file: /Users/erichblume/.config/caddy/gandi-token
# Domain configuration
caddy_domain: ops.eblu.me
# HTTPS port (443 is standard)
caddy_https_port: 443
# Services to proxy
# Format: { name: "service", host: "hostname", backend: "url" }
caddy_services:
# Indri-local services
- name: forge
host: "forge.{{ caddy_domain }}"
backend: "http://localhost:3001"
- name: registry
host: "registry.{{ caddy_domain }}"
backend: "http://localhost:5050"
- name: jellyfin
host: "jellyfin.{{ caddy_domain }}"
backend: "http://localhost:8096"
# K8s services (via Tailscale Ingress)
# Caddy proxies to existing Tailscale endpoints - traffic stays local
- name: grafana
host: "grafana.{{ caddy_domain }}"
backend: "https://grafana.tail8d86e.ts.net"
- name: argocd
host: "argocd.{{ caddy_domain }}"
backend: "https://argocd.tail8d86e.ts.net"
- name: prometheus
host: "prometheus.{{ caddy_domain }}"
backend: "https://prometheus.tail8d86e.ts.net"
- name: loki
host: "loki.{{ caddy_domain }}"
backend: "https://loki.tail8d86e.ts.net"
- name: miniflux
host: "feed.{{ caddy_domain }}"
backend: "https://feed.tail8d86e.ts.net"
- name: devpi
host: "pypi.{{ caddy_domain }}"
Migrate devpi from minikube to indri (launchd) (#341) ## Summary Devpi was crash-looping under memory pressure on the minikube StatefulSet, breaking the Python toolchain across the repo (`mise run docs-mikado`, `prek`, every `uv pip install`). It moves to indri as a native LaunchAgent. ## What changed - **New ansible role** `ansible/roles/devpi/`: installs `devpi-server` + `devpi-web` into a uv-managed venv, initializes the server-dir on first run via 1Password root password, runs as a LaunchAgent (`mcquack.eblume.devpi`) bound to `127.0.0.1:3141`. Bootstraps from upstream PyPI (so devpi can install itself on a fresh box). - **Caddy**: `pypi.ops.eblu.me` now proxies to `http://localhost:3141`. - **Playbook**: `indri.yml` gains pre_tasks for the root password and the new role. - **service-versions.yaml**: devpi flipped from `type: argocd` to `type: ansible`. - **ArgoCD**: removed `apps/devpi.yaml` and `manifests/devpi/`. The in-cluster Application, namespace, and PVC have been deleted. - **Docs**: new how-to `docs/how-to/operations/devpi-on-indri.md`; `restart-indri.md` lists devpi in the LaunchAgent stop list. ## Already deployed (live on indri) - Service running: `launchctl list mcquack.eblume.devpi` → PID 53888 - `curl https://pypi.ops.eblu.me/+api` returns 200 ✅ - `mise run docs-mikado` works again ✅ - 1.0G of cached PyPI data was migrated from the PVC to `~erichblume/devpi/server-dir/` - Minikube namespace and PVC fully reclaimed ## Test plan - [ ] `mise run services-check` (after merge) - [ ] CI workflows that use devpi succeed - [ ] No regressions in tools that depend on `pypi.ops.eblu.me` (prek, uv-script tasks, dagger pipelines) ## Context This is the C1 prelude to a planned C2 chain (`mikado/retire-minikube-indri`) to retire minikube on indri entirely. Doing devpi as a standalone C1 was the right call because (a) it was urgent — it was breaking the toolchain — and (b) it shakes out the migration recipe before we commit to a multi-leaf chain. Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/341
2026-04-29 13:38:36 -07:00
backend: "http://localhost:3141"
- name: kiwix
host: "kiwix.{{ caddy_domain }}"
backend: "https://kiwix.tail8d86e.ts.net"
- name: torrent
host: "torrent.{{ caddy_domain }}"
backend: "https://torrent.tail8d86e.ts.net"
- name: teslamate
host: "tesla.{{ caddy_domain }}"
backend: "https://tesla.tail8d86e.ts.net"
- name: immich
host: "photos.{{ caddy_domain }}"
backend: "https://photos.tail8d86e.ts.net"
- name: navidrome
host: "dj.{{ caddy_domain }}"
backend: "https://dj.tail8d86e.ts.net"
Deploy Frigate NVR stack with Mosquitto, Ntfy, and frigate-notify (#190) ## Summary Deploy a cloud-free NVR stack for the GableCam (ReoLink Elite Floodlight at 192.168.1.159): - **Mosquitto** — shared MQTT broker in `mqtt` namespace (cluster-internal, no auth) - **Ntfy** — self-hosted push notifications in `ntfy` namespace, exposed at `ntfy.tail8d86e.ts.net` / `ntfy.ops.eblu.me` - **Frigate** — NVR with GableCam via HTTP-FLV, ONNX CPU detection, NFS recordings on sifaka, exposed at `nvr.tail8d86e.ts.net` / `nvr.ops.eblu.me` - **frigate-notify** — bridges Frigate detection events (person, car, dog, cat) to Ntfy alerts via MQTT Also includes: - Prometheus scrape target for Frigate metrics - Grafana dashboard for Frigate (status, inference speed, FPS, CPU/memory, storage) - Caddy reverse proxy entries for `nvr.ops.eblu.me` and `ntfy.ops.eblu.me` ## Prerequisites - [ ] Create NFS share `frigate` on sifaka (`/volume1/frigate`, RW for indri) - [ ] Create 1Password item "Reolink Floodlight Camera" in `blumeops` vault with `username` and `password` fields ## Deployment (after merge) ```bash argocd app sync apps argocd app sync mosquitto argocd app sync ntfy argocd app sync frigate argocd app sync grafana-config argocd app sync prometheus mise run provision-indri -- --tags caddy mise run services-check ``` ## Verification - [ ] Mosquitto pod running, accepting connections on 1883 - [ ] Ntfy web UI accessible at `ntfy.ops.eblu.me` - [ ] Frigate web UI at `nvr.ops.eblu.me` showing GableCam live feed - [ ] Object detection working (ONNX, person/car/dog/cat) - [ ] Recordings appearing in NFS share on sifaka - [ ] frigate-notify sending detection alerts to Ntfy - [ ] Prometheus scraping Frigate metrics - [ ] Grafana dashboard showing Frigate data Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/190
2026-02-14 21:27:44 -08:00
- name: homepage
host: "go.{{ caddy_domain }}"
backend: "https://go.tail8d86e.ts.net"
- name: docs
host: "docs.{{ caddy_domain }}"
C1: migrate cv + docs from minikube to indri-native (#342) ## Summary Replace the cv (`cv.eblu.me`) and docs (`docs.eblu.me`) minikube Deployments with indri-native ansible roles. Caddy serves the extracted release tarballs directly via a new `kind: static` service-block — no daemon, no nginx pod, no ProxyGroup ingress on the request path. Mirrors the rationale of the recent devpi migration; part of the broader minikube wind-down. ## What's in this commit - `ansible/roles/{cv,docs}` — sentinel-gated tarball download + extract into `~/{cv,docs}/content/` - `ansible/roles/caddy/` — new `kind: static` branch in the Caddyfile template (encoded gzip, immutable cache headers for fingerprinted assets, optional `try_html` for Quartz-style clean URLs, optional per-path `download_paths` for the resume PDF's `Content-Disposition`) - `ansible/playbooks/indri.yml` — wires `cv` and `docs` roles before `caddy` - `service-versions.yaml` — both services flip to `type: ansible`. `docs.current-version` stays at `1.28.2` for this commit so `container-version-check` keeps passing while `containers/quartz/Dockerfile` still exists; it moves to the docs release tag in the cleanup commit - `.forgejo/workflows/{cv-deploy,build-blumeops}.yaml` — deploy step now bumps `cv_version`/`docs_version` in the role defaults and pushes; running ansible + purging the Fly cache is manual from gilbert (matches devpi) - Docs: `docs/how-to/operations/{cv,docs}-on-indri.md`, updated `docs/reference/services/{cv,docs}.md`, changelog fragment ## What is not in this commit The dead artifacts. After PR review and successful cutover, a follow-up commit deletes: - `argocd/apps/{cv,docs}.yaml` and `argocd/manifests/{cv,docs}/` - `containers/cv/`, `containers/quartz/` - `CONTAINER_TO_SERVICE['quartz']` mapping in `mise-tasks/container-version-check` - bumps `docs.current-version` in `service-versions.yaml` to the release tag ## Cutover plan (manual, from gilbert, after review) 1. **Take down old:** - Remove the cv and docs Applications: `argocd app delete cv --cascade && argocd app delete docs --cascade` - Verify k8s namespaces gone: `kubectl --context=minikube-indri get ns | grep -E '^(cv|docs)\\b'` (should be empty) - Verify tailnet MagicDNS no longer advertises the VIPs: `nslookup cv.tail8d86e.ts.net` and `nslookup docs.tail8d86e.ts.net` should both fail 2. **Bring up new:** - `mise run provision-indri -- --tags cv,docs,caddy --check --diff` (already validated on branch) - `mise run provision-indri -- --tags cv,docs,caddy` - `fly ssh console -a blumeops-proxy -C "sh -c 'rm -rf /tmp/cache && nginx -s reload'"` 3. **Verify:** `mise run services-check` and the curl checks listed in `docs/how-to/operations/{cv,docs}-on-indri.md` 4. **Cleanup commit + merge.** Total expected downtime: minutes (not the few-hour budget you authorized). ## Test plan - [ ] `mise run provision-indri -- --tags cv,docs --check --diff` clean - [ ] `mise run provision-indri -- --tags caddy --check --diff` shows only the cv + docs blocks changing as previewed in the PR thread - [ ] After cutover: `cv.eblu.me`, `cv.ops.eblu.me`, `docs.eblu.me`, `docs.ops.eblu.me` all return 200 - [ ] `cv.eblu.me/resume.pdf` includes `Content-Disposition: attachment` - [ ] A clean Quartz URL (e.g. `docs.eblu.me/explanation/agent-change-process`) resolves to the right page - [ ] `mise run services-check` clean - [ ] `mise run service-review --type ansible` shows cv and docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/342
2026-04-29 14:55:11 -07:00
kind: static
root: "{{ docs_content_dir }}"
try_html: true # Quartz: path → path/ → path.html → 404.html
- name: cv
host: "cv.{{ caddy_domain }}"
C1: migrate cv + docs from minikube to indri-native (#342) ## Summary Replace the cv (`cv.eblu.me`) and docs (`docs.eblu.me`) minikube Deployments with indri-native ansible roles. Caddy serves the extracted release tarballs directly via a new `kind: static` service-block — no daemon, no nginx pod, no ProxyGroup ingress on the request path. Mirrors the rationale of the recent devpi migration; part of the broader minikube wind-down. ## What's in this commit - `ansible/roles/{cv,docs}` — sentinel-gated tarball download + extract into `~/{cv,docs}/content/` - `ansible/roles/caddy/` — new `kind: static` branch in the Caddyfile template (encoded gzip, immutable cache headers for fingerprinted assets, optional `try_html` for Quartz-style clean URLs, optional per-path `download_paths` for the resume PDF's `Content-Disposition`) - `ansible/playbooks/indri.yml` — wires `cv` and `docs` roles before `caddy` - `service-versions.yaml` — both services flip to `type: ansible`. `docs.current-version` stays at `1.28.2` for this commit so `container-version-check` keeps passing while `containers/quartz/Dockerfile` still exists; it moves to the docs release tag in the cleanup commit - `.forgejo/workflows/{cv-deploy,build-blumeops}.yaml` — deploy step now bumps `cv_version`/`docs_version` in the role defaults and pushes; running ansible + purging the Fly cache is manual from gilbert (matches devpi) - Docs: `docs/how-to/operations/{cv,docs}-on-indri.md`, updated `docs/reference/services/{cv,docs}.md`, changelog fragment ## What is not in this commit The dead artifacts. After PR review and successful cutover, a follow-up commit deletes: - `argocd/apps/{cv,docs}.yaml` and `argocd/manifests/{cv,docs}/` - `containers/cv/`, `containers/quartz/` - `CONTAINER_TO_SERVICE['quartz']` mapping in `mise-tasks/container-version-check` - bumps `docs.current-version` in `service-versions.yaml` to the release tag ## Cutover plan (manual, from gilbert, after review) 1. **Take down old:** - Remove the cv and docs Applications: `argocd app delete cv --cascade && argocd app delete docs --cascade` - Verify k8s namespaces gone: `kubectl --context=minikube-indri get ns | grep -E '^(cv|docs)\\b'` (should be empty) - Verify tailnet MagicDNS no longer advertises the VIPs: `nslookup cv.tail8d86e.ts.net` and `nslookup docs.tail8d86e.ts.net` should both fail 2. **Bring up new:** - `mise run provision-indri -- --tags cv,docs,caddy --check --diff` (already validated on branch) - `mise run provision-indri -- --tags cv,docs,caddy` - `fly ssh console -a blumeops-proxy -C "sh -c 'rm -rf /tmp/cache && nginx -s reload'"` 3. **Verify:** `mise run services-check` and the curl checks listed in `docs/how-to/operations/{cv,docs}-on-indri.md` 4. **Cleanup commit + merge.** Total expected downtime: minutes (not the few-hour budget you authorized). ## Test plan - [ ] `mise run provision-indri -- --tags cv,docs --check --diff` clean - [ ] `mise run provision-indri -- --tags caddy --check --diff` shows only the cv + docs blocks changing as previewed in the PR thread - [ ] After cutover: `cv.eblu.me`, `cv.ops.eblu.me`, `docs.eblu.me`, `docs.ops.eblu.me` all return 200 - [ ] `cv.eblu.me/resume.pdf` includes `Content-Disposition: attachment` - [ ] A clean Quartz URL (e.g. `docs.eblu.me/explanation/agent-change-process`) resolves to the right page - [ ] `mise run services-check` clean - [ ] `mise run service-review --type ansible` shows cv and docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/342
2026-04-29 14:55:11 -07:00
kind: static
root: "{{ cv_content_dir }}"
download_paths:
- path: /resume.pdf
filename: erich-blume-resume.pdf
Deploy Frigate NVR stack with Mosquitto, Ntfy, and frigate-notify (#190) ## Summary Deploy a cloud-free NVR stack for the GableCam (ReoLink Elite Floodlight at 192.168.1.159): - **Mosquitto** — shared MQTT broker in `mqtt` namespace (cluster-internal, no auth) - **Ntfy** — self-hosted push notifications in `ntfy` namespace, exposed at `ntfy.tail8d86e.ts.net` / `ntfy.ops.eblu.me` - **Frigate** — NVR with GableCam via HTTP-FLV, ONNX CPU detection, NFS recordings on sifaka, exposed at `nvr.tail8d86e.ts.net` / `nvr.ops.eblu.me` - **frigate-notify** — bridges Frigate detection events (person, car, dog, cat) to Ntfy alerts via MQTT Also includes: - Prometheus scrape target for Frigate metrics - Grafana dashboard for Frigate (status, inference speed, FPS, CPU/memory, storage) - Caddy reverse proxy entries for `nvr.ops.eblu.me` and `ntfy.ops.eblu.me` ## Prerequisites - [ ] Create NFS share `frigate` on sifaka (`/volume1/frigate`, RW for indri) - [ ] Create 1Password item "Reolink Floodlight Camera" in `blumeops` vault with `username` and `password` fields ## Deployment (after merge) ```bash argocd app sync apps argocd app sync mosquitto argocd app sync ntfy argocd app sync frigate argocd app sync grafana-config argocd app sync prometheus mise run provision-indri -- --tags caddy mise run services-check ``` ## Verification - [ ] Mosquitto pod running, accepting connections on 1883 - [ ] Ntfy web UI accessible at `ntfy.ops.eblu.me` - [ ] Frigate web UI at `nvr.ops.eblu.me` showing GableCam live feed - [ ] Object detection working (ONNX, person/car/dog/cat) - [ ] Recordings appearing in NFS share on sifaka - [ ] frigate-notify sending detection alerts to Ntfy - [ ] Prometheus scraping Frigate metrics - [ ] Grafana dashboard showing Frigate data Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/190
2026-02-14 21:27:44 -08:00
- name: nvr
host: "nvr.{{ caddy_domain }}"
backend: "https://nvr.tail8d86e.ts.net"
- name: authentik
host: "authentik.{{ caddy_domain }}"
backend: "https://authentik.tail8d86e.ts.net"
cache_policy: spa
Deploy Frigate NVR stack with Mosquitto, Ntfy, and frigate-notify (#190) ## Summary Deploy a cloud-free NVR stack for the GableCam (ReoLink Elite Floodlight at 192.168.1.159): - **Mosquitto** — shared MQTT broker in `mqtt` namespace (cluster-internal, no auth) - **Ntfy** — self-hosted push notifications in `ntfy` namespace, exposed at `ntfy.tail8d86e.ts.net` / `ntfy.ops.eblu.me` - **Frigate** — NVR with GableCam via HTTP-FLV, ONNX CPU detection, NFS recordings on sifaka, exposed at `nvr.tail8d86e.ts.net` / `nvr.ops.eblu.me` - **frigate-notify** — bridges Frigate detection events (person, car, dog, cat) to Ntfy alerts via MQTT Also includes: - Prometheus scrape target for Frigate metrics - Grafana dashboard for Frigate (status, inference speed, FPS, CPU/memory, storage) - Caddy reverse proxy entries for `nvr.ops.eblu.me` and `ntfy.ops.eblu.me` ## Prerequisites - [ ] Create NFS share `frigate` on sifaka (`/volume1/frigate`, RW for indri) - [ ] Create 1Password item "Reolink Floodlight Camera" in `blumeops` vault with `username` and `password` fields ## Deployment (after merge) ```bash argocd app sync apps argocd app sync mosquitto argocd app sync ntfy argocd app sync frigate argocd app sync grafana-config argocd app sync prometheus mise run provision-indri -- --tags caddy mise run services-check ``` ## Verification - [ ] Mosquitto pod running, accepting connections on 1883 - [ ] Ntfy web UI accessible at `ntfy.ops.eblu.me` - [ ] Frigate web UI at `nvr.ops.eblu.me` showing GableCam live feed - [ ] Object detection working (ONNX, person/car/dog/cat) - [ ] Recordings appearing in NFS share on sifaka - [ ] frigate-notify sending detection alerts to Ntfy - [ ] Prometheus scraping Frigate metrics - [ ] Grafana dashboard showing Frigate data Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/190
2026-02-14 21:27:44 -08:00
- name: ntfy
host: "ntfy.{{ caddy_domain }}"
backend: "https://ntfy.tail8d86e.ts.net"
2026-03-02 20:39:51 -08:00
- name: ollama
host: "ollama.{{ caddy_domain }}"
backend: "https://ollama.tail8d86e.ts.net"
- name: mealie
host: "meals.{{ caddy_domain }}"
backend: "https://meals.tail8d86e.ts.net"
2026-04-08 17:54:12 -07:00
- name: paperless
host: "paperless.{{ caddy_domain }}"
backend: "https://paperless.tail8d86e.ts.net"
- name: sifaka
host: "nas.{{ caddy_domain }}"
backend: "http://sifaka:5000"
# Layer 4 (TCP) services
# Format: { port: external_port, backend: "host:port" }
caddy_tcp_services:
- port: 2222
backend: "localhost:2200" # Forgejo SSH
- port: 5432
Add borgmatic backups for authentik and immich databases (#314) ## Summary - Add `authentik` database (blumeops-pg cluster) to borgmatic pg_dump backups - Add `immich` database (immich-pg cluster) to borgmatic pg_dump backups - For immich-pg: new borgmatic managed role with `pg_read_all_data`, ExternalSecret, Tailscale LoadBalancer service, and Caddy L4 TCP proxy on port 5433 - Update backup docs to reflect all four CNPG databases + mealie SQLite ## Deploy plan Deploy order matters — k8s resources must exist before ansible can route to them: 1. **ArgoCD (databases app):** sync to pick up immich-pg borgmatic role, ExternalSecret, and Tailscale service ``` argocd app set blumeops-pg --revision feature/borgmatic-all-pg-backups argocd app sync blumeops-pg ``` 2. **Wait** for `immich-pg-tailscale` service to get a Tailscale IP and `immich-pg.tail8d86e.ts.net` to resolve 3. **Ansible (caddy):** deploy Caddy L4 route for port 5433 ``` mise run provision-indri -- --tags caddy ``` 4. **Ansible (borgmatic):** deploy updated config and .pgpass ``` mise run provision-indri -- --tags borgmatic ``` 5. **Verify:** trigger a manual borgmatic run and check all four pg_dump streams succeed ``` borgmatic --verbosity 1 2>&1 | grep -E '(Dumping|ERROR)' ``` ## Test plan - [x] `kubectl kustomize` builds cleanly - [x] `ansible --check --diff` for borgmatic and caddy show expected changes - [ ] ArgoCD sync succeeds for databases app - [ ] `immich-pg.tail8d86e.ts.net` resolves - [ ] `pg.ops.eblu.me:5433` accepts connections - [ ] `borgmatic --verbosity 1` dumps all four databases without errors Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/314
2026-03-27 16:59:58 -07:00
backend: "pg.tail8d86e.ts.net:5432" # PostgreSQL (blumeops-pg)
- port: 5433
backend: "immich-pg.tail8d86e.ts.net:5432" # PostgreSQL (immich-pg)
- port: "{{ sifaka_node_exporter_port }}"
backend: "sifaka:{{ sifaka_node_exporter_port }}" # Sifaka node_exporter
- port: "{{ sifaka_smartctl_exporter_port }}"
backend: "sifaka:{{ sifaka_smartctl_exporter_port }}" # Sifaka smartctl_exporter