Commit graph

71 commits

Author SHA1 Message Date
a87c997ee1 Expose Forgejo publicly at forge.eblu.me (#278)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m28s
## Summary

Expose Forgejo publicly at `forge.eblu.me` via the Fly.io reverse proxy — the first dynamic, authenticated public-facing service.

- **Forgejo hardening:** Domain changed to forge.eblu.me, SSH stays on forge.ops.eblu.me, reverse proxy trust headers configured, local registration locked to external-only (Authentik SSO)
- **Tailscale Ingress:** ExternalName Service + Ingress in tailscale-operator creates forge.tail8d86e.ts.net endpoint
- **Fly.io proxy:** nginx server block with rate-limited auth endpoints (3r/s), fail2ban with custom nginx-deny action, security headers, /swagger blocked, WebSocket support, 512m body limit
- **Authentik:** OAuth callback updated to forge.eblu.me
- **DNS/TLS:** CNAME record in Pulumi, cert in fly-setup
- **Rename:** ~29 files updated from forge.ops.eblu.me to forge.eblu.me (HTTPS refs only; SSH, container builds, and Caddy table kept as-is)
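
As a rough illustration of the Tailscale Ingress bullet above, an ExternalName Service fronted by a Tailscale Ingress might look like the sketch below. This is not the manifest from the PR: the namespace, the `externalName` target, the port, and the tag annotation are all assumptions, and only the resulting `forge` hostname comes from the description.

```yaml
# Sketch only: namespace, externalName target, port, and tags are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: forge
  namespace: tailscale              # assumed namespace
spec:
  type: ExternalName
  externalName: forge.internal.example   # placeholder for the host that serves Forgejo
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: forge
  namespace: tailscale              # assumed namespace
  annotations:
    # Assumed: opts the endpoint in to the Fly.io proxy ACL.
    tailscale.com/tags: "tag:k8s,tag:flyio-target"
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: forge
      port:
        number: 3000                # assumed Forgejo HTTP port
  tls:
    - hosts:
        - forge                     # becomes forge.<tailnet>.ts.net
```

The first label under `tls.hosts` is what the operator uses as the tailnet hostname, which is how a name like `forge.tail8d86e.ts.net` would come about.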

## Deployment Order

1. `mise run provision-indri -- --tags forgejo` (config changes)
2. Verify forge.ops.eblu.me still works
3. `argocd app set tailscale-operator --revision feature/forge-public && argocd app sync tailscale-operator`
4. Verify `curl https://forge.tail8d86e.ts.net`
5. `cd fly && fly deploy`
6. Verify pre-DNS: `curl -H "Host: forge.eblu.me" https://blumeops-proxy.fly.dev/`
7. `fly certs add forge.eblu.me -a blumeops-proxy`
8. `argocd app set authentik --revision feature/forge-public && argocd app sync authentik`
9. `mise run dns-preview && mise run dns-up`
10. Full verification (see below)
11. Rehearse `mise run fly-shutoff`
12. After merge: reset ArgoCD revisions to main, re-sync

## Verification Checklist

- [ ] forge.eblu.me loads, shows public repos
- [ ] forge.ops.eblu.me still works from tailnet
- [ ] SSH clone via forge.ops.eblu.me:2222 works
- [ ] HTTPS clone via forge.eblu.me works
- [ ] UI shows forge.eblu.me for HTTPS clone, forge.ops.eblu.me for SSH
- [ ] /swagger returns 403
- [ ] Rapid login attempts trigger 429 rate limit
- [ ] fail2ban bans after 5 failed logins in 10 minutes
- [ ] ArgoCD can still sync (SSH unaffected)
- [ ] `mise run fly-shutoff` stops all public traffic
- [ ] `mise run services-check` passes

Reviewed-on: #278
2026-03-03 08:40:41 -08:00
Forgejo Actions
0f79c61c42 Update docs release to v1.12.1
- Built changelog from towncrier fragments

[skip ci]
2026-03-02 18:17:07 -08:00
Forgejo Actions
847e47eaf3 Update docs release to v1.12.0
- Built changelog from towncrier fragments

[skip ci]
2026-03-01 17:24:09 -08:00
Forgejo Actions
fa223f8e3b Update docs release to v1.11.5
- Built changelog from towncrier fragments

[skip ci]
2026-02-26 07:56:02 -08:00
be3cdad1cb Add HA for CV and Docs: zero-downtime deploys (#273)
## Summary
- Set `replicas: 2` with `maxUnavailable: 0` / `maxSurge: 1` on CV and Docs deployments so rolling updates never drop below 2 ready pods
- Add PodDisruptionBudgets (`minAvailable: 1`) to protect against node drains and cluster maintenance
- Add Fly.io cache purge step to `cv-deploy.yaml` workflow (docs already had this) so CV deploys don't serve stale cached content
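
A minimal sketch of what the first two bullets could look like for the CV service, assuming a Deployment named `cv` in the `cv` namespace. The `app: cv` label and the container image are placeholders, not values taken from the repository.

```yaml
# Sketch only: labels and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cv
  namespace: cv
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take an old pod down before its replacement is ready
      maxSurge: 1         # allow one extra pod during the rollout
  selector:
    matchLabels:
      app: cv
  template:
    metadata:
      labels:
        app: cv
    spec:
      containers:
        - name: cv
          image: registry.example/cv:placeholder
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cv
  namespace: cv
spec:
  minAvailable: 1         # voluntary evictions must leave at least one pod running
  selector:
    matchLabels:
      app: cv
```

With `maxUnavailable: 0`, the rollout can only proceed by surging to a third pod, waiting for it to become ready, and then removing an old one. The PodDisruptionBudget covers the separate case of voluntary evictions such as node drains.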

## Deployment and Testing
- [ ] `argocd app diff cv` / `argocd app diff docs` from branch
- [ ] Deploy from branch: `argocd app set cv --revision feature/ha-cv-docs-zero-downtime && argocd app sync cv`
- [ ] Verify 2 pods running: `kubectl get pods -n cv --context=minikube-indri`
- [ ] Test rolling restart: `kubectl rollout restart deployment/cv -n cv --context=minikube-indri`
- [ ] During rollout, confirm continuous availability via `curl -I https://cv.eblu.me`
- [ ] After merge: reset ArgoCD to main, re-sync both apps

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/273
2026-02-26 07:53:21 -08:00
Forgejo Actions
4736c7e9bd Update docs release to v1.11.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-25 07:04:23 -08:00
9b44a8ec51 Add kustomize images: and configMapGenerator: across services (#264)
## Summary

- Move hardcoded image tags to kustomization.yaml `images:` transformer across **22 services** — image names in manifests become version-agnostic templates, with tags centralized in one place per service
- Replace hand-written ConfigMap manifests with `configMapGenerator:` in **12 services** — config data extracted to standalone files, generated ConfigMaps include content hashes that trigger automatic pod rollouts on changes
- Create new `kustomization.yaml` for **forgejo-runner** and **nvidia-device-plugin** (switches ArgoCD from directory mode to kustomize mode, rendered output identical)

### Services modified

**Images only (8):** cv, devpi, docs, kube-state-metrics, miniflux, navidrome, teslamate, torrent

**Images + configMapGenerator (10):** alloy-k8s, forgejo-runner, frigate, grafana, homepage, kiwix, loki, mosquitto, ntfy, prometheus

**Images only, no configMapGenerator (4):** authentik (skip blueprints — special YAML tags), tailscale-operator-base (Deployment only, CRD image fields left as-is)

**Skipped entirely (6):** argocd (remote upstream), databases (no image fields), external-secrets, grafana-config (cross-kustomization dashboards), immich (Helm-managed), 1password-connect/cloudnative-pg (no kustomization.yaml)

### What changes at deploy time

- **images:** — no functional diff; `kustomize build` renders the same manifests as before, with the tags now filled in by the transformer
- **configMapGenerator:** — ConfigMap names gain hash suffixes (e.g., `prometheus-config` → `prometheus-config-6f42fhctcb`) and all Deployment/StatefulSet/DaemonSet references are updated automatically. Pods will restart once per service on first sync due to the name change
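
For illustration, a `kustomization.yaml` combining both features might look roughly like this. The resource file name, image name, tag, and config file name are assumptions; only the `prometheus-config` ConfigMap name comes from the description above.

```yaml
# Sketch only: file names, image name, and tag are placeholders.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml           # references the image without a pinned tag
images:
  - name: prometheus          # must match the image name used in the manifest
    newTag: vX.Y.Z            # the tag is set here, in one place per service
configMapGenerator:
  - name: prometheus-config   # rendered as prometheus-config-<content hash>
    files:
      - prometheus.yml        # config data kept in a standalone file
```

Because the generated name includes a hash of the file contents, any edit to the config file yields a new ConfigMap name, and kustomize rewrites the workload references to match. That change to the pod template is what triggers the rollout.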

## Test plan

- [x] `kubectl kustomize` builds all 30 service directories successfully
- [x] Image tags verified in rendered output for all modified services
- [x] ConfigMap hash suffixes verified in rendered output
- [x] ConfigMap references in Deployments/StatefulSets confirmed to use hashed names
- [x] All pre-commit hooks pass (yamllint, shellcheck, prettier, etc.)
- [ ] `argocd app diff` each service to confirm only expected ConfigMap name changes
- [ ] Deploy from branch starting with a low-risk service (e.g., mosquitto)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/264
2026-02-24 14:25:19 -08:00
Forgejo Actions
2f78d180e8 Update docs release to v1.11.3
- Built changelog from towncrier fragments

[skip ci]
2026-02-23 21:04:33 -08:00
Forgejo Actions
dda7d719b3 Update docs release to v1.11.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-22 17:52:05 -08:00
Forgejo Actions
c21cf54847 Update docs release to v1.11.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-22 10:21:19 -08:00
Forgejo Actions
627caeb61f Update docs release to v1.11.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-22 09:16:00 -08:00
a72a0d8e8e Update all container images to new upstream-version tagging scheme (#238)
## Summary
- Updates all 15 container image references across 14 ArgoCD manifest files
- Migrates from old internal `vX.Y.Z` tags to new `v<upstream-version>-<sha>` format
- Covers: authentik, cv, devpi, forgejo-runner, homepage, kiwix-serve, kubectl, miniflux, navidrome, ntfy, quartz, teslamate, transmission

## Deployment and Testing
- [ ] Sync all ArgoCD apps on branch revision
- [ ] Verify all services come up healthy
- [ ] Merge and re-sync on main
- [ ] Clean up old-style tags from zot registry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/238
2026-02-21 15:58:11 -08:00
Forgejo Actions
18f1ac61fc Update docs release to v1.10.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-19 20:45:43 -08:00
Forgejo Actions
530460171a Update docs release to v1.9.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-17 07:30:39 -08:00
Forgejo Actions
8a48171acf Update docs release to v1.9.3
- Built changelog from towncrier fragments

[skip ci]
2026-02-16 21:25:47 -08:00
Forgejo Actions
994bed0693 Update docs release to v1.9.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-16 15:51:12 -08:00
Forgejo Actions
26c1ff5ce6 Update docs release to v1.9.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-15 07:43:00 -08:00
Forgejo Actions
b2b5879e3c Update docs release to v1.9.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-14 21:32:27 -08:00
04c7f3c45a Deploy Frigate NVR stack with Mosquitto, Ntfy, and frigate-notify (#190)
## Summary

Deploy a cloud-free NVR stack for the GableCam (Reolink Elite Floodlight at 192.168.1.159):

- **Mosquitto** — shared MQTT broker in `mqtt` namespace (cluster-internal, no auth)
- **Ntfy** — self-hosted push notifications in `ntfy` namespace, exposed at `ntfy.tail8d86e.ts.net` / `ntfy.ops.eblu.me`
- **Frigate** — NVR with GableCam via HTTP-FLV, ONNX CPU detection, NFS recordings on sifaka, exposed at `nvr.tail8d86e.ts.net` / `nvr.ops.eblu.me`
- **frigate-notify** — bridges Frigate detection events (person, car, dog, cat) to Ntfy alerts via MQTT

Also includes:
- Prometheus scrape target for Frigate metrics
- Grafana dashboard for Frigate (status, inference speed, FPS, CPU/memory, storage)
- Caddy reverse proxy entries for `nvr.ops.eblu.me` and `ntfy.ops.eblu.me`
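
To make the Frigate pieces above more concrete, a config fragment along these lines would connect Frigate to the shared broker and the camera. It is a sketch under several assumptions: the Mosquitto Service DNS name, the camera key `gablecam`, and the HTTP-FLV URL shape (taken from the pattern commonly used for Reolink cameras) are not confirmed by the PR, and detector model settings are omitted.

```yaml
# Sketch only: Service DNS name, camera key, and stream URL are assumptions.
mqtt:
  host: mosquitto.mqtt.svc.cluster.local   # assumed Service name in the mqtt namespace
  port: 1883
detectors:
  onnx:
    type: onnx                             # model settings omitted here
cameras:
  gablecam:
    ffmpeg:
      inputs:
        # Credentials are read from environment variables rather than written inline.
        - path: "http://192.168.1.159/flv?port=1935&app=bcs&stream=channel0_main.bcs&user={FRIGATE_RTSP_USER}&password={FRIGATE_RTSP_PASSWORD}"
          input_args: preset-http-reolink
          roles:
            - detect
            - record
objects:
  track:
    - person
    - car
    - dog
    - cat
```

The tracked object list mirrors the four labels that frigate-notify forwards to Ntfy.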

## Prerequisites

- [ ] Create NFS share `frigate` on sifaka (`/volume1/frigate`, RW for indri)
- [ ] Create 1Password item "Reolink Floodlight Camera" in `blumeops` vault with `username` and `password` fields

## Deployment (after merge)

```bash
argocd app sync apps
argocd app sync mosquitto
argocd app sync ntfy
argocd app sync frigate
argocd app sync grafana-config
argocd app sync prometheus
mise run provision-indri -- --tags caddy
mise run services-check
```

## Verification

- [ ] Mosquitto pod running, accepting connections on 1883
- [ ] Ntfy web UI accessible at `ntfy.ops.eblu.me`
- [ ] Frigate web UI at `nvr.ops.eblu.me` showing GableCam live feed
- [ ] Object detection working (ONNX, person/car/dog/cat)
- [ ] Recordings appearing in NFS share on sifaka
- [ ] frigate-notify sending detection alerts to Ntfy
- [ ] Prometheus scraping Frigate metrics
- [ ] Grafana dashboard showing Frigate data

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/190
2026-02-14 21:27:44 -08:00
Forgejo Actions
02b1397f1a Update docs release to v1.8.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-13 10:36:04 -08:00
48ce5b4120 Recategorize homepage into Content and Misc groups (#179)
## Summary
- Replace the three homepage groups (Apps, Observability, Infrastructure) with two cleaner groups
- **Content**: Immich, Kiwix, Miniflux, DJ, Grafana
- **Misc**: CV, TeslaMate, Transmission, Docs, Prometheus, PyPI

## Deployment and Testing
- [ ] Sync affected ingresses via ArgoCD (all 11 services)
- [ ] Verify homepage shows the two new groups correctly

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/179
2026-02-13 09:09:22 -08:00
Forgejo Actions
e21277ae83 Update docs release to v1.8.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 19:20:27 -08:00
Forgejo Actions
70d8881959 Update docs release to v1.7.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 14:13:12 -08:00
Forgejo Actions
200be39492 Update docs release to v1.7.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 11:46:38 -08:00
Forgejo Actions
a800bdc8b9 Update docs release to v1.6.9
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:28:40 -08:00
Forgejo Actions
b36b30ef7a Update docs release to v1.6.8
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:12:50 -08:00
Forgejo Actions
0528a6f712 Update docs release to v1.6.7
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 18:07:11 -08:00
Forgejo Actions
9e0487b523 Update docs release to v1.6.6
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 17:57:59 -08:00
Forgejo Actions
996afbcf6f Update docs release to v1.6.5
[skip ci]
2026-02-11 17:10:29 -08:00
Forgejo Actions
6ce03df819 Update docs release to v1.6.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 01:01:23 +00:00
Forgejo Actions
e5d1e795e0 Update docs release to v1.6.3
[skip ci]
2026-02-12 00:46:35 +00:00
Forgejo Actions
a75089d8ef Update docs release to v1.6.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 00:35:02 +00:00
Forgejo Actions
362ae22ab7 Update docs release to v1.6.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:37:34 +00:00
Forgejo Actions
eca01a9546 Update docs release to v1.6.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:33:57 +00:00
Forgejo Actions
ab6661f5dd Update docs release to v1.5.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 20:17:12 +00:00
Forgejo Actions
a106f92c38 Update docs release to v1.5.3
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 15:53:49 +00:00
Forgejo Actions
92a1081302 Update docs release to v1.5.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-09 15:30:21 +00:00
e6cf7e47e0 Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m8s
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly
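
As a sketch of the opt-in annotation described above, an Ingress for the docs service might carry the tags like this. The namespace, backend Service name, and port are assumptions. The annotation value is the one quoted in the summary.

```yaml
# Sketch only: namespace, Service name, and port are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: docs
  namespace: docs
  annotations:
    # Adding tag:flyio-target is what makes this endpoint reachable by the Fly.io proxy.
    tailscale.com/tags: "tag:k8s,tag:flyio-target"
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: docs
      port:
        number: 80
  tls:
    - hosts:
        - docs
```

An Ingress without `tag:flyio-target` would stay out of reach of the proxy once the ACL rule stops granting access to `tag:k8s` and `tag:homelab`.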

## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.

## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards

## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
Forgejo Actions
c8d0af6644 Update docs release to v1.5.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 18:06:46 +00:00
Forgejo Actions
c46d55060d Update docs release to v1.5.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 10:37:30 +00:00
64a78422b1 Add Fly.io public reverse proxy for docs.eblu.me (#120)
Some checks failed
Deploy Fly.io Proxy / deploy (push) Failing after 9s
## Summary

- Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale
- First service exposed: `docs.eblu.me` — the Quartz static docs site
- Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME
- Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow

## Key details

- Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed
- Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts
- nginx caches aggressively for the static site; health check is on the default_server block
- ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only
- DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev`

## Test plan

- [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok`
- [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status`
- [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert
- [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected)
- [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120
2026-02-08 02:36:19 -08:00
Forgejo Actions
11c76d4768 Update docs release to v1.4.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 05:45:40 +00:00
Forgejo Actions
ab7efd8c1c Update docs release to v1.4.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 05:27:23 +00:00
Forgejo Actions
3f5017f732 Update docs release to v1.4.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-08 05:03:34 +00:00
Forgejo Actions
808bc507d8 Update docs release to v1.3.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-05 01:22:10 +00:00
Forgejo Actions
a03a9faaad Update docs release to v1.3.3
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 22:40:18 +00:00
Forgejo Actions
e15caec898 Update docs release to v1.3.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 16:47:27 +00:00
Forgejo Actions
4aeade1543 Update docs release to v1.3.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 16:26:24 +00:00
Forgejo Actions
1835e3e80e Update docs release to v1.3.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 16:14:08 +00:00
Forgejo Actions
e405a48881 Update docs release to v1.2.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-04 05:18:37 +00:00