Commit graph

152 commits

Author SHA1 Message Date
ee141edeb0 Fix frigate-notify config structure
Use MQTT-only event collection (disable webapi), fix ntfy alert
config nesting to match frigate-notify's expected format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 13:41:18 -08:00
0a871a40b4 Switch go2rtc streams from HTTP-FLV to RTSP
Camera had HTTP/RTMP disabled. RTSP is the Frigate-recommended
protocol for ReoLink cameras.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 13:37:25 -08:00
74a572084c Switch Frigate detector from ONNX to CPU (TFLite)
ONNX detector was crashing due to missing model path config.
CPU/TFLite works out of the box on ARM64 and is sufficient for
single-camera detection of large objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 13:30:59 -08:00
efc71c9f22 Add upstream relay config to ntfy for instant iOS push notifications
Configures ntfy to forward poll requests through ntfy.sh for APNs
delivery. Without this, iOS delays notifications by 20-30+ minutes.
Free tier allows 250 messages/day (no account needed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 13:24:56 -08:00
15c932922d Update homepage layout for new group structure
Replace Misc group with Infrastructure and Services in the homepage
layout configuration to match the reorganized ingress annotations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 12:55:20 -08:00
7b17729085 Address PR #190 review feedback
- Add bird to tracked objects (catches escaped chickens/ducks)
- Add DHCP reservation comment for GableCam IP
- Remove explicit detect dimensions (Frigate auto-detects from stream)
- Reorganize homepage groups: ArgoCD/Prometheus/PyPI to Infrastructure,
  CV/Docs/TeslaMate/Transmission to Services

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 12:53:38 -08:00
d0360c1585 Deploy Frigate NVR stack with Mosquitto, Ntfy, and frigate-notify
Add four new services for cloud-free camera recording and alerting:
- Mosquitto MQTT broker (shared service in mqtt namespace)
- Ntfy push notifications (tailnet-accessible)
- Frigate NVR with GableCam via HTTP-FLV, ONNX detection, NFS recordings
- frigate-notify bridging detection events to Ntfy

Also adds Prometheus scrape target, Grafana dashboard, and Caddy
reverse proxy entries for nvr.ops.eblu.me and ntfy.ops.eblu.me.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 12:39:02 -08:00
b77ae19f20 Fix 1Password Connect credentials for chart 2.3.0
Chart 2.3.0 mounts credentials as a file with standard k8s base64
encoding. The old double-encoding workaround (credentials-base64 in
stringData) now produces invalid JSON. Use raw JSON (credentials-file)
instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:30:45 -08:00
8f4708e26f Fix navidrome image tag: remove v prefix (0.60.3 not v0.60.3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:23:12 -08:00
b3747f6c95 Tier 1 version bumps (#186)
All checks were successful
Build Container / build (push) Successful in 8s
## Summary

Audit and upgrade of all deployed images, helm charts, and custom container Dockerfiles to latest stable versions. This PR covers Tier 1 (low-risk minor/patch bumps only).

### Upstream images
| Image | Old | New |
|-------|-----|-----|
| kube-state-metrics | v2.13.0 | v2.18.0 |
| prometheus | v3.2.1 | v3.9.1 |
| loki | 3.3.2 | 3.6.5 |
| alloy | v1.5.1 | v1.13.1 |
| tailscale (proxy + operator) | v1.92.5 | v1.94.1 |
| navidrome | :latest | v0.60.3 (pinned) |

### Helm charts
| Chart | Old | New |
|-------|-----|-----|
| CloudNativePG | v0.27.0 | v0.27.1 |
| 1Password Connect | 2.2.1 | 2.3.0 |

### Custom containers (Dockerfiles updated, images not yet tagged)
| Container | Changes | New tag |
|-----------|---------|---------|
| miniflux | 2.2.16→2.2.17 (security), alpine 3.22 | v1.1.0 |
| kubectl | v1.34.1→v1.34.4, alpine 3.22 | v1.1.0 |
| kiwix-serve | alpine 3.22 | v1.1.0 |
| nettest | alpine 3.22 | v0.14.0 |
| transmission | alpine 3.22, pkg 4.0.6-r4 | v1.1.0 |

All custom containers verified with local `dagger call build`.

### Deferred to Tier 2 (separate PRs)
- Forgejo runner 6→12 (major version scheme change)
- Docker DinD 27→29
- Grafana chart 8→11 (repo migration)
- External Secrets 1→2 (breaking changes)
- Python 3.12→3.13, Elixir 1.18→1.19, Node 22→24
- Transmission 4.0.6→4.1.0 (not in Alpine yet)

## Deployment

After merge:
1. Tag custom containers: `mise run container-tag-and-release <name> <version>` for each
2. Wait for CI builds to complete
3. `argocd app sync apps` then sync individual apps, or let ArgoCD auto-detect

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/186
2026-02-13 17:16:37 -08:00
d5c00192d5 Configure DinD to use Zot as pull-through registry mirror (#183)
## Summary
- Add `daemon.json` with `registry-mirrors` to the forgejo-runner ConfigMap, pointing DinD at `http://host.minikube.internal:5050`
- Mount `daemon.json` into the DinD sidecar at `/etc/docker/daemon.json` via `subPath`
- Docker Hub pulls during Dagger CI builds will now route through Zot's pull-through cache, reducing bandwidth and avoiding rate limits

## Deployment and Testing
- [ ] `argocd app sync forgejo-runner`
- [ ] Exec into DinD container: `docker info` should show the registry mirror
- [ ] Trigger a workflow build and check Zot logs for cache hits

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/183
2026-02-13 12:36:03 -08:00
ba9b251759 Update forgejo-runner image to v3.2.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 12:16:52 -08:00
d0c18043b7 Revert forgejo-runner image to v3.1.0
v3.2.0 build failed (GitHub download timeout), rolling back to
working image while it rebuilds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 12:07:51 -08:00
fdd3f6483a Update forgejo-runner image to v3.2.0
All checks were successful
Build Container / build (push) Successful in 7m31s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:08:57 -08:00
Forgejo Actions
02b1397f1a Update docs release to v1.8.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-13 10:36:04 -08:00
0098ac37e0 Move non-secret runner env vars to deployment spec (#181)
## Summary
- Move FORGEJO_URL, RUNNER_NAME, and RUNNER_LABELS from ExternalSecret template to deployment env vars
- ExternalSecret now only contains the actual secret (RUNNER_TOKEN)
- Image version changes in RUNNER_LABELS now trigger automatic pod rollouts

## Deployment
1. Merge this PR
2. `argocd app sync forgejo-runner` — the deployment spec change will auto-roll the pod

No manual restart needed — that's the whole point :)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/181
2026-02-13 10:29:23 -08:00
52bbf88aa6 Update forgejo-runner image to v3.1.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:21:43 -08:00
4942dee182 Update homepage layout for new Content/Misc groups
Replace old Apps/Observability/Infrastructure layout entries with
Content and Misc to match the recategorized ingress annotations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 09:16:40 -08:00
ca6a845604 Move ArgoCD to Misc homepage group and rename ingress file
ArgoCD's tailscale ingress was missed in the recategorization (filed as
service-tailscale.yaml instead of ingress-tailscale.yaml). Fix the group
annotation and rename the file to match the convention used by all other
services.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 09:13:32 -08:00
48ce5b4120 Recategorize homepage into Content and Misc groups (#179)
## Summary
- Replace the three homepage groups (Apps, Observability, Infrastructure) with two cleaner groups
- **Content**: Immich, Kiwix, Miniflux, DJ, Grafana
- **Misc**: CV, TeslaMate, Transmission, Docs, Prometheus, PyPI

## Deployment and Testing
- [ ] Sync affected ingresses via ArgoCD (all 11 services)
- [ ] Verify homepage shows the two new groups correctly

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/179
2026-02-13 09:09:22 -08:00
Forgejo Actions
e21277ae83 Update docs release to v1.8.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 19:20:27 -08:00
9c789a1868 Fix cache hit rate on APM and Fly.io dashboards (#177)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m19s
## Summary
- Remove `match_all = true` from `flyio_nginx_cache_requests_total` in Alloy so the metric only counts requests that go through the proxy cache (excludes health checks with empty `cache_status`)
- Change dashboard queries from `rate(...[5m])` to `increase(...[$__range])` — aggregates over the full dashboard time window instead of a 5-minute sliding window, giving meaningful ratios for low-traffic static sites
- Add null/NaN value mapping to show "No traffic" in neutral color instead of blank/red

## Root cause
Health check requests from Fly.io hit the default nginx server block (no `proxy_cache`), producing entries with empty `upstream_cache_status`. With `match_all = true`, these were counted in the cache metric, diluting the Fly.io dashboard ratio. For APM dashboards, `rate()[5m]` on low-traffic sites with 24h cache validity almost always returns either all-HITs (100%) or no data (blank → red background).

## Deployment
- Fly.io proxy redeploy needed for Alloy config change
- ArgoCD sync for dashboard ConfigMap changes

## Test plan
- [ ] Redeploy Fly.io proxy
- [ ] Sync grafana-config in ArgoCD
- [ ] Verify CV APM cache hit ratio shows a real percentage (not 100%)
- [ ] Verify Docs APM shows "No traffic" in neutral color when idle, real ratio when visited
- [ ] Verify Fly.io proxy dashboard cache ratio excludes health checks

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/177
2026-02-12 18:40:48 -08:00
9717863f65 Update CV release to v1.0.3, add X-Clacks-Overhead header (#176)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m5s
## Summary
- Update CV release URL from v1.0.2 to v1.0.3
- Add `X-Clacks-Overhead: GNU Terry Pratchett` header to both `docs.eblu.me` and `cv.eblu.me` server blocks in the Fly.io proxy nginx config

## Deployment and Testing
- [ ] Sync CV app: `argocd app sync cv`
- [ ] Verify CV is serving v1.0.3 content
- [ ] Deploy fly proxy (workflow or `mise run fly-deploy`)
- [ ] Verify header: `curl -sI https://docs.eblu.me | grep -i clacks`
- [ ] Verify header: `curl -sI https://cv.eblu.me | grep -i clacks`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/176
2026-02-12 17:08:22 -08:00
ed5c9c9b48 Update CV release to v1.0.2 (#175)
## Summary
- Update `CV_RELEASE_URL` in cv deployment from v1.0.1 to v1.0.2

## Deployment and Testing
- [ ] `argocd app sync cv` after merge
- [ ] Verify cv.eblu.me serves updated content

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/175
2026-02-12 16:18:55 -08:00
Forgejo Actions
70d8881959 Update docs release to v1.7.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 14:13:12 -08:00
7dc03c0af1 Add CV to services-check, update homepage link (#174)
## Summary
- Add CV to services-check (tailnet endpoint + public cv.eblu.me)
- Update CV homepage annotation to point to cv.eblu.me instead of cv.ops.eblu.me

## Deployment and Testing
- [ ] `argocd app sync cv` (homepage link change)
- [ ] `mise run services-check` passes

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/174
2026-02-12 14:10:03 -08:00
df372fccb6 Expose CV publicly at cv.eblu.me (#173)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m57s
## Summary
- Add nginx server block for `cv.eblu.me` (static site, same pattern as docs)
- Add DNS CNAME record in Pulumi (`cv.eblu.me` → `blumeops-proxy.fly.dev`)
- Add `cv.eblu.me` cert to `fly-setup` mise task
- Tag CV Tailscale ingress with `tag:flyio-target` for ACL access
- Remove `/_error` test endpoint from docs proxy

## Deployment and Testing
- [ ] `argocd app set cv --revision cv/public-cv-eblu-me && argocd app sync cv`
- [ ] `fly certs add cv.eblu.me -a blumeops-proxy`
- [ ] `mise run fly-deploy`
- [ ] Verify proxy: `curl -I -H "Host: cv.eblu.me" https://blumeops-proxy.fly.dev/`
- [ ] `mise run dns-preview` then `mise run dns-up`
- [ ] Verify live: `curl -I https://cv.eblu.me`
- [ ] Merge, then `argocd app set cv --revision main && argocd app sync cv`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/173
2026-02-12 14:05:00 -08:00
a68542a602 Update CV release to v1.0.1 (#172)
## Summary
- Update `CV_RELEASE_URL` from v0.1.0 to v1.0.1 in the CV deployment manifest

## Deployment and Testing
- [ ] `argocd app sync cv` after merge
- [ ] Verify cv.ops.eblu.me serves updated resume

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/172
2026-02-12 13:38:05 -08:00
Forgejo Actions
200be39492 Update docs release to v1.7.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 11:46:38 -08:00
01e19023ee Add CV/resume web app at cv.ops.eblu.me (#169)
## Summary
- nginx container (`containers/cv/`) downloads and serves a content tarball at startup (same pattern as quartz)
- ArgoCD app + k8s manifests (deployment, service, Tailscale ingress)
- Caddy route for `cv.ops.eblu.me`
- Deploy workflow: resolves "latest" or specific version from Forgejo packages, updates deployment, syncs ArgoCD
- Content is built and released from the separate [cv repo](https://forge.ops.eblu.me/eblume/cv)

## Deployment steps (after merge)
1. `mise run container-tag-and-release cv v1.0.0`
2. Run "Release CV" workflow in cv repo (SPECIFIC_VERSION `v0.1.0`)
3. Run "Deploy CV" workflow in blumeops (default: latest)
4. `mise run provision-indri -- --tags caddy`
5. Verify at `https://cv.ops.eblu.me/`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/169
2026-02-12 11:09:41 -08:00
Forgejo Actions
a800bdc8b9 Update docs release to v1.6.9
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:28:40 -08:00
Forgejo Actions
b36b30ef7a Update docs release to v1.6.8
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:12:50 -08:00
Forgejo Actions
0528a6f712 Update docs release to v1.6.7
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 18:07:11 -08:00
Forgejo Actions
9e0487b523 Update docs release to v1.6.6
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 17:57:59 -08:00
20a25557d6 Bump runner image to v3.0.2 (restore Docker CLI)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 17:53:04 -08:00
0006e6bf17 Bump runner image to v3.0.1 (restore Node.js)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 17:38:45 -08:00
3d84483513 Update runner job image to forgejo-runner:v3.0.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 17:27:50 -08:00
95364dcb48 Simplify runner image (Dagger Phase 3) (#162)
All checks were successful
Build Container / build (push) Successful in 1m13s
## Summary

With Phases 1 and 2 complete, the runner image no longer needs most of its bundled tools. This PR strips it down and adds what was missing.

**Removed** (now inside Dagger containers):
- Node.js 24.x
- Docker CLI + buildx plugin
- skopeo
- gnupg, lsb-release, xz-utils

**Added:**
- `tzdata` — fixes the TZ env var (#159, #160, #161) so `TZ=America/Los_Angeles` actually works
- `flyctl` — was being installed from scratch every release

**Workflow changes:**
- Remove "Ensure Dagger CLI" bootstrap steps from both workflows (Dagger is in the image)
- Remove "Install flyctl" step from build-blumeops (flyctl is in the image)
- Remove job-level `TZ` from build-blumeops (moved to runner configmap `runner.envs`)
- Set `TZ: America/Los_Angeles` in runner configmap so all job containers inherit it

## Deployment

After merge:
1. Build and release the new runner image: `mise run container-release forgejo-runner v2.0.0`
2. Sync the runner: `argocd app sync forgejo-runner`
3. Verify: `kubectl -n forgejo-runner exec deploy/forgejo-runner -c runner -- date` (but the real test is running a docs release and checking the changelog date)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/162
2026-02-11 17:24:20 -08:00
Forgejo Actions
996afbcf6f Update docs release to v1.6.5
[skip ci]
2026-02-11 17:10:29 -08:00
Forgejo Actions
6ce03df819 Update docs release to v1.6.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 01:01:23 +00:00
2a04ab26b7 Mount host zoneinfo into runner for TZ support (#160)
## Summary

The `TZ=America/Los_Angeles` env var from #159 has no effect because the `forgejo/runner` image doesn't ship tzdata. Mount the node's `/usr/share/zoneinfo` into the container so the timezone database is available.

## Deployment

After merge, sync forgejo-runner and verify:
```
argocd app sync forgejo-runner
kubectl -n forgejo-runner exec deploy/forgejo-runner -c runner -- date
# Should show PST/PDT, not UTC
```

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/160
2026-02-11 16:57:11 -08:00
42ebc2b122 Fix Forgejo runner timezone (UTC -> America/Los_Angeles) (#159)
## Summary

- Set `TZ=America/Los_Angeles` on the Forgejo runner container

The runner pod defaults to UTC. When releases are cut in the evening PST, towncrier stamps changelog entries with tomorrow's date (e.g., v1.6.2 shows 2026-02-12 despite being released on the evening of Feb 11 PST).

## Deployment

After merge, sync the forgejo-runner ArgoCD app:
```
argocd app sync forgejo-runner
```

The runner pod will restart with the new timezone. Note: the v1.6.2 changelog entry will remain dated 2026-02-12; future entries will use PST dates, so dates may appear non-sequential once.

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/159
2026-02-11 16:53:41 -08:00
Forgejo Actions
e5d1e795e0 Update docs release to v1.6.3
[skip ci]
2026-02-12 00:46:35 +00:00
Forgejo Actions
a75089d8ef Update docs release to v1.6.2
- Built changelog from towncrier fragments

[skip ci]
2026-02-12 00:35:02 +00:00
1bc2b421a8 Adopt Dagger CI for container builds (Phase 1) (#156)
All checks were successful
Build Container / build (push) Successful in 13s
## Summary

- Add Dagger Python module (`.dagger/`) with `build` and `publish` functions for container images
- Replace Docker buildx + skopeo composite action with `dagger call publish` in `build-container.yaml`
- BuildKit's native push is compatible with Zot — **skopeo workaround eliminated**
- Add Dagger CLI (v0.19.11) to forgejo-runner Dockerfile, bump runner to v2.6.0
- Bootstrap step in workflow curl-installs dagger if not in runner (for first build on v2.5.1 runner)
- Delete old `.forgejo/actions/build-push-image/` composite action
- Add GPLv3 LICENSE

## Verified locally

- `dagger call build --src=. --container-name=nettest` — builds ✓
- `dagger call publish --src=. --container-name=nettest --version=dagger-test` — pushed to Zot ✓
- `dagger call build --src=. --container-name=forgejo-runner` — new runner image builds ✓
- Dagger CLI accessible inside built runner image ✓

## Deployment sequence (after merge)

1. `mise run container-tag-and-release forgejo-runner v2.6.0` — old runner bootstraps dagger via curl, builds new runner
2. `argocd app sync forgejo-runner` — runner restarts with v2.6.0 (dagger baked in)
3. `mise run container-tag-and-release nettest v0.13.0` — end-to-end test of new pipeline
4. `mise run container-list` — verify tags

## Not included (future phases)

- Phase 2: docs build + Forgejo packages migration
- Phase 3: runner simplification (remove skopeo, Node.js, etc.)
- Phase 4: future workflows

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/156
2026-02-11 15:38:31 -08:00
Forgejo Actions
362ae22ab7 Update docs release to v1.6.1
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:37:34 +00:00
Forgejo Actions
eca01a9546 Update docs release to v1.6.0
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 21:33:57 +00:00
Forgejo Actions
ab6661f5dd Update docs release to v1.5.4
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 20:17:12 +00:00
Forgejo Actions
a106f92c38 Update docs release to v1.5.3
- Built changelog from towncrier fragments

[skip ci]
2026-02-11 15:53:49 +00:00
f0ac04fb8a Bootstrap buildx: revert to docker build, bump runner to v2.5.1 (#148)
All checks were successful
Build Container / build (push) Successful in 1m56s
## Summary
- Temporarily revert composite action to `docker build` so we can build the runner image (chicken-and-egg: current runner v2.5.0 doesn't have buildx)
- Bump runner label to `v2.5.1` so after sync the new runner image (with buildx) gets used

## Deployment plan
1. Merge this PR
2. Tag `forgejo-runner-v2.5.1` — builds with legacy `docker build` (one last time)
3. Sync forgejo-runner in ArgoCD to pick up the v2.5.1 label
4. Follow-up PR: switch action back to `docker buildx build`
5. Tag `nettest-v0.12.0` to verify buildx works end-to-end

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/148
2026-02-10 21:17:14 -08:00