Plan verified and marked complete with all checklist items checked
(except IoT VLAN isolation which is a separate plan). Updated open
questions with resolved decisions. Updated changelog fragment to
reflect full scope of deployment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add four new services for cloud-free camera recording and alerting:
- Mosquitto MQTT broker (shared service in mqtt namespace)
- Ntfy push notifications (tailnet-accessible)
- Frigate NVR with GableCam via HTTP-FLV, ONNX detection, NFS recordings
- frigate-notify bridging detection events to Ntfy
Also adds Prometheus scrape target, Grafana dashboard, and Caddy
reverse proxy entries for nvr.ops.eblu.me and ntfy.ops.eblu.me.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reflect actual UX7 zone-based firewall UI, correct streaming port
(8096 not 443), note indri DHCP reservation, mark plan as completed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Abandon the UniFi Pulumi IaC approach after provider bugs caused a network outage (no-op update reset undeclared properties on the default LAN network)
- Remove untracked IaC artifacts (`pulumi/unifi/`, `mise-tasks/unifi-preview`, `mise-tasks/unifi-up`) locally
- Mark `add-unifi-pulumi-stack` plan as Abandoned with explanation
- Create new `segment-home-network` plan for manual three-network segmentation (Main/IoT/Guest) via UX7 web UI
- Rewrite UniFi reference card to remove all Pulumi/IaC references
- Update plan and how-to indexes
## Test plan
- [x] `docs-check-links` passes
- [x] `docs-check-index` passes
- [x] Pre-commit hooks pass
- [ ] Review segmentation plan for completeness before executing manually
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/189
## Summary
- Add new how-to guide (`connect-to-postgres.md`) with the `psql` command using `op read` for 1Password credentials
- Add "Database" section to the how-to index linking to the new guide
- Link the new guide from the PostgreSQL reference card's Related section
## Test plan
- [x] Verified `psql` connection works from gilbert using the documented command
- [ ] Review doc formatting and content
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/188
## Summary
- Replace `op item get --fields` with `op read` for secrets (matches playbook and CLAUDE.md guidance)
- Change `tags: [<role>]` to `tags: <role>` to match actual playbook style
- Remove redundant `listen:` from handler example, add `changed_when: true`
- Name handler after specific service (e.g. `Restart <service>`) to match real roles
- Add `last-reviewed: 2026-02-13` frontmatter
## Also noted (not fixed here)
Two other docs still use the old `op item get` pattern:
- `docs/how-to/troubleshooting.md:72` (ArgoCD login command)
- `docs/how-to/gandi-operations.md:35` (Gandi token export)
These can be fixed in their own review cycles.
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/185
## Summary
- Add `daemon.json` with `registry-mirrors` to the forgejo-runner ConfigMap, pointing DinD at `http://host.minikube.internal:5050`
- Mount `daemon.json` into the DinD sidecar at `/etc/docker/daemon.json` via `subPath`
- Docker Hub pulls during Dagger CI builds will now route through Zot's pull-through cache, reducing bandwidth and avoiding rate limits
## Deployment and Testing
- [ ] `argocd app sync forgejo-runner`
- [ ] Exec into DinD container: `docker info` should show the registry mirror
- [ ] Trigger a workflow build and check Zot logs for cache hits
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/183
## Summary
- Move FORGEJO_URL, RUNNER_NAME, and RUNNER_LABELS from ExternalSecret template to deployment env vars
- ExternalSecret now only contains the actual secret (RUNNER_TOKEN)
- Image version changes in RUNNER_LABELS now trigger automatic pod rollouts
## Deployment
1. Merge this PR
2. `argocd app sync forgejo-runner` — the deployment spec change will auto-roll the pod
No manual restart needed — that's the whole point :)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/181
## Summary
- Install yq in the forgejo-runner container image for structured YAML editing
- Replace fragile `sed` regex patterns with `yq` in `build-blumeops.yaml` and `cv-deploy.yaml` workflows
## Deployment
1. Merge this PR
2. Tag and release forgejo-runner v3.1.0: `mise run container-tag-and-release forgejo-runner v3.1.0`
3. Update runner label in `argocd/manifests/forgejo-runner/external-secret.yaml` from `v3.0.2` to `v3.1.0`
4. Sync the forgejo-runner app: `argocd app sync forgejo-runner`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/180
## Summary
- Replace the three homepage groups (Apps, Observability, Infrastructure) with two cleaner groups
- **Content**: Immich, Kiwix, Miniflux, DJ, Grafana
- **Misc**: CV, TeslaMate, Transmission, Docs, Prometheus, PyPI
## Deployment and Testing
- [ ] Sync affected ingresses via ArgoCD (all 11 services)
- [ ] Verify homepage shows the two new groups correctly
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/179
## Summary
- Remove `match_all = true` from `flyio_nginx_cache_requests_total` in Alloy so the metric only counts requests that go through the proxy cache (excludes health checks with empty `cache_status`)
- Change dashboard queries from `rate(...[5m])` to `increase(...[$__range])` — aggregates over the full dashboard time window instead of a 5-minute sliding window, giving meaningful ratios for low-traffic static sites
- Add null/NaN value mapping to show "No traffic" in neutral color instead of blank/red
## Root cause
Health check requests from Fly.io hit the default nginx server block (no `proxy_cache`), producing entries with empty `upstream_cache_status`. With `match_all = true`, these were counted in the cache metric, diluting the Fly.io dashboard ratio. For APM dashboards, `rate()[5m]` on low-traffic sites with 24h cache validity almost always returns either all-HITs (100%) or no data (blank → red background).
## Deployment
- Fly.io proxy redeploy needed for Alloy config change
- ArgoCD sync for dashboard ConfigMap changes
## Test plan
- [ ] Redeploy Fly.io proxy
- [ ] Sync grafana-config in ArgoCD
- [ ] Verify CV APM cache hit ratio shows a real percentage (not 100%)
- [ ] Verify Docs APM shows "No traffic" in neutral color when idle, real ratio when visited
- [ ] Verify Fly.io proxy dashboard cache ratio excludes health checks
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/177
## Summary
- How-to guide for creating release artifact workflows with Forgejo packages
- Changelog fragment for the multi-repo forgejo_actions_secrets Ansible role change
- Changelog fragment for the new docs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/170
## Summary
- nginx container (`containers/cv/`) downloads and serves a content tarball at startup (same pattern as quartz)
- ArgoCD app + k8s manifests (deployment, service, Tailscale ingress)
- Caddy route for `cv.ops.eblu.me`
- Deploy workflow: resolves "latest" or specific version from Forgejo packages, updates deployment, syncs ArgoCD
- Content is built and released from the separate [cv repo](https://forge.ops.eblu.me/eblume/cv)
## Deployment steps (after merge)
1. `mise run container-tag-and-release cv v1.0.0`
2. Run "Release CV" workflow in cv repo (SPECIFIC_VERSION `v0.1.0`)
3. Run "Deploy CV" workflow in blumeops (default: latest)
4. `mise run provision-indri -- --tags caddy`
5. Verify at `https://cv.ops.eblu.me/`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/169
## Summary
- The `build_changelog` Dagger container (`python:3.12-slim`) defaults to UTC, causing towncrier to stamp tomorrow's date when releases are cut in the evening PST.
- This is the root cause of the docs website (built via Dagger) showing Feb 12 while the repo CHANGELOG (built directly on the runner) correctly showed Feb 11.
- Fix: set `TZ=America/Los_Angeles` in the Dagger container before running towncrier.
## Verified
- `docker run --rm python:3.12-slim` → `towncrier _get_date()` returns `2026-02-12` (wrong)
- `docker run --rm -e TZ=America/Los_Angeles python:3.12-slim` → returns `2026-02-11` (correct)
## Test plan
- [ ] Merge, then trigger a build-blumeops release
- [ ] Verify the CHANGELOG date on https://docs.eblu.me/CHANGELOG matches the repo
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/167
## Summary
- Updated the "Configure Ingress" section to use the current ProxyGroup pattern (`proxy-group: "ingress"`, `defaultBackend`, `tls.hosts`)
- Replaced the old per-ingress proxy example that used `rules:` with `host:` (which breaks ProxyGroup routing)
- Added key points explaining why `defaultBackend` is required and what each annotation does
- Updated checklist to mention ProxyGroup
## Test plan
- [ ] Review rendered doc for accuracy against existing ingress manifests
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/166
## Summary
- Move `adopt-dagger-ci.md` to new `docs/how-to/plans/completed/` archive
- Update plan status to "Phases 1–3 complete" with all verification checklists checked
- Update runner description to reflect what actually stayed (Docker CLI, Node.js needed)
- Document known issue: changelog dates still show UTC
- Create completed plans index, update plans and how-to indexes
- Changelog fragment
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/165
## Summary
- Dagger shells out to the `docker` binary to provision its BuildKit engine container
- Phase 3 removed `docker-ce-cli`, breaking all `dagger call` invocations in CI
- This restores `docker-ce-cli` (without buildx/skopeo — those aren't needed)
## Test plan
- [ ] Build locally, release as v3.0.2, update manifest, sync
- [ ] Trigger docs build workflow and verify Dagger engine starts
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/164
## Summary
- Restores Node.js 20 LTS to the Forgejo runner job image
- `actions/checkout@v4` and other JavaScript Actions require `node` in the job container
- The Phase 3 simplification (PR #162) accidentally removed it, breaking all CI runs
## Changes
- `containers/forgejo-runner/Dockerfile`: Add `gnupg` (for nodesource GPG key) and Node.js 20 via nodesource
- Changelog fragment
## Test plan
- [ ] Merge, release as `forgejo-runner-v3.0.1`
- [ ] Update runner manifest to v3.0.1, sync, restart pod
- [ ] Trigger a workflow_dispatch and verify `actions/checkout` succeeds
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/163
## Summary
With Phases 1 and 2 complete, the runner image no longer needs most of its bundled tools. This PR strips it down and adds what was missing.
**Removed** (now inside Dagger containers):
- Node.js 24.x
- Docker CLI + buildx plugin
- skopeo
- gnupg, lsb-release, xz-utils
**Added:**
- `tzdata` — fixes the TZ env var (#159, #160, #161) so `TZ=America/Los_Angeles` actually works
- `flyctl` — was being installed from scratch every release
**Workflow changes:**
- Remove "Ensure Dagger CLI" bootstrap steps from both workflows (Dagger is in the image)
- Remove "Install flyctl" step from build-blumeops (flyctl is in the image)
- Remove job-level `TZ` from build-blumeops (moved to runner configmap `runner.envs`)
- Set `TZ: America/Los_Angeles` in runner configmap so all job containers inherit it
## Deployment
After merge:
1. Build and release the new runner image: `mise run container-release forgejo-runner v2.0.0`
2. Sync the runner: `argocd app sync forgejo-runner`
3. Verify: `kubectl -n forgejo-runner exec deploy/forgejo-runner -c runner -- date` (but the real test is running a docs release and checking the changelog date)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/162
## Summary
- Set `TZ=America/Los_Angeles` on the Forgejo runner container
The runner pod defaults to UTC. When releases are cut in the evening PST, towncrier stamps changelog entries with tomorrow's date (e.g., v1.6.2 shows 2026-02-12 despite being released on the evening of Feb 11 PST).
## Deployment
After merge, sync the forgejo-runner ArgoCD app:
```
argocd app sync forgejo-runner
```
The runner pod will restart with the new timezone. Note: the v1.6.2 changelog entry will remain dated 2026-02-12; future entries will use PST dates, so dates may appear non-sequential once.
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/159
## Summary
- Rename `date-modified` -> `modified` in all 80 docs and the `docs-check-frontmatter` task
Quartz's `CreatedModifiedDate` plugin recognizes `modified`, `lastmod`, `updated`, and `last-modified` — but not `date-modified`. The wrong field name caused Quartz to ignore frontmatter dates entirely and fall through to filesystem timestamps (UTC inside Dagger), showing Feb 12 on pages built late on Feb 11 PST.
## Test plan
- [x] `mise run docs-check-frontmatter` passes
- [ ] Kick off docs release after merge — verify rendered dates match frontmatter values
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/158
## Summary
Migrates the docs build pipeline to Dagger (Phase 2 of the Dagger CI adoption plan).
- **Backfill `date-modified` frontmatter** on all 80 docs — Dagger's `--src=.` excludes `.git`, so Quartz can't use git history for page dates. Frontmatter dates work with or without git.
- **New `docs-check-frontmatter` mise task + pre-commit hook** — validates all docs have `title`, `tags`, and `date-modified`
- **New Dagger functions** — `build_changelog` (towncrier in Python container) and `build_docs` (chains changelog → Quartz build in Node container, returns tarball)
- **Simplified CI workflow** — the ~44-line inline Quartz build (clone, npm ci, build, tar, cleanup) is replaced by `dagger call build-docs`. Changelog step remains local on the runner since towncrier needs to modify the host working tree for the git commit.
### Design decisions
- **Towncrier runs twice in CI**: once inside Dagger (for the docs tarball) and once on the runner (for the git commit). This is intentional — Dagger's directory export is additive and can't delete the consumed changelog fragments from the host.
- **Artifact hosting stays on Forgejo Releases** (not migrated to Forgejo Packages as the plan doc originally suggested). That migration can happen independently.
- **`date-modified` frontmatter** preserved even though `build_changelog` installs git — the git there is only for towncrier's `git add` call, not for history. The local iteration story (`dagger call build-docs --src=. --version=dev` with uncommitted changes) depends on frontmatter dates.
### Local iteration
```bash
dagger call build-docs --src=. --version=dev export --path=./docs-dev.tar.gz
tar tf docs-dev.tar.gz | head -20
```
## Deployment and Testing
- [x] `dagger call build-docs --src=. --version=dev` produces valid 1.1MB tarball (149 HTML pages)
- [x] Pre-commit hooks pass (including new `docs-check-frontmatter`)
- [ ] Full `workflow_dispatch` run after merge
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/157
## Summary
- Migrate from deprecated Todoist REST API v2 (`410 Gone`) to new unified API v1
- Add cursor-based pagination for project and task listing endpoints
- Switch 1Password credential retrieval from `op item get --fields` to `op read`
## Testing
- [x] `mise run blumeops-tasks` returns all 9 tasks successfully
- [x] Pre-commit hooks pass
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/155