Commit graph

56 commits

Author SHA1 Message Date
6f532ca607 Add Authentik deployment plan document
Captures the full deployment plan for replacing Dex with Authentik as the
BlumeOps identity provider, covering Nix containers, kustomize manifests,
cross-cluster CNPG database, networking, and monitoring across 6 phases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 07:05:45 -08:00
d21798b1f3 Document Dex OIDC and add services-check integration (#223)
## Summary
- Create Dex reference card (`docs/reference/services/dex.md`) with quick reference, architecture, identity source, storage, OIDC clients, secrets, and endpoints
- Write federated login explanation article (`docs/explanation/federated-login.md`) covering the Dex + Forgejo two-layer auth model, login flow, and break-glass access
- Add Dex to `services-check` (HTTP health endpoint + k3s pod check)
- Update Grafana docs with new Authentication section documenting SSO via Dex
- Update Forgejo docs with OAuth2 Provider section documenting its role as upstream identity source
- Add Dex to ringtail workloads table and reference service index
- Move `adopt-oidc-provider` plan to `completed/` with final design reflecting actual implementation

## Test plan
- [ ] `mise run services-check` passes (includes new Dex checks)
- [ ] `docs-check-links` passes (all wiki-links resolve)
- [ ] `docs-check-index` passes (new docs are indexed)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/223
2026-02-19 20:44:23 -08:00
b876e39981 Replace Homepage Helm chart with kustomize manifests and custom Dockerfile (#221)
## Summary
- Replace third-party Helm chart (jameswynn/homepage v2.1.0, pinned at app v1.2.0) with plain kustomize manifests and a custom Dockerfile building from forge mirror at v1.10.1
- Adds Dockerfile (`containers/homepage/`) with multi-stage build (node:22-slim builder, node:22-alpine runtime)
- Creates kustomize manifests: Deployment, Service, ConfigMap (6 config files), ServiceAccount, ClusterRole, ClusterRoleBinding
- Keeps existing ingress-tailscale.yaml and all 6 ExternalSecret resources unchanged
- Updates ArgoCD app definition from multi-source Helm to single directory source

## Prerequisite
- Homepage source mirrored at forge.ops.eblu.me/eblume/homepage.git 
- Container must be built and pushed before syncing: `mise run container-release homepage v1.10.1`

## Deployment and Testing
- [ ] Build and push container image: `mise run container-release homepage v1.10.1`
- [ ] Branch-test via ArgoCD: `argocd app set homepage --revision feature/homepage-kustomize && argocd app sync homepage`
- [ ] Verify dashboard loads at go.ops.eblu.me / go.tail8d86e.ts.net
- [ ] Verify k8s autodiscovery works (services appear on dashboard)
- [ ] Verify widgets load (weather, Forgejo, Jellyfin, etc.)
- [ ] After merge: `argocd app set homepage --revision main && argocd app sync homepage`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/221
2026-02-19 18:29:19 -08:00
869f6bd20d Review: update-documentation doc (#220)
## Summary
- Add missing workflow step 8: Fly.io proxy cache purge after deploy
- Remove misleading "Directory" column from changelog fragment types table (all fragments use flat `<name>.<type>.md` pattern, not subdirectories)
- Stamp `last-reviewed: 2026-02-19`

## Review notes
Verified all claims against actual workflow YAML, Dagger pipeline, ArgoCD manifests, towncrier config, and Quartz config files. Everything else checks out.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/220
2026-02-19 17:40:05 -08:00
291fff345c Fix services-check and update docs for Frigate migration to ringtail (#218)
## Summary
- Move mosquitto, ntfy, frigate, frigate-notify pod checks from `minikube-indri` to `k3s-ringtail` context in `services-check`
- Add `nvidia-device-plugin` pod check for ringtail k3s
- Rename "Kubernetes pods" section to "Indri minikube pods" for clarity
- Update 8 documentation files to reflect the migration completed in PRs #216/#217

## Files Changed
| File | Change |
|------|--------|
| `mise-tasks/services-check` | Move 4 pod checks to k3s-ringtail, add nvidia-device-plugin |
| `docs/reference/services/frigate.md` | Image→tensorrt, detector→ONNX/CUDA, shm→512Mi |
| `docs/reference/infrastructure/ringtail.md` | List actual k3s workloads |
| `docs/reference/infrastructure/indri.md` | Note frigate migration |
| `docs/explanation/architecture.md` | Add ringtail to diagram + compute layer |
| `docs/reference/kubernetes/cluster.md` | Note two clusters, add k3s section |
| `docs/reference/reference.md` | Update frigate/ntfy location |
| `docs/how-to/plans/completed/operationalize-reolink-camera.md` | Add post-completion migration note |
| `CLAUDE.md` | Add k3s-ringtail context guidance |

## Test plan
- [ ] `mise run services-check` — all checks pass
- [ ] Review each doc for accuracy against deployed state

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/218
2026-02-19 14:38:21 -08:00
695089499e Nix container build for nettest (#214)
## Summary
- Add `containers/nettest/default.nix` using `dockerTools.buildLayeredImage` with curl, jq, dnsutils, cacert, and bash — equivalent to the existing Dockerfile
- Update `container-tag-and-release` to require `--nix` or `--dockerfile` flag when both build types exist for a container
- Update `container-list` to show `[dockerfile+nix]` label when both exist

## Deployment and Testing
- [ ] SSH to ringtail, run `nix build -f containers/nettest/default.nix -o result` to verify the nix expression builds
- [ ] Tag `nettest-nix-v1.0.0`, confirm `build-container-nix` workflow runs on `nix-container-builder` runner and pushes to registry
- [ ] Smoke test on ringtail k3s: `kubectl run nettest --image=registry.ops.eblu.me/blumeops/nettest:v1.0.0 --restart=Never && kubectl logs nettest`
- [ ] Verify `mise run container-list` shows `[dockerfile+nix]` for nettest
- [ ] Verify `mise run container-tag-and-release nettest v1.1.0` prompts for build type

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/214
2026-02-19 08:42:58 -08:00
1e96866dd3 Grafana helm chart upgrade plan 2026-02-17 11:15:34 -08:00
27d8f3cf1f Review gandi-operations doc and reorganize how-to guides (#200)
## Summary
- **Doc review:** Reviewed `gandi-operations.md` — added `last-reviewed` frontmatter, verified all wiki-links, confirmed Pulumi state has no drift
- **Gandi reference fix:** Added missing `cv.eblu.me` CNAME row to `gandi.md` DNS records table (was present in Pulumi but undocumented)
- **Pulumi comment fix:** Updated stale `README.md` reference in `__main__.py` to point to `docs/how-to/gandi-operations.md`
- **How-to reorg:** Moved 14 how-to guides into 3 subdirectories (`deployment/`, `configuration/`, `operations/`), collapsed the Documentation and Database index sections into Configuration and Operations respectively

## Verification
- `docs-check-links` — all 180 wiki-links valid
- `docs-check-filenames` — all 90 filenames unique
- `dns-preview` — 5 resources unchanged, no drift
- All pre-commit hooks pass

## Test plan
- [ ] Verify docs site builds correctly with new paths
- [ ] Spot-check a few wiki-links from other pages to moved how-to guides

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/200
2026-02-17 07:29:33 -08:00
779b7d6709 Eliminate double towncrier run in release workflow (#199)
## Summary

- Added a new `build_quartz` Dagger function that builds the Quartz site from a pre-processed source tree (no towncrier)
- Reordered the release workflow so towncrier runs **once** on the runner, then passes the updated working tree to `build-quartz`
- `build_docs` and `build_changelog` are preserved for standalone use — `build_docs` now delegates to `build_quartz` internally

## Motivation

Previously towncrier ran twice per release: once inside a Dagger container (via `build_docs` → `build_changelog`) and once on the runner to capture CHANGELOG.md changes for the git commit. This was wasteful and fragile — if towncrier behavior changed, the two runs could produce different results.

## Test plan

- [ ] Review diff to confirm workflow step ordering is correct
- [ ] Trigger a release and confirm towncrier runs only once
- [ ] Verify the docs tarball contains the updated CHANGELOG.md
- [ ] `dagger call build-quartz --src=. --version=vX.Y.Z` should work standalone

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/199
2026-02-16 21:24:34 -08:00
faf9682b55 Add service version review system (#196)
## Summary

- Add `service-versions.yaml` tracking file with 33 services and upstream release URLs
- Add `mise run service-review` task (Python uv script) mirroring the docs-review UX
- Add `review-services` how-to article covering the review process by service type
- Add `[[review-services]]` link to the how-to index Knowledge Base table

## Deployment and Testing

- [x] `mise run service-review` displays 33 services, all "never reviewed"
- [x] `mise run service-review -- --type ansible` filters to 7 Ansible services
- [x] `mise run service-review -- --limit 5` shows 5 rows
- [x] `mise run docs-check-links` — no broken wiki-links
- [x] `mise run docs-check-frontmatter` — new doc passes validation
- [x] All pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/196
2026-02-16 17:02:56 -08:00
2c55c2316e Review expose-service-publicly doc (#195)
## Summary
- Replace stale inline code listings (fly.toml, Dockerfile, start.sh, nginx.conf, mise tasks, CI workflow) with brief descriptions pointing readers to the actual `fly/` and `mise-tasks/` files — prevents future drift
- Add observability sidecar section documenting the Alloy integration (logs → Loki, metrics → Prometheus)
- Fix broken internal wiki-link (`[[#7. Update Tailscale ACLs if needed]]` → correct heading)
- Update per-service nginx templates to current patterns (deferred DNS resolution via `set $upstream` variable, `proxy_intercept_errors`, error pages)
- Add `cv.eblu.me` to verification steps (live service not previously documented)
- Add `last-reviewed: 2026-02-16` frontmatter
- Net -187 lines (58 added, 245 removed)

## Deployment and Testing
- [x] All pre-commit hooks pass (link validation, frontmatter, filenames)
- [ ] Docs site renders correctly after merge

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/195
2026-02-16 15:49:55 -08:00
996441876d Document container build pattern and port navidrome (#192)
Some checks failed
Build Container / build (push) Failing after 4m28s
## Summary
- Add how-to guide (`docs/how-to/build-container-image.md`) covering the full container build workflow: directory layout, Dagger local builds, mise release task, and common patterns with links to existing containers
- Port navidrome from upstream `deluan/navidrome:0.60.3` to a custom three-stage build (`containers/navidrome/Dockerfile`) using Node + Go + Alpine
- Update navidrome deployment to use `registry.ops.eblu.me/blumeops/navidrome:v1.0.0`

## Deployment and Testing
- [x] `dagger call build --src=. --container-name=navidrome` builds successfully
- [ ] After merge: `mise run container-tag-and-release navidrome v1.0.0`
- [ ] After image published: `argocd app sync navidrome` and verify pod starts

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/192
2026-02-15 08:05:11 -08:00
22f418d0dc Doc review: connect-to-postgres, create-release-artifact-workflow, deploy-k8s-service (#191)
## Summary

Review session covering 3 docs, plus a codebase-wide cleanup:

### Docs reviewed
- **connect-to-postgres** — verified end-to-end (psql connection tested), stamped
- **create-release-artifact-workflow** — clarified that `build-blumeops.yaml` is only a version bump example (not a packages API example)
- **deploy-k8s-service** — fixed stale repoURL (`indri:2200` → `forge.ops.eblu.me:2222`), wrong Caddy config keys (`upstream` → `backend`, added missing `host`), updated Homepage group to "Services", added Tailscale tag documentation

### Codebase cleanup
- Migrated all remaining `op item get --fields` calls to `op read` URI syntax across 7 files (docs, READMEs, YAML comments)
- Simplified the `op read` vs `op item get` guidance in CLAUDE.md

## Side findings (not addressed)
- New `immich-pg` CNPG cluster not yet documented in the postgresql reference card

## Test plan
- [x] `psql` connection to `pg.ops.eblu.me` verified
- [x] All pre-commit hooks pass
- [x] `docs-check-links`, `docs-check-index`, `docs-check-frontmatter` pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/191
2026-02-15 07:42:01 -08:00
04c7f3c45a Deploy Frigate NVR stack with Mosquitto, Ntfy, and frigate-notify (#190)
## Summary

Deploy a cloud-free NVR stack for the GableCam (ReoLink Elite Floodlight at 192.168.1.159):

- **Mosquitto** — shared MQTT broker in `mqtt` namespace (cluster-internal, no auth)
- **Ntfy** — self-hosted push notifications in `ntfy` namespace, exposed at `ntfy.tail8d86e.ts.net` / `ntfy.ops.eblu.me`
- **Frigate** — NVR with GableCam via HTTP-FLV, ONNX CPU detection, NFS recordings on sifaka, exposed at `nvr.tail8d86e.ts.net` / `nvr.ops.eblu.me`
- **frigate-notify** — bridges Frigate detection events (person, car, dog, cat) to Ntfy alerts via MQTT

Also includes:
- Prometheus scrape target for Frigate metrics
- Grafana dashboard for Frigate (status, inference speed, FPS, CPU/memory, storage)
- Caddy reverse proxy entries for `nvr.ops.eblu.me` and `ntfy.ops.eblu.me`

## Prerequisites

- [ ] Create NFS share `frigate` on sifaka (`/volume1/frigate`, RW for indri)
- [ ] Create 1Password item "Reolink Floodlight Camera" in `blumeops` vault with `username` and `password` fields

## Deployment (after merge)

```bash
argocd app sync apps
argocd app sync mosquitto
argocd app sync ntfy
argocd app sync frigate
argocd app sync grafana-config
argocd app sync prometheus
mise run provision-indri -- --tags caddy
mise run services-check
```

## Verification

- [ ] Mosquitto pod running, accepting connections on 1883
- [ ] Ntfy web UI accessible at `ntfy.ops.eblu.me`
- [ ] Frigate web UI at `nvr.ops.eblu.me` showing GableCam live feed
- [ ] Object detection working (ONNX, person/car/dog/cat)
- [ ] Recordings appearing in NFS share on sifaka
- [ ] frigate-notify sending detection alerts to Ntfy
- [ ] Prometheus scraping Frigate metrics
- [ ] Grafana dashboard showing Frigate data

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/190
2026-02-14 21:27:44 -08:00
f376c02b76 Move segment-home-network to completed plans
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:48:32 -08:00
2252e5e60d Update segmentation plan: mark completed, fix firewall details
Reflect actual UX7 zone-based firewall UI, correct streaming port
(8096 not 443), note indri DHCP reservation, mark plan as completed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:43:17 -08:00
657bb28fd1 Abandon UniFi IaC, add manual network segmentation plan (#189)
## Summary

- Abandon the UniFi Pulumi IaC approach after provider bugs caused a network outage (no-op update reset undeclared properties on the default LAN network)
- Remove untracked IaC artifacts (`pulumi/unifi/`, `mise-tasks/unifi-preview`, `mise-tasks/unifi-up`) locally
- Mark `add-unifi-pulumi-stack` plan as Abandoned with explanation
- Create new `segment-home-network` plan for manual three-network segmentation (Main/IoT/Guest) via UX7 web UI
- Rewrite UniFi reference card to remove all Pulumi/IaC references
- Update plan and how-to indexes

## Test plan

- [x] `docs-check-links` passes
- [x] `docs-check-index` passes
- [x] Pre-commit hooks pass
- [ ] Review segmentation plan for completeness before executing manually

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/189
2026-02-14 09:47:04 -08:00
eec1edf43d Add how-to guide for connecting to PostgreSQL via psql (#188)
## Summary
- Add new how-to guide (`connect-to-postgres.md`) with the `psql` command using `op read` for 1Password credentials
- Add "Database" section to the how-to index linking to the new guide
- Link the new guide from the PostgreSQL reference card's Related section

## Test plan
- [x] Verified `psql` connection works from gilbert using the documented command
- [ ] Review doc formatting and content

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/188
2026-02-14 07:18:06 -08:00
49ec05041c Update UniFi Pulumi plan: switch to ubiquiti-community provider (#187)
## Summary
- Switch provider from filipowm/unifi (inactive maintainer, showstopper bug #94 wiping firewall rules) to ubiquiti-community/unifi (actively maintained, API key auth)
- Add UX7 config backup prerequisite before adopting IaC
- Fix safety guard: check default route interface instead of hostname (runs from gilbert, not indri)
- Update 1Password paths to match actual item (`op://blumeops/unifi/credential`)
- Fix ringtail references: not a Raspberry Pi, stays on WiFi (removed from wired topology)
- Update doc steps for already-existing reference files

## Test plan
- [x] Pre-commit hooks pass
- [x] `docs-check-links` pass
- [x] `docs-check-index` pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/187
2026-02-13 20:02:16 -08:00
81690dae0f Review add-ansible-role doc (#185)
## Summary
- Replace `op item get --fields` with `op read` for secrets (matches playbook and CLAUDE.md guidance)
- Change `tags: [<role>]` to `tags: <role>` to match actual playbook style
- Remove redundant `listen:` from handler example, add `changed_when: true`
- Name handler after specific service (e.g. `Restart <service>`) to match real roles
- Add `last-reviewed: 2026-02-13` frontmatter

## Also noted (not fixed here)
Two other docs still use the old `op item get` pattern:
- `docs/how-to/troubleshooting.md:72` (ArgoCD login command)
- `docs/how-to/gandi-operations.md:35` (Gandi token export)

These can be fixed in their own review cycles.

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/185
2026-02-13 16:54:42 -08:00
517080aeab Add reference/tools/ category with Dagger, ArgoCD CLI, Ansible, and Pulumi cards (#178)
## Summary

- Create `docs/reference/tools/` with four reference cards: Dagger (build engine), ArgoCD CLI (deployment workflows), Ansible (config management), and Pulumi (DNS/Tailscale IaC)
- Move `ansible/roles.md` → `tools/ansible.md`, broadened with CLI patterns and dry-run usage
- Update `reference.md` index: add "Tools" section, remove old "Ansible" section
- Update `update-documentation.md` to reflect Dagger build process (workflow steps, manual build recipe, runner environment)
- Update `adopt-dagger-ci.md` plan to note how-to articles were handled via reference card + existing how-to updates
- Fix all broken `[[roles]]` wiki-links across 5 files → `[[ansible]]`

## Verification

- `docs-check-links` ✓ — no broken wiki-links
- `docs-check-index` ✓ — all docs referenced in category index
- `docs-check-filenames` ✓ — no duplicate filenames
- All pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/178
2026-02-12 19:18:46 -08:00
9717863f65 Update CV release to v1.0.3, add X-Clacks-Overhead header (#176)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m5s
## Summary
- Update CV release URL from v1.0.2 to v1.0.3
- Add `X-Clacks-Overhead: GNU Terry Pratchett` header to both `docs.eblu.me` and `cv.eblu.me` server blocks in the Fly.io proxy nginx config

## Deployment and Testing
- [ ] Sync CV app: `argocd app sync cv`
- [ ] Verify CV is serving v1.0.3 content
- [ ] Deploy fly proxy (workflow or `mise run fly-deploy`)
- [ ] Verify header: `curl -sI https://docs.eblu.me | grep -i clacks`
- [ ] Verify header: `curl -sI https://cv.eblu.me | grep -i clacks`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/176
2026-02-12 17:08:22 -08:00
545433055e Add release artifact workflow how-to and changelog fragments (#170)
All checks were successful
Build Container / build (push) Successful in 17s
## Summary
- How-to guide for creating release artifact workflows with Forgejo packages
- Changelog fragment for the multi-repo forgejo_actions_secrets Ansible role change
- Changelog fragment for the new docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/170
2026-02-12 11:20:05 -08:00
f3319c753c Update deploy-k8s-service doc with ProxyGroup ingress pattern (#166)
## Summary
- Updated the "Configure Ingress" section to use the current ProxyGroup pattern (`proxy-group: "ingress"`, `defaultBackend`, `tls.hosts`)
- Replaced the old per-ingress proxy example that used `rules:` with `host:` (which breaks ProxyGroup routing)
- Added key points explaining why `defaultBackend` is required and what each annotation does
- Updated checklist to mention ProxyGroup

## Test plan
- [ ] Review rendered doc for accuracy against existing ingress manifests

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/166
2026-02-11 21:10:42 -08:00
1fd54dbab7 Close Dagger CI plan (Phases 1-3 complete) (#165)
## Summary

- Move `adopt-dagger-ci.md` to new `docs/how-to/plans/completed/` archive
- Update plan status to "Phases 1–3 complete" with all verification checklists checked
- Update runner description to reflect what actually stayed (Docker CLI, Node.js needed)
- Document known issue: changelog dates still show UTC
- Create completed plans index, update plans and how-to indexes
- Changelog fragment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/165
2026-02-11 18:04:39 -08:00
95364dcb48 Simplify runner image (Dagger Phase 3) (#162)
All checks were successful
Build Container / build (push) Successful in 1m13s
## Summary

With Phases 1 and 2 complete, the runner image no longer needs most of its bundled tools. This PR strips it down and adds what was missing.

**Removed** (now inside Dagger containers):
- Node.js 24.x
- Docker CLI + buildx plugin
- skopeo
- gnupg, lsb-release, xz-utils

**Added:**
- `tzdata` — fixes the TZ env var (#159, #160, #161) so `TZ=America/Los_Angeles` actually works
- `flyctl` — was being installed from scratch every release

**Workflow changes:**
- Remove "Ensure Dagger CLI" bootstrap steps from both workflows (Dagger is in the image)
- Remove "Install flyctl" step from build-blumeops (flyctl is in the image)
- Remove job-level `TZ` from build-blumeops (moved to runner configmap `runner.envs`)
- Set `TZ: America/Los_Angeles` in runner configmap so all job containers inherit it

## Deployment

After merge:
1. Build and release the new runner image: `mise run container-release forgejo-runner v2.0.0`
2. Sync the runner: `argocd app sync forgejo-runner`
3. Verify: `kubectl -n forgejo-runner exec deploy/forgejo-runner -c runner -- date` (but the real test is running a docs release and checking the changelog date)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/162
2026-02-11 17:24:20 -08:00
b0bac91ca9 Fix frontmatter field name for Quartz date display (#158)
## Summary

- Rename `date-modified` -> `modified` in all 80 docs and the `docs-check-frontmatter` task

Quartz's `CreatedModifiedDate` plugin recognizes `modified`, `lastmod`, `updated`, and `last-modified` — but not `date-modified`. The wrong field name caused Quartz to ignore frontmatter dates entirely and fall through to filesystem timestamps (UTC inside Dagger), showing Feb 12 on pages built late on Feb 11 PST.

## Test plan

- [x] `mise run docs-check-frontmatter` passes
- [ ] Kick off docs release after merge — verify rendered dates match frontmatter values

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/158
2026-02-11 16:45:12 -08:00
b197bd5f58 Adopt Dagger CI for docs build (Phase 2) (#157)
## Summary

Migrates the docs build pipeline to Dagger (Phase 2 of the Dagger CI adoption plan).

- **Backfill `date-modified` frontmatter** on all 80 docs — Dagger's `--src=.` excludes `.git`, so Quartz can't use git history for page dates. Frontmatter dates work with or without git.
- **New `docs-check-frontmatter` mise task + pre-commit hook** — validates all docs have `title`, `tags`, and `date-modified`
- **New Dagger functions** — `build_changelog` (towncrier in Python container) and `build_docs` (chains changelog → Quartz build in Node container, returns tarball)
- **Simplified CI workflow** — the ~44-line inline Quartz build (clone, npm ci, build, tar, cleanup) is replaced by `dagger call build-docs`. Changelog step remains local on the runner since towncrier needs to modify the host working tree for the git commit.

### Design decisions

- **Towncrier runs twice in CI**: once inside Dagger (for the docs tarball) and once on the runner (for the git commit). This is intentional — Dagger's directory export is additive and can't delete the consumed changelog fragments from the host.
- **Artifact hosting stays on Forgejo Releases** (not migrated to Forgejo Packages as the plan doc originally suggested). That migration can happen independently.
- **`date-modified` frontmatter** preserved even though `build_changelog` installs git — the git there is only for towncrier's `git add` call, not for history. The local iteration story (`dagger call build-docs --src=. --version=dev` with uncommitted changes) depends on frontmatter dates.

### Local iteration

```bash
dagger call build-docs --src=. --version=dev export --path=./docs-dev.tar.gz
tar tf docs-dev.tar.gz | head -20
```

## Deployment and Testing

- [x] `dagger call build-docs --src=. --version=dev` produces valid 1.1MB tarball (149 HTML pages)
- [x] Pre-commit hooks pass (including new `docs-check-frontmatter`)
- [ ] Full `workflow_dispatch` run after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/157
2026-02-11 16:33:16 -08:00
738b39a321 Mark Phase 1 verification checklist items as complete
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:49:49 -08:00
1bc2b421a8 Adopt Dagger CI for container builds (Phase 1) (#156)
All checks were successful
Build Container / build (push) Successful in 13s
## Summary

- Add Dagger Python module (`.dagger/`) with `build` and `publish` functions for container images
- Replace Docker buildx + skopeo composite action with `dagger call publish` in `build-container.yaml`
- BuildKit's native push is compatible with Zot — **skopeo workaround eliminated**
- Add Dagger CLI (v0.19.11) to forgejo-runner Dockerfile, bump runner to v2.6.0
- Bootstrap step in workflow curl-installs dagger if not in runner (for first build on v2.5.1 runner)
- Delete old `.forgejo/actions/build-push-image/` composite action
- Add GPLv3 LICENSE

## Verified locally

- `dagger call build --src=. --container-name=nettest` — builds ✓
- `dagger call publish --src=. --container-name=nettest --version=dagger-test` — pushed to Zot ✓
- `dagger call build --src=. --container-name=forgejo-runner` — new runner image builds ✓
- Dagger CLI accessible inside built runner image ✓

## Deployment sequence (after merge)

1. `mise run container-tag-and-release forgejo-runner v2.6.0` — old runner bootstraps dagger via curl, builds new runner
2. `argocd app sync forgejo-runner` — runner restarts with v2.6.0 (dagger baked in)
3. `mise run container-tag-and-release nettest v0.13.0` — end-to-end test of new pipeline
4. `mise run container-list` — verify tags

## Not included (future phases)

- Phase 2: docs build + Forgejo packages migration
- Phase 3: runner simplification (remove skopeo, Node.js, etc.)
- Phase 4: future workflows

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/156
2026-02-11 15:38:31 -08:00
834c9fa57b Bump Fly.io proxy VM to 512MB, fix TruffleHog scanning (#152)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m37s
## Summary
- Bump Fly.io proxy VM memory from 256MB to 512MB — Alloy was OOM-killed, causing the Grafana Fly.io dashboard to lose metrics
- Fix TruffleHog pre-commit hook to scan only staged changes (`--since-commit HEAD`) instead of full repo history
- Sanitize example credential URL in Reolink camera plan doc

## Deployment and Testing
- [ ] Fly.io deploy triggers automatically on merge (workflow watches `fly/**`)
- [ ] After deploy, verify Alloy is running: `fly ssh console -a blumeops-proxy -C "ps aux"` should show alloy process
- [ ] Grafana Fly.io dashboard should start populating within ~1 minute

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/152
2026-02-11 12:03:51 -08:00
651fed8f1a Transcribe backlog tasks into plan documents (#151)
## Summary
- **adopt-oidc-provider:** Dex-based OIDC identity provider for SSO across services (status: Planning — service dependency/recovery design needed)
- **harden-zot-registry:** OIDC + API key auth and tag immutability for zot (depends on OIDC provider + Dagger CI)
- **forgejo-actions-dashboard:** Custom textfile Prometheus exporter + Grafana dashboard for Forgejo Actions CI metrics
- **operationalize-reolink-camera:** Cloud-free Frigate NVR with ONNX detection, NFS ring buffer recording to sifaka (depends on network segmentation)
- **add-unifi-pulumi-stack:** Expanded with NFS security motivation, BlumeOps Services subnet, IoT/appliance segregation, firewall rules

## Test plan
- [x] Pre-commit hooks pass (all 3 commits)
- [x] `docs-check-links` passes
- [x] `docs-check-index` passes
- [x] `docs-check-filenames` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/151
2026-02-11 11:47:23 -08:00
430f2c6ec5 Add plans for Dagger CI/CD and upstream fork strategy (#150)
## Summary

Two new plan documents in `docs/how-to/plans/`:

- **adopt-dagger-ci** — Migrate CI/CD build logic from Forgejo Actions YAML to Dagger (Python SDK). Forgejo Actions stays as a thin trigger layer. Covers:
  - Container builds with local iteration (`dagger call build ... terminal`)
  - Docs builds with Forgejo packages migration (replacing Forgejo releases)
  - Runner simplification (only Docker + dagger CLI needed)
  - Secrets handling via Dagger's `Secret` type
  - Future: forked project builds, Python packages, pre-merge validation

- **upstream-fork-strategy** — Stacked-branch pattern for maintaining forks of upstream projects. Covers:
  - Daily automated rebase with conflict detection and issue creation
  - Branch model: `upstream/main` → `blumeops` → `feature/*`
  - Quartz fork as first instance, enabling `last-reviewed` frontmatter rendering in docs
  - Upstream PR path for contributing changes back

## Context

These plans emerged from evaluating alternatives to the GHA ecosystem (BuildKite, Concourse, Earthly) for CI/CD. Dagger was chosen for its local iteration story, Python-native pipelines, and zero-infrastructure requirements. The fork strategy is a prerequisite for customizing Quartz and other upstream tools.

Neither plan is ready for execution yet — they are design documents for future work.

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/150
2026-02-11 10:20:14 -08:00
0dce806107 Add plan and reference card for UniFi Express 7 Pulumi stack (#145)
## Summary
- Rewrites the UniFi Pulumi plan doc to use filipowm/unifi Terraform provider via `pulumi package add terraform-provider` (replaces pulumiverse_unifi approach)
- Adds network segmentation goals (main/guest/IoT WiFi zones) and API key auth
- Creates UniFi reference card (`docs/reference/infrastructure/unifi.md`) with topology diagram
- Updates all documentation indexes (plans.md, how-to.md, hosts.md, reference.md)

## What's Deferred
Actual stack scaffolding (`pulumi/unifi/`), mise tasks, and `pulumi import` are blocked on switch purchase and cabling. The plan doc captures everything needed for a future execution session.

## Verification
- `docs-check-links` passes (all wiki-links resolve)
- `docs-check-index` passes (unifi.md referenced in reference.md)
- Pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/145
2026-02-10 15:36:13 -08:00
54afa0750b Add how-to guide for restoring 1Password backup from borgmatic (#141)
## Summary
- New how-to guide at `docs/how-to/restore-1password-backup.md` with step-by-step procedure for extracting and decrypting a 1Password `.1pux` export from borgmatic backup
- **End-to-end verified**: extracted from today's borg archive, decrypted age key with openssl, decrypted .1pux with age → valid 31MB zip with vault data
- Cross-links added from: disaster-recovery, 1password, borgmatic, backups policy, and how-to index
- Updated disaster-recovery.md from TBD stub to include a procedures table

## Deployment and Testing
- [x] Verified full extraction + decryption flow against live borgmatic archive
- [x] `docs-check-links` passes — all wiki-links valid
- [ ] Review guide for clarity and completeness

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/141
2026-02-10 10:55:00 -08:00
b5746e62c2 Add migration plan for Forgejo brew-to-source transition (#140)
## Summary
- Add `docs/how-to/plans/migrate-forgejo-from-brew.md` — full Diataxis-style plan covering background, one-time migration steps, Ansible role changes (with exact code), verification checklist, and future considerations
- Add `docs/how-to/plans/plans.md` — new plans subdirectory index for upcoming migration/transition plans
- Update `docs/how-to/how-to.md` with a Plans section
- Update `docs/tutorials/exploring-the-docs.md` to mention plans in the doc structure table and quick-path sections for Owner and AI audiences

## Test plan
- [x] `docs-check-links` passes
- [x] `docs-check-index` passes
- [x] All pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/140
2026-02-10 10:18:53 -08:00
41dfae1f80 Add CNI conflict troubleshooting to restart-indri how-to (#139)
## Summary
- Documents a troubleshooting procedure for broken pod networking after unclean shutdown
- During minikube recovery, a stale `1-k8s.conflist` CNI config can override kindnet's `10-kindnet.conflist`, causing new pods to use bridge+firewall networking instead of kindnet's ptp — breaking pod-to-pod communication
- Covers symptoms (DNS failures, liveness probe timeouts), diagnosis steps, and the fix

## Context
Encountered this during the 2026-02-10 power outage. Immich, kiwix, and transmission were all crash-looping for ~8 hours due to the CNI conflict. The minikube ansible role's clean boot detection has been improved (#137) so this may not recur, but the troubleshooting guide is valuable if it does.

## Test plan
- [x] Documentation only — no code changes
- [x] Pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/139
2026-02-10 07:24:42 -08:00
9e361cf38f Add docs-review task with last-reviewed frontmatter tracking (#129)
## Summary
- New `docs-review` mise task replaces `docs-review-random` — sorts docs by `last-reviewed` frontmatter field (never-reviewed first, then oldest)
- Updated review-documentation how-to to explain the new workflow and how to mark cards as reviewed
- Updated ai-assistance-guide task table to reference `docs-review`

## Test plan
- [x] `mise run docs-review` runs and shows staleness table + most stale doc
- [x] `mise run docs-review -- --limit 5` respects the limit flag
- [x] All pre-commit checks pass (links, index, filenames)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/129
2026-02-09 07:29:45 -08:00
e6cf7e47e0 Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m8s
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly

## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.

## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards

## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
64a78422b1 Add Fly.io public reverse proxy for docs.eblu.me (#120)
Some checks failed
Deploy Fly.io Proxy / deploy (push) Failing after 9s
## Summary

- Adds a Fly.io reverse proxy (`blumeops-proxy`) that tunnels public traffic to homelab services over Tailscale
- First service exposed: `docs.eblu.me` — the Quartz static docs site
- Includes Pulumi IaC for Tailscale auth key/ACLs and Gandi DNS CNAME
- Adds mise tasks (`fly-deploy`, `fly-setup`, `fly-shutoff`) and Forgejo CI workflow

## Key details

- Fly.io Firecracker VMs support TUN devices natively — no userspace networking needed
- Tailscale auth key is `preauthorized=True` to avoid device approval hangs on container restarts
- nginx caches aggressively for the static site; health check is on the default_server block
- ACLs restrict `tag:flyio-proxy` to `tag:k8s` on port 443 only
- DNS CNAME deployed and verified: `docs.eblu.me` → `blumeops-proxy.fly.dev`

## Test plan

- [x] `curl -sf https://blumeops-proxy.fly.dev/healthz` returns `ok`
- [x] `curl -I -H "Host: docs.eblu.me" https://blumeops-proxy.fly.dev/` returns 200 with `X-Cache-Status`
- [x] `curl -I https://docs.eblu.me/` returns 200 with valid Let's Encrypt cert
- [x] `dig forge.ops.eblu.me` still resolves to 100.98.163.89 (private services unaffected)
- [x] Set `FLY_DEPLOY_TOKEN` Forgejo Actions secret for CI auto-deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/120
2026-02-08 02:36:19 -08:00
fbedaf2833 docs/expose-service-publicly pt2 - fly.io (#119)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/119
2026-02-08 00:38:27 -08:00
ae2445c99a Add how-to guide for public service exposure via Cloudflare (#118)
## Summary
- Adds `docs/how-to/expose-service-publicly.md` documenting the full plan for exposing `docs.eblu.me` to the public internet
- Covers Cloudflare Tunnel + CDN architecture, DNS migration from Gandi, Caddy TLS changes, Pulumi IaC, k8s cloudflared deployment, and verification steps
- Pattern is reusable for future public services
- Marked as "Plan — not yet implemented" status

## Test plan
- [x] `docs-check-links` passes
- [x] `docs-check-index` passes
- [x] All pre-commit hooks pass

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/118
2026-02-07 22:09:52 -08:00
dc46eb7def Update all docs titles to human-readable (#117)
## Summary
- Updated frontmatter `title:` in all 63 doc cards from slug-case to human-readable (e.g. `borgmatic` → `Borgmatic`, `ai-assistance-guide` → `AI Assistance Guide`)
- Titles now closely match file stems so `[[wiki-links]]` render naturally without alternate anchor text
- Corrected titles that diverged from stems (e.g. `host-inventory` → `Hosts`, `grafana-alloy` → `Alloy`, `argocd-applications` → `Apps`)
- Deleted `title-test-alpha.md` and `title-test-beta.md` test cards and removed their reference index entry

## Deployment and Testing
- [x] `docs-check-links` passes — all wiki-links valid
- [x] `docs-check-index` passes
- [x] `docs-check-filenames` passes
- [ ] Verify titles render correctly on docs site after deploy

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/117
2026-02-07 21:44:57 -08:00
8e4afe77e0 Add Gandi DNS docs and rewrite homepage intro (#115)
## Summary
- New reference card (`docs/reference/infrastructure/gandi.md`) covering DNS records, Pulumi config, TLS integration
- New how-to guide (`docs/how-to/gandi-operations.md`) for DNS deployment and PAT cycling with `pbpaste` shortcut
- Rewritten homepage intro for wider audience ahead of public docs.eblu.me
- Cross-linked from reference index, routing, caddy, and how-to index
- Fixed PAT expiration inaccuracy in `pulumi/gandi/README.md` (max is 90 days, not 30)

## Test plan
- [ ] Verify wiki-links resolve in Quartz build
- [ ] Review gandi reference card for accuracy
- [ ] Review gandi-operations how-to for accuracy
- [ ] Check homepage reads well for external visitors

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/115
2026-02-07 21:02:10 -08:00
2cb76de5a2 Fix doc tag inconsistencies and add missing ai changelog type (#114)
## Summary
- Add missing `ai` changelog fragment type to the update-documentation how-to guide (comment and table)
- Consolidate `cicd` → `ci-cd` tag on forgejo.md
- Consolidate `network` → `networking` tag on routing.md and tailscale.md

Found during random doc review via `docs-review-tags`.

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/114
2026-02-06 18:52:36 -08:00
cb343a2e35 Rename doc-* mise tasks to docs-check-* / docs-review-* (#113)
## Summary
- Rename 4 automated-check tasks: `doc-titles` → `docs-check-titles`, `doc-filenames` → `docs-check-filenames`, `doc-links` → `docs-check-links`, `doc-index` → `docs-check-index`
- Rename 3 interactive-review tasks: `doc-random` → `docs-review-random`, `doc-tags` → `docs-review-tags`, `doc-stale` → `docs-review-stale`
- Update all references in `.pre-commit-config.yaml`, `ai-assistance-guide.md`, and `review-documentation.md`
- Historical changelog entries left as-is

## Test plan
- [x] `mise run docs-check-titles` exits 0
- [x] `mise run docs-check-links` exits 0
- [x] `mise run docs-review-tags` exits 0
- [x] `mise run doc-titles` fails with "no task found"
- [x] All pre-commit hooks pass (including renamed hook IDs)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/113
2026-02-06 07:08:46 -08:00
060c7a24e3 Review exploring-the-docs and add doc consistency checks (#112)
## Summary
- Reviewed and cleaned up exploring-the-docs tutorial: simplified wiki-links, fixed broken replication/ reference, added Related section, corrected zk-docs flags to match CLAUDE.md
- Added orphan detection to doc-links (finds docs not linked from any other doc)
- Added new doc tooling: `doc-index` (checks category index coverage), `doc-stale` (staleness report), `doc-tags` (tag inventory)
- Added `doc-index` as a pre-commit hook
- Updated use-pypi-proxy to document env-var-based proxy toggle for pip/uv
- Updated ai-assistance-guide with new doc task descriptions

## Test plan
- [ ] Run `mise run doc-links` — passes
- [ ] Run `mise run doc-index` — passes
- [ ] Run `mise run doc-stale` — informational output
- [ ] Run `mise run doc-tags` — informational output
- [ ] Pre-commit hooks pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/112
2026-02-05 21:12:06 -08:00
61c5328ec2 Update restart-indri docs after power outage recovery (#111)
## Summary
- Simplified restart-indri startup procedure to match reality (most services autostart via mcquack LaunchAgents and brew services)
- Added minikube tailscale serve port fix step (`mise run provision-indri -- --tags minikube`)
- Added Anker SOLIX F2000 GaNPrime UPS to indri reference card

## Context
After a power outage, discovered that the restart-indri docs overstated what needs manual intervention. Docker Desktop, Forgejo, Caddy, and all mcquack services autostart. Only Amphetamine, AutoMounter, and minikube need manual action.

## Test plan
- [ ] Verify restart-indri doc reads clearly
- [ ] Verify indri ref card UPS entry renders correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/111
2026-02-05 21:05:35 -08:00
3da455e49c Enforce unique doc filenames and simple wiki-links (#109)
## Summary
- Rename section index files to match their titles (tutorials.md, reference.md, how-to.md, explanation.md) so all filenames are unique
- Convert all ~47 path-based wiki-links to simple filename format across 15 files
- Update doc-filenames task to no longer skip index.md files
- Update doc-links task to reject path-based links containing '/'

This ensures all wiki-links work correctly in obsidian.nvim by making links resolvable by filename alone.

## Testing
- `mise run doc-filenames` - all unique
- `mise run doc-links` - no broken or path-based links
- `mise run doc-titles` - no duplicates

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/109
2026-02-04 17:21:34 -08:00
7aa0e60b27 Add how-to guide for restarting indri (#108)
## Summary
- Add `docs/how-to/restart-indri.md` with safe shutdown and startup procedures
- Add `docs/reference/services/automounter.md` documenting the SMB share automounter app
- Update indri reference card with GUI applications section
- Update how-to and reference indexes

## Test plan
- [ ] Review restart-indri.md for accuracy
- [ ] Verify AutoMounter details are correct
- [ ] Perform the actual restart using this guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/108
2026-02-04 14:39:48 -08:00