9 commits

Author SHA1 Message Date
bb55fa9566 Recurring review sweep: 4 doc cards + nvidia-device-plugin v0.19.2 (#366)
Knocks out the two daily recurring review tasks (doc review + service review) in one PR.

## Doc review (4 never-reviewed reference cards, `last-reviewed: 2026-06-04`)
- **cluster.md** — Kubernetes version v1.34.0 → **v1.35.0**; refreshed the stale ringtail workload list and noted the in-progress minikube→k3s migration (points to `[[ringtail]]` as the canonical list).
- **ntfy.md / tempo.md / alloy.md** — corrected image references: these are now **locally-built `registry.ops.eblu.me/blumeops/*` nix containers** (ntfy v2.19.2, tempo v2.10.3, alloy-k8s v1.16.0), not upstream Docker Hub. Fly.io alloy binary bumped to v1.16.1.

## Service review
- **nvidia-device-plugin** (ringtail GPU): v0.19.0 → **v0.19.2**. Upstream patch releases — CDI/Tegra fixes + dependency bumps, no breaking changes for our manifest-based CDI + RuntimeClass setup (the service-account change in the notes is helm-only).

## Not in this PR (need container rebuilds, deferred)
The other stale services are locally-built nix images, so upgrading them is a forge-runner rebuild rather than a clean tag bump — left untouched (not date-bumped, so they resurface): **prometheus** (v3.10.0→v3.12.0), **loki** (3.6.7→3.7.2), **kube-state-metrics**, **homepage**. Happy to do these as a follow-up rebuild PR.

## Deploy / verify
Not yet deployed — `nvidia-device-plugin` still points at `main`. After review:
```bash
argocd app set nvidia-device-plugin --revision reviews-jun4 && argocd app sync nvidia-device-plugin
# after merge:
argocd app set nvidia-device-plugin --revision main && argocd app sync nvidia-device-plugin
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #366
2026-06-04 13:37:02 -07:00
ab8ea6f301 Bump Grafana Alloy to v1.14.0 (#292)
## Summary
- Bump alloy-k8s, alloy-ringtail, and alloy-tracing-ringtail image tags from v1.13.1 to v1.14.0
- Mark indri alloy (ansible) as reviewed at v1.14.0 — source rebuild from forge mirror needed
- Add missing alloy-ringtail entry to service-versions.yaml
- Update alloy reference doc

## Breaking changes reviewed
- `loki.secretfilter` options removed — not used in our configs
- OTel Collector upgraded to v0.142.0 — Kafka receiver changes don't affect us
- Exporter queue default changes — our tracing pipeline (Beyla → batch → otlphttp) uses simple config, low risk

## Deployment and Testing
- [ ] Sync alloy-k8s: `argocd app set alloy-k8s --revision bump/alloy-v1.14.0 && argocd app sync alloy-k8s`
- [ ] Sync alloy-ringtail: `argocd app set alloy-ringtail --revision bump/alloy-v1.14.0 --server ringtail-argocd && argocd app sync alloy-ringtail`
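- [ ] Sync alloy-tracing-ringtail the same way: `argocd app set alloy-tracing-ringtail --revision bump/alloy-v1.14.0 --server ringtail-argocd && argocd app sync alloy-tracing-ringtail`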
- [ ] Verify metrics flowing in Grafana
- [ ] Verify traces flowing to Tempo (ringtail)
- [ ] Rebuild indri alloy from source (`v1.14.0` tag on forge mirror), SCP to indri, restart
- [ ] After merge: reset ArgoCD revisions to main, re-sync

Reviewed-on: #292
2026-03-13 16:25:27 -07:00
b0bac91ca9 Fix frontmatter field name for Quartz date display (#158)
## Summary

- Rename `date-modified` → `modified` in all 80 docs and the `docs-check-frontmatter` task

Quartz's `CreatedModifiedDate` plugin recognizes `modified`, `lastmod`, `updated`, and `last-modified` — but not `date-modified`. The wrong field name caused Quartz to ignore frontmatter dates entirely and fall through to filesystem timestamps (UTC inside Dagger), showing Feb 12 on pages built late on Feb 11 PST.
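
The date shift described above can be reproduced with a small sketch. The build time used here is hypothetical, and PST is modelled as a fixed UTC-8 offset (the offset shown on this commit's timestamp) rather than a full timezone database entry.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for PST (UTC-8); an assumption for illustration only.
PST = timezone(timedelta(hours=-8), name="PST")


def rendered_dates(local_build_time: datetime) -> tuple[str, str]:
    """Return (local calendar date, UTC calendar date) for the same instant."""
    utc_time = local_build_time.astimezone(timezone.utc)
    return local_build_time.date().isoformat(), utc_time.date().isoformat()


# Hypothetical build kicked off at 17:00 PST on Feb 11.
build = datetime(2026, 2, 11, 17, 0, tzinfo=PST)
print(*rendered_dates(build))  # prints: 2026-02-11 2026-02-12
```

Any build after 16:00 PST lands on the next UTC calendar day, which matches the symptom of pages showing Feb 12 when filesystem timestamps inside the container are used.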

## Test plan

- [x] `mise run docs-check-frontmatter` passes
- [ ] Kick off docs release after merge — verify rendered dates match frontmatter values

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/158
2026-02-11 16:45:12 -08:00
b197bd5f58 Adopt Dagger CI for docs build (Phase 2) (#157)
## Summary

Migrates the docs build pipeline to Dagger (Phase 2 of the Dagger CI adoption plan).

- **Backfill `date-modified` frontmatter** on all 80 docs — Dagger's `--src=.` excludes `.git`, so Quartz can't use git history for page dates. Frontmatter dates work with or without git.
- **New `docs-check-frontmatter` mise task + pre-commit hook** — validates all docs have `title`, `tags`, and `date-modified`
- **New Dagger functions** — `build_changelog` (towncrier in Python container) and `build_docs` (chains changelog → Quartz build in Node container, returns tarball)
- **Simplified CI workflow** — the ~44-line inline Quartz build (clone, npm ci, build, tar, cleanup) is replaced by `dagger call build-docs`. Changelog step remains local on the runner since towncrier needs to modify the host working tree for the git commit.
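
A rough sketch of the kind of check `docs-check-frontmatter` performs, assuming docs start with a `---` delimited YAML block. This is not the task's actual source: the function names are hypothetical, and it only looks at top-level keys with a regex instead of parsing YAML.

```python
import re

# The three fields this PR says the check requires.
REQUIRED_KEYS = ("title", "tags", "date-modified")


def frontmatter_keys(text: str) -> set[str]:
    """Collect top-level keys from a leading '---' delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys: set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Only unindented "key:" lines count; nested list items are skipped.
        match = re.match(r"^([A-Za-z0-9_-]+):", line)
        if match:
            keys.add(match.group(1))
    return set()  # no closing delimiter: treat as missing frontmatter


def missing_keys(text: str) -> list[str]:
    """Return the required keys that a document's frontmatter lacks."""
    present = frontmatter_keys(text)
    return [key for key in REQUIRED_KEYS if key not in present]
```

A pre-commit hook built on this would call `missing_keys` on each doc and fail if any result is non-empty.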

### Design decisions

- **Towncrier runs twice in CI**: once inside Dagger (for the docs tarball) and once on the runner (for the git commit). This is intentional — Dagger's directory export is additive and can't delete the consumed changelog fragments from the host.
- **Artifact hosting stays on Forgejo Releases** (not migrated to Forgejo Packages as the plan doc originally suggested). That migration can happen independently.
- **`date-modified` frontmatter** preserved even though `build_changelog` installs git — the git there is only for towncrier's `git add` call, not for history. The local iteration story (`dagger call build-docs --src=. --version=dev` with uncommitted changes) depends on frontmatter dates.

### Local iteration

```bash
dagger call build-docs --src=. --version=dev export --path=./docs-dev.tar.gz
tar tf docs-dev.tar.gz | head -20
```

## Deployment and Testing

- [x] `dagger call build-docs --src=. --version=dev` produces a valid 1.1 MB tarball (149 HTML pages)
- [x] Pre-commit hooks pass (including new `docs-check-frontmatter`)
- [ ] Full `workflow_dispatch` run after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/157
2026-02-11 16:33:16 -08:00
cc54b4f565 Add Fly.io proxy observability via embedded Alloy (#123)
All checks were successful: Deploy Fly.io Proxy / deploy (push), successful in 1m16s
## Summary

- Embed Grafana Alloy in the Fly.io proxy container to collect nginx JSON access logs (→ Loki) and derive request rate, latency histogram, cache status, and bandwidth metrics (→ Prometheus)
- Add nginx `stub_status` endpoint for connection-level metrics (active/reading/writing/waiting)
- Create two Grafana dashboards: **Docs APM** (per-service view filtered by `host="docs.eblu.me"`) and **Fly.io Proxy Health** (aggregate proxy health across all upstream services)
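
To illustrate how metrics can be derived from JSON access logs, here is a minimal Python sketch. It is not the Alloy pipeline from this PR: the JSON key names (`request_time`, `cache_status`, `body_bytes_sent`) and the bucket boundaries are assumptions, since the actual `log_format` is defined in `fly/nginx.conf`.

```python
import json
from collections import Counter

# Hypothetical latency bucket upper bounds, in seconds.
LATENCY_BUCKETS = (0.05, 0.1, 0.5, 1.0)


def summarize(log_lines):
    """Aggregate request count, cache status, bandwidth, and a latency histogram."""
    cache_status = Counter()
    # Cumulative buckets in the Prometheus style: each bucket counts
    # every request at or below its upper bound.
    histogram = {bound: 0 for bound in LATENCY_BUCKETS}
    histogram[float("inf")] = 0
    bytes_sent = 0
    requests = 0
    for line in log_lines:
        entry = json.loads(line)
        requests += 1
        cache_status[entry["cache_status"]] += 1
        bytes_sent += int(entry["body_bytes_sent"])
        latency = float(entry["request_time"])
        for bound in histogram:
            if latency <= bound:
                histogram[bound] += 1
    return {
        "requests": requests,
        "cache_status": dict(cache_status),
        "bytes_sent": bytes_sent,
        "latency_histogram": histogram,
    }
```

Request rate then falls out of the request count over a time window, which is the part Prometheus handles with `rate()` once the counters are written remotely.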

## Changed Files

| File | Change |
|------|--------|
| `fly/nginx.conf` | Add JSON `log_format` + `access_log`, add `stub_status` endpoint |
| `fly/Dockerfile` | COPY Alloy binary from `grafana/alloy:v1.5.1`, COPY `alloy.river` config |
| `fly/alloy.river` | **New** — Alloy config: log tailing, metric extraction, remote_write |
| `fly/start.sh` | Start Alloy after Tailscale, before nginx |
| `argocd/manifests/grafana-config/dashboards/configmap-docs-apm.yaml` | **New** — Docs APM dashboard |
| `argocd/manifests/grafana-config/dashboards/configmap-flyio.yaml` | **New** — Fly.io Proxy Health dashboard |
| `argocd/manifests/grafana-config/kustomization.yaml` | Register new dashboard configmaps |
| `docs/reference/services/flyio-proxy.md` | Document observability setup |

## Deployment and Testing

- [ ] `mise run fly-deploy` — rebuild container with Alloy
- [ ] `curl https://docs.eblu.me/` — generate traffic
- [ ] `fly logs -a blumeops-proxy` — verify Alloy startup
- [ ] Query Prometheus: `flyio_nginx_http_requests_total{instance="flyio-proxy"}`
- [ ] Query Loki: `{instance="flyio-proxy", job="flyio-nginx"}`
- [ ] `argocd app sync grafana-config` — deploy dashboards
- [ ] Verify dashboards show data in Grafana
- [ ] `mise run services-check` — no regressions

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/123
2026-02-08 10:05:38 -08:00
dc46eb7def Update all docs titles to human-readable (#117)
## Summary
- Updated frontmatter `title:` in all 63 doc cards from slug-case to human-readable (e.g. `borgmatic` → `Borgmatic`, `ai-assistance-guide` → `AI Assistance Guide`)
- Titles now closely match file stems so `[[wiki-links]]` render naturally without alternate anchor text
- Corrected titles that diverged from stems (e.g. `host-inventory` → `Hosts`, `grafana-alloy` → `Alloy`, `argocd-applications` → `Apps`)
- Deleted `title-test-alpha.md` and `title-test-beta.md` test cards and removed their reference index entry
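
A mechanical slug-to-title conversion covers the simple cases listed above. The sketch below is an assumption about how such a helper might look, not the method used in this PR, and the uppercase override table is hypothetical. Titles such as `host-inventory` → `Hosts` were editorial choices and cannot be derived this way.

```python
# Hypothetical override table for words that should be fully uppercased.
UPPERCASE_WORDS = {"ai"}


def humanize(slug: str) -> str:
    """Turn a lowercase-hyphenated slug into a human-readable title."""
    words = slug.split("-")
    return " ".join(
        word.upper() if word in UPPERCASE_WORDS else word.capitalize()
        for word in words
    )


print(humanize("ai-assistance-guide"))  # prints: AI Assistance Guide
```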

## Deployment and Testing
- [x] `docs-check-links` passes — all wiki-links valid
- [x] `docs-check-index` passes
- [x] `docs-check-filenames` passes
- [ ] Verify titles render correctly on docs site after deploy

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/117
2026-02-07 21:44:57 -08:00
3e4b5c2dd3 Convert wiki-link titles to lowercase slugs (#92)
## Summary
- Convert all frontmatter titles to lowercase-hyphenated format (e.g., `grafana-alloy` instead of `Grafana Alloy`)
- Update all wiki-links to use the new slug format
- Update `doc-titles` task to validate slug format (lowercase, hyphens only)

Quartz appears to require titles without spaces for wiki-link resolution.
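
The slug rule can be sketched as a regular expression check. This is an illustration, not the `doc-titles` task itself; allowing digits is an assumption made so that a card name like `1password` passes.

```python
import re

# Lowercase alphanumeric words joined by single hyphens (digits are an assumption).
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_slug(title: str) -> bool:
    """Check that a title is already in lowercase-hyphenated form."""
    return SLUG_PATTERN.fullmatch(title) is not None


def slugify(title: str) -> str:
    """Lowercase a title and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


print(slugify("Grafana Alloy"))  # prints: grafana-alloy
```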

## Deployment and Testing
- [x] Pre-commit hooks pass (`doc-titles` and `doc-links`)
- [ ] Build docs v1.0.8 and deploy
- [ ] Verify wiki-links resolve correctly (e.g., `[[grafana-alloy]]`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/92
2026-02-03 16:06:35 -08:00
01adc4cf0f Switch to title-based wiki-links (#91)
## Summary
- Remove aliases from all zk cards to prevent them from capturing wiki-links
- Convert all wiki-links from `[[filename|Title]]` to `[[Title]]` format
- Replace `doc-filenames` task with `doc-titles` for duplicate title detection
- Update pre-commit hook to use `doc-titles`

Wiki-links now resolve to reference docs by their frontmatter title, which is more readable and maintainable than filename-based links.
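
The link rewrite and the duplicate-title check can be sketched as follows. Both helpers are hypothetical and assume the display text of each `[[filename|Title]]` link already equals the target's frontmatter title; they are not the code behind `doc-titles`.

```python
import re
from collections import Counter

# Matches [[target|Display]] and captures the display text only.
ALIASED_LINK = re.compile(r"\[\[[^\[\]|]+\|([^\[\]|]+)\]\]")


def to_title_links(text: str) -> str:
    """Rewrite [[filename|Title]] links to [[Title]], leaving plain links alone."""
    return ALIASED_LINK.sub(lambda match: f"[[{match.group(1)}]]", text)


def duplicate_titles(titles: list[str]) -> list[str]:
    """Return titles used by more than one doc, which would make links ambiguous."""
    counts = Counter(titles)
    return sorted(title for title, count in counts.items() if count > 1)


print(to_title_links("See [[alloy|Grafana Alloy]] and [[ArgoCD]]."))
# prints: See [[Grafana Alloy]] and [[ArgoCD]].
```

Duplicate detection matters here because title-based links have no filename to disambiguate two docs that share a title.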

## Deployment and Testing
- [x] Pre-commit hooks pass (including new `doc-titles` check)
- [x] Manually verified zk cards have aliases removed
- [ ] Deploy docs v1.0.7 and verify wiki-links resolve correctly
- [ ] Test links to reference docs (e.g., [[Grafana Alloy]], [[ArgoCD]])

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/91
2026-02-03 15:55:31 -08:00
254b93096a Phase 2: Add Reference section with 24 technical reference cards (#88)
## Summary
- Create `docs/reference/` section with 24 technical reference cards
- Services (16): alloy, argocd, borgmatic, 1password, forgejo, grafana, jellyfin, kiwix, loki, miniflux, navidrome, postgresql, prometheus, teslamate, transmission, zot
- Infrastructure (3): hosts, tailscale, routing
- Kubernetes (2): cluster, apps
- Storage (2): sifaka, backups
- Update README to mark Phase 2 as complete
- Add towncrier changelog fragment

## Deployment and Testing
- [ ] Build docs locally to verify wiki-links resolve
- [ ] Deploy via ArgoCD and verify at docs.ops.eblu.me/reference/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/88
2026-02-03 14:27:37 -08:00