Commit graph

17 commits

Author SHA1 Message Date
f46a04b902 Restructure docs: consolidate, recategorize, and extract
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
- Consolidate 4 Authentik Nix derivation docs into one card
  (authentik-nix-build-components.md)
- Merge build-grafana-container + build-grafana-sidecar into
  build-grafana-images.md
- Move agent-change-process from how-to/ to explanation/ (it's a
  methodology doc, not a task guide)
- Extract Caddy custom build section from reference card into
  how-to/deployment/build-caddy-with-plugins.md
- Move expose-service-publicly from how-to/ to tutorials/ (it's a
  comprehensive walkthrough, not a quick task reference)
- Update all wiki-link references across affected docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 19:55:59 -07:00
4f0476a851 Fix spider trap: disable SPA mode, remove index files, relax wiki-links (#290)
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container (Nix) / detect (push) Successful in 1s
Build Container (Nix) / build (quartz) (push) Successful in 1s
Build Container / build (quartz) (push) Successful in 10s
## Summary

Fixes the Facebook crawler spider trap that's been generating infinite recursive URLs like `/how-to/tutorials/tutorials/how-to/explanation/...` for several days.

**Root cause:** Quartz SPA mode + nginx `try_files` fallback to `index.html` meant any fabricated URL returned the root HTML shell with HTTP 200. Crawlers followed relative links from those fake URLs, creating infinite recursion.

**Fix:**
- Disable Quartz SPA mode (`enableSPA: false`) — all pages are now fully static HTML
- Replace nginx SPA fallback with `=404` + Quartz's static `404.html`
- Remove `robots.txt` exclusions (no longer needed)

**Docs cleanup (Obsidian.nvim compat no longer needed):**
- Delete hand-curated category index files (`tutorials.md`, `reference.md`, `how-to.md`, `explanation.md`) — Quartz auto-generates folder pages
- Delete `postgresql-storage.md` (redirect stub) and `migrate-forgejo-from-brew.md` (stale history)
- Drop `docs-check-index` and `docs-check-filenames` prek hooks
- Rewrite `docs-check-links` to allow path-based wiki-links (`[[path/to/file]]`) and only error on true ambiguity
- Add `ai-docs` doc tree listing to replace index files for AI context
- Add natural cross-links from reference cards to fix orphan docs

## Deployment and Testing

- [ ] Merge and let the build pipeline run
- [ ] Verify docs.eblu.me serves pages correctly with full page loads
- [ ] Verify non-existent URLs return 404
- [ ] Monitor crawler traffic — should drop to near zero for fabricated URLs

Reviewed-on: #290
2026-03-09 11:59:43 -07:00
c281fb5403 Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286)
## Summary

Adds the third observability pillar — **distributed tracing** — alongside existing metrics (Prometheus) and logs (Loki).

- **Grafana Tempo 2.10.1** on minikube-indri for trace storage with 7d retention, OTLP receivers, and `metrics_generator` that remote-writes span-metrics (RED) to Prometheus
- **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes
- **Grafana integration** — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking
- **Prometheus** scrapes Tempo operational metrics

### Architecture

```
ringtail (k3s)                                indri (minikube)
┌──────────────────────┐                      ┌─────────────────────┐
│ Alloy+Beyla (eBPF)   │──OTLP HTTP────────→ │ Tempo               │
│  ↳ Frigate, ntfy,    │  via tailnet         │  ↳ trace storage    │
│    Ollama, Immich     │                      │  ↳ RED → Prometheus │
└──────────────────────┘                      │                     │
                                              │ Grafana             │
                                              │  ↳ Tempo datasource │
                                              └─────────────────────┘
```

### New files (12)
- `docs/reference/services/tempo.md` — reference doc
- `docs/changelog.d/feature-otel-tracing.feature.md`
- `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files)
- `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files)

### Modified files (6)
- `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields
- `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target
- `service-versions.yaml` — tempo + alloy-tracing-ringtail entries
- `docs/reference/services/grafana.md` — Tempo in datasources table
- `docs/reference/reference.md` — Tempo in services index
- `docs/reference/operations/observability.md` — Tempo in components list

## Deployment and Testing

- [ ] Sync `apps` app to pick up new Application definitions
- [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo`
- [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo`
- [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready`
- [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring`
- [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail`
- [ ] Check Beyla discovery in alloy-tracing logs on ringtail
- [ ] Sync grafana-config for updated datasources
- [ ] Sync prometheus for updated scrape config
- [ ] Test Grafana Tempo datasource connection
- [ ] Generate test traffic and search traces in Grafana Explore → Tempo
- [ ] After merge: reset all ArgoCD app revisions back to main

Reviewed-on: #286
2026-03-05 10:51:07 -08:00
a2bb9abbdb Home-build grafana-sidecar container (#281)
All checks were successful
Build Container (Nix) / detect (push) Successful in 2s
Build Container / detect (push) Successful in 2s
Build Container (Nix) / build (grafana-sidecar) (push) Successful in 2s
Build Container / build (grafana-sidecar) (push) Successful in 6s
## Summary
- Home-build the k8s-sidecar container (`grafana-sidecar`) from forge mirror, replacing upstream `quay.io/kiwigrid/k8s-sidecar:1.28.0`
- Pinned to v1.28.0 — v2.x deferred due to 135% memory regression and readOnlyRootFilesystem crashloop
- Adds Dockerfile, service-versions entry, docs, and changelog fragment
- Manifest switch to home-built image pending container build

## Deployment and Testing
- [ ] `mise run container-build-and-release grafana-sidecar`
- [ ] Update kustomization.yaml with built image tag
- [ ] `argocd app set grafana --revision feature/grafana-sidecar && argocd app sync grafana`
- [ ] Verify sidecar logs and dashboards at https://grafana.ops.eblu.me
- [ ] Post-merge: `argocd app set grafana --revision main && argocd app sync grafana`

Reviewed-on: #281
2026-03-03 13:48:24 -08:00
8d1e98617b Review build-grafana-container docs: stamp reviewed, fix cross-links
Also fix stale grafana.md reference card (Helm → Kustomize).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 07:28:06 -08:00
71cb256527 Deploy Authentik identity provider (C2 Mikado) (#227)
## Summary
C2 Mikado chain for deploying Authentik as the SSO identity provider, replacing Dex.

This PR will evolve over multiple sessions. Each iteration adds documentation (prerequisite cards) and eventually code as leaf nodes are resolved.

## Current Mikado State
- **Goal:** `deploy-authentik` (active)
- **Leaf prerequisites:**
  - `build-authentik-container` — Build Nix container image
  - `provision-authentik-database` — Create PostgreSQL database on CNPG cluster
  - `create-authentik-secrets` — Create 1Password item with credentials

## Process refinements
- Updated agent-change-process with lessons from first attempt: reset code before committing cards, open PRs early

## Test plan
- [ ] `mise run docs-mikado` shows correct dependency chain
- [ ] Leaf nodes can be worked independently
- [ ] Container builds on ringtail
- [ ] Authentik starts and reaches healthy state
- [ ] Forgejo OAuth2 connector works

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/227
2026-02-20 12:55:59 -08:00
d21798b1f3 Document Dex OIDC and add services-check integration (#223)
## Summary
- Create Dex reference card (`docs/reference/services/dex.md`) with quick reference, architecture, identity source, storage, OIDC clients, secrets, and endpoints
- Write federated login explanation article (`docs/explanation/federated-login.md`) covering the Dex + Forgejo two-layer auth model, login flow, and break-glass access
- Add Dex to `services-check` (HTTP health endpoint + k3s pod check)
- Update Grafana docs with new Authentication section documenting SSO via Dex
- Update Forgejo docs with OAuth2 Provider section documenting its role as upstream identity source
- Add Dex to ringtail workloads table and reference service index
- Move `adopt-oidc-provider` plan to `completed/` with final design reflecting actual implementation

## Test plan
- [ ] `mise run services-check` passes (includes new Dex checks)
- [ ] `docs-check-links` passes (all wiki-links resolve)
- [ ] `docs-check-index` passes (new docs are indexed)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/223
2026-02-19 20:44:23 -08:00
b0bac91ca9 Fix frontmatter field name for Quartz date display (#158)
## Summary

- Rename `date-modified` -> `modified` in all 80 docs and the `docs-check-frontmatter` task

Quartz's `CreatedModifiedDate` plugin recognizes `modified`, `lastmod`, `updated`, and `last-modified` — but not `date-modified`. The wrong field name caused Quartz to ignore frontmatter dates entirely and fall through to filesystem timestamps (UTC inside Dagger), showing Feb 12 on pages built late on Feb 11 PST.

## Test plan

- [x] `mise run docs-check-frontmatter` passes
- [ ] Kick off docs release after merge — verify rendered dates match frontmatter values

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/158
2026-02-11 16:45:12 -08:00
b197bd5f58 Adopt Dagger CI for docs build (Phase 2) (#157)
## Summary

Migrates the docs build pipeline to Dagger (Phase 2 of the Dagger CI adoption plan).

- **Backfill `date-modified` frontmatter** on all 80 docs — Dagger's `--src=.` excludes `.git`, so Quartz can't use git history for page dates. Frontmatter dates work with or without git.
- **New `docs-check-frontmatter` mise task + pre-commit hook** — validates all docs have `title`, `tags`, and `date-modified`
- **New Dagger functions** — `build_changelog` (towncrier in Python container) and `build_docs` (chains changelog → Quartz build in Node container, returns tarball)
- **Simplified CI workflow** — the ~44-line inline Quartz build (clone, npm ci, build, tar, cleanup) is replaced by `dagger call build-docs`. Changelog step remains local on the runner since towncrier needs to modify the host working tree for the git commit.

### Design decisions

- **Towncrier runs twice in CI**: once inside Dagger (for the docs tarball) and once on the runner (for the git commit). This is intentional — Dagger's directory export is additive and can't delete the consumed changelog fragments from the host.
- **Artifact hosting stays on Forgejo Releases** (not migrated to Forgejo Packages as the plan doc originally suggested). That migration can happen independently.
- **`date-modified` frontmatter** preserved even though `build_changelog` installs git — the git there is only for towncrier's `git add` call, not for history. The local iteration story (`dagger call build-docs --src=. --version=dev` with uncommitted changes) depends on frontmatter dates.

### Local iteration

```bash
dagger call build-docs --src=. --version=dev export --path=./docs-dev.tar.gz
tar tf docs-dev.tar.gz | head -20
```

## Deployment and Testing

- [x] `dagger call build-docs --src=. --version=dev` produces valid 1.1MB tarball (149 HTML pages)
- [x] Pre-commit hooks pass (including new `docs-check-frontmatter`)
- [ ] Full `workflow_dispatch` run after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/157
2026-02-11 16:33:16 -08:00
cc54b4f565 Add Fly.io proxy observability via embedded Alloy (#123)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m16s
## Summary

- Embed Grafana Alloy in the Fly.io proxy container to collect nginx JSON access logs (→ Loki) and derive request rate, latency histogram, cache status, and bandwidth metrics (→ Prometheus)
- Add nginx `stub_status` endpoint for connection-level metrics (active/reading/writing/waiting)
- Create two Grafana dashboards: **Docs APM** (per-service view filtered by `host="docs.eblu.me"`) and **Fly.io Proxy Health** (aggregate proxy health across all upstream services)

## Changed Files

| File | Change |
|------|--------|
| `fly/nginx.conf` | Add JSON `log_format` + `access_log`, add `stub_status` endpoint |
| `fly/Dockerfile` | COPY Alloy binary from `grafana/alloy:v1.5.1`, COPY `alloy.river` config |
| `fly/alloy.river` | **New** — Alloy config: log tailing, metric extraction, remote_write |
| `fly/start.sh` | Start Alloy after Tailscale, before nginx |
| `argocd/manifests/grafana-config/dashboards/configmap-docs-apm.yaml` | **New** — Docs APM dashboard |
| `argocd/manifests/grafana-config/dashboards/configmap-flyio.yaml` | **New** — Fly.io Proxy Health dashboard |
| `argocd/manifests/grafana-config/kustomization.yaml` | Register new dashboard configmaps |
| `docs/reference/services/flyio-proxy.md` | Document observability setup |

## Deployment and Testing

- [ ] `mise run fly-deploy` — rebuild container with Alloy
- [ ] `curl https://docs.eblu.me/` — generate traffic
- [ ] `fly logs -a blumeops-proxy` — verify Alloy startup
- [ ] Query Prometheus: `flyio_nginx_http_requests_total{instance="flyio-proxy"}`
- [ ] Query Loki: `{instance="flyio-proxy", job="flyio-nginx"}`
- [ ] `argocd app sync grafana-config` — deploy dashboards
- [ ] Verify dashboards show data in Grafana
- [ ] `mise run services-check` — no regressions

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/123
2026-02-08 10:05:38 -08:00
dc46eb7def Update all docs titles to human-readable (#117)
## Summary
- Updated frontmatter `title:` in all 63 doc cards from slug-case to human-readable (e.g. `borgmatic` → `Borgmatic`, `ai-assistance-guide` → `AI Assistance Guide`)
- Titles now closely match file stems so `[[wiki-links]]` render naturally without alternate anchor text
- Corrected titles that diverged from stems (e.g. `host-inventory` → `Hosts`, `grafana-alloy` → `Alloy`, `argocd-applications` → `Apps`)
- Deleted `title-test-alpha.md` and `title-test-beta.md` test cards and removed their reference index entry

## Deployment and Testing
- [x] `docs-check-links` passes — all wiki-links valid
- [x] `docs-check-index` passes
- [x] `docs-check-filenames` passes
- [ ] Verify titles render correctly on docs site after deploy

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/117
2026-02-07 21:44:57 -08:00
3da455e49c Enforce unique doc filenames and simple wiki-links (#109)
## Summary
- Rename section index files to match their titles (tutorials.md, reference.md, how-to.md, explanation.md) so all filenames are unique
- Convert all ~47 path-based wiki-links to simple filename format across 15 files
- Update doc-filenames task to no longer skip index.md files
- Update doc-links task to reject path-based links containing '/'

This ensures all wiki-links work correctly in obsidian.nvim by making links resolvable by filename alone.

## Testing
- `mise run doc-filenames` - all unique
- `mise run doc-links` - no broken or path-based links
- `mise run doc-titles` - no duplicates

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/109
2026-02-04 17:21:34 -08:00
9630655cc9 Switch to filename-based wiki-links (Quartz resolves by filename)
- Convert all wiki-links from title-based to filename-based
- Update doc-links to validate against filenames
- Add doc-filenames task for duplicate filename detection
- Consolidate doc hooks into single local block in pre-commit config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:29:31 -08:00
9261ec7679 Add spaces around pipe in wiki-links (test for Quartz resolution)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:17:20 -08:00
3e4b5c2dd3 Convert wiki-link titles to lowercase slugs (#92)
## Summary
- Convert all frontmatter titles to lowercase-hyphenated format (e.g., `grafana-alloy` instead of `Grafana Alloy`)
- Update all wiki-links to use the new slug format
- Update `doc-titles` task to validate slug format (lowercase, hyphens only)

Quartz appears to require titles without spaces for wiki-link resolution.

## Deployment and Testing
- [x] Pre-commit hooks pass (`doc-titles` and `doc-links`)
- [ ] Build docs v1.0.8 and deploy
- [ ] Verify wiki-links resolve correctly (e.g., `[[grafana-alloy]]`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/92
2026-02-03 16:06:35 -08:00
01adc4cf0f Switch to title-based wiki-links (#91)
## Summary
- Remove aliases from all zk cards to prevent them from capturing wiki-links
- Convert all wiki-links from `[[filename|Title]]` to `[[Title]]` format
- Replace `doc-filenames` task with `doc-titles` for duplicate title detection
- Update pre-commit hook to use `doc-titles`

Wiki-links now resolve to reference docs by their frontmatter title, which is more readable and maintainable than filename-based links.

## Deployment and Testing
- [x] Pre-commit hooks pass (including new `doc-titles` check)
- [x] Manually verified zk cards have aliases removed
- [ ] Deploy docs v1.0.7 and verify wiki-links resolve correctly
- [ ] Test links to reference docs (e.g., [[Grafana Alloy]], [[ArgoCD]])

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/91
2026-02-03 15:55:31 -08:00
254b93096a Phase 2: Add Reference section with 24 technical reference cards (#88)
## Summary
- Create `docs/reference/` section with 24 technical reference cards
- Services (16): alloy, argocd, borgmatic, 1password, forgejo, grafana, jellyfin, kiwix, loki, miniflux, navidrome, postgresql, prometheus, teslamate, transmission, zot
- Infrastructure (3): hosts, tailscale, routing
- Kubernetes (2): cluster, apps
- Storage (2): sifaka, backups
- Update README to mark Phase 2 as complete
- Add towncrier changelog fragment

## Deployment and Testing
- [ ] Build docs locally to verify wiki-links resolve
- [ ] Deploy via ArgoCD and verify at docs.ops.eblu.me/reference/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/88
2026-02-03 14:27:37 -08:00