Commit graph

21 commits

Author SHA1 Message Date
2bea048dbf Externalize Tailscale operator to forge mirror (#295)
## Summary
- Mirrors `tailscale/tailscale` on forge (`mirrors/tailscale`)
- Replaces vendored `operator.yaml` (495 KB / 5,386 lines) with ArgoCD apps sourcing the upstream static manifest, pinned via `targetRevision: v1.94.2`
- Adds `tailscale-operator-base` app for indri and `tailscale-operator-base-ringtail` for ringtail
- Local kustomization retains only ProxyClass and DNSConfig custom resources
- Updates `[[tailscale-operator]]` doc to reflect new sourcing

## Deployment and Testing
- [ ] Register `mirrors/tailscale` repo in ArgoCD (it needs to know about the new repo)
- [ ] Sync `apps` app to pick up the new `tailscale-operator-base` app definitions
- [ ] Sync `tailscale-operator-base` — verify CRDs, RBAC, operator Deployment come up
- [ ] Sync `tailscale-operator` — verify ProxyClass, DNSConfig still apply cleanly
- [ ] Verify existing Tailscale Ingresses still work (ProxyGroup pods healthy)
- [ ] Repeat for ringtail cluster
- [ ] After merge: apps already point at tags, no revision reset needed

Reviewed-on: #295
2026-03-15 17:44:35 -07:00
40f1568088 Remove unused Mosquitto MQTT broker from ringtail
Mosquitto has been dormant since frigate-notify switched from MQTT to
webapi polling (529ba10). Tear down live infra (ArgoCD app, namespace)
and remove all manifests, service-versions entry, services-check, and
doc references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:37:31 -07:00
c281fb5403 Add OpenTelemetry distributed tracing (Tempo + Beyla eBPF) (#286)
## Summary

Adds the third observability pillar — **distributed tracing** — alongside existing metrics (Prometheus) and logs (Loki).

- **Grafana Tempo 2.10.1** on minikube-indri for trace storage with 7d retention, OTLP receivers, and `metrics_generator` that remote-writes span-metrics (RED) to Prometheus
- **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes
- **Grafana integration** — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking
- **Prometheus** scrapes Tempo operational metrics

### Architecture

```
ringtail (k3s)                                indri (minikube)
┌──────────────────────┐                      ┌─────────────────────┐
│ Alloy+Beyla (eBPF)   │──OTLP HTTP────────→ │ Tempo               │
│  ↳ Frigate, ntfy,    │  via tailnet         │  ↳ trace storage    │
│    Ollama, Immich     │                      │  ↳ RED → Prometheus │
└──────────────────────┘                      │                     │
                                              │ Grafana             │
                                              │  ↳ Tempo datasource │
                                              └─────────────────────┘
```

### New files (12)
- `docs/reference/services/tempo.md` — reference doc
- `docs/changelog.d/feature-otel-tracing.feature.md`
- `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files)
- `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files)

### Modified files (6)
- `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields
- `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target
- `service-versions.yaml` — tempo + alloy-tracing-ringtail entries
- `docs/reference/services/grafana.md` — Tempo in datasources table
- `docs/reference/reference.md` — Tempo in services index
- `docs/reference/operations/observability.md` — Tempo in components list

## Deployment and Testing

- [ ] Sync `apps` app to pick up new Application definitions
- [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo`
- [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo`
- [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready`
- [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring`
- [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail`
- [ ] Check Beyla discovery in alloy-tracing logs on ringtail
- [ ] Sync grafana-config for updated datasources
- [ ] Sync prometheus for updated scrape config
- [ ] Test Grafana Tempo datasource connection
- [ ] Generate test traffic and search traces in Grafana Explore → Tempo
- [ ] After merge: reset all ArgoCD app revisions back to main

Reviewed-on: #286
2026-03-05 10:51:07 -08:00
6ca3c67705 Add Ollama reference card and update indexes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 19:43:14 -08:00
de54b4e33d Port CloudNative-PG off Helm to direct release manifest (#268)
## Summary
- Point ArgoCD app directly at forge-mirrored upstream repo (`mirrors/cloudnative-pg`) instead of the Helm charts repo
- Use `directory.include` to select the specific release manifest (`cnpg-1.27.1.yaml`) from the `releases/` directory
- No vendored files, no Helm — upgrades are a two-line change (`targetRevision` + `directory.include`)
- Delete unused `values.yaml` (was empty, all Helm defaults)

## Deployment and Testing
- [ ] Register mirror repo in ArgoCD: `argocd repo add ssh://forgejo@forge.ops.eblu.me:2222/mirrors/cloudnative-pg.git --ssh-private-key-path <key>`
- [ ] `argocd app set cloudnative-pg --revision feature/cnpg-direct-source && argocd app sync cloudnative-pg`
- [ ] Verify operator pod running: `kubectl get pods -n cnpg-system --context=minikube-indri`
- [ ] Verify CRDs exist: `kubectl get crd --context=minikube-indri | grep cnpg`
- [ ] Verify existing clusters healthy: `kubectl get clusters -A --context=minikube-indri`
- [ ] After merge: `argocd app set cloudnative-pg --revision main && argocd app sync cloudnative-pg`

## Notes
- The forge mirror was created via `mise run mirror-create` from `https://github.com/cloudnative-pg/cloudnative-pg.git`
- ArgoCD may need the mirror repo added to its known repositories if the credential template doesn't already match `mirrors/*`

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/268
2026-02-25 17:37:53 -08:00
291fff345c Fix services-check and update docs for Frigate migration to ringtail (#218)
## Summary
- Move mosquitto, ntfy, frigate, frigate-notify pod checks from `minikube-indri` to `k3s-ringtail` context in `services-check`
- Add `nvidia-device-plugin` pod check for ringtail k3s
- Rename "Kubernetes pods" section to "Indri minikube pods" for clarity
- Update 8 documentation files to reflect the migration completed in PRs #216/#217

## Files Changed
| File | Change |
|------|--------|
| `mise-tasks/services-check` | Move 4 pod checks to k3s-ringtail, add nvidia-device-plugin |
| `docs/reference/services/frigate.md` | Image→tensorrt, detector→ONNX/CUDA, shm→512Mi |
| `docs/reference/infrastructure/ringtail.md` | List actual k3s workloads |
| `docs/reference/infrastructure/indri.md` | Note frigate migration |
| `docs/explanation/architecture.md` | Add ringtail to diagram + compute layer |
| `docs/reference/kubernetes/cluster.md` | Note two clusters, add k3s section |
| `docs/reference/reference.md` | Update frigate/ntfy location |
| `docs/how-to/plans/completed/operationalize-reolink-camera.md` | Add post-completion migration note |
| `CLAUDE.md` | Add k3s-ringtail context guidance |

## Test plan
- [ ] `mise run services-check` — all checks pass
- [ ] Review each doc for accuracy against deployed state

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/218
2026-02-19 14:38:21 -08:00
63e67b927a Add CV service reference card and docs updates (#171)
## Summary
- Add CV service reference card (`docs/reference/services/cv.md`)
- Add CV to services index, ArgoCD apps registry, and Caddy proxied services table
- Changelog fragment

## Files changed
- `docs/reference/services/cv.md` (new) — CV reference card
- `docs/reference/reference.md` — Add CV to services table
- `docs/reference/kubernetes/apps.md` — Add CV to app registry
- `docs/reference/services/caddy.md` — Add CV to proxied k8s services
- `docs/changelog.d/cv-docs.doc.md` (new) — Changelog fragment

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/171
2026-02-12 11:45:32 -08:00
b0bac91ca9 Fix frontmatter field name for Quartz date display (#158)
## Summary

- Rename `date-modified` -> `modified` in all 80 docs and the `docs-check-frontmatter` task

Quartz's `CreatedModifiedDate` plugin recognizes `modified`, `lastmod`, `updated`, and `last-modified` — but not `date-modified`. The wrong field name caused Quartz to ignore frontmatter dates entirely and fall through to filesystem timestamps (UTC inside Dagger), showing Feb 12 on pages built late on Feb 11 PST.

## Test plan

- [x] `mise run docs-check-frontmatter` passes
- [ ] Kick off docs release after merge — verify rendered dates match frontmatter values

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/158
2026-02-11 16:45:12 -08:00
b197bd5f58 Adopt Dagger CI for docs build (Phase 2) (#157)
## Summary

Migrates the docs build pipeline to Dagger (Phase 2 of the Dagger CI adoption plan).

- **Backfill `date-modified` frontmatter** on all 80 docs — Dagger's `--src=.` excludes `.git`, so Quartz can't use git history for page dates. Frontmatter dates work with or without git.
- **New `docs-check-frontmatter` mise task + pre-commit hook** — validates all docs have `title`, `tags`, and `date-modified`
- **New Dagger functions** — `build_changelog` (towncrier in Python container) and `build_docs` (chains changelog → Quartz build in Node container, returns tarball)
- **Simplified CI workflow** — the ~44-line inline Quartz build (clone, npm ci, build, tar, cleanup) is replaced by `dagger call build-docs`. Changelog step remains local on the runner since towncrier needs to modify the host working tree for the git commit.

### Design decisions

- **Towncrier runs twice in CI**: once inside Dagger (for the docs tarball) and once on the runner (for the git commit). This is intentional — Dagger's directory export is additive and can't delete the consumed changelog fragments from the host.
- **Artifact hosting stays on Forgejo Releases** (not migrated to Forgejo Packages as the plan doc originally suggested). That migration can happen independently.
- **`date-modified` frontmatter** preserved even though `build_changelog` installs git — the git there is only for towncrier's `git add` call, not for history. The local iteration story (`dagger call build-docs --src=. --version=dev` with uncommitted changes) depends on frontmatter dates.

### Local iteration

```bash
dagger call build-docs --src=. --version=dev export --path=./docs-dev.tar.gz
tar tf docs-dev.tar.gz | head -20
```

## Deployment and Testing

- [x] `dagger call build-docs --src=. --version=dev` produces valid 1.1MB tarball (149 HTML pages)
- [x] Pre-commit hooks pass (including new `docs-check-frontmatter`)
- [ ] Full `workflow_dispatch` run after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/157
2026-02-11 16:33:16 -08:00
e6cf7e47e0 Restrict flyio-proxy ACLs to dedicated tag:flyio-target endpoints (#126)
All checks were successful
Deploy Fly.io Proxy / deploy (push) Successful in 1m8s
## Summary
- Introduce `tag:flyio-target` so services must explicitly opt in to be reachable by the fly.io proxy
- Replace broad `tag:k8s` and `tag:homelab` grants with the new tag in the ACL rule and test
- Add `tailscale.com/tags: "tag:k8s,tag:flyio-target"` annotation to docs, loki, and prometheus Ingresses
- Switch Alloy push endpoints from `*.ops.eblu.me` (Caddy) to `*.tail8d86e.ts.net` (Tailscale Ingress)
- Update docs: flyio-proxy, caddy, tailscale, forgejo (future public access + security checklist), expose-service-publicly

## Manual step (not in PR)
Update the k8s operator OAuth client in the Tailscale admin console to include `tag:flyio-target` in its scope. Without this, the operator cannot assign the new tag to Ingress proxy nodes.

## Deployment order
1. **Pulumi ACLs** — `mise run tailnet-preview && mise run tailnet-up`
2. **OAuth client** — Manual update in Tailscale admin console
3. **K8s Ingresses** — `argocd app sync apps && argocd app sync docs loki prometheus`
4. **Fly.io proxy** — `mise run fly-deploy`
5. **Verify** — `mise run services-check`, check Grafana dashboards

## Test plan
- [ ] `mise run tailnet-preview` shows clean diff
- [ ] `argocd app diff docs`, `argocd app diff loki`, `argocd app diff prometheus` show only annotation additions
- [ ] After deploy: Grafana dashboards show continued log/metric flow
- [ ] `curl -sf https://docs.eblu.me` returns 200
- [ ] `mise run services-check` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/126
2026-02-08 21:54:18 -08:00
dc46eb7def Update all docs titles to human-readable (#117)
## Summary
- Updated frontmatter `title:` in all 63 doc cards from slug-case to human-readable (e.g. `borgmatic` → `Borgmatic`, `ai-assistance-guide` → `AI Assistance Guide`)
- Titles now closely match file stems so `[[wiki-links]]` render naturally without alternate anchor text
- Corrected titles that diverged from stems (e.g. `host-inventory` → `Hosts`, `grafana-alloy` → `Alloy`, `argocd-applications` → `Apps`)
- Deleted `title-test-alpha.md` and `title-test-beta.md` test cards and removed their reference index entry

## Deployment and Testing
- [x] `docs-check-links` passes — all wiki-links valid
- [x] `docs-check-index` passes
- [x] `docs-check-filenames` passes
- [ ] Verify titles render correctly on docs site after deploy

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/117
2026-02-07 21:44:57 -08:00
3da455e49c Enforce unique doc filenames and simple wiki-links (#109)
## Summary
- Rename section index files to match their titles (tutorials.md, reference.md, how-to.md, explanation.md) so all filenames are unique
- Convert all ~47 path-based wiki-links to simple filename format across 15 files
- Update doc-filenames task to no longer skip index.md files
- Update doc-links task to reject path-based links containing '/'

This ensures all wiki-links work correctly in obsidian.nvim by making links resolvable by filename alone.

## Testing
- `mise run doc-filenames` - all unique
- `mise run doc-links` - no broken or path-based links
- `mise run doc-titles` - no duplicates

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/109
2026-02-04 17:21:34 -08:00
7ebac4aef6 Add Phase 3 tutorials with audience targeting (#94)
## Summary
- Create tutorials directory structure with index page
- Add 5 main tutorials targeting different audiences:
  - **what-is-blumeops** (Reader, AI) - High-level orientation
  - **exploring-the-docs** (All) - Navigation guide
  - **ai-assistance-guide** (AI, Owner) - Context for AI-assisted operations
  - **contributing** (Contributor) - First contribution workflow
  - **replicating-blumeops** (Replicator) - Overview for building similar setup
- Add 4 replication sub-tutorials:
  - tailscale-setup, kubernetes-bootstrap, argocd-config, observability-stack
- Update README.md to mark Phase 3 complete
- Add changelog fragment

Each tutorial explicitly identifies its target audiences and links to reference material rather than re-explaining concepts.

## Deployment and Testing
- [x] All pre-commit hooks pass (doc-links validates wiki links)
- [ ] Build docs via workflow to verify rendering
- [ ] Review content for accuracy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/94
2026-02-03 18:51:57 -08:00
9630655cc9 Switch to filename-based wiki-links (Quartz resolves by filename)
- Convert all wiki-links from title-based to filename-based
- Update doc-links to validate against filenames
- Add doc-filenames task for duplicate filename detection
- Consolidate doc hooks into single local block in pre-commit config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:29:31 -08:00
9261ec7679 Add spaces around pipe in wiki-links (test for Quartz resolution)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:17:20 -08:00
3e4b5c2dd3 Convert wiki-link titles to lowercase slugs (#92)
## Summary
- Convert all frontmatter titles to lowercase-hyphenated format (e.g., `grafana-alloy` instead of `Grafana Alloy`)
- Update all wiki-links to use the new slug format
- Update `doc-titles` task to validate slug format (lowercase, hyphens only)

Quartz appears to require titles without spaces for wiki-link resolution.

## Deployment and Testing
- [x] Pre-commit hooks pass (`doc-titles` and `doc-links`)
- [ ] Build docs v1.0.8 and deploy
- [ ] Verify wiki-links resolve correctly (e.g., `[[grafana-alloy]]`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/92
2026-02-03 16:06:35 -08:00
01adc4cf0f Switch to title-based wiki-links (#91)
## Summary
- Remove aliases from all zk cards to prevent them from capturing wiki-links
- Convert all wiki-links from `[[filename|Title]]` to `[[Title]]` format
- Replace `doc-filenames` task with `doc-titles` for duplicate title detection
- Update pre-commit hook to use `doc-titles`

Wiki-links now resolve to reference docs by their frontmatter title, which is more readable and maintainable than filename-based links.

## Deployment and Testing
- [x] Pre-commit hooks pass (including new `doc-titles` check)
- [x] Manually verified zk cards have aliases removed
- [ ] Deploy docs v1.0.7 and verify wiki-links resolve correctly
- [ ] Test links to reference docs (e.g., [[Grafana Alloy]], [[ArgoCD]])

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/91
2026-02-03 15:55:31 -08:00
4ccb7b9a26 Fix wiki-links to use filename-based resolution (#90)
## Summary
- Quartz's "shortest" path mode resolves wiki-links by **filename**, not frontmatter title
- Previous PR used title-based links like `[[Grafana Alloy]]` which looked for non-existent `Grafana-Alloy.md`
- Now using filename-based links like `[[alloy|Grafana Alloy]]` which correctly resolve

## Changes
- Rename zk duplicate files with `-log` suffix (e.g., `argocd.md` → `argocd-log.md`)
- Rename `reference/storage/postgresql.md` to `postgresql-storage.md`
- Convert all 175 wiki-links from `[[Title]]` to `[[filename|Title]]` format
- Rename `doc-card-titles` task to `doc-filenames` (checks filename uniqueness, not titles)
- Update pre-commit hook for renamed task

## Deployment and Testing
- [x] Pre-commit hooks pass
- [x] `mise run doc-filenames` shows no duplicate filenames
- [ ] Verify wiki-links work correctly in Quartz build

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/90
2026-02-03 15:30:42 -08:00
a45a78b478 Convert wiki-links to title-based format (#89)
## Summary
- Add `doc-card-titles` mise task to enumerate all doc cards by title/id and detect duplicates
- Remove redundant aliases from zk cards where alias matched the id
- Rename `reference/storage/postgresql.md` title to "PostgreSQL Storage" to avoid duplicate with `reference/services/postgresql.md`
- Convert all 175 path-based wiki-links `[[reference/path|Title]]` to title-based `[[Title]]` format
- Add pre-commit hook to check for duplicate card titles on doc changes

## Deployment and Testing
- [x] Pre-commit hooks pass
- [x] `mise run doc-card-titles` shows no duplicates
- [ ] Verify wiki-links work correctly in Quartz build

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/89
2026-02-03 15:04:38 -08:00
b294caa734 Fix wiki-link paths in reference docs
Links like [[services/foo]] need to be [[reference/services/foo]]
to resolve correctly from the docs root.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 14:41:55 -08:00
254b93096a Phase 2: Add Reference section with 24 technical reference cards (#88)
## Summary
- Create `docs/reference/` section with 24 technical reference cards
- Services (16): alloy, argocd, borgmatic, 1password, forgejo, grafana, jellyfin, kiwix, loki, miniflux, navidrome, postgresql, prometheus, teslamate, transmission, zot
- Infrastructure (3): hosts, tailscale, routing
- Kubernetes (2): cluster, apps
- Storage (2): sifaka, backups
- Update README to mark Phase 2 as complete
- Add towncrier changelog fragment

## Deployment and Testing
- [ ] Build docs locally to verify wiki-links resolve
- [ ] Deploy via ArgoCD and verify at docs.ops.eblu.me/reference/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/88
2026-02-03 14:27:37 -08:00