## Summary
Adds the third observability pillar — **distributed tracing** — alongside existing metrics (Prometheus) and logs (Loki).
- **Grafana Tempo 2.10.1** on minikube-indri for trace storage, with 7d retention, OTLP receivers, and a `metrics_generator` that remote-writes span metrics (RED: rate, errors, duration) to Prometheus
- **Beyla eBPF auto-instrumentation** via a privileged Alloy DaemonSet on ringtail — instruments HTTP services (Frigate, ntfy, Ollama, Immich) without code changes
- **Grafana integration** — Tempo datasource with trace↔log and trace↔metrics correlation, plus Loki derivedFields for trace ID linking
- **Prometheus** scrapes Tempo operational metrics
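For readers new to RED metrics, the sketch below shows the kind of aggregation that span metrics represent: per-service request rate, error ratio, and duration. It is illustrative only. `Span` and `red_summary` are hypothetical names, and Tempo's `metrics_generator` does this internally (typically as counters and histograms), not with this code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span record (not Tempo's data model)."""
    service: str
    duration_ms: float
    is_error: bool


def red_summary(spans, window_seconds):
    """Aggregate spans into per-service rate, error ratio, and mean duration."""
    by_service = {}
    for span in spans:
        by_service.setdefault(span.service, []).append(span)

    summary = {}
    for service, group in by_service.items():
        summary[service] = {
            # Rate: requests per second over the observation window.
            "rate_per_s": len(group) / window_seconds,
            # Errors: fraction of spans flagged as failed.
            "error_ratio": sum(s.is_error for s in group) / len(group),
            # Duration: mean latency (real systems usually keep histograms).
            "mean_duration_ms": mean(s.duration_ms for s in group),
        }
    return summary
```

Calling `red_summary(spans, 60)` on one minute of spans returns a dictionary keyed by service name, which is roughly the shape of data the RED dashboards in Grafana are built from.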
### Architecture
```
ringtail (k3s)                               indri (minikube)
┌──────────────────────┐                     ┌─────────────────────┐
│ Alloy+Beyla (eBPF)   │──OTLP HTTP────────→ │ Tempo               │
│  ↳ Frigate, ntfy,    │  via tailnet        │  ↳ trace storage    │
│    Ollama, Immich    │                     │  ↳ RED → Prometheus │
└──────────────────────┘                     │                     │
                                             │ Grafana             │
                                             │  ↳ Tempo datasource │
                                             └─────────────────────┘
```
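To exercise the OTLP HTTP path in the diagram without waiting for Beyla to observe real traffic, a hand-built span can be posted to the receiver. The sketch below builds a one-span OTLP/HTTP JSON body using only the standard library. The function names, the `manual-test` scope name, and any endpoint URL passed to `send_trace` are assumptions for illustration, and Alloy itself may well export protobuf, not JSON.

```python
import json
import secrets
import time
import urllib.request


def build_otlp_trace_payload(service_name, span_name, duration_ms=5):
    """Build a one-span OTLP/HTTP JSON body (ExportTraceServiceRequest)."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [{
                    "key": "service.name",
                    "value": {"stringValue": service_name},
                }]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-test"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def send_trace(endpoint, payload):
    """POST the payload to <endpoint>/v1/traces and return the HTTP status."""
    request = urllib.request.Request(
        endpoint.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

`send_trace` takes the base URL of whichever OTLP HTTP receiver is reachable (for example a port-forward to Tempo). The actual hostname and port depend on the manifests and are not shown here.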
### New files (12)
- `docs/reference/services/tempo.md` — reference doc
- `docs/changelog.d/feature-otel-tracing.feature.md`
- `argocd/apps/tempo.yaml` + `argocd/manifests/tempo/` (6 files)
- `argocd/apps/alloy-tracing-ringtail.yaml` + `argocd/manifests/alloy-tracing-ringtail/` (4 files)
### Modified files (6)
- `argocd/manifests/grafana/datasources.yaml` — Tempo datasource + Loki derivedFields
- `argocd/manifests/prometheus/prometheus.yml` — Tempo scrape target
- `service-versions.yaml` — tempo + alloy-tracing-ringtail entries
- `docs/reference/services/grafana.md` — Tempo in datasources table
- `docs/reference/reference.md` — Tempo in services index
- `docs/reference/operations/observability.md` — Tempo in components list
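The Loki derivedFields mentioned above work by running a regular expression over each log line and turning the captured trace ID into a link to Tempo. The sketch below imitates that matching step in Python so a candidate pattern can be tried against sample log lines. The pattern is a hypothetical example, not the `matcherRegex` actually used in `datasources.yaml`.

```python
import re

# Hypothetical pattern: accepts "trace_id=<hex>", "traceid: <hex>", and
# JSON-style '"traceid": "<hex>"'. The real matcherRegex may differ.
TRACE_ID_PATTERN = re.compile(r'trace_?id["=:\s]+([0-9a-fA-F]{32})')


def extract_trace_id(log_line):
    """Return the 32-character hex trace ID found in log_line, or None."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None
```

A line that yields `None` here would produce no trace link under the same pattern, which makes this a quick way to check log formats before editing the datasource.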
## Deployment and Testing
- [ ] Sync `apps` app to pick up new Application definitions
- [ ] `argocd app set tempo --revision feature/otel-tracing && argocd app sync tempo`
- [ ] Verify Tempo pod: `kubectl --context=minikube-indri get pods -n monitoring -l app=tempo`
- [ ] Verify Tempo ready: port-forward 3200 and `curl localhost:3200/ready`
- [ ] Verify Tailscale ingresses: `kubectl --context=minikube-indri get ingress -n monitoring`
- [ ] `argocd app set alloy-tracing-ringtail --revision feature/otel-tracing && argocd app sync alloy-tracing-ringtail`
- [ ] Check Beyla discovery in alloy-tracing logs on ringtail
- [ ] Sync grafana-config for updated datasources
- [ ] Sync prometheus for updated scrape config
- [ ] Test Grafana Tempo datasource connection
- [ ] Generate test traffic and search traces in Grafana Explore → Tempo
- [ ] After merge: reset all ArgoCD app revisions back to main
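The readiness step in the checklist can be scripted instead of re-running `curl` by hand. This is a minimal sketch that polls a URL until it answers HTTP 200, assuming the port-forward to 3200 is already running. `fetch_status` and `wait_until_ready` are hypothetical helper names, and the retry counts are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=3):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Non-2xx answers (such as 503 while starting) still carry a code.
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_ready(url, attempts=10, delay_s=2.0,
                     fetch=fetch_status, sleep=time.sleep):
    """Poll url until it returns HTTP 200; True on success, False otherwise."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

With the port-forward in place, `wait_until_ready("http://localhost:3200/ready")` gives a single pass/fail result. The `fetch` and `sleep` parameters exist only so the loop can be tested without a network.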
Reviewed-on: #286
| Name |
|---|
| .claude |
| .dagger |
| .forgejo/workflows |
| .github |
| ansible |
| argocd |
| containers |
| docs |
| fly |
| mise-tasks |
| nixos/ringtail |
| pulumi |
| .ansible-lint |
| .gitignore |
| .yamllint.yaml |
| Brewfile |
| CHANGELOG.md |
| CLAUDE.md |
| dagger.json |
| LICENSE |
| mise.toml |
| prek.toml |
| README.md |
| service-versions.yaml |
| towncrier.toml |
# blumeops
aka "Blue Mops"
Tools and configuration for Erich Blume's personal infrastructure, orchestrated across a Tailscale tailnet.
This is a homelab, but it's also a testing ground for AI-assisted infrastructure development. Much of this codebase was co-authored with Claude Code, and the repo places heavy emphasis on documentation, process, and change classification to make that collaboration work well. I don't know entirely how I feel about LLMs in our current era (there are real concerns about how training data is sourced and about energy subsidies), but it felt important to learn how to work with these tools.
The full documentation is published at docs.eblu.me
and lives in the docs/ directory, structured around the
Diataxis framework and designed to be compatible with
Obsidian/Obsidian.nvim.
## What runs here
Services are a mix of Kubernetes pods (managed by ArgoCD), macOS LaunchAgent services (managed by Ansible), and NixOS systemd services (managed by Nix flakes), all connected via Tailscale:
- Indri (Mac Mini M1) - primary server. Most services run in Minikube via ArgoCD; Forgejo, Caddy, and others run natively as LaunchAgent services via Ansible.
- Ringtail (NixOS desktop, RTX 4080) - GPU workloads (Frigate NVR, Authentik SSO) on k3s, plus NixOS systemd services.
- Sifaka (Synology NAS) - backup target and bulk storage.
Notable services include Grafana/Prometheus/Loki observability, Immich photos, Jellyfin media, Forgejo git forge, a Zot container registry, and more. Public access is routed through a Fly.io proxy; everything else is tailnet-only.
## Project structure
- `ansible/` - Ansible playbooks and roles (indri, sifaka)
- `argocd/apps/` - ArgoCD Application definitions
- `argocd/manifests/` - Kubernetes manifests per service
- `containers/` - Custom container builds (Dockerfile + Nix)
- `docs/` - Diataxis documentation (published at docs.eblu.me)
- `fly/` - Fly.io public proxy configuration
- `mise-tasks/` - Operational scripts run via mise
- `nixos/` - NixOS configuration for ringtail
- `pulumi/` - Pulumi IaC (Tailscale ACLs, Gandi DNS)
- `.dagger/` - Dagger CI pipelines
- `.forgejo/` - Forgejo Actions CI/CD workflows
## Getting started
You'll need Homebrew and mise. With both installed, run:
- `brew bundle` - install CLI tools (argocd, tea, flyctl, etc.)
- `mise install` - install managed toolchains (ansible, pulumi, dagger, etc.)
- `prek install` - set up git hooks
Git hooks (via prek) enforce secret scanning (TruffleHog), linting, formatting, and custom checks like doc link validation and the Mikado branch invariant. They run automatically on `git commit`.
Operational tasks are driven through mise. Run `mise tasks` to see what's available. Key examples:
- `mise run provision-indri` - deploy to indri via Ansible
- `mise run services-check` - verify service health
- `mise run container-list` - list tracked container images
## AI-assisted development
This repo is designed to be worked on by both humans and AI agents. The `CLAUDE.md` file provides instructions for Claude Code, and `docs/tutorials/ai-assistance-guide.md` explains the full workflow.
Changes are classified before starting work:
- C0 - quick fixes, committed directly to main
- C1 - feature branch + PR, documentation written before code
- C2 - multi-phase work using the Mikado method for dependency tracking
See the agent change process for details.