---
title: Apps
modified: 2026-03-04
tags:
- kubernetes
- argocd
---
# ArgoCD Applications
Registry of all applications deployed via [[argocd]].
## Application Registry
| App | Namespace | Path/Source | Service |
|-----|-----------|-------------|---------|
| `apps` | argocd | `argocd/apps/` | App-of-apps root |
| `argocd` | argocd | `argocd/manifests/argocd/` | [[argocd]] |
| `tailscale-operator` | tailscale | `argocd/manifests/tailscale-operator/` | [[tailscale-operator]] |
| `1password-connect` | 1password | `argocd/manifests/1password-connect/` | [[1password]] |
| `external-secrets` | external-secrets | Helm chart | [[1password]] |
| `external-secrets-config` | external-secrets | `argocd/manifests/external-secrets-config/` | [[1password]] |
| `cloudnative-pg` | cnpg-system | `mirrors/cloudnative-pg` release manifest | PostgreSQL operator |
| `blumeops-pg` | databases | `argocd/manifests/databases/` | [[postgresql]] |
| `prometheus` | monitoring | `argocd/manifests/prometheus/` | [[prometheus]] |
| `loki` | monitoring | `argocd/manifests/loki/` | [[loki]] |
| `grafana` | monitoring | `argocd/manifests/grafana/` | [[grafana]] |
| `grafana-config` | monitoring | `argocd/manifests/grafana-config/` | [[grafana]] |
| `immich` | immich | `argocd/manifests/immich/` | [[immich]] |
| `tempo` | monitoring | `argocd/manifests/tempo/` | [[tempo]] |
| `alloy-k8s` | alloy | `argocd/manifests/alloy-k8s/` | [[alloy\|Alloy]] |
| `alloy-tracing-ringtail` | alloy | `argocd/manifests/alloy-tracing-ringtail/` | [[alloy\|Alloy]] (eBPF tracing) |
| `kube-state-metrics` | monitoring | `argocd/manifests/kube-state-metrics/` | K8s metrics |
| `miniflux` | miniflux | `argocd/manifests/miniflux/` | [[miniflux]] |
| `kiwix` | kiwix | `argocd/manifests/kiwix/` | [[kiwix]] |
| `torrent` | torrent | `argocd/manifests/torrent/` | [[transmission]] |
| `navidrome` | navidrome | `argocd/manifests/navidrome/` | [[navidrome]] |
| `teslamate` | teslamate | `argocd/manifests/teslamate/` | [[teslamate]] |
| `cv` | cv | `argocd/manifests/cv/` | [[cv]] |
| `forgejo-runner` | forgejo-runner | `argocd/manifests/forgejo-runner/` | [[forgejo]] CI |
| `ollama` | ollama | `argocd/manifests/ollama/` | [[ollama]] |
| `mealie` | mealie | `argocd/manifests/mealie/` | [[mealie]] |
| `paperless` | paperless | `argocd/manifests/paperless/` | [[paperless]] |
| `shower` | shower | `argocd/manifests/shower/` | [[shower-app]] |
| `prowler` | prowler | `argocd/manifests/prowler/` | [[prowler]] |
## Sync Policies
| Application | Policy | Rationale |
|-------------|--------|-----------|
| `apps` | Automated | Picks up new Application manifests |
| All others | Manual | Explicit control over deployments |
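
The two policies in the table above differ only in whether an Application carries a `syncPolicy.automated` block. The following is a minimal sketch rather than the repository's actual manifests: the `repoURL`, `targetRevision`, `project`, destination `server`, and the `prune`/`selfHeal` values are all assumptions chosen for illustration, and `miniflux` stands in for any manually synced app.

```yaml
# Hypothetical root app: automated sync, so new Application
# manifests under argocd/apps/ are picked up without manual action.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: apps
  namespace: argocd
spec:
  project: default                                      # assumed project name
  source:
    repoURL: https://forge.eblu.me/eblume/blumeops.git  # assumed repository URL
    targetRevision: main                                # assumed branch
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc              # assumed in-cluster target
    namespace: argocd
  syncPolicy:
    automated:
      prune: true      # assumed: delete resources removed from git
      selfHeal: true   # assumed: revert out-of-band changes
---
# Hypothetical child app: no syncPolicy.automated block,
# so ArgoCD only reports drift and waits for an explicit sync.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: miniflux
  namespace: argocd
spec:
  project: default                                      # assumed project name
  source:
    repoURL: https://forge.eblu.me/eblume/blumeops.git  # assumed repository URL
    targetRevision: main                                # assumed branch
    path: argocd/manifests/miniflux
  destination:
    server: https://kubernetes.default.svc              # assumed in-cluster target
    namespace: miniflux
```

With a manifest shaped like the second one, a deployment is triggered explicitly, for example with `argocd app sync miniflux`.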
## Related
- [[argocd]] - GitOps platform details
- [[cluster|Cluster]] - Kubernetes infrastructure