blumeops/argocd/manifests/alloy-ringtail/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: alloy

resources:
  - namespace.yaml
  - rbac.yaml
  - daemonset.yaml

images:
  - name: registry.ops.eblu.me/blumeops/alloy
    newTag: v1.16.0-26a3ab5-nix

configMapGenerator:
  - name: alloy-config
    files:
      - config.alloy
Add multi-cluster observability with ringtail metrics and dashboards (#270) ## Summary - Add `cluster` label (indri/ringtail) to all Prometheus scrape jobs, Alloy k8s metrics/logs, and Alloy host metrics/logs - Deploy kube-state-metrics on ringtail's k3s cluster (ArgoCD app + manifests) - Deploy Alloy on ringtail to collect pod metrics and logs, remote-writing to indri's Prometheus and Loki - Replace single-cluster "Minikube Kubernetes" and "K8s Services Health" dashboards with: - Kubernetes Clusters dashboard — multi-cluster with `cluster` and `namespace` template variables - Ringtail (k3s) dashboard — dedicated ringtail view with GPU usage panels ## Deployment and Testing 1. Sync `apps` on indri ArgoCD to pick up new app definitions (`kube-state-metrics-ringtail`, `alloy-ringtail`) 2. Sync `prometheus` → verify `cluster` label on scraped metrics 3. Sync `alloy-k8s` → verify `cluster=indri` on remote-written metrics and logs 4. Run `mise run provision-indri -- --tags alloy` → verify `cluster=indri` on host Alloy metrics/logs 5. Sync `kube-state-metrics-ringtail` → verify pods running on ringtail 6. Sync `alloy-ringtail` → verify pods running, check Prometheus for `kube_pod_info{cluster="ringtail"}` 7. Sync `grafana-config` → verify dashboards appear, cluster variable populates both values 8. Check Loki for `{cluster="ringtail"}` logs from ringtail pods ## Notes - Alloy on ringtail uses `insecure_skip_verify=true` for TLS to Prometheus/Loki (Tailscale-managed certs not in container trust store) — tighten later - DNS resolution for `*.tail8d86e.ts.net` from ringtail pods depends on CoreDNS inheriting host's MagicDNS resolver; may need CoreDNS forwarding rules if pods can't resolve - The old services dashboard (blackbox probes) is removed — those probes are still running in alloy-k8s and the data is still in Prometheus, just not in a dedicated dashboard Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/270 2026-02-25 22:01:00 -08:00			`apiVersion: kustomize.config.k8s.io/v1beta1`
			`kind: Kustomization`

			`namespace: alloy`

			`resources:`
			`- namespace.yaml`
			`- rbac.yaml`
			`- daemonset.yaml`

			`images:`
Localize Alloy container image (#300) ## Summary - Add `containers/alloy/` with dual Dockerfile + Nix build files for Grafana Alloy v1.14.0 - Both builds fetch source from forge mirror (`forge.ops.eblu.me/mirrors/alloy.git`), build the web UI (Node), then compile the Go binary with `netgo embedalloyui` tags - Update all three alloy deployments (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail) to use `registry.ops.eblu.me/blumeops/alloy` - `promtail_journal_enabled` tag omitted — requires systemd headers and none of our configs use `loki.source.journal` ## Build verification - Dockerfile: Tested locally via `docker build`, binary reports `v1.14.0` with correct tags - Nix: Tested on ringtail via `nix-build`, all three hashes (fetchgit, npmDeps, goModules) resolved and build succeeds ## Post-merge steps 1. Wait for CI to build the container from main (both Dockerfile and Nix workflows) 2. `mise run container-list alloy` to find the `[main]` tagged image 3. C0 follow-up to update `newTag` in all three kustomizations from `v1.14.0-placeholder` to the real tag 4. Sync ArgoCD apps and verify pods come up healthy Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/300 2026-03-17 16:42:53 -07:00			`- name: registry.ops.eblu.me/blumeops/alloy`
Alloy V1.16.0 (#345) Bump Grafana Alloy v1.14.0 → v1.16.0 across all four services (alloy-k8s, alloy-ringtail, alloy-tracing-ringtail; alloy native ansible). Also migrate the indri build path from `Dockerfile` to a native Dagger `container.py` per the build-container-image migration playbook. ## Highlights from upstream - v1.15: database observability promoted to stable, OTel Collector → v0.147.0 - v1.16: clustering for `loki.source.kubernetes_events`, MySQL exporter 0.19.0 - One pre-existing breaking change in v1.15 (`loki.source.awsfirehose` undocumented metric prefix rename) — not used here. ## Build infra Alloy v1.16.0's go.mod requires Go 1.26.2. The nix derivation now uses `pkgs.go_1_26` with `GOTOOLCHAIN=local` to avoid auto-downloading a toolchain blob that violated the fixed-output rule. ## Test plan - [ ] CI: `mise run container-build-and-release alloy --ref alloy-v1.16.0` (dispatched as run 522; nix job to be re-triggered with the v1.16.0 goModules outputHash once the local ringtail build surfaces it) - [ ] After CI green, bump `images[].newTag` in three kustomizations to the new `-<sha>` and `-<sha>-nix` tags, deploy from this branch via `argocd app set <app> --revision alloy-v1.16.0 && argocd app sync <app>` - [ ] Manual rebuild of macOS native binary on gilbert (per ansible/roles/alloy README) and `mise run provision-indri -- --tags alloy --check --diff` - [ ] `mise run services-check` after merge & redeploy Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/345 2026-05-01 08:05:37 -07:00			`newTag: v1.16.0-26a3ab5-nix`
Add multi-cluster observability with ringtail metrics and dashboards (#270) ## Summary - Add `cluster` label (indri/ringtail) to all Prometheus scrape jobs, Alloy k8s metrics/logs, and Alloy host metrics/logs - Deploy kube-state-metrics on ringtail's k3s cluster (ArgoCD app + manifests) - Deploy Alloy on ringtail to collect pod metrics and logs, remote-writing to indri's Prometheus and Loki - Replace single-cluster "Minikube Kubernetes" and "K8s Services Health" dashboards with: - Kubernetes Clusters dashboard — multi-cluster with `cluster` and `namespace` template variables - Ringtail (k3s) dashboard — dedicated ringtail view with GPU usage panels ## Deployment and Testing 1. Sync `apps` on indri ArgoCD to pick up new app definitions (`kube-state-metrics-ringtail`, `alloy-ringtail`) 2. Sync `prometheus` → verify `cluster` label on scraped metrics 3. Sync `alloy-k8s` → verify `cluster=indri` on remote-written metrics and logs 4. Run `mise run provision-indri -- --tags alloy` → verify `cluster=indri` on host Alloy metrics/logs 5. Sync `kube-state-metrics-ringtail` → verify pods running on ringtail 6. Sync `alloy-ringtail` → verify pods running, check Prometheus for `kube_pod_info{cluster="ringtail"}` 7. Sync `grafana-config` → verify dashboards appear, cluster variable populates both values 8. Check Loki for `{cluster="ringtail"}` logs from ringtail pods ## Notes - Alloy on ringtail uses `insecure_skip_verify=true` for TLS to Prometheus/Loki (Tailscale-managed certs not in container trust store) — tighten later - DNS resolution for `*.tail8d86e.ts.net` from ringtail pods depends on CoreDNS inheriting host's MagicDNS resolver; may need CoreDNS forwarding rules if pods can't resolve - The old services dashboard (blackbox probes) is removed — those probes are still running in alloy-k8s and the data is still in Prometheus, just not in a dedicated dashboard Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/270 2026-02-25 22:01:00 -08:00
			`configMapGenerator:`
			`- name: alloy-config`
			`files:`
			`- config.alloy`