## Summary
- Add `cluster` label (indri/ringtail) to all Prometheus scrape jobs, Alloy k8s metrics/logs, and Alloy host metrics/logs
- Deploy kube-state-metrics on ringtail's k3s cluster (ArgoCD app + manifests)
- Deploy Alloy on ringtail to collect pod metrics and logs, remote-writing to indri's Prometheus and Loki
- Replace single-cluster "Minikube Kubernetes" and "K8s Services Health" dashboards with:
- **Kubernetes Clusters** dashboard — multi-cluster with `cluster` and `namespace` template variables
- **Ringtail (k3s)** dashboard — dedicated ringtail view with GPU usage panels
## Deployment and Testing
1. Sync `apps` on indri ArgoCD to pick up new app definitions (`kube-state-metrics-ringtail`, `alloy-ringtail`)
2. Sync `prometheus` → verify `cluster` label on scraped metrics
3. Sync `alloy-k8s` → verify `cluster=indri` on remote-written metrics and logs
4. Run `mise run provision-indri -- --tags alloy` → verify `cluster=indri` on host Alloy metrics/logs
5. Sync `kube-state-metrics-ringtail` → verify pods running on ringtail
6. Sync `alloy-ringtail` → verify pods running, check Prometheus for `kube_pod_info{cluster="ringtail"}`
7. Sync `grafana-config` → verify dashboards appear, cluster variable populates both values
8. Check Loki for `{cluster="ringtail"}` logs from ringtail pods
## Notes
- Alloy on ringtail uses `insecure_skip_verify=true` for TLS to Prometheus/Loki (Tailscale-managed certs not in container trust store) — tighten later
- DNS resolution for `*.tail8d86e.ts.net` from ringtail pods depends on CoreDNS inheriting host's MagicDNS resolver; may need CoreDNS forwarding rules if pods can't resolve
- The old services dashboard (blackbox probes) is removed — those probes are still running in alloy-k8s and the data is still in Prometheus, just not in a dedicated dashboard
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/270
86 lines
2.1 KiB
YAML
86 lines
2.1 KiB
YAML
apiVersion: apps/v1
|
|
kind: DaemonSet
|
|
metadata:
|
|
name: alloy
|
|
namespace: alloy
|
|
labels:
|
|
app: alloy
|
|
spec:
|
|
selector:
|
|
matchLabels:
|
|
app: alloy
|
|
template:
|
|
metadata:
|
|
labels:
|
|
app: alloy
|
|
spec:
|
|
serviceAccountName: alloy
|
|
securityContext:
|
|
fsGroup: 473 # alloy user group
|
|
containers:
|
|
- name: alloy
|
|
image: grafana/alloy
|
|
args:
|
|
- run
|
|
- --server.http.listen-addr=0.0.0.0:12345
|
|
- --storage.path=/var/lib/alloy/data
|
|
- /etc/alloy/config.alloy
|
|
ports:
|
|
- containerPort: 12345
|
|
name: http
|
|
env:
|
|
- name: HOSTNAME
|
|
valueFrom:
|
|
fieldRef:
|
|
fieldPath: spec.nodeName
|
|
resources:
|
|
requests:
|
|
cpu: 50m
|
|
memory: 128Mi
|
|
limits:
|
|
cpu: 500m
|
|
memory: 512Mi
|
|
volumeMounts:
|
|
- name: config
|
|
mountPath: /etc/alloy
|
|
- name: varlog
|
|
mountPath: /var/log
|
|
readOnly: true
|
|
- name: data
|
|
mountPath: /var/lib/alloy/data
|
|
- name: proc
|
|
mountPath: /host/proc
|
|
readOnly: true
|
|
- name: sys
|
|
mountPath: /host/sys
|
|
readOnly: true
|
|
- name: root
|
|
mountPath: /host/root
|
|
mountPropagation: HostToContainer
|
|
readOnly: true
|
|
securityContext:
|
|
allowPrivilegeEscalation: false
|
|
readOnlyRootFilesystem: true
|
|
capabilities:
|
|
drop:
|
|
- ALL
|
|
tolerations:
|
|
- operator: Exists
|
|
volumes:
|
|
- name: config
|
|
configMap:
|
|
name: alloy-config
|
|
- name: varlog
|
|
hostPath:
|
|
path: /var/log
|
|
- name: data
|
|
emptyDir: {}
|
|
- name: proc
|
|
hostPath:
|
|
path: /proc
|
|
- name: sys
|
|
hostPath:
|
|
path: /sys
|
|
- name: root
|
|
hostPath:
|
|
path: /
|