## Summary - Remove stale `/opt/homebrew/var/loki` from borgmatic backup (Loki migrated to k8s) - Add Alloy k8s DaemonSet for automatic pod log collection with auto-discovery - Add blackbox probes for miniflux, kiwix, transmission, devpi, argocd - Add transmission-exporter sidecar for full metrics (speed, torrent counts, ratios) - Replace stale devpi dashboard with probe-based metrics (status, response time, uptime) - Add unified "K8s Services Health" dashboard for service uptime/response monitoring ## Manual cleanup already performed - Deleted stale textfile metrics on indri: `devpi.prom`, `transmission.prom` - Deleted stale data directories on indri: `/opt/homebrew/var/loki/`, `/opt/homebrew/var/prometheus/` ## Deployment and Testing - [x] Sync `apps` application to pick up new alloy-k8s app - [x] Deploy alloy-k8s on feature branch: `argocd app set alloy-k8s --revision feature/observability-cleanup && argocd app sync alloy-k8s` - [x] Deploy torrent on feature branch (for transmission exporter): `argocd app set torrent --revision feature/observability-cleanup && argocd app sync torrent` - [x] Deploy prometheus on feature branch (for new scrape config): `argocd app set prometheus --revision feature/observability-cleanup && argocd app sync prometheus` - [x] Deploy grafana-config on feature branch (for dashboards): `argocd app set grafana-config --revision feature/observability-cleanup && argocd app sync grafana-config` - [x] Verify pod logs appear in Loki/Grafana - [x] Verify transmission metrics appear in Prometheus - [x] Verify service probe metrics appear in Prometheus - [x] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic config - [ ] After merge, reset apps to main and resync 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/43
79 lines
1.8 KiB
YAML
79 lines
1.8 KiB
YAML
apiVersion: v1
|
|
kind: ServiceAccount
|
|
metadata:
|
|
name: kube-state-metrics
|
|
namespace: monitoring
|
|
---
|
|
apiVersion: rbac.authorization.k8s.io/v1
|
|
kind: ClusterRole
|
|
metadata:
|
|
name: kube-state-metrics
|
|
rules:
|
|
- apiGroups: [""]
|
|
resources:
|
|
- configmaps
|
|
- secrets
|
|
- nodes
|
|
- pods
|
|
- services
|
|
- serviceaccounts
|
|
- resourcequotas
|
|
- replicationcontrollers
|
|
- limitranges
|
|
- persistentvolumeclaims
|
|
- persistentvolumes
|
|
- namespaces
|
|
- endpoints
|
|
verbs: ["list", "watch"]
|
|
- apiGroups: ["apps"]
|
|
resources:
|
|
- statefulsets
|
|
- daemonsets
|
|
- deployments
|
|
- replicasets
|
|
verbs: ["list", "watch"]
|
|
- apiGroups: ["batch"]
|
|
resources:
|
|
- cronjobs
|
|
- jobs
|
|
verbs: ["list", "watch"]
|
|
- apiGroups: ["autoscaling"]
|
|
resources:
|
|
- horizontalpodautoscalers
|
|
verbs: ["list", "watch"]
|
|
- apiGroups: ["networking.k8s.io"]
|
|
resources:
|
|
- networkpolicies
|
|
- ingresses
|
|
verbs: ["list", "watch"]
|
|
- apiGroups: ["coordination.k8s.io"]
|
|
resources:
|
|
- leases
|
|
verbs: ["list", "watch"]
|
|
- apiGroups: ["certificates.k8s.io"]
|
|
resources:
|
|
- certificatesigningrequests
|
|
verbs: ["list", "watch"]
|
|
- apiGroups: ["storage.k8s.io"]
|
|
resources:
|
|
- storageclasses
|
|
- volumeattachments
|
|
verbs: ["list", "watch"]
|
|
- apiGroups: ["admissionregistration.k8s.io"]
|
|
resources:
|
|
- mutatingwebhookconfigurations
|
|
- validatingwebhookconfigurations
|
|
verbs: ["list", "watch"]
|
|
---
|
|
apiVersion: rbac.authorization.k8s.io/v1
|
|
kind: ClusterRoleBinding
|
|
metadata:
|
|
name: kube-state-metrics
|
|
roleRef:
|
|
apiGroup: rbac.authorization.k8s.io
|
|
kind: ClusterRole
|
|
name: kube-state-metrics
|
|
subjects:
|
|
- kind: ServiceAccount
|
|
name: kube-state-metrics
|
|
namespace: monitoring
|