Observability cleanup and k8s service monitoring (#43) (#43)
## Summary
- Remove stale `/opt/homebrew/var/loki` from borgmatic backup (Loki migrated to k8s)
- Add Alloy k8s DaemonSet for automatic pod log collection with auto-discovery
- Add blackbox probes for miniflux, kiwix, transmission, devpi, argocd
- Add transmission-exporter sidecar for full metrics (speed, torrent counts, ratios)
- Replace stale devpi dashboard with probe-based metrics (status, response time, uptime)
- Add unified "K8s Services Health" dashboard for service uptime/response monitoring
## Manual cleanup already performed
- Deleted stale textfile metrics on indri: `devpi.prom`, `transmission.prom`
- Deleted stale data directories on indri: `/opt/homebrew/var/loki/`, `/opt/homebrew/var/prometheus/`
## Deployment and Testing
- [x] Sync `apps` application to pick up new alloy-k8s app
- [x] Deploy alloy-k8s on feature branch: `argocd app set alloy-k8s --revision feature/observability-cleanup && argocd app sync alloy-k8s`
- [x] Deploy torrent on feature branch (for transmission exporter): `argocd app set torrent --revision feature/observability-cleanup && argocd app sync torrent`
- [x] Deploy prometheus on feature branch (for new scrape config): `argocd app set prometheus --revision feature/observability-cleanup && argocd app sync prometheus`
- [x] Deploy grafana-config on feature branch (for dashboards): `argocd app set grafana-config --revision feature/observability-cleanup && argocd app sync grafana-config`
- [x] Verify pod logs appear in Loki/Grafana
- [x] Verify transmission metrics appear in Prometheus
- [x] Verify service probe metrics appear in Prometheus
- [x] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic config
- [ ] After merge, reset apps to main and resync
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/43
2026-01-22 13:51:01 -08:00
|
|
|
apiVersion: apps/v1
|
|
|
|
|
kind: Deployment
|
|
|
|
|
metadata:
|
|
|
|
|
name: kube-state-metrics
|
|
|
|
|
namespace: monitoring
|
|
|
|
|
labels:
|
|
|
|
|
app: kube-state-metrics
|
|
|
|
|
spec:
|
|
|
|
|
replicas: 1
|
|
|
|
|
selector:
|
|
|
|
|
matchLabels:
|
|
|
|
|
app: kube-state-metrics
|
|
|
|
|
template:
|
|
|
|
|
metadata:
|
|
|
|
|
labels:
|
|
|
|
|
app: kube-state-metrics
|
|
|
|
|
spec:
|
|
|
|
|
serviceAccountName: kube-state-metrics
|
|
|
|
|
containers:
|
|
|
|
|
- name: kube-state-metrics
|
2026-03-06 08:15:06 -08:00
|
|
|
image: registry.k8s.io/kube-state-metrics/kube-state-metrics:kustomized
|
Observability cleanup and k8s service monitoring (#43) (#43)
## Summary
- Remove stale `/opt/homebrew/var/loki` from borgmatic backup (Loki migrated to k8s)
- Add Alloy k8s DaemonSet for automatic pod log collection with auto-discovery
- Add blackbox probes for miniflux, kiwix, transmission, devpi, argocd
- Add transmission-exporter sidecar for full metrics (speed, torrent counts, ratios)
- Replace stale devpi dashboard with probe-based metrics (status, response time, uptime)
- Add unified "K8s Services Health" dashboard for service uptime/response monitoring
## Manual cleanup already performed
- Deleted stale textfile metrics on indri: `devpi.prom`, `transmission.prom`
- Deleted stale data directories on indri: `/opt/homebrew/var/loki/`, `/opt/homebrew/var/prometheus/`
## Deployment and Testing
- [x] Sync `apps` application to pick up new alloy-k8s app
- [x] Deploy alloy-k8s on feature branch: `argocd app set alloy-k8s --revision feature/observability-cleanup && argocd app sync alloy-k8s`
- [x] Deploy torrent on feature branch (for transmission exporter): `argocd app set torrent --revision feature/observability-cleanup && argocd app sync torrent`
- [x] Deploy prometheus on feature branch (for new scrape config): `argocd app set prometheus --revision feature/observability-cleanup && argocd app sync prometheus`
- [x] Deploy grafana-config on feature branch (for dashboards): `argocd app set grafana-config --revision feature/observability-cleanup && argocd app sync grafana-config`
- [x] Verify pod logs appear in Loki/Grafana
- [x] Verify transmission metrics appear in Prometheus
- [x] Verify service probe metrics appear in Prometheus
- [x] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic config
- [ ] After merge, reset apps to main and resync
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/43
2026-01-22 13:51:01 -08:00
|
|
|
ports:
|
|
|
|
|
- containerPort: 8080
|
|
|
|
|
name: http-metrics
|
|
|
|
|
- containerPort: 8081
|
|
|
|
|
name: telemetry
|
|
|
|
|
livenessProbe:
|
|
|
|
|
httpGet:
|
|
|
|
|
path: /healthz
|
|
|
|
|
port: 8080
|
|
|
|
|
initialDelaySeconds: 5
|
|
|
|
|
timeoutSeconds: 5
|
|
|
|
|
readinessProbe:
|
|
|
|
|
httpGet:
|
|
|
|
|
path: /
|
|
|
|
|
port: 8080
|
|
|
|
|
initialDelaySeconds: 5
|
|
|
|
|
timeoutSeconds: 5
|
|
|
|
|
resources:
|
|
|
|
|
requests:
|
|
|
|
|
cpu: 10m
|
|
|
|
|
memory: 64Mi
|
|
|
|
|
limits:
|
|
|
|
|
cpu: 100m
|
|
|
|
|
memory: 256Mi
|
|
|
|
|
securityContext:
|
|
|
|
|
allowPrivilegeEscalation: false
|
|
|
|
|
readOnlyRootFilesystem: true
|
|
|
|
|
runAsNonRoot: true
|
|
|
|
|
runAsUser: 65534
|
|
|
|
|
capabilities:
|
|
|
|
|
drop:
|
|
|
|
|
- ALL
|
Add Prowler mutelist and fix kube-state-metrics seccomp (#319)
## Summary
- Add mutelist files to suppress expected/accepted Prowler CIS findings from components we don't control
- Mutelist files stored in `mutelist/` directory, grouped by category, merged at runtime via initContainer
- Fix missing seccomp `RuntimeDefault` profile on kube-state-metrics deployment
### Mutelist categories
| File | Checks | Covers |
|------|--------|--------|
| `apiserver.yaml` | 12 | Minikube apiserver flags |
| `control-plane.yaml` | 3 | Scheduler, controller-manager, kubelet |
| `core-pod-security.yaml` | 7 | System pods, Tailscale operator, Grafana init, Prowler hostPID, forgejo-runner |
| `rbac.yaml` | 3 | Built-in K8s roles, ArgoCD, CNPG |
Muted findings appear as `status=MUTED` in reports (not hidden), preserving audit trail.
### Not muted (follow-up)
- Alloy, Immich pods missing seccomp — need separate investigation (Helm/operator-managed)
## Test plan
- [ ] `kubectl kustomize argocd/manifests/prowler/` renders cleanly
- [ ] Trigger manual scan: `kubectl --context=minikube-indri -n prowler create job prowler-mutelist-test --from=cronjob/prowler`
- [ ] Verify initContainer merges successfully (check pod logs)
- [ ] Verify muted findings show as `MUTED` in report
- [ ] Sync kube-state-metrics and verify pod starts with seccomp profile
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.eblu.me/eblume/blumeops/pulls/319
2026-03-30 17:22:31 -07:00
|
|
|
seccompProfile:
|
|
|
|
|
type: RuntimeDefault
|