## Summary

- Add an "Unhealthy Pods" stat panel showing the count of pods in error states (`ImagePullBackOff`, `CrashLoopBackOff`, etc.), with a red background when the count is > 0
- Add a "Pods by Waiting Reason" time series chart showing container waiting states over time
- Provide visibility into stuck pods that ArgoCD doesn't track (it manages the CronJobs, not the Jobs/Pods they spawn)

## Context

This addresses an incident in which a `zim-watcher` CronJob pod was stuck in `ImagePullBackOff` for 11 days without triggering any alert. ArgoCD showed the CronJob as "Synced, Healthy" because it only manages the CronJob resource, not the Jobs/Pods it spawns.

## Deployment and Testing

- [ ] Sync grafana-config app to test branch
- [ ] Verify dashboard renders correctly
- [ ] Confirm "Unhealthy Pods" shows 0 (green) when there are no issues
- [ ] Reset to main after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/83
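The Summary describes the two panels only in prose. As a minimal sketch, assuming Prometheus scrapes the kube-state-metrics gauge `kube_pod_container_status_waiting_reason`, the panels could be expressed as Grafana panel JSON generated from Python like this. The dashboard JSON and queries actually committed in this PR are not shown here; the helper names `unhealthy_pods_panel` / `waiting_reason_panel` and the reason entries beyond `ImagePullBackOff` and `CrashLoopBackOff` are hypothetical.

```python
import json

# Hypothetical reason list: the PR names ImagePullBackOff and CrashLoopBackOff
# "etc."; ErrImagePull and CreateContainerConfigError are illustrative assumptions.
ERROR_REASONS = [
    "ImagePullBackOff",
    "ErrImagePull",
    "CrashLoopBackOff",
    "CreateContainerConfigError",
]

# Assumed kube-state-metrics gauge: 1 while a container waits with the given `reason`.
WAITING_METRIC = "kube_pod_container_status_waiting_reason"


def unhealthy_pods_panel() -> dict:
    """Stat panel: pods with at least one container in an error waiting state."""
    reasons = "|".join(ERROR_REASONS)
    # Inner count collapses several stuck containers into one series per pod;
    # `or vector(0)` makes the panel show 0 instead of "No data" when healthy.
    expr = (
        f'count(count by (namespace, pod) ({WAITING_METRIC}{{reason=~"{reasons}"}} == 1))'
        " or vector(0)"
    )
    return {
        "type": "stat",
        "title": "Unhealthy Pods",
        "targets": [{"expr": expr, "refId": "A"}],
        "options": {
            "colorMode": "background",  # color the whole panel, not just the value
            "graphMode": "none",
            "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": False},
        },
        "fieldConfig": {
            "defaults": {
                "thresholds": {
                    "mode": "absolute",
                    # green as the base step, red once the count reaches 1 (i.e. > 0)
                    "steps": [
                        {"color": "green", "value": None},
                        {"color": "red", "value": 1},
                    ],
                }
            },
            "overrides": [],
        },
    }


def waiting_reason_panel() -> dict:
    """Time series: number of waiting containers, one line per reason."""
    return {
        "type": "timeseries",
        "title": "Pods by Waiting Reason",
        "targets": [
            {
                "expr": f"sum by (reason) ({WAITING_METRIC} == 1)",
                # Grafana legend template syntax, not a Python format field
                "legendFormat": "{{reason}}",
                "refId": "A",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps([unhealthy_pods_panel(), waiting_reason_panel()], indent=2))
```

Counting by `(namespace, pod)` before the outer `count` keeps a pod with several stuck containers from being counted more than once, and a threshold of `1` on an integer count is the same as "red when > 0".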