---
title: "Runbook: Pod Not Ready"
modified: 2026-03-22
tags:
- how-to
- alerting
- runbook
---
# Runbook: Pod Not Ready
**Alert name:** `PodNotReady`
Fires when a Kubernetes pod has been in a not-ready state for 5 minutes or longer.
## Diagnostic Steps
1. **Identify the pod** from the alert labels (`pod`, `namespace`):
```fish
kubectl describe pod <pod> -n <namespace> --context=minikube-indri
```
2. **Check events** — look for scheduling failures, image pull errors, or probe failures:
```fish
kubectl get events -n <namespace> --context=minikube-indri --sort-by='.lastTimestamp' | tail -20
```
3. **Check logs**:
```fish
kubectl logs <pod> -n <namespace> --context=minikube-indri --tail=50
```
4. **Check node and pod resource usage** (`kubectl top` requires the metrics-server to be running in the cluster):
```fish
kubectl top nodes --context=minikube-indri
kubectl top pods -n <namespace> --context=minikube-indri
```
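When the `describe` output is long, a narrower query can make the failing condition easier to spot. The commands below are a sketch rather than part of the established procedure. They reuse the `<pod>` and `<namespace>` placeholders from the steps above and rely only on standard `kubectl` JSONPath output:

```fish
# Print the reason and message attached to the pod's Ready condition
kubectl get pod <pod> -n <namespace> --context=minikube-indri \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}: {.status.conditions[?(@.type=="Ready")].message}{"\n"}'

# List each container with its ready flag and restart count
kubectl get pod <pod> -n <namespace> --context=minikube-indri \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'
```

A high restart count usually points toward a crash loop, while a container that is not ready with zero restarts more often points toward a failing readiness probe.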
## Common Causes
- **CrashLoopBackOff**: the app is crashing on startup; check the logs
- **ImagePullBackOff**: the container image was not found or the registry is unreachable
- **Pending**: insufficient resources (CPU/memory), or a PVC is not bound
- **Readiness probe failing**: the container is running but is not passing its readiness check
- **NFS mount issue**: services depending on sifaka (kiwix, transmission, navidrome, jellyfin) will fail if NFS is down
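Some of these causes have a direct follow-up command. The following is a hedged sketch using the same placeholders as the diagnostic steps. It assumes mount problems surface as `FailedMount` events, which is the usual kubelet behaviour but has not been confirmed for this cluster:

```fish
# CrashLoopBackOff: show logs from the previous (crashed) container instance
kubectl logs <pod> -n <namespace> --context=minikube-indri --previous --tail=50

# Pending: check whether the namespace's PersistentVolumeClaims are Bound
kubectl get pvc -n <namespace> --context=minikube-indri

# NFS mount issue: list volume mount failures reported for the namespace
kubectl get events -n <namespace> --context=minikube-indri --field-selector reason=FailedMount
```

`--previous` only returns output if the container has restarted at least once. On a pod that has never started, it reports an error instead.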
## Silencing
1. Grafana → Alerting → Silences → Create Silence
2. Match `alertname = PodNotReady`
3. Optionally match `namespace = <namespace>` to silence a specific service
## Related
- [[deploy-infra-alerting]] — Alerting pipeline overview