blumeops/docs/how-to/runbooks/runbook-pod-not-ready.md
Erich Blume 67883950c3 C2(deploy-infra-alerting): finalize rewrite cards as historical docs
Remove all Mikado frontmatter (status, branch, requires) from chain
cards. Rename docs/how-to/alerts/ to docs/how-to/runbooks/ and update
all runbook_url references. Add changelog fragment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 14:40:21 -07:00

title: Runbook: Pod Not Ready
modified: 2026-03-22
tags: how-to, alerting, runbook

Runbook: Pod Not Ready

Alert name: PodNotReady

A Kubernetes pod has been in a not-ready state for 5 minutes or longer.

Diagnostic Steps

  1. Identify the pod from the alert labels (pod, namespace):

    kubectl describe pod <pod> -n <namespace> --context=minikube-indri
    
  2. Check recent events for scheduling failures, image pull errors, or probe failures:

    kubectl get events -n <namespace> --context=minikube-indri --sort-by='.lastTimestamp' | tail -20
    
  3. Check logs:

    kubectl logs <pod> -n <namespace> --context=minikube-indri --tail=50
    
  4. Check node and pod resource usage (kubectl top needs the Kubernetes Metrics API, e.g. metrics-server, to be available):

    kubectl top nodes --context=minikube-indri
    kubectl top pods -n <namespace> --context=minikube-indri
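
The four steps above can be run in one pass. The following is a minimal sketch, not part of the original runbook: the function name diagnose_pod and the KUBE_CONTEXT variable are hypothetical, and the kubectl commands are the same ones listed in the steps. Each step runs even if an earlier one fails (for example, logs are unavailable for a pod that never started).

```shell
#!/usr/bin/env bash
# Sketch: run the runbook's four diagnostic steps for one pod.
# diagnose_pod and KUBE_CONTEXT are hypothetical names used for illustration.

KUBE_CONTEXT="${KUBE_CONTEXT:-minikube-indri}"

diagnose_pod() {
  local pod="$1"
  local namespace="$2"
  if [ -z "$pod" ] || [ -z "$namespace" ]; then
    echo "usage: diagnose_pod <pod> <namespace>" >&2
    return 2
  fi

  echo "== describe =="
  kubectl describe pod "$pod" -n "$namespace" --context="$KUBE_CONTEXT"

  echo "== recent events =="
  kubectl get events -n "$namespace" --context="$KUBE_CONTEXT" \
    --sort-by='.lastTimestamp' | tail -20

  echo "== logs (last 50 lines) =="
  kubectl logs "$pod" -n "$namespace" --context="$KUBE_CONTEXT" --tail=50

  echo "== resource usage =="
  kubectl top nodes --context="$KUBE_CONTEXT"
  kubectl top pods -n "$namespace" --context="$KUBE_CONTEXT"
}
```

After sourcing the file, call it with the pod and namespace from the alert labels, e.g. diagnose_pod my-pod my-namespace (both values here are placeholders).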
    

Common Causes

  • CrashLoopBackOff: the app is crashing on startup; check the logs
  • ImagePullBackOff: the container image was not found or the registry is unreachable
  • Pending: insufficient resources (CPU/memory), or a PVC is not bound
  • Readiness probe failing: the container is running but is not reporting healthy
  • NFS mount issue: services depending on sifaka (kiwix, transmission, navidrome, jellyfin) will fail if NFS is down
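
To see which of these causes applies, it can help to print each container's waiting reason directly. This is a sketch under assumptions: waiting_reasons and next_check are hypothetical helper names, and the hint text paraphrases the list above. Note that Pending is a pod phase, not a container waiting reason, so it appears in kubectl describe output, not in the jsonpath query.

```shell
#!/usr/bin/env bash
# Sketch: show container waiting reasons and map a reason to a next check.
# waiting_reasons and next_check are hypothetical helpers, not runbook commands.

# Print "<container>\t<waiting reason>" per container.
# The reason is empty when a container is not in a waiting state.
waiting_reasons() {
  kubectl get pod "$1" -n "$2" --context=minikube-indri \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}

# Map a waiting reason (or the Pending phase) to the suggested next check.
next_check() {
  case "$1" in
    CrashLoopBackOff)
      echo "check logs, including --previous for the crashed container" ;;
    ImagePullBackOff|ErrImagePull)
      echo "check the image name and registry reachability" ;;
    Pending)
      echo "check CPU/memory requests and PVC binding" ;;
    *)
      echo "check readiness probe events and NFS mounts in describe output" ;;
  esac
}
```

For example, next_check CrashLoopBackOff prints the log-related hint; kubectl logs accepts --previous to read the output of the container instance that crashed.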

Silencing

  1. Grafana → Alerting → Silences → Create Silence
  2. Match alertname = PodNotReady
  3. Optionally match namespace = <namespace> to silence a specific service
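
The same matchers can be written as a silence payload. This is a sketch only: it assumes the Alertmanager v2 silence schema (matchers, startsAt, endsAt, createdBy, comment), and whether this Grafana instance accepts such a payload over its Alertmanager-compatible API should be verified before relying on it. The helper name silence_payload and the createdBy and comment values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a silence payload matching alertname = PodNotReady and,
# optionally, a namespace. silence_payload is a hypothetical helper.
# Arguments: <namespace or ""> <startsAt RFC 3339> <endsAt RFC 3339>
# The namespace is inserted unescaped; Kubernetes namespace names are
# DNS labels, so they contain no characters that need JSON escaping.

silence_payload() {
  local namespace="$1"
  local starts_at="$2"
  local ends_at="$3"
  local matchers='{"name":"alertname","value":"PodNotReady","isRegex":false,"isEqual":true}'

  if [ -n "$namespace" ]; then
    matchers="$matchers,{\"name\":\"namespace\",\"value\":\"$namespace\",\"isRegex\":false,\"isEqual\":true}"
  fi

  printf '{"matchers":[%s],"startsAt":"%s","endsAt":"%s","createdBy":"runbook","comment":"PodNotReady silence"}\n' \
    "$matchers" "$starts_at" "$ends_at"
}
```

Passing an empty namespace yields a payload that matches every PodNotReady alert, mirroring step 2 alone; passing a namespace adds the second matcher from step 3.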