---
title: "Runbook: Pod Not Ready"
modified: 2026-03-22
tags:
  - how-to
  - alerting
  - runbook
---

# Runbook: Pod Not Ready

**Alert name:** `PodNotReady`

A Kubernetes pod has been in a not-ready state for at least 5 minutes.
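Before focusing on one pod, it can help to check whether the problem is isolated. The sketch below lists every pod whose `Ready` condition is not `True`, using kubectl's JSONPath output. Pods from completed Jobs also report `Ready=False`, so they will appear in the list too.

```fish
# Print namespace, pod name, and the status of the pod's Ready condition,
# then drop pods that are Ready. Completed Job pods will still be listed.
kubectl get pods -A --context=minikube-indri \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | grep -v 'True$'
```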

## Diagnostic Steps

1. **Identify the pod** from the alert labels (`pod`, `namespace`) and describe it:
   ```fish
   kubectl describe pod <pod> -n <namespace> --context=minikube-indri
   ```

2. **Check events** — look for scheduling failures, image pull errors, or probe failures:
   ```fish
   kubectl get events -n <namespace> --context=minikube-indri --sort-by='.lastTimestamp' | tail -20
   ```

3. **Check logs**:
   ```fish
   kubectl logs <pod> -n <namespace> --context=minikube-indri --tail=50
   ```

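4. **Check resource usage** on the node and in the namespace. `kubectl top` needs the Metrics API, which minikube provides through its `metrics-server` addon: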
   ```fish
   kubectl top nodes --context=minikube-indri
   kubectl top pods -n <namespace> --context=minikube-indri
   ```
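If the pod keeps restarting, the logs from step 3 may only cover the container that just started. The sketch below adds two follow-up checks: the last termination state recorded for each container, and the logs of the previous, crashed instance. In a multi-container pod, add `-c <container>` to the `logs` command to pick a specific container.

```fish
# Last termination (reason, exit code, timestamps) for each container in the pod
kubectl get pod <pod> -n <namespace> --context=minikube-indri \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated}{"\n"}{end}'

# Logs from the previous container instance (errors if it has not restarted yet)
kubectl logs <pod> -n <namespace> --context=minikube-indri --previous --tail=50
```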

## Common Causes

- **CrashLoopBackOff** — the container keeps exiting and Kubernetes is backing off before each restart; check the logs
- **ImagePullBackOff** — container image not found or registry unreachable
- **Pending** — insufficient resources (CPU/memory), or PVC not bound
- **Readiness probe failing** — the container is running, but its readiness probe fails, so the pod receives no Service traffic
- **NFS mount issue** — services depending on sifaka (kiwix, transmission, navidrome, jellyfin) will fail if NFS is down
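Several of these causes can be confirmed directly. The sketch below gives one check per cause and reuses the placeholders from the diagnostic steps. It makes three assumptions: `<pvc>` is a hypothetical placeholder for a claim named in the pod spec, `sifaka` resolves by that name, and `showmount` is installed on the machine you run it from.

```fish
# Pending: are the namespace's PVCs Bound? Describe an unbound claim for details.
kubectl get pvc -n <namespace> --context=minikube-indri
kubectl describe pvc <pvc> -n <namespace> --context=minikube-indri

# ImagePullBackOff: which image(s) is the pod trying to pull?
kubectl get pod <pod> -n <namespace> --context=minikube-indri \
    -o jsonpath='{.spec.containers[*].image}{"\n"}'

# Readiness probe failing: what does each container's probe actually check?
kubectl get pod <pod> -n <namespace> --context=minikube-indri \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.readinessProbe}{"\n"}{end}'

# NFS mount issue: is sifaka reachable and exporting its shares?
showmount -e sifaka
```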

## Silencing

1. Grafana → Alerting → Silences → Create Silence
2. Match `alertname = PodNotReady`
3. Optionally match `namespace = <namespace>` to silence a specific service
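Silences can also be created from the command line. The sketch below posts to Grafana's Alertmanager-compatible API (`/api/alertmanager/grafana/api/v2/silences`). It assumes `PodNotReady` is handled by Grafana's built-in Alertmanager, which is consistent with the UI path above. The Grafana URL and service-account token are hypothetical placeholders. The end-time command uses GNU `date` syntax; on macOS, use `date -u -v+2H +%Y-%m-%dT%H:%M:%SZ` instead.

```fish
# Hypothetical placeholders: set these for your Grafana instance.
set grafana_url https://<grafana-host>
set token <service-account-token>

# Silence for two hours starting now (GNU date syntax for the end time).
set starts (date -u +%Y-%m-%dT%H:%M:%SZ)
set ends (date -u -d '+2 hours' +%Y-%m-%dT%H:%M:%SZ)

# Alertmanager v2 silence: every matcher must match for the silence to apply.
curl -sS -X POST "$grafana_url/api/alertmanager/grafana/api/v2/silences" \
    -H "Authorization: Bearer $token" \
    -H 'Content-Type: application/json' \
    -d '{"matchers": [
          {"name": "alertname", "value": "PodNotReady", "isRegex": false},
          {"name": "namespace", "value": "<namespace>", "isRegex": false}
        ],
        "startsAt": "'$starts'", "endsAt": "'$ends'",
        "createdBy": "runbook", "comment": "Investigating PodNotReady"}'
```

Dropping the `namespace` matcher silences `PodNotReady` for every namespace, which matches step 2 above without step 3.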

## Related

- [[deploy-infra-alerting]] — Alerting pipeline overview