blumeops/docs/how-to/runbooks/runbook-argocd-out-of-sync.md
Erich Blume 6d65e6928c C2: Deploy infrastructure alerting pipeline (#303)
2026-03-22 14:52:56 -07:00

title: Runbook: ArgoCD App Out of Sync
modified: 2026-03-22
tags: how-to, alerting, runbook
Runbook: ArgoCD App Out of Sync

Alert name: ArgoCDAppOutOfSync

An ArgoCD application has been out of sync for 30+ minutes. This means the live state in Kubernetes differs from what's declared in Git.

Diagnostic Steps

  1. Check which app is out of sync — the name label in the alert tells you:

    argocd app get <app-name>
    
  2. View the diff:

    argocd app diff <app-name>
    
  3. Check if it's a branch revision issue — during C1/C2 work, apps may be pointed at a feature branch. After merge, they need to be reset to main:

    argocd app get <app-name> -o json | python3 -c "import json,sys; print(json.load(sys.stdin)['spec']['source']['targetRevision'])"
    
  4. Check the ArgoCD UI at https://argocd.ops.eblu.me and look for sync errors or degraded status.
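
The diagnostic steps above read a few fields from the application object. As a rough sketch, the helper below pulls those fields out of the JSON printed by `argocd app get <app-name> -o json` in one pass. The function names `summarize_app` and `non_main_revisions` are hypothetical, and the field paths (`status.sync.status`, `status.health.status`, `spec.source` or `spec.sources`) are assumed to match the ArgoCD Application schema in use here.

```python
import json


def summarize_app(app_json: str) -> dict:
    """Extract sync status, health, and target revisions from
    `argocd app get <app-name> -o json` output (assumed field layout)."""
    app = json.loads(app_json)
    spec = app.get("spec", {})
    status = app.get("status", {})
    # Single-source apps use spec.source; multi-source apps use spec.sources.
    sources = spec.get("sources") or ([spec["source"]] if "source" in spec else [])
    return {
        "name": app.get("metadata", {}).get("name"),
        "sync": status.get("sync", {}).get("status"),
        "health": status.get("health", {}).get("status"),
        "target_revisions": [s.get("targetRevision") for s in sources],
    }


def non_main_revisions(summary: dict, main: str = "main") -> list:
    """Return every target revision that is not literally `main`.
    An unset revision (None) is reported too, since it is not pinned to main."""
    return [rev for rev in summary["target_revisions"] if rev != main]
```

A non-empty result from `non_main_revisions` suggests the app may still be pointed at a feature branch, which is the case step 3 checks for.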

Common Causes

  • Forgot to sync after push — ArgoCD uses manual sync; changes require explicit argocd app sync
  • Branch revision not reset after PR merge — app still points at a deleted branch
  • Kustomize/manifest error — invalid YAML or unsatisfiable resource requirements
  • Pruning needed — old ConfigMaps from configMapGenerator need pruning

Resolution

# Simple sync
argocd app sync <app-name>

# If pruning is needed
argocd app sync <app-name> --prune

# If stuck on a deleted branch
argocd app set <app-name> --revision main
argocd app sync <app-name>
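
The three cases above differ only in whether the revision is reset first and whether `--prune` is passed. A minimal sketch of that decision, using hypothetical helpers `resolution_commands` and `render`, could look like the following. It only builds the command lines and does not run them, so the output can be reviewed before anything touches the cluster.

```python
import shlex


def resolution_commands(app_name, prune=False, reset_to=None):
    """Build the argocd invocations from the Resolution section as argv lists.

    reset_to: branch to point the app back at (e.g. "main") when it is
              stuck on a deleted branch; None leaves the revision alone.
    prune:    add --prune to the sync when old resources need removing.
    """
    commands = []
    if reset_to is not None:
        commands.append(["argocd", "app", "set", app_name, "--revision", reset_to])
    sync = ["argocd", "app", "sync", app_name]
    if prune:
        sync.append("--prune")
    commands.append(sync)
    return commands


def render(commands):
    """Join each argv list into a shell-safe string for display."""
    return [shlex.join(cmd) for cmd in commands]
```

For example, `render(resolution_commands("my-app", reset_to="main"))` yields the same two commands as the "stuck on a deleted branch" case, with `my-app` standing in for the real app name.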

Silencing

During active C1/C2 development, apps may intentionally be out of sync:

  1. Grafana → Alerting → Silences → Create Silence
  2. Match alertname = ArgoCDAppOutOfSync and name = <app-name>
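
If silences are created often, the same matchers can be expressed as an Alertmanager-style silence payload. This is a sketch only: `build_silence` is a hypothetical helper, and it assumes Grafana accepts the Alertmanager v2 silence format on its Alertmanager-compatible API (commonly `/api/alertmanager/grafana/api/v2/silences`, which should be verified against the deployed Grafana version). The code builds the payload and sends nothing.

```python
from datetime import datetime, timedelta, timezone


def build_silence(app_name, hours, created_by, comment, now=None):
    """Build an Alertmanager v2 style silence body matching
    alertname = ArgoCDAppOutOfSync and name = <app-name>."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)

    def matcher(name, value):
        # Exact (non-regex) equality match on one label.
        return {"name": name, "value": value, "isRegex": False, "isEqual": True}

    return {
        "matchers": [
            matcher("alertname", "ArgoCDAppOutOfSync"),
            matcher("name", app_name),
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }
```

Keeping the duration explicit makes the silence expire on its own, so an app that stays out of sync after development ends will alert again.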