Add HA for CV and Docs: zero-downtime deploys (#273)

## Summary
- Set `replicas: 2` with `maxUnavailable: 0` / `maxSurge: 1` on CV and Docs deployments so rolling updates never drop below 2 ready pods
- Add PodDisruptionBudgets (`minAvailable: 1`) so voluntary disruptions such as node drains and cluster maintenance always leave at least one pod available
- Add a Fly.io cache purge step to the `cv-deploy.yaml` workflow (docs already had this) so CV deploys don't serve stale cached content
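
The pod-count guarantees in the bullets above follow from simple arithmetic on the strategy and PDB fields. The sketch below works through that arithmetic for absolute (non-percentage) values; the function names are illustrative and do not exist in the repository.

```shell
#!/usr/bin/env bash
# Sketch: pod-count bounds implied by a RollingUpdate strategy and a PDB.
# All function names here are hypothetical helpers for illustration only.

# Minimum number of available pods maintained during a rolling update.
rollout_min_available() {
  local replicas=$1 max_unavailable=$2
  echo $(( replicas - max_unavailable ))
}

# Maximum number of pods (old plus new) that may exist during a rolling update.
rollout_max_total() {
  local replicas=$1 max_surge=$2
  echo $(( replicas + max_surge ))
}

# Voluntary evictions a minAvailable PDB permits, given the healthy pod count.
pdb_allowed_disruptions() {
  local healthy=$1 min_available=$2
  local allowed=$(( healthy - min_available ))
  if (( allowed < 0 )); then
    allowed=0
  fi
  echo "$allowed"
}

rollout_min_available 2 0     # replicas: 2, maxUnavailable: 0
rollout_max_total 2 1         # replicas: 2, maxSurge: 1
pdb_allowed_disruptions 2 1   # 2 healthy pods, minAvailable: 1
```

With the values from this change, a rollout keeps 2 pods available while briefly running up to 3, and a node drain may evict one pod at a time.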

## Deployment and Testing
- [ ] `argocd app diff cv` / `argocd app diff docs` from branch
- [ ] Deploy from branch: `argocd app set cv --revision feature/ha-cv-docs-zero-downtime && argocd app sync cv`
- [ ] Verify 2 pods running: `kubectl get pods -n cv --context=minikube-indri`
- [ ] Test rolling restart: `kubectl rollout restart deployment/cv -n cv --context=minikube-indri`
- [ ] During rollout, confirm continuous availability via `curl -I https://cv.eblu.me`
- [ ] After merge: reset ArgoCD to main, re-sync both apps
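
The availability check during the rolling restart can be automated instead of running `curl -I` by hand. The following is a minimal sketch, assuming `bash` and `curl` are available; `check_availability` is a hypothetical helper, and the attempt count and interval are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL repeatedly and count failed requests.
# Usage: check_availability URL ATTEMPTS [INTERVAL_SECONDS]
# Prints the number of failed requests; returns non-zero if any failed.
check_availability() {
  local url=$1 attempts=$2 interval=${3:-1}
  local failures=0 i
  for (( i = 0; i < attempts; i++ )); do
    # -f: fail on HTTP status >= 400, -sS: quiet but report errors,
    # -I: HEAD request, --max-time: bound each request to 5 seconds.
    if ! curl -fsSI --max-time 5 -o /dev/null "$url"; then
      failures=$(( failures + 1 ))
    fi
    sleep "$interval"
  done
  echo "$failures"
  (( failures == 0 ))
}
```

For example, running `check_availability https://cv.eblu.me 60 1` in one terminal while the `kubectl rollout restart` command runs in another should print `0` if the rollout never dropped traffic.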

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/273
Erich Blume 2026-02-26 07:53:21 -08:00
commit be3cdad1cb
8 changed files with 43 additions and 2 deletions

@@ -5,7 +5,12 @@ metadata:
   name: docs
   namespace: docs
 spec:
-  replicas: 1
+  replicas: 2
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxUnavailable: 0
+      maxSurge: 1
   selector:
     matchLabels:
       app: docs

@@ -6,6 +6,7 @@ resources:
 - deployment.yaml
 - service.yaml
 - ingress-tailscale.yaml
+- pdb.yaml
 images:
 - name: registry.ops.eblu.me/blumeops/quartz
   newTag: v1.28.2-ffa8727

@@ -0,0 +1,10 @@
+---
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: docs
+spec:
+  minAvailable: 1
+  selector:
+    matchLabels:
+      app: docs