blumeops/docs/how-to/troubleshooting.md
Erich Blume b0bac91ca9 Fix frontmatter field name for Quartz date display (#158)
## Summary

- Rename `date-modified` -> `modified` in all 80 docs and the `docs-check-frontmatter` task

Quartz's `CreatedModifiedDate` plugin recognizes `modified`, `lastmod`, `updated`, and `last-modified` — but not `date-modified`. The wrong field name caused Quartz to ignore frontmatter dates entirely and fall through to filesystem timestamps (UTC inside Dagger), showing Feb 12 on pages built late on Feb 11 PST.

## Test plan

- [x] `mise run docs-check-frontmatter` passes
- [ ] Kick off docs release after merge — verify rendered dates match frontmatter values

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/158
2026-02-11 16:45:12 -08:00

4.9 KiB

title modified tags
Troubleshooting 2026-02-07
how-to
operations

Troubleshooting Common Issues

Quick reference for diagnosing and fixing common BlumeOps issues.

General Health Check

Run the comprehensive service health check:

mise run services-check

This checks all services on indri and in Kubernetes.

Kubernetes Issues

Pod not starting

# Check pod status
kubectl --context=minikube-indri -n <namespace> get pods

# Describe pod for events
kubectl --context=minikube-indri -n <namespace> describe pod <pod>

# Check logs
kubectl --context=minikube-indri -n <namespace> logs <pod>

# Previous container logs (if restarting)
kubectl --context=minikube-indri -n <namespace> logs <pod> --previous

Common causes:

  • ImagePullBackOff - Image doesn't exist or registry unreachable
  • CrashLoopBackOff - Application crashing; check logs
  • Pending - Insufficient resources or node issues
  • ContainerCreating - Waiting for volumes or secrets

ArgoCD sync issues

# Check app status
argocd app get <app>

# See what will change
argocd app diff <app>

# Force sync
argocd app sync <app> --force

# Sync with prune (removes deleted resources)
argocd app sync <app> --prune

App stuck in "Syncing": Check if there are failed hooks or jobs:

kubectl --context=minikube-indri -n <namespace> get jobs
kubectl --context=minikube-indri -n <namespace> get pods --field-selector=status.phase=Failed

ArgoCD login expired:

argocd login argocd.ops.eblu.me --username admin --password "$(op --vault vg6xf6vvfmoh5hqjjhlhbeoaie item get srogeebssulhtb6tnqd7ls6qey --fields password --reveal)"

kubectl connection refused

# Check if minikube is running (on indri)
ssh indri 'minikube status'

# Restart if needed
ssh indri 'minikube start'

# Verify tailscale is serving the API
ssh indri 'tailscale serve status --json'

Indri Service Issues

Service not responding

# Check LaunchAgent status
ssh indri 'launchctl list | grep mcquack'

# Restart a LaunchAgent
ssh indri 'launchctl unload ~/Library/LaunchAgents/mcquack.<service>.plist'
ssh indri 'launchctl load ~/Library/LaunchAgents/mcquack.<service>.plist'

# Check service logs
ssh indri 'tail -50 ~/Library/Logs/mcquack.<service>.err.log'
ssh indri 'tail -50 ~/Library/Logs/mcquack.<service>.out.log'

Forgejo not accessible

# Check if forgejo is running
ssh indri 'lsof -nP -iTCP:3001 -sTCP:LISTEN'

# Check logs
ssh indri 'tail -50 ~/Library/Logs/mcquack.forgejo.err.log'

# Restart forgejo
ssh indri 'launchctl kickstart -k gui/$(id -u)/mcquack.forgejo'

Registry (Zot) issues

# Test registry API
ssh indri 'curl -s http://localhost:5050/v2/_catalog | jq'

# Check if zot is running
ssh indri 'lsof -nP -iTCP:5050 -sTCP:LISTEN'

# Restart zot
ssh indri 'launchctl kickstart -k gui/$(id -u)/mcquack.zot'

Network Issues

Service unreachable via *.ops.eblu.me

Caddy handles routing for *.ops.eblu.me:

# Check if Caddy is running
ssh indri 'launchctl list | grep caddy'

# View Caddy logs
ssh indri 'tail -50 ~/Library/Logs/caddy/access.log'
ssh indri 'tail -50 ~/Library/Logs/caddy/error.log'

# Restart Caddy
ssh indri 'launchctl kickstart -k gui/$(id -u)/homebrew.mxcl.caddy'

Tailscale MagicDNS not resolving

# Check tailscale serve status
ssh indri 'tailscale serve status --json'

# Restart tailscale if needed
ssh indri 'tailscale down && tailscale up'

Observability

Check metrics

# Open Grafana
open https://grafana.ops.eblu.me

# Check Prometheus directly
open https://prometheus.ops.eblu.me

Check logs

# Open Grafana Explore
open https://grafana.ops.eblu.me/explore

# Query Loki directly
curl -G 'https://loki.ops.eblu.me/loki/api/v1/query_range' \
  --data-urlencode 'query={service="<service>"}' \
  --data-urlencode 'limit=100'

Alloy (metrics/logs collector) issues

# Indri alloy (host metrics)
ssh indri 'launchctl list | grep alloy'
ssh indri 'tail -50 ~/Library/Logs/alloy/alloy.log'

# K8s alloy (pod logs)
kubectl --context=minikube-indri -n monitoring logs -l app=alloy

Database Issues

PostgreSQL connection failed

# Check CNPG cluster status
kubectl --context=minikube-indri -n databases get cluster

# Check PostgreSQL pods
kubectl --context=minikube-indri -n databases get pods -l cnpg.io/cluster=blumeops-pg

# Connect to database
kubectl --context=minikube-indri -n databases exec -it blumeops-pg-1 -- psql -U postgres

Backup Issues

Check backup status

# View latest backup info
ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom'

# Run backup manually
ssh indri 'borgmatic --verbosity 1'

# Check backup logs
ssh indri 'tail -100 /opt/homebrew/var/log/borgmatic/borgmatic.log'