Operations and observability for sifaka NAS #135

Merged
eblume merged 6 commits from feature/sifaka-ops-observability into main 2026-02-09 17:44:06 -08:00

Summary

  • Add smartctl_exporter Docker container to sifaka for SMART disk health monitoring
  • Formalize existing node_exporter container under Ansible management
  • Route both exporters through Caddy L4 TCP proxy (nas.ops.eblu.me:9100, nas.ops.eblu.me:9633), replacing the hardcoded LAN IP in Prometheus
  • Create "Sifaka Disk Health" Grafana dashboard (health status, temperature, wear indicators, lifetime)
  • Introduce ansible/playbooks/sifaka.yml and mise run provision-sifaka — first Ansible playbook for the NAS
  • Share exporter port variables in group_vars/all.yml to avoid duplication between Caddy and sifaka roles

Prerequisites before deploy

  • Enable SSH on sifaka (DSM Control Panel > Terminal & SNMP)
  • Verify ssh eblume@sifaka 'docker ps' works
  • Run mise run provision-sifaka to deploy containers
  • Run mise run provision-indri -- --tags caddy to add L4 routes
  • Run argocd app sync prometheus and argocd app sync grafana-config
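
Once SSH is enabled, the remaining steps can be run in order from one script. This is a minimal sketch that uses only the commands listed above; the `run` wrapper, the `DRY_RUN` variable, and the function name are hypothetical helpers added here so the sequence can be previewed without touching any host.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run the deploy prerequisites in the order given in the list above.
# The && chain stops at the first step that fails.
deploy_sifaka_observability() {
  run ssh eblume@sifaka 'docker ps' &&
  run mise run provision-sifaka &&
  run mise run provision-indri -- --tags caddy &&
  run argocd app sync prometheus &&
  run argocd app sync grafana-config
}
```

Calling `DRY_RUN=1 deploy_sifaka_observability` prints the five commands without executing them.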

Test plan

  • Verify smartctl_exporter metrics: curl http://nas.ops.eblu.me:9633/metrics
  • Verify Prometheus targets page shows both sifaka jobs as UP
  • Verify Grafana "Sifaka Disk Health" dashboard loads with data
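
The metrics check can be narrowed to the health metric the dashboard reads. This is a minimal sketch, assuming the exporter reports `smartctl_device_smart_status` as 1 for a drive that passes its SMART check; `count_unhealthy` is a hypothetical helper name.

```shell
#!/bin/sh
# Read Prometheus text-format metrics on stdin and print how many
# smartctl_device_smart_status samples have a value other than 1.
# Assumption: 1 means the drive passed its SMART health check.
# Prints "no-samples" and returns 2 if the metric is absent entirely.
count_unhealthy() {
  awk '
    # Match sample lines only; "# HELP" and "# TYPE" lines start with "#".
    /^smartctl_device_smart_status[{ ]/ {
      total++
      if ($NF + 0 != 1) bad++
    }
    END {
      if (total == 0) { print "no-samples"; exit 2 }
      print bad + 0
    }
  '
}
```

For example, `curl -fsS http://nas.ops.eblu.me:9633/metrics | count_unhealthy` pipes the live endpoint through the check.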

🤖 Generated with Claude Code

Adds smartctl_exporter alongside the existing node_exporter on sifaka,
routed through Caddy L4 TCP proxy at nas.ops.eblu.me, with a Grafana
dashboard for disk health visibility. Introduces the first Ansible
playbook for sifaka (mise run provision-sifaka) and shared exporter
port variables in group_vars/all.yml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use full docker path (/volume1/@appstore/ContainerManager/usr/bin/docker)
- Match existing container name (prom-node-exporter-1)
- Remove unnecessary node_exporter flags (--pid=host, volume mounts)
- Add become: true for all docker tasks (requires sudo on Synology)
- Run smartctl_exporter as --user=root (image drops to nobody internally)
- Explicitly specify /dev/sata* devices (Synology uses non-standard paths)
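
Taken together, these fixes correspond roughly to the following manual command. This is a sketch and not the playbook's actual task: the image name `prometheuscommunity/smartctl-exporter`, the container name, the `--privileged` flag, the published port mapping, and the concrete device paths passed in are assumptions. The docker path, `--user=root`, the use of sudo, and port 9633 come from this PR.

```shell
#!/bin/sh
# Full docker path on Synology, as given in the commit message.
DOCKER=/volume1/@appstore/ContainerManager/usr/bin/docker

# Print one --smartctl.device flag per device path argument, space-separated.
device_flags() {
  flags=""
  for dev in "$@"; do
    flags="${flags:+$flags }--smartctl.device=$dev"
  done
  printf '%s\n' "$flags"
}

# Print (do not execute) a docker run command for the exporter.
# Assumed: image name, container name, --privileged, and -p mapping.
smartctl_exporter_cmd() {
  printf 'sudo %s run -d --name smartctl-exporter --privileged --user=root -p 9633:9633 prometheuscommunity/smartctl-exporter %s\n' \
    "$DOCKER" "$(device_flags "$@")"
}
```

Running `smartctl_exporter_cmd /dev/sata1 /dev/sata2` prints the command for review before it is run on the NAS; the device paths here are illustrative.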

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds one-time setup steps (SSH, sudoers, Docker path, device naming)
to the sifaka reference card for reproducibility if the NAS is replaced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ansible searches for group_vars/ relative to the inventory directory,
not the project root. Also adds first-time setup docs and hardware
details to the sifaka reference card.
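
The lookup rule can be made concrete with a small helper. This is a sketch of the inventory-relative case only (Ansible also checks next to the playbook); the function name and the example paths are hypothetical and do not reflect the repository's actual layout.

```shell
#!/bin/sh
# Print the group_vars directory Ansible would look in for a given
# inventory source: inside it if the source is a directory, otherwise
# beside the inventory file.
group_vars_dir() {
  inv=$1
  if [ -d "$inv" ]; then
    printf '%s/group_vars\n' "${inv%/}"
  else
    printf '%s/group_vars\n' "$(dirname "$inv")"
  fi
}
```

With a hypothetical inventory file at `ansible/inventory.yml`, the helper prints `ansible/group_vars`, so an `all.yml` placed at the project root instead would not be picked up.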

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- smartctl_device_smart_healthy → smartctl_device_smart_status
- Make all stat panels full-width with auto orientation so 4 device
  values display side-by-side instead of stacked vertically

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch to auto orientation and increase height so the 4 device
status blocks display as horizontal squares instead of vertical strips.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eblume merged commit 85e36cd807 into main 2026-02-09 17:44:06 -08:00
Reference
eblume/blumeops!135