Operations and observability for sifaka NAS (#135)
## Summary

- Add `smartctl_exporter` Docker container to sifaka for SMART disk health monitoring
- Formalize existing `node_exporter` container under Ansible management
- Route both exporters through Caddy L4 TCP proxy (`nas.ops.eblu.me:9100`, `nas.ops.eblu.me:9633`), replacing the hardcoded LAN IP in Prometheus
- Create "Sifaka Disk Health" Grafana dashboard (health status, temperature, wear indicators, lifetime)
- Introduce `ansible/playbooks/sifaka.yml` and `mise run provision-sifaka`, the first Ansible playbook for the NAS
- Shared exporter port variables in `group_vars/all.yml` to avoid duplication between Caddy and sifaka roles

## Prerequisites before deploy

- [ ] Enable SSH on sifaka (DSM Control Panel > Terminal & SNMP)
- [ ] Verify `ssh eblume@sifaka 'docker ps'` works
- [ ] Run `mise run provision-sifaka` to deploy containers
- [ ] Run `mise run provision-indri -- --tags caddy` to add L4 routes
- [ ] `argocd app sync prometheus` + `argocd app sync grafana-config`

## Test plan

- [ ] Verify smartctl_exporter metrics: `curl http://nas.ops.eblu.me:9633/metrics`
- [ ] Verify Prometheus targets page shows both sifaka jobs as UP
- [ ] Verify Grafana "Sifaka Disk Health" dashboard loads with data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/135
Parent: 4ee643a81d
Commit: 85e36cd807
15 changed files with 538 additions and 9 deletions

@@ -0,0 +1 @@
+Add SMART disk health monitoring for sifaka NAS with smartctl_exporter, Grafana dashboard, Ansible playbook, and Caddy L4 routing via ops.eblu.me.

@@ -62,6 +62,8 @@ DNS CNAMEs point to `blumeops-proxy.fly.dev`. TLS via Fly.io-managed Let's Encry
 | 443 | Caddy | HTTPS | 0.0.0.0 | Reverse proxy |
 | 2222 | Caddy L4 | TCP | 0.0.0.0 | SSH proxy to Forgejo |
 | 5432 | Caddy L4 | TCP | 0.0.0.0 | PostgreSQL proxy |
+| 9100 | Caddy L4 | TCP | 0.0.0.0 | Sifaka node_exporter proxy |
+| 9633 | Caddy L4 | TCP | 0.0.0.0 | Sifaka smartctl_exporter proxy |
 | 2200 | Forgejo SSH | TCP | localhost | Built-in SSH server |
 | 3001 | Forgejo | HTTP | localhost | Web UI |
 | 5050 | Zot | HTTP | localhost | Registry API |

@@ -13,8 +13,8 @@ Synology NAS providing network storage and backup target.
 | Property | Value |
 |----------|-------|
 | **Dashboard** | https://nas.ops.eblu.me |
-| **Model** | Synology |
-| **Storage** | 10.9TB RAID 5 |
+| **Model** | Synology DS423+ (DSM 7) |
+| **Storage** | 10.9TB RAID 5 (4x Seagate IronWolf 4TB, ST4000VN006) |
 | **Role** | Backup target, media storage |
 
 ## Network Shares

@@ -37,7 +37,70 @@ Synology NAS providing network storage and backup target.
 
 ## Monitoring
 
-Node exporter running in Docker container, scraped by [[prometheus]] at `sifaka:9100`.
+Prometheus exporters run as Docker containers, managed by Ansible (`mise run provision-sifaka`).
+
+| Exporter | Port | Purpose |
+|----------|------|---------|
+| node_exporter | 9100 | System metrics (CPU, memory, disk I/O) |
+| smartctl_exporter | 9633 | SMART disk health data |
+
+Scraped by [[prometheus]] via Caddy L4 TCP proxy at `nas.ops.eblu.me:9100` and `nas.ops.eblu.me:9633`. Dashboard: [[grafana]] > Sifaka Disk Health.
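
One way to spot-check the two proxied endpoints described in the Monitoring section is to fetch each `/metrics` page and filter the SMART status series. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the metric name `smartctl_device_smart_status` (with 1 meaning passed and 0 meaning failed) is an assumption that should be confirmed against the exporter's actual output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the blumeops repo.

# Print the metrics URL for each exporter port behind the Caddy L4 proxy.
exporter_urls() {
  local host="${1:-nas.ops.eblu.me}"
  local port
  for port in 9100 9633; do
    printf 'http://%s:%s/metrics\n' "$host" "$port"
  done
}

# Read Prometheus text-format metrics on stdin and print every
# SMART status sample whose value is 0.
# ASSUMPTION: the metric is named smartctl_device_smart_status and 0 = failed.
failing_smart_devices() {
  awk '/^smartctl_device_smart_status[{ ]/ && $NF == 0 { print }'
}
```

For example, `curl -fsS http://nas.ops.eblu.me:9633/metrics | failing_smart_devices` would print nothing when every disk reports a passing status, under the assumption stated above.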
+
+## First-Time Setup
+
+These steps were performed once to enable Ansible provisioning. They are documented here for reference if sifaka is ever replaced or reset.
+
+### 1. Enable SSH
+
+DSM Control Panel > Terminal & SNMP > Enable SSH service (port 22).
+
+### 2. SSH Key Authentication
+
+From a tailnet client with an existing SSH key:
+
+```bash
+ssh-copy-id eblume@sifaka  # uses password auth initially
+```
+
+Synology requires strict permissions on the home directory. On sifaka:
+
+```bash
+chmod 755 ~         # DSM defaults to 777; SSH refuses keys otherwise
+chmod 700 ~/.ssh
+chmod 600 ~/.ssh/authorized_keys
+```
+
+Home directory path: `/var/services/homes/eblume`.
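
The permission requirements from step 2 can be verified before retrying key authentication. This is a minimal sketch under stated assumptions: `check_ssh_perms` is a hypothetical helper, it assumes a GNU-style `stat -c '%a'` is available on the NAS, and it assumes the home path contains no spaces. It compares against the exact modes listed in step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any path whose mode differs from the modes in step 2.
# ASSUMPTION: `stat -c '%a'` (GNU coreutils style) works on the target system.
check_ssh_perms() {
  local home_dir="${1:-$HOME}"
  local status=0 path want have
  while read -r path want; do
    # A failed stat means the path is missing or unreadable.
    have="$(stat -c '%a' "$path" 2>/dev/null)" || {
      echo "missing: $path"
      status=1
      continue
    }
    if [ "$have" != "$want" ]; then
      echo "fix: chmod $want $path (currently $have)"
      status=1
    fi
  done <<EOF
$home_dir 755
$home_dir/.ssh 700
$home_dir/.ssh/authorized_keys 600
EOF
  return "$status"
}
```

Running `check_ssh_perms /var/services/homes/eblume` on sifaka would print one `fix:` line per mismatched path and return non-zero, or print nothing and return zero when all three modes match.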
+
+### 3. Passwordless Sudo for Docker
+
+Ansible needs `become: true` for Docker commands. Create a sudoers drop-in:
+
+```bash
+sudo vi /etc/sudoers.d/docker-ansible
+```
+
+Contents:
+
+```
+eblume ALL=(ALL) NOPASSWD: /volume1/@appstore/ContainerManager/usr/bin/docker
+```
+
+This grants passwordless sudo only for the Docker binary, with no broader root access.
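
Whether the drop-in from step 3 took effect can be checked without risking an interactive prompt, since `sudo -n` fails instead of asking for a password. The sketch below is illustrative only; `can_sudo_docker` is a hypothetical name, and it uses `docker version` merely as a harmless command to run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the Docker binary runs via sudo
# without a password prompt.
can_sudo_docker() {
  local docker_bin="${1:-/volume1/@appstore/ContainerManager/usr/bin/docker}"
  if [ ! -x "$docker_bin" ]; then
    echo "not found: $docker_bin"
    return 2
  fi
  # -n: non-interactive; sudo exits with an error rather than prompting.
  if sudo -n "$docker_bin" version >/dev/null 2>&1; then
    echo "passwordless sudo OK"
  else
    echo "sudo would prompt for a password (or docker itself failed)"
    return 1
  fi
}
```

Run as `eblume` on sifaka, a result of `passwordless sudo OK` suggests Ansible's `become: true` Docker tasks should not stall on a password. Where `visudo` is available, `visudo -cf /etc/sudoers.d/docker-ansible` can additionally validate the file's syntax.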
+
+### 4. Docker Path
+
+Synology installs Docker via Container Manager at a non-standard path:
+
+```
+/volume1/@appstore/ContainerManager/usr/bin/docker
+```
+
+This is configured in the `sifaka_exporters` role defaults.
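
For ad hoc scripts outside the Ansible role, the non-standard path from step 4 can be handled with a small lookup. This is a sketch with a hypothetical function name, separate from how the `sifaka_exporters` role stores the path: it tries any candidates passed as arguments, then the Container Manager location, then falls back to `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first usable docker binary.
find_docker() {
  local candidate
  for candidate in "$@" /volume1/@appstore/ContainerManager/usr/bin/docker; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Fall back to PATH; exits non-zero if docker is not installed there either.
  command -v docker
}
```

A script could then use `docker_bin="$(find_docker)" || exit 1` and invoke `"$docker_bin"` explicitly, which also keeps the command path identical to the one whitelisted in the sudoers drop-in.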
+
+### 5. Synology Device Naming
+
+Synology uses `/dev/sata*` (e.g., `/dev/sata1` through `/dev/sata4`) instead of the standard `/dev/sd*` naming. The `smartctl_exporter` cannot auto-detect these devices, so they are passed explicitly via `--smartctl.device=` flags in the Ansible role.
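
The explicit device list from step 5 follows a regular pattern, so the flags can be generated rather than typed by hand. This is a minimal sketch with a hypothetical function name; it assumes the bays are numbered contiguously from 1, as in the `/dev/sata1` through `/dev/sata4` example, and it does not reflect how the Ansible role actually builds its arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one --smartctl.device flag per SATA bay.
# ASSUMPTION: devices are /dev/sata1 .. /dev/sataN with no gaps.
smartctl_device_flags() {
  local bays="${1:-4}"
  local i=1
  while [ "$i" -le "$bays" ]; do
    printf '%s%s\n' '--smartctl.device=/dev/sata' "$i"
    i=$((i + 1))
  done
}
```

With the default of four bays this prints four flags, one per line, which can be appended to the exporter's command line.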
 
 ## Tailscale
 