Commit graph

79 commits

Author SHA1 Message Date
6dc698d4ef Fix zot dashboard job label filter
Alloy labels scraped metrics with job="prometheus.scrape.zot",
not just "zot". Updated all dashboard queries to match.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 12:02:27 -08:00
09983b3ae0 Fix Jinja2 escaping in minikube-metrics template
The {{.Host}} Go template syntax was being interpreted by Jinja2.
Added {% raw %} block to escape it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 11:58:38 -08:00
3337392b55 Add Grafana dashboards and metrics collection for zot and minikube
Phase 0 followup:
- Enable zot native metrics endpoint (/metrics)
- Add zot scraping to Alloy config
- Create zot Grafana dashboard (status, requests, latency, memory)
- Create minikube_metrics role (collects cluster health metrics)
- Create minikube Grafana dashboard (status, pod/namespace counts, logs)
- Update indri-services-check with minikube-metrics checks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 11:53:04 -08:00
67d3ba65f3 Add Step 0.13 implementation notes - Phase 0 complete
All Phase 0 steps completed:
- Zot registry with pull-through cache
- Podman container runtime
- Minikube Kubernetes cluster
- Remote kubectl access with 1Password credentials
- Health checks in indri-services-check
- Zettelkasten documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:53:28 -08:00
bb58898731 Add Step 0.12 implementation notes
Zettelkasten documentation created:
- zot.md: Container registry management log
- minikube.md: Kubernetes cluster management log
- Updated main blumeops card with new services, tags, and ports

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:52:39 -08:00
ce3e5fc1e0 Add minikube health checks to indri-services-check
Step 0.11 implementation:
- Check minikube status via SSH to indri
- Check k8s API server accessible from indri
- Check k8s API server accessible remotely from gilbert (via 1Password creds)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:37:12 -08:00
5ba2cf3861 Add fish abbreviations for minikube-indri to plan docs
Step 0.12 update:
- Document ki, k9i, k9 abbreviations for quick kubectl/k9s access
- These avoid accidentally triggering work SAML flow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:30:13 -08:00
e032e27b66 Configure remote kubectl access with 1Password credentials
Step 0.10 implementation:
- Recreate minikube with --apiserver-names=indri --listen-address=0.0.0.0
- Add kubectl-credential-1password exec plugin for 1Password integration
- Client certs fetched from 1Password on-demand (no private keys on disk)
- CA cert stored locally (not secret - public key for server verification)

Minikube role updates:
- Add minikube_apiserver_names and minikube_listen_address variables
- Update tasks to include remote access flags

This mirrors the 1Password SSH agent pattern - biometric auth required
for each kubectl command that needs credentials.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 10:12:58 -08:00
9950c8207f Update plan with Step 0.10 and 0.12 implementation details
Step 0.10 (kubeconfig on gilbert):
- Document research on kubectl remote access options
- Choose --apiserver-names + --listen-address approach
- Add references to sources

Step 0.12 (zettelkasten):
- Add instructions to update main blumeops card
- Fix zot port from 5000 to 5050
- Add minikube.md template with remote access docs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 09:10:53 -08:00
2c90dc01a4 Add minikube role for Kubernetes cluster on indri
- Create ansible/roles/minikube for minikube cluster setup
- Use podman driver with cri-o runtime
- Set memory to 7800MB (vs 8192 podman) to account for VM overhead
- Add minikube role to indri playbook
- Update k8s-migration plan with implementation details

Deployed with Kubernetes v1.34.0 and CRI-O 1.24.6.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 22:39:38 -08:00
55f0335a1e Add podman role with known issue documentation
- Create ansible/roles/podman for podman machine setup on indri
- Document known reliability issue with podman machine init/start via SSH
  (race condition from containers/podman#16945)
- Role attempts init/start but doesn't fail if start hangs
- Workaround: manual init/start on indri if needed
- Update k8s-migration plan with implementation details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 22:23:45 -08:00
8d96f91365 Add Step 0.7 implementation details to plan
Note use of Tailscale service URL for health check.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:18:08 -08:00
8c7a746227 Add zot checks to indri-services-check
- zot LaunchAgent running
- zot-metrics LaunchAgent running
- Zot registry HTTP endpoint (via Tailscale service URL)
- Zot metrics file exists

Also updated Grafana, Kiwix, Forgejo, Devpi to use Tailscale
service URLs for consistency with Miniflux and Zot.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:17:58 -08:00
8c01181e55 Add zot log collection to Alloy
Collects stdout and stderr logs from zot to Loki.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:13:30 -08:00
7d673cbee9 Add zot_metrics role for Prometheus monitoring
Collects zot_up metric via textfile collector every 60 seconds.
Additional metrics can be added later if zot exposes them.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:09:20 -08:00
e0e8c5449b Add podman to Brewfile for container operations on gilbert
Required for testing zot registry push from workstation.
Podman uses a Linux VM under the hood on macOS.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 21:00:55 -08:00
1ee2863fc7 Fix zot port and sync config, update plan with implementation details
- Change zot port from 5000 to 5050 (macOS ControlCenter uses 5000)
- Fix sync config: use destination for namespacing, prefix ** for matching
- Update tailscale_serve to use port 5050
- Add zot role to main playbook
- Document implementation details in plan

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:44:45 -08:00
b62b10a60f Add registry to tailscale_serve configuration
Exposes zot registry (localhost:5000) as registry.tail8d86e.ts.net

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:31:28 -08:00
2bba28fc30 Add zot container registry ansible role
Phase 0: Creates zot role with:
- Config for pull-through cache (Docker Hub, GHCR, Quay)
- mcquack LaunchAgent for service management
- Sync registries configured for on-demand caching

Binary is built from source at ~/code/3rd/zot (not homebrew).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:24:58 -08:00
8defa0ef6e Apply tag:registry to indri via Pulumi
- Add tag:registry to indri DeviceTags in __main__.py
- Update plan with implementation details noting the tag is
  managed via Pulumi, not manually in admin console

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:16:06 -08:00
83956afe92 Add tag:registry for Zot container registry
Phase 0 of k8s migration: Add registry tag to ACLs.
- Admins get full access via wildcard grant
- Members denied access (infrastructure only)
- Enables tailscale serve for registry.tail8d86e.ts.net

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 20:10:36 -08:00
ee196b0c10 Fix Phase 0 plan based on review feedback (#25)
## Summary
- Step 0.3: Use launchctl unload/load pattern for handlers (consistent with existing handlers)
- Step 0.6: Correct file path - add zot logs to alloy defaults/main.yml
- Step 0.9: Use cri-o runtime instead of containerd
- Step 0.10: Simplify kubeconfig instructions - focus on goal not implementation

## Deployment and Testing
- [x] Documentation-only change, no deployment needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/25
2026-01-17 20:07:10 -08:00
c8433467c1 Add Kubernetes migration plan documentation (#24)
## Summary
- Comprehensive phased plan for migrating blumeops services to minikube
- Technical decisions documented: Zot registry, Podman driver, CloudNativePG, Tailscale Operator
- 9 migration phases with verification and rollback procedures
- LaunchAgent absolute path requirements documented
- Observability requirements (zk docs, logging, metrics, dashboards) for new services

## Deployment and Testing
- [x] Plan document created at `docs/k8s-migration.md`
- [ ] Review plan phases for completeness
- [ ] Validate technical decisions align with requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/24
2026-01-17 17:34:53 -08:00
e6d302b40b Harden Tailscale ACL policy with least-privilege grants (#23)
## Summary
- Replace permissive wildcard ACL (`*` -> `*`) with specific service grants
- Admin: full access to all services including NAS
- Member: user-facing services only (no Grafana/Loki/NAS)
- Add device tagging for gilbert (workstation) and sifaka (NAS) via Pulumi
- SSH hardening: remove root access, use "check" action with MFA
- Add ACL tests to validate policy behavior

## Deployment and Testing
- [x] Pulumi preview passes
- [x] HuJSON syntax validated
- [x] ACL tests defined and passing
- [ ] Deploy with `mise run tailnet-up`
- [ ] Verify SSH access from gilbert to indri
- [ ] Verify Allison cannot access Grafana/Loki/NAS

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/23
2026-01-17 11:58:04 -08:00
0918764e93 Rename Node Exporter dashboard to macOS (#22)
## Summary
- Renamed dashboard from "Node Exporter - macOS" to just "macOS" since it now uses Alloy
- Updated filename, title, uid, and tags to reflect the change

## Deployment and Testing
- [ ] Deploy with `mise run provision-indri -- --tags grafana`
- [ ] Verify dashboard accessible at https://grafana.tail8d86e.ts.net

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/22
2026-01-17 09:29:19 -08:00
3962e5a7de Fix borgmatic PostgreSQL backup and update backup sources (#21)
## Summary
- Fix PostgreSQL backup failure by adding explicit `pg_dump_command` path (was failing with "pg_dump: command not found" in LaunchAgent)
- Remove `~/code/3rd/kiwix-tools` from backups (was just symlinks to ZIM archives in transmission)
- Enable Loki log backup by removing from exclude_patterns

## Deployment and Testing
- [x] Dry run with `--check --diff` shows expected changes
- [ ] Deploy with `mise run provision-indri -- --tags borgmatic`
- [ ] Verify config deployed: `ssh indri 'cat ~/.config/borgmatic/config.yaml'`
- [ ] Run manual backup to test: `ssh indri 'mise x -- borgmatic create --verbosity 1'`
- [ ] Verify PostgreSQL dump succeeds (no "pg_dump: command not found" error)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/21
2026-01-17 09:22:01 -08:00
75426be1dc Remove ansible role meta dependencies to fix duplicate execution (#20)
## Summary
- Remove all `meta/main.yml` dependencies from ansible roles
- Role ordering is now controlled entirely by `indri.yml` playbook
- Fix incorrect roles path in CLAUDE.md (`playbooks/roles` → `roles`)

## Why
Ansible's tag accumulation behavior prevents proper role deduplication when using meta dependencies. When a role is pulled in as a dependency, the parent role's tags are added to the dependency's tags (e.g., `[loki]` becomes `[alloy, loki]`), making them appear as different invocations to Ansible and causing roles to run multiple times.

## Deployment and Testing
- [x] Verified with `ansible-playbook --list-tasks` that each role now appears exactly once
- [x] Run full provision to verify no regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/20
2026-01-16 22:50:34 -08:00
9931829d03 Add pre-commit hooks for code quality (#19)
## Summary
- Add pre-commit framework with hooks for YAML, Ansible, Python, shell, TOML, JSON, and secret detection
- Fix all 91+ ansible-lint violations (variable naming, handler capitalization, changed_when)
- Fix shellcheck warnings in mise-tasks scripts
- Document pre-commit setup in README.md

## Deployment and Testing
- [x] All pre-commit hooks pass (`uvx pre-commit run --all-files`)
- [x] Test ansible playbook with `--check` mode
- [x] Run `mise run indri-services-check` after deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/19
2026-01-16 19:33:02 -08:00
78f14f8bde Reworded CLAUDE.md 2026-01-16 18:47:47 -08:00
d3d3041b27 Decouple ZIM/torrent ansible tasks for faster provisioning (#18)
## Summary
- Simplify kiwix role from 213 lines to 151 lines (-30%)
- Replace per-archive torrent status loops with single shell command
- Decouple kiwix startup from declared inventory - now serves whatever completed ZIM files exist
- Fix tailscale_serve role to handle empty JSON in check mode

## Performance improvement
- **Before**: ~132 operations (44 archives × 3 loops for status check, recheck, symlink)
- **After**: ~5 operations (1 shell script + 1 find + conditional symlinks)
- Expected reduction: ~3 minutes per ansible run

## Test plan
- [x] Ran `mise run provision-indri -- --check --diff` to preview changes
- [x] Ran `mise run provision-indri` to apply changes
- [x] Ran `mise run indri-services-check` - all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/18
2026-01-16 15:14:00 -08:00
812b78bf61 Use explicit PostgreSQL superuser name and fix check mode (#17)
## Summary
- Add `postgresql_superuser` variable (`eblume`) to prevent PostgreSQL from inheriting OS username during initdb
- Update all psql/createdb commands to use explicit `-U` flag
- Add `check_mode: false` to op commands so 1Password fetches run during `--check` mode
- Add PostgreSQL and Miniflux health checks to indri-services-check

## Test plan
- [x] Renamed existing superuser from `erichblume` to `eblume`
- [x] Ran `mise run provision-indri -- --tags postgresql --check --diff` successfully
- [x] Verified connection as `eblume` superuser via Tailscale
- [x] Ran `mise run indri-services-check` - all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/17
2026-01-16 14:41:36 -08:00
adf6f4fbe9 Add PostgreSQL and Miniflux services to tailnet (#16)
## Summary
- Add PostgreSQL 18 as a new service at `pg.tail8d86e.ts.net:5432`
- Add Miniflux RSS/Atom feed reader at `feed.tail8d86e.ts.net`
- Both services managed via homebrew/brew services
- Pulumi ACL tags added (tag:pg, tag:feed)
- Alloy log collection configured for both services
- Zettelkasten documentation updated

## Manual Setup Required

Before running ansible, the following steps are needed on indri:

### 1. Apply Pulumi tags
```bash
mise run tailnet-up
```
Then apply tags to indri in Tailscale admin console.

### 2. Create 1Password entries
- miniflux PostgreSQL user password
- miniflux admin password (for first run)

### 3. Set PostgreSQL user password (after ansible installs postgres)
```bash
ssh indri '/opt/homebrew/opt/postgresql@18/bin/psql -c "ALTER USER miniflux PASSWORD '\''your-password'\'';"'
```

### 4. Create password files on indri
```bash
ssh indri 'echo "your-db-password" > ~/.miniflux-db-password && chmod 600 ~/.miniflux-db-password'
ssh indri 'echo "your-admin-password" > ~/.miniflux-admin-password && chmod 600 ~/.miniflux-admin-password'
```

### 5. Create ~/.pgpass for borgmatic
```bash
ssh indri 'echo "localhost:5432:miniflux:miniflux:YOUR_PASSWORD" > ~/.pgpass && chmod 600 ~/.pgpass'
```

### 6. Run ansible with first-run admin creation
```bash
mise run provision-indri -- -e miniflux_create_admin=1
```

### 7. Update borgmatic config
Add to `~/.config/borgmatic/config.yaml` on indri:
```yaml
postgresql_databases:
    - name: miniflux
      hostname: localhost
      port: 5432
      username: miniflux
```

### 8. Cleanup after first run
```bash
ssh indri 'rm ~/.miniflux-admin-password'
```

## Test plan
- [ ] Run `mise run tailnet-up` and verify Pulumi changes
- [ ] Apply tags to indri in Tailscale admin
- [ ] Run `mise run provision-indri -- --check --diff` for dry run
- [ ] Run `mise run provision-indri -- -e miniflux_create_admin=1`
- [ ] Approve services in Tailscale admin
- [ ] Verify PostgreSQL: `ssh indri '/opt/homebrew/opt/postgresql@18/bin/pg_isready'`
- [ ] Verify Miniflux: `curl https://feed.tail8d86e.ts.net/healthcheck`
- [ ] Run `mise run indri-services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/16
2026-01-16 12:30:20 -08:00
3f4e40f3ae Add Pulumi for tailnet IaC management (#15)
## Summary
- Manage tail8d86e.ts.net ACLs, tags, and DNS via Pulumi + Python
- State stored in Pulumi Cloud (free tier) to avoid circular dependency
- OAuth authentication via 1Password for secure credential management
- New mise tasks: `tailnet-preview`, `tailnet-up`

## Architecture
Two-layer approach:
- **Layer 1 (Pulumi)**: Tailnet-wide config (ACLs, tags, DNS)
- **Layer 2 (Ansible)**: Node-local `tailscale serve` config (unchanged)

## Test plan
- [x] Exported current ACL from Tailscale API
- [x] Imported existing ACL into Pulumi state
- [x] Verified `mise run tailnet-preview` shows no changes
- [x] Verified `mise run tailnet-up` applies successfully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/15
2026-01-15 20:55:25 -08:00
72c2dd7096 Add blumeops-tasks mise task for Todoist integration (#14)
## Summary
- Add `mise run blumeops-tasks` to fetch and display tasks from Todoist
- Uses uv run script with inline dependencies (httpx, rich)
- Fetches API credential securely via 1Password CLI
- Sorts tasks by custom priority order: p1, p2, p4, p3 (backlog last)
- Documents the task discovery workflow in CLAUDE.md

## Test plan
- [x] Verified `mise run blumeops-tasks` fetches and displays tasks correctly
- [x] Confirmed priority sorting works as expected

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/14
2026-01-15 18:03:19 -08:00
ae1513e7e9 Add Plex Media Server observability (#13)
## Summary
- Add `plex_metrics` ansible role with textfile collector for Prometheus metrics
- Add Plex log collection to Alloy (forwards to Loki)
- Add Grafana dashboard for Plex monitoring (status, library counts, sessions, transcoding, logs)

## Metrics Collected
- `plex_up` - server health
- `plex_version_info` - server version
- `plex_sessions_total/playing/paused` - active sessions
- `plex_transcode_sessions_total/video/audio` - transcoding status
- `plex_library_items{library,type}` - library item counts

## Prerequisites
Plex token must be stored at `~/.plex-token` on indri (already done).

## Test plan
- [x] Dry-run passed (`mise run provision-indri -- --check --diff`)
- [ ] Apply changes (`mise run provision-indri`)
- [ ] Verify metrics: `ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/plex.prom'`
- [ ] Verify logs in Grafana Explore: `{service="plex"}`
- [ ] Check Plex dashboard in Grafana

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/13
2026-01-15 15:27:59 -08:00
2a1359a3b6 Fix ansible handler timeouts for alloy and loki restarts (#12)
## Summary
- Use async with poll: 0 for alloy and loki restart handlers
- Fire-and-forget approach prevents ansible from hanging on graceful shutdown

## Test plan
- [x] Manually verified `brew services restart grafana-alloy` works
- [x] Run full ansible playbook and verify it completes without timeout

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/12
2026-01-15 13:56:11 -08:00
ba5cd75ee2 Fix ansible handler timeouts for alloy and loki restarts
Use async with poll: 0 to fire-and-forget service restarts.
These services have graceful shutdown periods that can exceed
ansible's default command timeout.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 12:39:28 -08:00
242c1880de Add Grafana Alloy and Loki for unified observability (#11)
## Summary
- Add Grafana Alloy to replace node_exporter for metrics collection
- Add Loki for log aggregation and storage
- Configure Alloy to collect logs from all services (grafana, forgejo, prometheus, tailscale, transmission, devpi, kiwix, borgmatic)
- Update Prometheus to accept metrics via remote_write
- Add Loki datasource to Grafana

## Test plan
- [ ] Run \`mise run provision-indri -- --check --diff\` to verify changes
- [ ] Apply with \`mise run provision-indri\`
- [ ] Verify services: \`mise run indri-services-check\`
- [ ] Check Grafana Explore with Loki datasource
- [ ] Query logs: \`{service="grafana"}\`
- [ ] Verify metrics still flowing to Prometheus dashboards

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/11
2026-01-15 12:24:13 -08:00
070f26dc6d Add zk-docs mise task for zettelkasten documentation (#10)
## Summary
- Add `mise run zk-docs` task to concatenate all blumeops-tagged zettelkasten cards
- Main project card is shown first, followed by service management logs
- Uses `bat` for output (added to Brewfile)
- Args are passed through to bat for custom formatting
- Update CLAUDE.md to use zk-docs command with plain output options
- Update README.md to note zettelkasten is private with contact email

## Test plan
- [x] `mise run zk-docs` displays all 6 blumeops cards
- [x] `mise run zk-docs -- --style=header --color=never --decorations=always` shows filenames without decoration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/10
2026-01-15 11:25:02 -08:00
2e326eb30d Critical security note for claude 2026-01-15 09:02:27 -08:00
c660674891 Remove settings.local.json from repo and add to gitignore
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 08:59:24 -08:00
d8a0ef6482 Add devpi PyPI caching proxy role for indri (#9)
## Summary
- Add ansible role for devpi-server as a transparent PyPI caching proxy
- LaunchAgent with KeepAlive runs via `mise x -- devpi-server`
- Listens on port 3141, data stored in `~/devpi`
- Health checks added to `indri-services-check` script

## Manual Setup Required (on indri, before provisioning)
1. Add to `~/.config/mise/config.toml`:
   ```toml
   [tools]
   "pipx:devpi-server" = "latest"
   "pipx:devpi-web" = "latest"
   "pipx:devpi-client" = "latest"
   ```
2. Run `mise install`
3. Initialize: `mise x -- devpi-init --serverdir ~/devpi`

## Post-Provisioning
- Set up Tailscale service `pypi` on port 443 → 3141
- Configure client pip.conf with index-url

## Test plan
- [x] Ansible syntax check passes
- [x] Dry-run: `mise run provision-indri -- --check --diff`
- [x] Apply: `mise run provision-indri`
- [x] Health check: `mise run indri-services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/9
2026-01-15 08:31:09 -08:00
50c713b5de Add macOS-compatible Node Exporter Grafana dashboard (#8)
## Summary
- Adds a new Grafana dashboard for Node Exporter metrics on macOS hosts
- Uses macOS-native memory metrics (node_memory_total_bytes, node_memory_active_bytes, etc.) instead of Linux-specific ones
- Includes dropdown selectors for instance, disk, and network device filtering

## Details
The standard Node Exporter dashboards show "No Data" for memory panels on macOS because they query Linux-specific metrics like `node_memory_MemTotal_bytes`. macOS node_exporter exports different metrics:

| Linux | macOS |
|-------|-------|
| node_memory_MemTotal_bytes | node_memory_total_bytes |
| node_memory_MemFree_bytes | node_memory_free_bytes |
| node_memory_Buffers_bytes | (not available) |
| node_memory_Cached_bytes | (not available) |

macOS has unique memory categories: Wired, Active, Compressed, Inactive, Free.

## Test plan
- [x] Dashboard deployed to indri via ansible
- [x] All panels showing data for indri
- [x] Instance selector works to switch between hosts
- [x] Disk and network device filters work

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/8
2026-01-14 20:53:57 -08:00
d9be8c27bc Add 32 devdocs ZIM archives for programming documentation (#7)
## Summary
- Adds offline documentation for: bash, c, click, cmake, cpp, css, django-rest-framework, django, docker, duckdb, fish, gcc, git, go, godot, hammerspoon, homebrew, javascript, kubectl, kubernetes, latex, lua, markdown, nginx, nix, postgresql, python, redis, sqlite, typescript, werkzeug, zig
- All January 2026 versions from download.kiwix.org/zim/devdocs/
- Downloads via BitTorrent through transmission

## Test plan
- [x] Deployed to indri via `mise run provision-indri`
- [x] All 32 torrents added and downloaded (small files, completed instantly)
- [x] 43 ZIM files now available in kiwix directory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/7
2026-01-14 18:28:34 -08:00
10012a4cf2 Add upload/download ratio and period transfer panels to Transmission dashboard (#6)
## Summary
- Adds Upload/Download Ratio stat panel with color thresholds (red < 0.5, yellow < 1, green >= 1)
- Adds Downloaded (Period) stat panel showing bytes downloaded in selected time range
- Adds Uploaded (Period) stat panel showing bytes uploaded in selected time range

Uses PromQL `increase()` on existing counter metrics - no new metrics collection needed.

## Test plan
- [x] Deployed to indri via `mise run provision-indri`
- [x] Grafana restarted successfully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/6
2026-01-14 18:08:39 -08:00
ba03af15eb Set MISE_TASK_OUTPUT=interleave in provision-indri
Shows ansible output in real-time instead of buffered.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:15:11 -08:00
2f28b151f5 Fix launchctl idempotency in kiwix and borgmatic roles
Check if LaunchAgent is already loaded before attempting to load it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:14:52 -08:00
e534e59556 Add provision-indri mise task and fix idempotency
- Add mise-tasks/provision-indri script to run ansible playbook
- Fix transmission_metrics launchctl load to be idempotent
- Update CLAUDE.md to reference mise run provision-indri

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:10:30 -08:00
e264b39cd6 Add total torrent size metric and dashboard panel
- Query torrent-get RPC to sum totalSize of all torrents
- Add transmission_torrents_size_bytes gauge metric
- Add "Total Torrent Size" timeseries panel to dashboard

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:00:52 -08:00
d18c3a6f3c Add node_exporter and transmission-metrics to service checks
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 13:57:08 -08:00