Commit graph

52 commits

Author SHA1 Message Date
9fac4439b1 Migrate minikube ansible role from qemu2 to docker driver
- Change driver from qemu2 to docker
- Remove socket_vmnet and qemu dependencies
- Remove NFS mount and minikube mount LaunchAgent/LaunchDaemon
- Remove old podman zot-mirror.conf
- Update containerd registry mirror config for docker driver
  - Uses host.minikube.internal:5050 to reach zot
  - Configures pull-through cache for docker.io, ghcr.io, quay.io
- Add dynamic tailscale serve configuration for k8s API
  (port is dynamic with docker driver, not fixed at 6443)
- Remove svc:k8s from tailscale_serve defaults (minikube role handles it)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 13:52:52 -08:00
2c28a3fc54 Update tailscale_serve for qemu2 API server address
The k8s API server is now at 192.168.105.2:6443 (inside qemu2 VM)
instead of localhost:44491 (old podman port mapping).

Note: TCP passthrough via tailscale svc:k8s is configured but
connection times out - may need admin console approval or debugging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 11:45:31 -08:00
b096df4c71 Fix ansible idempotency and document macOS network permission
- Check containerd registry config before writing to avoid unnecessary changes
- Fix ansible_env deprecation warnings (use ansible_facts['env'])
- Document macOS network permission popup for minikube mount
- Document passwordless sudo configuration for indri
- Add checks to skip sudo tasks when state already matches

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 11:24:44 -08:00
40376b635f Add LaunchDaemon/LaunchAgent for persistent NFS and minikube mounts
- LaunchDaemon: mounts sifaka:/volume1/torrents to /Volumes/torrents-nfs at boot
- LaunchAgent: runs minikube mount to pass through to /mnt/torrents in VM
- Handlers to load both services when plist files change

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 08:22:53 -08:00
26ec02e1be P5.1: Add VM config to ansible role, mark phase complete
- Add hosts file entry for registry.tail8d86e.ts.net in VM
- Configure containerd registry mirror to use local zot
- Update P5.1 doc with implementation notes and manual steps
- Mark P5.1 as complete

Manual steps still required after cluster creation:
1. sudo brew services start socket_vmnet (once per reboot)
2. sudo mount -t nfs sifaka:/volume1/torrents /Volumes/torrents-nfs
3. minikube mount /Volumes/torrents-nfs:/mnt/torrents (GUI session)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 08:03:21 -08:00
4b2c1a346f Add socket_vmnet for proper qemu2 networking
- Install socket_vmnet via homebrew
- Start socket_vmnet service (requires sudo)
- Add --network=socket_vmnet to minikube start

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 21:41:47 -08:00
0474962e89 Increase minikube resources to 6 CPUs and 12GB RAM
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 21:20:04 -08:00
919f926241 P5.1: Update minikube role for QEMU2 driver
- Change minikube driver from podman to qemu2
- Change container runtime from cri-o to containerd
- Add qemu installation to minikube role
- Remove podman role from indri.yml playbook
- Update handlers for containerd instead of cri-o
- Temporarily disable registry mirror config (needs containerd format)
- Add k8s-storage synology user creation steps to P5.1 doc
- Add post-migration tasks for zot registry mirror reconfiguration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 21:06:53 -08:00
0439fbb704 P5: Migrate devpi to Kubernetes (#34)
## Summary
- Migrate devpi PyPI caching proxy from indri LaunchAgent to Kubernetes
- Custom container image with devpi-server + devpi-web + auto-init
- StatefulSet with 50Gi PVC, Tailscale Ingress at pypi.tail8d86e.ts.net
- Remove devpi from ansible playbooks and update CLAUDE.md with k8s workflow

## Key Changes
- Add CRI-O registry mirror config for registry.tail8d86e.ts.net
- Change ArgoCD apps to manual sync (was auto-sync causing issues)
- 2Gi memory limit for Whoosh indexer (reclaimed after startup)

## Deployment and Testing
- [x] devpi pod healthy in k8s
- [x] pip install through proxy works
- [x] mcquack 1.0.0 uploaded and installable
- [x] Old devpi stopped on indri

## Post-Merge
Reset ArgoCD to main:
```
argocd app set apps --revision main && argocd app sync apps
argocd app set devpi --revision main && argocd app sync devpi
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/34
2026-01-20 14:55:37 -08:00
735b643429 P4: Miniflux migration + PostgreSQL consolidation (#33)
## Summary
- Deploy miniflux in k8s via ArgoCD
- Expose via Tailscale Ingress at feed.tail8d86e.ts.net
- Retire brew PostgreSQL (no longer needed)
- Rename k8s-pg to pg (canonical hostname)
- Remove ansible miniflux and postgresql roles
- Update borgmatic to backup pg.tail8d86e.ts.net
- Update all zk documentation

## Deployment and Testing
- [x] Miniflux pod running in k8s
- [x] User login works at https://feed.tail8d86e.ts.net
- [x] Feeds and entries visible
- [x] brew miniflux and postgresql stopped
- [x] Tailscale services migrated (feed, pg)
- [x] zk documentation updated
- [x] Run ansible to apply role removals
- [ ] Verify borgmatic backup with new pg hostname

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/33
2026-01-20 09:04:47 -08:00
eb952aae01 P3: PostgreSQL disaster recovery test and borgmatic k8s-pg backup (#32)
## Summary
- Fixed borgmatic `borg: command not found` by adding `local_path` config option
- Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg
- Added borgmatic user to k8s-pg via CloudNativePG managed roles
- Configured borgmatic to backup both localhost and k8s-pg PostgreSQL databases
- Added Tailscale ACL grant for `tag:homelab` → `tag:k8s` on port 5432
- Disabled selfHeal on apps app to allow manual revision changes during development

## Changes
- `ansible/roles/borgmatic/` - Added `local_path` and k8s-pg database entry
- `ansible/roles/postgresql/tasks/main.yml` - Added k8s-pg to `.pgpass`
- `argocd/apps/apps.yaml` - Disabled selfHeal
- `argocd/manifests/databases/blumeops-pg.yaml` - Added borgmatic managed role
- `argocd/manifests/databases/secret-borgmatic.yaml.tpl` - New secret template
- `pulumi/policy.hujson` - Added ACL grant for backup access

## Deployment and Testing
- [x] Borgmatic backup runs successfully
- [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries verified)
- [x] borgmatic user created in k8s-pg with pg_read_all_data role
- [x] Both localhost and k8s-pg databases in backup archive
- [x] zk documentation updated (borgmatic.md, postgresql.md)
- [ ] After merge: set blumeops-pg app back to main revision

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/32
2026-01-19 18:00:32 -08:00
f2541c3f77 Fix minikube role idempotency for zot mirror config (#31)
## Summary
- Fixed trailing newline mismatch in config comparison (ansible command module strips whitespace, slurp preserves it)
- Only copy temp file when config actually needs updating (avoids spurious changes)
- Task now properly skips when config is already correct

## Deployment and Testing
- [x] Verified idempotency: `changed=0` on repeated runs
- [x] Verified change detection: corrupted config triggers proper update
- [x] ansible-lint passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/31
2026-01-19 16:19:52 -08:00
130c044523 Fix hanging minikube provision 2026-01-19 15:49:11 -08:00
7e6742ad24 K8s Migration Phase 2: Grafana to Kubernetes (#30)
## Summary
- Migrate Grafana from Homebrew/Ansible to Kubernetes deployment
- Switch CloudNativePG to use forge-mirrored Helm chart (HTTPS, no auth needed)
- Add Grafana Helm chart deployment via ArgoCD with multi-source pattern
- Add Grafana config (Tailscale Ingress, 9 dashboard ConfigMaps)
- Update Loki to bind 0.0.0.0 for k8s pod access via `host.containers.internal`

## Key Changes
- `argocd/apps/grafana.yaml` - Grafana Helm chart Application
- `argocd/apps/grafana-config.yaml` - Ingress + dashboard ConfigMaps
- `argocd/apps/cloudnative-pg.yaml` - Now uses forge mirror instead of external Helm repo
- `ansible/roles/loki/templates/loki-config.yaml.j2` - Bind 0.0.0.0

## Deployment and Testing
- [x] Deploy Loki config change: `mise run provision-indri -- --tags loki`
- [x] Create namespace: `ki create namespace monitoring`
- [x] Create secret: `op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | ki apply -f -`
- [x] Sync ArgoCD apps (grafana, grafana-config)
- [x] Verify Grafana works at https://grafana.tail8d86e.ts.net
- [x] Remove svc:grafana from ansible tailscale_serve
- [x] Stop brew grafana: `ssh indri 'brew services stop grafana'`
- [x] Delete ansible grafana role

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/30
2026-01-19 14:40:25 -08:00
a8f4d00294 K8s Migration Phase 1: Infrastructure Setup (#29)
## Summary
- Split k8s migration plan into phases folder for easier navigation
- Added `tag:k8s` to Pulumi ACLs for Kubernetes workloads
- Phase 1 work in progress

## Phase 1 Goals
- Tailscale Kubernetes Operator
- CloudNativePG Operator
- PostgreSQL cluster for future app migrations

## Deployment and Testing
- [ ] Review Phase 1 plan
- [ ] `mise run tailnet-preview` to verify ACL changes
- [ ] `mise run tailnet-up` to apply ACL changes
- [ ] Create Tailscale OAuth client (manual)
- [ ] Deploy operators and PostgreSQL cluster

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/29
2026-01-19 09:49:52 -08:00
61dced048b Fix borgmatic-metrics script PATH issue (#28)
## Summary
- Fixed borgmatic-metrics script failing in LaunchAgent context
- Changed from `mise x -- borg` to absolute paths (`/opt/homebrew/bin/borg`, `/opt/homebrew/bin/jq`)
- This fixes the Grafana dashboard showing "DOWN" for Repository Status and missing time series data

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags borgmatic-metrics` to deploy the fix
- [ ] Wait for the hourly metrics collection (or manually run `ssh indri '~/bin/borgmatic-metrics'`)
- [ ] Verify Grafana dashboard shows "UP" status and populated graphs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/28
2026-01-18 14:57:35 -08:00
3679124ebd Expose Kubernetes API as Tailscale service (Step 0.14) (#27)
## Summary
- Add `tag:k8s-api` to Pulumi ACLs and indri device tags
- Configure Tailscale serve with TCP passthrough for k8s API at `k8s.tail8d86e.ts.net`
- Update minikube role to include `k8s.tail8d86e.ts.net` in certificate SANs
- Add `apiserver_port` config option (internal port 6443, dynamic host port with podman driver)
- Document Step 0.14 in k8s-migration plan (added post-Phase 0 completion)

The Kubernetes API is now accessible at `https://k8s.tail8d86e.ts.net` using TCP passthrough to preserve mTLS authentication.

## Deployment and Testing
- [x] Pulumi ACLs applied
- [x] Tailscale service created and approved in admin console
- [x] Minikube cluster recreated with new cert SANs
- [x] tailscale serve configured with TCP passthrough
- [x] 1Password credentials updated with new certs
- [x] Kubeconfig updated on gilbert
- [x] `mise run indri-services-check` passes
- [x] `kubectl --context=minikube-indri get nodes` works via Tailscale

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/27
2026-01-18 12:49:20 -08:00
19a82373d5 K8s Migration Phase 0: Foundation Infrastructure (#26)
## Summary
- Step 0.1: Update Pulumi ACLs with tag:registry
- Step 0.3: Create Zot registry ansible role with mcquack LaunchAgent
- Step 0.4: Add Zot to Tailscale Serve configuration
- Step 0.5: Create Zot metrics role for Prometheus scraping
- Step 0.6: Add Zot log collection to Alloy
- Step 0.7: Update indri-services-check with zot checks
- Step 0.8: Add podman role for container runtime
- Step 0.9: Add minikube role for Kubernetes cluster
- Step 0.10: Configure remote kubectl access with 1Password credentials

## Remaining Steps
- [ ] Step 0.11: Add minikube to indri-services-check
- [ ] Step 0.12: Create zettelkasten documentation
- [ ] Step 0.13: Verify main playbook (already done - roles added)

## Deployment and Testing
- [x] Zot registry deployed and accessible at https://registry.tail8d86e.ts.net
- [x] Podman machine running on indri
- [x] Minikube cluster running on indri
- [x] kubectl access from gilbert working with 1Password credentials
- [ ] indri-services-check passes all checks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/26
2026-01-18 12:06:28 -08:00
0918764e93 Rename Node Exporter dashboard to macOS (#22)
## Summary
- Renamed dashboard from "Node Exporter - macOS" to just "macOS" since it now uses Alloy
- Updated filename, title, uid, and tags to reflect the change

## Deployment and Testing
- [ ] Deploy with `mise run provision-indri -- --tags grafana`
- [ ] Verify dashboard accessible at https://grafana.tail8d86e.ts.net

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/22
2026-01-17 09:29:19 -08:00
3962e5a7de Fix borgmatic PostgreSQL backup and update backup sources (#21)
## Summary
- Fix PostgreSQL backup failure by adding explicit `pg_dump_command` path (was failing with "pg_dump: command not found" in LaunchAgent)
- Remove `~/code/3rd/kiwix-tools` from backups (was just symlinks to ZIM archives in transmission)
- Enable Loki log backup by removing from exclude_patterns

## Deployment and Testing
- [x] Dry run with `--check --diff` shows expected changes
- [ ] Deploy with `mise run provision-indri -- --tags borgmatic`
- [ ] Verify config deployed: `ssh indri 'cat ~/.config/borgmatic/config.yaml'`
- [ ] Run manual backup to test: `ssh indri 'mise x -- borgmatic create --verbosity 1'`
- [ ] Verify PostgreSQL dump succeeds (no "pg_dump: command not found" error)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/21
2026-01-17 09:22:01 -08:00
75426be1dc Remove ansible role meta dependencies to fix duplicate execution (#20)
## Summary
- Remove all `meta/main.yml` dependencies from ansible roles
- Role ordering is now controlled entirely by `indri.yml` playbook
- Fix incorrect roles path in CLAUDE.md (`playbooks/roles` → `roles`)

## Why
Ansible's tag accumulation behavior prevents proper role deduplication when using meta dependencies. When a role is pulled in as a dependency, the parent role's tags are added to the dependency's tags (e.g., `[loki]` becomes `[alloy, loki]`), making them appear as different invocations to Ansible and causing roles to run multiple times.

## Deployment and Testing
- [x] Verified with `ansible-playbook --list-tasks` that each role now appears exactly once
- [x] Run full provision to verify no regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/20
2026-01-16 22:50:34 -08:00
9931829d03 Add pre-commit hooks for code quality (#19)
## Summary
- Add pre-commit framework with hooks for YAML, Ansible, Python, shell, TOML, JSON, and secret detection
- Fix all 91+ ansible-lint violations (variable naming, handler capitalization, changed_when)
- Fix shellcheck warnings in mise-tasks scripts
- Document pre-commit setup in README.md

## Deployment and Testing
- [x] All pre-commit hooks pass (`uvx pre-commit run --all-files`)
- [x] Test ansible playbook with `--check` mode
- [x] Run `mise run indri-services-check` after deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/19
2026-01-16 19:33:02 -08:00
d3d3041b27 Decouple ZIM/torrent ansible tasks for faster provisioning (#18)
## Summary
- Simplify kiwix role from 213 lines to 151 lines (-30%)
- Replace per-archive torrent status loops with single shell command
- Decouple kiwix startup from declared inventory - now serves whatever completed ZIM files exist
- Fix tailscale_serve role to handle empty JSON in check mode

## Performance improvement
- **Before**: ~132 operations (44 archives × 3 loops for status check, recheck, symlink)
- **After**: ~5 operations (1 shell script + 1 find + conditional symlinks)
- Expected reduction: ~3 minutes per ansible run

## Test plan
- [x] Ran `mise run provision-indri -- --check --diff` to preview changes
- [x] Ran `mise run provision-indri` to apply changes
- [x] Ran `mise run indri-services-check` - all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/18
2026-01-16 15:14:00 -08:00
812b78bf61 Use explicit PostgreSQL superuser name and fix check mode (#17)
## Summary
- Add `postgresql_superuser` variable (`eblume`) to prevent PostgreSQL from inheriting OS username during initdb
- Update all psql/createdb commands to use explicit `-U` flag
- Add `check_mode: false` to op commands so 1Password fetches run during `--check` mode
- Add PostgreSQL and Miniflux health checks to indri-services-check

## Test plan
- [x] Renamed existing superuser from `erichblume` to `eblume`
- [x] Ran `mise run provision-indri -- --tags postgresql --check --diff` successfully
- [x] Verified connection as `eblume` superuser via Tailscale
- [x] Ran `mise run indri-services-check` - all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/17
2026-01-16 14:41:36 -08:00
adf6f4fbe9 Add PostgreSQL and Miniflux services to tailnet (#16)
## Summary
- Add PostgreSQL 18 as a new service at `pg.tail8d86e.ts.net:5432`
- Add Miniflux RSS/Atom feed reader at `feed.tail8d86e.ts.net`
- Both services managed via homebrew/brew services
- Pulumi ACL tags added (tag:pg, tag:feed)
- Alloy log collection configured for both services
- Zettelkasten documentation updated

## Manual Setup Required

Before running ansible, the following steps are needed on indri:

### 1. Apply Pulumi tags
```bash
mise run tailnet-up
```
Then apply tags to indri in Tailscale admin console.

### 2. Create 1Password entries
- miniflux PostgreSQL user password
- miniflux admin password (for first run)

### 3. Set PostgreSQL user password (after ansible installs postgres)
```bash
ssh indri '/opt/homebrew/opt/postgresql@18/bin/psql -c "ALTER USER miniflux PASSWORD '\''your-password'\'';"'
```

### 4. Create password files on indri
```bash
ssh indri 'echo "your-db-password" > ~/.miniflux-db-password && chmod 600 ~/.miniflux-db-password'
ssh indri 'echo "your-admin-password" > ~/.miniflux-admin-password && chmod 600 ~/.miniflux-admin-password'
```

### 5. Create ~/.pgpass for borgmatic
```bash
ssh indri 'echo "localhost:5432:miniflux:miniflux:YOUR_PASSWORD" > ~/.pgpass && chmod 600 ~/.pgpass'
```

### 6. Run ansible with first-run admin creation
```bash
mise run provision-indri -- -e miniflux_create_admin=1
```

### 7. Update borgmatic config
Add to `~/.config/borgmatic/config.yaml` on indri:
```yaml
postgresql_databases:
    - name: miniflux
      hostname: localhost
      port: 5432
      username: miniflux
```

### 8. Cleanup after first run
```bash
ssh indri 'rm ~/.miniflux-admin-password'
```

## Test plan
- [ ] Run `mise run tailnet-up` and verify Pulumi changes
- [ ] Apply tags to indri in Tailscale admin
- [ ] Run `mise run provision-indri -- --check --diff` for dry run
- [ ] Run `mise run provision-indri -- -e miniflux_create_admin=1`
- [ ] Approve services in Tailscale admin
- [ ] Verify PostgreSQL: `ssh indri '/opt/homebrew/opt/postgresql@18/bin/pg_isready'`
- [ ] Verify Miniflux: `curl https://feed.tail8d86e.ts.net/healthcheck`
- [ ] Run `mise run indri-services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/16
2026-01-16 12:30:20 -08:00
3f4e40f3ae Add Pulumi for tailnet IaC management (#15)
## Summary
- Manage tail8d86e.ts.net ACLs, tags, and DNS via Pulumi + Python
- State stored in Pulumi Cloud (free tier) to avoid circular dependency
- OAuth authentication via 1Password for secure credential management
- New mise tasks: `tailnet-preview`, `tailnet-up`

## Architecture
Two-layer approach:
- **Layer 1 (Pulumi)**: Tailnet-wide config (ACLs, tags, DNS)
- **Layer 2 (Ansible)**: Node-local `tailscale serve` config (unchanged)

## Test plan
- [x] Exported current ACL from Tailscale API
- [x] Imported existing ACL into Pulumi state
- [x] Verified `mise run tailnet-preview` shows no changes
- [x] Verified `mise run tailnet-up` applies successfully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/15
2026-01-15 20:55:25 -08:00
ae1513e7e9 Add Plex Media Server observability (#13)
## Summary
- Add `plex_metrics` ansible role with textfile collector for Prometheus metrics
- Add Plex log collection to Alloy (forwards to Loki)
- Add Grafana dashboard for Plex monitoring (status, library counts, sessions, transcoding, logs)

## Metrics Collected
- `plex_up` - server health
- `plex_version_info` - server version
- `plex_sessions_total/playing/paused` - active sessions
- `plex_transcode_sessions_total/video/audio` - transcoding status
- `plex_library_items{library,type}` - library item counts

## Prerequisites
Plex token must be stored at `~/.plex-token` on indri (already done).

## Test plan
- [x] Dry-run passed (`mise run provision-indri -- --check --diff`)
- [ ] Apply changes (`mise run provision-indri`)
- [ ] Verify metrics: `ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/plex.prom'`
- [ ] Verify logs in Grafana Explore: `{service="plex"}`
- [ ] Check Plex dashboard in Grafana

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/13
2026-01-15 15:27:59 -08:00
2a1359a3b6 Fix ansible handler timeouts for alloy and loki restarts (#12)
## Summary
- Use async with poll: 0 for alloy and loki restart handlers
- Fire-and-forget approach prevents ansible from hanging on graceful shutdown

## Test plan
- [x] Manually verified `brew services restart grafana-alloy` works
- [x] Run full ansible playbook and verify it completes without timeout

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/12
2026-01-15 13:56:11 -08:00
ba5cd75ee2 Fix ansible handler timeouts for alloy and loki restarts
Use async with poll: 0 to fire-and-forget service restarts.
These services have graceful shutdown periods that can exceed
ansible's default command timeout.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 12:39:28 -08:00
242c1880de Add Grafana Alloy and Loki for unified observability (#11)
## Summary
- Add Grafana Alloy to replace node_exporter for metrics collection
- Add Loki for log aggregation and storage
- Configure Alloy to collect logs from all services (grafana, forgejo, prometheus, tailscale, transmission, devpi, kiwix, borgmatic)
- Update Prometheus to accept metrics via remote_write
- Add Loki datasource to Grafana

## Test plan
- [ ] Run \`mise run provision-indri -- --check --diff\` to verify changes
- [ ] Apply with \`mise run provision-indri\`
- [ ] Verify services: \`mise run indri-services-check\`
- [ ] Check Grafana Explore with Loki datasource
- [ ] Query logs: \`{service="grafana"}\`
- [ ] Verify metrics still flowing to Prometheus dashboards

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/11
2026-01-15 12:24:13 -08:00
d8a0ef6482 Add devpi PyPI caching proxy role for indri (#9)
## Summary
- Add ansible role for devpi-server as a transparent PyPI caching proxy
- LaunchAgent with KeepAlive runs via `mise x -- devpi-server`
- Listens on port 3141, data stored in `~/devpi`
- Health checks added to `indri-services-check` script

## Manual Setup Required (on indri, before provisioning)
1. Add to `~/.config/mise/config.toml`:
   ```toml
   [tools]
   "pipx:devpi-server" = "latest"
   "pipx:devpi-web" = "latest"
   "pipx:devpi-client" = "latest"
   ```
2. Run `mise install`
3. Initialize: `mise x -- devpi-init --serverdir ~/devpi`

## Post-Provisioning
- Set up Tailscale service `pypi` on port 443 → 3141
- Configure client pip.conf with index-url

## Test plan
- [x] Ansible syntax check passes
- [x] Dry-run: `mise run provision-indri -- --check --diff`
- [x] Apply: `mise run provision-indri`
- [x] Health check: `mise run indri-services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/9
2026-01-15 08:31:09 -08:00
50c713b5de Add macOS-compatible Node Exporter Grafana dashboard (#8)
## Summary
- Adds a new Grafana dashboard for Node Exporter metrics on macOS hosts
- Uses macOS-native memory metrics (node_memory_total_bytes, node_memory_active_bytes, etc.) instead of Linux-specific ones
- Includes dropdown selectors for instance, disk, and network device filtering

## Details
The standard Node Exporter dashboards show "No Data" for memory panels on macOS because they query Linux-specific metrics like `node_memory_MemTotal_bytes`. macOS node_exporter exports different metrics:

| Linux | macOS |
|-------|-------|
| node_memory_MemTotal_bytes | node_memory_total_bytes |
| node_memory_MemFree_bytes | node_memory_free_bytes |
| node_memory_Buffers_bytes | (not available) |
| node_memory_Cached_bytes | (not available) |

macOS has unique memory categories: Wired, Active, Compressed, Inactive, Free.

## Test plan
- [x] Dashboard deployed to indri via ansible
- [x] All panels showing data for indri
- [x] Instance selector works to switch between hosts
- [x] Disk and network device filters work

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/8
2026-01-14 20:53:57 -08:00
d9be8c27bc Add 32 devdocs ZIM archives for programming documentation (#7)
## Summary
- Adds offline documentation for: bash, c, click, cmake, cpp, css, django-rest-framework, django, docker, duckdb, fish, gcc, git, go, godot, hammerspoon, homebrew, javascript, kubectl, kubernetes, latex, lua, markdown, nginx, nix, postgresql, python, redis, sqlite, typescript, werkzeug, zig
- All January 2026 versions from download.kiwix.org/zim/devdocs/
- Downloads via BitTorrent through transmission

## Test plan
- [x] Deployed to indri via `mise run provision-indri`
- [x] All 32 torrents added and downloaded (small files, completed instantly)
- [x] 43 ZIM files now available in kiwix directory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/7
2026-01-14 18:28:34 -08:00
10012a4cf2 Add upload/download ratio and period transfer panels to Transmission dashboard (#6)
## Summary
- Adds Upload/Download Ratio stat panel with color thresholds (red < 0.5, yellow < 1, green >= 1)
- Adds Downloaded (Period) stat panel showing bytes downloaded in selected time range
- Adds Uploaded (Period) stat panel showing bytes uploaded in selected time range

Uses PromQL `increase()` on existing counter metrics - no new metrics collection needed.

## Test plan
- [x] Deployed to indri via `mise run provision-indri`
- [x] Grafana restarted successfully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/6
2026-01-14 18:08:39 -08:00
2f28b151f5 Fix launchctl idempotency in kiwix and borgmatic roles
Check if LaunchAgent is already loaded before attempting to load it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:14:52 -08:00
e534e59556 Add provision-indri mise task and fix idempotency
- Add mise-tasks/provision-indri script to run ansible playbook
- Fix transmission_metrics launchctl load to be idempotent
- Update CLAUDE.md to reference mise run provision-indri

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:10:30 -08:00
e264b39cd6 Add total torrent size metric and dashboard panel
- Query torrent-get RPC to sum totalSize of all torrents
- Add transmission_torrents_size_bytes gauge metric
- Add "Total Torrent Size" timeseries panel to dashboard

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:00:52 -08:00
0afd34590d Fix transmission-metrics session ID parsing
Transmission doesn't support HEAD requests, so use -i flag with sed to
parse only the HTTP headers (stopping at the blank line before body).
Also anchor grep pattern to line start to avoid matching HTML content.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 13:56:08 -08:00
7468023cd2 Add transmission dashboard to grafana
- Add node_exporter ansible role to enable textfile collector
- Add transmission_metrics role with script and LaunchAgent
  - Collects metrics every 60s via transmission RPC
  - Writes to /opt/homebrew/var/node_exporter/textfile/transmission.prom
- Update grafana role to provision dashboards from files
- Add transmission.json dashboard with:
  - Status indicator, torrent counts
  - Transfer speeds, cumulative stats
  - Time series graphs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 13:46:51 -08:00
eb2f5b44cd Fix kiwix plist to only include available ZIM archives
The LaunchAgent plist now dynamically includes only ZIM files that
actually exist in the kiwix directory, rather than all configured
archives. This prevents kiwix-serve from crashing when torrents are
still downloading.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 13:17:55 -08:00
3847c12b42 Fix transmission config to prevent perpetual ansible diffs
- Expand settings.json template to include all transmission defaults
- Use static pre-hashed rpc-password so transmission doesn't regenerate
- Change file mode from 0644 to 0600 to match transmission's default
- Add Jinja comment explaining the RPC password workaround

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 13:03:46 -08:00
4add1684c3 Enable additional ZIM archives for kiwix
New archives (~95G total):
- Project Gutenberg 2023 (72G) - 60,000+ public domain books
- iFixit (3.3G) - Repair guides
- Stack Exchange: SuperUser (3.7G), Math (6.9G)
- LibreTexts: Biology, Chemistry, Engineering, Mathematics, Physics, Humanities

Also:
- Fix transmission to only restart when config changes
- Update CLAUDE.md to use full ansible paths

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 12:47:54 -08:00
6bb5f323d6 Fix transmission config path to use homebrew location
- Homebrew's transmission-cli service uses /opt/homebrew/var/transmission/
  not ~/.config/transmission-daemon/
- Add task to clean up old config directory
- Update zettelkasten with correct paths

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 11:59:24 -08:00
b865d70456 Add transmission role for torrent-based ZIM downloads
- Add new transmission ansible role using homebrew + brew services
- Configure transmission to download to ~/transmission with localhost-only RPC
- Modify kiwix role to use transmission for downloading ZIM archives via BitTorrent
- Add role dependency so running --tags kiwix auto-runs transmission
- Keep fallback to direct HTTP download when kiwix_use_transmission: false
- Symlink completed downloads from transmission dir to kiwix-tools dir

This reduces load on kiwix.org servers and allows downloads to continue
in the background without blocking ansible runs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 11:59:24 -08:00
4b365a1294 Fix borgmatic LaunchAgent to work with mise-installed binaries
The LaunchAgent was failing because launchd runs with a minimal PATH
that doesn't include mise-installed binaries or homebrew. This adds:

- Use `mise x` wrapper to run borgmatic (survives version updates)
- Add /opt/homebrew/bin to PATH for borg dependency
- Add ansible tags to indri playbook for targeted role runs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 10:33:48 -08:00
4283f2237d Add grafana datasource provisioning and update workflow docs
- Configure grafana to use provisioned datasources instead of UI config
- Add prometheus datasource template managed by ansible
- Create minimal grafana.ini with custom provisioning path
- Move ansible_managed to group_vars (fixes deprecation warning)
- Add Remote Hosts and Git Workflow sections to CLAUDE.md
- Document feature branch workflow with tea CLI for PRs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 07:23:10 -08:00
d1396b1cfb Add forgejo role to ansible playbook
Manages installation and service via homebrew. Config at
/opt/homebrew/var/forgejo/custom/conf/app.ini contains secrets
and is not templated - backed up by borgmatic instead.

Includes check that fails with restore instructions if config missing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 23:00:46 -08:00
d761e61809 Add borgmatic role to ansible playbook
Manages scheduled LaunchAgent for daily backups at 2:00 AM.
Borgmatic itself is installed via mise (pipx), not managed by ansible.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 22:50:28 -08:00
6977197c98 Add ZIM archive management to kiwix role
- Configure ZIM archives as a variable list with download URLs
- Auto-download missing archives from download.kiwix.org
- Template plist to serve all configured archives
- Skip checksum calculation on stat for performance
- Add commented options for Gutenberg, iFixit, Stack Exchange, LibreTexts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 22:36:53 -08:00
0d2978259d Add kiwix role to ansible playbook
Manages kiwix-serve LaunchAgent configuration for the wikipedia
mirror on indri (port 5501).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 21:58:12 -08:00