Commit graph

80 commits

Author SHA1 Message Date
b97d461a5a P6: Kiwix and Transmission migration planning (#35)
## Summary
- Detailed planning document for Phase 6 of k8s migration
- Transmission as standalone general-purpose torrent service with web UI at torrent.tail8d86e.ts.net
- NFS storage on sifaka (/volume1/torrents) shared between both services
- Declarative ZIM torrent list in kiwix's ConfigMap, synced to transmission via sidecar
- ZIM watcher CronJob for automatic kiwix restart when new archives complete
- Supports both GitOps (declarative) and interactive (web UI) torrent management

## Architecture Highlights
- **torrent namespace**: Standalone transmission with Tailscale ingress
- **kiwix namespace**: kiwix-serve with torrent-sync sidecar
- **Shared NFS PV**: Single PV referenced by PVCs in both namespaces
- **No backup needed**: Sifaka is RAID 5/6 and already the backup target

## Deployment and Testing
- [ ] Review plan document
- [ ] Verify NFS export on sifaka is feasible
- [ ] Approve architecture decisions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/35
2026-01-20 18:42:11 -08:00
f98103a58d P5 done 2026-01-20 15:04:46 -08:00
0439fbb704 P5: Migrate devpi to Kubernetes (#34)
## Summary
- Migrate devpi PyPI caching proxy from indri LaunchAgent to Kubernetes
- Custom container image with devpi-server + devpi-web + auto-init
- StatefulSet with 50Gi PVC, Tailscale Ingress at pypi.tail8d86e.ts.net
- Remove devpi from ansible playbooks and update CLAUDE.md with k8s workflow

## Key Changes
- Add CRI-O registry mirror config for registry.tail8d86e.ts.net
- Change ArgoCD apps to manual sync (auto-sync was causing issues)
- 2Gi memory limit for Whoosh indexer (reclaimed after startup)

## Deployment and Testing
- [x] devpi pod healthy in k8s
- [x] pip install through proxy works
- [x] mcquack 1.0.0 uploaded and installable
- [x] Old devpi stopped on indri

## Post-Merge
Reset ArgoCD to main:
```
argocd app set apps --revision main && argocd app sync apps
argocd app set devpi --revision main && argocd app sync devpi
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/34
2026-01-20 14:55:37 -08:00
b2307412fc Add P4 implementation notes and mark complete 2026-01-20 09:10:23 -08:00
735b643429 P4: Miniflux migration + PostgreSQL consolidation (#33)
## Summary
- Deploy miniflux in k8s via ArgoCD
- Expose via Tailscale Ingress at feed.tail8d86e.ts.net
- Retire brew PostgreSQL (no longer needed)
- Rename k8s-pg to pg (canonical hostname)
- Remove ansible miniflux and postgresql roles
- Update borgmatic to backup pg.tail8d86e.ts.net
- Update all zk documentation

## Deployment and Testing
- [x] Miniflux pod running in k8s
- [x] User login works at https://feed.tail8d86e.ts.net
- [x] Feeds and entries visible
- [x] brew miniflux and postgresql stopped
- [x] Tailscale services migrated (feed, pg)
- [x] zk documentation updated
- [x] Run ansible to apply role removals
- [ ] Verify borgmatic backup with new pg hostname

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/33
2026-01-20 09:04:47 -08:00
463f476374 P3 done
Updated P3_postgresql.complete.md with full implementation notes including:
- borgmatic borg path fix
- Disaster recovery testing
- CloudNativePG managed roles for borgmatic user
- Dual database backup configuration
- ACL grant for homelab → k8s
- ArgoCD selfHeal disabled for feature branch workflow
- CNPG default values to prevent drift

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 18:19:33 -08:00
e69a3df2d4 P3 done 2026-01-19 18:03:48 -08:00
0c6f0a13c3 Add CNPG default values to prevent ArgoCD drift
CloudNativePG operator fills in connectionLimit, ensure, and inherit
defaults on managed roles. Adding these explicitly keeps ArgoCD in sync.
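
To illustrate why omitted defaults show up as drift, here is a toy sketch in Python. It is not how ArgoCD computes diffs. The default values (`connectionLimit: -1`, `ensure: present`, `inherit: true`) and the `login` field on the example role are assumptions for illustration and should be checked against the CloudNativePG API reference.

```python
# Assumed operator defaults for a managed role (verify against the CNPG docs).
ASSUMED_DEFAULTS = {"connectionLimit": -1, "ensure": "present", "inherit": True}


def drifted_fields(declared: dict) -> list:
    """Return fields present on the live object but missing from the manifest."""
    # The operator fills in any default the manifest did not set.
    live = {**ASSUMED_DEFAULTS, **declared}
    return sorted(key for key in live if key not in declared)


# Hypothetical role definitions, for illustration only.
minimal = {"name": "borgmatic", "login": True}
explicit = {**minimal, **ASSUMED_DEFAULTS}

print(drifted_fields(minimal))   # ['connectionLimit', 'ensure', 'inherit']
print(drifted_fields(explicit))  # []
```

Declaring the defaults explicitly leaves nothing for the operator to add, so the live object matches the manifest.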

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 18:02:42 -08:00
eb952aae01 P3: PostgreSQL disaster recovery test and borgmatic k8s-pg backup (#32)
## Summary
- Fixed borgmatic `borg: command not found` by adding `local_path` config option
- Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg
- Added borgmatic user to k8s-pg via CloudNativePG managed roles
- Configured borgmatic to backup both localhost and k8s-pg PostgreSQL databases
- Added Tailscale ACL grant for `tag:homelab` → `tag:k8s` on port 5432
- Disabled selfHeal on apps app to allow manual revision changes during development

## Changes
- `ansible/roles/borgmatic/` - Added `local_path` and k8s-pg database entry
- `ansible/roles/postgresql/tasks/main.yml` - Added k8s-pg to `.pgpass`
- `argocd/apps/apps.yaml` - Disabled selfHeal
- `argocd/manifests/databases/blumeops-pg.yaml` - Added borgmatic managed role
- `argocd/manifests/databases/secret-borgmatic.yaml.tpl` - New secret template
- `pulumi/policy.hujson` - Added ACL grant for backup access

## Deployment and Testing
- [x] Borgmatic backup runs successfully
- [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries verified)
- [x] borgmatic user created in k8s-pg with pg_read_all_data role
- [x] Both localhost and k8s-pg databases in backup archive
- [x] zk documentation updated (borgmatic.md, postgresql.md)
- [ ] After merge: set blumeops-pg app back to main revision

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/32
2026-01-19 18:00:32 -08:00
f2541c3f77 Fix minikube role idempotency for zot mirror config (#31)
## Summary
- Fixed trailing newline mismatch in config comparison (ansible command module strips whitespace, slurp preserves it)
- Only copy temp file when config actually needs updating (avoids spurious changes)
- Task now properly skips when config is already correct
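
The trailing-newline mismatch can be reproduced outside Ansible. The sketch below is a minimal Python illustration, not the role's actual task. The config text and the function name `config_needs_update` are made up for the example.

```python
def config_needs_update(desired: str, current: str) -> bool:
    """Return True only when the texts differ beyond trailing whitespace."""
    return desired.rstrip() != current.rstrip()


# Command output usually arrives with its trailing newline stripped,
# while a raw file read keeps it.
from_command = '[[registry]]\nlocation = "registry.example"'
from_file = from_command + "\n"

print(from_command != from_file)                     # True: spurious change
print(config_needs_update(from_command, from_file))  # False: correctly skipped
```

Normalizing both sides before comparing is what lets the task report `changed=0` on repeated runs.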

## Deployment and Testing
- [x] Verified idempotency: `changed=0` on repeated runs
- [x] Verified change detection: corrupted config triggers proper update
- [x] ansible-lint passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/31
2026-01-19 16:19:52 -08:00
130c044523 Fix hanging minikube provision 2026-01-19 15:49:11 -08:00
f0c28a3cdd Rename P2 plan to .complete.md 2026-01-19 15:06:27 -08:00
45dfefa8df Mark P2 complete with implementation notes
Documents lessons learned:
- SSH credential template for all forge repos
- Kustomize patches must omit namespace for matching
- Tailscale hostname cutover requires manual admin console deletion
- ArgoCD workflow: all apps target main, manual sync for control

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 15:06:14 -08:00
258c88f2f7 Fix kustomize patch: remove namespace for proper matching
Kustomize matches patches before namespace transformation, so the
patch file shouldn't specify namespace (kustomization.yaml adds it).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 15:00:33 -08:00
623b122f58 Fix kustomization: known_hosts as resource not patch
The argocd-ssh-known-hosts-cm ConfigMap needs to be a resource,
not a patch, because the upstream install.yaml includes it inline
in a way kustomize can't patch.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 14:45:33 -08:00
7e6742ad24 K8s Migration Phase 2: Grafana to Kubernetes (#30)
## Summary
- Migrate Grafana from Homebrew/Ansible to Kubernetes deployment
- Switch CloudNativePG to use forge-mirrored Helm chart (HTTPS, no auth needed)
- Add Grafana Helm chart deployment via ArgoCD with multi-source pattern
- Add Grafana config (Tailscale Ingress, 9 dashboard ConfigMaps)
- Update Loki to bind 0.0.0.0 for k8s pod access via `host.containers.internal`

## Key Changes
- `argocd/apps/grafana.yaml` - Grafana Helm chart Application
- `argocd/apps/grafana-config.yaml` - Ingress + dashboard ConfigMaps
- `argocd/apps/cloudnative-pg.yaml` - Now uses forge mirror instead of external Helm repo
- `ansible/roles/loki/templates/loki-config.yaml.j2` - Bind 0.0.0.0

## Deployment and Testing
- [x] Deploy Loki config change: `mise run provision-indri -- --tags loki`
- [x] Create namespace: `ki create namespace monitoring`
- [x] Create secret: `op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl | ki apply -f -`
- [x] Sync ArgoCD apps (grafana, grafana-config)
- [x] Verify Grafana works at https://grafana.tail8d86e.ts.net
- [x] Remove svc:grafana from ansible tailscale_serve
- [x] Stop brew grafana: `ssh indri 'brew services stop grafana'`
- [x] Delete ansible grafana role

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/30
2026-01-19 14:40:25 -08:00
4c1c4b92e1 Scan full repo history in trufflehog 2026-01-19 10:12:56 -08:00
680ad1095b Rename P1 to complete 2026-01-19 10:03:52 -08:00
a8f4d00294 K8s Migration Phase 1: Infrastructure Setup (#29)
## Summary
- Split k8s migration plan into phases folder for easier navigation
- Added `tag:k8s` to Pulumi ACLs for Kubernetes workloads
- Phase 1 work in progress

## Phase 1 Goals
- Tailscale Kubernetes Operator
- CloudNativePG Operator
- PostgreSQL cluster for future app migrations

## Deployment and Testing
- [ ] Review Phase 1 plan
- [ ] `mise run tailnet-preview` to verify ACL changes
- [ ] `mise run tailnet-up` to apply ACL changes
- [ ] Create Tailscale OAuth client (manual)
- [ ] Deploy operators and PostgreSQL cluster

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/29
2026-01-19 09:49:52 -08:00
61dced048b Fix borgmatic-metrics script PATH issue (#28)
## Summary
- Fixed borgmatic-metrics script failing in LaunchAgent context
- Changed from `mise x -- borg` to absolute paths (`/opt/homebrew/bin/borg`, `/opt/homebrew/bin/jq`)
- This fixes the Grafana dashboard showing "DOWN" for Repository Status and missing time series data

## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags borgmatic-metrics` to deploy the fix
- [ ] Wait for the hourly metrics collection (or manually run `ssh indri '~/bin/borgmatic-metrics'`)
- [ ] Verify Grafana dashboard shows "UP" status and populated graphs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/28
2026-01-18 14:57:35 -08:00
3679124ebd Expose Kubernetes API as Tailscale service (Step 0.14) (#27)
## Summary
- Add `tag:k8s-api` to Pulumi ACLs and indri device tags
- Configure Tailscale serve with TCP passthrough for k8s API at `k8s.tail8d86e.ts.net`
- Update minikube role to include `k8s.tail8d86e.ts.net` in certificate SANs
- Add `apiserver_port` config option (internal port 6443, dynamic host port with podman driver)
- Document Step 0.14 in k8s-migration plan (added post-Phase 0 completion)

The Kubernetes API is now accessible at `https://k8s.tail8d86e.ts.net` using TCP passthrough to preserve mTLS authentication.

## Deployment and Testing
- [x] Pulumi ACLs applied
- [x] Tailscale service created and approved in admin console
- [x] Minikube cluster recreated with new cert SANs
- [x] tailscale serve configured with TCP passthrough
- [x] 1Password credentials updated with new certs
- [x] Kubeconfig updated on gilbert
- [x] `mise run indri-services-check` passes
- [x] `kubectl --context=minikube-indri get nodes` works via Tailscale

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/27
2026-01-18 12:49:20 -08:00
19a82373d5 K8s Migration Phase 0: Foundation Infrastructure (#26)
## Summary
- Step 0.1: Update Pulumi ACLs with tag:registry
- Step 0.3: Create Zot registry ansible role with mcquack LaunchAgent
- Step 0.4: Add Zot to Tailscale Serve configuration
- Step 0.5: Create Zot metrics role for Prometheus scraping
- Step 0.6: Add Zot log collection to Alloy
- Step 0.7: Update indri-services-check with zot checks
- Step 0.8: Add podman role for container runtime
- Step 0.9: Add minikube role for Kubernetes cluster
- Step 0.10: Configure remote kubectl access with 1Password credentials

## Remaining Steps
- [ ] Step 0.11: Add minikube to indri-services-check
- [ ] Step 0.12: Create zettelkasten documentation
- [ ] Step 0.13: Verify main playbook (already done - roles added)

## Deployment and Testing
- [x] Zot registry deployed and accessible at https://registry.tail8d86e.ts.net
- [x] Podman machine running on indri
- [x] Minikube cluster running on indri
- [x] kubectl access from gilbert working with 1Password credentials
- [ ] indri-services-check passes all checks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/26
2026-01-18 12:06:28 -08:00
ee196b0c10 Fix Phase 0 plan based on review feedback (#25)
## Summary
- Step 0.3: Use launchctl unload/load pattern for handlers (consistent with existing handlers)
- Step 0.6: Correct file path - add zot logs to alloy defaults/main.yml
- Step 0.9: Use cri-o runtime instead of containerd
- Step 0.10: Simplify kubeconfig instructions - focus on goal not implementation

## Deployment and Testing
- [x] Documentation-only change, no deployment needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/25
2026-01-17 20:07:10 -08:00
c8433467c1 Add Kubernetes migration plan documentation (#24)
## Summary
- Comprehensive phased plan for migrating blumeops services to minikube
- Technical decisions documented: Zot registry, Podman driver, CloudNativePG, Tailscale Operator
- 9 migration phases with verification and rollback procedures
- LaunchAgent absolute path requirements documented
- Observability requirements (zk docs, logging, metrics, dashboards) for new services

## Deployment and Testing
- [x] Plan document created at `docs/k8s-migration.md`
- [ ] Review plan phases for completeness
- [ ] Validate technical decisions align with requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/24
2026-01-17 17:34:53 -08:00
e6d302b40b Harden Tailscale ACL policy with least-privilege grants (#23)
## Summary
- Replace permissive wildcard ACL (`*` -> `*`) with specific service grants
- Admin: full access to all services including NAS
- Member: user-facing services only (no Grafana/Loki/NAS)
- Add device tagging for gilbert (workstation) and sifaka (NAS) via Pulumi
- SSH hardening: remove root access, use "check" action with MFA
- Add ACL tests to validate policy behavior

## Deployment and Testing
- [x] Pulumi preview passes
- [x] HuJSON syntax validated
- [x] ACL tests defined and passing
- [ ] Deploy with `mise run tailnet-up`
- [ ] Verify SSH access from gilbert to indri
- [ ] Verify Allison cannot access Grafana/Loki/NAS

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/23
2026-01-17 11:58:04 -08:00
0918764e93 Rename Node Exporter dashboard to macOS (#22)
## Summary
- Renamed dashboard from "Node Exporter - macOS" to just "macOS" since it now uses Alloy
- Updated filename, title, uid, and tags to reflect the change

## Deployment and Testing
- [ ] Deploy with `mise run provision-indri -- --tags grafana`
- [ ] Verify dashboard accessible at https://grafana.tail8d86e.ts.net

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/22
2026-01-17 09:29:19 -08:00
3962e5a7de Fix borgmatic PostgreSQL backup and update backup sources (#21)
## Summary
- Fix PostgreSQL backup failure by adding explicit `pg_dump_command` path (was failing with "pg_dump: command not found" in LaunchAgent)
- Remove `~/code/3rd/kiwix-tools` from backups (was just symlinks to ZIM archives in transmission)
- Enable Loki log backup by removing from exclude_patterns

## Deployment and Testing
- [x] Dry run with `--check --diff` shows expected changes
- [ ] Deploy with `mise run provision-indri -- --tags borgmatic`
- [ ] Verify config deployed: `ssh indri 'cat ~/.config/borgmatic/config.yaml'`
- [ ] Run manual backup to test: `ssh indri 'mise x -- borgmatic create --verbosity 1'`
- [ ] Verify PostgreSQL dump succeeds (no "pg_dump: command not found" error)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/21
2026-01-17 09:22:01 -08:00
75426be1dc Remove ansible role meta dependencies to fix duplicate execution (#20)
## Summary
- Remove all `meta/main.yml` dependencies from ansible roles
- Role ordering is now controlled entirely by `indri.yml` playbook
- Fix incorrect roles path in CLAUDE.md (`playbooks/roles` → `roles`)

## Why
Ansible's tag accumulation behavior prevents proper role deduplication when using meta dependencies. When a role is pulled in as a dependency, the parent role's tags are added to the dependency's tags (e.g., `[loki]` becomes `[alloy, loki]`), making them appear as different invocations to Ansible and causing roles to run multiple times.
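
The effect described above can be shown with a toy model in Python. This is not Ansible's implementation. It assumes, for illustration only, that two invocations are treated as duplicates when both the role name and the tag set match.

```python
def distinct_invocations(invocations):
    """Count invocations after dropping exact (role, tags) duplicates."""
    return len({(role, frozenset(tags)) for role, tags in invocations})


# loki is listed in the playbook and is also pulled in as a dependency of
# alloy, where it picks up the parent's tag.
with_meta_dependency = [("loki", ["loki"]), ("loki", ["alloy", "loki"])]

# With ordering left entirely to the playbook, loki is listed once.
playbook_only = [("loki", ["loki"])]

print(distinct_invocations(with_meta_dependency))  # 2: loki runs twice
print(distinct_invocations(playbook_only))         # 1
```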

## Deployment and Testing
- [x] Verified with `ansible-playbook --list-tasks` that each role now appears exactly once
- [x] Run full provision to verify no regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/20
2026-01-16 22:50:34 -08:00
9931829d03 Add pre-commit hooks for code quality (#19)
## Summary
- Add pre-commit framework with hooks for YAML, Ansible, Python, shell, TOML, JSON, and secret detection
- Fix all 91+ ansible-lint violations (variable naming, handler capitalization, changed_when)
- Fix shellcheck warnings in mise-tasks scripts
- Document pre-commit setup in README.md

## Deployment and Testing
- [x] All pre-commit hooks pass (`uvx pre-commit run --all-files`)
- [x] Test ansible playbook with `--check` mode
- [x] Run `mise run indri-services-check` after deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/19
2026-01-16 19:33:02 -08:00
78f14f8bde Reworded CLAUDE.md 2026-01-16 18:47:47 -08:00
d3d3041b27 Decouple ZIM/torrent ansible tasks for faster provisioning (#18)
## Summary
- Simplify kiwix role from 213 lines to 151 lines (-29%)
- Replace per-archive torrent status loops with single shell command
- Decouple kiwix startup from declared inventory - now serves whatever completed ZIM files exist
- Fix tailscale_serve role to handle empty JSON in check mode

## Performance improvement
- **Before**: ~132 operations (44 archives × 3 loops for status check, recheck, symlink)
- **After**: ~5 operations (1 shell script + 1 find + conditional symlinks)
- Expected reduction: ~3 minutes per ansible run

## Test plan
- [x] Ran `mise run provision-indri -- --check --diff` to preview changes
- [x] Ran `mise run provision-indri` to apply changes
- [x] Ran `mise run indri-services-check` - all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/18
2026-01-16 15:14:00 -08:00
812b78bf61 Use explicit PostgreSQL superuser name and fix check mode (#17)
## Summary
- Add `postgresql_superuser` variable (`eblume`) to prevent PostgreSQL from inheriting OS username during initdb
- Update all psql/createdb commands to use explicit `-U` flag
- Add `check_mode: false` to op commands so 1Password fetches run during `--check` mode
- Add PostgreSQL and Miniflux health checks to indri-services-check

## Test plan
- [x] Renamed existing superuser from `erichblume` to `eblume`
- [x] Ran `mise run provision-indri -- --tags postgresql --check --diff` successfully
- [x] Verified connection as `eblume` superuser via Tailscale
- [x] Ran `mise run indri-services-check` - all services healthy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/17
2026-01-16 14:41:36 -08:00
adf6f4fbe9 Add PostgreSQL and Miniflux services to tailnet (#16)
## Summary
- Add PostgreSQL 18 as a new service at `pg.tail8d86e.ts.net:5432`
- Add Miniflux RSS/Atom feed reader at `feed.tail8d86e.ts.net`
- Both services managed via homebrew/brew services
- Pulumi ACL tags added (tag:pg, tag:feed)
- Alloy log collection configured for both services
- Zettelkasten documentation updated

## Manual Setup Required

Before running ansible, the following steps are needed on indri:

### 1. Apply Pulumi tags
```bash
mise run tailnet-up
```
Then apply tags to indri in Tailscale admin console.

### 2. Create 1Password entries
- miniflux PostgreSQL user password
- miniflux admin password (for first run)

### 3. Set PostgreSQL user password (after ansible installs postgres)
```bash
ssh indri '/opt/homebrew/opt/postgresql@18/bin/psql -c "ALTER USER miniflux PASSWORD '\''your-password'\'';"'
```

### 4. Create password files on indri
```bash
ssh indri 'echo "your-db-password" > ~/.miniflux-db-password && chmod 600 ~/.miniflux-db-password'
ssh indri 'echo "your-admin-password" > ~/.miniflux-admin-password && chmod 600 ~/.miniflux-admin-password'
```

### 5. Create ~/.pgpass for borgmatic
```bash
ssh indri 'echo "localhost:5432:miniflux:miniflux:YOUR_PASSWORD" > ~/.pgpass && chmod 600 ~/.pgpass'
```

### 6. Run ansible with first-run admin creation
```bash
mise run provision-indri -- -e miniflux_create_admin=1
```

### 7. Update borgmatic config
Add to `~/.config/borgmatic/config.yaml` on indri:
```yaml
postgresql_databases:
    - name: miniflux
      hostname: localhost
      port: 5432
      username: miniflux
```

### 8. Cleanup after first run
```bash
ssh indri 'rm ~/.miniflux-admin-password'
```

## Test plan
- [ ] Run `mise run tailnet-up` and verify Pulumi changes
- [ ] Apply tags to indri in Tailscale admin
- [ ] Run `mise run provision-indri -- --check --diff` for dry run
- [ ] Run `mise run provision-indri -- -e miniflux_create_admin=1`
- [ ] Approve services in Tailscale admin
- [ ] Verify PostgreSQL: `ssh indri '/opt/homebrew/opt/postgresql@18/bin/pg_isready'`
- [ ] Verify Miniflux: `curl https://feed.tail8d86e.ts.net/healthcheck`
- [ ] Run `mise run indri-services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/16
2026-01-16 12:30:20 -08:00
3f4e40f3ae Add Pulumi for tailnet IaC management (#15)
## Summary
- Manage tail8d86e.ts.net ACLs, tags, and DNS via Pulumi + Python
- State stored in Pulumi Cloud (free tier) to avoid circular dependency
- OAuth authentication via 1Password for secure credential management
- New mise tasks: `tailnet-preview`, `tailnet-up`

## Architecture
Two-layer approach:
- **Layer 1 (Pulumi)**: Tailnet-wide config (ACLs, tags, DNS)
- **Layer 2 (Ansible)**: Node-local `tailscale serve` config (unchanged)

## Test plan
- [x] Exported current ACL from Tailscale API
- [x] Imported existing ACL into Pulumi state
- [x] Verified `mise run tailnet-preview` shows no changes
- [x] Verified `mise run tailnet-up` applies successfully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/15
2026-01-15 20:55:25 -08:00
72c2dd7096 Add blumeops-tasks mise task for Todoist integration (#14)
## Summary
- Add `mise run blumeops-tasks` to fetch and display tasks from Todoist
- Uses uv run script with inline dependencies (httpx, rich)
- Fetches API credential securely via 1Password CLI
- Sorts tasks by custom priority order: p1, p2, p4, p3 (backlog last)
- Documents the task discovery workflow in CLAUDE.md
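
A custom order such as p1, p2, p4, p3 can be expressed with a rank lookup. This is a minimal sketch, not the script's actual code. The task dictionaries and the string labels `"p1"` to `"p4"` are assumptions for illustration.

```python
# Backlog (p3) is deliberately ranked after p4.
PRIORITY_ORDER = ["p1", "p2", "p4", "p3"]
RANK = {label: position for position, label in enumerate(PRIORITY_ORDER)}


def sort_tasks(tasks):
    """Sort tasks by the custom priority order; ties keep their input order."""
    return sorted(tasks, key=lambda task: RANK[task["priority"]])


# Hypothetical tasks, for illustration only.
tasks = [
    {"content": "Tidy backlog", "priority": "p3"},
    {"content": "Rotate keys", "priority": "p1"},
    {"content": "Update docs", "priority": "p4"},
    {"content": "Fix dashboard", "priority": "p2"},
]

for task in sort_tasks(tasks):
    print(task["priority"], task["content"])
```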

## Test plan
- [x] Verified `mise run blumeops-tasks` fetches and displays tasks correctly
- [x] Confirmed priority sorting works as expected

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/14
2026-01-15 18:03:19 -08:00
ae1513e7e9 Add Plex Media Server observability (#13)
## Summary
- Add `plex_metrics` ansible role with textfile collector for Prometheus metrics
- Add Plex log collection to Alloy (forwards to Loki)
- Add Grafana dashboard for Plex monitoring (status, library counts, sessions, transcoding, logs)

## Metrics Collected
- `plex_up` - server health
- `plex_version_info` - server version
- `plex_sessions_total/playing/paused` - active sessions
- `plex_transcode_sessions_total/video/audio` - transcoding status
- `plex_library_items{library,type}` - library item counts
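
For reference, a labeled sample such as `plex_library_items` can be rendered in the Prometheus text format roughly as follows. This is a sketch, not the role's collector script. The library name and count are invented for illustration.

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def library_sample(library: str, item_type: str, count: int) -> str:
    """Render one plex_library_items sample line."""
    return (
        f'plex_library_items{{library="{escape_label(library)}",'
        f'type="{escape_label(item_type)}"}} {count}'
    )


print(library_sample("Movies", "movie", 120))
# plex_library_items{library="Movies",type="movie"} 120
```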

## Prerequisites
Plex token must be stored at `~/.plex-token` on indri (already done).

## Test plan
- [x] Dry-run passed (`mise run provision-indri -- --check --diff`)
- [ ] Apply changes (`mise run provision-indri`)
- [ ] Verify metrics: `ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/plex.prom'`
- [ ] Verify logs in Grafana Explore: `{service="plex"}`
- [ ] Check Plex dashboard in Grafana

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/13
2026-01-15 15:27:59 -08:00
2a1359a3b6 Fix ansible handler timeouts for alloy and loki restarts (#12)
## Summary
- Use async with poll: 0 for alloy and loki restart handlers
- Fire-and-forget approach prevents ansible from hanging on graceful shutdown

## Test plan
- [x] Manually verified `brew services restart grafana-alloy` works
- [x] Run full ansible playbook and verify it completes without timeout

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/12
2026-01-15 13:56:11 -08:00
ba5cd75ee2 Fix ansible handler timeouts for alloy and loki restarts
Use async with poll: 0 to fire-and-forget service restarts.
These services have graceful shutdown periods that can exceed
ansible's default command timeout.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 12:39:28 -08:00
242c1880de Add Grafana Alloy and Loki for unified observability (#11)
## Summary
- Add Grafana Alloy to replace node_exporter for metrics collection
- Add Loki for log aggregation and storage
- Configure Alloy to collect logs from all services (grafana, forgejo, prometheus, tailscale, transmission, devpi, kiwix, borgmatic)
- Update Prometheus to accept metrics via remote_write
- Add Loki datasource to Grafana

## Test plan
- [ ] Run `mise run provision-indri -- --check --diff` to verify changes
- [ ] Apply with `mise run provision-indri`
- [ ] Verify services: `mise run indri-services-check`
- [ ] Check Grafana Explore with Loki datasource
- [ ] Query logs: `{service="grafana"}`
- [ ] Verify metrics still flowing to Prometheus dashboards

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/11
2026-01-15 12:24:13 -08:00
070f26dc6d Add zk-docs mise task for zettelkasten documentation (#10)
## Summary
- Add `mise run zk-docs` task to concatenate all blumeops-tagged zettelkasten cards
- Main project card is shown first, followed by service management logs
- Uses `bat` for output (added to Brewfile)
- Args are passed through to bat for custom formatting
- Update CLAUDE.md to use zk-docs command with plain output options
- Update README.md to note zettelkasten is private with contact email

## Test plan
- [x] `mise run zk-docs` displays all 6 blumeops cards
- [x] `mise run zk-docs -- --style=header --color=never --decorations=always` shows filenames without decoration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/10
2026-01-15 11:25:02 -08:00
2e326eb30d Critical security note for claude 2026-01-15 09:02:27 -08:00
c660674891 Remove settings.local.json from repo and add to gitignore
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 08:59:24 -08:00
d8a0ef6482 Add devpi PyPI caching proxy role for indri (#9)
## Summary
- Add ansible role for devpi-server as a transparent PyPI caching proxy
- LaunchAgent with KeepAlive runs via `mise x -- devpi-server`
- Listens on port 3141, data stored in `~/devpi`
- Health checks added to `indri-services-check` script

## Manual Setup Required (on indri, before provisioning)
1. Add to `~/.config/mise/config.toml`:
   ```toml
   [tools]
   "pipx:devpi-server" = "latest"
   "pipx:devpi-web" = "latest"
   "pipx:devpi-client" = "latest"
   ```
2. Run `mise install`
3. Initialize: `mise x -- devpi-init --serverdir ~/devpi`

## Post-Provisioning
- Set up Tailscale service `pypi` on port 443 → 3141
- Configure client pip.conf with index-url

## Test plan
- [x] Ansible syntax check passes
- [x] Dry-run: `mise run provision-indri -- --check --diff`
- [x] Apply: `mise run provision-indri`
- [x] Health check: `mise run indri-services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/9
2026-01-15 08:31:09 -08:00
50c713b5de Add macOS-compatible Node Exporter Grafana dashboard (#8)
## Summary
- Adds a new Grafana dashboard for Node Exporter metrics on macOS hosts
- Uses macOS-native memory metrics (node_memory_total_bytes, node_memory_active_bytes, etc.) instead of Linux-specific ones
- Includes dropdown selectors for instance, disk, and network device filtering

## Details
The standard Node Exporter dashboards show "No Data" for memory panels on macOS because they query Linux-specific metrics like `node_memory_MemTotal_bytes`. macOS node_exporter exports different metrics:

| Linux | macOS |
|-------|-------|
| node_memory_MemTotal_bytes | node_memory_total_bytes |
| node_memory_MemFree_bytes | node_memory_free_bytes |
| node_memory_Buffers_bytes | (not available) |
| node_memory_Cached_bytes | (not available) |

macOS has unique memory categories: Wired, Active, Compressed, Inactive, Free.
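
The mapping in the table can be captured as a lookup for porting existing panel queries. The helper below is a hypothetical Python sketch, not part of the dashboard. It uses plain substring replacement, which is enough for the four names listed.

```python
# Linux metric name -> macOS equivalent (None where no equivalent exists).
LINUX_TO_MACOS = {
    "node_memory_MemTotal_bytes": "node_memory_total_bytes",
    "node_memory_MemFree_bytes": "node_memory_free_bytes",
    "node_memory_Buffers_bytes": None,
    "node_memory_Cached_bytes": None,
}


def translate_query(expr: str) -> str:
    """Rewrite Linux memory metric names in a query to their macOS names."""
    for linux_name, macos_name in LINUX_TO_MACOS.items():
        if linux_name in expr:
            if macos_name is None:
                raise ValueError(f"{linux_name} is not available on macOS")
            expr = expr.replace(linux_name, macos_name)
    return expr


print(translate_query("node_memory_MemFree_bytes / node_memory_MemTotal_bytes"))
# node_memory_free_bytes / node_memory_total_bytes
```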

## Test plan
- [x] Dashboard deployed to indri via ansible
- [x] All panels showing data for indri
- [x] Instance selector works to switch between hosts
- [x] Disk and network device filters work

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/8
2026-01-14 20:53:57 -08:00
d9be8c27bc Add 32 devdocs ZIM archives for programming documentation (#7)
## Summary
- Adds offline documentation for: bash, c, click, cmake, cpp, css, django-rest-framework, django, docker, duckdb, fish, gcc, git, go, godot, hammerspoon, homebrew, javascript, kubectl, kubernetes, latex, lua, markdown, nginx, nix, postgresql, python, redis, sqlite, typescript, werkzeug, zig
- All January 2026 versions from download.kiwix.org/zim/devdocs/
- Downloads via BitTorrent through transmission

## Test plan
- [x] Deployed to indri via `mise run provision-indri`
- [x] All 32 torrents added and downloaded (small files, completed instantly)
- [x] 43 ZIM files now available in kiwix directory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/7
2026-01-14 18:28:34 -08:00
10012a4cf2 Add upload/download ratio and period transfer panels to Transmission dashboard (#6)
## Summary
- Adds Upload/Download Ratio stat panel with color thresholds (red < 0.5, yellow < 1, green >= 1)
- Adds Downloaded (Period) stat panel showing bytes downloaded in selected time range
- Adds Uploaded (Period) stat panel showing bytes uploaded in selected time range

Uses PromQL `increase()` on existing counter metrics - no new metrics collection needed.
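
The ratio thresholds map to colors as in the sketch below. It is a Python illustration of the threshold logic only. The `"no data"` result for a zero download total is an assumption, not something the panel is known to do.

```python
def ratio_color(uploaded_bytes: float, downloaded_bytes: float) -> str:
    """Map an upload/download ratio to the panel's threshold color."""
    if downloaded_bytes <= 0:
        return "no data"  # assumed handling; the ratio is undefined here
    ratio = uploaded_bytes / downloaded_bytes
    if ratio < 0.5:
        return "red"
    if ratio < 1:
        return "yellow"
    return "green"


print(ratio_color(40, 100))   # red
print(ratio_color(75, 100))   # yellow
print(ratio_color(100, 100))  # green
```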

## Test plan
- [x] Deployed to indri via `mise run provision-indri`
- [x] Grafana restarted successfully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/6
2026-01-14 18:08:39 -08:00
ba03af15eb Set MISE_TASK_OUTPUT=interleave in provision-indri
Shows ansible output in real-time instead of buffered.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:15:11 -08:00
2f28b151f5 Fix launchctl idempotency in kiwix and borgmatic roles
Check if LaunchAgent is already loaded before attempting to load it.
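
The check can be sketched in Python as a parse of `launchctl list` output. This is not the role's actual task. It assumes whitespace-separated PID, Status, and Label columns, and the label `com.example.kiwix` is hypothetical.

```python
def is_loaded(launchctl_list_output: str, label: str) -> bool:
    """Return True if the label appears in the last column of any line."""
    for line in launchctl_list_output.splitlines():
        fields = line.split()
        if fields and fields[-1] == label:
            return True
    return False


# Hypothetical output, for illustration only.
sample = "PID\tStatus\tLabel\n512\t0\tcom.example.kiwix\n-\t0\tcom.example.other\n"

print(is_loaded(sample, "com.example.kiwix"))      # True: skip the load
print(is_loaded(sample, "com.example.borgmatic"))  # False: load it
```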

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:14:52 -08:00
e534e59556 Add provision-indri mise task and fix idempotency
- Add mise-tasks/provision-indri script to run ansible playbook
- Fix transmission_metrics launchctl load to be idempotent
- Update CLAUDE.md to reference mise run provision-indri

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:10:30 -08:00
e264b39cd6 Add total torrent size metric and dashboard panel
- Query torrent-get RPC to sum totalSize of all torrents
- Add transmission_torrents_size_bytes gauge metric
- Add "Total Torrent Size" timeseries panel to dashboard
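
The summing step can be sketched in Python as follows. This is not the metrics script itself. It assumes the `torrent-get` response carries the torrents under `arguments.torrents`, and the sample response is invented for illustration.

```python
import json


def total_size_metric(rpc_response: str) -> str:
    """Sum totalSize across torrents and render a Prometheus gauge."""
    torrents = json.loads(rpc_response)["arguments"]["torrents"]
    total = sum(torrent["totalSize"] for torrent in torrents)
    return (
        "# TYPE transmission_torrents_size_bytes gauge\n"
        f"transmission_torrents_size_bytes {total}\n"
    )


# Hypothetical RPC response, for illustration only.
sample_response = json.dumps(
    {"arguments": {"torrents": [{"totalSize": 1000}, {"totalSize": 2500}]},
     "result": "success"}
)

print(total_size_metric(sample_response), end="")
# # TYPE transmission_torrents_size_bytes gauge
# transmission_torrents_size_bytes 3500
```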

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 14:00:52 -08:00