blumeops

Author	SHA1	Message	Date
Erich Blume	8ca8798121	Switch to Buildah for container builds (#51 ) All checks were successful Test CI / test (push) Successful in 4s Details ## Summary - Replace Docker with Buildah for container image builds - No Docker socket required - buildah is daemonless - Cleaner security model (no privileged containers or socket mounting) - Remove Docker-related security context from deployment ## Changes - Update Dockerfile to install buildah/podman instead of docker-cli - Configure buildah storage with overlay driver and fuse-overlayfs - Update composite action to use `buildah bud` and `buildah push` - Add `imagePullPolicy: Always` to ensure fresh image pulls - Update test workflow to verify buildah/podman ## Testing - [ ] Runner pod starts successfully - [ ] Buildah is available in runner - [ ] Test workflow verifies buildah/podman versions - [ ] Container build workflow builds and pushes to zot 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/51	2026-01-24 13:30:26 -08:00
Erich Blume	7893c41020	Enable Forgejo Actions (Phase 1) (#48 ) All checks were successful Test CI / test (push) Successful in 0s Details ## Summary - Refactor Forgejo app.ini to be managed by ansible with secrets from 1Password - Enable Forgejo Actions in config (`[actions] ENABLED = true`) - Add `repo.actions` to DEFAULT_REPO_UNITS - Clean up unused MySQL database fields (we use SQLite) ## Phase 1 Progress This PR covers the first part of Phase 1 (ci-cd-bootstrap plan): - [x] Refactor app.ini to ansible template - [x] Store secrets in 1Password - [x] Enable Actions in config - [ ] Deploy config changes (pending review) - [ ] Create runner registration token - [ ] Deploy runner to k8s - [ ] Test with simple workflow ## Deployment and Testing - [ ] Run `mise run provision-indri -- --tags forgejo` to deploy - [ ] Verify Forgejo restarts correctly - [ ] Verify Actions tab appears in repo settings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/48	2026-01-23 17:00:12 -08:00
Erich Blume	272ddb213b	Add TeslaMate deployment for Tesla Model Y data logging (#47 ) ## Summary - Add TeslaMate k8s deployment with Tailscale ingress at tesla.tail8d86e.ts.net - Add teslamate user to CloudNativePG blumeops-pg cluster - Add TeslaMate PostgreSQL datasource to Grafana - Import 18 TeslaMate Grafana dashboards for charging, drives, efficiency, etc. - Add teslamate database to borgmatic backup configuration ## Deployment and Testing - [ ] Create 1Password items: "TeslaMate DB Password" and "TeslaMate Encryption Key" - [ ] Apply database user secret: `op inject -i argocd/manifests/databases/secret-teslamate.yaml.tpl \| kubectl apply -f -` - [ ] Sync blumeops-pg: `argocd app sync blumeops-pg` - [ ] Create teslamate database - [ ] Apply teslamate secrets (encryption key, db connection) - [ ] Apply Grafana datasource secret: `op inject -i argocd/manifests/grafana-config/secret-teslamate-datasource.yaml.tpl \| kubectl apply -f -` - [ ] Sync apps and teslamate: `argocd app sync apps teslamate grafana grafana-config` - [ ] Complete Tesla API OAuth flow at https://tesla.tail8d86e.ts.net - [ ] Verify data collection starts - [ ] Verify Grafana dashboards show data 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/47	2026-01-22 21:25:44 -08:00
Erich Blume	16bfe06b7b	Fix LaunchDaemon check to use become: true LaunchDaemons run in the system domain and require sudo to query. Without become: true, the check always fails and tries to reload. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-22 17:34:23 -08:00
Erich Blume	57bf8512dc	Log filtering cleanup and observability improvements (#45 ) ## Summary - Suppress noisy storage-provisioner Endpoints deprecation warning (upstream minikube issue) - Disable thermal collector on indri Alloy (not supported on macOS M1) - Add macOS power/thermal metrics collection via powermetrics LaunchDaemon - Add Power & Thermal section to macOS Grafana dashboard - Add logfmt parser for k8s log level extraction (Loki, Prometheus, etc.) - Extract more fields from JSON logs (zot compatibility - uses "message" not "msg") - Silence logfmt parse errors for non-logfmt logs - Fix JSON escaping in devpi dashboard ## Deployment and Testing - [x] Deployed Alloy config changes to indri via ansible - [x] Synced alloy-k8s and grafana-config via ArgoCD - [x] Verified power metrics appearing in Prometheus - [x] Verified thermal collector errors stopped - [x] Verified logfmt parse errors silenced - [x] Verified devpi dashboard loads correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/45	2026-01-22 17:30:08 -08:00
Erich Blume	e4a8405de7	Observability cleanup and k8s service monitoring (#43 ) (#43 ) ## Summary - Remove stale `/opt/homebrew/var/loki` from borgmatic backup (Loki migrated to k8s) - Add Alloy k8s DaemonSet for automatic pod log collection with auto-discovery - Add blackbox probes for miniflux, kiwix, transmission, devpi, argocd - Add transmission-exporter sidecar for full metrics (speed, torrent counts, ratios) - Replace stale devpi dashboard with probe-based metrics (status, response time, uptime) - Add unified "K8s Services Health" dashboard for service uptime/response monitoring ## Manual cleanup already performed - Deleted stale textfile metrics on indri: `devpi.prom`, `transmission.prom` - Deleted stale data directories on indri: `/opt/homebrew/var/loki/`, `/opt/homebrew/var/prometheus/` ## Deployment and Testing - [x] Sync `apps` application to pick up new alloy-k8s app - [x] Deploy alloy-k8s on feature branch: `argocd app set alloy-k8s --revision feature/observability-cleanup && argocd app sync alloy-k8s` - [x] Deploy torrent on feature branch (for transmission exporter): `argocd app set torrent --revision feature/observability-cleanup && argocd app sync torrent` - [x] Deploy prometheus on feature branch (for new scrape config): `argocd app set prometheus --revision feature/observability-cleanup && argocd app sync prometheus` - [x] Deploy grafana-config on feature branch (for dashboards): `argocd app set grafana-config --revision feature/observability-cleanup && argocd app sync grafana-config` - [x] Verify pod logs appear in Loki/Grafana - [x] Verify transmission metrics appear in Prometheus - [x] Verify service probe metrics appear in Prometheus - [x] Run `mise run provision-indri -- --tags borgmatic` to update borgmatic config - [ ] After merge, reset apps to main and resync 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/43	2026-01-22 13:51:01 -08:00
Erich Blume	17023085cb	Migrate observability stack to Kubernetes (#42 ) Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack. Summary - Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal) - Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses - Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics - Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net) - Add ACL rule for port 9187 (CNPG metrics) - Delete obsolete ansible roles for prometheus and loki Changes - argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress - argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications - argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS - argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint - argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics - ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints - pulumi/policy.hujson - ACL for port 9187 - Deleted ansible/roles/prometheus/ and ansible/roles/loki/ Deployment and Testing - Stop prometheus and loki on indri - Sync ArgoCD apps (apps, prometheus, loki, grafana) - Run mise run provision-indri -- --tags alloy - Verify Grafana dashboards show data 🤖 Generated with https://claude.ai/claude-code Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42	2026-01-22 12:06:02 -08:00
Erich Blume	5a829e0afd	Remove unused indri tags and ansible roles (#41 ) ## Summary - Remove ansible roles for services migrated to k8s: devpi, kiwix, transmission - Also remove unused node_exporter and podman ansible roles - Remove service tags from indri for k8s-hosted services (grafana, kiwix, devpi, pg, feed) - Update indri description to reflect current architecture ## Changes Ansible roles removed (34 files, ~1000 lines): - devpi, devpi_metrics - kiwix - transmission, transmission_metrics - node_exporter - podman Pulumi indri tags removed: - tag:grafana, tag:kiwix, tag:devpi, tag:pg, tag:feed These services now run in k8s with their own Tailscale devices via tailscale-operator. ## Deployment and Testing - [x] Verified remaining ansible roles match indri.yml - [x] Verified no playbooks or role dependencies reference removed roles - [ ] Run `pulumi preview` to verify tag changes - [ ] Run `pulumi up` to apply tag changes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/41	2026-01-21 20:18:53 -08:00
Erich Blume	7ec98210a9	P6: Migrate Kiwix and Transmission to Kubernetes (#39 ) ## Summary - Add Transmission BitTorrent daemon to k8s (torrent namespace) - Add Kiwix ZIM archive server to k8s (kiwix namespace) - NFS storage from sifaka for shared torrent/ZIM data - Torrent-sync sidecar in kiwix deployment to manage declarative ZIM list - ZIM-watcher CronJob to auto-restart kiwix when new archives appear - Remove transmission, transmission_metrics, and kiwix ansible roles from indri - Remove svc:kiwix from tailscale_serve defaults ## Key Decisions - Direct NFS mount for kiwix (no PVC) since it shares storage with transmission - Shell wrapper for kiwix-serve command (glob expansion) - Accept HTTP 409 as "ready" in torrent sync (transmission session ID mechanism) - Completed downloads stored in `/downloads/complete/` on sifaka ## Deployment and Testing - [x] Deployed transmission to k8s - [x] Verified transmission web UI at torrent.tail8d86e.ts.net - [x] Moved existing ZIM files to complete folder - [x] Deployed kiwix to k8s - [x] Verified kiwix web UI at kiwix.tail8d86e.ts.net - [x] Stopped old services on indri - [x] Cleared svc:kiwix from Tailscale serve on indri - [x] Updated zk documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/39	2026-01-21 18:07:40 -08:00
Erich Blume	21848a7919	P5.1: Migrate minikube from podman to QEMU2 driver (#38 ) ## Summary - Migrate minikube from podman driver to qemu2 driver for proper NFS/SMB volume mount support - Update ansible minikube role with qemu installation and containerd runtime - Remove podman role dependency from indri.yml - Add synology user creation steps and post-migration zot reconfiguration notes ## Why Phase 6 (Kiwix/Transmission migration) was blocked because the podman driver lacks kernel capabilities for filesystem mounts. QEMU2 creates an actual VM with full mount support. ## Deployment and Testing - [ ] Create k8s-storage user on Synology DSM - [ ] Store credentials in 1Password (synology-k8s-storage) - [ ] Export current k8s state - [ ] Stop and delete podman-based minikube cluster - [ ] Run ansible to create QEMU2 cluster - [ ] Test NFS volume mount with test pod - [ ] Redeploy ArgoCD and all apps - [ ] Verify all services healthy - [ ] Reconfigure zot registry mirrors for containerd (post-migration) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/38	2026-01-21 16:03:37 -08:00
Erich Blume	0439fbb704	P5: Migrate devpi to Kubernetes (#34 ) ## Summary - Migrate devpi PyPI caching proxy from indri LaunchAgent to Kubernetes - Custom container image with devpi-server + devpi-web + auto-init - StatefulSet with 50Gi PVC, Tailscale Ingress at pypi.tail8d86e.ts.net - Remove devpi from ansible playbooks and update CLAUDE.md with k8s workflow ## Key Changes - Add CRI-O registry mirror config for registry.tail8d86e.ts.net - Change ArgoCD apps to manual sync (was auto-sync causing issues) - 2Gi memory limit for Whoosh indexer (reclaimed after startup) ## Deployment and Testing - [x] devpi pod healthy in k8s - [x] pip install through proxy works - [x] mcquack 1.0.0 uploaded and installable - [x] Old devpi stopped on indri ## Post-Merge Reset ArgoCD to main: ``` argocd app set apps --revision main && argocd app sync apps argocd app set devpi --revision main && argocd app sync devpi ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/34	2026-01-20 14:55:37 -08:00
Erich Blume	735b643429	P4: Miniflux migration + PostgreSQL consolidation (#33 ) ## Summary - Deploy miniflux in k8s via ArgoCD - Expose via Tailscale Ingress at feed.tail8d86e.ts.net - Retire brew PostgreSQL (no longer needed) - Rename k8s-pg to pg (canonical hostname) - Remove ansible miniflux and postgresql roles - Update borgmatic to backup pg.tail8d86e.ts.net - Update all zk documentation ## Deployment and Testing - [x] Miniflux pod running in k8s - [x] User login works at https://feed.tail8d86e.ts.net - [x] Feeds and entries visible - [x] brew miniflux and postgresql stopped - [x] Tailscale services migrated (feed, pg) - [x] zk documentation updated - [x] Run ansible to apply role removals - [ ] Verify borgmatic backup with new pg hostname 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/33	2026-01-20 09:04:47 -08:00
Erich Blume	eb952aae01	P3: PostgreSQL disaster recovery test and borgmatic k8s-pg backup (#32 ) ## Summary - Fixed borgmatic `borg: command not found` by adding `local_path` config option - Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg - Added borgmatic user to k8s-pg via CloudNativePG managed roles - Configured borgmatic to backup both localhost and k8s-pg PostgreSQL databases - Added Tailscale ACL grant for `tag:homelab` → `tag:k8s` on port 5432 - Disabled selfHeal on apps app to allow manual revision changes during development ## Changes - `ansible/roles/borgmatic/` - Added `local_path` and k8s-pg database entry - `ansible/roles/postgresql/tasks/main.yml` - Added k8s-pg to `.pgpass` - `argocd/apps/apps.yaml` - Disabled selfHeal - `argocd/manifests/databases/blumeops-pg.yaml` - Added borgmatic managed role - `argocd/manifests/databases/secret-borgmatic.yaml.tpl` - New secret template - `pulumi/policy.hujson` - Added ACL grant for backup access ## Deployment and Testing - [x] Borgmatic backup runs successfully - [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries verified) - [x] borgmatic user created in k8s-pg with pg_read_all_data role - [x] Both localhost and k8s-pg databases in backup archive - [x] zk documentation updated (borgmatic.md, postgresql.md) - [ ] After merge: set blumeops-pg app back to main revision 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/32	2026-01-19 18:00:32 -08:00
Erich Blume	f2541c3f77	Fix minikube role idempotency for zot mirror config (#31 ) ## Summary - Fixed trailing newline mismatch in config comparison (ansible command module strips whitespace, slurp preserves it) - Only copy temp file when config actually needs updating (avoids spurious changes) - Task now properly skips when config is already correct ## Deployment and Testing - [x] Verified idempotency: `changed=0` on repeated runs - [x] Verified change detection: corrupted config triggers proper update - [x] ansible-lint passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/31	2026-01-19 16:19:52 -08:00
Erich Blume	130c044523	Fix hanging minikube provision	2026-01-19 15:49:11 -08:00
Erich Blume	7e6742ad24	K8s Migration Phase 2: Grafana to Kubernetes (#30 ) ## Summary - Migrate Grafana from Homebrew/Ansible to Kubernetes deployment - Switch CloudNativePG to use forge-mirrored Helm chart (HTTPS, no auth needed) - Add Grafana Helm chart deployment via ArgoCD with multi-source pattern - Add Grafana config (Tailscale Ingress, 9 dashboard ConfigMaps) - Update Loki to bind 0.0.0.0 for k8s pod access via `host.containers.internal` ## Key Changes - `argocd/apps/grafana.yaml` - Grafana Helm chart Application - `argocd/apps/grafana-config.yaml` - Ingress + dashboard ConfigMaps - `argocd/apps/cloudnative-pg.yaml` - Now uses forge mirror instead of external Helm repo - `ansible/roles/loki/templates/loki-config.yaml.j2` - Bind 0.0.0.0 ## Deployment and Testing - [x] Deploy Loki config change: `mise run provision-indri -- --tags loki` - [x] Create namespace: `ki create namespace monitoring` - [x] Create secret: `op inject -i argocd/manifests/grafana-config/secret-admin.yaml.tpl \| ki apply -f -` - [x] Sync ArgoCD apps (grafana, grafana-config) - [x] Verify Grafana works at https://grafana.tail8d86e.ts.net - [x] Remove svc:grafana from ansible tailscale_serve - [x] Stop brew grafana: `ssh indri 'brew services stop grafana'` - [x] Delete ansible grafana role 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/30	2026-01-19 14:40:25 -08:00
Erich Blume	a8f4d00294	K8s Migration Phase 1: Infrastructure Setup (#29 ) ## Summary - Split k8s migration plan into phases folder for easier navigation - Added `tag:k8s` to Pulumi ACLs for Kubernetes workloads - Phase 1 work in progress ## Phase 1 Goals - Tailscale Kubernetes Operator - CloudNativePG Operator - PostgreSQL cluster for future app migrations ## Deployment and Testing - [ ] Review Phase 1 plan - [ ] `mise run tailnet-preview` to verify ACL changes - [ ] `mise run tailnet-up` to apply ACL changes - [ ] Create Tailscale OAuth client (manual) - [ ] Deploy operators and PostgreSQL cluster 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/29	2026-01-19 09:49:52 -08:00
Erich Blume	61dced048b	Fix borgmatic-metrics script PATH issue (#28 ) ## Summary - Fixed borgmatic-metrics script failing in LaunchAgent context - Changed from `mise x -- borg` to absolute paths (`/opt/homebrew/bin/borg`, `/opt/homebrew/bin/jq`) - This fixes the Grafana dashboard showing "DOWN" for Repository Status and missing time series data ## Deployment and Testing - [ ] Run `mise run provision-indri -- --tags borgmatic-metrics` to deploy the fix - [ ] Wait for the hourly metrics collection (or manually run `ssh indri '~/bin/borgmatic-metrics'`) - [ ] Verify Grafana dashboard shows "UP" status and populated graphs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/28	2026-01-18 14:57:35 -08:00
Erich Blume	3679124ebd	Expose Kubernetes API as Tailscale service (Step 0.14) (#27 ) ## Summary - Add `tag:k8s-api` to Pulumi ACLs and indri device tags - Configure Tailscale serve with TCP passthrough for k8s API at `k8s.tail8d86e.ts.net` - Update minikube role to include `k8s.tail8d86e.ts.net` in certificate SANs - Add `apiserver_port` config option (internal port 6443, dynamic host port with podman driver) - Document Step 0.14 in k8s-migration plan (added post-Phase 0 completion) The Kubernetes API is now accessible at `https://k8s.tail8d86e.ts.net` using TCP passthrough to preserve mTLS authentication. ## Deployment and Testing - [x] Pulumi ACLs applied - [x] Tailscale service created and approved in admin console - [x] Minikube cluster recreated with new cert SANs - [x] tailscale serve configured with TCP passthrough - [x] 1Password credentials updated with new certs - [x] Kubeconfig updated on gilbert - [x] `mise run indri-services-check` passes - [x] `kubectl --context=minikube-indri get nodes` works via Tailscale 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/27	2026-01-18 12:49:20 -08:00
Erich Blume	19a82373d5	K8s Migration Phase 0: Foundation Infrastructure (#26 ) ## Summary - Step 0.1: Update Pulumi ACLs with tag:registry - Step 0.3: Create Zot registry ansible role with mcquack LaunchAgent - Step 0.4: Add Zot to Tailscale Serve configuration - Step 0.5: Create Zot metrics role for Prometheus scraping - Step 0.6: Add Zot log collection to Alloy - Step 0.7: Update indri-services-check with zot checks - Step 0.8: Add podman role for container runtime - Step 0.9: Add minikube role for Kubernetes cluster - Step 0.10: Configure remote kubectl access with 1Password credentials ## Remaining Steps - [ ] Step 0.11: Add minikube to indri-services-check - [ ] Step 0.12: Create zettelkasten documentation - [ ] Step 0.13: Verify main playbook (already done - roles added) ## Deployment and Testing - [x] Zot registry deployed and accessible at https://registry.tail8d86e.ts.net - [x] Podman machine running on indri - [x] Minikube cluster running on indri - [x] kubectl access from gilbert working with 1Password credentials - [ ] indri-services-check passes all checks 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/26	2026-01-18 12:06:28 -08:00
Erich Blume	0918764e93	Rename Node Exporter dashboard to macOS (#22 ) ## Summary - Renamed dashboard from "Node Exporter - macOS" to just "macOS" since it now uses Alloy - Updated filename, title, uid, and tags to reflect the change ## Deployment and Testing - [ ] Deploy with `mise run provision-indri -- --tags grafana` - [ ] Verify dashboard accessible at https://grafana.tail8d86e.ts.net 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/22	2026-01-17 09:29:19 -08:00
Erich Blume	3962e5a7de	Fix borgmatic PostgreSQL backup and update backup sources (#21 ) ## Summary - Fix PostgreSQL backup failure by adding explicit `pg_dump_command` path (was failing with "pg_dump: command not found" in LaunchAgent) - Remove `~/code/3rd/kiwix-tools` from backups (was just symlinks to ZIM archives in transmission) - Enable Loki log backup by removing from exclude_patterns ## Deployment and Testing - [x] Dry run with `--check --diff` shows expected changes - [ ] Deploy with `mise run provision-indri -- --tags borgmatic` - [ ] Verify config deployed: `ssh indri 'cat ~/.config/borgmatic/config.yaml'` - [ ] Run manual backup to test: `ssh indri 'mise x -- borgmatic create --verbosity 1'` - [ ] Verify PostgreSQL dump succeeds (no "pg_dump: command not found" error) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/21	2026-01-17 09:22:01 -08:00
Erich Blume	75426be1dc	Remove ansible role meta dependencies to fix duplicate execution (#20 ) ## Summary - Remove all `meta/main.yml` dependencies from ansible roles - Role ordering is now controlled entirely by `indri.yml` playbook - Fix incorrect roles path in CLAUDE.md (`playbooks/roles` → `roles`) ## Why Ansible's tag accumulation behavior prevents proper role deduplication when using meta dependencies. When a role is pulled in as a dependency, the parent role's tags are added to the dependency's tags (e.g., `[loki]` becomes `[alloy, loki]`), making them appear as different invocations to Ansible and causing roles to run multiple times. ## Deployment and Testing - [x] Verified with `ansible-playbook --list-tasks` that each role now appears exactly once - [x] Run full provision to verify no regressions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/20	2026-01-16 22:50:34 -08:00
Erich Blume	9931829d03	Add pre-commit hooks for code quality (#19 ) ## Summary - Add pre-commit framework with hooks for YAML, Ansible, Python, shell, TOML, JSON, and secret detection - Fix all 91+ ansible-lint violations (variable naming, handler capitalization, changed_when) - Fix shellcheck warnings in mise-tasks scripts - Document pre-commit setup in README.md ## Deployment and Testing - [x] All pre-commit hooks pass (`uvx pre-commit run --all-files`) - [x] Test ansible playbook with `--check` mode - [x] Run `mise run indri-services-check` after deploy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/19	2026-01-16 19:33:02 -08:00
Erich Blume	d3d3041b27	Decouple ZIM/torrent ansible tasks for faster provisioning (#18 ) ## Summary - Simplify kiwix role from 213 lines to 151 lines (-30%) - Replace per-archive torrent status loops with single shell command - Decouple kiwix startup from declared inventory - now serves whatever completed ZIM files exist - Fix tailscale_serve role to handle empty JSON in check mode ## Performance improvement - Before: ~132 operations (44 archives × 3 loops for status check, recheck, symlink) - After: ~5 operations (1 shell script + 1 find + conditional symlinks) - Expected reduction: ~3 minutes per ansible run ## Test plan - [x] Ran `mise run provision-indri -- --check --diff` to preview changes - [x] Ran `mise run provision-indri` to apply changes - [x] Ran `mise run indri-services-check` - all services healthy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/18	2026-01-16 15:14:00 -08:00
Erich Blume	812b78bf61	Use explicit PostgreSQL superuser name and fix check mode (#17 ) ## Summary - Add `postgresql_superuser` variable (`eblume`) to prevent PostgreSQL from inheriting OS username during initdb - Update all psql/createdb commands to use explicit `-U` flag - Add `check_mode: false` to op commands so 1Password fetches run during `--check` mode - Add PostgreSQL and Miniflux health checks to indri-services-check ## Test plan - [x] Renamed existing superuser from `erichblume` to `eblume` - [x] Ran `mise run provision-indri -- --tags postgresql --check --diff` successfully - [x] Verified connection as `eblume` superuser via Tailscale - [x] Ran `mise run indri-services-check` - all services healthy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/17	2026-01-16 14:41:36 -08:00
Erich Blume	adf6f4fbe9	Add PostgreSQL and Miniflux services to tailnet (#16 ) ## Summary - Add PostgreSQL 18 as a new service at `pg.tail8d86e.ts.net:5432` - Add Miniflux RSS/Atom feed reader at `feed.tail8d86e.ts.net` - Both services managed via homebrew/brew services - Pulumi ACL tags added (tag:pg, tag:feed) - Alloy log collection configured for both services - Zettelkasten documentation updated ## Manual Setup Required Before running ansible, the following steps are needed on indri: ### 1. Apply Pulumi tags ```bash mise run tailnet-up ``` Then apply tags to indri in Tailscale admin console. ### 2. Create 1Password entries - miniflux PostgreSQL user password - miniflux admin password (for first run) ### 3. Set PostgreSQL user password (after ansible installs postgres) ```bash ssh indri '/opt/homebrew/opt/postgresql@18/bin/psql -c "ALTER USER miniflux PASSWORD '\''your-password'\'';"' ``` ### 4. Create password files on indri ```bash ssh indri 'echo "your-db-password" > ~/.miniflux-db-password && chmod 600 ~/.miniflux-db-password' ssh indri 'echo "your-admin-password" > ~/.miniflux-admin-password && chmod 600 ~/.miniflux-admin-password' ``` ### 5. Create ~/.pgpass for borgmatic ```bash ssh indri 'echo "localhost:5432:miniflux:miniflux:YOUR_PASSWORD" > ~/.pgpass && chmod 600 ~/.pgpass' ``` ### 6. Run ansible with first-run admin creation ```bash mise run provision-indri -- -e miniflux_create_admin=1 ``` ### 7. Update borgmatic config Add to `~/.config/borgmatic/config.yaml` on indri: ```yaml postgresql_databases: - name: miniflux hostname: localhost port: 5432 username: miniflux ``` ### 8. Cleanup after first run ```bash ssh indri 'rm ~/.miniflux-admin-password' ``` ## Test plan - [ ] Run `mise run tailnet-up` and verify Pulumi changes - [ ] Apply tags to indri in Tailscale admin - [ ] Run `mise run provision-indri -- --check --diff` for dry run - [ ] Run `mise run provision-indri -- -e miniflux_create_admin=1` - [ ] Approve services in Tailscale admin - [ ] Verify PostgreSQL: `ssh indri '/opt/homebrew/opt/postgresql@18/bin/pg_isready'` - [ ] Verify Miniflux: `curl https://feed.tail8d86e.ts.net/healthcheck` - [ ] Run `mise run indri-services-check` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/16	2026-01-16 12:30:20 -08:00
Erich Blume	3f4e40f3ae	Add Pulumi for tailnet IaC management (#15 ) ## Summary - Manage tail8d86e.ts.net ACLs, tags, and DNS via Pulumi + Python - State stored in Pulumi Cloud (free tier) to avoid circular dependency - OAuth authentication via 1Password for secure credential management - New mise tasks: `tailnet-preview`, `tailnet-up` ## Architecture Two-layer approach: - Layer 1 (Pulumi): Tailnet-wide config (ACLs, tags, DNS) - Layer 2 (Ansible): Node-local `tailscale serve` config (unchanged) ## Test plan - [x] Exported current ACL from Tailscale API - [x] Imported existing ACL into Pulumi state - [x] Verified `mise run tailnet-preview` shows no changes - [x] Verified `mise run tailnet-up` applies successfully 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/15	2026-01-15 20:55:25 -08:00
Erich Blume	ae1513e7e9	Add Plex Media Server observability (#13 ) ## Summary - Add `plex_metrics` ansible role with textfile collector for Prometheus metrics - Add Plex log collection to Alloy (forwards to Loki) - Add Grafana dashboard for Plex monitoring (status, library counts, sessions, transcoding, logs) ## Metrics Collected - `plex_up` - server health - `plex_version_info` - server version - `plex_sessions_total/playing/paused` - active sessions - `plex_transcode_sessions_total/video/audio` - transcoding status - `plex_library_items{library,type}` - library item counts ## Prerequisites Plex token must be stored at `~/.plex-token` on indri (already done). ## Test plan - [x] Dry-run passed (`mise run provision-indri -- --check --diff`) - [ ] Apply changes (`mise run provision-indri`) - [ ] Verify metrics: `ssh indri 'cat /opt/homebrew/var/node_exporter/textfile/plex.prom'` - [ ] Verify logs in Grafana Explore: `{service="plex"}` - [ ] Check Plex dashboard in Grafana 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/13	2026-01-15 15:27:59 -08:00
Erich Blume	2a1359a3b6	Fix ansible handler timeouts for alloy and loki restarts (#12 ) ## Summary - Use async with poll: 0 for alloy and loki restart handlers - Fire-and-forget approach prevents ansible from hanging on graceful shutdown ## Test plan - [x] Manually verified `brew services restart grafana-alloy` works - [x] Run full ansible playbook and verify it completes without timeout 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/12	2026-01-15 13:56:11 -08:00
Erich Blume	ba5cd75ee2	Fix ansible handler timeouts for alloy and loki restarts Use async with poll: 0 to fire-and-forget service restarts. These services have graceful shutdown periods that can exceed ansible's default command timeout. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-15 12:39:28 -08:00
Erich Blume	242c1880de	Add Grafana Alloy and Loki for unified observability (#11 ) ## Summary - Add Grafana Alloy to replace node_exporter for metrics collection - Add Loki for log aggregation and storage - Configure Alloy to collect logs from all services (grafana, forgejo, prometheus, tailscale, transmission, devpi, kiwix, borgmatic) - Update Prometheus to accept metrics via remote_write - Add Loki datasource to Grafana ## Test plan - [ ] Run \`mise run provision-indri -- --check --diff\` to verify changes - [ ] Apply with \`mise run provision-indri\` - [ ] Verify services: \`mise run indri-services-check\` - [ ] Check Grafana Explore with Loki datasource - [ ] Query logs: \`{service="grafana"}\` - [ ] Verify metrics still flowing to Prometheus dashboards 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/11	2026-01-15 12:24:13 -08:00
Erich Blume	d8a0ef6482	Add devpi PyPI caching proxy role for indri (#9 ) ## Summary - Add ansible role for devpi-server as a transparent PyPI caching proxy - LaunchAgent with KeepAlive runs via `mise x -- devpi-server` - Listens on port 3141, data stored in `~/devpi` - Health checks added to `indri-services-check` script ## Manual Setup Required (on indri, before provisioning) 1. Add to `~/.config/mise/config.toml`: ```toml [tools] "pipx:devpi-server" = "latest" "pipx:devpi-web" = "latest" "pipx:devpi-client" = "latest" ``` 2. Run `mise install` 3. Initialize: `mise x -- devpi-init --serverdir ~/devpi` ## Post-Provisioning - Set up Tailscale service `pypi` on port 443 → 3141 - Configure client pip.conf with index-url ## Test plan - [x] Ansible syntax check passes - [x] Dry-run: `mise run provision-indri -- --check --diff` - [x] Apply: `mise run provision-indri` - [x] Health check: `mise run indri-services-check` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/9	2026-01-15 08:31:09 -08:00
Erich Blume	50c713b5de	Add macOS-compatible Node Exporter Grafana dashboard (#8 ) ## Summary - Adds a new Grafana dashboard for Node Exporter metrics on macOS hosts - Uses macOS-native memory metrics (node_memory_total_bytes, node_memory_active_bytes, etc.) instead of Linux-specific ones - Includes dropdown selectors for instance, disk, and network device filtering ## Details The standard Node Exporter dashboards show "No Data" for memory panels on macOS because they query Linux-specific metrics like `node_memory_MemTotal_bytes`. macOS node_exporter exports different metrics: \| Linux \| macOS \| \|-------\|-------\| \| node_memory_MemTotal_bytes \| node_memory_total_bytes \| \| node_memory_MemFree_bytes \| node_memory_free_bytes \| \| node_memory_Buffers_bytes \| (not available) \| \| node_memory_Cached_bytes \| (not available) \| macOS has unique memory categories: Wired, Active, Compressed, Inactive, Free. ## Test plan - [x] Dashboard deployed to indri via ansible - [x] All panels showing data for indri - [x] Instance selector works to switch between hosts - [x] Disk and network device filters work 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/8	2026-01-14 20:53:57 -08:00
Erich Blume	d9be8c27bc	Add 32 devdocs ZIM archives for programming documentation (#7 ) ## Summary - Adds offline documentation for: bash, c, click, cmake, cpp, css, django-rest-framework, django, docker, duckdb, fish, gcc, git, go, godot, hammerspoon, homebrew, javascript, kubectl, kubernetes, latex, lua, markdown, nginx, nix, postgresql, python, redis, sqlite, typescript, werkzeug, zig - All January 2026 versions from download.kiwix.org/zim/devdocs/ - Downloads via BitTorrent through transmission ## Test plan - [x] Deployed to indri via `mise run provision-indri` - [x] All 32 torrents added and downloaded (small files, completed instantly) - [x] 43 ZIM files now available in kiwix directory 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/7	2026-01-14 18:28:34 -08:00
Erich Blume	10012a4cf2	Add upload/download ratio and period transfer panels to Transmission dashboard (#6 ) ## Summary - Adds Upload/Download Ratio stat panel with color thresholds (red < 0.5, yellow < 1, green >= 1) - Adds Downloaded (Period) stat panel showing bytes downloaded in selected time range - Adds Uploaded (Period) stat panel showing bytes uploaded in selected time range Uses PromQL `increase()` on existing counter metrics - no new metrics collection needed. ## Test plan - [x] Deployed to indri via `mise run provision-indri` - [x] Grafana restarted successfully 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/6	2026-01-14 18:08:39 -08:00
Erich Blume	2f28b151f5	Fix launchctl idempotency in kiwix and borgmatic roles Check if LaunchAgent is already loaded before attempting to load it. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 14:14:52 -08:00
Erich Blume	e534e59556	Add provision-indri mise task and fix idempotency - Add mise-tasks/provision-indri script to run ansible playbook - Fix transmission_metrics launchctl load to be idempotent - Update CLAUDE.md to reference mise run provision-indri Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 14:10:30 -08:00
Erich Blume	e264b39cd6	Add total torrent size metric and dashboard panel - Query torrent-get RPC to sum totalSize of all torrents - Add transmission_torrents_size_bytes gauge metric - Add "Total Torrent Size" timeseries panel to dashboard Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 14:00:52 -08:00
Erich Blume	0afd34590d	Fix transmission-metrics session ID parsing Transmission doesn't support HEAD requests, so use -i flag with sed to parse only the HTTP headers (stopping at the blank line before body). Also anchor grep pattern to line start to avoid matching HTML content. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 13:56:08 -08:00
Erich Blume	7468023cd2	Add transmission dashboard to grafana - Add node_exporter ansible role to enable textfile collector - Add transmission_metrics role with script and LaunchAgent - Collects metrics every 60s via transmission RPC - Writes to /opt/homebrew/var/node_exporter/textfile/transmission.prom - Update grafana role to provision dashboards from files - Add transmission.json dashboard with: - Status indicator, torrent counts - Transfer speeds, cumulative stats - Time series graphs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 13:46:51 -08:00
Erich Blume	eb2f5b44cd	Fix kiwix plist to only include available ZIM archives The LaunchAgent plist now dynamically includes only ZIM files that actually exist in the kiwix directory, rather than all configured archives. This prevents kiwix-serve from crashing when torrents are still downloading. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 13:17:55 -08:00
Erich Blume	3847c12b42	Fix transmission config to prevent perpetual ansible diffs - Expand settings.json template to include all transmission defaults - Use static pre-hashed rpc-password so transmission doesn't regenerate - Change file mode from 0644 to 0600 to match transmission's default - Add Jinja comment explaining the RPC password workaround Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 13:03:46 -08:00
Erich Blume	4add1684c3	Enable additional ZIM archives for kiwix New archives (~95G total): - Project Gutenberg 2023 (72G) - 60,000+ public domain books - iFixit (3.3G) - Repair guides - Stack Exchange: SuperUser (3.7G), Math (6.9G) - LibreTexts: Biology, Chemistry, Engineering, Mathematics, Physics, Humanities Also: - Fix transmission to only restart when config changes - Update CLAUDE.md to use full ansible paths Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 12:47:54 -08:00
Erich Blume	6bb5f323d6	Fix transmission config path to use homebrew location - Homebrew's transmission-cli service uses /opt/homebrew/var/transmission/ not ~/.config/transmission-daemon/ - Add task to clean up old config directory - Update zettelkasten with correct paths Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 11:59:24 -08:00
Erich Blume	b865d70456	Add transmission role for torrent-based ZIM downloads - Add new transmission ansible role using homebrew + brew services - Configure transmission to download to ~/transmission with localhost-only RPC - Modify kiwix role to use transmission for downloading ZIM archives via BitTorrent - Add role dependency so running --tags kiwix auto-runs transmission - Keep fallback to direct HTTP download when kiwix_use_transmission: false - Symlink completed downloads from transmission dir to kiwix-tools dir This reduces load on kiwix.org servers and allows downloads to continue in the background without blocking ansible runs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 11:59:24 -08:00
Erich Blume	4b365a1294	Fix borgmatic LaunchAgent to work with mise-installed binaries The LaunchAgent was failing because launchd runs with a minimal PATH that doesn't include mise-installed binaries or homebrew. This adds: - Use `mise x` wrapper to run borgmatic (survives version updates) - Add /opt/homebrew/bin to PATH for borg dependency - Add ansible tags to indri playbook for targeted role runs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 10:33:48 -08:00
Erich Blume	4283f2237d	Add grafana datasource provisioning and update workflow docs - Configure grafana to use provisioned datasources instead of UI config - Add prometheus datasource template managed by ansible - Create minimal grafana.ini with custom provisioning path - Move ansible_managed to group_vars (fixes deprecation warning) - Add Remote Hosts and Git Workflow sections to CLAUDE.md - Document feature branch workflow with tea CLI for PRs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 07:23:10 -08:00
Erich Blume	6995a2d27d	Manual update to CLAUDE and moved mise.toml	2026-01-14 06:45:19 -08:00
Erich Blume	d1396b1cfb	Add forgejo role to ansible playbook Manages installation and service via homebrew. Config at /opt/homebrew/var/forgejo/custom/conf/app.ini contains secrets and is not templated - backed up by borgmatic instead. Includes check that fails with restore instructions if config missing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-13 23:00:46 -08:00

1 2

57 commits