- Add proxy-class annotation to use default ProxyClass
- Fixes CRI-O image name resolution issue
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Expose k8s-pg.tail8d86e.ts.net for testing during migration
- Temporary service until Phase 4 when pg.tail8d86e.ts.net switches
- Update README with connection info and cleanup notes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create blumeops-pg Cluster with CloudNativePG
- Add eblume superuser role (matches current brew pg setup)
- Configure pg_hba for password auth from any IP (Tailscale handles security)
- Add secret template for eblume password from 1Password
- Create ArgoCD Application with manual sync policy
- Update Phase 1 plan with implementation notes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Configures minikube's CRI-O to use zot on indri as a pull-through cache
for docker.io, ghcr.io, and quay.io. Uses host.containers.internal:5050
which is stable across restarts.
This reduces external bandwidth, speeds up pulls, and provides resilience
if upstream registries are unreachable.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add CloudNativePG Application using multi-source Helm pattern
- Helm chart from upstream cloudnative-pg repo
- Values file from our git repo for customization
- Manual sync policy consistent with other workload apps
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add argocd CLI to Brewfile
- Create argocd.yaml for ArgoCD self-management (manual sync)
- Create apps.yaml app-of-apps root (auto-sync for Application resources)
- Convert tailscale-operator to kustomize
- Update READMEs with bootstrap steps and ArgoCD CLI commands
- Change all workload Applications to manual sync policy
App-of-apps auto-syncs to pick up new Application manifests, but child
apps require manual sync for actual workload deployments.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds ArgoCD Application to manage Tailscale operator from forge:
- ArgoCD Application sourced from internal Forgejo via SSH
- DNS config for cluster-to-tailnet name resolution
- Egress proxy for accessing forge on indri
- ACL grants for k8s workloads to reach forge (ports 3001, 2200)
- Template for repository secret with 1Password SSH key reference
Key discovery: 1Password op read requires ?ssh-format=openssh parameter
to get keys in OpenSSH format that ArgoCD's SSH client can read.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Uses kustomize with remote base from upstream ArgoCD
- Adds Tailscale LoadBalancer service for external access
- Exposed at https://argocd.tail8d86e.ts.net
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CRI-O cannot resolve short image names like 'tailscale/tailscale:stable'.
The ProxyClass 'default' sets fully-qualified image references.
Services must use annotation: tailscale.com/proxy-class: "default"
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ACL changes:
- Added tag:k8s-operator for the Tailscale K8s Operator
- Made tag:k8s-operator an owner of tag:k8s so the operator can
assign that tag to resources it creates
Phase 1 plan updates:
- Added Kubernetes Tags Overview section explaining all three tags
- Expanded OAuth client creation instructions
- Added 1Password storage instructions
- Added verification and rollback sections
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added:
- tag:k8s to tagOwners for k8s workload management
- Grant for tag:k8s -> tag:registry access (for CI pushing images)
- ACL test case for k8s registry access
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reorganized the monolithic migration plan into separate files:
- 00_overview.md: Architecture, technical decisions, shared info
- P0_foundation.complete.md: Phase 0 (complete)
- P1_k8s_infrastructure.md: Phase 1 (in progress)
- P2-P9: Remaining phases (pending)
This makes the plan easier to navigate and track progress.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Fixed borgmatic-metrics script failing in LaunchAgent context
- Changed from `mise x -- borg` to absolute paths (`/opt/homebrew/bin/borg`, `/opt/homebrew/bin/jq`)
- This fixes the Grafana dashboard showing "DOWN" for Repository Status and missing time series data
## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags borgmatic-metrics` to deploy the fix
- [ ] Wait for the hourly metrics collection (or manually run `ssh indri '~/bin/borgmatic-metrics'`)
- [ ] Verify Grafana dashboard shows "UP" status and populated graphs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/28
## Summary
- Add `tag:k8s-api` to Pulumi ACLs and indri device tags
- Configure Tailscale serve with TCP passthrough for k8s API at `k8s.tail8d86e.ts.net`
- Update minikube role to include `k8s.tail8d86e.ts.net` in certificate SANs
- Add `apiserver_port` config option (internal port 6443, dynamic host port with podman driver)
- Document Step 0.14 in k8s-migration plan (added post-Phase 0 completion)
The Kubernetes API is now accessible at `https://k8s.tail8d86e.ts.net` using TCP passthrough to preserve mTLS authentication.
## Deployment and Testing
- [x] Pulumi ACLs applied
- [x] Tailscale service created and approved in admin console
- [x] Minikube cluster recreated with new cert SANs
- [x] tailscale serve configured with TCP passthrough
- [x] 1Password credentials updated with new certs
- [x] Kubeconfig updated on gilbert
- [x] `mise run indri-services-check` passes
- [x] `kubectl --context=minikube-indri get nodes` works via Tailscale
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/27
## Summary
- Replace permissive wildcard ACL (`*` -> `*`) with specific service grants
- Admin: full access to all services including NAS
- Member: user-facing services only (no Grafana/Loki/NAS)
- Add device tagging for gilbert (workstation) and sifaka (NAS) via Pulumi
- SSH hardening: remove root access, use "check" action with MFA
- Add ACL tests to validate policy behavior
## Deployment and Testing
- [x] Pulumi preview passes
- [x] HuJSON syntax validated
- [x] ACL tests defined and passing
- [ ] Deploy with `mise run tailnet-up`
- [ ] Verify SSH access from gilbert to indri
- [ ] Verify Allison cannot access Grafana/Loki/NAS
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/23
## Summary
- Remove all `meta/main.yml` dependencies from ansible roles
- Role ordering is now controlled entirely by `indri.yml` playbook
- Fix incorrect roles path in CLAUDE.md (`playbooks/roles` → `roles`)
## Why
Ansible's tag accumulation behavior prevents proper role deduplication when using meta dependencies. When a role is pulled in as a dependency, the parent role's tags are added to the dependency's tags (e.g., `[loki]` becomes `[alloy, loki]`), making them appear as different invocations to Ansible and causing roles to run multiple times.
## Deployment and Testing
- [x] Verified with `ansible-playbook --list-tasks` that each role now appears exactly once
- [x] Run full provision to verify no regressions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/20
## Summary
- Simplify kiwix role from 213 lines to 151 lines (-30%)
- Replace per-archive torrent status loops with single shell command
- Decouple kiwix startup from declared inventory - now serves whatever completed ZIM files exist
- Fix tailscale_serve role to handle empty JSON in check mode
## Performance improvement
- **Before**: ~132 operations (44 archives × 3 loops for status check, recheck, symlink)
- **After**: ~5 operations (1 shell script + 1 find + conditional symlinks)
- Expected reduction: ~3 minutes per ansible run
## Test plan
- [x] Ran `mise run provision-indri -- --check --diff` to preview changes
- [x] Ran `mise run provision-indri` to apply changes
- [x] Ran `mise run indri-services-check` - all services healthy
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/18
## Summary
- Add `postgresql_superuser` variable (`eblume`) to prevent PostgreSQL from inheriting OS username during initdb
- Update all psql/createdb commands to use explicit `-U` flag
- Add `check_mode: false` to op commands so 1Password fetches run during `--check` mode
- Add PostgreSQL and Miniflux health checks to indri-services-check
## Test plan
- [x] Renamed existing superuser from `erichblume` to `eblume`
- [x] Ran `mise run provision-indri -- --tags postgresql --check --diff` successfully
- [x] Verified connection as `eblume` superuser via Tailscale
- [x] Ran `mise run indri-services-check` - all services healthy
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/17
## Summary
- Manage tail8d86e.ts.net ACLs, tags, and DNS via Pulumi + Python
- State stored in Pulumi Cloud (free tier) to avoid circular dependency
- OAuth authentication via 1Password for secure credential management
- New mise tasks: `tailnet-preview`, `tailnet-up`
## Architecture
Two-layer approach:
- **Layer 1 (Pulumi)**: Tailnet-wide config (ACLs, tags, DNS)
- **Layer 2 (Ansible)**: Node-local `tailscale serve` config (unchanged)
## Test plan
- [x] Exported current ACL from Tailscale API
- [x] Imported existing ACL into Pulumi state
- [x] Verified `mise run tailnet-preview` shows no changes
- [x] Verified `mise run tailnet-up` applies successfully
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/15
## Summary
- Add `mise run blumeops-tasks` to fetch and display tasks from Todoist
- Uses uv run script with inline dependencies (httpx, rich)
- Fetches API credential securely via 1Password CLI
- Sorts tasks by custom priority order: p1, p2, p4, p3 (backlog last)
- Documents the task discovery workflow in CLAUDE.md
## Test plan
- [x] Verified `mise run blumeops-tasks` fetches and displays tasks correctly
- [x] Confirmed priority sorting works as expected
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/14
## Summary
- Use async with poll: 0 for alloy and loki restart handlers
- Fire-and-forget approach prevents ansible from hanging on graceful shutdown
## Test plan
- [x] Manually verified `brew services restart grafana-alloy` works
- [x] Run full ansible playbook and verify it completes without timeout
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/12
Use async with poll: 0 to fire-and-forget service restarts.
These services have graceful shutdown periods that can exceed
ansible's default command timeout.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Add `mise run zk-docs` task to concatenate all blumeops-tagged zettelkasten cards
- Main project card is shown first, followed by service management logs
- Uses `bat` for output (added to Brewfile)
- Args are passed through to bat for custom formatting
- Update CLAUDE.md to use zk-docs command with plain output options
- Update README.md to note zettelkasten is private with contact email
## Test plan
- [x] `mise run zk-docs` displays all 6 blumeops cards
- [x] `mise run zk-docs -- --style=header --color=never --decorations=always` shows filenames without decoration
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/10
## Summary
- Add ansible role for devpi-server as a transparent PyPI caching proxy
- LaunchAgent with KeepAlive runs via `mise x -- devpi-server`
- Listens on port 3141, data stored in `~/devpi`
- Health checks added to `indri-services-check` script
## Manual Setup Required (on indri, before provisioning)
1. Add to `~/.config/mise/config.toml`:
```toml
[tools]
"pipx:devpi-server" = "latest"
"pipx:devpi-web" = "latest"
"pipx:devpi-client" = "latest"
```
2. Run `mise install`
3. Initialize: `mise x -- devpi-init --serverdir ~/devpi`
## Post-Provisioning
- Set up Tailscale service `pypi` on port 443 → 3141
- Configure client pip.conf with index-url
## Test plan
- [x] Ansible syntax check passes
- [x] Dry-run: `mise run provision-indri -- --check --diff`
- [x] Apply: `mise run provision-indri`
- [x] Health check: `mise run indri-services-check`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/9
## Summary
- Adds a new Grafana dashboard for Node Exporter metrics on macOS hosts
- Uses macOS-native memory metrics (node_memory_total_bytes, node_memory_active_bytes, etc.) instead of Linux-specific ones
- Includes dropdown selectors for instance, disk, and network device filtering
## Details
The standard Node Exporter dashboards show "No Data" for memory panels on macOS because they query Linux-specific metrics like `node_memory_MemTotal_bytes`. macOS node_exporter exports different metrics:
| Linux | macOS |
|-------|-------|
| node_memory_MemTotal_bytes | node_memory_total_bytes |
| node_memory_MemFree_bytes | node_memory_free_bytes |
| node_memory_Buffers_bytes | (not available) |
| node_memory_Cached_bytes | (not available) |
macOS has unique memory categories: Wired, Active, Compressed, Inactive, Free.
## Test plan
- [x] Dashboard deployed to indri via ansible
- [x] All panels showing data for indri
- [x] Instance selector works to switch between hosts
- [x] Disk and network device filters work
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/8
## Summary
- Adds Upload/Download Ratio stat panel with color thresholds (red < 0.5, yellow < 1, green >= 1)
- Adds Downloaded (Period) stat panel showing bytes downloaded in selected time range
- Adds Uploaded (Period) stat panel showing bytes uploaded in selected time range
Uses PromQL `increase()` on existing counter metrics - no new metrics collection needed.
## Test plan
- [x] Deployed to indri via `mise run provision-indri`
- [x] Grafana restarted successfully
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/6
- Add mise-tasks/provision-indri script to run ansible playbook
- Fix transmission_metrics launchctl load to be idempotent
- Update CLAUDE.md to reference mise run provision-indri
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Query torrent-get RPC to sum totalSize of all torrents
- Add transmission_torrents_size_bytes gauge metric
- Add "Total Torrent Size" timeseries panel to dashboard
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Transmission doesn't support HEAD requests, so use -i flag with sed to
parse only the HTTP headers (stopping at the blank line before body).
Also anchor grep pattern to line start to avoid matching HTML content.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>