## Summary
- Update all references from `registry.tail8d86e.ts.net` to `registry.ops.eblu.me`
- Remove `tailscale_serve` ansible role (no longer needed - all services migrated to Caddy)
- Update minikube containerd config for new registry URL
- Update devpi manifest, CI actions, and mise tasks
## Deployment and Testing
- [ ] Run `mise run provision-indri -- --check --diff` (dry run)
- [ ] Run `mise run provision-indri -- --tags minikube` to update containerd config
- [ ] Sync devpi ArgoCD app: `argocd app sync devpi`
- [ ] Manually remove old Tailscale serve entry: `ssh indri 'tailscale serve --service=svc:registry off'`
- [ ] Test registry access: `curl https://registry.ops.eblu.me/v2/_catalog`
- [ ] Run `mise run indri-services-check` to verify all services healthy
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/58
## Summary
- Add layer4 TCP proxy configuration to Caddyfile template for SSH services
- Configure Forgejo SSH on port 2222 → localhost:2200
- Switch HTTPS from port 8443 (testing) to 443 (production)
- Requires Caddy rebuilt with `github.com/mholt/caddy-l4` plugin
## What This Enables
Git+SSH access via `forge.ops.eblu.me:2222` is now accessible from:
- Tailnet clients (gilbert)
- Docker containers on indri
- Kubernetes pods in minikube
This solves the DNS resolution issues where containers couldn't reach Tailscale MagicDNS names.
## Testing Done
- [x] Caddy rebuilt with layer4 plugin
- [x] Validated Caddyfile syntax
- [x] Cleared `svc:forge` from tailscale serve
- [x] Verified HTTPS works: `curl https://forge.ops.eblu.me`
- [x] Verified SSH works: `ssh -p 2222 forgejo@forge.ops.eblu.me`
- [x] Verified git clone works via new endpoint
- [x] Verified minikube pods can reach both HTTPS and SSH endpoints
## Deployment
Caddy is already running with the new config on indri. This PR captures the ansible changes.
## Next Steps
- Update zk docs with new git remote format
- Migrate registry and other services to Caddy
- Retire tailscale_services ansible role
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/56
Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack.
Summary
- Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal)
- Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses
- Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics
- Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net)
- Add ACL rule for port 9187 (CNPG metrics)
- Delete obsolete ansible roles for prometheus and loki
Changes
- argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress
- argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress
- argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications
- argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS
- argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint
- argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics
- ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints
- pulumi/policy.hujson - ACL for port 9187
- Deleted ansible/roles/prometheus/ and ansible/roles/loki/
Deployment and Testing
- Stop prometheus and loki on indri
- Sync ArgoCD apps (apps, prometheus, loki, grafana)
- Run mise run provision-indri -- --tags alloy
- Verify Grafana dashboards show data
🤖 Generated with https://claude.ai/claude-code
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42
## Summary
- Add `postgresql_superuser` variable (`eblume`) to prevent PostgreSQL from inheriting OS username during initdb
- Update all psql/createdb commands to use explicit `-U` flag
- Add `check_mode: false` to op commands so 1Password fetches run during `--check` mode
- Add PostgreSQL and Miniflux health checks to indri-services-check
## Test plan
- [x] Renamed existing superuser from `erichblume` to `eblume`
- [x] Ran `mise run provision-indri -- --tags postgresql --check --diff` successfully
- [x] Verified connection as `eblume` superuser via Tailscale
- [x] Ran `mise run indri-services-check` - all services healthy
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/17
## Summary
- Add ansible role for devpi-server as a transparent PyPI caching proxy
- LaunchAgent with KeepAlive runs via `mise x -- devpi-server`
- Listens on port 3141, data stored in `~/devpi`
- Health checks added to `indri-services-check` script
## Manual Setup Required (on indri, before provisioning)
1. Add to `~/.config/mise/config.toml`:
```toml
[tools]
"pipx:devpi-server" = "latest"
"pipx:devpi-web" = "latest"
"pipx:devpi-client" = "latest"
```
2. Run `mise install`
3. Initialize: `mise x -- devpi-init --serverdir ~/devpi`
## Post-Provisioning
- Set up Tailscale service `pypi` on port 443 → 3141
- Configure client pip.conf with index-url
## Test plan
- [x] Ansible syntax check passes
- [x] Dry-run: `mise run provision-indri -- --check --diff`
- [x] Apply: `mise run provision-indri`
- [x] Health check: `mise run indri-services-check`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/9
- Create mise-tasks/indri-services-check script
- Checks all indri services (prometheus, grafana, kiwix, transmission, forgejo)
- Verifies both local service status and HTTP endpoints
- Transmission RPC checked via SSH since it's localhost-only (secure)
- Update CLAUDE.md with instructions to run after service changes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>