## Summary
- Update all references from `registry.tail8d86e.ts.net` to `registry.ops.eblu.me`
- Remove `tailscale_serve` ansible role (no longer needed - all services migrated to Caddy)
- Update minikube containerd config for new registry URL
- Update devpi manifest, CI actions, and mise tasks
## Deployment and Testing
- [ ] Run `mise run provision-indri -- --check --diff` (dry run)
- [ ] Run `mise run provision-indri -- --tags minikube` to update containerd config
- [ ] Sync devpi ArgoCD app: `argocd app sync devpi`
- [ ] Manually remove old Tailscale serve entry: `ssh indri 'tailscale serve --service=svc:registry off'`
- [ ] Test registry access: `curl https://registry.ops.eblu.me/v2/_catalog`
- [ ] Run `mise run indri-services-check` to verify all services healthy
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/58
## Summary
- Update CLAUDE.md with new service routing documentation
- Document the two DNS domains: `*.ops.eblu.me` (Caddy) vs `*.tail8d86e.ts.net` (Tailscale)
- Fix incorrect service listings (Prometheus/Loki are in k8s, not indri)
## ZK Updates (not in this PR)
Also updated the blumeops zk card with:
- Source code URL (forge is primary, GitHub is mirror)
- Services split into Caddy vs Tailscale sections
- Updated port map for Caddy
- Updated "Adding a New Service" instructions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.ops.eblu.me/eblume/blumeops/pulls/57
## Summary
- Add layer4 TCP proxy configuration to Caddyfile template for SSH services
- Configure Forgejo SSH on port 2222 → localhost:2200
- Switch HTTPS from port 8443 (testing) to 443 (production)
- Requires Caddy rebuilt with `github.com/mholt/caddy-l4` plugin
## What This Enables
Git+SSH access via `forge.ops.eblu.me:2222` is now accessible from:
- Tailnet clients (gilbert)
- Docker containers on indri
- Kubernetes pods in minikube
This solves the DNS resolution issues where containers couldn't reach Tailscale MagicDNS names.
## Testing Done
- [x] Caddy rebuilt with layer4 plugin
- [x] Validated Caddyfile syntax
- [x] Cleared `svc:forge` from tailscale serve
- [x] Verified HTTPS works: `curl https://forge.ops.eblu.me`
- [x] Verified SSH works: `ssh -p 2222 forgejo@forge.ops.eblu.me`
- [x] Verified git clone works via new endpoint
- [x] Verified minikube pods can reach both HTTPS and SSH endpoints
## Deployment
Caddy is already running with the new config on indri. This PR captures the ansible changes.
## Next Steps
- Update zk docs with new git remote format
- Migrate registry and other services to Caddy
- Retire tailscale_services ansible role
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/56
## Summary
- Add Caddy ansible role following zot pattern (manual build, ansible deploy)
- Caddy built with Gandi DNS plugin for ACME DNS-01 challenges
- Gandi PAT fetched from 1Password and written to secured file on indri
- Configure wildcard TLS for `*.ops.eblu.me`
- Initial services: forge, registry (indri-local)
- Uses port 8443 during testing to avoid Tailscale serve conflicts
## Build Instructions (already done)
On indri:
```bash
cd ~/code/3rd/caddy && mise run build
```
## Deployment and Testing
- [ ] Review Caddyfile configuration
- [ ] Run `mise run provision-indri -- --tags caddy` to deploy
- [ ] Test: `curl -v https://forge.ops.eblu.me:8443` (should get TLS cert)
- [ ] Test: `curl -v https://registry.ops.eblu.me:8443/v2/` (should return `{}`)
- [ ] Once verified, switch to port 443 and migrate services from Tailscale serve
## Files Changed
- `ansible/playbooks/indri.yml` - Add pre_task for Gandi PAT, add caddy role
- `ansible/roles/caddy/` - New role with Caddyfile and LaunchAgent templates
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/55
## Summary
- Restructure Pulumi into separate projects: `pulumi/tailscale/` and `pulumi/gandi/`
- Add Gandi LiveDNS management for `eblu.me` domain
- Create wildcard DNS record `*.ops.eblu.me` → indri's Tailscale IP (100.98.163.89)
- Add mise tasks: `dns-up`, `dns-preview`
- Update `tailnet-up` to pass `--yes` by default
- Document PAT cycling process (expires every 30 days)
## Background
This enables using real DNS names (`*.ops.eblu.me`) that resolve to Tailscale IPs,
which allows containers and other systems to resolve services without depending on
MagicDNS. Since Tailscale IPs (100.x.x.x) are not publicly routable, services remain
tailnet-only while using standard DNS.
## Deployment and Testing
- [ ] Run `cd pulumi/gandi && uv sync` to install dependencies
- [ ] Run `cd pulumi/gandi && pulumi stack init eblu-me` to create stack
- [ ] Run `mise run dns-preview` to verify configuration
- [ ] Run `mise run dns-up` to apply DNS records
- [ ] Verify with `dig +short test.ops.eblu.me` returns `100.98.163.89`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/54
Use simple grep and awk to parse plain text tailscale status output
instead of trying to parse JSON. Also show the status output for
debugging.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use 'tailscale status' to get indri's Tailscale IP and add it to
/etc/hosts for registry hostname resolution. The registry service
runs on indri, so we need indri's IP specifically.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Tailscale container's DNS doesn't work because it runs in userspace
mode. Instead, resolve the registry IP using 'tailscale ip' and add it
to /etc/hosts inside the container before running skopeo.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add tag:ci-gateway to tagOwners
- Grant ci-gateway access to registry on port 443
- Add test for ci-gateway -> registry access
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Docker Desktop's VM can't resolve tailnet hostnames. Work around this by:
1. Starting a Tailscale container that joins the tailnet
2. Building the image with docker build
3. Saving to tarball with docker save
4. Pushing via skopeo inside the Tailscale container
Uses TS_CI_GATEWAY_AUTHKEY repository secret for authentication.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fetches logs for Forgejo Actions runs from indri's local storage.
Logs are stored as zstd-compressed files in the forgejo data directory.
Usage: mise run indri-runner-logs <run_id>
Only works for runs executed by the indri-host-runner.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Add `containers/nettest/` with Alpine-based Dockerfile and connectivity test script
- Add `.forgejo/workflows/build-nettest.yaml` workflow triggered by `nettest-v*` tags
- Test script checks DNS resolution and HTTPS connectivity to forge and registry
## Deployment and Testing
- [ ] Merge PR to main
- [ ] Run `mise run container-release nettest v0.1.0` to trigger first build
- [ ] Verify workflow runs successfully and container can reach tailnet services
- [ ] Manually test from minikube: `kubectl run nettest --rm -it --image=registry.tail8d86e.ts.net/blumeops/nettest:v0.1.0`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/52
## Summary
- Replace Docker with Buildah for container image builds
- No Docker socket required - buildah is daemonless
- Cleaner security model (no privileged containers or socket mounting)
- Remove Docker-related security context from deployment
## Changes
- Update Dockerfile to install buildah/podman instead of docker-cli
- Configure buildah storage with overlay driver and fuse-overlayfs
- Update composite action to use `buildah bud` and `buildah push`
- Add `imagePullPolicy: Always` to ensure fresh image pulls
- Update test workflow to verify buildah/podman
## Testing
- [ ] Runner pod starts successfully
- [ ] Buildah is available in runner
- [ ] Test workflow verifies buildah/podman versions
- [ ] Container build workflow builds and pushes to zot
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/51
## Summary
- Fix workflow to use `github.*` context variables (Forgejo schema validator only recognizes GitHub Actions syntax, not `gitea.*` aliases)
- Pass untrusted inputs through environment variables (security best practice per actionlint)
- Add actionlint to Brewfile and pre-commit config to catch workflow validation errors locally
## Deployment and Testing
- [x] Pre-commit hooks all pass
- [x] actionlint validates `.forgejo/workflows/test.yaml` successfully
- [ ] Verify workflow runs without errors on Forge after merge
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/49
- Mark Phase 1 (Enable Actions) as completed with date
- Check off all verification items in P1
- Add Step 6 to Phase 4 for runner logging and metrics
- Update overview table with status column
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The stage.match selector wasn't preventing Alloy from logging decode
errors internally. Removing logfmt parsing entirely - JSON parsing
handles most structured logs, and plain text logs still get collected.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Use `stage.match` to conditionally apply logfmt parsing only to lines that don't start with `{`
- This prevents error spam like `"failed to decode logfmt" component_path=/ component_id=loki.process.pods component=stage type=logfmt err="logfmt syntax error at pos 2 on line 1: unexpected '\"'"` when JSON-formatted logs hit the logfmt parser
## Deployment and Testing
- [ ] Sync alloy-k8s app to feature branch and verify errors stop appearing
- [ ] Verify JSON logs are still parsed correctly
- [ ] Verify logfmt logs (from Loki, Prometheus etc.) are still parsed correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/46
LaunchDaemons run in the system domain and require sudo to query.
Without become: true, the check always fails and tries to reload.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack.
Summary
- Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal)
- Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses
- Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics
- Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net)
- Add ACL rule for port 9187 (CNPG metrics)
- Delete obsolete ansible roles for prometheus and loki
Changes
- argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress
- argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress
- argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications
- argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS
- argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint
- argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics
- ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints
- pulumi/policy.hujson - ACL for port 9187
- Deleted ansible/roles/prometheus/ and ansible/roles/loki/
Deployment and Testing
- Stop prometheus and loki on indri
- Sync ArgoCD apps (apps, prometheus, loki, grafana)
- Run mise run provision-indri -- --tags alloy
- Verify Grafana dashboards show data
🤖 Generated with https://claude.ai/claude-code
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42
## Summary
- Remove ansible roles for services migrated to k8s: devpi, kiwix, transmission
- Also remove unused node_exporter and podman ansible roles
- Remove service tags from indri for k8s-hosted services (grafana, kiwix, devpi, pg, feed)
- Update indri description to reflect current architecture
## Changes
**Ansible roles removed** (34 files, ~1000 lines):
- devpi, devpi_metrics
- kiwix
- transmission, transmission_metrics
- node_exporter
- podman
**Pulumi indri tags removed**:
- tag:grafana, tag:kiwix, tag:devpi, tag:pg, tag:feed
These services now run in k8s with their own Tailscale devices via tailscale-operator.
## Deployment and Testing
- [x] Verified remaining ansible roles match indri.yml
- [x] Verified no playbooks or role dependencies reference removed roles
- [ ] Run `pulumi preview` to verify tag changes
- [ ] Run `pulumi up` to apply tag changes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/41
## Summary
- New `pr-comments` mise task queries Forge API for unresolved review comments on a PR
- Task takes a PR number as argument and displays all comments without a resolver
- Updated CLAUDE.md to include using this task after user reviews PRs
## Deployment and Testing
- [x] Tested task on PR #39 (shows no unresolved comments since all were resolved)
- [x] Tested error handling with non-existent PR #9999🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/40
## Summary
- Add Transmission BitTorrent daemon to k8s (torrent namespace)
- Add Kiwix ZIM archive server to k8s (kiwix namespace)
- NFS storage from sifaka for shared torrent/ZIM data
- Torrent-sync sidecar in kiwix deployment to manage declarative ZIM list
- ZIM-watcher CronJob to auto-restart kiwix when new archives appear
- Remove transmission, transmission_metrics, and kiwix ansible roles from indri
- Remove svc:kiwix from tailscale_serve defaults
## Key Decisions
- Direct NFS mount for kiwix (no PVC) since it shares storage with transmission
- Shell wrapper for kiwix-serve command (glob expansion)
- Accept HTTP 409 as "ready" in torrent sync (transmission session ID mechanism)
- Completed downloads stored in `/downloads/complete/` on sifaka
## Deployment and Testing
- [x] Deployed transmission to k8s
- [x] Verified transmission web UI at torrent.tail8d86e.ts.net
- [x] Moved existing ZIM files to complete folder
- [x] Deployed kiwix to k8s
- [x] Verified kiwix web UI at kiwix.tail8d86e.ts.net
- [x] Stopped old services on indri
- [x] Cleared svc:kiwix from Tailscale serve on indri
- [x] Updated zk documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/39
## Summary
- Migrate minikube from podman driver to qemu2 driver for proper NFS/SMB volume mount support
- Update ansible minikube role with qemu installation and containerd runtime
- Remove podman role dependency from indri.yml
- Add synology user creation steps and post-migration zot reconfiguration notes
## Why
Phase 6 (Kiwix/Transmission migration) was blocked because the podman driver lacks kernel capabilities for filesystem mounts. QEMU2 creates an actual VM with full mount support.
## Deployment and Testing
- [ ] Create k8s-storage user on Synology DSM
- [ ] Store credentials in 1Password (synology-k8s-storage)
- [ ] Export current k8s state
- [ ] Stop and delete podman-based minikube cluster
- [ ] Run ansible to create QEMU2 cluster
- [ ] Test NFS volume mount with test pod
- [ ] Redeploy ArgoCD and all apps
- [ ] Verify all services healthy
- [ ] Reconfigure zot registry mirrors for containerd (post-migration)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/38
## Summary
- Detailed planning document for Phase 6 of k8s migration
- Transmission as standalone general-purpose torrent service with web UI at torrent.tail8d86e.ts.net
- NFS storage on sifaka (/volume1/torrents) shared between both services
- Declarative ZIM torrent list in kiwix's ConfigMap, synced to transmission via sidecar
- ZIM watcher CronJob for automatic kiwix restart when new archives complete
- Supports both GitOps (declarative) and interactive (web UI) torrent management
## Architecture Highlights
- **torrent namespace**: Standalone transmission with Tailscale ingress
- **kiwix namespace**: kiwix-serve with torrent-sync sidecar
- **Shared NFS PV**: Single PV referenced by PVCs in both namespaces
- **No backup needed**: Sifaka is RAID 5/6 and already the backup target
## Deployment and Testing
- [ ] Review plan document
- [ ] Verify NFS export on sifaka is feasible
- [ ] Approve architecture decisions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/35
Updated P3_postgresql.complete.md with full implementation notes including:
- borgmatic borg path fix
- Disaster recovery testing
- CloudNativePG managed roles for borgmatic user
- Dual database backup configuration
- ACL grant for homelab → k8s
- ArgoCD selfHeal disabled for feature branch workflow
- CNPG default values to prevent drift
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CloudNativePG operator fills in connectionLimit, ensure, and inherit
defaults on managed roles. Adding these explicitly keeps ArgoCD in sync.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Fixed borgmatic `borg: command not found` by adding `local_path` config option
- Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg
- Added borgmatic user to k8s-pg via CloudNativePG managed roles
- Configured borgmatic to backup both localhost and k8s-pg PostgreSQL databases
- Added Tailscale ACL grant for `tag:homelab` → `tag:k8s` on port 5432
- Disabled selfHeal on apps app to allow manual revision changes during development
## Changes
- `ansible/roles/borgmatic/` - Added `local_path` and k8s-pg database entry
- `ansible/roles/postgresql/tasks/main.yml` - Added k8s-pg to `.pgpass`
- `argocd/apps/apps.yaml` - Disabled selfHeal
- `argocd/manifests/databases/blumeops-pg.yaml` - Added borgmatic managed role
- `argocd/manifests/databases/secret-borgmatic.yaml.tpl` - New secret template
- `pulumi/policy.hujson` - Added ACL grant for backup access
## Deployment and Testing
- [x] Borgmatic backup runs successfully
- [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries verified)
- [x] borgmatic user created in k8s-pg with pg_read_all_data role
- [x] Both localhost and k8s-pg databases in backup archive
- [x] zk documentation updated (borgmatic.md, postgresql.md)
- [ ] After merge: set blumeops-pg app back to main revision
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/32