- Create composite action: .forgejo/actions/build-push-image
- Add build-runner.yaml workflow (triggers on Dockerfile changes)
- Add build-devpi.yaml workflow (triggers on Dockerfile/start.sh changes)
- Mount Docker socket in runner deployment for container builds
- Run runner as root to access Docker socket
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Fix workflow to use `github.*` context variables (Forgejo schema validator only recognizes GitHub Actions syntax, not `gitea.*` aliases)
- Pass untrusted inputs through environment variables (security best practice per actionlint)
- Add actionlint to Brewfile and pre-commit config to catch workflow validation errors locally
## Deployment and Testing
- [x] Pre-commit hooks all pass
- [x] actionlint validates `.forgejo/workflows/test.yaml` successfully
- [ ] Verify workflow runs without errors on Forge after merge
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/49
- Mark Phase 1 (Enable Actions) as completed with date
- Check off all verification items in P1
- Add Step 6 to Phase 4 for runner logging and metrics
- Update overview table with status column
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The stage.match selector wasn't preventing Alloy from logging decode
errors internally. Removing logfmt parsing entirely - JSON parsing
handles most structured logs, and plain text logs still get collected.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Use `stage.match` to conditionally apply logfmt parsing only to lines that don't start with `{`
- This prevents error spam like `"failed to decode logfmt" component_path=/ component_id=loki.process.pods component=stage type=logfmt err="logfmt syntax error at pos 2 on line 1: unexpected '\"'"` when JSON-formatted logs hit the logfmt parser
## Deployment and Testing
- [ ] Sync alloy-k8s app to feature branch and verify errors stop appearing
- [ ] Verify JSON logs are still parsed correctly
- [ ] Verify logfmt logs (from Loki, Prometheus etc.) are still parsed correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/46
LaunchDaemons run in the system domain and require sudo to query.
Without become: true, the check always fails and tries to reload.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Note: the name of this branch was chosen before the scope widened to encompass the entire observability stack.
Summary
- Fix Grafana data source URLs (docker driver uses host.minikube.internal, not host.containers.internal)
- Migrate Prometheus and Loki from indri to Kubernetes with Tailscale Ingresses
- Expose CNPG PostgreSQL metrics via Tailscale and update dashboard to use cnpg_* metrics
- Update Alloy to push metrics/logs to k8s endpoints (prometheus.tail8d86e.ts.net, loki.tail8d86e.ts.net)
- Add ACL rule for port 9187 (CNPG metrics)
- Delete obsolete ansible roles for prometheus and loki
Changes
- argocd/manifests/prometheus/ - New Prometheus StatefulSet with 20Gi PVC and Tailscale Ingress
- argocd/manifests/loki/ - New Loki StatefulSet with 20Gi PVC and Tailscale Ingress
- argocd/apps/prometheus.yaml, argocd/apps/loki.yaml - ArgoCD Applications
- argocd/manifests/grafana/values.yaml - Data sources now use k8s internal DNS
- argocd/manifests/databases/service-metrics-tailscale.yaml - CNPG metrics endpoint
- argocd/manifests/grafana-config/dashboards/configmap-postgresql.yaml - Updated to cnpg_* metrics
- ansible/roles/alloy/defaults/main.yml - Push to k8s Tailscale endpoints
- pulumi/policy.hujson - ACL for port 9187
- Deleted ansible/roles/prometheus/ and ansible/roles/loki/
Deployment and Testing
- Stop prometheus and loki on indri
- Sync ArgoCD apps (apps, prometheus, loki, grafana)
- Run mise run provision-indri -- --tags alloy
- Verify Grafana dashboards show data
🤖 Generated with https://claude.ai/claude-code
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/42
## Summary
- Remove ansible roles for services migrated to k8s: devpi, kiwix, transmission
- Also remove unused node_exporter and podman ansible roles
- Remove service tags from indri for k8s-hosted services (grafana, kiwix, devpi, pg, feed)
- Update indri description to reflect current architecture
## Changes
**Ansible roles removed** (34 files, ~1000 lines):
- devpi, devpi_metrics
- kiwix
- transmission, transmission_metrics
- node_exporter
- podman
**Pulumi indri tags removed**:
- tag:grafana, tag:kiwix, tag:devpi, tag:pg, tag:feed
These services now run in k8s with their own Tailscale devices via tailscale-operator.
## Deployment and Testing
- [x] Verified remaining ansible roles match indri.yml
- [x] Verified no playbooks or role dependencies reference removed roles
- [ ] Run `pulumi preview` to verify tag changes
- [ ] Run `pulumi up` to apply tag changes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/41
## Summary
- New `pr-comments` mise task queries Forge API for unresolved review comments on a PR
- Task takes a PR number as argument and displays all comments without a resolver
- Updated CLAUDE.md to include using this task after user reviews PRs
## Deployment and Testing
- [x] Tested task on PR #39 (shows no unresolved comments since all were resolved)
- [x] Tested error handling with non-existent PR #9999🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/40
## Summary
- Add Transmission BitTorrent daemon to k8s (torrent namespace)
- Add Kiwix ZIM archive server to k8s (kiwix namespace)
- NFS storage from sifaka for shared torrent/ZIM data
- Torrent-sync sidecar in kiwix deployment to manage declarative ZIM list
- ZIM-watcher CronJob to auto-restart kiwix when new archives appear
- Remove transmission, transmission_metrics, and kiwix ansible roles from indri
- Remove svc:kiwix from tailscale_serve defaults
## Key Decisions
- Direct NFS mount for kiwix (no PVC) since it shares storage with transmission
- Shell wrapper for kiwix-serve command (glob expansion)
- Accept HTTP 409 as "ready" in torrent sync (transmission session ID mechanism)
- Completed downloads stored in `/downloads/complete/` on sifaka
## Deployment and Testing
- [x] Deployed transmission to k8s
- [x] Verified transmission web UI at torrent.tail8d86e.ts.net
- [x] Moved existing ZIM files to complete folder
- [x] Deployed kiwix to k8s
- [x] Verified kiwix web UI at kiwix.tail8d86e.ts.net
- [x] Stopped old services on indri
- [x] Cleared svc:kiwix from Tailscale serve on indri
- [x] Updated zk documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/39
## Summary
- Migrate minikube from podman driver to qemu2 driver for proper NFS/SMB volume mount support
- Update ansible minikube role with qemu installation and containerd runtime
- Remove podman role dependency from indri.yml
- Add synology user creation steps and post-migration zot reconfiguration notes
## Why
Phase 6 (Kiwix/Transmission migration) was blocked because the podman driver lacks kernel capabilities for filesystem mounts. QEMU2 creates an actual VM with full mount support.
## Deployment and Testing
- [ ] Create k8s-storage user on Synology DSM
- [ ] Store credentials in 1Password (synology-k8s-storage)
- [ ] Export current k8s state
- [ ] Stop and delete podman-based minikube cluster
- [ ] Run ansible to create QEMU2 cluster
- [ ] Test NFS volume mount with test pod
- [ ] Redeploy ArgoCD and all apps
- [ ] Verify all services healthy
- [ ] Reconfigure zot registry mirrors for containerd (post-migration)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/38
## Summary
- Detailed planning document for Phase 6 of k8s migration
- Transmission as standalone general-purpose torrent service with web UI at torrent.tail8d86e.ts.net
- NFS storage on sifaka (/volume1/torrents) shared between both services
- Declarative ZIM torrent list in kiwix's ConfigMap, synced to transmission via sidecar
- ZIM watcher CronJob for automatic kiwix restart when new archives complete
- Supports both GitOps (declarative) and interactive (web UI) torrent management
## Architecture Highlights
- **torrent namespace**: Standalone transmission with Tailscale ingress
- **kiwix namespace**: kiwix-serve with torrent-sync sidecar
- **Shared NFS PV**: Single PV referenced by PVCs in both namespaces
- **No backup needed**: Sifaka is RAID 5/6 and already the backup target
## Deployment and Testing
- [ ] Review plan document
- [ ] Verify NFS export on sifaka is feasible
- [ ] Approve architecture decisions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/35
Updated P3_postgresql.complete.md with full implementation notes including:
- borgmatic borg path fix
- Disaster recovery testing
- CloudNativePG managed roles for borgmatic user
- Dual database backup configuration
- ACL grant for homelab → k8s
- ArgoCD selfHeal disabled for feature branch workflow
- CNPG default values to prevent drift
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CloudNativePG operator fills in connectionLimit, ensure, and inherit
defaults on managed roles. Adding these explicitly keeps ArgoCD in sync.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Fixed borgmatic `borg: command not found` by adding `local_path` config option
- Successfully tested disaster recovery: restored miniflux data from borgmatic backup to k8s-pg
- Added borgmatic user to k8s-pg via CloudNativePG managed roles
- Configured borgmatic to backup both localhost and k8s-pg PostgreSQL databases
- Added Tailscale ACL grant for `tag:homelab` → `tag:k8s` on port 5432
- Disabled selfHeal on apps app to allow manual revision changes during development
## Changes
- `ansible/roles/borgmatic/` - Added `local_path` and k8s-pg database entry
- `ansible/roles/postgresql/tasks/main.yml` - Added k8s-pg to `.pgpass`
- `argocd/apps/apps.yaml` - Disabled selfHeal
- `argocd/manifests/databases/blumeops-pg.yaml` - Added borgmatic managed role
- `argocd/manifests/databases/secret-borgmatic.yaml.tpl` - New secret template
- `pulumi/policy.hujson` - Added ACL grant for backup access
## Deployment and Testing
- [x] Borgmatic backup runs successfully
- [x] Miniflux data restored to k8s-pg (2 users, 2 feeds, 44 entries verified)
- [x] borgmatic user created in k8s-pg with pg_read_all_data role
- [x] Both localhost and k8s-pg databases in backup archive
- [x] zk documentation updated (borgmatic.md, postgresql.md)
- [ ] After merge: set blumeops-pg app back to main revision
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/32
Documents lessons learned:
- SSH credential template for all forge repos
- Kustomize patches must omit namespace for matching
- Tailscale hostname cutover requires manual admin console deletion
- ArgoCD workflow: all apps target main, manual sync for control
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Kustomize matches patches before namespace transformation, so the
patch file shouldn't specify namespace (kustomization.yaml adds it).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The argocd-ssh-known-hosts-cm ConfigMap needs to be a resource,
not a patch, because the upstream install.yaml includes it inline
in a way kustomize can't patch.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Fixed borgmatic-metrics script failing in LaunchAgent context
- Changed from `mise x -- borg` to absolute paths (`/opt/homebrew/bin/borg`, `/opt/homebrew/bin/jq`)
- This fixes the Grafana dashboard showing "DOWN" for Repository Status and missing time series data
## Deployment and Testing
- [ ] Run `mise run provision-indri -- --tags borgmatic-metrics` to deploy the fix
- [ ] Wait for the hourly metrics collection (or manually run `ssh indri '~/bin/borgmatic-metrics'`)
- [ ] Verify Grafana dashboard shows "UP" status and populated graphs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/28
## Summary
- Add `tag:k8s-api` to Pulumi ACLs and indri device tags
- Configure Tailscale serve with TCP passthrough for k8s API at `k8s.tail8d86e.ts.net`
- Update minikube role to include `k8s.tail8d86e.ts.net` in certificate SANs
- Add `apiserver_port` config option (internal port 6443, dynamic host port with podman driver)
- Document Step 0.14 in k8s-migration plan (added post-Phase 0 completion)
The Kubernetes API is now accessible at `https://k8s.tail8d86e.ts.net` using TCP passthrough to preserve mTLS authentication.
## Deployment and Testing
- [x] Pulumi ACLs applied
- [x] Tailscale service created and approved in admin console
- [x] Minikube cluster recreated with new cert SANs
- [x] tailscale serve configured with TCP passthrough
- [x] 1Password credentials updated with new certs
- [x] Kubeconfig updated on gilbert
- [x] `mise run indri-services-check` passes
- [x] `kubectl --context=minikube-indri get nodes` works via Tailscale
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/27
## Summary
- Replace permissive wildcard ACL (`*` -> `*`) with specific service grants
- Admin: full access to all services including NAS
- Member: user-facing services only (no Grafana/Loki/NAS)
- Add device tagging for gilbert (workstation) and sifaka (NAS) via Pulumi
- SSH hardening: remove root access, use "check" action with MFA
- Add ACL tests to validate policy behavior
## Deployment and Testing
- [x] Pulumi preview passes
- [x] HuJSON syntax validated
- [x] ACL tests defined and passing
- [ ] Deploy with `mise run tailnet-up`
- [ ] Verify SSH access from gilbert to indri
- [ ] Verify Allison cannot access Grafana/Loki/NAS
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: https://forge.tail8d86e.ts.net/eblume/blumeops/pulls/23