blumeops

Author	SHA1	Message	Date
Erich Blume	d26a6ae3b2	Update docs for Caddy routing and direct WireGuard peering Comprehensive docs pass reflecting the new Fly proxy architecture: - Fly proxy routes through Caddy on indri (not per-service TS Ingress) - Direct WireGuard peering via --port=41641 pinning - DERP relay performance lesson in Tailscale docs - Caddy now in public traffic path - indri tagged as flyio-target - Removed fly-reload references - Updated architecture diagrams and per-service setup guide - Added changelog fragment Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 09:57:30 -07:00
Erich Blume	fe0e913963	Switch Fly proxy to upstream keepalive pools (#337 ) All checks were successful Deploy Fly.io Proxy / deploy (push) Successful in 1m37s Details ## Summary - Replace per-request DNS resolution (variable-based `proxy_pass`) with static `upstream` blocks and `keepalive` connection pools - Reuses TLS connections through the Tailscale tunnel instead of handshaking per request - Add `mise run fly-reload` for nginx config reload without full redeploy (re-resolves upstream DNS) ## Trade-off DNS is resolved at config load, not per-request. If Tailscale Ingress pods get new IPs (restart, reschedule), `mise run fly-reload` is needed. A Grafana alert will be added to detect this. ## Still TODO on this branch - [ ] Grafana alert for upstream unreachable (triggers fly-reload reminder) - [ ] Docs pass - [ ] Deploy from branch and verify latency improvement - [ ] Changelog fragment 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #337	2026-04-17 16:39:52 -07:00
Erich Blume	d7c3c687f4	Document DR rebuild procedure and update restart-indri - New how-to: rebuild-minikube-cluster with full bootstrap procedure validated during 2026-04-13 DR event - Update restart-indri: warn about minikube delete, macOS permission dialog on first Tailscale SSH, forgejo_actions_secrets dep cycle - Update disaster-recovery reference: link to rebuild procedure - Update CLAUDE.md: never run minikube delete Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 18:07:54 -07:00
Erich Blume	6e60287e99	Doc review: delete install-dagger-on-nix-runner, add service-versions ref card Outdated leaf card removed; zot.md now links to new service-versions reference card instead. Added reverse link from review-services. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:52:38 -07:00
Erich Blume	8d80a4a3a5	Rewrite runner-logs: API-based log fetching, multi-repo support Replace broken SSH+filesystem log retrieval with Forgejo web API endpoint. Fix CLI to use run numbers (not task IDs), add --repo for querying any forge repo (e.g. sporks), --limit/-n for listing size. Document runner-logs as the way to verify build success in CLAUDE.md and container build docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:42:58 -07:00
Erich Blume	c06eccc61c	Review hosts.md: add last-reviewed, normalize links, add reference tag Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 21:06:53 -07:00
Erich Blume	c86b5d7772	Native Dagger container builds + Navidrome v0.61.1 (#330 ) All checks were successful Build Container / detect (push) Successful in 3s Details Build Container / build-dagger (navidrome) (push) Successful in 22m26s Details ## Summary - Move Dagger module from `.dagger/` to repo root (`src/blumeops/`), rename `blumeops-ci` → `blumeops` - Replace opaque `docker_build()` with native Dagger pipelines that surface full build errors per step - Migrate navidrome as the first container (`containers/navidrome/container.py`) - Upgrade navidrome from v0.60.3 to v0.61.1 (major artwork overhaul, SQLite FTS5 search, server-managed transcoding) - Add `dagger call container-version` for CI version extraction without Dockerfile parsing - All mise tasks (`container-list`, `container-version-check`, `container-build-and-release`) updated for hybrid mode - Legacy `docker_build()` fallback preserved for all other containers ## Motivation When navidrome v0.61.0 added a new Go build tag (`sqlite_fts5`), `docker_build()` showed only "exit code: 1". We had to run `docker build --progress=plain` manually to find `undefined: buildtags.SQLITE_FTS5`. Native Dagger pipelines show the full error inline. ## Container build dispatch needed After merge, dispatch container build for navidrome: ``` mise run container-build-and-release navidrome --ref `470b4bd` ``` ## Deploy steps 1. Wait for container build to complete 2. Back up navidrome-data PVC (non-reversible DB migrations) 3. `argocd app set navidrome --revision main && argocd app sync navidrome` 4. Verify at https://dj.ops.eblu.me ## Future Remaining containers migrate incrementally in follow-up PRs using the same pattern. Reviewed-on: #330	2026-04-11 17:11:56 -07:00
Erich Blume	40556e5a2d	Review gandi.md: add missing forge.eblu.me CNAME record The Pulumi code has had a forge.eblu.me CNAME since it was added, but the doc's DNS table only listed docs and cv. Also fixed the __main__.py description to mention CNAMEs alongside A records. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 09:54:46 -07:00
Erich Blume	07f52e9488	Deploy Paperless-ngx document management (#328 ) All checks were successful Build Container / detect (push) Successful in 2s Details Build Container / build-dockerfile (paperless) (push) Successful in 9s Details ## Summary - Add paperless-ngx (v2.20.13) as a new ArgoCD-managed service on indri - Dockerfile built from forge mirror (`mirrors/paperless-ngx`), multi-stage with s6-overlay - PostgreSQL database via `blumeops-pg` CNPG cluster, Redis sidecar for Celery - NFS document storage on sifaka (`/volume1/paperless`) - Authentik OIDC SSO via baked JSON blob from 1Password - Caddy route at `paperless.ops.eblu.me` - 1Password item "Paperless (blumeops)" created with all secrets ## Files - `containers/paperless/Dockerfile` — multi-stage build - `argocd/manifests/paperless/` — full k8s manifest set - `argocd/apps/paperless.yaml` — ArgoCD application - `argocd/manifests/databases/` — CNPG role + ExternalSecret - `ansible/roles/caddy/defaults/main.yml` — Caddy route - `service-versions.yaml` — version tracking entry - `docs/reference/services/paperless.md` — reference card ## Remaining deploy steps 1. Build container: `mise run container-build-and-release paperless` 2. Update kustomization.yaml `newTag` with actual image tag 3. Create Authentik application/provider for paperless 4. Create `paperless` database on blumeops-pg 5. Sync ArgoCD apps, then sync paperless from branch 6. Provision Caddy: `mise run provision-indri -- --tags caddy` 7. Verify at https://paperless.ops.eblu.me 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #328	2026-04-08 17:54:12 -07:00
Erich Blume	efae404d1e	Remove superuser from teslamate PG role, transfer extension ownership teslamate had superuser on the shared blumeops-pg cluster (which also hosts miniflux and authentik). Downgraded to plain database owner with extension ownership (cube, earthdistance) transferred manually so it can still ALTER EXTENSION UPDATE. earthdistance is untrusted in PG so DROP+CREATE would need temporary superuser escalation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 15:36:39 -07:00
Erich Blume	fc34a7da5b	Review postgresql.md: add authentik user/db, immich-pg borgmatic secret Doc review found the authentik database, user, and external secret were missing, along with the immich-pg borgmatic secret. Added Cluster column to Users table for clarity. Set last-reviewed: 2026-04-07. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 15:21:48 -07:00
Erich Blume	f42fa2d558	Remove stale Helm chart mirror references from forgejo docs All Helm chart mirrors (grafana-helm-charts, connect-helm-charts, cloudnative-pg-charts) have been deleted from forge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 07:37:21 -07:00
Erich Blume	64200a55c5	Migrate Immich from Helm chart to kustomize manifests (v2.5.6 → v2.6.3) Replace the Helm chart deployment with plain kustomize manifests following the Authentik pattern (separate deployments per component). Consolidate the immich-storage ArgoCD app into the main immich app. Add no-helm-policy doc establishing kustomize as the standard deployment mechanism. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 09:42:25 -07:00
Erich Blume	08d57ef4d4	Review pulumi reference doc: fix Tailscale auth, add how-to links Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 10:55:57 -07:00
Erich Blume	a18a424866	Pin NixOS service versions via nixpkgs-services overlay (#321 ) ## Summary - Add `nixpkgs-services` flake input pinned to a specific nixpkgs commit, with an overlay that pulls `forgejo-runner`, `snowflake`, and `k3s` from it instead of the rolling `nixpkgs` - Dagger `flake-update` pipeline now excludes `nixpkgs-services` via `--exclude` - Fix stale nix-container-builder version in service-versions.yaml (was 12.6.4, actually running 12.7.2) - Add k3s and minikube to service-versions.yaml tracking - Document the pinning approach in review-services how-to and ringtail reference ## Motivation During service review, discovered that flake updates had silently upgraded forgejo-runner from 12.6.4 → 12.7.2 without updating service-versions.yaml. This "sneak-in upgrade" bypasses the service review process. The overlay ensures these three services only change versions deliberately. ## Test plan - [ ] Verify `nix flake update` from `nixos/ringtail/` does not change `nixpkgs-services` lock entry - [ ] Verify `mise run provision-ringtail` builds successfully with the overlay - [ ] Confirm running service versions unchanged after deploy 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #321	2026-04-01 21:37:57 -07:00
Erich Blume	cfbf4cadbd	Review argocd-cli reference doc: stamp last-reviewed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 20:35:52 -07:00
Erich Blume	4059b3d27b	Add compensating controls framework and date-based report dirs (#320 ) ## Summary - Add `compensating-controls.yaml` tracking 9 named controls that justify suppressed security findings - Update all Prowler mutelist descriptions with `CC: <id>` references to named controls - Add `mise run review-compensating-controls` task — surfaces stalest control with all codebase references - Add [[review-compensating-controls]] how-to doc - Organize Prowler and Kingfisher reports into `YYYY-MM-DD` subdirectories ### Compensating controls \| ID \| Mitigates \| \|----\|-----------\| \| `single-user-cluster` \| Image cache abuse, RBAC breadth, system pod privileges \| \| `tailscale-network-isolation` \| Profiling endpoints, weak TLS, debug ports \| \| `local-registry` \| AlwaysPullImages gap \| \| `sso-gated-admin-tools` \| ArgoCD wildcard RBAC \| \| `operator-managed-pods` \| Tailscale proxy pod security settings \| \| `ephemeral-privileged-jobs` \| Prowler hostPID exposure \| \| `trusted-ci-only` \| Forgejo runner DinD \| \| `init-container-isolation` \| Grafana root init container \| \| `observability-stack-audit` \| Missing apiserver audit logging \| ## Test plan - [ ] `mise run review-compensating-controls` shows table and references - [ ] `kubectl kustomize argocd/manifests/prowler/` renders correctly - [ ] Sync prowler and kingfisher, verify next scan writes to dated subdirectory - [ ] Grep for `CC:` in mutelist files — every muted finding should have at least one 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #320	2026-03-30 17:44:11 -07:00
Erich Blume	1e391f96bb	Upgrade forgejo-runner 12.7.0 → 12.7.3, add service card Patch upgrade picks up idempotent FetchTask API, offline registration fix, cloudflare/circl security dep update, and custom gRPC user-agent. No config defaults changed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 16:31:06 -07:00
Erich Blume	77eebe507e	Review Ansible reference doc: add missing roles, clarify IaC positioning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 16:10:24 -07:00
Erich Blume	f9206bf10b	Build custom Kingfisher container from sporked deploy branch (#318 ) All checks were successful Build Container / detect (push) Successful in 2s Details Build Container / build-nix (kingfisher) (push) Successful in 12s Details ## Summary - Add Dockerfile for Kingfisher built from source (sporked deploy branch) - Multi-stage: Rust build with Boost/vectorscan, debian-slim runtime - Switch CronJob from upstream `ghcr.io/mongodb/kingfisher` to `registry.ops.eblu.me/blumeops/kingfisher` - Add kingfisher to service-versions.yaml (version tracks upstream main SHA) - Document spork workflow in CLAUDE.md ## Test plan - [ ] Build container: `mise run container-build-and-release kingfisher 1d37d29` - [ ] Verify image on registry: `mise run container-list` - [ ] Update kustomization newTag - [ ] Sync ArgoCD kingfisher app from branch - [ ] Trigger manual CronJob and verify scan completes - [ ] Verify reports on sifaka Reviewed-on: #318	2026-03-30 06:34:49 -07:00
Erich Blume	bb60369956	Simplify Kingfisher CronJob to HTML-only output Remove the second scan pass for JSON — one format is enough for now. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:50:54 -07:00
Erich Blume	2808ffd450	Document Kingfisher secret scanner service Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:47:37 -07:00
Erich Blume	2bd1611ac1	Document sifaka NFS/Tailscale TUN troubleshooting Sifaka's Tailscale can revert to userspace networking after package updates, causing NFS mounts to fail because the NFS daemon sees 127.0.0.1 instead of the client's Tailscale IP. Added troubleshooting how-to doc and updated sifaka reference card with frigate export and TUN requirement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 09:12:00 -07:00
Erich Blume	3017f759a7	Migrate Forgejo from Homebrew to source build (#316 ) ## Summary - Migrate Forgejo from Homebrew to source-built binary with mcquack LaunchAgent - Matches the established pattern used by zot, caddy, and alloy - Upgrades to v14.0.3 (7 security fixes: PKCE bypass, OAuth scope bypass, open redirect, and more) ## Changes - Ansible role: Replace brew install/services with binary stat check + LaunchAgent - Paths: `/opt/homebrew/var/forgejo` → `~/forgejo`, binary at `~/code/3rd/forgejo/forgejo` - Run user: `forgejo` → `erichblume` (LaunchAgent user; SSH git user stays `forgejo`) - Docs: Updated Forgejo reference card, restart-indri guide - Service review: Stamped frigate-notify, cloudnative-pg, blumeops-pg as current ## One-time migration steps (manual, on indri) 1. Clone from Codeberg, add forge mirror remote 2. Check out v14.0.3, build with `make build && make forgejo` 3. Stop brew, `cp -a` data to `~/forgejo`, fix ownership 4. Run `provision-indri --tags forgejo` 5. Verify, then `brew uninstall forgejo` ## Data safety - `cp -a` preserves everything (repos, SQLite DB, LFS, sessions, OAuth config) - Brew version stays installed as rollback until verification passes - No schema changes between 14.0.2 → 14.0.3 Reviewed-on: #316	2026-03-28 08:19:23 -07:00
Erich Blume	c78b86c72c	Add offsite backup for immich photo library to BorgBase (#315 ) ## Summary - Adds a second borgmatic config (`photos.yaml`) that backs up `/Volumes/photos` (sifaka SMB mount, ~128 GB) to a dedicated BorgBase repo (`immich-photos`), running daily at 4 AM - Separate launchd agent (`mcquack.eblume.borgmatic-photos`) so photo backups run independently from the main backup - Refactors `borgmatic_metrics` script to support multiple repos with a `repo` Prometheus label - Updates Grafana "Borg Backups" dashboard with a `repo` template variable so you can filter/compare repos - Docs updated: `backups.md`, `borgmatic.md` ## Prerequisites (manual) - [x] Create `immich-photos` repo on BorgBase with same SSH key - [ ] Upgrade BorgBase plan to Small ($24/yr) if currently on free tier (128 GB exceeds 10 GB limit) - [ ] After deploy: `borg init` the new repo (borgmatic does this automatically on first run) ## Test plan - [ ] Dry run: `mise run provision-indri -- --check --diff --tags borgmatic,borgmatic_metrics` - [ ] Deploy borgmatic role and verify both configs deployed - [ ] Run `borgmatic --config ~/.config/borgmatic/photos.yaml create --verbosity 1` manually for first backup (will take hours) - [ ] Verify metrics script collects from both repos: `~/.local/bin/borgmatic-metrics && cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom` - [ ] Sync grafana-config in ArgoCD and verify dashboard repo selector works 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #315	2026-03-27 19:43:05 -07:00
Erich Blume	ca0c9354ee	Add borgmatic backups for authentik and immich databases (#314 ) ## Summary - Add `authentik` database (blumeops-pg cluster) to borgmatic pg_dump backups - Add `immich` database (immich-pg cluster) to borgmatic pg_dump backups - For immich-pg: new borgmatic managed role with `pg_read_all_data`, ExternalSecret, Tailscale LoadBalancer service, and Caddy L4 TCP proxy on port 5433 - Update backup docs to reflect all four CNPG databases + mealie SQLite ## Deploy plan Deploy order matters — k8s resources must exist before ansible can route to them: 1. ArgoCD (databases app): sync to pick up immich-pg borgmatic role, ExternalSecret, and Tailscale service ``` argocd app set blumeops-pg --revision feature/borgmatic-all-pg-backups argocd app sync blumeops-pg ``` 2. Wait for `immich-pg-tailscale` service to get a Tailscale IP and `immich-pg.tail8d86e.ts.net` to resolve 3. Ansible (caddy): deploy Caddy L4 route for port 5433 ``` mise run provision-indri -- --tags caddy ``` 4. Ansible (borgmatic): deploy updated config and .pgpass ``` mise run provision-indri -- --tags borgmatic ``` 5. Verify: trigger a manual borgmatic run and check all four pg_dump streams succeed ``` borgmatic --verbosity 1 2>&1 \| grep -E '(Dumping\|ERROR)' ``` ## Test plan - [x] `kubectl kustomize` builds cleanly - [x] `ansible --check --diff` for borgmatic and caddy show expected changes - [ ] ArgoCD sync succeeds for databases app - [ ] `immich-pg.tail8d86e.ts.net` resolves - [ ] `pg.ops.eblu.me:5433` accepts connections - [ ] `borgmatic --verbosity 1` dumps all four databases without errors Reviewed-on: #314	2026-03-27 16:59:58 -07:00
Erich Blume	33463764d1	Add QArt Tuner: QR code art generator with interactive web UI Single-file Go tool implementing the QArt technique (Russ Cox, 2012) using only the public rsc.io/qr API. Generates QR codes whose data modules form a recognizable image by exploiting error correction freedom via GF(2) Gaussian elimination. Includes a web UI with live-updating sliders for version, mask, rotation, dx/dy offset, and scale. Keyboard shortcuts for rapid iteration. Also works as a CLI for batch generation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 15:33:36 -07:00
Erich Blume	831b82950a	Upgrade nvidia-device-plugin v0.18.2 → v0.19.0 and add reference card Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 07:19:24 -07:00
Erich Blume	687e972713	Review CV doc and close build-dep review gap Fix stale CV service doc (URL, forge domain, container tag) and add guidance for reviewing build-time dependencies in private forge repos during service reviews. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 07:12:38 -07:00
Erich Blume	fc8d2cdb12	Add preserve/* branch protection and document Pyroscope blocker branch-cleanup: Add PROTECTED_PREFIXES with preserve/* exclusion so preserved work-in-progress branches are never deleted. observability.md: Document Pyroscope profiling work on branch preserve/pyroscope-profiling/pr-313, blocked on ringtail kernel sysctl settings (kptr_restrict=0, perf_event_paranoid≤1). Also document Faro/RUM as future potential with privacy considerations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 15:32:25 -07:00
Erich Blume	b97e37543f	Deploy Tor Snowflake proxy on ringtail (#311 ) ## Summary - Add Snowflake proxy as a native systemd service on ringtail (NixOS) - Uses `pkgs.snowflake` from nixpkgs (v2.11.0) - Hardened systemd unit with DynamicUser, ProtectSystem=strict, 512MB memory limit - Prometheus metrics enabled on localhost:9999 ## What is Snowflake? A Tor pluggable transport that helps censored users reach the Tor network via WebRTC. This is NOT a Tor exit node — traffic exits through Tor exit nodes operated by others. The proxy operator cannot see traffic content (double-encrypted) and destination servers never see the proxy's IP. ## Changes - `nixos/ringtail/configuration.nix` — new systemd service definition - `docs/reference/services/snowflake-proxy.md` — service reference card - `docs/reference/infrastructure/ringtail.md` — updated systemd services section - `service-versions.yaml` — added entry (type: nixos) ## Deploy plan After review, deploy via `mise run provision-ringtail`. Service starts automatically. ## Test plan - [ ] `mise run provision-ringtail` succeeds - [ ] `ssh ringtail 'systemctl status snowflake-proxy'` shows active - [ ] `ssh ringtail 'journalctl -u snowflake-proxy --no-pager -n 20'` shows broker connections - [ ] `ssh ringtail 'curl -s localhost:9999/metrics'` returns Prometheus metrics Reviewed-on: #311	2026-03-24 20:51:40 -07:00
Erich Blume	fe201a495c	Add Prowler IaC scanning of blumeops repo (Saturday 2am) Clone repo in init container, scan Dockerfiles and K8s manifests with Prowler's IaC provider (Trivy). Reports written to sifaka:/volume1/reports/prowler-iac/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 16:49:38 -07:00
Erich Blume	696024306c	Add Prowler image vulnerability scanning for blumeops containers All checks were successful Build Container / detect (push) Successful in 39s Details Build Container / build-dockerfile (prowler) (push) Successful in 10m15s Details Add Trivy to the Prowler container for image and IaC scanning. New CronJob (Saturday 3am) scans all blumeops/* images in the registry for CVEs, embedded secrets, and Dockerfile misconfigs. Reports written to sifaka:/volume1/reports/prowler-images/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 16:43:08 -07:00
Erich Blume	d021b3534f	Deploy Prowler CIS scanner (#310 ) All checks were successful Build Container / detect (push) Successful in 4s Details Build Container / build-dockerfile (prowler) (push) Successful in 10s Details ## Summary - Deploy Prowler 5 as a weekly CronJob on minikube-indri for CIS Kubernetes Benchmark v1.11 scanning - Custom slim container build (strips PowerShell, Trivy, and non-K8s providers from upstream) - Reports (HTML, CSV, JSON-OCSF) written to NFS share on sifaka at `/volume1/reports/prowler/` - Read-only ClusterRole for pod, RBAC, and control plane inspection - Host path mounts + hostPID for kubelet file permission checks ## Follow-ups - Mirror prowler-cloud/prowler on forge for supply chain control - Build and push container image, update kustomization.yaml newTag - Consider adding k3s-ringtail scanning (core + RBAC checks only) ## Test plan - [ ] Build container: `mise run container-release prowler v5.22.0` - [ ] Update `argocd/manifests/prowler/kustomization.yaml` newTag to built image tag - [ ] Sync ArgoCD: `argocd app sync apps && argocd app set prowler --revision deploy-prowler && argocd app sync prowler` - [ ] Trigger manual job: `kubectl create job --from=cronjob/prowler prowler-manual -n prowler --context=minikube-indri` - [ ] Verify reports appear on sifaka NFS share - [ ] `mise run services-check` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #310	2026-03-24 16:08:09 -07:00
Erich Blume	fc45989a6c	Decommission JobSync service (#308 ) All checks were successful Build Container / detect (push) Successful in 3s Details ## Summary - Remove all JobSync infrastructure: ArgoCD app, k8s manifests, container build (nix), Caddy reverse proxy entry, Homepage dashboard entry, service-versions tracking, and all documentation - Runtime teardown already completed: ArgoCD app cascade-deleted (removes deployment, PVC, service, ingress, external-secret), forge mirror deleted, 1Password item archived, local clone removed ## Motivation Replacing JobSync with a datasette-based job tracking pipeline driven by mise tasks and a Claude agent frontend. JobSync's Next.js server actions don't expose a useful API for automation. ## Remaining manual steps after merge - Provision Caddy to remove the stale proxy route: `mise run provision-indri -- --tags caddy` - Sync Homepage: `argocd app sync homepage` - Verify namespace cleanup on ringtail: `kubectl get ns jobsync --context=k3s-ringtail` (should be gone) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #308	2026-03-24 08:44:23 -07:00
Erich Blume	bd0ff30d3f	Unify container build workflows (#306 ) All checks were successful Build Container / detect (push) Successful in 3s Details ## Summary - Merges `build-container.yaml` and `build-container-nix.yaml` into a single workflow - Detect job classifies each changed container by presence of `Dockerfile` and/or `default.nix` - Dockerfile containers build on `k8s` (indri) via Dagger; Nix containers build on `nix-container-builder` (ringtail) via nix-build + skopeo - Containers with both build files (alloy, nettest, ntfy) get built on both runners ## Test plan - [ ] Push a change to a Dockerfile-only container (e.g. grafana) — verify it builds on k8s only - [ ] Push a change to a nix-only container (e.g. jobsync) — verify it builds on nix-container-builder only - [ ] Push a change to a dual container (e.g. ntfy) — verify it builds on both runners - [ ] Test workflow_dispatch with a specific container name 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #306	2026-03-23 20:55:50 -07:00
Erich Blume	06e721841c	Review 12 reference docs: fix stale image refs, expand stubs, add cross-refs Replace hardcoded image tags in Quick Reference tables with pointers to kustomization manifests (tags drift with every container release). Fix Prometheus CNPG scrape target, remove misleading .ts.net URLs, expand external-secrets stub, add backup/disaster-recovery cross-references. Limit doc-reviewer agent to one doc per cycle. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 09:51:57 -07:00
Erich Blume	2cb0ce4289	Review and correct Tailscale reference doc Fix wrong ACL path, add missing device tags (ringtail, per-service tags, ci-gateway, flyio-proxy), correct access matrix (PyPI→DevPI, homelab grants), add homelab→homelab SSH rule, document auto approvers section, and add last-reviewed frontmatter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 18:18:45 -07:00
Erich Blume	6d65e6928c	C2: Deploy infrastructure alerting pipeline (#303 ) ## Summary Mikado chain to replace `mise run services-check` with Grafana Unified Alerting backed by ntfy push notifications. Design: - Grafana Unified Alerting evaluates rules against Prometheus/Loki - ntfy webhook contact point delivers iOS notifications - Anti-noise policy: page once per 24h per alert group - Every alert links to a runbook in `docs/how-to/alerts/` - services-check eventually queries the alerting API instead of doing its own probes Chain (bottom-up): 1. `configure-grafana-alerting-pipeline` — enable alerting, ntfy contact point, notification policy 2. `first-alert-and-runbook` — end-to-end proof of concept with blackbox probe failure 3. `port-services-check-alerts` — migrate all services-check probes to alert rules + runbooks 4. `refactor-services-check-to-query-alerts` — rewrite services-check to query Grafana API 5. `deploy-infra-alerting` — goal card 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #303	2026-03-22 14:52:56 -07:00
Erich Blume	528d3da327	Review power.md: add ringtail, mark reviewed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 07:37:31 -07:00
Erich Blume	1f000c8e39	Add last-updated subsort to docs-review, review gilbert card Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 13:22:01 -07:00
Erich Blume	995478b91f	Review jellyfin and automounter services Both services current: jellyfin 10.11.6 (latest upstream), automounter 1.11.0 (Mac App Store). Add missing frigate share to automounter docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 13:06:23 -07:00
Erich Blume	6d7597670e	Add plan-a-meal how-to for Mealie cooking timelines Agent-facing guide for generating unified cooking timelines from Mealie meal plans. Covers querying the API, picking balanced meals (protein/carb/vegetable), and interleaving recipe steps into a relative timeline so everything finishes together. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 11:07:16 -07:00
Erich Blume	11330ebea0	Deploy Mealie recipe manager (#299 ) All checks were successful Build Container (Nix) / detect (push) Successful in 2s Details Build Container / detect (push) Successful in 2s Details Build Container (Nix) / build (mealie) (push) Successful in 2s Details Build Container / build (mealie) (push) Successful in 8s Details ## Summary - Deploy Mealie (self-hosted recipe manager) on minikube-indri via ArgoCD - Build container from source via forge mirror (`mirrors/mealie`) — multi-stage Dockerfile with Node.js frontend + Python/uv backend - Add Caddy proxy entry for `meals.ops.eblu.me` - Part of a larger meal planning pipeline: Mealie stores categorized recipes, a planner script selects balanced meals, and Ollama generates unified cooking timelines ## Status - [x] Mirror mealie repo on forge - [x] Dockerfile (from-source build) - [x] ArgoCD app + k8s manifests - [x] Caddy proxy entry - [x] Service docs, routing table, app registry - [ ] Local Dagger build test - [ ] Container build + push to registry - [ ] Update kustomization.yaml with real image tag - [ ] Deploy and verify - [ ] Provision Caddy ## Test plan - Build container locally via `dagger call build --src=. --container-name=mealie` - Trigger CI build via `mise run container-build-and-release mealie` - Deploy from branch: `argocd app set mealie --revision deploy-mealie && argocd app sync mealie` - Verify Mealie UI at `https://meals.ops.eblu.me` - Verify API docs at `https://meals.ops.eblu.me/docs` Reviewed-on: #299	2026-03-16 21:59:10 -07:00
Erich Blume	4dc3e5cae2	Add UnPoller for UniFi network metrics (#298 ) All checks were successful Build Container (Nix) / detect (push) Successful in 2s Details Build Container / detect (push) Successful in 2s Details Build Container (Nix) / build (unpoller) (push) Successful in 2s Details Build Container / build (unpoller) (push) Successful in 7s Details ## Summary - Deploy UnPoller as a k8s service on indri to export UniFi controller metrics to Prometheus - Custom-built container from forge mirror (`containers/unpoller/Dockerfile`) - Credentials pulled from 1Password via external-secrets - Prometheus scrape job added, docs and service-versions updated ## Test plan - [ ] Build container: `mise run container-release unpoller v2.34.0` - [ ] Update kustomization tag with built image tag - [ ] Deploy from branch: `argocd app set unpoller --revision feature/unpoller && argocd app sync unpoller` - [ ] Verify pod connects to UX7 controller (check logs) - [ ] Confirm `unpoller` target appears in Prometheus - [ ] Query `unifi_` metrics in Grafana 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #298	2026-03-16 15:52:45 -07:00
Erich Blume	f46a04b902	Restructure docs: consolidate, recategorize, and extract All checks were successful Build Container (Nix) / detect (push) Successful in 2s Details Build Container / detect (push) Successful in 2s Details - Consolidate 4 Authentik Nix derivation docs into one card (authentik-nix-build-components.md) - Merge build-grafana-container + build-grafana-sidecar into build-grafana-images.md - Move agent-change-process from how-to/ to explanation/ (it's a methodology doc, not a task guide) - Extract Caddy custom build section from reference card into how-to/deployment/build-caddy-with-plugins.md - Move expose-service-publicly from how-to/ to tutorials/ (it's a comprehensive walkthrough, not a quick task reference) - Update all wiki-link references across affected docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 19:55:59 -07:00
Erich Blume	ac01c2d6e2	Fix stale docs and shell quoting in devpi start script - ArgoCD ref: correct Git Source URL to forge.ops.eblu.me:2222 - Authentik ref: add Zot as active OIDC client, blueprint, and secret - Federated login: remove Zot from Future Work (completed in PR #236) - devpi/start.sh: use bash array for command building (proper quoting) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 19:25:27 -07:00
Erich Blume	5b008a6ab6	Document ai-sources in AI guide, change process, and mise-tasks ref Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 18:43:39 -07:00
Erich Blume	2bea048dbf	Externalize Tailscale operator to forge mirror (#295 ) ## Summary - Mirrors `tailscale/tailscale` on forge (`mirrors/tailscale`) - Replaces vendored `operator.yaml` (495 KB / 5,386 lines) with ArgoCD apps sourcing the upstream static manifest, pinned via `targetRevision: v1.94.2` - Adds `tailscale-operator-base` app for indri and `tailscale-operator-base-ringtail` for ringtail - Local kustomization retains only ProxyClass and DNSConfig custom resources - Updates `[[tailscale-operator]]` doc to reflect new sourcing ## Deployment and Testing - [ ] Register `mirrors/tailscale` repo in ArgoCD (it needs to know about the new repo) - [ ] Sync `apps` app to pick up the new `tailscale-operator-base` app definitions - [ ] Sync `tailscale-operator-base` — verify CRDs, RBAC, operator Deployment come up - [ ] Sync `tailscale-operator` — verify ProxyClass, DNSConfig still apply cleanly - [ ] Verify existing Tailscale Ingresses still work (ProxyGroup pods healthy) - [ ] Repeat for ringtail cluster - [ ] After merge: apps already point at tags, no revision reset needed Reviewed-on: #295	2026-03-15 17:44:35 -07:00
Erich Blume	272ea1e767	Upgrade Caddy v2.10.2 → v2.11.2, fix forge mirrors (#294 ) ## Summary - Upgrade Caddy from v2.10.2 to v2.11.2 (7 CVE fixes across v2.11.1 and v2.11.2) - Create `mirrors/caddy-l4` forge mirror for Layer 4 plugin - Migrate all `~/code/3rd` clones on indri from `localhost:3001` to HTTPS `forge.ops.eblu.me/mirrors/` remotes - Remove stale clones (`apple-silicon-detector`, `whisper.cpp`) - Update caddy docs and service-versions tracking ## CVEs Fixed - CVE-2026-27585 through CVE-2026-27590 (path/host bypass, TLS fail-open, FastCGI issues) - Forward auth identity injection (privilege escalation) - `vars_regexp` placeholder secret exposure - Built on Go 1.26.1 (patches Go-level CVEs) ## What was done on indri (not in repo) - `xcaddy build` with Gandi DNS + Layer 4 plugins → `~/code/3rd/caddy/bin/caddy` now v2.11.2 - Remotes updated: caddy, forgejo-runner, zot → `https://forge.ops.eblu.me/mirrors/.git` - Deleted: `~/code/3rd/apple-silicon-detector`, `~/code/3rd/whisper.cpp` ## Deployment and Testing - [x] Ansible dry-run passed (`--tags caddy --check --diff`) - [ ] Restart caddy LaunchAgent to pick up the new binary - [ ] Verify all proxied services respond via `.ops.eblu.me` - [ ] Run `mise run services-check` Reviewed-on: #294	2026-03-15 10:33:48 -07:00

1 2 3

129 commits