blumeops

Author	SHA1	Message	Date
Erich Blume	0ec0246847	Remove doc-reviewer agent	2026-03-30 16:12:48 -07:00
Erich Blume	77eebe507e	Review Ansible reference doc: add missing roles, clarify IaC positioning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 16:10:24 -07:00
Erich Blume	c069f889d2	Harden borgmatic photos backup: restrict dirs, add keepalives + checkpoints Restrict backup to library/ and upload/ only (skip regenerable encoded-video/, thumbs/, backups/). Add SSH ServerAliveInterval to prevent broken pipe on long transfers, and checkpoint_interval so interrupted backups save progress. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 10:30:28 -07:00
Erich Blume	b000efd6c3	Fix Kingfisher CronJob exit code handling Kingfisher exits 200 (findings) or 205 (validated findings) on success. Normalize these to 0 so the CronJob completes instead of restarting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 07:16:02 -07:00
Erich Blume	457ab19416	Scope Kingfisher scan to eblume user only on ringtail Mirror repos cause scan failures (likely ephemeral storage or timeout). Scan only eblume/ repos until we investigate the root cause. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 07:11:52 -07:00
Erich Blume	2c1f0abefc	Deploy Kingfisher v165768b-0fe0eed-nix (tmp permissions fix) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 06:54:57 -07:00
Erich Blume	0fe0eed35a	Fix Kingfisher container: make /tmp world-writable Container runs as user 65534 (nobody) but /tmp was owned by root. Set sticky bit + world-writable (1777) like a standard /tmp. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 06:53:34 -07:00
Erich Blume	14f366f993	Deploy Kingfisher v165768b-c494b62-nix (/tmp fix) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 06:51:17 -07:00
Erich Blume	c494b62713	Fix Kingfisher container: add /tmp directory Kingfisher needs a writable temp directory for git clones and scanning. Nix containers don't create /tmp by default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 06:49:59 -07:00
Erich Blume	b01afb1c1d	Deploy Kingfisher v165768b-aa9cc70-nix (bash fix) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 06:47:26 -07:00
Erich Blume	aa9cc709ec	Fix Kingfisher container: add bash and coreutils for CronJob shell Nix containers don't include a shell by default. The CronJob needs /bin/bash for the inline script that generates timestamped filenames. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 06:45:39 -07:00
Erich Blume	f0c6845f0f	Deploy custom Kingfisher container v165768b-f9206bf-nix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 06:42:24 -07:00
Erich Blume	f9206bf10b	Build custom Kingfisher container from sporked deploy branch (#318) ## Summary - Add Dockerfile for Kingfisher built from source (sporked deploy branch) - Multi-stage: Rust build with Boost/vectorscan, debian-slim runtime - Switch CronJob from upstream `ghcr.io/mongodb/kingfisher` to `registry.ops.eblu.me/blumeops/kingfisher` - Add kingfisher to service-versions.yaml (version tracks upstream main SHA) - Document spork workflow in CLAUDE.md ## Test plan - [ ] Build container: `mise run container-build-and-release kingfisher 1d37d29` - [ ] Verify image on registry: `mise run container-list` - [ ] Update kustomization newTag - [ ] Sync ArgoCD kingfisher app from branch - [ ] Trigger manual CronJob and verify scan completes - [ ] Verify reports on sifaka Reviewed-on: #318	2026-03-30 06:34:49 -07:00
Erich Blume	99a1a49175	Revert kingfisher skip in container build workflow Kingfisher will build via Nix on ringtail instead of Dockerfile on indri, so the skip is no longer needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 21:42:38 -07:00
Erich Blume	a842b9c1e8	Skip kingfisher in CI container builds Kingfisher's Rust + Boost/vectorscan build exhausts indri's memory (aws-sdk-ec2 alone needs 2-3GB for rustc). Build locally on Gilbert and push manually until we have a beefier build host. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 20:52:07 -07:00
Erich Blume	924325ebd5	Fix DinD seccomp profile broken by RuntimeDefault rollout The pod-level RuntimeDefault seccomp profile (`07e9c81`) overrides the DinD sidecar's privileged flag in newer Kubernetes versions, blocking Docker daemon syscalls. Set Unconfined explicitly on the DinD container while keeping RuntimeDefault on the runner container. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 17:09:57 -07:00
Erich Blume	9115044219	spork-create: check for conflicting branch names before sporking Bail if upstream already has branches named 'blumeops' or 'deploy', which would conflict with the spork branch naming strategy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 09:36:53 -07:00
Erich Blume	99df78664e	Note upstream history rewrite as a spork sync failure mode Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 08:22:11 -07:00
Erich Blume	e1429fc3e7	Document Spork Attack supply-chain risk Upstream can push workflows (in .github/ or .forgejo/) that execute on our runners via any trigger mechanism including cron. Runner label mismatch is the current defense but is fragile. No complete fix exists short of disabling Actions entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 08:16:09 -07:00
Erich Blume	4e09aed9d8	spork-create: fix ambiguous main branch checkout in mirror-sync template git checkout <branch> is ambiguous when both origin and mirror remotes have the same branch name. Use -B to explicitly create from origin. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 00:28:00 -07:00
Erich Blume	21b465bf18	spork-create: enable Actions on fork after creation Forks from mirror repos have has_actions disabled by default. PATCH the repo settings to enable it so the mirror-sync workflow runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 00:26:25 -07:00
Erich Blume	007b4fecdd	spork-create: preflight all checks before any mutations Check local path, mirror existence, and fork absence upfront. Fail fast with clear error messages before touching forge or disk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 23:06:10 -07:00
Erich Blume	ee6f516b2b	spork-create: bail if local clone already exists Trying to add remotes to an existing clone gets the origin wrong. Better to error out and let the user handle it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 23:03:38 -07:00
Erich Blume	6ecfaf02b6	Add spork strategy: tooling and documentation Spork-create mise task sets up a floating-branch soft-fork of a mirrored upstream project with daily mirror-sync via Forgejo Actions. Includes explanation card, how-to guides for setup and branch management, and the spork-create uv script. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 22:58:10 -07:00
Erich Blume	bb60369956	Simplify Kingfisher CronJob to HTML-only output Remove the second scan pass for JSON — one format is enough for now. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:50:54 -07:00
Erich Blume	2808ffd450	Document Kingfisher secret scanner service Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:47:37 -07:00
Erich Blume	35705faca2	Add Kingfisher secret scanner CronJob (#317) ## Summary - Deploys MongoDB Kingfisher as a weekly CronJob on minikube-indri - Scans all Forgejo repos (eblume + all orgs) for leaked secrets with live validation - Produces timestamped HTML and JSON reports on sifaka NFS (`/volume1/reports/kingfisher/`) - Forgejo API token sourced from 1Password via ExternalSecret - Uses official `ghcr.io/mongodb/kingfisher:1.91.0` container image - Runs Sunday 4am (after Prowler's 3am k8s scan) ## Resources - CronJob, PV/PVC (sifaka NFS), ExternalSecret - ArgoCD Application with manual sync + CreateNamespace ## Test plan - [x] Sync ArgoCD `apps` app to pick up new kingfisher Application - [x] Set `--revision feature/kingfisher-cronjob` on kingfisher app - [x] Verify ExternalSecret creates the `kingfisher-forgejo-token` Secret - [x] Trigger manual job: `kubectl create job --from=cronjob/kingfisher kingfisher-manual -n kingfisher --context=minikube-indri` - [ ] Verify reports appear on sifaka at `/volume1/reports/kingfisher/` - [ ] After merge: set `--revision main` and re-sync Reviewed-on: #317	2026-03-28 21:39:55 -07:00
Erich Blume	6b1717bf28	Add Kingfisher secret scanner to prek hooks Running alongside TruffleHog to compare coverage. Kingfisher uses staged-only mode with validation disabled for fast, offline-safe pre-commit checks. Validation will be enabled in the planned cron job. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:07:07 -07:00
Forgejo Actions	7fb6eff388	Update docs release to v1.15.1 - Built changelog from towncrier fragments [skip ci]	2026-03-28 09:15:21 -07:00
Erich Blume	2bd1611ac1	Document sifaka NFS/Tailscale TUN troubleshooting v1.15.1 Sifaka's Tailscale can revert to userspace networking after package updates, causing NFS mounts to fail because the NFS daemon sees 127.0.0.1 instead of the client's Tailscale IP. Added troubleshooting how-to doc and updated sifaka reference card with frigate export and TUN requirement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 09:12:00 -07:00
Erich Blume	8cbd412380	Update services-check: forgejo uses launchctl not brew Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 08:21:51 -07:00
Erich Blume	3017f759a7	Migrate Forgejo from Homebrew to source build (#316) ## Summary - Migrate Forgejo from Homebrew to source-built binary with mcquack LaunchAgent - Matches the established pattern used by zot, caddy, and alloy - Upgrades to v14.0.3 (7 security fixes: PKCE bypass, OAuth scope bypass, open redirect, and more) ## Changes - Ansible role: Replace brew install/services with binary stat check + LaunchAgent - Paths: `/opt/homebrew/var/forgejo` → `~/forgejo`, binary at `~/code/3rd/forgejo/forgejo` - Run user: `forgejo` → `erichblume` (LaunchAgent user; SSH git user stays `forgejo`) - Docs: Updated Forgejo reference card, restart-indri guide - Service review: Stamped frigate-notify, cloudnative-pg, blumeops-pg as current ## One-time migration steps (manual, on indri) 1. Clone from Codeberg, add forge mirror remote 2. Check out v14.0.3, build with `make build && make forgejo` 3. Stop brew, `cp -a` data to `~/forgejo`, fix ownership 4. Run `provision-indri --tags forgejo` 5. Verify, then `brew uninstall forgejo` ## Data safety - `cp -a` preserves everything (repos, SQLite DB, LFS, sessions, OAuth config) - Brew version stays installed as rollback until verification passes - No schema changes between 14.0.2 → 14.0.3 Reviewed-on: #316	2026-03-28 08:19:23 -07:00
Erich Blume	3cb4303a54	Add changelog for Immich resource/probe fix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 22:37:15 -07:00
Erich Blume	b632cd9ffb	Fix Immich resource limits and probe timeouts Resources were under wrong Helm value keys (server.resources, machine-learning.resources) and never applied to pods. Move to correct bjw-s chart paths (*.controllers.main.containers.main.resources). Increase liveness/readiness probe timeouts from 1s to 5s to prevent kubelet from killing healthy-but-busy pods during ML inference load. Remove CPU limits (keep requests only) to avoid throttling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 22:36:32 -07:00
Erich Blume	c78b86c72c	Add offsite backup for immich photo library to BorgBase (#315) ## Summary - Adds a second borgmatic config (`photos.yaml`) that backs up `/Volumes/photos` (sifaka SMB mount, ~128 GB) to a dedicated BorgBase repo (`immich-photos`), running daily at 4 AM - Separate launchd agent (`mcquack.eblume.borgmatic-photos`) so photo backups run independently from the main backup - Refactors `borgmatic_metrics` script to support multiple repos with a `repo` Prometheus label - Updates Grafana "Borg Backups" dashboard with a `repo` template variable so you can filter/compare repos - Docs updated: `backups.md`, `borgmatic.md` ## Prerequisites (manual) - [x] Create `immich-photos` repo on BorgBase with same SSH key - [ ] Upgrade BorgBase plan to Small ($24/yr) if currently on free tier (128 GB exceeds 10 GB limit) - [ ] After deploy: `borg init` the new repo (borgmatic does this automatically on first run) ## Test plan - [ ] Dry run: `mise run provision-indri -- --check --diff --tags borgmatic,borgmatic_metrics` - [ ] Deploy borgmatic role and verify both configs deployed - [ ] Run `borgmatic --config ~/.config/borgmatic/photos.yaml create --verbosity 1` manually for first backup (will take hours) - [ ] Verify metrics script collects from both repos: `~/.local/bin/borgmatic-metrics && cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom` - [ ] Sync grafana-config in ArgoCD and verify dashboard repo selector works 🤖 Generated with [Claude Code](https://claude.com/claude-code) Reviewed-on: #315	2026-03-27 19:43:05 -07:00
Erich Blume	ca0c9354ee	Add borgmatic backups for authentik and immich databases (#314) ## Summary - Add `authentik` database (blumeops-pg cluster) to borgmatic pg_dump backups - Add `immich` database (immich-pg cluster) to borgmatic pg_dump backups - For immich-pg: new borgmatic managed role with `pg_read_all_data`, ExternalSecret, Tailscale LoadBalancer service, and Caddy L4 TCP proxy on port 5433 - Update backup docs to reflect all four CNPG databases + mealie SQLite ## Deploy plan Deploy order matters — k8s resources must exist before ansible can route to them: 1. ArgoCD (databases app): sync to pick up immich-pg borgmatic role, ExternalSecret, and Tailscale service ``` argocd app set blumeops-pg --revision feature/borgmatic-all-pg-backups argocd app sync blumeops-pg ``` 2. Wait for `immich-pg-tailscale` service to get a Tailscale IP and `immich-pg.tail8d86e.ts.net` to resolve 3. Ansible (caddy): deploy Caddy L4 route for port 5433 ``` mise run provision-indri -- --tags caddy ``` 4. Ansible (borgmatic): deploy updated config and .pgpass ``` mise run provision-indri -- --tags borgmatic ``` 5. Verify: trigger a manual borgmatic run and check all four pg_dump streams succeed ``` borgmatic --verbosity 1 2>&1 | grep -E '(Dumping|ERROR)' ``` ## Test plan - [x] `kubectl kustomize` builds cleanly - [x] `ansible --check --diff` for borgmatic and caddy show expected changes - [ ] ArgoCD sync succeeds for databases app - [ ] `immich-pg.tail8d86e.ts.net` resolves - [ ] `pg.ops.eblu.me:5433` accepts connections - [ ] `borgmatic --verbosity 1` dumps all four databases without errors Reviewed-on: #314	2026-03-27 16:59:58 -07:00
Erich Blume	33463764d1	Add QArt Tuner: QR code art generator with interactive web UI Single-file Go tool implementing the QArt technique (Russ Cox, 2012) using only the public rsc.io/qr API. Generates QR codes whose data modules form a recognizable image by exploiting error correction freedom via GF(2) Gaussian elimination. Includes a web UI with live-updating sliders for version, mask, rotation, dx/dy offset, and scale. Keyboard shortcuts for rapid iteration. Also works as a CLI for batch generation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 15:33:36 -07:00
Erich Blume	66a47738dd	Add ringtail post-deploy maintenance: kernel check, generation pruning, GC Update manage-lockfile doc with post-deploy steps (kernel update detection, reboot guidance, generation pruning). Add prune-ringtail-generations mise task that keeps the 5 most recent generations plus the most recent one matching the booted kernel for safe rollback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 07:55:45 -07:00
Erich Blume	a5b33591d3	Update ringtail flake inputs (nixpkgs, home-manager) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 07:37:43 -07:00
Erich Blume	831b82950a	Upgrade nvidia-device-plugin v0.18.2 → v0.19.0 and add reference card Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 07:19:24 -07:00
Erich Blume	687e972713	Review CV doc and close build-dep review gap Fix stale CV service doc (URL, forge domain, container tag) and add guidance for reviewing build-time dependencies in private forge repos during service reviews. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 07:12:38 -07:00
Erich Blume	2c1652604b	Reduce PodNotReady alert lookback from 5m to 60s The 5-minute lookback window kept stale data from terminated pods visible during rollouts, causing the alert to sit in Pending for ~5 minutes after every routine deployment. 60s still covers two scrape cycles (30s interval) while clearing stale data much faster. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 19:48:37 -07:00
Erich Blume	a37012385f	Tighten ArgoCDAppOutOfSync alert timing to clear faster after sync Reduced `for` from 30m to 5m and lookback window from 5m to 1m. The old values caused alerts to linger long after apps returned to Synced state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 15:44:09 -07:00
Erich Blume	fc8d2cdb12	Add preserve/* branch protection and document Pyroscope blocker branch-cleanup: Add PROTECTED_PREFIXES with preserve/* exclusion so preserved work-in-progress branches are never deleted. observability.md: Document Pyroscope profiling work on branch preserve/pyroscope-profiling/pr-313, blocked on ringtail kernel sysctl settings (kptr_restrict=0, perf_event_paranoid≤1). Also document Faro/RUM as future potential with privacy considerations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 15:32:25 -07:00
Erich Blume	f97b5c9d5d	Deploy Homepage v1.11.0-e375859 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 10:25:07 -07:00
Erich Blume	e375859221	Upgrade Homepage container to v1.11.0 Minor release with new widgets (Tracearr, SparklyFitness), Seerr rename, and dependency bumps. No breaking changes for our config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 10:17:36 -07:00
Erich Blume	a5e51bd600	Review tailscale-setup tutorial: fix inaccuracies Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 07:44:36 -07:00
Erich Blume	4ae55f9bf4	Review kubernetes-bootstrap tutorial: fix inaccuracies - Fix k3s table entry (BlumeOps uses k3s on ringtail) - Fix broken tailscale serve command (minikube ip returns IP, not port) - Rewrite NFS section to match actual static PV/PVC binding pattern - Fix "BluemeOps" typo - Add last-reviewed frontmatter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 16:16:33 -07:00
Erich Blume	796baaa41a	Upgrade External Secrets Operator v2.2.0 + migrate Helm to kustomize (#312) ## Summary - Upgrade External Secrets Operator from v1.3.2 (helm-chart-2.0.0) to v2.2.0 - Migrate from Helm chart deployment to static kustomize manifests, matching the repo's kustomize-first pattern - Merge separate `-config` ArgoCD apps into the main operator apps (6 → 4 apps) - Clean up Helm-specific labels (`helm.sh/chart`, `managed-by: Helm`) - Update README example from v1beta1 to v1 API ## Breaking changes assessment Low risk — v2.0.0 removed Alibaba and Device42 providers (we use neither). No templating changes affect us. All ExternalSecrets already use v1 API. ## Deployment steps 1. Sync CRDs first on both clusters (new CRD version) 2. Sync operator apps (now kustomize-based) 3. Verify ClusterSecretStore and all ExternalSecrets are healthy 4. Delete orphaned config apps: `argocd app delete external-secrets-config` and `-config-ringtail` 5. `mise run services-check` Reviewed-on: #312	2026-03-25 15:56:41 -07:00
Erich Blume	b97e37543f	Deploy Tor Snowflake proxy on ringtail (#311) ## Summary - Add Snowflake proxy as a native systemd service on ringtail (NixOS) - Uses `pkgs.snowflake` from nixpkgs (v2.11.0) - Hardened systemd unit with DynamicUser, ProtectSystem=strict, 512MB memory limit - Prometheus metrics enabled on localhost:9999 ## What is Snowflake? A Tor pluggable transport that helps censored users reach the Tor network via WebRTC. This is NOT a Tor exit node — traffic exits through Tor exit nodes operated by others. The proxy operator cannot see traffic content (double-encrypted) and destination servers never see the proxy's IP. ## Changes - `nixos/ringtail/configuration.nix` — new systemd service definition - `docs/reference/services/snowflake-proxy.md` — service reference card - `docs/reference/infrastructure/ringtail.md` — updated systemd services section - `service-versions.yaml` — added entry (type: nixos) ## Deploy plan After review, deploy via `mise run provision-ringtail`. Service starts automatically. ## Test plan - [ ] `mise run provision-ringtail` succeeds - [ ] `ssh ringtail 'systemctl status snowflake-proxy'` shows active - [ ] `ssh ringtail 'journalctl -u snowflake-proxy --no-pager -n 20'` shows broker connections - [ ] `ssh ringtail 'curl -s localhost:9999/metrics'` returns Prometheus metrics Reviewed-on: #311	2026-03-24 20:51:40 -07:00


783 commits