Restrict backup to library/ and upload/ only (skip regenerable encoded-video/,
thumbs/, backups/). Add SSH ServerAliveInterval to prevent broken pipe on long
transfers, and checkpoint_interval so interrupted backups save progress.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kingfisher exits 200 (findings) or 205 (validated findings) on success.
Normalize these to 0 so the CronJob completes instead of restarting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirror repos cause scan failures (likely ephemeral storage or timeout).
Scan only eblume/ repos until we investigate the root cause.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Container runs as user 65534 (nobody) but /tmp was owned by root.
Set sticky bit + world-writable (1777) like a standard /tmp.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kingfisher needs a writable temp directory for git clones and scanning.
Nix containers don't create /tmp by default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nix containers don't include a shell by default. The CronJob needs
/bin/bash for the inline script that generates timestamped filenames.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kingfisher will build via Nix on ringtail instead of Dockerfile on
indri, so the skip is no longer needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kingfisher's Rust + Boost/vectorscan build exhausts indri's memory
(aws-sdk-ec2 alone needs 2-3GB for rustc). Build locally on Gilbert
and push manually until we have a beefier build host.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pod-level RuntimeDefault seccomp profile (07e9c81) overrides the
DinD sidecar's privileged flag in newer Kubernetes versions, blocking
Docker daemon syscalls. Set Unconfined explicitly on the DinD container
while keeping RuntimeDefault on the runner container.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bail if upstream already has branches named 'blumeops' or 'deploy',
which would conflict with the spork branch naming strategy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upstream can push workflows (in .github/ or .forgejo/) that execute
on our runners via any trigger mechanism including cron. Runner label
mismatch is the current defense but is fragile. No complete fix exists
short of disabling Actions entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
git checkout <branch> is ambiguous when both origin and mirror remotes
have the same branch name. Use -B to explicitly create from origin.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Forks from mirror repos have has_actions disabled by default.
PATCH the repo settings to enable it so the mirror-sync workflow runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Check local path, mirror existence, and fork absence upfront.
Fail fast with clear error messages before touching forge or disk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trying to add remotes to an existing clone gets the origin wrong.
Better to error out and let the user handle it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spork-create mise task sets up a floating-branch soft-fork of a
mirrored upstream project with daily mirror-sync via Forgejo Actions.
Includes explanation card, how-to guides for setup and branch
management, and the spork-create uv script.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Deploys MongoDB Kingfisher as a weekly CronJob on minikube-indri
- Scans all Forgejo repos (eblume + all orgs) for leaked secrets with live validation
- Produces timestamped HTML and JSON reports on sifaka NFS (`/volume1/reports/kingfisher/`)
- Forgejo API token sourced from 1Password via ExternalSecret
- Uses official `ghcr.io/mongodb/kingfisher:1.91.0` container image
- Runs Sunday 4am (after Prowler's 3am k8s scan)
## Resources
- CronJob, PV/PVC (sifaka NFS), ExternalSecret
- ArgoCD Application with manual sync + CreateNamespace
## Test plan
- [x] Sync ArgoCD `apps` app to pick up new kingfisher Application
- [x] Set `--revision feature/kingfisher-cronjob` on kingfisher app
- [x] Verify ExternalSecret creates the `kingfisher-forgejo-token` Secret
- [x] Trigger manual job: `kubectl create job --from=cronjob/kingfisher kingfisher-manual -n kingfisher --context=minikube-indri`
- [ ] Verify reports appear on sifaka at `/volume1/reports/kingfisher/`
- [ ] After merge: set `--revision main` and re-sync
Reviewed-on: #317
Running alongside TruffleHog to compare coverage. Kingfisher uses
staged-only mode with validation disabled for fast, offline-safe
pre-commit checks. Validation will be enabled in the planned cron job.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sifaka's Tailscale can revert to userspace networking after package
updates, causing NFS mounts to fail because the NFS daemon sees
127.0.0.1 instead of the client's Tailscale IP. Added troubleshooting
how-to doc and updated sifaka reference card with frigate export and
TUN requirement.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resources were under wrong Helm value keys (server.resources,
machine-learning.resources) and never applied to pods. Move to correct
bjw-s chart paths (*.controllers.main.containers.main.resources).
Increase liveness/readiness probe timeouts from 1s to 5s to prevent
kubelet from killing healthy-but-busy pods during ML inference load.
Remove CPU limits (keep requests only) to avoid throttling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Adds a second borgmatic config (`photos.yaml`) that backs up `/Volumes/photos` (sifaka SMB mount, ~128 GB) to a dedicated BorgBase repo (`immich-photos`), running daily at 4 AM
- Separate launchd agent (`mcquack.eblume.borgmatic-photos`) so photo backups run independently from the main backup
- Refactors `borgmatic_metrics` script to support multiple repos with a `repo` Prometheus label
- Updates Grafana "Borg Backups" dashboard with a `repo` template variable so you can filter/compare repos
- Docs updated: `backups.md`, `borgmatic.md`
## Prerequisites (manual)
- [x] Create `immich-photos` repo on BorgBase with same SSH key
- [ ] Upgrade BorgBase plan to Small ($24/yr) if currently on free tier (128 GB exceeds 10 GB limit)
- [ ] After deploy: `borg init` the new repo (borgmatic does this automatically on first run)
## Test plan
- [ ] Dry run: `mise run provision-indri -- --check --diff --tags borgmatic,borgmatic_metrics`
- [ ] Deploy borgmatic role and verify both configs deployed
- [ ] Run `borgmatic --config ~/.config/borgmatic/photos.yaml create --verbosity 1` manually for first backup (will take hours)
- [ ] Verify metrics script collects from both repos: `~/.local/bin/borgmatic-metrics && cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom`
- [ ] Sync grafana-config in ArgoCD and verify dashboard repo selector works
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #315
## Summary
- Add `authentik` database (blumeops-pg cluster) to borgmatic pg_dump backups
- Add `immich` database (immich-pg cluster) to borgmatic pg_dump backups
- For immich-pg: new borgmatic managed role with `pg_read_all_data`, ExternalSecret, Tailscale LoadBalancer service, and Caddy L4 TCP proxy on port 5433
- Update backup docs to reflect all four CNPG databases + mealie SQLite
## Deploy plan
Deploy order matters — k8s resources must exist before ansible can route to them:
1. **ArgoCD (databases app):** sync to pick up immich-pg borgmatic role, ExternalSecret, and Tailscale service
```
argocd app set blumeops-pg --revision feature/borgmatic-all-pg-backups
argocd app sync blumeops-pg
```
2. **Wait** for `immich-pg-tailscale` service to get a Tailscale IP and `immich-pg.tail8d86e.ts.net` to resolve
3. **Ansible (caddy):** deploy Caddy L4 route for port 5433
```
mise run provision-indri -- --tags caddy
```
4. **Ansible (borgmatic):** deploy updated config and .pgpass
```
mise run provision-indri -- --tags borgmatic
```
5. **Verify:** trigger a manual borgmatic run and check all four pg_dump streams succeed
```
borgmatic --verbosity 1 2>&1 | grep -E '(Dumping|ERROR)'
```
## Test plan
- [x] `kubectl kustomize` builds cleanly
- [x] `ansible --check --diff` for borgmatic and caddy show expected changes
- [ ] ArgoCD sync succeeds for databases app
- [ ] `immich-pg.tail8d86e.ts.net` resolves
- [ ] `pg.ops.eblu.me:5433` accepts connections
- [ ] `borgmatic --verbosity 1` dumps all four databases without errors
Reviewed-on: #314
Single-file Go tool implementing the QArt technique (Russ Cox, 2012)
using only the public rsc.io/qr API. Generates QR codes whose data
modules form a recognizable image by exploiting error correction
freedom via GF(2) Gaussian elimination.
Includes a web UI with live-updating sliders for version, mask,
rotation, dx/dy offset, and scale. Keyboard shortcuts for rapid
iteration. Also works as a CLI for batch generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update manage-lockfile doc with post-deploy steps (kernel update detection,
reboot guidance, generation pruning). Add prune-ringtail-generations mise
task that keeps the 5 most recent generations plus the most recent one
matching the booted kernel for safe rollback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix stale CV service doc (URL, forge domain, container tag) and add
guidance for reviewing build-time dependencies in private forge repos
during service reviews.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 5-minute lookback window kept stale data from terminated pods
visible during rollouts, causing the alert to sit in Pending for
~5 minutes after every routine deployment. 60s still covers two
scrape cycles (30s interval) while clearing stale data much faster.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduced `for` from 30m to 5m and lookback window from 5m to 1m. The old
values caused alerts to linger long after apps returned to Synced state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
branch-cleanup: Add PROTECTED_PREFIXES with preserve/* exclusion so
preserved work-in-progress branches are never deleted.
observability.md: Document Pyroscope profiling work on branch
preserve/pyroscope-profiling/pr-313, blocked on ringtail kernel
sysctl settings (kptr_restrict=0, perf_event_paranoid≤1). Also
document Faro/RUM as future potential with privacy considerations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Minor release with new widgets (Tracearr, SparklyFitness), Seerr rename,
and dependency bumps. No breaking changes for our config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Upgrade External Secrets Operator from v1.3.2 (helm-chart-2.0.0) to v2.2.0
- Migrate from Helm chart deployment to static kustomize manifests, matching the repo's kustomize-first pattern
- Merge separate `-config` ArgoCD apps into the main operator apps (6 → 4 apps)
- Clean up Helm-specific labels (`helm.sh/chart`, `managed-by: Helm`)
- Update README example from v1beta1 to v1 API
## Breaking changes assessment
Low risk — v2.0.0 removed Alibaba and Device42 providers (we use neither). No templating changes affect us. All ExternalSecrets already use v1 API.
## Deployment steps
1. Sync CRDs first on both clusters (new CRD version)
2. Sync operator apps (now kustomize-based)
3. Verify ClusterSecretStore and all ExternalSecrets are healthy
4. Delete orphaned config apps: `argocd app delete external-secrets-config` and `-config-ringtail`
5. `mise run services-check`
Reviewed-on: #312
## Summary
- Add Snowflake proxy as a native systemd service on ringtail (NixOS)
- Uses `pkgs.snowflake` from nixpkgs (v2.11.0)
- Hardened systemd unit with DynamicUser, ProtectSystem=strict, 512MB memory limit
- Prometheus metrics enabled on localhost:9999
## What is Snowflake?
A Tor pluggable transport that helps censored users reach the Tor network via WebRTC. **This is NOT a Tor exit node** — traffic exits through Tor exit nodes operated by others. The proxy operator cannot see traffic content (double-encrypted) and destination servers never see the proxy's IP.
## Changes
- `nixos/ringtail/configuration.nix` — new systemd service definition
- `docs/reference/services/snowflake-proxy.md` — service reference card
- `docs/reference/infrastructure/ringtail.md` — updated systemd services section
- `service-versions.yaml` — added entry (type: nixos)
## Deploy plan
After review, deploy via `mise run provision-ringtail`. Service starts automatically.
## Test plan
- [ ] `mise run provision-ringtail` succeeds
- [ ] `ssh ringtail 'systemctl status snowflake-proxy'` shows active
- [ ] `ssh ringtail 'journalctl -u snowflake-proxy --no-pager -n 20'` shows broker connections
- [ ] `ssh ringtail 'curl -s localhost:9999/metrics'` returns Prometheus metrics
Reviewed-on: #311