Commit graph

445 commits

Author SHA1 Message Date
f9397b7fa0 Review core-services tutorial: add SSH rationale, runner example, TODO notice
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 21:22:00 -07:00
3c819cf16e Fix Homepage migration date: 2025-12 → 2026-02 (per git history)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:53:48 -07:00
6e06efa6d0 Add AI-drafted disclaimer to no-helm-policy explanation doc
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:50:53 -07:00
64200a55c5 Migrate Immich from Helm chart to kustomize manifests (v2.5.6 → v2.6.3)
Replace the Helm chart deployment with plain kustomize manifests following
the Authentik pattern (separate deployments per component). Consolidate
the immich-storage ArgoCD app into the main immich app. Add no-helm-policy
doc establishing kustomize as the standard deployment mechanism.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:42:25 -07:00
464e3222d2 Document upstream fix for Prowler --registry bug (pending release)
PR #10470 merged 2026-03-30; initContainer workaround stays until a
Prowler release includes the fix (latest is 5.22.0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 20:21:19 -07:00
75f9ba4943 Build Tempo container from source (2.10.3) (#323)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dockerfile (tempo) (push) Successful in 6s
## Summary
- Add `containers/tempo/Dockerfile` — two-stage Go build from forge mirror, modeled on loki
- Switch kustomization from upstream `grafana/tempo` to `registry.ops.eblu.me/blumeops/tempo`
- Bump Tempo 2.10.1 → 2.10.3

## Test plan
- [ ] Kick off container build via `mise run container-build-and-release tempo`
- [ ] Update kustomization `newTag` with built image tag
- [ ] Deploy from branch: `argocd app set tempo --revision local-tempo-container && argocd app sync tempo`
- [ ] Verify Tempo health: `curl tempo.ops.eblu.me/ready`
- [ ] Verify traces flowing in Grafana Tempo datasource

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #323
2026-04-02 13:45:02 -07:00
b1e2811077 Upgrade Grafana 12.3.3 → 12.4.2 (#322)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-dockerfile (grafana) (push) Successful in 7s
## Summary

- Bumps Grafana from 12.3.3 to 12.4.2
- Patches 7 CVEs, notably CVE-2026-27880 (unauthenticated OOM DoS, CVSS 7.5) and CVE-2026-27879 (authenticated OOM via resample queries)
- No config changes required — reviewed alerting, datasources, OIDC, and feature toggles against 12.4.x breaking changes

## Breaking changes reviewed

| Change | Impact |
|--------|--------|
| Alerting: pending period applies to NoData/Error | Net positive — reduces noise from transient blips |
| Default notification uses empty receiver | No impact — we explicitly set `ntfy-infra` |
| Removed feature toggles (4) | No impact — none configured |
| OAuth ID token signature validation | Low risk — verify OIDC login post-deploy |
| OpsGenie deprecated | No impact — using webhook |

## Test plan

- [ ] Container build completes at forge
- [ ] Update kustomization.yaml with new image tag
- [ ] `argocd app set grafana --revision upgrade/grafana-12.4.2 && argocd app sync grafana`
- [ ] Verify Grafana UI loads at grafana.ops.eblu.me
- [ ] Verify OIDC login via Authentik
- [ ] Verify dashboards and datasources load
- [ ] Check alerting rules are intact

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #322
2026-04-02 11:33:19 -07:00
08d57ef4d4 Review pulumi reference doc: fix Tailscale auth, add how-to links
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:55:57 -07:00
1e67975acb Add changelog fragment for compensating control review
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:04:38 -07:00
baee7ae54b Review single-user-cluster control and add evidence collection card
Stamp single-user-cluster last-reviewed to 2026-04-01 after verifying
Tailscale ACLs and kubeconfig distribution. Add aspirational how-to card
documenting what PCI DSS evidence collection would look like (CCW,
artifacts, Drata workflow). Link from existing review process card.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:01:57 -07:00
a18a424866 Pin NixOS service versions via nixpkgs-services overlay (#321)
## Summary

- Add `nixpkgs-services` flake input pinned to a specific nixpkgs commit, with an overlay that pulls `forgejo-runner`, `snowflake`, and `k3s` from it instead of the rolling `nixpkgs`
- Dagger `flake-update` pipeline now excludes `nixpkgs-services` via `--exclude`
- Fix stale nix-container-builder version in service-versions.yaml (was 12.6.4, actually running 12.7.2)
- Add k3s and minikube to service-versions.yaml tracking
- Document the pinning approach in review-services how-to and ringtail reference

## Motivation

During service review, discovered that flake updates had silently upgraded forgejo-runner from 12.6.4 → 12.7.2 without updating service-versions.yaml. This "sneak-in upgrade" bypasses the service review process. The overlay ensures these three services only change versions deliberately.

## Test plan

- [ ] Verify `nix flake update` from `nixos/ringtail/` does not change `nixpkgs-services` lock entry
- [ ] Verify `mise run provision-ringtail` builds successfully with the overlay
- [ ] Confirm running service versions unchanged after deploy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #321
2026-04-01 21:37:57 -07:00
cfbf4cadbd Review argocd-cli reference doc: stamp last-reviewed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:35:52 -07:00
Forgejo Actions
2b7b21dc9b Update docs release to v1.15.2
- Built changelog from towncrier fragments

[skip ci]
2026-03-30 17:48:40 -07:00
4059b3d27b Add compensating controls framework and date-based report dirs (#320)
## Summary

- Add `compensating-controls.yaml` tracking 9 named controls that justify suppressed security findings
- Update all Prowler mutelist descriptions with `CC: <id>` references to named controls
- Add `mise run review-compensating-controls` task — surfaces stalest control with all codebase references
- Add [[review-compensating-controls]] how-to doc
- Organize Prowler and Kingfisher reports into `YYYY-MM-DD` subdirectories

### Compensating controls

| ID | Mitigates |
|----|-----------|
| `single-user-cluster` | Image cache abuse, RBAC breadth, system pod privileges |
| `tailscale-network-isolation` | Profiling endpoints, weak TLS, debug ports |
| `local-registry` | AlwaysPullImages gap |
| `sso-gated-admin-tools` | ArgoCD wildcard RBAC |
| `operator-managed-pods` | Tailscale proxy pod security settings |
| `ephemeral-privileged-jobs` | Prowler hostPID exposure |
| `trusted-ci-only` | Forgejo runner DinD |
| `init-container-isolation` | Grafana root init container |
| `observability-stack-audit` | Missing apiserver audit logging |

## Test plan

- [ ] `mise run review-compensating-controls` shows table and references
- [ ] `kubectl kustomize argocd/manifests/prowler/` renders correctly
- [ ] Sync prowler and kingfisher, verify next scan writes to dated subdirectory
- [ ] Grep for `CC:` in mutelist files — every muted finding should have at least one

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #320
2026-03-30 17:44:11 -07:00
a76e471d54 Add Prowler mutelist and fix kube-state-metrics seccomp (#319)
## Summary

- Add mutelist files to suppress expected/accepted Prowler CIS findings from components we don't control
- Mutelist files stored in `mutelist/` directory, grouped by category, merged at runtime via initContainer
- Fix missing seccomp `RuntimeDefault` profile on kube-state-metrics deployment

### Mutelist categories

| File | Checks | Covers |
|------|--------|--------|
| `apiserver.yaml` | 12 | Minikube apiserver flags |
| `control-plane.yaml` | 3 | Scheduler, controller-manager, kubelet |
| `core-pod-security.yaml` | 7 | System pods, Tailscale operator, Grafana init, Prowler hostPID, forgejo-runner |
| `rbac.yaml` | 3 | Built-in K8s roles, ArgoCD, CNPG |

Muted findings appear as `status=MUTED` in reports (not hidden), preserving audit trail.

### Not muted (follow-up)

- Alloy, Immich pods missing seccomp — need separate investigation (Helm/operator-managed)

## Test plan

- [ ] `kubectl kustomize argocd/manifests/prowler/` renders cleanly
- [ ] Trigger manual scan: `kubectl --context=minikube-indri -n prowler create job prowler-mutelist-test --from=cronjob/prowler`
- [ ] Verify initContainer merges successfully (check pod logs)
- [ ] Verify muted findings show as `MUTED` in report
- [ ] Sync kube-state-metrics and verify pod starts with seccomp profile

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #319
2026-03-30 17:22:31 -07:00
1e391f96bb Upgrade forgejo-runner 12.7.0 → 12.7.3, add service card
Patch upgrade picks up idempotent FetchTask API, offline registration
fix, cloudflare/circl security dep update, and custom gRPC user-agent.
No config defaults changed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 16:31:06 -07:00
77eebe507e Review Ansible reference doc: add missing roles, clarify IaC positioning
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 16:10:24 -07:00
c069f889d2 Harden borgmatic photos backup: restrict dirs, add keepalives + checkpoints
Restrict backup to library/ and upload/ only (skip regenerable encoded-video/,
thumbs/, backups/). Add SSH ServerAliveInterval to prevent broken pipe on long
transfers, and checkpoint_interval so interrupted backups save progress.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 10:30:28 -07:00
f9206bf10b Build custom Kingfisher container from sporked deploy branch (#318)
All checks were successful
Build Container / detect (push) Successful in 2s
Build Container / build-nix (kingfisher) (push) Successful in 12s
## Summary

- Add Dockerfile for Kingfisher built from source (sporked deploy branch)
- Multi-stage: Rust build with Boost/vectorscan, debian-slim runtime
- Switch CronJob from upstream `ghcr.io/mongodb/kingfisher` to `registry.ops.eblu.me/blumeops/kingfisher`
- Add kingfisher to service-versions.yaml (version tracks upstream main SHA)
- Document spork workflow in CLAUDE.md

## Test plan

- [ ] Build container: `mise run container-build-and-release kingfisher 1d37d29`
- [ ] Verify image on registry: `mise run container-list`
- [ ] Update kustomization newTag
- [ ] Sync ArgoCD kingfisher app from branch
- [ ] Trigger manual CronJob and verify scan completes
- [ ] Verify reports on sifaka

Reviewed-on: #318
2026-03-30 06:34:49 -07:00
99df78664e Note upstream history rewrite as a spork sync failure mode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:22:11 -07:00
e1429fc3e7 Document Spork Attack supply-chain risk
Upstream can push workflows (in .github/ or .forgejo/) that execute
on our runners via any trigger mechanism including cron. Runner label
mismatch is the current defense but is fragile. No complete fix exists
short of disabling Actions entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:16:09 -07:00
6ecfaf02b6 Add spork strategy: tooling and documentation
Spork-create mise task sets up a floating-branch soft-fork of a
mirrored upstream project with daily mirror-sync via Forgejo Actions.
Includes explanation card, how-to guides for setup and branch
management, and the spork-create uv script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:58:10 -07:00
bb60369956 Simplify Kingfisher CronJob to HTML-only output
Remove the second scan pass for JSON — one format is enough for now.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:50:54 -07:00
2808ffd450 Document Kingfisher secret scanner service
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:47:37 -07:00
35705faca2 Add Kingfisher secret scanner CronJob (#317)
## Summary

- Deploys MongoDB Kingfisher as a weekly CronJob on minikube-indri
- Scans all Forgejo repos (eblume + all orgs) for leaked secrets with live validation
- Produces timestamped HTML and JSON reports on sifaka NFS (`/volume1/reports/kingfisher/`)
- Forgejo API token sourced from 1Password via ExternalSecret
- Uses official `ghcr.io/mongodb/kingfisher:1.91.0` container image
- Runs Sunday 4am (after Prowler's 3am k8s scan)

## Resources

- CronJob, PV/PVC (sifaka NFS), ExternalSecret
- ArgoCD Application with manual sync + CreateNamespace

## Test plan

- [x] Sync ArgoCD `apps` app to pick up new kingfisher Application
- [x] Set `--revision feature/kingfisher-cronjob` on kingfisher app
- [x] Verify ExternalSecret creates the `kingfisher-forgejo-token` Secret
- [x] Trigger manual job: `kubectl create job --from=cronjob/kingfisher kingfisher-manual -n kingfisher --context=minikube-indri`
- [ ] Verify reports appear on sifaka at `/volume1/reports/kingfisher/`
- [ ] After merge: set `--revision main` and re-sync

Reviewed-on: #317
2026-03-28 21:39:55 -07:00
6b1717bf28 Add Kingfisher secret scanner to prek hooks
Running alongside TruffleHog to compare coverage. Kingfisher uses
staged-only mode with validation disabled for fast, offline-safe
pre-commit checks. Validation will be enabled in the planned cron job.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:07:07 -07:00
Forgejo Actions
7fb6eff388 Update docs release to v1.15.1
- Built changelog from towncrier fragments

[skip ci]
2026-03-28 09:15:21 -07:00
2bd1611ac1 Document sifaka NFS/Tailscale TUN troubleshooting
Sifaka's Tailscale can revert to userspace networking after package
updates, causing NFS mounts to fail because the NFS daemon sees
127.0.0.1 instead of the client's Tailscale IP. Added troubleshooting
how-to doc and updated sifaka reference card with frigate export and
TUN requirement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:12:00 -07:00
3017f759a7 Migrate Forgejo from Homebrew to source build (#316)
## Summary

- Migrate Forgejo from Homebrew to source-built binary with mcquack LaunchAgent
- Matches the established pattern used by zot, caddy, and alloy
- Upgrades to v14.0.3 (7 security fixes: PKCE bypass, OAuth scope bypass, open redirect, and more)

## Changes

- **Ansible role**: Replace brew install/services with binary stat check + LaunchAgent
- **Paths**: `/opt/homebrew/var/forgejo` → `~/forgejo`, binary at `~/code/3rd/forgejo/forgejo`
- **Run user**: `forgejo` → `erichblume` (LaunchAgent user; SSH git user stays `forgejo`)
- **Docs**: Updated Forgejo reference card, restart-indri guide
- **Service review**: Stamped frigate-notify, cloudnative-pg, blumeops-pg as current

## One-time migration steps (manual, on indri)

1. Clone from Codeberg, add forge mirror remote
2. Check out v14.0.3, build with `make build && make forgejo`
3. Stop brew, `cp -a` data to `~/forgejo`, fix ownership
4. Run `provision-indri --tags forgejo`
5. Verify, then `brew uninstall forgejo`

## Data safety

- `cp -a` preserves everything (repos, SQLite DB, LFS, sessions, OAuth config)
- Brew version stays installed as rollback until verification passes
- No schema changes between 14.0.2 → 14.0.3

Reviewed-on: #316
2026-03-28 08:19:23 -07:00
3cb4303a54 Add changelog for Immich resource/probe fix
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 22:37:15 -07:00
c78b86c72c Add offsite backup for immich photo library to BorgBase (#315)
## Summary

- Adds a second borgmatic config (`photos.yaml`) that backs up `/Volumes/photos` (sifaka SMB mount, ~128 GB) to a dedicated BorgBase repo (`immich-photos`), running daily at 4 AM
- Separate launchd agent (`mcquack.eblume.borgmatic-photos`) so photo backups run independently from the main backup
- Refactors `borgmatic_metrics` script to support multiple repos with a `repo` Prometheus label
- Updates Grafana "Borg Backups" dashboard with a `repo` template variable so you can filter/compare repos
- Docs updated: `backups.md`, `borgmatic.md`

## Prerequisites (manual)

- [x] Create `immich-photos` repo on BorgBase with same SSH key
- [ ] Upgrade BorgBase plan to Small ($24/yr) if currently on free tier (128 GB exceeds 10 GB limit)
- [ ] After deploy: `borg init` the new repo (borgmatic does this automatically on first run)

## Test plan

- [ ] Dry run: `mise run provision-indri -- --check --diff --tags borgmatic,borgmatic_metrics`
- [ ] Deploy borgmatic role and verify both configs deployed
- [ ] Run `borgmatic --config ~/.config/borgmatic/photos.yaml create --verbosity 1` manually for first backup (will take hours)
- [ ] Verify metrics script collects from both repos: `~/.local/bin/borgmatic-metrics && cat /opt/homebrew/var/node_exporter/textfile/borgmatic.prom`
- [ ] Sync grafana-config in ArgoCD and verify dashboard repo selector works

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #315
2026-03-27 19:43:05 -07:00
ca0c9354ee Add borgmatic backups for authentik and immich databases (#314)
## Summary

- Add `authentik` database (blumeops-pg cluster) to borgmatic pg_dump backups
- Add `immich` database (immich-pg cluster) to borgmatic pg_dump backups
- For immich-pg: new borgmatic managed role with `pg_read_all_data`, ExternalSecret, Tailscale LoadBalancer service, and Caddy L4 TCP proxy on port 5433
- Update backup docs to reflect all four CNPG databases + mealie SQLite

## Deploy plan

Deploy order matters — k8s resources must exist before ansible can route to them:

1. **ArgoCD (databases app):** sync to pick up immich-pg borgmatic role, ExternalSecret, and Tailscale service
   ```
   argocd app set blumeops-pg --revision feature/borgmatic-all-pg-backups
   argocd app sync blumeops-pg
   ```
2. **Wait** for `immich-pg-tailscale` service to get a Tailscale IP and `immich-pg.tail8d86e.ts.net` to resolve
3. **Ansible (caddy):** deploy Caddy L4 route for port 5433
   ```
   mise run provision-indri -- --tags caddy
   ```
4. **Ansible (borgmatic):** deploy updated config and .pgpass
   ```
   mise run provision-indri -- --tags borgmatic
   ```
5. **Verify:** trigger a manual borgmatic run and check all four pg_dump streams succeed
   ```
   borgmatic --verbosity 1 2>&1 | grep -E '(Dumping|ERROR)'
   ```

## Test plan

- [x] `kubectl kustomize` builds cleanly
- [x] `ansible --check --diff` for borgmatic and caddy show expected changes
- [ ] ArgoCD sync succeeds for databases app
- [ ] `immich-pg.tail8d86e.ts.net` resolves
- [ ] `pg.ops.eblu.me:5433` accepts connections
- [ ] `borgmatic --verbosity 1` dumps all four databases without errors

Reviewed-on: #314
2026-03-27 16:59:58 -07:00
33463764d1 Add QArt Tuner: QR code art generator with interactive web UI
Single-file Go tool implementing the QArt technique (Russ Cox, 2012)
using only the public rsc.io/qr API. Generates QR codes whose data
modules form a recognizable image by exploiting error correction
freedom via GF(2) Gaussian elimination.

Includes a web UI with live-updating sliders for version, mask,
rotation, dx/dy offset, and scale. Keyboard shortcuts for rapid
iteration. Also works as a CLI for batch generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:33:36 -07:00
66a47738dd Add ringtail post-deploy maintenance: kernel check, generation pruning, GC
Update manage-lockfile doc with post-deploy steps (kernel update detection,
reboot guidance, generation pruning). Add prune-ringtail-generations mise
task that keeps the 5 most recent generations plus the most recent one
matching the booted kernel for safe rollback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:55:45 -07:00
a5b33591d3 Update ringtail flake inputs (nixpkgs, home-manager)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:37:43 -07:00
831b82950a Upgrade nvidia-device-plugin v0.18.2 → v0.19.0 and add reference card
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:19:24 -07:00
687e972713 Review CV doc and close build-dep review gap
Fix stale CV service doc (URL, forge domain, container tag) and add
guidance for reviewing build-time dependencies in private forge repos
during service reviews.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:12:38 -07:00
2c1652604b Reduce PodNotReady alert lookback from 5m to 60s
The 5-minute lookback window kept stale data from terminated pods
visible during rollouts, causing the alert to sit in Pending for
~5 minutes after every routine deployment. 60s still covers two
scrape cycles (30s interval) while clearing stale data much faster.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:48:37 -07:00
a37012385f Tighten ArgoCDAppOutOfSync alert timing to clear faster after sync
Reduced `for` from 30m to 5m and lookback window from 5m to 1m. The old
values caused alerts to linger long after apps returned to Synced state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:44:09 -07:00
fc8d2cdb12 Add preserve/* branch protection and document Pyroscope blocker
branch-cleanup: Add PROTECTED_PREFIXES with preserve/* exclusion so
preserved work-in-progress branches are never deleted.

observability.md: Document Pyroscope profiling work on branch
preserve/pyroscope-profiling/pr-313, blocked on ringtail kernel
sysctl settings (kptr_restrict=0, perf_event_paranoid≤1). Also
document Faro/RUM as future potential with privacy considerations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:32:25 -07:00
e375859221 Upgrade Homepage container to v1.11.0
All checks were successful
Build Container / detect (push) Successful in 3s
Build Container / build-dockerfile (homepage) (push) Successful in 5m47s
Minor release with new widgets (Tracearr, SparklyFitness), Seerr rename,
and dependency bumps. No breaking changes for our config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:17:36 -07:00
a5e51bd600 Review tailscale-setup tutorial: fix inaccuracies
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 07:44:36 -07:00
4ae55f9bf4 Review kubernetes-bootstrap tutorial: fix inaccuracies
- Fix k3s table entry (BlumeOps uses k3s on ringtail)
- Fix broken tailscale serve command (minikube ip returns IP, not port)
- Rewrite NFS section to match actual static PV/PVC binding pattern
- Fix "BluemeOps" typo
- Add last-reviewed frontmatter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 16:16:33 -07:00
796baaa41a Upgrade External Secrets Operator v2.2.0 + migrate Helm to kustomize (#312)
## Summary

- Upgrade External Secrets Operator from v1.3.2 (helm-chart-2.0.0) to v2.2.0
- Migrate from Helm chart deployment to static kustomize manifests, matching the repo's kustomize-first pattern
- Merge separate `-config` ArgoCD apps into the main operator apps (6 → 4 apps)
- Clean up Helm-specific labels (`helm.sh/chart`, `managed-by: Helm`)
- Update README example from v1beta1 to v1 API

## Breaking changes assessment

Low risk — v2.0.0 removed Alibaba and Device42 providers (we use neither). No templating changes affect us. All ExternalSecrets already use v1 API.

## Deployment steps

1. Sync CRDs first on both clusters (new CRD version)
2. Sync operator apps (now kustomize-based)
3. Verify ClusterSecretStore and all ExternalSecrets are healthy
4. Delete orphaned config apps: `argocd app delete external-secrets-config` and `-config-ringtail`
5. `mise run services-check`

Reviewed-on: #312
2026-03-25 15:56:41 -07:00
b97e37543f Deploy Tor Snowflake proxy on ringtail (#311)
## Summary

- Add Snowflake proxy as a native systemd service on ringtail (NixOS)
- Uses `pkgs.snowflake` from nixpkgs (v2.11.0)
- Hardened systemd unit with DynamicUser, ProtectSystem=strict, 512MB memory limit
- Prometheus metrics enabled on localhost:9999

## What is Snowflake?

A Tor pluggable transport that helps censored users reach the Tor network via WebRTC. **This is NOT a Tor exit node** — traffic exits through Tor exit nodes operated by others. The proxy operator cannot see traffic content (double-encrypted) and destination servers never see the proxy's IP.

## Changes

- `nixos/ringtail/configuration.nix` — new systemd service definition
- `docs/reference/services/snowflake-proxy.md` — service reference card
- `docs/reference/infrastructure/ringtail.md` — updated systemd services section
- `service-versions.yaml` — added entry (type: nixos)

## Deploy plan

After review, deploy via `mise run provision-ringtail`. Service starts automatically.

## Test plan

- [ ] `mise run provision-ringtail` succeeds
- [ ] `ssh ringtail 'systemctl status snowflake-proxy'` shows active
- [ ] `ssh ringtail 'journalctl -u snowflake-proxy --no-pager -n 20'` shows broker connections
- [ ] `ssh ringtail 'curl -s localhost:9999/metrics'` returns Prometheus metrics

Reviewed-on: #311
2026-03-24 20:51:40 -07:00
Forgejo Actions
243a862901 Update docs release to v1.15.0
- Built changelog from towncrier fragments

[skip ci]
2026-03-24 19:51:17 -07:00
fe201a495c Add Prowler IaC scanning of blumeops repo (Saturday 2am)
Clone repo in init container, scan Dockerfiles and K8s manifests
with Prowler's IaC provider (Trivy). Reports written to
sifaka:/volume1/reports/prowler-iac/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:49:38 -07:00
696024306c Add Prowler image vulnerability scanning for blumeops containers
All checks were successful
Build Container / detect (push) Successful in 39s
Build Container / build-dockerfile (prowler) (push) Successful in 10m15s
Add Trivy to the Prowler container for image and IaC scanning.
New CronJob (Saturday 3am) scans all blumeops/* images in the
registry for CVEs, embedded secrets, and Dockerfile misconfigs.
Reports written to sifaka:/volume1/reports/prowler-images/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:43:08 -07:00
07e9c810ca Add RuntimeDefault seccomp profiles to all managed workloads
Addresses 32 CIS Kubernetes Benchmark failures from Prowler scan
(core_seccomp_profile_docker_default). Applied pod-level seccomp
RuntimeDefault to 18 deployments/statefulsets and 2 cronjobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 16:19:40 -07:00
d021b3534f Deploy Prowler CIS scanner (#310)
All checks were successful
Build Container / detect (push) Successful in 4s
Build Container / build-dockerfile (prowler) (push) Successful in 10s
## Summary
- Deploy Prowler 5 as a weekly CronJob on minikube-indri for CIS Kubernetes Benchmark v1.11 scanning
- Custom slim container build (strips PowerShell, Trivy, and non-K8s providers from upstream)
- Reports (HTML, CSV, JSON-OCSF) written to NFS share on sifaka at `/volume1/reports/prowler/`
- Read-only ClusterRole for pod, RBAC, and control plane inspection
- Host path mounts + hostPID for kubelet file permission checks

## Follow-ups
- Mirror prowler-cloud/prowler on forge for supply chain control
- Build and push container image, update kustomization.yaml newTag
- Consider adding k3s-ringtail scanning (core + RBAC checks only)

## Test plan
- [ ] Build container: `mise run container-release prowler v5.22.0`
- [ ] Update `argocd/manifests/prowler/kustomization.yaml` newTag to built image tag
- [ ] Sync ArgoCD: `argocd app sync apps && argocd app set prowler --revision deploy-prowler && argocd app sync prowler`
- [ ] Trigger manual job: `kubectl create job --from=cronjob/prowler prowler-manual -n prowler --context=minikube-indri`
- [ ] Verify reports appear on sifaka NFS share
- [ ] `mise run services-check`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #310
2026-03-24 16:08:09 -07:00